Geometric Data Analytics, Inc.

Expertise: Topological Data Analysis

Measuring the circularity of a data set.

Data collected on a number of subjects can typically be represented as a set of points in high-dimensional space, called a point cloud. Each subject contributes one point, whose coordinates are the measurements (or records) taken on that subject. The more measurements we make per subject, the higher the dimension of the point cloud. Making many measurements on a small set of subjects creates a sparse point cloud in high dimensions. Increasing the number of subjects (or experiments) makes the point cloud denser and more suitable for shape analytics.

Point clouds exhibit "shape" characteristics that are in some way informative about the data they record. For example, suppose that we measure the average heart rate (HR), breathing rate (BR), body temperature (T) and pace (P) of a number of athletes competing in a marathon. We might expect these numbers to be relatively independent, or we might predict that they are correlated: a faster pace should, perhaps, mean increased HR, BR and T. In either case, we expect there to be shape in the data. Why do we care? If, say, only one variable (e.g. pace) predicts the others, then the shape should be that of a line or perhaps a curve. If two independent variables together determine the others, then the shape should be that of a surface. And so on. The shape of the data suggests what predicts what.
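The marathon example can be made concrete with a small sketch. The data below is synthetic and every coefficient, range and noise level is an assumption made up for illustration: pace is drawn at random and HR, BR and T are each generated from pace plus a little noise, so the four-dimensional point cloud lies close to a line. Linear correlation is used here only as a crude stand-in for detecting that shape; it is not a TDA method and would miss a curved relationship.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def make_point_cloud(n_athletes=200, seed=0):
    """Synthetic (HR, BR, T, P) points, one per athlete.

    All numbers below are hypothetical; they are not real physiology.
    """
    rng = random.Random(seed)
    cloud = []
    for _ in range(n_athletes):
        pace = rng.uniform(8.0, 16.0)                   # km/h
        hr = 90.0 + 6.0 * pace + rng.gauss(0.0, 2.0)     # beats per minute
        br = 10.0 + 2.0 * pace + rng.gauss(0.0, 1.0)     # breaths per minute
        temp = 36.5 + 0.1 * pace + rng.gauss(0.0, 0.05)  # degrees Celsius
        cloud.append((hr, br, temp, pace))
    return cloud


cloud = make_point_cloud()            # 200 points in 4-dimensional space
hr, br, temp, pace = zip(*cloud)      # one column per measurement

# How strongly does pace alone predict each of the other measurements?
corr = {name: pearson(pace, column)
        for name, column in (("HR", hr), ("BR", br), ("T", temp))}

for name, value in corr.items():
    print(f"corr(P, {name}) = {value:.3f}")
```

Because a single variable drives the others by construction, all three correlations come out close to 1. If a second, independent driver were added, the cloud would spread out into a surface and no single measurement would predict the rest this well.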

Statistical models are the standard way of extracting useful information from data. Most statistical models make some assumptions about the form of the data and then learn “best fit” parameters. Topological Data Analysis (TDA) does not start from such a model; it looks directly at the shape of the data and builds a geometric model from it. This shape is captured with tools that find both local and global shape characteristics, based on mathematical methods from topology and geometry. TDA provides a new set of statistics that make few assumptions about the form of the data, which helps to guard against over-fitting, and that lend themselves to learning the characteristics of a family of related datasets.
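As a rough illustration of looking at shape directly, the sketch below computes the simplest topological feature of a point cloud: the number of connected components when points within a given distance of each other are linked, tracked across several scales. This is an assumption-laden toy, not GDA's method. The function names, the two-circle test data and the chosen scales are all hypothetical, and measuring circularity itself (loops rather than components) requires persistent homology in dimension one, which this sketch does not attempt.

```python
import math


def count_components(points, scale):
    """Count connected components when points within `scale` are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= scale:
                parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


def circle(cx, cy, radius=1.0, n=40):
    """n evenly spaced points on a circle centred at (cx, cy)."""
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]


# Hypothetical test data: two unit circles whose centres are 5 apart.
points = circle(0.0, 0.0) + circle(5.0, 0.0)

# Small scale: isolated points. Medium: two rings. Large: one blob.
counts = {scale: count_components(points, scale) for scale in (0.05, 0.5, 4.0)}

for scale, n_components in counts.items():
    print(f"scale {scale}: {n_components} component(s)")
```

No parametric form is fitted at any point; the only input is the pairwise distances. The range of scales over which the count stays at two is what signals that the data has two well-separated pieces, and this idea of tracking features across scales is what persistent homology generalises to loops and higher-dimensional voids.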

GDA has extensive expertise in the recognition of shape for non-linear data using Topological Data Analysis, as well as other methods from statistics, geometry and machine learning.