Shape analytics

In many machine learning applications, feature extraction is a key step before any learning method is applied. Shape analytics offers tools to generate geometric featurizations of data for use in data analysis and machine learning pipelines. Topological data analysis (TDA) supplements traditional data analysis by extracting underlying patterns in time series data, images, and other high-dimensional data sets. Some machine learning methods can generate shape descriptors directly from raw data, but our shape analytics toolkit lets us address several questions that those methods do not currently handle:

  • How do we turn non-traditional problems with shape into traditional machine learning problems?
  • How do we knowingly generate and use specific, interpretable shape features?
  • Can we generate low-dimensional features to avoid the “curse of dimensionality”?
  • Can we offer any robustness guarantees for our featurization?

Our Solutions

GDA’s methods augment traditional data analysis with concepts from contemporary mathematics. Our toolkit began with TDA and has since expanded to incorporate ideas from geometry, measure theory, information theory, graph theory, and analysis.

GDA’s shape analytics toolkit can be used to extract descriptors on non-traditional data such as images, signals, and high-dimensional point clouds, resulting in feature vectors ready for standard machine learning pipelines.

More commonly used techniques, such as passing full datasets into neural networks or applying convolutional neural networks to images, embed the resulting descriptors in a black box. GDA’s techniques instead generate directly interpretable shape features up front. Importantly, this lets us leave out any shape descriptors we explicitly want to exclude. The resulting shape descriptors are low-dimensional, which avoids the “curse of dimensionality,” and generating them scales well even on high-dimensional data, which keeps computational cost manageable. By their mathematical construction, these features come with robustness guarantees that allow for better uncertainty awareness in downstream machine learning pipelines.

Using persistent homology to segment an image of wood cells.

Persistent Homology

As a reference for collaborators, GDA published a Towards Data Science post covering many of the uses of persistent homology, a tool we frequently use. The post walks through the intuition and demonstrates potential uses with point cloud data, signal compression, and image segmentation.
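
To make the signal use case concrete, here is a minimal sketch of 0-dimensional sublevel-set persistence for a one-dimensional signal, written in plain Python. It is an illustration under stated assumptions, not GDA's toolkit or the code from the post: the function names `sublevel_persistence_1d` and `top_k_lifetimes` are hypothetical, and the fixed-length vector of the largest lifetimes is just one simple way to turn a persistence diagram into features.

```python
def sublevel_persistence_1d(signal):
    """0-dimensional sublevel-set persistence of a 1D sequence.

    Returns a list of (birth, death) pairs. Each local minimum starts a
    connected component; when two components meet, the one with the higher
    minimum dies (the "elder rule"). The global minimum never dies.
    Hypothetical helper for illustration only.
    """
    n = len(signal)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: signal[i])
    parent = [None] * n  # None means "not yet in the sublevel set"
    pairs = []

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Each root is the minimum of its component, so its value
                # is that component's birth time.
                young, old = (ri, rj) if signal[ri] > signal[rj] else (rj, ri)
                if signal[young] < signal[i]:  # skip zero-persistence pairs
                    pairs.append((signal[young], signal[i]))
                parent[young] = old
    pairs.append((min(signal), float("inf")))
    return pairs


def top_k_lifetimes(signal, k):
    """Fixed-length feature vector: the k largest finite lifetimes, zero-padded."""
    finite = [d - b for b, d in sublevel_persistence_1d(signal) if d != float("inf")]
    finite.sort(reverse=True)
    return (finite + [0.0] * k)[:k]


features = top_k_lifetimes([0, 2, 1, 3, 0.5, 4], k=3)
print(features)  # [2.5, 1.0, 0.0]
```

In this sketch, each lifetime measures how prominent a dip in the signal is, so the feature vector is short and each entry is directly interpretable. Discarding pairs with small lifetimes is also the basic idea behind persistence-based signal simplification.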