Data fusion approaches combine multiple data sources in an effort to increase the information content and improve machine learning results over any single source.
These data sources may be homogeneous (many streams from the same type of sensor in a sensor field) or heterogeneous (different modalities such as audio, seismic, or video). Fusion can be performed at the data/feature level, where information from multiple sources is combined before it is passed to a learning method, or at the decision level, where a consensus decision is derived from the outputs of multiple models, each built on an individual modality. While data fusion methods are a powerful part of a data scientist’s toolkit, challenges often arise with alignment of data streams (e.g., temporal or spatial alignment), reliance on specific pre-processing approaches, and a disconnect between theoretical utility and real-world application.
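To make the two fusion levels and the alignment step concrete, here is a minimal sketch in Python with NumPy. It is illustrative only and is not a description of any particular production pipeline: the signals, sample rates, class probabilities, and the helper names `align_to_grid`, `feature_level_fusion`, and `decision_level_fusion` are all assumptions made up for this example. Temporal alignment is done by simple linear interpolation onto a shared time grid, and decision-level fusion by averaging per-modality class probabilities; real systems may use quite different choices for both.

```python
import numpy as np


def align_to_grid(t_src, x_src, t_grid):
    """Temporal alignment: linearly interpolate one stream onto a shared time grid."""
    return np.interp(t_grid, t_src, x_src)


def feature_level_fusion(aligned_streams):
    """Feature-level fusion: stack aligned streams column-wise.

    Returns an array of shape (n_samples, n_streams) that a single
    downstream model could consume.
    """
    return np.column_stack(aligned_streams)


def decision_level_fusion(prob_list):
    """Decision-level fusion: average per-model class probabilities.

    Each entry of prob_list has shape (n_events, n_classes).
    Returns the averaged probabilities and the consensus class per event.
    """
    mean_probs = np.mean(np.stack(prob_list), axis=0)
    return mean_probs, np.argmax(mean_probs, axis=-1)


# Hypothetical streams recorded at different sample rates over one second.
t_seismic = np.linspace(0.0, 1.0, 101)    # about 100 Hz
t_acoustic = np.linspace(0.0, 1.0, 401)   # about 400 Hz
seismic = np.sin(2 * np.pi * 5 * t_seismic)
acoustic = np.cos(2 * np.pi * 20 * t_acoustic)

# Resample both streams onto a common grid, then fuse at the feature level.
t_grid = np.linspace(0.0, 1.0, 201)
fused_features = feature_level_fusion([
    align_to_grid(t_seismic, seismic, t_grid),
    align_to_grid(t_acoustic, acoustic, t_grid),
])

# Made-up class probabilities from two single-modality models
# (2 events, 3 classes), fused at the decision level.
p_seismic = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_acoustic = np.array([[0.5, 0.1, 0.4], [0.1, 0.2, 0.7]])
fused_probs, consensus = decision_level_fusion([p_seismic, p_acoustic])
```

In this sketch, feature-level fusion needs the raw streams to be aligned first, whereas decision-level fusion only needs each model's output for the same events, which is one reason the two levels face different practical challenges.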
GDA leverages and develops cutting-edge fusion approaches that operate well in real-world environments. While significant consideration is given to the specific application and use cases of these methods, our modular and unsupervised pipelines are agnostic to pre-processing methods and provide robust, interpretable decisions with uncertainty estimates. We have successfully applied data fusion to diverse signal data, to both homogeneous and heterogeneous sensor networks, and to biological datasets, including ’omics data. In several vehicle-type identification problems, our unsupervised pipeline achieves high accuracy (~95%) using only the difficult seismic and acoustic modalities, a significant improvement over baseline unsupervised methods that use less sophisticated tools.
Our methods can be easily adapted to any other context where a network of heterogeneous sensors observes a scene. They are especially useful when communication constraints are present or when novel situations mean that the large amounts of labeled training data assumed by standard deep-learning approaches are not available.
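As a rough illustration of why decision-level fusion suits communication-constrained networks, the sketch below assumes each sensor node transmits only a short class-probability vector per event rather than raw data. It is a generic example, not GDA's pipeline: the function name `fuse_with_uncertainty`, the input values, the log-opinion pool (product of probabilities) used for combining, and the normalized-entropy uncertainty score are all assumptions chosen for the example.

```python
import numpy as np


def fuse_with_uncertainty(prob_list, eps=1e-12):
    """Combine per-node class probabilities and report an uncertainty score.

    prob_list: list of arrays, each of shape (n_events, n_classes).
    Returns (decisions, fused_probs, uncertainty), where uncertainty is the
    entropy of the fused distribution divided by log(n_classes), so it lies
    between 0 (confident) and 1 (maximally uncertain).
    """
    probs = np.stack(prob_list)                        # (n_nodes, n_events, n_classes)
    log_pool = np.sum(np.log(probs + eps), axis=0)     # sum of log-probabilities
    log_pool -= log_pool.max(axis=-1, keepdims=True)   # for numerical stability
    fused = np.exp(log_pool)
    fused /= fused.sum(axis=-1, keepdims=True)         # renormalize per event
    entropy = -np.sum(fused * np.log(fused + eps), axis=-1)
    uncertainty = entropy / np.log(fused.shape[-1])
    return fused.argmax(axis=-1), fused, uncertainty


# Made-up outputs from two nodes for two events and three classes.
# The nodes agree on the first event and disagree on the second.
node_a = np.array([[0.90, 0.05, 0.05], [0.6, 0.3, 0.1]])
node_b = np.array([[0.80, 0.10, 0.10], [0.1, 0.3, 0.6]])

decisions, fused_probs, uncertainty = fuse_with_uncertainty([node_a, node_b])
```

Here the first event receives a confident fused decision, while the disagreement on the second event shows up as a much higher uncertainty score, which a downstream user could treat as a flag for further review.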