Machine learning provides techniques to monitor system behavior and predict failures from sensor data. However, such algorithms are "scale resistant" - high computational complexity and not parallelizable. The problem then becomes identifying and delivering the relevant subset of the vast amount of sensor data to each monitoring node, despite the lack of explicit "relevance" labels. The simplest solution is to deliver only the "closest" data items under some distance metric.