Maximally Divergent Intervals for Anomaly Detection

The Maximally Divergent Intervals (MDI) Algorithm can be used to detect anomalous intervals (as opposed to anomalous points) in multi-variate spatio-temporal time-series. A description of the algorithm along with a variety of application examples can be found in the following article:

Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection.
Björn Barz, Erik Rodner, Yanira Guanche Garcia, Joachim Denzler.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

libmaxdiv is a library providing an efficient C++ implementation of the MDI algorithm. A C-style interface allows the algorithm to be called from any other programming language, and bindings for Python are available as well.

For non-spatial, purely temporal time-series, a convenient graphical user interface (GUI) is provided to facilitate experimentation with your data without having to write a single line of code.

A detailed installation and user guide can be found in the file libmaxdiv user guide.pdf.

What is this all about?

Anomalies occurring in real data are often driven by complex natural processes. Detecting such anomalies can therefore provide useful insights that lead to a deeper understanding of the system being observed. Automated methods for anomaly detection are especially important today, when the amounts of data available are far too large to be analyzed manually. Machine learning techniques can be employed to point expert analysts to the interesting portions of the data: the anomalies, which cannot be explained by any existing model of the system.

Most anomaly detection methods available today assess the data point by point. This is sub-optimal for many applications dealing with time-series data, since anomalies driven by natural processes tend to extend over a period of time and, in the case of spatio-temporal data, over a spatial region, rather than occurring at a single location at a single time.

libmaxdiv, in contrast, searches for anomalous intervals in the data and returns region detections.
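To make the idea of interval-based detection more concrete, here is a minimal brute-force sketch in pure Python. It is not libmaxdiv's implementation or API: it assumes a univariate time-series, models the data inside and outside each candidate interval as a Gaussian, and scores the interval by the Kullback-Leibler divergence between the two. All function names, the length limits, and the synthetic data are made up for illustration.

```python
import math
import random


def mean_var(values, eps=1e-6):
    """Sample mean and (biased) variance; eps keeps the variance positive."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var + eps


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two univariate Gaussians p and q."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def most_divergent_interval(series, min_len=10, max_len=50):
    """Exhaustively score every interval [start, end) and return the best one.

    Naive illustration only: the statistics are recomputed from scratch
    for every candidate interval.
    """
    best_score, best_start, best_end = float("-inf"), None, None
    n = len(series)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, min(start + max_len, n) + 1):
            inside = series[start:end]
            outside = series[:start] + series[end:]
            score = gaussian_kl(*mean_var(inside), *mean_var(outside))
            if score > best_score:
                best_score, best_start, best_end = score, start, end
    return best_score, best_start, best_end


# Synthetic example: Gaussian noise with a shifted segment at [80, 110).
random.seed(0)
series = [random.gauss(0.0, 1.0) for _ in range(200)]
for t in range(80, 110):
    series[t] += 5.0

score, start, end = most_divergent_interval(series)
print(f"most divergent interval: [{start}, {end}) with score {score:.2f}")
```

The sketch returns a whole interval with a single score instead of rating each time-step individually, which is the key difference from point-wise methods.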

What can be done with libmaxdiv?

While the GUI is limited to non-spatial time-series, the library can also handle spatio-temporal data such as measurements of climate variables or videos.

For example, libmaxdiv is able to detect North Sea storms in a data set of marine climate variables (wind speed, significant wave height, mean wave period) measured hourly over the 50 years from 1958 to 2007. This results in a time-series with more than 400,000 time-steps, which libmaxdiv is able to process in less than a second.

The following image shows animated heat maps of the data during the first detection returned by the algorithm. The duration of the actual detection is indicated by a red box:

This detection corresponds to the well-known “Hamburg-Flut”, which flooded one fifth of Hamburg in February 1962 and caused 340 deaths.

46 of the top 50 detections do indeed appear to be storms of this kind, and we could verify 28 of them by matching them against historical storm records. The remaining 4 detections, however, are the very opposite of a storm but anomalies nonetheless, since they cover time-frames of unusually calm sea conditions with nearly no wind and no waves:

As a completely different example, libmaxdiv can also be used to detect anomalies in surveillance videos. The following two examples show the same video of a street scene together with the top 5 detections obtained with two different divergence measures implemented in libmaxdiv: the Kullback-Leibler divergence (left) and cross entropy (right). Video frames are represented by features extracted from a convolutional neural network.

Note that the two divergence measures interpret the nominal state of the system differently: the Kullback-Leibler divergence treats traffic as the normal behaviour and detects intervals without traffic, while cross entropy does the opposite and flags time-frames with a high traffic frequency as anomalies. Moreover, both approaches detect the pedestrians on the sidewalk, but cross entropy localizes them much more accurately.
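One possible way to see why the two measures can disagree is the identity H(p, q) = H(p) + KL(p || q): cross entropy equals the Kullback-Leibler divergence plus the entropy of the distribution inside the interval. The following toy sketch assumes univariate Gaussian models and uses made-up variances for a "quiet" and a "busy" interval. It is not taken from the video experiment above and does not use libmaxdiv.

```python
import math


def entropy(var_p):
    """Differential entropy of a univariate Gaussian with variance var_p."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var_p)


def cross_entropy(mu_p, var_p, mu_q, var_q):
    """Cross entropy H(p, q) between two univariate Gaussians."""
    return (0.5 * math.log(2.0 * math.pi * var_q)
            + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q))


def kl_divergence(mu_p, var_p, mu_q, var_q):
    """KL(p || q), obtained from the identity KL = H(p, q) - H(p)."""
    return cross_entropy(mu_p, var_p, mu_q, var_q) - entropy(var_p)


# Hypothetical numbers: the rest of the series has unit variance,
# a "quiet" interval varies very little, a "busy" one varies a lot.
rest = (0.0, 1.0)
candidates = {"quiet": (0.0, 0.01), "busy": (0.0, 4.0)}

scores = {}
for name, (mu, var) in candidates.items():
    scores[name] = (kl_divergence(mu, var, *rest),
                    cross_entropy(mu, var, *rest))
    print(f"{name:5s}  KL = {scores[name][0]:.3f}  "
          f"cross entropy = {scores[name][1]:.3f}")
```

Under these assumptions, the Kullback-Leibler divergence ranks the low-variance "quiet" interval higher, because subtracting the entropy of the interval rewards concentrated distributions, whereas cross entropy ranks the "busy" interval higher.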