Data obtained using fiber optic cables attached to a fence
Intrusion events cause measurable changes in intensity, phase, wavelength or transit time of light in the fiber.
Aim: Find anomalous time series (the location of the intrusion event)
Yahoo data breach in late 2014: described at the time as the world's largest cyber attack
Intrusion attacks cause measurable changes in login times, the commands executed during a single user session, and the number of password failures.
All these applications generate millions or even billions of individual time series simultaneously
Research question: Finding anomalous time series within a large collection of time series
Approaches to solving the problem of anomaly detection for temporal data:
Batch scenario:
the whole data set is available; the focus is on complete events
Data stream scenario: data arrive continuously, are unbounded, and flow at high speed and in high volume
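In the data stream scenario the full series is never available at once, so computation has to work on a bounded buffer. A minimal sketch of one common way to do this, a fixed-width sliding window, is given here. The function name `sliding_windows` and the window width are illustrative assumptions, not something specified in these slides.

```python
from collections import deque


def sliding_windows(stream, width):
    """Yield consecutive fixed-width windows from a possibly unbounded stream.

    Only the most recent `width` observations are kept in memory.
    (Hypothetical helper for illustration.)
    """
    window = deque(maxlen=width)
    for value in stream:
        window.append(value)
        if len(window) == width:
            yield tuple(window)


# Example: windows of width 3 over a short finite stream
windows = list(sliding_windows(range(5), 3))
```

Because the buffer has a fixed maximum length, memory use stays constant no matter how long the stream runs.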
Image credit: Wikimedia Commons
An anomaly is a rare event that has a very low chance of occurrence relative to the typical behavior of the system.
A representative data set of the system's typical behavior is available and is used to define the model of that typical behavior.
Off-line phase: forecast a boundary for the system's typical behavior (similar to Clifton, Hugueny & Tarassenko, 2011)
On-line phase: test newly arrived data against the boundary
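The two phases can be sketched as follows. This is a deliberately simplified stand-in: the boundary here is an empirical quantile of a one-dimensional score, assumed only for illustration, whereas the talk builds the boundary from extreme value theory. The names `fit_boundary` and `flag_anomalies` and the 0.99 quantile are hypothetical.

```python
import random


def fit_boundary(typical_scores, quantile=0.99):
    """Off-line phase: estimate an upper boundary from typical-behavior scores.

    Assumption: an empirical quantile stands in for the
    extreme-value-based boundary described in the talk.
    """
    ordered = sorted(typical_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def flag_anomalies(new_scores, boundary):
    """On-line phase: return indices of newly arrived scores above the boundary."""
    return [i for i, score in enumerate(new_scores) if score > boundary]


# Off-line: learn the boundary from a representative "typical" sample
rng = random.Random(1)
typical = [abs(rng.gauss(0.0, 1.0)) for _ in range(5000)]
boundary = fit_boundary(typical)

# On-line: test newly arrived scores against the boundary
incoming = [0.3, 1.1, 6.0, 0.7]
flagged = flag_anomalies(incoming, boundary)
```

The key design point is that the boundary is computed once from typical data and then reused, so testing each new observation is a cheap comparison.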
Mean
Variance
Changing variance in remainder
Level shift using rolling window
Variance change
Strength of linearity
Strength of curvature
Strength of spikiness
Burstiness of time series (Fano Factor)
Minimum
Maximum
The ratio between interquartile mean and the arithmetic mean
Moment
Ratio of the means of the data below and above the global mean
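A few of the features listed above can be computed with the standard library alone. The sketch below is illustrative only: the exact definitions (population variance, Fano factor as variance divided by mean, quartiles from `statistics.quantiles` with its default method) are assumptions and may differ from those used in the software behind these slides.

```python
import statistics


def interquartile_mean(values):
    """Mean of the observations lying between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    middle = [v for v in values if q1 <= v <= q3]
    return statistics.fmean(middle)


def extract_features(series):
    """Map one time series to a small feature vector (hypothetical helper)."""
    mean = statistics.fmean(series)
    variance = statistics.pvariance(series)
    nan = float("nan")
    return {
        "mean": mean,
        "variance": variance,
        "minimum": min(series),
        "maximum": max(series),
        # Fano factor: variance relative to the mean (assumed definition)
        "burstiness_fano": variance / mean if mean != 0 else nan,
        # Ratio between interquartile mean and arithmetic mean
        "iqmean_to_mean": interquartile_mean(series) / mean if mean != 0 else nan,
    }


series = [1, 2, 3, 4, 5, 6, 7, 8, 100]
features = extract_features(series)
```

Each series, whatever its length, is reduced to a vector of the same size, which is what makes a very large collection of series comparable.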
Figure: Extreme value distributions corresponding to m = 1, 10, 100, 1000, each describing where the maximum of m samples drawn from N(0, 1) will lie.
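The behavior shown in the figure can be checked with a small simulation: draw m samples from N(0, 1), keep the maximum, and repeat. The sketch below assumes 500 repetitions and a fixed seed purely for illustration; `sample_maxima` is a hypothetical helper.

```python
import random
import statistics


def sample_maxima(m, n_maxima, rng):
    """Return n_maxima values, each the maximum of m draws from N(0, 1)."""
    return [
        max(rng.gauss(0.0, 1.0) for _ in range(m))
        for _ in range(n_maxima)
    ]


rng = random.Random(42)
# Average location of the maximum for each block size m
mean_max = {
    m: statistics.fmean(sample_maxima(m, 500, rng))
    for m in (1, 10, 100, 1000)
}
```

The average maximum moves to the right as m grows, which is why a boundary for "typical" extremes has to depend on the number of samples m.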
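Let $X = \{X_1, X_2, \ldots, X_m\}$ be a sequence of independent and identically distributed random variables and $X_{\max} = \max(X)$. If there exist a centering constant $d_m \in \mathbb{R}$, a normalizing constant $c_m > 0$, and some non-degenerate distribution function $H^+$ such that
$$c_m^{-1}\,(X_{\max} - d_m) \xrightarrow{d} H^+,$$
then $H^+$ belongs to one of the following three families of distribution functions (with $\alpha > 0$):
Fréchet: $\Phi_\alpha(x) = 0$ for $x \le 0$, and $\Phi_\alpha(x) = \exp(-x^{-\alpha})$ for $x > 0$
Weibull: $\Psi_\alpha(x) = \exp(-(-x)^{\alpha})$ for $x \le 0$, and $\Psi_\alpha(x) = 1$ for $x > 0$
Gumbel: $\Lambda(x) = \exp(-e^{-x})$ for $x \in \mathbb{R}$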
Figure: Distribution of 1000 extremes generated from bivariate kernel density function with m=500
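Define a $\Psi$-transform space using the $\Psi$-transformation of Clifton, Hugueny & Tarassenko (2011). For a density value $f_n(x)$ of an $n$-dimensional density (here $n = 2$, the bivariate kernel density), it is defined by
$$\Psi[f_n(x)] = \begin{cases} \left(-2\ln f_n(x) - n\ln(2\pi)\right)^{1/2}, & f_n(x) < (2\pi)^{-n/2} \\ 0, & f_n(x) \ge (2\pi)^{-n/2} \end{cases}$$
The $\Psi$-transform maps the density values into a space in which a Gumbel distribution can be fitted.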
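A minimal sketch of the transformation, assuming the form given in Clifton, Hugueny & Tarassenko (2011): Ψ[f] = (−2 ln f − n ln 2π)^(1/2) when f < (2π)^(−n/2), and 0 otherwise. The function names and the default n = 2 (bivariate density) are assumptions made for illustration.

```python
import math


def psi_transform(density, n_dims=2):
    """Psi-transform of a density value (form assumed from Clifton et al., 2011)."""
    if density <= 0.0:
        raise ValueError("density must be positive")
    threshold = (2.0 * math.pi) ** (-n_dims / 2.0)
    if density >= threshold:
        return 0.0
    inner = -2.0 * math.log(density) - n_dims * math.log(2.0 * math.pi)
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(0.0, inner))


def standard_normal_density(radius, n_dims=2):
    """Density of a standard n-dimensional normal at distance `radius` from the mean."""
    return (2.0 * math.pi) ** (-n_dims / 2.0) * math.exp(-0.5 * radius ** 2)


# For a standard normal density, the transform recovers the distance from the mean
recovered = psi_transform(standard_normal_density(2.0))
```

For a standard normal density the transform returns the distance from the mean, which gives some intuition for why low density values map to large, positive values whose extremes can then be modeled.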
Figure: Distribution of transformed values
Image credit: Wikimedia Commons
oddstream::find_odd_streams(train_data, test_stream)
Clifton, D. A., Hugueny, S., & Tarassenko, L. (2011). Novelty detection with multivariate extreme value statistics. Journal of Signal Processing Systems, 65(3), 371-389.
Fulcher, B. D. (2012). Highly comparative time-series analysis. PhD thesis, University of Oxford.
Hyndman, R. J., Wang, E., & Laptev, N. (2015). Large-scale unusual time series detection. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), (pp. 1616-1619). IEEE.