Anomaly Detection in Streaming Time Series Data

Anomaly Detection in 
 Streaming Time Series DataPriyanga Dilini Talagala 
 with 
 Rob J Hyndman 
 Kate Smith-Miles 
 Sevvandi Kandanaarachchi 
 Mario A. Muñoz1 / 23

Motivation: Fence-mounted perimeter intrusion detection systems

Data obtained using fiber optic cables attached to a fence

2 / 23

Motivation: Fence-mounted perimeter intrusion detection systems

Data obtained using fiber optic cables attached to a fence
Intrusion events cause measurable changes in intensity, phase, wavelength or transit time of light in the fiber.

2 / 23

Motivation: Fence-mounted perimeter intrusion detection systems

Data obtained using fiber optic cables attached to a fence
Intrusion events cause measurable changes in intensity, phase, wavelength or transit time of light in the fiber.
Aim: Find anomalous time series (the location of the intrusion event)

2 / 23

Motivation: Network intrusion detection systems

Yahoo data breach in late 2014 --- world's largest ever cyber attack

3 / 23

Motivation: Network intrusion detection systems

Yahoo data breach in late 2014 --- world's largest ever cyber attack
Intrusion attacks cause measurable changes in times of logins, command executed during a single user session, number of password failures

3 / 23

Motivation: Network intrusion detection systems

Yahoo data breach in late 2014 --- world's largest ever cyber attack
Intrusion attacks cause measurable changes in times of logins, command executed during a single user session, number of password failures
Aim: find anomalous time series (locate intrusion attacks)

3 / 23

MotivationAll these applications generate millions or even billions of individual time series simultaneously
4 / 23

Motivation

All these applications generate millions or even billions of individual time series simultaneously
Research question: Finding anomalous time series within a large collection of time series

4 / 23

Motivation

All these applications generate millions or even billions of individual time series simultaneously
Research question: Finding anomalous time series within a large collection of time series
Approaches to solving the problem of anomaly detection for temporal data :

4 / 23

Motivation

All these applications generate millions or even billions of individual time series simultaneously
Research question: Finding anomalous time series within a large collection of time series
Approaches to solving the problem of anomaly detection for temporal data :
Batch scenario:
whole set of data is available, focus - complete events

4 / 23

Motivation

All these applications generate millions or even billions of individual time series simultaneously
Research question: Finding anomalous time series within a large collection of time series
Approaches to solving the problem of anomaly detection for temporal data :
Batch scenario:
whole set of data is available, focus - complete events

Data stream scenario: continuous, unbounded, flow at high speed, high volume

4 / 23

Automatic anomaly detection algorithm for streaming data is required:to give real-time support
5 / 23

Automatic anomaly detection algorithm for streaming data is required:to give real-time support
to provide early detection of anomalies
5 / 23

Automatic anomaly detection algorithm for streaming data is required:to give real-time support
to provide early detection of anomalies
to learn and adapt to the changing environment automatically (concept drift)
5 / 23

Automatic anomaly detection algorithm for streaming data is required:to give real-time support
to provide early detection of anomalies
to learn and adapt to the changing environment automatically (concept drift)
to deal with large amounts of data efficiently
5 / 23

What is an anomaly ?6 / 23

Image credit: Wikimedia Commons

What is an anomaly ?By definition, anomalies are rare in comparison to a system's typical behaviour.
We define an anomaly as an observation that is very unlikely given the forecast distribution.
7 / 23

Algorithm of the proposed frameworkAimTo detect anomalous time series within a large collection of time series in a streaming data context
8 / 23

Algorithm of the proposed frameworkAimTo detect anomalous time series within a large collection of time series in a streaming data context
Main Assumptions8 / 23

Algorithm of the proposed frameworkAimTo detect anomalous time series within a large collection of time series in a streaming data context
Main AssumptionsAnomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system
8 / 23

Algorithm of the proposed framework

Aim

To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system
A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

8 / 23

Algorithm of the proposed framework

Aim

To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system
A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

Proposed Algorithm

8 / 23

Algorithm of the proposed framework

Aim

To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system
A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

Proposed Algorithm

Off-line Phase: Forecast a boundary for system's typical behavior (similar to (Clifton, Hugueny & Tarassenko, 2011))

8 / 23

Algorithm of the proposed framework

Aim

To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system
A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

Proposed Algorithm

Off-line Phase: Forecast a boundary for system's typical behavior (similar to (Clifton, Hugueny & Tarassenko, 2011))
On-line Phases: Testing newly arrived data using the boundary

8 / 23

Feature Based Representation of Time series

Mean
Variance
Changing variance in remainder
Level shift using rolling window
Variance change
Strength of linearity
Strength of curvature

Strength of spikiness
Burstiness of time series (Fano Factor)
Minimum
Maximum
The ratio between interquartile mean and the arithmetic mean
Moment
Ratio of means of data that is below and upper the global mean

9 / 23

Feature Based Representation of Time series

10 / 23

Dimension Reduction for Time Series

First two PCs explain 85% of variation

11 / 23

Classical Extreme Value Theory

Figure: Extreme value distributions corresponding to m = 1; 10; 100; 1000, each describing where the maximum of m samples drawn from N(0; 1) will lie.

12 / 23

Theorem 1: Fisher-Tippett theorem (Limit laws for maxima)

(Embrechts et al. (2013), p. 121)

Let $X = X_{1}, X_{2}, . . ., X_{m}$ be a sequence of independent and identically distributed random variables and $X_{m a x} = m a x (X)$ . If there exist centering constant $d_{m} (\in ℜ)$ and normalizing constant $c_{m} (> 0)$ , and some non-degenerate distribution function $H^{+}$ such that

then $H^{+}$ belongs to one of the following three distribution functions:

13 / 23

Extreme Value Distribution of the Probability Density Values (Clifton et al., 2011)

Estimate the probability density function of the 2D PC space --> Kernel density estimation

Draw a large number N of extremes from the estimated probability density function

Figure: Distribution of 1000 extremes generated from bivariate kernel density function with m=500

14 / 23

Extreme Value Distribution of Probability Density Values (Clifton et al., 2011)

Define a $Ψ$ -transform space, using the $Ψ$ -transformation defined by
$Ψ$ -transform maps the density values back into space into which a Gumbel distribution can be fitted.

Figure: Distribution of transformed values

15 / 23

16 / 23

Image credit: Wikimedia Commons

17 / 23

How it works?

18 / 23

How it works?

19 / 23

oddstream::find_odd_streams(train_data, test_stream)

20 / 23

What Next?Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
21 / 23

What Next?Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
21 / 23

What Next?Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
Do more experiments on density estimation methods to get a better tail estimation.
21 / 23

What Next?Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
Do more experiments on density estimation methods to get a better tail estimation.
Extend the algorithm to work with Multidimensional Multivariate Data Streams. 
21 / 23

References

Images were taken:

Main references

Clifton, D. A., Hugueny, S., & Tarassenko, L. (2011). Novelty detection with multivariate extreme value statistics. Journal of signal processing systems, 65 (3), (pp. 371-389).
Fulcher, B. D. (2012). Highly comparative time-series analysis. PhD thesis, University of Oxford.
Hyndman, R. J., Wang, E., & Laptev, N. (2015). Large-scale unusual time series detection. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), (pp. 1616-1619). IEEE.

22 / 23

Thank YouR package available at:  github.com/pridiltal/oddstreamEmail: dilini.talagala@monash.eduSlides available at: github.com/pridiltal/ISF-CAIRNS2017-talk23 / 23

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Anomaly Detection in Streaming Time Series Data

Priyanga Dilini Talagala with Rob J Hyndman Kate Smith-Miles Sevvandi Kandanaarachchi Mario A. Muñoz

Motivation: Fence-mounted perimeter intrusion detection systems

Motivation: Fence-mounted perimeter intrusion detection systems

Motivation: Fence-mounted perimeter intrusion detection systems

Motivation: Network intrusion detection systems

Motivation: Network intrusion detection systems

Motivation: Network intrusion detection systems

Motivation

Motivation

Motivation

Motivation

Motivation

Automatic anomaly detection algorithm for streaming data is required:

Automatic anomaly detection algorithm for streaming data is required:

Automatic anomaly detection algorithm for streaming data is required:

Automatic anomaly detection algorithm for streaming data is required:

What is an anomaly ?

What is an anomaly ?

Algorithm of the proposed framework

Aim

Algorithm of the proposed framework

Aim

Main Assumptions

Algorithm of the proposed framework

Aim

Main Assumptions

Algorithm of the proposed framework

Aim

Main Assumptions

Algorithm of the proposed framework

Aim

Main Assumptions

Proposed Algorithm

Algorithm of the proposed framework

Aim

Main Assumptions

Proposed Algorithm

Algorithm of the proposed framework

Aim

Main Assumptions

Proposed Algorithm

Feature Based Representation of Time series

Feature Based Representation of Time series

Dimension Reduction for Time Series

Classical Extreme Value Theory

Theorem 1: Fisher-Tippett theorem (Limit laws for maxima)

(Embrechts et al. (2013), p. 121)

Extreme Value Distribution of the Probability Density Values (Clifton et al., 2011)

Extreme Value Distribution of Probability Density Values (Clifton et al., 2011)

How it works?

How it works?

What Next?

What Next?

What Next?

What Next?

References

Images were taken:

Main references

Thank You

R package available at: github.com/pridiltal/oddstream

Email: dilini.talagala@monash.edu

Slides available at: github.com/pridiltal/ISF-CAIRNS2017-talk

Motivation: Fence-mounted perimeter intrusion detection systems

Help

Anomaly Detection in
Streaming Time Series Data

Priyanga Dilini Talagala
with
Rob J Hyndman
Kate Smith-Miles
Sevvandi Kandanaarachchi
Mario A. Muñoz