+ - 0:00:00
Notes for current slide
Notes for next slide

Anomaly Detection in
Streaming Time Series Data

Priyanga Dilini Talagala
with
Rob J Hyndman
Kate Smith-Miles
Sevvandi Kandanaarachchi
Mario A. Muñoz

1 / 23

Motivation: Fence-mounted perimeter intrusion detection systems

  • Data obtained using fiber optic cables attached to a fence
2 / 23

Motivation: Fence-mounted perimeter intrusion detection systems

  • Data obtained using fiber optic cables attached to a fence

  • Intrusion events cause measurable changes in intensity, phase, wavelength or transit time of light in the fiber.

2 / 23

Motivation: Fence-mounted perimeter intrusion detection systems

  • Data obtained using fiber optic cables attached to a fence

  • Intrusion events cause measurable changes in intensity, phase, wavelength or transit time of light in the fiber.

  • Aim: Find anomalous time series (the location of the intrusion event)

2 / 23

Motivation: Network intrusion detection systems

  • Yahoo data breach in late 2014 --- world's largest ever cyber attack
3 / 23

Motivation: Network intrusion detection systems

  • Yahoo data breach in late 2014 --- world's largest ever cyber attack

  • Intrusion attacks cause measurable changes in times of logins, command executed during a single user session, number of password failures

3 / 23

Motivation: Network intrusion detection systems

  • Yahoo data breach in late 2014 --- world's largest ever cyber attack

  • Intrusion attacks cause measurable changes in times of logins, command executed during a single user session, number of password failures

  • Aim: find anomalous time series (locate intrusion attacks)
3 / 23

Motivation

  • All these applications generate millions or even billions of individual time series simultaneously
4 / 23

Motivation

  • All these applications generate millions or even billions of individual time series simultaneously

  • Research question: Finding anomalous time series within a large collection of time series

4 / 23

Motivation

  • All these applications generate millions or even billions of individual time series simultaneously

  • Research question: Finding anomalous time series within a large collection of time series

  • Approaches to solving the problem of anomaly detection for temporal data :

4 / 23

Motivation

  • All these applications generate millions or even billions of individual time series simultaneously

  • Research question: Finding anomalous time series within a large collection of time series

  • Approaches to solving the problem of anomaly detection for temporal data :

    Batch scenario:
    whole set of data is available, focus - complete events

4 / 23

Motivation

  • All these applications generate millions or even billions of individual time series simultaneously

  • Research question: Finding anomalous time series within a large collection of time series

  • Approaches to solving the problem of anomaly detection for temporal data :

    Batch scenario:
    whole set of data is available, focus - complete events

    Data stream scenario: continuous, unbounded, flow at high speed, high volume

4 / 23

Automatic anomaly detection algorithm for streaming data is required:

  • to give real-time support
5 / 23

Automatic anomaly detection algorithm for streaming data is required:

  • to give real-time support
  • to provide early detection of anomalies
5 / 23

Automatic anomaly detection algorithm for streaming data is required:

  • to give real-time support
  • to provide early detection of anomalies
  • to learn and adapt to the changing environment automatically (concept drift)
5 / 23

Automatic anomaly detection algorithm for streaming data is required:

  • to give real-time support
  • to provide early detection of anomalies
  • to learn and adapt to the changing environment automatically (concept drift)
  • to deal with large amounts of data efficiently
5 / 23

What is an anomaly ?

6 / 23

Image credit: Wikimedia Commons

What is an anomaly ?

  • By definition, anomalies are rare in comparison to a system's typical behaviour.
  • We define an anomaly as an observation that is very unlikely given the forecast distribution.
7 / 23

Algorithm of the proposed framework

Aim

  • To detect anomalous time series within a large collection of time series in a streaming data context
8 / 23

Algorithm of the proposed framework

Aim

  • To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

8 / 23

Algorithm of the proposed framework

Aim

  • To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

  • Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system
8 / 23

Algorithm of the proposed framework

Aim

  • To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

  • Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system

  • A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

8 / 23

Algorithm of the proposed framework

Aim

  • To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

  • Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system

  • A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

Proposed Algorithm

8 / 23

Algorithm of the proposed framework

Aim

  • To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

  • Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system

  • A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

Proposed Algorithm

  • Off-line Phase: Forecast a boundary for system's typical behavior (similar to (Clifton, Hugueny & Tarassenko, 2011))
8 / 23

Algorithm of the proposed framework

Aim

  • To detect anomalous time series within a large collection of time series in a streaming data context

Main Assumptions

  • Anomaly is a rare event which has a very low chance of occurrence with respect to the typical behavior of the system

  • A representative data set of the system's typical behavior is available to define the model for the typical behavior of the system.

Proposed Algorithm

  • Off-line Phase: Forecast a boundary for system's typical behavior (similar to (Clifton, Hugueny & Tarassenko, 2011))

  • On-line Phases: Testing newly arrived data using the boundary

8 / 23

Feature Based Representation of Time series

  • Mean

  • Variance

  • Changing variance in remainder

  • Level shift using rolling window

  • Variance change

  • Strength of linearity

  • Strength of curvature

  • Strength of spikiness

  • Burstiness of time series (Fano Factor)

  • Minimum

  • Maximum

  • The ratio between interquartile mean and the arithmetic mean

  • Moment

  • Ratio of means of data that is below and upper the global mean

9 / 23

Feature Based Representation of Time series

10 / 23

Dimension Reduction for Time Series

  • First two PCs explain 85% of variation
11 / 23

Classical Extreme Value Theory

Figure: Extreme value distributions corresponding to m = 1; 10; 100; 1000, each describing where the maximum of m samples drawn from N(0; 1) will lie.

Figure: Extreme value distributions corresponding to m = 1; 10; 100; 1000, each describing where the maximum of m samples drawn from N(0; 1) will lie.

12 / 23

Theorem 1: Fisher-Tippett theorem (Limit laws for maxima)

(Embrechts et al. (2013), p. 121)

Let X=X1,X2,...,Xm be a sequence of independent and identically distributed random variables and Xmax=max(X). If there exist centering constant dm() and normalizing constant cm(>0), and some non-degenerate distribution function H+ such that

then H+ belongs to one of the following three distribution functions:

13 / 23

Extreme Value Distribution of the Probability Density Values (Clifton et al., 2011)

  • Estimate the probability density function of the 2D PC space --> Kernel density estimation
  • Draw a large number N of extremes from the estimated probability density function
Figure: Distribution of 1000 extremes generated from bivariate kernel density function with m=500

Figure: Distribution of 1000 extremes generated from bivariate kernel density function with m=500

14 / 23

Extreme Value Distribution of Probability Density Values (Clifton et al., 2011)

  • Define a Ψ-transform space, using the Ψ-transformation defined by

  • Ψ-transform maps the density values back into space into which a Gumbel distribution can be fitted.

    Figure: Distribution of transformed values

    Figure: Distribution of transformed values

15 / 23

16 / 23

Image credit: Wikimedia Commons

17 / 23

How it works?

18 / 23

How it works?

19 / 23
oddstream::find_odd_streams(train_data, test_stream)

20 / 23

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
21 / 23

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
  • Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
21 / 23

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
  • Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
  • Do more experiments on density estimation methods to get a better tail estimation.
21 / 23

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
  • Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
  • Do more experiments on density estimation methods to get a better tail estimation.
  • Extend the algorithm to work with Multidimensional Multivariate Data Streams.
21 / 23

References

Images were taken:

Main references

  • Clifton, D. A., Hugueny, S., & Tarassenko, L. (2011). Novelty detection with multivariate extreme value statistics. Journal of signal processing systems, 65 (3), (pp. 371-389).

  • Fulcher, B. D. (2012). Highly comparative time-series analysis. PhD thesis, University of Oxford.

  • Hyndman, R. J., Wang, E., & Laptev, N. (2015). Large-scale unusual time series detection. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), (pp. 1616-1619). IEEE.

22 / 23

Thank You

R package available at: github.com/pridiltal/oddstream

Email: dilini.talagala@monash.edu

Slides available at: github.com/pridiltal/ISF-CAIRNS2017-talk

23 / 23

Motivation: Fence-mounted perimeter intrusion detection systems

  • Data obtained using fiber optic cables attached to a fence
2 / 23
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow