+ - 0:00:00
Notes for current slide
Notes for next slide

Forecasting and Anomaly Detection in Large-Scale Time Series

Priyanga Dilini Talagala

26/01/2023


pridiltal
prital.netlify.app


The slides are powered by xaringan R package

1

Acknowledgement

2

High Dimensional data

4

High Dimensional data

Temporal data

5

Anomalies in temporal data

6

Anomalies in temporal data

7

Anomalous series within a space of a collection of series

8
  • All these applications generate millions or even billions of individual time series simultaneously
9
  • All these applications generate millions or even billions of individual time series simultaneously
  • Research question: Finding anomalous time series within a large collection of time series
10
  • All these applications generate millions or even billions of individual time series simultaneously
  • Research question: Finding anomalous time series within a large collection of time series
  • Approaches to solving the problem of anomaly detection for temporal data :
11
  • All these applications generate millions or even billions of individual time series simultaneously
  • Research question: Finding anomalous time series within a large collection of time series
  • Approaches to solving the problem of anomaly detection for temporal data :

    Batch scenario

    • whole set of data is available
    • complete events


12
  • All these applications generate millions or even billions of individual time series simultaneously
  • Research question: Finding anomalous time series within a large collection of time series
  • Approaches to solving the problem of anomaly detection for temporal data :

    Batch scenario

    • whole set of data is available
    • complete events


    Data stream scenario

    • continuous, unbounded, flow at high speed, high volume
    • incomplete events
13

stray (Search and TRace AnomalY)

on CRAN

on CRAN

devtools::install_github("pridiltal/stray")

14

Stray algorithm in Python

Recently, Kate Buchhorn has ported stray algorithms to Python and made it available in sktime:

15

Anomaly detection in high dimensional data

Main contributions

  • Propose a framework to detect anomalies in high dimensional data. Our proposed algorithm addresses the limitations of HDoutliers algorithm (Wilkinson, 2018).
16

Anomaly detection in high dimensional data

Main contributions

  • Propose a framework to detect anomalies in high dimensional data. Our proposed algorithm addresses the limitations of HDoutliers algorithm (Wilkinson, 2018).

What is an anomaly ?

  • We define an anomaly as an observation that deviates markedly from the majority with a large distance gap.
17

Anomaly detection in high dimensional data

Main contributions

  • Propose a framework to detect anomalies in high dimensional data. Our proposed algorithm addresses the limitations of HDoutliers algorithm (Wilkinson, 2018).

What is an anomaly ?

  • We define an anomaly as an observation that deviates markedly from the majority with a large distance gap.

Main assumptions

  • There is a large distance between typical data and the anomalies in comparison to the distance among typical data.
18

stray

  • Normalize the columns of the data. (median and IQR)
  • This prevents variables with large variances having disproportional influence on Euclidean distances.
19

Why not "nearest neighbour" distances?

  • Calculate the nearest neighbour distance
20

stray

  • Select the k nearest neighbour distance with the maximum gap
21

Calculate anomalous threshold

  • Use extreme value theory (EVT) to calculate a data driven outlier threshold.
22

Calculate anomalous threshold

  • Use extreme value theory (EVT) to calculate a data driven outlier threshold.

  • Let n be the size of the dataset

23

Calculate anomalous threshold

  • Use extreme value theory (EVT) to calculate a data driven outlier threshold.

  • Let n be the size of the dataset

  • Sort the resulting n outlier scores

24

Calculate anomalous threshold

  • Use extreme value theory (EVT) to calculate a data driven outlier threshold.

  • Let n be the size of the dataset

  • Sort the resulting n outlier scores

  • Consider the half of the outlier scores with the smallest values as typical

25

Calculate anomalous threshold

  • Use extreme value theory (EVT) to calculate a data driven outlier threshold.

  • Let n be the size of the dataset

  • Sort the resulting n outlier scores

  • Consider the half of the outlier scores with the smallest values as typical

  • Search for any significant large gap in the upper tail (Bottom up searching algorithm proposed by Schwarz, 2008)

26

Spacing theorem (Weissman, 1978)

Let X1,X2,...,Xn be a sample from a distribution function F .
Let X1:nX2:n...Xn:n be the order statistics.
The available data are X1:n,X2:n,...,Xk:n for some fixed k.
Let Di,n=Xi:nXi+1:n, (i=1,2,...,k) be the spacing between successive order statistics.
If F is in the maximum domain of attraction of the Gumbel distribution, then the spacings Di,n are asymptotically independent and exponentially distributed with mean proportional to i1.

27

stray

outliers <- find_HDoutliers(data)
display_HDoutliers(data, outliers)

28

Advantages of the proposed algorithm

  • Detect clusters of outlying points
29

Advantages of the proposed algorithm

  • Detect clusters of outlying points

  • Applied to both uni- and multi- dimensional data

30

Advantages of the proposed algorithm

  • Detect clusters of outlying points

  • Applied to both uni- and multi- dimensional data

  • Handle large datasets due to the use of approximate KNN searching algorithm

31

Advantages of the proposed algorithm

  • Detect clusters of outlying points

  • Applied to both uni- and multi- dimensional data

  • Handle large datasets due to the use of approximate KNN searching algorithm

  • Does not require a training set to build the decision model

32

Advantages of the proposed algorithm

  • Detect clusters of outlying points

  • Applied to both uni- and multi- dimensional data

  • Handle large datasets due to the use of approximate KNN searching algorithm

  • Does not require a training set to build the decision model

  • Deal with multimodal typical classes

33

Advantages of the proposed algorithm

  • Detect clusters of outlying points

  • Applied to both uni- and multi- dimensional data

  • Handle large datasets due to the use of approximate KNN searching algorithm

  • Does not require a training set to build the decision model

  • Deal with multimodal typical classes

  • Outlier threshold has a probabilistic interpretation

34

Feature based representation of time series

  • Mean

  • Variance

  • Changing variance in remainder

  • Level shift using rolling window

  • Variance change

  • Strength of linearity

  • Strength of curvature

  • Strength of spikiness

  • Burstiness of time series (Fano Factor)

  • Minimum

  • Maximum

  • The ratio between 50% trimmed mean and the arithmetic mean

  • Moment

  • Ratio of means of data that is below and above the global mean

35

Approach 1: Using stray

  • Use a moving window to deal with streaming data
  • Extract time series features from window
  • Apply stray algorithm to identify anomalous series

tsfeatures <- oddstream::extract_tsfeatures(ts_data)
outliers <- stray::find_HDoutliers(tsfeatures)
stray::display_HDoutliers(tsfeatures, outliers)

36

37

38

oddstream
(O
utlier Detection in Data STREAMs)

devtools::install_github("pridiltal/oddstream")

39

How oddstream works

40

How oddstream works

41

Dimension reduction for time series

load(train_data)

42

Dimension reduction for time series

load(train_data)

tsfeatures <- oddstream::extract_tsfeatures
(train_data)

43

tsfeatures <- oddstream::extract_tsfeatures
(train_data)

44

tsfeatures <- oddstream::extract_tsfeatures
(train_data)

pc<- oddstream::get_pc_space(tsfeatures)
oddstream::plotpc(pc$pcnorm)

45

Anomalous threshold calculation

  • Estimate the probability density function of the 2D PC space Kernel density estimation
46

Anomalous threshold calculation

  • Estimate the probability density function of the 2D PC space Kernel density estimation
  • Draw a large number N of extremes (argminxX[f2(x)]) from the estimated probability density function
47

Anomalous threshold calculation

  • Estimate the probability density function of the 2D PC space Kernel density estimation
  • Draw a large number N of extremes (argminxX[f2(x)]) from the estimated probability density function
  • Define a Ψ-transform space, using the Ψ-transformation defined by (Clifton et al., 2011)

  • Ψ-transform maps the density values back into space into which a Gumbel distribution can be fitted.
48

Anomalous threshold calculation

  • Estimate the probability density function of the 2D PC space Kernel density estimation
  • Draw a large number N of extremes (argminxX[f2(x)]) from the estimated probability density function
  • Define a Ψ-transform space, using the Ψ-transformation defined by (Clifton et al., 2011)

  • Ψ-transform maps the density values back into space into which a Gumbel distribution can be fitted.
  • Anomalous threshold calculation extreme value theory
49

oddstream::find_odd_streams(train_data, test_stream)

50

Feature Based Representation of Time series

51

Anomaly Detection with
Non-stationarity

52

Anomaly detection with non-stationarity

53

Anomaly detection with non-stationarity

54

Anomaly detection with non-stationarity

55

Anomaly detection with non-stationarity

56

Anomaly detection with non-stationarity

  • H0:ft0=ftt
  • squared discrepancy measure T=[ft0(x)ftt(x)]2dx (Anderson et al., 1994)
57

Anomaly detection with non-stationarity

58

stray

  • Definition: distance
  • no training set

oddstream

  • Definition: density
  • need a training set
59

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, (2020) Anomaly detection in high-dimensional data. Journal of Computational & Graphical Statistics, to appear

on CRAN

on CRAN

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, Sevvandi Kandanaarachchi and Mario A Munoz (2020) Anomaly detection in streaming nonstationary temporal data. Journal of Computational & Graphical Statistics, 20(1), 13-27.

on CRAN

on CRAN

60

Anomaly Detection in Image Time Series (ITS)

61

Image Time Series (ITS)

  • A stack of images or a videos - Image Time Series (ITS)
62

Image Time Series (ITS)

  • A stack of images or a videos - Image Time Series (ITS)

  • An ITS is basically a set of images of the same scene, ordered chronologically.

63

Image Time Series (ITS)

  • A stack of images or a videos - Image Time Series (ITS)

  • An ITS is basically a set of images of the same scene, ordered chronologically.

  • It can be encoded as a data-cube, two spatial and one temporal dimensions.

64

Image Time Series (ITS)

  • A stack of images or a videos - Image Time Series (ITS)

  • An ITS is basically a set of images of the same scene, ordered chronologically.

  • It can be encoded as a data-cube, two spatial and one temporal dimensions.

  • The acquisition of an ITS can be done with one or multiple sensors to obtain a larger data series with a high temporal frequency.

65

Image Time Series (ITS)

  • A stack of images or a videos - Image Time Series (ITS)

  • An ITS is basically a set of images of the same scene, ordered chronologically.

  • It can be encoded as a data-cube, two spatial and one temporal dimensions.

  • The acquisition of an ITS can be done with one or multiple sensors to obtain a larger data series with a high temporal frequency.

  • The produced 2D+t data carry rich spatial and temporal information that must be taken into account to understand particular phenomena not being observable from a single image of the sequence.

66

Satellite Image Time Series (SITS)

  • A Satellite Image Time Series (SITS) is a set of satellite images taken from the same scene at different times

67

Approach 1: Traditional Machine Learning Approach

68

Approach 2: Deep Learning Approach

69

Binary Classification using EVT based Threshold

70

Fisher-Tippett theorem, limit laws for maxima

  • Asymptotic distribution of extreme order statistics
  • The maximum (minima) of a sample of iid random variables after proper renormalization can only converge in distribution to one of 3 possible distributions, the Gumbel distribution, the Fréchet distribution, or the Weibull distribution.
71

EVT based Anomaly Threshold Calculation

72

Binary Classification using EVT based Threshold

73

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
74

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
  • Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
75

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
  • Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
  • Do more experiments on density estimation methods to get a better tail estimation.
76

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
  • Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
  • Do more experiments on density estimation methods to get a better tail estimation.
  • Implement a suitable explainable model for anomaly detection in image streams.
77

What Next?

  • Explore more on feature extraction and feature selection methods to create a better feature space suitable for streaming data context.
  • Use other dimension reduction techniques such as multidimensional scaling analysis, random projection to see the effect on the performance of the proposed framework.
  • Do more experiments on density estimation methods to get a better tail estimation.
  • Implement a suitable explainable model for anomaly detection in image streams.
  • Extend the algorithm to work with Multidimensional Multivariate Data streams
78

Thank You

priyangad@uom.lk

pridiltal

prital.netlify.app
(Slides and papers available)



The slides are powered by xaringan R package

This work was supported in part by RETINA research lab funded by the OWSD, a program unit of United Nations Educational, Scientific and Cultural Organization (UNESCO).

79

Acknowledgement

2
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow