Forecasting and Anomaly Detection in Large-Scale Time Series
Priyanga Dilini Talagala
26/01/2023

 pridiltal
 prital.netlify.app

The slides are powered by xaringan R package

Acknowledgement

Anomaly detection

CRAN Task View: Anomaly Detection with R

High Dimensional data

Temporal data

Anomalies in temporal data

Anomalous series within a space of a collection of series

All these applications generate millions or even billions of individual time series simultaneously.
Research question: Finding anomalous time series within a large collection of time series.
Approaches to anomaly detection for temporal data:
Batch scenario
- the whole data set is available
- events are complete
Data stream scenario
- data are continuous and unbounded, and flow at high speed and in high volume
- events are incomplete

stray (Search and TRace AnomalY)

on CRAN

devtools::install_github("pridiltal/stray")

Stray algorithm in Python

Recently, Kate Buchhorn has ported the stray algorithm to Python and made it available in sktime.

Anomaly detection in high dimensional data

Main contributions
We propose a framework to detect anomalies in high dimensional data. Our proposed algorithm addresses the limitations of the HDoutliers algorithm (Wilkinson, 2018).

What is an anomaly?
We define an anomaly as an observation that deviates markedly from the majority with a large distance gap.

Main assumptions
There is a large distance between typical data and the anomalies in comparison to the distance among typical data.

stray

Normalize the columns of the data using the median and IQR.
This prevents variables with large variances from having a disproportionate influence on Euclidean distances.
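
The normalization step can be sketched in base R as follows. This is a minimal illustration that assumes the data sit in a numeric matrix; the helper name `robust_scale` is hypothetical and is not part of stray.

```r
# Hypothetical helper: centre each column by its median and scale by its IQR.
robust_scale <- function(x) {
  x <- as.matrix(x)
  med <- apply(x, 2, median)
  iqr <- apply(x, 2, IQR)
  iqr[iqr == 0] <- 1  # leave constant columns unscaled rather than dividing by zero
  sweep(sweep(x, 2, med, "-"), 2, iqr, "/")
}

set.seed(1)
toy <- cbind(small = rnorm(100, sd = 1), large = rnorm(100, sd = 1000))
scaled <- robust_scale(toy)
apply(scaled, 2, median)  # column medians after scaling (zero up to rounding)
apply(scaled, 2, IQR)     # column IQRs after scaling (one up to rounding)
```

After scaling, both toy columns contribute on a comparable scale to any Euclidean distance, even though their raw standard deviations differ by a factor of 1000.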

Why not "nearest neighbour" distances?

Calculate the nearest neighbour distance

stray

Select the k nearest neighbour distance with the maximum gap
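
A rough base R sketch of this scoring idea, assuming an exact neighbour search through `dist()` rather than the approximate search used for large data. The helper name `knn_maxgap_score` and the toy data are hypothetical.

```r
# Hypothetical helper: outlier score = the k-NN distance at which the largest
# jump between successive neighbour distances occurs (exact search via dist()).
knn_maxgap_score <- function(x, k = 10) {
  d <- as.matrix(dist(x))            # pairwise Euclidean distances
  diag(d) <- Inf                     # a point is not its own neighbour
  k <- min(k, nrow(d) - 1)
  apply(d, 1, function(row) {
    nn <- sort(row)[seq_len(k)]      # distances to the k nearest neighbours
    gaps <- diff(c(0, nn))           # successive gaps, starting from distance 0
    nn[which.max(gaps)]              # distance at which the largest gap occurs
  })
}

set.seed(2)
typical <- matrix(rnorm(200), ncol = 2)                      # 100 typical points
cluster <- matrix(rnorm(6, mean = 8, sd = 0.05), ncol = 2)   # 3 tightly clustered anomalies
scores <- knn_maxgap_score(rbind(typical, cluster), k = 10)
order(scores, decreasing = TRUE)[1:3]  # indices of the three highest scores
```

With `k = 1` the score reduces to the plain nearest neighbour distance, which is tiny for the three clustered anomalies because they sit next to each other. Looking for the largest gap among several neighbour distances lets the score jump past the cluster members.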

Calculate anomalous threshold

Use extreme value theory (EVT) to calculate a data-driven outlier threshold.
Let n be the size of the dataset.
Sort the resulting n outlier scores.
Treat the half of the outlier scores with the smallest values as typical.
Search for any significantly large gap in the upper tail (bottom-up searching algorithm proposed by Schwarz, 2008).

Spacing theorem (Weissman, 1978)

Let $X_{1}, X_{2}, \ldots, X_{n}$ be a sample from a distribution function $F$.
Let $X_{1:n} \geq X_{2:n} \geq \ldots \geq X_{n:n}$ be the order statistics.
The available data are $X_{1:n}, X_{2:n}, \ldots, X_{k:n}$ for some fixed $k$.
Let $D_{i,n} = X_{i:n} - X_{i+1:n}$, $(i = 1, 2, \ldots, k)$, be the spacing between successive order statistics.
If $F$ is in the maximum domain of attraction of the Gumbel distribution, then the spacings $D_{i,n}$ are asymptotically independent and exponentially distributed with mean proportional to $i^{-1}$.
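
The bottom-up gap search can be sketched in base R using the spacing result above. This is a simplified illustration, not the stray implementation: the helper `find_gap_threshold`, its arguments `alpha`, `p` and `m`, and the toy scores are all assumptions made for the example.

```r
# Hypothetical helper (not the stray implementation): returns the largest
# score still treated as typical; scores above it are flagged.
find_gap_threshold <- function(scores, alpha = 0.01, p = 0.5, m = 50) {
  n <- length(scores)
  s <- sort(scores)                         # ascending outlier scores
  gaps <- c(0, diff(s))                     # gaps[i] = s[i] - s[i - 1]
  m <- max(min(m, floor(n / 4)), 2)         # how many lower gaps to average
  start <- floor(n * (1 - p)) + 1           # scores below this rank are typical
  stopifnot(start >= 3)
  for (i in start:n) {
    j <- 2:min(m, i - 1)
    # Counting gaps[i] as spacing 1, spacing j is gaps[i - j + 1]. By the
    # spacing theorem, j * (spacing j) is roughly exponential with a common
    # mean, so their average estimates the expected size of gaps[i].
    mean_gap <- mean(j * gaps[i - j + 1])
    # An exponential gap exceeds log(1 / alpha) * mean with probability alpha.
    if (gaps[i] > log(1 / alpha) * mean_gap) return(s[i - 1])
  }
  Inf                                       # no significant gap found
}

typical <- qexp(ppoints(200))               # deterministic exponential-like scores
scores <- c(typical, 30, 32, 35)            # three clearly separated scores
thr <- find_gap_threshold(scores)
sum(scores > thr)                           # number of scores flagged
```

Because the gap is compared with `log(1 / alpha)` times its estimated mean, `alpha` acts as an approximate false-alarm probability for each gap tested.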

stray

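library(stray)
outliers <- find_HDoutliers(data)    # detect anomalies in the data matrix
display_HDoutliers(data, outliers)   # plot the data with the detected anomalies marked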

Advantages of the proposed algorithm

Detects clusters of outlying points
Applies to both uni- and multi-dimensional data
Handles large datasets through an approximate k-nearest-neighbour search algorithm
Does not require a training set to build the decision model
Deals with multimodal typical classes
The outlier threshold has a probabilistic interpretation

Feature based representation of time series

Mean
Variance
Changing variance in remainder
Level shift using rolling window
Variance change
Strength of linearity
Strength of curvature

Strength of spikiness
Burstiness of time series (Fano Factor)
Minimum
Maximum
The ratio between 50% trimmed mean and the arithmetic mean
Moment
Ratio of means of data that is below and above the global mean
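
A few of the listed features can be computed in base R as follows. This sketch is only illustrative: the helper `series_features` is hypothetical, and reading the 50% trimmed mean as `trim = 0.25` (dropping a quarter of the data at each end) is an assumption.

```r
# Hypothetical helper: a handful of the listed features for one series.
series_features <- function(x) {
  m <- mean(x)
  c(
    mean        = m,
    variance    = var(x),
    minimum     = min(x),
    maximum     = max(x),
    burstiness  = var(x) / m,                      # Fano factor: variance over mean
    trim_ratio  = mean(x, trim = 0.25) / m,        # trimmed mean over arithmetic mean
    below_above = mean(x[x < m]) / mean(x[x > m])  # mean below vs above the global mean
  )
}

set.seed(3)
ts_matrix <- matrix(rpois(5 * 100, lambda = 10), nrow = 100)  # 5 toy series as columns
features <- t(apply(ts_matrix, 2, series_features))
dim(features)  # one row per series, one column per feature
```

Each series, whatever its length, is reduced to one point in a fixed-dimensional feature space, which is what allows a high dimensional anomaly detector to be applied to a collection of series.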

Approach 1: Using stray

Use a moving window to deal with streaming data
Extract time series features from each window
Apply the stray algorithm to identify anomalous series

tsfeatures <- oddstream::extract_tsfeatures(ts_data)
outliers <- stray::find_HDoutliers(tsfeatures)
stray::display_HDoutliers(tsfeatures, outliers)
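
The moving-window loop itself can be sketched in base R. To keep the example self-contained, a simple feature extractor and a median/MAD rule stand in for `oddstream::extract_tsfeatures()` and `stray::find_HDoutliers()`; the helper `scan_windows`, the window sizes and the cut-off of 5 are all assumptions.

```r
# Hypothetical helper: slide a window along the stream and apply `detector`
# to the per-series feature matrix of each window.
scan_windows <- function(stream, width, step, featurize, detector) {
  starts <- seq(1, nrow(stream) - width + 1, by = step)
  lapply(starts, function(s) {
    window <- stream[s:(s + width - 1), , drop = FALSE]
    feats <- t(apply(window, 2, featurize))   # one row of features per series
    list(start = s, flagged = detector(feats))
  })
}

# Stand-ins used only for illustration (not the stray or oddstream methods).
featurize <- function(x) c(mean = mean(x), sd = sd(x))
detector <- function(feats) {
  z <- abs(feats[, "mean"] - median(feats[, "mean"])) / mad(feats[, "mean"])
  which(z > 5)
}

set.seed(4)
stream <- matrix(rnorm(300 * 20), nrow = 300)   # 20 series, 300 time points
stream[201:300, 7] <- stream[201:300, 7] + 3     # series 7 shifts upward late on
res <- scan_windows(stream, width = 100, step = 50, featurize, detector)
sapply(res, function(r) length(r$flagged))       # flagged series per window
```

Each window is treated as a small batch problem, so a series is only flagged once its unusual behaviour enters the current window.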

oddstream
(Outlier Detection in Data STREAMs)

devtools::install_github("pridiltal/oddstream")

How oddstream works

Dimension reduction for time series

load(train_data)

tsfeatures <- oddstream::extract_tsfeatures(train_data)

pc <- oddstream::get_pc_space(tsfeatures)
oddstream::plotpc(pc$pcnorm)
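
The projection onto a two-dimensional principal component space can be illustrated with base R's `prcomp()`. This is a stand-in on toy features and makes no claim about how `get_pc_space()` is implemented internally.

```r
set.seed(5)
features <- matrix(rnorm(50 * 6), nrow = 50,
                   dimnames = list(NULL, paste0("f", 1:6)))  # 50 series, 6 toy features
pca <- prcomp(features, center = TRUE, scale. = TRUE)
pc2 <- pca$x[, 1:2]                 # coordinates in the first two principal components
summary(pca)$importance[3, 2]       # cumulative proportion of variance kept by PC1 and PC2
```

Every series becomes one point in the plane, so the density of typical behaviour can then be estimated in two dimensions.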

Anomalous threshold calculation

Estimate the probability density function of the 2D PC space $\longrightarrow$ kernel density estimation
Draw a large number $N$ of extremes $\left(\arg\min_{x \in X} [f_{2}(x)]\right)$ from the estimated probability density function
Define a $\Psi$-transform space, using the $\Psi$-transformation defined by Clifton et al. (2011)

The $\Psi$-transform maps the density values into a space in which a Gumbel distribution can be fitted.
Anomalous threshold calculation $\longrightarrow$ extreme value theory

oddstream::find_odd_streams(train_data, test_stream)
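
The threshold calculation can be sketched end to end in base R. This is a simplified illustration, not the oddstream implementation. It assumes a Gaussian product kernel, the two-dimensional form $\Psi[f] = \sqrt{-2 \ln f - 2 \ln 2\pi}$ for $f < (2\pi)^{-1}$ (zero otherwise), and a Gumbel fit by the method of moments. All helper names and the choices of `N`, `m` and the 99% level are hypothetical.

```r
set.seed(6)
pc <- matrix(rnorm(150 * 2), ncol = 2)      # toy "typical" points in a 2D PC space
h <- apply(pc, 2, bw.nrd0)                  # one bandwidth per dimension

# Gaussian product-kernel density estimate evaluated at the rows of `pts`.
kde2 <- function(pts, data, h) {
  apply(pts, 1, function(p)
    mean(dnorm(p[1], data[, 1], h[1]) * dnorm(p[2], data[, 2], h[2])))
}

# Draw m points from the fitted density: pick data rows, add kernel noise.
rkde2 <- function(m, data, h) {
  idx <- sample(nrow(data), m, replace = TRUE)
  cbind(rnorm(m, data[idx, 1], h[1]), rnorm(m, data[idx, 2], h[2]))
}

# Assumed 2D Psi-transform: zero for densities of at least 1 / (2 * pi).
psi <- function(f) sqrt(pmax(-2 * log(f) - 2 * log(2 * pi), 0))

# N extremes: the lowest density value in each of N simulated samples of size m.
N <- 100; m <- 100
extremes <- replicate(N, min(kde2(rkde2(m, pc, h), pc, h)))

# Fit a Gumbel distribution to the transformed extremes by method of moments.
z <- psi(extremes)
scale <- sd(z) * sqrt(6) / pi
loc <- mean(z) - 0.5772157 * scale
threshold <- loc - scale * log(-log(1 - 0.01))   # 99% quantile of the fitted Gumbel

new_pts <- rbind(centre = c(0, 0), far = c(6, 6))
flags <- psi(kde2(new_pts, pc, h)) > threshold    # TRUE marks a point as anomalous
flags
```

Low densities map to large $\Psi$ values, so a new point is flagged when its transformed density exceeds the chosen quantile of the fitted Gumbel distribution.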

Feature Based Representation of Time series

Anomaly Detection with Non-stationarity

Anomaly detection with non-stationarity

$H_{0} : f_{t_{0}} = f_{t_{t}}$

Squared discrepancy measure: $T = \int [f_{t_{0}}(x) - f_{t_{t}}(x)]^{2} \, dx$ (Anderson et al., 1994)
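
The discrepancy measure can be approximated numerically. The sketch below is a one-dimensional illustration only (the densities above live in the 2D PC space). It assumes Gaussian kernel estimates with a shared bandwidth and does not compute the null distribution needed to judge significance; the helper name `squared_discrepancy` is hypothetical.

```r
# Hypothetical helper: T = integral of (f0 - f1)^2, with both densities
# estimated by Gaussian kernels on a shared grid and a common bandwidth.
squared_discrepancy <- function(x0, x1, n_grid = 512) {
  rng <- range(c(x0, x1))
  bw <- bw.nrd0(c(x0, x1))
  from <- rng[1] - 3 * bw
  to <- rng[2] + 3 * bw
  f0 <- density(x0, bw = bw, from = from, to = to, n = n_grid)
  f1 <- density(x1, bw = bw, from = from, to = to, n = n_grid)
  sum((f0$y - f1$y)^2) * diff(f0$x[1:2])    # Riemann-sum approximation
}

set.seed(7)
baseline <- rnorm(500)
same <- squared_discrepancy(baseline, rnorm(500))                # same distribution
shifted <- squared_discrepancy(baseline, rnorm(500, mean = 2))   # shifted distribution
c(same = same, shifted = shifted)
```

A value of $T$ near zero is consistent with $H_{0}$, while a shift in the typical behaviour inflates it.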

stray

Anomaly definition: based on distance
Does not need a training set

oddstream

Anomaly definition: based on density
Needs a training set

Priyanga Dilini Talagala, Rob J Hyndman and Kate Smith-Miles (2020) Anomaly detection in high-dimensional data. Journal of Computational & Graphical Statistics, to appear.

on CRAN

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, Sevvandi Kandanaarachchi and Mario A Munoz (2020) Anomaly detection in streaming nonstationary temporal data. Journal of Computational & Graphical Statistics, 29(1), 13-27.

on CRAN

Anomaly Detection in Image Time Series (ITS)

Image Time Series (ITS)

A stack of images, or a video, is an Image Time Series (ITS).
An ITS is a set of images of the same scene, ordered chronologically.
It can be encoded as a data cube with two spatial dimensions and one temporal dimension.
An ITS can be acquired with one or multiple sensors to obtain a larger data series with a high temporal frequency.
The resulting $2D + t$ data carry rich spatial and temporal information that must be taken into account to understand phenomena that cannot be observed from a single image of the sequence.
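
The data cube encoding can be illustrated with a three-dimensional R array. The dimensions and random pixel values below are toy assumptions.

```r
set.seed(8)
n_row <- 4; n_col <- 5; n_time <- 10
its <- array(runif(n_row * n_col * n_time), dim = c(n_row, n_col, n_time))

frame_3 <- its[, , 3]          # the image observed at time step 3 (4 x 5 matrix)
pixel_series <- its[2, 4, ]    # the time series of the pixel at row 2, column 4

# Flatten the cube into one time series per pixel (rows = time, columns = pixels).
pixel_matrix <- t(apply(its, 3, as.vector))
dim(pixel_matrix)              # 10 time steps by 20 pixels
```

Slicing along the third dimension gives an image, while fixing the two spatial indices gives a time series. Flattening the cube turns the ITS into a collection of series.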

Satellite Image Time Series (SITS)

A Satellite Image Time Series (SITS) is a set of satellite images of the same scene taken at different times.

Approach 1: Traditional Machine Learning Approach

Approach 2: Deep Learning Approach

Binary Classification using EVT based Threshold

Fisher-Tippett theorem, limit laws for maxima

Asymptotic distribution of extreme order statistics
The maximum (or minimum) of a sample of iid random variables, after proper renormalization, can only converge in distribution to one of three possible distributions: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution.
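
The Gumbel case can be checked with a small simulation. For exponential samples of size $n$, subtracting $\log n$ from the maximum is the standard renormalization; the sample size, number of replicates and evaluation grid below are arbitrary choices.

```r
set.seed(9)
n <- 1000       # size of each sample
reps <- 5000    # number of simulated maxima
maxima <- replicate(reps, max(rexp(n))) - log(n)   # renormalized maxima

gumbel_cdf <- function(x) exp(-exp(-x))
grid <- seq(-2, 6, by = 0.1)
max_gap <- max(abs(ecdf(maxima)(grid) - gumbel_cdf(grid)))
max_gap         # largest gap between the empirical and the Gumbel CDF
```

A small gap indicates that the renormalized maxima already follow the Gumbel limit closely at this sample size.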

EVT based Anomaly Threshold Calculation

Binary Classification using EVT based Threshold

What Next?

Explore more feature extraction and feature selection methods to create a better feature space suited to the streaming data context.
Use other dimension reduction techniques, such as multidimensional scaling and random projection, to see their effect on the performance of the proposed framework.
Do more experiments on density estimation methods to get a better tail estimation.
Implement a suitable explainable model for anomaly detection in image streams.
Extend the algorithm to work with multidimensional multivariate data streams.

Thank You

priyangad@uom.lk

pridiltal

prital.netlify.app
(Slides and papers available)

This work was supported in part by the RETINA research lab, funded by the OWSD, a program unit of the United Nations Educational, Scientific and Cultural Organization (UNESCO).
