Anomaly Detection in Spatio-Temporal Tensor Streams

Priyanga Dilini Talagala

JSM 2021

08/08/2021


pridiltal
prital.netlify.app


Motivation

  • All these applications generate millions or even billions of individual time series simultaneously
  • Research question: how can we find the locations of unusual behaviour?

Spatio-Temporal Data

Spatio-Temporal Tensor Data

Anomaly Detection in Multivariate Spatio-Temporal Data with K Measurements

devtools::install_github("pridiltal/mask")


India hourly air pollution data from 244 stations from October 2019 to September 2020

7 of the world's 10 most polluted cities are in India

(Source: IQAir AirVisual 2018 World Air Quality Report & Greenpeace)

  • Pollutants measured: PM2.5, PM10, NO2, NH3, SO2, CO and ozone

MASK Framework

Main Contributions

  • Propose a framework to detect spatial anomalies in spatio-temporal tensor data
  • Unsupervised anomaly detection algorithm

What is an anomaly?

  • An anomaly is a spatial point or region that deviates significantly from the global and/or local distribution of a given network

Time-domain representation

Feature-domain representation

Figure reproduced from Talagala, T. S., Hyndman, R. J., & Athanasopoulos, G. (2018). Meta-learning how to forecast time series. Monash Econometrics and Business Statistics Working Papers, 6, 18.


Feature based representation of time series

  • Mean
  • Variance
  • Changing variance in remainder
  • Level shift using rolling window
  • Variance change
  • Strength of linearity
  • Strength of curvature
  • Strength of spikiness
  • Burstiness of time series (Fano Factor)
  • Minimum
  • Maximum
  • The ratio between 50% trimmed mean and the arithmetic mean
  • Moment
  • Ratio of means of data that is below and above the global mean
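
As a rough illustration of the feature-based representation listed above, the sketch below maps one time series to a fixed-length feature dictionary. It is written in Python for illustration only and is not the mask package's API; the function name `extract_features`, the choice of features, and the definitions used (population variance, Fano factor taken as variance divided by mean) are assumptions.

```python
from statistics import fmean, pvariance


def extract_features(series):
    """Summarise one time series as a fixed-length feature dictionary."""
    mu = fmean(series)
    var = pvariance(series, mu)  # population variance (assumed choice)
    below = [x for x in series if x < mu]
    above = [x for x in series if x > mu]
    nan = float("nan")

    # Ratio of the mean of values below the global mean to the mean of
    # values above it; undefined for a constant series.
    if below and above and fmean(above) != 0:
        ratio = fmean(below) / fmean(above)
    else:
        ratio = nan

    return {
        "mean": mu,
        "variance": var,
        "minimum": min(series),
        "maximum": max(series),
        # Fano factor taken here as variance / mean (assumed definition).
        "fano_factor": var / mu if mu != 0 else nan,
        "below_above_mean_ratio": ratio,
    }


# Series of different lengths map to feature vectors of the same length.
short = extract_features([1, 2, 3, 4, 10])
long = extract_features([5, 7, 6, 8, 7, 9, 8, 10])
```

Because every series is reduced to the same set of features, series of different lengths or starting points can be compared in a common feature space.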

Naive approach (Unfold PCA)

Batch-wise unfolding of the three-way array into a two-dimensional matrix, followed by ordinary PCA on the unfolded matrix.
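
A minimal sketch of the unfolding step, assuming the three-way array is stored as nested lists indexed `tensor[i][j][k]`, with `i` for locations, `j` for features and `k` for measured variables (this layout and the function name `unfold_mode_a` are assumptions made for illustration). Each location becomes one row of length J·K, with `j` varying fastest within each `k`:

```python
def unfold_mode_a(tensor):
    """Unfold an I x J x K nested list into an I x (J*K) matrix.

    Row i keeps location i; columns are ordered so that the second
    index j varies fastest within each value of the third index k.
    """
    n_j = len(tensor[0])
    n_k = len(tensor[0][0])
    return [
        [slab[j][k] for k in range(n_k) for j in range(n_j)]
        for slab in tensor
    ]


# Toy 2 x 2 x 3 array whose entries encode their own indices.
toy = [
    [[100 * i + 10 * j + k for k in range(3)] for j in range(2)]
    for i in range(2)
]
unfolded = unfold_mode_a(toy)
```

Ordinary PCA could then be applied to the rows of `unfolded`, at the cost of ignoring the separate roles of the second and third modes.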


Anomalous score calculation using Robust three-way analysis

  • Matrix SVD (Singular-Value Decomposition)

  • The Tucker3 Model


The Robust Tucker3 Model

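$X_A = A G (C \otimes B)^t + E$

  • Fitted model: $\hat{X}_A = \hat{A} \hat{G} (\hat{C} \otimes \hat{B})^t$
  • Objective function: $\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (x_{ijk} - \hat{x}_{ijk})^2 = \sum_{i=1}^{I} (x_i - \hat{x}_i)(x_i - \hat{x}_i)^t$

  • Residual distance:

$RD_i = \sqrt{\sum_{j=1}^{J} \sum_{k=1}^{K} (x_{ijk} - \hat{x}_{ijk})^2}$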

Di Palma, Todorov and Gallo (2014)
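
The residual distance can be sketched as follows, assuming the unfolded data and the fitted values from a Tucker3 model are already available as lists of rows, one row per location. Fitting the robust Tucker3 model itself is not shown, and the function name `residual_distances` is hypothetical. Each distance is taken as the Euclidean norm of that row's residuals:

```python
import math


def residual_distances(x_unfolded, x_fitted):
    """Return one residual distance per row (location).

    Each distance is the square root of the summed squared differences
    between the observed and fitted entries of that row.
    """
    return [
        math.sqrt(sum((x - x_hat) ** 2 for x, x_hat in zip(row, fit_row)))
        for row, fit_row in zip(x_unfolded, x_fitted)
    ]


# Toy example: the first row is fitted exactly, the second is not.
observed = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
fitted = [[1.0, 2.0, 3.0], [1.0, 1.0, 6.0]]
rd = residual_distances(observed, fitted)
```

Locations with a large residual distance are poorly described by the fitted model and are natural candidates for a high anomaly score.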


Anomalous threshold calculation using Spacing theorem (Weissman, 1978)

Let $X_1, X_2, \ldots, X_n$ be a sample from a distribution function $F$.

Let $X_{1:n} \geq X_{2:n} \geq \ldots \geq X_{n:n}$ be the order statistics.

The available data are $X_{1:n}, X_{2:n}, \ldots, X_{k:n}$ for some fixed $k$.

Let $D_{i,n} = X_{i:n} - X_{i+1:n}$, $(i = 1, 2, \ldots, k)$ be the spacing between successive order statistics.

If $F$ is in the maximum domain of attraction of the Gumbel distribution, then the spacings $D_{i,n}$ are asymptotically independent and exponentially distributed with mean proportional to $i^{-1}$.
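
One way the spacing result could be turned into a threshold is sketched below. This is a simplified illustration and not the procedure implemented in the mask or stray packages: the function name `spacing_threshold`, the use of ranks k+1 to 2k to estimate the scale, and the default `alpha` are all assumptions. Since the spacing at rank i is approximately exponential with mean proportional to 1/i, a spacing larger than log(1/alpha) times its estimated mean has tail probability of roughly alpha.

```python
import math


def spacing_threshold(scores, k, alpha=0.01):
    """Flag the top scores that sit above a significantly large gap.

    Returns (number_flagged, smallest_flagged_score), or (0, None)
    when no spacing among the top k is significantly large.
    """
    xs = sorted(scores, reverse=True)  # descending order statistics
    if k < 1 or len(xs) < 2 * k + 1:
        raise ValueError("need at least 2*k + 1 scores and k >= 1")

    # d[i - 1] is the spacing between the i-th and (i+1)-th largest score.
    d = [xs[i - 1] - xs[i] for i in range(1, 2 * k + 1)]

    # i * D_i is roughly exponential with a common scale; estimate that
    # scale from ranks k+1..2k, assumed to be free of anomalies.
    scale = sum(i * d[i - 1] for i in range(k + 1, 2 * k + 1)) / k

    log_inv_alpha = math.log(1 / alpha)
    # Search from the least extreme candidate towards the most extreme.
    for i in range(k, 0, -1):
        if d[i - 1] > log_inv_alpha * scale / i:
            return i, xs[i - 1]
    return 0, None


typical = [j / 10 for j in range(10)]
with_outlier = spacing_threshold(typical + [5.0], k=3)
without_outlier = spacing_threshold([j / 10 for j in range(11)], k=3)
```

Searching from the bulk outwards means that once a significant gap is found, every score above it is flagged, which is what gives the threshold its probabilistic reading in terms of `alpha`.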


India hourly air pollution data from 244 stations from October 2019 to September 2020


Advantages of the mask framework

  • Detect spatial anomalies in spatio-temporal tensor data
  • Can take the correlation structure of the variables into account when detecting anomalies
  • Deal with large amounts of data efficiently
  • Deal with time series of different lengths and/or starting points
  • The anomaly scoring technique is unsupervised
  • Anomalous threshold has a probabilistic interpretation
  • The framework can easily be extended to streaming data such that it can provide near-real-time support

stray/oddstream vs mask

stray/oddstream

  • Definition: Recent past distribution of a given system
  • Semi-supervised

mask

  • Definition: Current global and/or local distribution of a given system
  • Unsupervised

What next?

  • Explore further feature extraction and feature selection methods to build a feature space better suited to the streaming-data context.
  • Try other dimension reduction techniques for tensor data, such as multilinear PLS (N-PLS), and assess their effect on the performance of the proposed framework.
  • Develop effective, interactive data visualisation tools for further investigation of the detected spatial anomalies.

Thank you

priyangad@uom.lk

pridiltal

https://prital.netlify.app/
(Slides available)

devtools::install_github("pridiltal/mask")

Slides created via xaringan.


Key References

  • Di Palma, M. A., Todorov, V., & Gallo, M. Robust multiway analysis of compositional data in R.
  • Talagala, P. D., Hyndman, R. J., & Smith-Miles, K. (2020). Anomaly detection in high-dimensional data. Journal of Computational and Graphical Statistics, 1-15.
  • Talagala, P. D., Hyndman, R. J., Smith-Miles, K., Kandanaarachchi, S., & Munoz, M. A. (2020). Anomaly detection in streaming nonstationary temporal data. Journal of Computational and Graphical Statistics, 29(1), 13-27.