class: center, middle, inverse, title-slide

# Forecasting and Anomaly Detection in Large-Scale Time Series

### Priyanga Dilini Talagala

### 26/01/2023
<i class="fab fa-twitter faa-horizontal animated " style=" color:#0fb7fa;"></i>
pridiltal
<i class="fas fa-globe faa-horizontal animated " style=" color:green;"></i>
prital.netlify.app
The slides are powered by the `xaringan` R package.
---
### Acknowledgement

<img src="fig/1_teamb.png" width="80%" style="display: block; margin: auto;" />

---
class: center, middle

## Anomaly detection

[CRAN Task View: Anomaly Detection with R](https://github.com/pridiltal/ctv-AnomalyDetection)

---
class: center, middle

.pull-left[
### High-dimensional data

<img src="fig/2_outtypea.png" width="90%" />
]

--

.pull-right[
### Temporal data

<img src="fig/2_outtypeb.png" width="90%" />
]

---
background-image:url('fig/3_outtype2.png')
background-position: 80% 50%
background-size: 75%
class: left, top, clear

## Anomalies in temporal data

---
background-image:url('fig/4_outtype2.png')
background-position: 80% 50%
background-size: 75%
class: left, top, clear

## Anomalies in temporal data

---
background-image:url('fig/5_applications.png')
background-position: 70% 70%
background-size: 100%
class: left, top, clear

## Anomalous series within a collection of series

---

- All these applications generate millions or even billions of individual time series simultaneously.

--

- Research question: how can we find anomalous time series within a large collection of time series?

--

- Two scenarios for anomaly detection in temporal data:

--

.pull-left[
**Batch scenario**

- The whole data set is available
- Events are complete

<br/><br/><br/>
<img src="fig/6_batch.png" width="75%" />
]

--

.pull-right[
**Data stream scenario**

- Continuous, unbounded, high-speed, high-volume flow
- Events are incomplete

<img src="fig/7_stream.gif" width="75%" />
]

---
class: center, middle

<p><font size=12> <span style="color:blue"> stray (S</span>earch and <span style="color:blue">TR</span>ace <span style="color:blue">A</span>nomal<span style="color:blue">Y</span>) </font></p>

<div class="figure">
<img src="fig/8_stray-logo.png" alt="on CRAN" width="30%" />
<p class="caption">on CRAN</p>
</div>

`devtools::install_github("pridiltal/stray")`

---
## Stray algorithm in Python

Recently, Kate Buchhorn has ported the stray algorithm to Python and made it available in sktime:

<img src="fig/8_stray_python.png" width="80%" style="display: block; margin: auto;" />

---
## Anomaly detection in high-dimensional data

### Main contributions

- We propose a framework to detect anomalies in high-dimensional data. The proposed algorithm addresses the limitations of the HDoutliers algorithm (Wilkinson, 2018).

--

### What is an anomaly?

- We define an anomaly as an observation that deviates markedly from the majority and is separated from it by a large distance gap.

--

### Main assumption

- The distance between the typical data and the anomalies is large compared with the distances among the typical data.

---
## stray

<img src="fig/9_stray_plot1.png" width="50%" />

- Normalize each column of the data using its median and IQR.
- This prevents variables with large variances from having a disproportionate influence on Euclidean distances.

---
## Why not "nearest neighbour" distances?

<img src="fig/9_stray_plot2.png" width="50%" />

- Calculate the nearest neighbour distance

---
## stray

<img src="fig/9_stray_plot5.png" width="50%" />

- Select the <span style="color:red"> k-nearest-neighbour </span> distance with the <span style="color:red"> maximum gap </span>

---
## Calculate the anomalous threshold

- Use extreme value theory (EVT) to calculate a data-driven outlier threshold.

--

- Let **n** be the size of the data set.

--

- Sort the resulting **n** outlier scores.

--

- Treat the half of the outlier scores with the smallest values as typical.

--

- Search for any significantly large gap in the upper tail (bottom-up searching algorithm proposed by Schwarz, 2008).

---
## Spacing theorem (Weissman, 1978)

.pull-left[
Let `\(X_{1}, X_{2}, ..., X_{n}\)` be a sample from a distribution function `\(F\)`. </br>
Let `\(X_{1:n} \geq X_{2:n} \geq ... \geq X_{n:n}\)` be the order statistics. </br>
The available data are `\(X_{1:n}, X_{2:n}, ..., X_{k:n}\)` for some fixed `\(k\)`.
</br>
Let `\(D_{i,n} = X_{i:n} - X_{i+1:n},\)` `\((i = 1, 2, ..., k-1)\)` be the spacings between successive order statistics. </br>
If `\(F\)` is in the maximum domain of attraction of the Gumbel distribution, then the spacings `\(D_{i,n}\)` are asymptotically independent and exponentially distributed with mean proportional to `\(i^{-1}\)`.
]
.pull-right[
<img src="fig/10_spacingTheorem.png" width="100%" />
]

---
## stray

<img src="fig/11_stray_plot6.png" width="50%" />

`outliers <- find_HDoutliers(data)` <br/>
`display_HDoutliers(data, outliers)`

---
## Advantages of the proposed algorithm

- Detects clusters of outlying points

--

- Applies to both one-dimensional and multidimensional data

--

- Handles large data sets by using an approximate k-nearest-neighbour (kNN) search algorithm

--

- Does not require a training set to build the decision model

--

- Deals with multimodal typical classes

--

- The outlier threshold has a probabilistic interpretation

---
## Feature-based representation of time series

.pull-left[
- Mean
- Variance
- Changing variance in remainder
- Level shift using a rolling window
- Variance change
- Strength of linearity
- Strength of curvature
]
.pull-right[
- Strength of spikiness
- Burstiness of the time series (Fano factor)
- Minimum
- Maximum
- Ratio between the 50% trimmed mean and the arithmetic mean
- Moment
- Ratio of the means of the data below and above the global mean
]

---
### Approach 1: Using stray

- Use a moving window to deal with streaming data
- Extract time series features from each window
- Apply the stray algorithm to identify anomalous series

.pull-left[
<img src="fig/12_strayts.png" width="80%" />
]
.pull-right[
<img src="fig/13_stray.gif" width="50%" />
]

`tsfeatures <- oddstream::extract_tsfeatures(ts_data)` <br/>
`outliers <- stray::find_HDoutliers(tsfeatures)` <br/>
`stray::display_HDoutliers(tsfeatures, outliers)`

---
class: center, clear

.pull-left[
<img src="fig/14_P2_plot21a.png" width="75%" />
]

--

.pull-right[
<img src="fig/14_P2_plot21b.png" width="75%" />
]
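---
## stray-style scoring: a base-R sketch

The stray steps on the preceding slides can be sketched in a few lines of base R. Those steps are median/IQR scaling, the k-nearest-neighbour distance that follows the maximum gap, and a spacing-based threshold. This is a minimal sketch under stated assumptions, not the stray package's implementation:

- the function names (`scale_median_iqr`, `knn_gap_score`, `spacing_threshold`, `find_anomalies_sketch`) are hypothetical;
- the neighbour search is brute force rather than approximate;
- the threshold uses a simplified bottom-up test built directly on the Weissman spacing theorem, instead of the exact search of Schwarz (2008).

```r
# Hypothetical helpers (not the stray package API); base R only.

# 1. Scale each column by its median and IQR so that no variable
#    dominates the Euclidean distances (a zero IQR falls back to 1).
scale_median_iqr <- function(x) {
  apply(as.matrix(x), 2, function(col) {
    s <- IQR(col)
    if (s == 0) s <- 1
    (col - median(col)) / s
  })
}

# 2. Outlier score: for each point, the k-NN distance that follows the
#    largest gap between successive k-NN distances (brute-force search).
knn_gap_score <- function(z, k = 10) {
  d <- as.matrix(dist(z))        # Euclidean distance matrix
  diag(d) <- Inf                 # a point is not its own neighbour
  k <- min(k, nrow(z) - 1)
  apply(d, 1, function(row) {
    nn <- sort(row)[seq_len(k)]  # distances to the k nearest neighbours
    gaps <- diff(c(0, nn))       # successive gaps, taking d_0 = 0
    nn[which.max(gaps)]
  })
}

# 3. Threshold: bottom-up search over the upper part of the sorted scores.
#    By the spacing theorem, j * D_j (the j-th spacing from the top) is
#    roughly iid exponential, so a spacing is flagged when it exceeds
#    theta * log(1 / alpha), with theta estimated from the m spacings below.
spacing_threshold <- function(score, alpha = 0.01, p = 0.5, m = 10) {
  s <- sort(score)
  n <- length(s)
  # Normalised spacings: ns[i] = (n - i + 1) * (s[i] - s[i - 1]).
  ns <- c(NA, ((n - 1):1) * diff(s))
  start <- max(floor(n * p) + 1, m + 2)  # skip the "typical" lower part
  if (start > n) return(Inf)
  for (i in start:n) {
    theta <- mean(ns[(i - m):(i - 1)])
    if (theta > 0 && ns[i] > theta * log(1 / alpha)) return(s[i])
  }
  Inf                                    # no significant gap: no anomalies
}

find_anomalies_sketch <- function(x, k = 10, alpha = 0.01) {
  score <- knn_gap_score(scale_median_iqr(x), k)
  threshold <- spacing_threshold(score, alpha)
  list(score = score, threshold = threshold,
       outliers = which(score >= threshold))
}

# Usage: 200 typical points plus a tight pair of anomalies (rows 201, 202).
set.seed(1)
x <- rbind(matrix(rnorm(400), ncol = 2), c(10, 10), c(10.2, 10.1))
res <- find_anomalies_sketch(x)
res$outliers
```

In the usage example, each point of the anomalous pair has a small plain nearest-neighbour distance, because its nearest neighbour is the other point of the pair. Taking the k-NN distance after the largest gap instead gives both points a large score.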
---
class: center, clear

<p><font size=12> <span style="color:blue">oddstream </br> (O</span>utlier <span style="color:blue">D</span>etection in <span style="color:blue">D</span>ata <span style="color:blue">STREAM</span>s) </font></p>

<img src="fig/15_oddstream_logo.png" width="30%" />

`devtools::install_github("pridiltal/oddstream")`

---
### How oddstream works

<img src="fig/3_batch.png" width="80%" />

---
### How oddstream works

<img src="fig/14_oddstream_typical.png" width="80%" />

---
## Dimension reduction for time series

.pull-left[
`load(train_data)`

<img src="fig/16_typical.png" width="100%" />
]

--

.pull-right[
`tsfeatures <- oddstream::extract_tsfeatures`</br>`(train_data)`

<img src="fig/17_high_typical.gif" width="60%" />
]

---

.pull-left[
`tsfeatures <- oddstream::extract_tsfeatures`</br>`(train_data)`

<img src="fig/17_high_typical.gif" width="70%" />
]

--

.pull-right[
`pc <- oddstream::get_pc_space(tsfeatures)`</br>
`oddstream::plotpc(pc$pcnorm)`

<img src="fig/18_typicalfeature.png" width="90%" />
]

---
## Anomalous threshold calculation

- Estimate the probability density function of the 2D PC space `\(\longrightarrow\)` kernel density estimation

--

- Draw a large number `\(N\)` of extremes, `\(\arg\min_{x\in X}[f_{2}(x)]\)`, from the estimated probability density function

--

- Define a `\(\Psi\)`-transform space using the `\(\Psi\)`-transformation of Clifton et al. (2011)

<img src="fig/19_psitrans.png" width="50%" />

- The `\(\Psi\)`-transform maps the density values back into a space in which a Gumbel distribution can be fitted.
--

- Anomalous threshold calculation `\(\longrightarrow\)` extreme value theory

---
class: center, top, clear

`oddstream::find_odd_streams(train_data, test_stream)`

<img src="fig/19_oddstream_mvtsplot.gif" width="40%" />

.pull-left[
<img src="fig/20_oddstream_out_loc.gif" width="60%" />
]
.pull-right[
<img src="fig/21_oddstream_pcplot.gif" width="60%" />
]

---
class: top

### Feature-based representation of time series

.pull-left[
<img src="fig/6_batch.png" width="100%" />
]
.pull-right[
<img src="fig/22_tsfeatures.png" width="75%" />
]

---
class: center, middle, inverse

# Anomaly Detection with <br/> <span style="color:#ffff05"> Non-stationarity </span>

---
class: center, middle

#### Anomaly detection with non-stationarity

<img src="fig/23_nonstationaritytypes.png" width="50%" />

---
class: center, middle

### Anomaly detection with non-stationarity

<img src="fig/24_suddenplot2.png" width="70%" />
<img src="fig/25_noCD1.png" width="25%" />

---
class: center, middle

### Anomaly detection with non-stationarity

<img src="fig/26_suddenplot3.png" width="70%" />
<img src="fig/27_noCD2.png" width="25%" />

---
class: center, middle

### Anomaly detection with non-stationarity

<img src="fig/28_suddenplot4.png" width="70%" />
<img src="fig/29_noCD3.png" width="25%" />

---
class: center, middle

### Anomaly detection with non-stationarity

<img src="fig/30_suddenplot2.png" width="80%" />
<img src="fig/31_conceptdrift_pval.png" width="80%" />

- `\(H_{0} : f_{t_{0}} = f_{t_{t}}\)`
- Squared discrepancy measure `\(T = \int[f_{t_{0}}(x) - f_{t_{t}}(x)]^{2}dx\)` (Anderson et al., 1994)

---
class: center, middle

### Anomaly detection with non-stationarity

<img src="fig/32_sudden_out.png" width="70%" />

---
class: clear, middle, center

.pull-left[
### stray

<img src="fig/33_P2_plot21a.png" width="50%" />

- Definition: distance-based
- No training set needed
]
.pull-right[
### oddstream

<img src="fig/34_P2_plot21b.png" width="50%" />

- Definition: density-based
- Needs a training set
]

---
class: clear, middle

.pull-left[
<img src="fig/35_JCGS_logo.png" width="20%" />

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles (2020) **Anomaly detection in high-dimensional data**. Journal of Computational & Graphical Statistics, *to appear*

<div class="figure">
<img src="fig/8_stray-logo.png" alt="on CRAN" width="25%" />
<p class="caption">on CRAN</p>
</div>
]
.pull-right[
<img src="fig/35_JCGS_logo.png" width="20%" />

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, Sevvandi Kandanaarachchi and Mario A Munoz (2020) **Anomaly detection in streaming nonstationary temporal data**. Journal of Computational & Graphical Statistics, 29(1), 13-27.

<div class="figure">
<img src="fig/36_oddstream1.png" alt="on CRAN" width="25%" />
<p class="caption">on CRAN</p>
</div>
]

---
class: center, middle, inverse

# <span style="color:#ffff05"> Anomaly Detection in Image Time Series (ITS) </span>

---
## Image Time Series (ITS)

- A stack of images, or a video, forms an Image Time Series (ITS).

--

- An ITS is a set of images of the same scene, ordered chronologically.

--

- It can be encoded as a data cube with two spatial dimensions and one temporal dimension.

--

- An ITS can be acquired with one or more sensors, which yields a larger data series with a high temporal frequency.

--

- The resulting `\(2D+t\)` data carry rich spatial and temporal information. This information must be taken into account to understand phenomena that are not observable from any single image of the sequence.

<!-- Chelali, M., Kurtz, C., Puissant, A., & Vincent, N. (2021). Deep-STaR: Classification of image time series based on spatio-temporal representations. Computer Vision and Image Understanding, 208, 103221.
-->

---
## Satellite Image Time Series (SITS)

- A Satellite Image Time Series (SITS) is a set of satellite images of the same scene taken at different times.

.pull-left[
<img src="fig/37_deforestation.gif" width="75%" />
]
.pull-right[
<img src="fig/38_volcano.gif" width="75%" />
]

---
## Approach 1: Traditional Machine Learning Approach

<img src="fig/39_ML.png" width="90%" />

---
## Approach 2: Deep Learning Approach

<img src="fig/40_Deep.png" width="90%" />

---
## Binary Classification using an EVT-based Threshold

<img src="fig/41_threshold.png" width="100%" />

---
### Fisher-Tippett theorem: limit laws for maxima

<img src="fig/43_EVDchange.png" width="60%" style="display: block; margin: auto;" />

- Asymptotic distribution of extreme order statistics
- If the suitably renormalized maximum of a sample of iid random variables converges in distribution to a non-degenerate limit, that limit is a Gumbel, Fréchet, or Weibull distribution. For the minimum, apply the result to the negated sample.

---
## EVT-based Anomaly Threshold Calculation

<img src="fig/42_evt.png" width="100%" />

---
## Binary Classification using an EVT-based Threshold

<img src="fig/41_threshold.png" width="100%" />

---
### What Next?

- Explore further feature extraction and feature selection methods to build a feature space better suited to the streaming-data context.

--

- Use other dimension reduction techniques, such as multidimensional scaling and random projection, to assess their effect on the performance of the proposed framework.

--

- Run more experiments with density estimation methods to obtain better tail estimates.

--

- Implement a suitable explainable model for anomaly detection in image streams.

--

- Extend the algorithm to work with multidimensional multivariate data streams.

---
class: center, middle

# Thank You
priyangad@uom.lk
pridiltal
prital.netlify.app </br> (slides and papers available)

<br/><br/>The slides are powered by the `xaringan` R package.

This work was supported in part by the RETINA research lab, funded by the OWSD, a program unit of the United Nations Educational, Scientific and Cultural Organization (UNESCO).
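---
## EVT-based threshold: a base-R sketch

This backup slide gives one simple way to turn the Fisher-Tippett result into an anomaly threshold. Fit a Gumbel distribution to block maxima of anomaly scores from typical data, then flag new scores above its upper `\(1-\alpha\)` quantile, `\(\mu - \beta \log(-\log(1-\alpha))\)`.

This is an illustrative assumption, not the procedure behind the image time series results. The following are all choices made only for this example:

- the hypothetical function names `fit_gumbel_mom` and `evt_threshold`;
- the block size and `\(\alpha\)`;
- the method-of-moments fit, `\(\beta = s\sqrt{6}/\pi\)` and `\(\mu = \bar{x} - \gamma\beta\)`, where `\(\gamma\)` is the Euler-Mascheroni constant.

```r
# Hypothetical helpers; base R only.

# Method-of-moments Gumbel fit: Var = pi^2 * beta^2 / 6, mean = mu + gamma * beta.
fit_gumbel_mom <- function(maxima) {
  euler_gamma <- 0.5772156649
  beta <- sd(maxima) * sqrt(6) / pi
  mu <- mean(maxima) - euler_gamma * beta
  c(mu = mu, beta = beta)
}

# Threshold = upper (1 - alpha) quantile of the Gumbel fitted to block
# maxima of scores from typical (training) data.
evt_threshold <- function(train_scores, block_size = 50, alpha = 0.01) {
  n_blocks <- floor(length(train_scores) / block_size)
  stopifnot(n_blocks >= 2)
  # Each column holds one consecutive block of scores.
  blocks <- matrix(train_scores[seq_len(n_blocks * block_size)],
                   nrow = block_size)
  maxima <- apply(blocks, 2, max)
  fit <- fit_gumbel_mom(maxima)
  unname(fit["mu"] - fit["beta"] * log(-log(1 - alpha)))
}

# Usage: simulated typical scores, then a binary decision for new scores.
set.seed(42)
train_scores <- rnorm(5000)
thr <- evt_threshold(train_scores, block_size = 50, alpha = 0.01)
new_scores <- c(0.5, 1.8, 6)
is_anomaly <- new_scores > thr
```

The method-of-moments fit keeps the sketch free of extra packages. A maximum-likelihood Gumbel or GEV fit could be substituted without changing the thresholding step.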