class: center, middle, inverse, title-slide

# Tidy Time Series Anomaly Detection for Load Forecasting
## Priyanga Dilini Talagala
### 41st International Symposium on Forecasting
### 30.07.2021

---
background-image:url('figure/1_power.png')
background-position: 50% 50%
background-size: 125%
class: left, bottom

<SPAN STYLE="font-size:10.0pt"><span style="color: black">Image credit: Pixabay</span></span>

---
background-image:url('figure/2_blackout.png')
background-position: 90% 20%
background-size: 110%
class: right, bottom

---
## Tidy forecasting workflow (Hyndman & Athanasopoulos, 2021)

<img src="figure/3_tidy_workflow.png" width="95%" style="display: block; margin: auto;" />

---
## <i class="fas fa-wrench faa-tada animated " data-fa-transform="grow-20 " style=" color:orange;"></i> Tidy Time Series Anomaly Detection

<!-- up, down, left and right-->
<img src="figure/4_outstable.png" width="95%" style="display: block; margin: auto;" />

---
class: center, top

# <span style="color: orange">outstable</span>
## <span style="color: orange">TABLE</span> of <span style="color: orange">OUT</span>liers in <span style="color: orange">T</span>ime <span style="color: orange">S</span>eries Data

<img src="figure/5_hex-outstable.png" width="25%" style="display: block; margin: auto;" />

### `devtools::install_github("pridiltal/outstable")`

---
class: center, top

# <span style="color: orange">outstable</span>
## <span style="color: orange">TABLE</span> of <span style="color: orange">OUT</span>liers in <span style="color: orange">T</span>ime <span style="color: orange">S</span>eries Data
<i class="fas fa-wrench fa-2x faa-shake animated " data-fa-transform="grow-20 " style=" color:orange;"></i>
<img src="figure/6_hex.png" width="130%" style="display: block; margin: auto;" />

---
## Outliers in Time Series Data

<img src="figure/7_outtype.png" width="95%" style="display: block; margin: auto;" />

---
## Outliers in Time Series Data

<img src="figure/8_outtype_oddstream.png" width="95%" style="display: block; margin: auto;" />

---
## Outliers in Time Series Data

<img src="figure/9_outtype_outstable.png" width="95%" style="display: block; margin: auto;" />

---
## Outlier Detection in Time Series Data

## Main contributions

- This work develops a framework for detecting outliers in tidy time series data.

--

- The algorithm works with tidy temporal data provided by the `tsibble` package and produces an **`outstable`**, a tsibble with flagged anomalies and their degree of outlierness.

--

- The proposed framework can also provide a cleansed tsibble that integrates closely with the tidy forecasting workflow used in the `tidyverts` toolbox.

--

- The outlier threshold is data-driven and has a valid probabilistic interpretation.

--

## What is an outlier?

--

- We define an outlier as an observation that is very unlikely given the forecast distribution.

--

- An outlier is a rare observation that has a very low chance of occurring relative to the **typical behaviour** of the time series.

---

<img src="figure/Model1.png" width="65%" style="display: block; margin: auto;" />

---

<img src="figure/Model2.png" width="66%" style="display: block; margin: auto;" />

--

## Forecast combinations

- Use several different methods on the same time series, and average the resulting forecasts (Hyndman & Athanasopoulos, 2021)

--

- Simply averaging the forecasts can yield dramatic performance improvements (Clemen, 1989)

--

- Combining forecasts often leads to **better forecast accuracy** (Bates & Granger, 1969)

<!--
- Use several different methods on the same time series, and average the resulting forecasts (Hyndman & Athanasopoulos, 2021).

Nearly 50 years ago, John Bates and Clive Granger wrote a famous paper (Bates & Granger, 1969), showing that combining forecasts often leads to better forecast accuracy. Twenty years later, Clemen (1989) wrote "The results have been virtually unanimous: combining multiple forecasts leads to increased forecast accuracy. In many cases one can make dramatic performance improvements by simply averaging the forecasts."

- It has been well-known since at least 1969, when Bates and Granger wrote their famous paper on "The Combination of Forecasts", that combining forecasts often leads to better forecast accuracy.

https://robjhyndman.com/hyndsight/forecast-combinations/
https://otexts.com/fpp2/combinations.html

- One forecast is based on variables or information that the other forecast has not considered
- The forecast makes a different assumption about the form of the relationship between the variables
-->

---
# Outstable

<img src="figure/original-1.png" style="display: block; margin: auto;" />

---
# outstable

<img src="figure/outliers-1.png" style="display: block; margin: auto;" />

---
## Outstable: Outlier Threshold Calculation

<img src="figure/combfor-1.png" width="1283" style="display: block; margin: auto;" />

---
## Outstable - Visualize residuals with true outlier

<img src="figure/res-density.png" width="80%" height="60%" style="display: block; margin: auto;" />

---
## Anomalous threshold calculation

- Estimate the probability density function of the residual series `\(\longrightarrow\)` Kernel density estimation.

--

- Draw a large number `\(N\)` of extremes `\((\arg\min_{x\in X}[f(x)])\)` from **the high density region** of the estimated probability density function.

--

- Define a `\(\Psi\)`-transform space, using the `\(\Psi\)`-transformation defined by Clifton et al. (2011)

`$$\Psi[f(\mathbf{x})]=\begin{cases}\left(-2\ln(f(\mathbf{x}))-2\ln(2\pi)\right)^{1/2}, & f(\mathbf{x}) < (2\pi)^{-1}\\ 0, & f(\mathbf{x}) \ge (2\pi)^{-1}.\end{cases}$$`

--

- The `\(\Psi\)`-transform maps the density values into a space in which a Gumbel distribution can be fitted.

--

- Fit a Gumbel distribution to the resulting `\(\Psi[f(\mathbf{x})]\)` values. The Gumbel parameter values are obtained via maximum likelihood estimation.

--

- Determine the anomalous threshold using the corresponding univariate CDF, `\(F^{e}\)`, in the transformed `\(\Psi\)`-space.

--

- Density-based, data-driven anomalous threshold `\(\longrightarrow\)` extreme value theory

---
## `outstable::detect_outliers()`

.pull-left[

```r
# devtools::install_github("pridiltal/outstable")
*library(outstable)
```
]
.pull-right[

]

---
## `outstable::detect_outliers()`

.pull-left[

```r
# devtools::install_github("pridiltal/outstable")
library(outstable)
*data %>%
*  tsibble::as_tsibble(index = time)
```
]
.pull-right[

```
## # A tsibble: 13,124 x 2 [1h] <UTC>
##    time                value
##    <dttm>              <dbl>
##  1 2006-12-31 19:00:00 20601
##  2 2006-12-31 20:00:00 20377
##  3 2006-12-31 21:00:00 20745
##  4 2006-12-31 22:00:00 21648
##  5 2006-12-31 23:00:00 23220
##  6 2007-01-01 00:00:00 22846
##  7 2007-01-01 01:00:00 21856
##  8 2007-01-01 02:00:00 20912
##  9 2007-01-01 03:00:00 20005
## 10 2007-01-01 04:00:00 18592
## # … with 13,114 more rows
```
]

---
## `outstable::detect_outliers()`

.pull-left[

```r
# devtools::install_github("pridiltal/outstable")
library(outstable)
data %>%
  tsibble::as_tsibble(index = time) %>%
*  outstable::detect_outliers(
*    variable = "value",
*    cmbn_model = c("lm", "theta", "fasster"),
*    p_rate = 0.01)
```
]
.pull-right[

```
## # A tsibble: 13,124 x 4 [1h] <UTC>
##    time                value .outscore .outtype
##    <dttm>              <dbl>     <dbl> <fct>
##  1 2006-12-31 19:00:00 20601    0.146  typical
##  2 2006-12-31 20:00:00 20377    0.0727 typical
##  3 2006-12-31 21:00:00 20745    0.175  typical
##  4 2006-12-31 22:00:00 21648    0.276  typical
##  5 2006-12-31 23:00:00 23220    0.482  typical
##  6 2007-01-01 00:00:00 22846    0.332  typical
##  7 2007-01-01 01:00:00 21856    0.286  typical
##  8 2007-01-01 02:00:00 20912    0.307  typical
##  9 2007-01-01 03:00:00 20005    0.333  typical
## 10 2007-01-01 04:00:00 18592    0.228  typical
## # … with 13,114 more rows
```
]

---
## `outstable::cleanse_data()`

.pull-left[

```r
# devtools::install_github("pridiltal/outstable")
library(outstable)
data %>%
  tsibble::as_tsibble(index = time) %>%
  outstable::detect_outliers(
    variable = "value",
    cmbn_model = c("lm", "theta", "fasster"),
    p_rate = 0.01) %>%
*  outstable::cleanse_data(
*    variable = "value",
*    impute = "linear")
```
]
.pull-right[

```
## # A tsibble: 13,124 x 5 [1h] <UTC>
##    time                value .outscore .outtype .altered
##    <dttm>              <dbl>     <dbl> <fct>       <dbl>
##  1 2006-12-31 19:00:00 20601    0.146  typical     20601
##  2 2006-12-31 20:00:00 20377    0.0727 typical     20377
##  3 2006-12-31 21:00:00 20745    0.175  typical     20745
##  4 2006-12-31 22:00:00 21648    0.276  typical     21648
##  5 2006-12-31 23:00:00 23220    0.482  typical     23220
##  6 2007-01-01 00:00:00 22846    0.332  typical     22846
##  7 2007-01-01 01:00:00 21856    0.286  typical     21856
##  8 2007-01-01 02:00:00 20912    0.307  typical     20912
##  9 2007-01-01 03:00:00 20005    0.333  typical     20005
## 10 2007-01-01 04:00:00 18592    0.228  typical     18592
## # … with 13,114 more rows
```
]

---
## tidyverts: Tidy tools for time series

.pull-left[

```r
# devtools::install_github("pridiltal/outstable")
library(outstable)
data %>%
  tsibble::as_tsibble(index = time) %>%
  outstable::detect_outliers(
    variable = "value",
    cmbn_model = c("lm", "theta", "fasster"),
    p_rate = 0.01) %>%
  outstable::cleanse_data(
    variable = "value",
    impute = "linear") %>%
*  fabletools::autoplot(.altered)
```
]
.pull-right[

<img src="figure/altered-1.png" style="display: block; margin: auto;" />
]

---
## tidyverts: Tidy tools for time series

.pull-left[

```r
# devtools::install_github("pridiltal/outstable")
library(outstable)
data %>%
  tsibble::as_tsibble(index = time) %>%
  outstable::detect_outliers(
    variable = "value",
    cmbn_model = c("lm", "theta", "fasster"),
    p_rate = 0.01) %>%
  outstable::cleanse_data(
    variable = "value",
    impute = "linear") %>%
*  fabletools::autoplot(.altered)
```
]
.pull-right[

<img src="figure/altered2-1.png" style="display: block; margin: auto;" />

<img src="figure/original3-1.png" style="display: block; margin: auto;" />
]

---

<img src="figure/timetk.png" width="100%" style="display: block; margin: auto;" />

---
## Load forecasting comparison

.pull-left[

<img src="figure/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />
]
.pull-right[

<img src="figure/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />
]

---
# What next?

- Combine forecasts using **robust methods** for time series

--

- Incorporate weights when combining the forecasts

--

- Extend the algorithm to multivariate time series and high-dimensional tensor time series

--

- Control **both** the false positive and false negative rates

--

- Experiment further with density estimation methods to improve tail estimation

---
class: center, middle, inverse

# Thank You

### `devtools::install_github("pridiltal/outstable")`

<img src="figure/5_hex-outstable.png" width="15%" style="display: block; margin: auto;" />

Slides available at: prital.netlify.app
<i class="fas fa-wrench faa-passing animated " data-fa-transform="grow-20 " style=" color:orange;"></i>
priyangad@uom.lk
pridiltal
@pridiltal <br/>
<sub><sup>Slides created via the R package xaringan</sup></sub>

---
## CRAN Task View: Anomaly Detection with R

<img src="figure/ctv.png" width="100%" style="display: block; margin: auto;" />
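
---
## Anomalous threshold calculation: a sketch in R

The threshold steps described earlier (kernel density estimate, extremes, `\(\Psi\)`-transform, Gumbel fit, CDF-based threshold) can be sketched in base R roughly as follows. This is an illustration under stated assumptions, not the `outstable` implementation: the residuals are simulated, and the smoothed-bootstrap sampling, the names `res_train`, `res_new`, `f_hat`, `psi` and `gumbel_nll`, and the choices of `N` and `m` are all hypothetical.

```r
set.seed(2021)

# Stand-ins (assumptions): residuals from a combined forecast
res_train <- rnorm(300)        # typical residuals used to build the threshold
res_new   <- c(0.3, -1.0, 8)   # residuals to score; 8 is a planted outlier

# Step 1: Gaussian kernel density estimate, evaluated exactly at any point
bw <- bw.nrd0(res_train)
f_hat <- function(x) rowMeans(dnorm(outer(x, res_train, "-"), sd = bw))

# Step 2: draw N extremes; each extreme is the lowest density value among
# m points sampled from the estimated density (smoothed bootstrap)
N <- 300
m <- length(res_train)
extremes <- replicate(N, {
  x <- sample(res_train, m, replace = TRUE) + rnorm(m, sd = bw)
  min(f_hat(x))
})

# Step 3: Psi-transform, using the constants shown on the slide
psi <- function(f) {
  ifelse(f < 1 / (2 * pi),
         sqrt(pmax(-2 * log(f) - 2 * log(2 * pi), 0)),
         0)
}
z <- psi(extremes)

# Step 4: fit a Gumbel distribution by maximum likelihood
# (scale is optimised on the log scale to keep it positive)
gumbel_nll <- function(par, x) {
  mu <- par[1]
  beta <- exp(par[2])
  s <- (x - mu) / beta
  length(x) * log(beta) + sum(s) + sum(exp(-s))
}
beta0 <- sd(z) * sqrt(6) / pi                      # moment-based start values
fit <- optim(c(mean(z) - 0.5772 * beta0, log(beta0)), gumbel_nll, x = z)
mu_hat <- fit$par[1]
beta_hat <- exp(fit$par[2])

# Step 5: threshold from the fitted Gumbel CDF in Psi-space
p_rate <- 0.01
threshold <- mu_hat - beta_hat * log(-log(1 - p_rate))

outscore <- psi(f_hat(res_new))
data.frame(residual = res_new, outscore, outlier = outscore > threshold)
```

Here `p_rate` plays the same role as the argument of that name in `detect_outliers()`: it is the upper-tail probability of the fitted Gumbel distribution beyond which a transformed density value is flagged.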