Tidy Time Series Anomaly Detection for Load Forecasting
Priyanga Dilini Talagala
41st International Symposium on Forecasting
30.07.2021

Image credit: Pixabay

Tidy forecasting workflow (Hyndman & Athanasopoulos, 2021)

Tidy Time Series Anomaly Detection

outstable

TABLE of OUTliers in Time Series Data

`devtools::install_github("pridiltal/outstable")`

Outliers in Time Series Data

Outlier Detection in Time Series Data

Main contributions

This work develops a framework for detecting outliers in tidy time series data.
The algorithm works with tidy temporal data provided by the tsibble package and produces an outstable: a tsibble with flagged anomalies and their degree of outlierness.
The framework can also provide a cleansed tsibble that integrates closely with the tidy forecasting workflow of the tidyverts toolbox.
It uses a data-driven outlier threshold with a valid probabilistic interpretation.

What is an outlier?

We define an outlier as an observation that is very unlikely given the forecast distribution.
An outlier is a rare observation with a very low chance of occurrence relative to the typical behaviour of the time series.
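As an illustration of this definition, the sketch below flags observations whose two-sided tail probability under an assumed Gaussian forecast distribution, $N(\hat{y}_t, \sigma^2)$, falls below a small level `alpha`. The Gaussian form, the known `sigma` and the helper name `flag_unlikely()` are assumptions made for this example only; the approach described in these slides instead derives its threshold from a kernel density estimate of the residual series.

```r
# Hypothetical helper: flag observations that are very unlikely under an
# ASSUMED Gaussian forecast distribution N(y_hat, sigma^2).
flag_unlikely <- function(y, y_hat, sigma, alpha = 0.01) {
  # Two-sided tail probability of a deviation at least as large as observed
  tail_prob <- 2 * pnorm(-abs(y - y_hat) / sigma)
  tail_prob < alpha
}

# Synthetic example: the fourth observation is far from its forecast
flag_unlikely(y = c(10, 11, 9, 25, 10), y_hat = rep(10, 5), sigma = 1)
# returns FALSE FALSE FALSE TRUE FALSE
```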

Forecast combinations

Use several different methods on the same time series, and average the resulting forecasts (Hyndman & Athanasopoulos, 2021)
Dramatic performance improvements by simply averaging the forecasts (Clemen, 1989)
Combining forecasts often leads to better forecast accuracy (Bates & Granger, 1969)
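The averaging idea can be sketched in base R. The three member models here (naive, seasonal naive, and a linear regression on a trend plus seasonal dummies) are simple stand-ins chosen for illustration, not the `lm`, `theta` and `fasster` members that outstable combines, and `combine_fitted()` is a hypothetical helper. It averages the members' in-sample fitted values with equal weights; the residuals `y - combine_fitted(y, period)` are the kind of series a residual-based detector would then examine.

```r
# Equal-weight combination of in-sample one-step fitted values.
# The member models are simple base-R stand-ins (hypothetical choice),
# not the "lm", "theta" and "fasster" members used by outstable.
combine_fitted <- function(y, period) {
  n <- length(y)
  naive <- c(NA, y[-n])                                 # forecast for t is y[t - 1]
  snaive <- c(rep(NA, period), y[seq_len(n - period)])  # forecast for t is y[t - period]
  trend <- seq_len(n)
  season <- factor((trend - 1) %% period)
  reg <- fitted(lm(y ~ trend + season))                 # trend + seasonal dummies
  # Average the three members; the first `period` values are NA because
  # the seasonal naive member has no forecast there
  unname((naive + snaive + reg) / 3)
}

# Synthetic series: daily cycle at hourly resolution plus noise
set.seed(42)
y <- 100 + 10 * sin(2 * pi * seq_len(96) / 24) + rnorm(96)
res <- y - combine_fitted(y, period = 24)  # residuals for outlier screening
```

Equal weights keep the sketch simple; weighted combinations are listed among the next steps at the end of the talk.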

Outstable

outstable

Outstable: Outlier Threshold Calculation

Outstable - Visualize residuals with true outlier

Anomalous threshold calculation

Estimate the probability density function, $f$, of the residual series $\longrightarrow$ kernel density estimation.
Draw a large number $N$ of extremes, $\arg\min_{x \in X} [f(x)]$, from the high density region of the estimated probability density function.
Define a $\Psi$-transform space using the $\Psi$-transformation of Clifton et al. (2011):

$\Psi[f(x)] = \begin{cases} \left(-2\ln f(x) - 2\ln(2\pi)\right)^{1/2}, & f(x) < (2\pi)^{-1} \\ 0, & f(x) \geq (2\pi)^{-1} \end{cases}$

The $\Psi$-transform maps the density values back into a space in which a Gumbel distribution can be fitted.
Fit a Gumbel distribution to the resulting $\Psi[f(x)]$ values; the Gumbel parameters are obtained by maximum likelihood estimation.
Determine the anomalous threshold using the corresponding univariate CDF, $F^{e}$, in the transformed $\Psi$-space.
Density-based, data-driven anomalous threshold $\longrightarrow$ extreme value theory
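The threshold steps can be sketched in base R as follows. This is a minimal illustration under our own assumptions, not the outstable implementation: residuals are standardised, the kernel is Gaussian with a `bw.nrd0()` bandwidth, densities at data points are computed leave-one-out, the "high density region" is taken to be the residuals outside the lowest `trim` fraction of density values, and each of the $N$ samples has the same size as the residual series. The threshold is the $(1 - p)$ quantile of the fitted Gumbel CDF $F^{e}(y) = \exp\{-\exp[-(y - \mu)/\sigma]\}$, that is $y^{*} = \mu - \sigma \ln(-\ln(1 - p))$. The names `outlier_threshold()`, `fit_gumbel()`, `psi_transform()`, `trim` and `N` are hypothetical.

```r
# Psi-transform as stated in the steps (Clifton et al., 2011); pmax()
# returns 0 exactly when f >= 1 / (2 * pi), without NaN warnings
psi_transform <- function(f) sqrt(pmax(-2 * log(f) - 2 * log(2 * pi), 0))

# Maximum likelihood fit of a Gumbel (maxima) distribution
fit_gumbel <- function(y) {
  nll <- function(par) {
    mu <- par[1]
    sigma <- exp(par[2])                 # log-scale keeps sigma > 0
    z <- (y - mu) / sigma
    length(y) * log(sigma) + sum(z) + sum(exp(-z))
  }
  s0 <- sd(y) * sqrt(6) / pi             # method-of-moments starting values
  est <- optim(c(mean(y) - 0.5772 * s0, log(s0)), nll)$par
  c(mu = est[1], sigma = exp(est[2]))
}

outlier_threshold <- function(res, p_rate = 0.01, N = 500, trim = 0.01) {
  z <- (res - mean(res)) / sd(res)       # standardised residuals (assumption)
  n <- length(z)
  h <- bw.nrd0(z)
  # Step 1: Gaussian KDE evaluated leave-one-out at each residual, so an
  # isolated point does not inflate its own density
  K <- dnorm(outer(z, z, "-") / h)
  diag(K) <- 0
  f_res <- rowSums(K) / ((n - 1) * h)
  # Step 2: draw N samples of size n from the KDE restricted to the high
  # density region; keep each sample's extreme (its minimum density)
  hdr_idx <- which(f_res > quantile(f_res, trim))
  extremes <- replicate(N, {
    parent <- hdr_idx[sample.int(length(hdr_idx), n, replace = TRUE)]
    x <- z[parent] + rnorm(n, sd = h)
    Kx <- dnorm(outer(x, z, "-") / h)
    Kx[cbind(seq_len(n), parent)] <- 0   # leave the parent point out
    min(rowSums(Kx) / ((n - 1) * h))
  })
  # Steps 3-4: Psi-transform the extremes (low density -> large Psi) and
  # fit a Gumbel distribution for maxima
  gum <- fit_gumbel(psi_transform(extremes))
  # Step 5: threshold = (1 - p_rate) quantile of the fitted Gumbel CDF
  threshold <- gum[["mu"]] - gum[["sigma"]] * log(-log(1 - p_rate))
  score <- psi_transform(f_res)
  list(threshold = threshold, score = score, outlier = score > threshold)
}
```

Because $\Psi$ decreases as the density falls, the minimum-density extreme of each sample becomes a maximum in $\Psi$-space, which is why a Gumbel distribution for maxima is fitted there.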

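outstable::detect_outliers

# devtools::install_github("pridiltal/outstable")
library(outstable)
library(magrittr)  # provides the %>% pipe
# `data`: a data frame with a date-time column `time` and a numeric column `value`
data %>%
  tsibble::as_tsibble(index = time)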

## # A tsibble: 13,124 x 2 [1h] <UTC>
##    time                value
##    <dttm>              <dbl>
##  1 2006-12-31 19:00:00 20601
##  2 2006-12-31 20:00:00 20377
##  3 2006-12-31 21:00:00 20745
##  4 2006-12-31 22:00:00 21648
##  5 2006-12-31 23:00:00 23220
##  6 2007-01-01 00:00:00 22846
##  7 2007-01-01 01:00:00 21856
##  8 2007-01-01 02:00:00 20912
##  9 2007-01-01 03:00:00 20005
## 10 2007-01-01 04:00:00 18592
## # … with 13,114 more rows

outstable::detect_outliers

# devtools::install_github("pridiltal/outstable")
library(outstable)
library(magrittr)  # provides the %>% pipe

data %>%
  tsibble::as_tsibble(index = time) %>%
  outstable::detect_outliers(
    variable = "value",                        # column to screen for outliers
    cmbn_model = c("lm", "theta", "fasster"),  # forecasting models to combine
    p_rate = 0.01
  )

## # A tsibble: 13,124 x 4 [1h] <UTC>
##    time                value .outscore .outtype
##    <dttm>              <dbl>     <dbl> <fct>   
##  1 2006-12-31 19:00:00 20601    0.146  typical 
##  2 2006-12-31 20:00:00 20377    0.0727 typical 
##  3 2006-12-31 21:00:00 20745    0.175  typical 
##  4 2006-12-31 22:00:00 21648    0.276  typical 
##  5 2006-12-31 23:00:00 23220    0.482  typical 
##  6 2007-01-01 00:00:00 22846    0.332  typical 
##  7 2007-01-01 01:00:00 21856    0.286  typical 
##  8 2007-01-01 02:00:00 20912    0.307  typical 
##  9 2007-01-01 03:00:00 20005    0.333  typical 
## 10 2007-01-01 04:00:00 18592    0.228  typical 
## # … with 13,114 more rows

outstable::cleanse_data()

# devtools::install_github("pridiltal/outstable")
library(outstable)
library(magrittr)  # provides the %>% pipe

data %>%
  tsibble::as_tsibble(index = time) %>%
  outstable::detect_outliers(
    variable = "value",
    cmbn_model = c("lm", "theta", "fasster"),  # forecasting models to combine
    p_rate = 0.01
  ) %>%
  outstable::cleanse_data(
    variable = "value",
    impute = "linear"
  )

## # A tsibble: 13,124 x 5 [1h] <UTC>
##    time                value .outscore .outtype .altered
##    <dttm>              <dbl>     <dbl> <fct>       <dbl>
##  1 2006-12-31 19:00:00 20601    0.146  typical     20601
##  2 2006-12-31 20:00:00 20377    0.0727 typical     20377
##  3 2006-12-31 21:00:00 20745    0.175  typical     20745
##  4 2006-12-31 22:00:00 21648    0.276  typical     21648
##  5 2006-12-31 23:00:00 23220    0.482  typical     23220
##  6 2007-01-01 00:00:00 22846    0.332  typical     22846
##  7 2007-01-01 01:00:00 21856    0.286  typical     21856
##  8 2007-01-01 02:00:00 20912    0.307  typical     20912
##  9 2007-01-01 03:00:00 20005    0.333  typical     20005
## 10 2007-01-01 04:00:00 18592    0.228  typical     18592
## # … with 13,114 more rows
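A minimal sketch of what linear imputation of flagged points could look like, assuming (our assumption, not documented on these slides) that `impute = "linear"` replaces each flagged observation by linear interpolation between the nearest typical observations. `impute_linear()` is a hypothetical helper built on base R's `approx()`; it is not part of outstable.

```r
# Hypothetical helper: replace flagged observations by linear interpolation
# between the nearest typical (unflagged) neighbours in time.
impute_linear <- function(value, is_outlier) {
  idx <- seq_along(value)
  keep <- !is_outlier
  stopifnot(sum(keep) >= 2)  # approx() needs at least two typical points
  altered <- value
  if (any(is_outlier)) {
    # rule = 2 carries the nearest typical value outward if an outlier
    # sits at either end of the series
    altered[is_outlier] <- approx(idx[keep], value[keep],
                                  xout = idx[is_outlier], rule = 2)$y
  }
  altered
}

impute_linear(c(10, 12, 50, 16, 18), c(FALSE, FALSE, TRUE, FALSE, FALSE))
# returns 10 12 14 16 18
```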

tidyverts: Tidy tools for time series

# devtools::install_github("pridiltal/outstable")
library(outstable)
library(magrittr)  # provides the %>% pipe

data %>%
  tsibble::as_tsibble(index = time) %>%
  outstable::detect_outliers(
    variable = "value",
    cmbn_model = c("lm", "theta", "fasster"),  # forecasting models to combine
    p_rate = 0.01
  ) %>%
  outstable::cleanse_data(
    variable = "value",
    impute = "linear"
  ) %>%
  fabletools::autoplot(.altered)  # plot the cleansed series

Load forecasting comparison

What next?

Combination forecasts with robust methods for time series
Incorporating weights when combining the forecasts
Extending the algorithm to multivariate time series and high-dimensional tensor time series
Controlling both the false positive and false negative rates
More experiments with density estimation methods to obtain better tail estimates

Thank You

`devtools::install_github("pridiltal/outstable")`

Slides available at: prital.netlify.app

priyangad@uom.lk

pridiltal

@pridiltal

Slides created via the R package xaringan

CRAN Task View: Anomaly Detection with R
