Batch scenario
Data stream scenario
stray (Search and TRace AnomalY)
on CRAN
devtools::install_github("pridiltal/stray")
Kate Buchhorn recently ported the stray algorithms to Python and made them available in sktime.
Use extreme value theory (EVT) to calculate a data-driven outlier threshold.
Let n be the size of the dataset
Sort the resulting n outlier scores
Consider the half of the outlier scores with the smallest values as typical
Search for any significantly large gap in the upper tail (bottom-up search algorithm proposed by Schwarz, 2008)
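The steps above can be sketched in base R as follows. This is a simplified illustration, not the stray package's own code: the function name `evt_threshold_sketch`, its arguments (`alpha`, `p`, `tn`) and the weighted average of preceding gaps used as the expected spacing are assumptions made for this example.

```r
# Sketch of a bottom-up gap search on sorted outlier scores.
# Hypothetical helper: names, defaults and the weighting scheme are assumptions.
evt_threshold_sketch <- function(scores, alpha = 0.01, p = 0.5, tn = 50) {
  n <- length(scores)
  stopifnot(n >= 4, p > 0, p <= 0.5)
  sorted <- sort(scores)                     # ascending order
  gaps <- c(0, diff(sorted))                 # gaps[i] = sorted[i] - sorted[i - 1]
  m <- max(min(tn, floor(n / 4)), 2)         # the m - 1 preceding gaps are used
  J <- 2:m
  start <- max(floor(n * (1 - p)), 1) + 1    # first index above the typical portion
  bound <- Inf                               # no threshold until a large gap is found
  for (i in start:n) {
    # Weighted sum of the gaps just below position i, used as the expected spacing
    expected_gap <- sum((J / (m - 1)) * gaps[i - J + 1])
    if (gaps[i] > log(1 / alpha) * expected_gap) {
      bound <- sorted[i - 1]                 # largest score still treated as typical
      break
    }
  }
  which(scores > bound)                      # positions of the flagged scores
}

# Toy scores: 200 evenly spaced typical values and two much larger ones
scores <- c((1:200) / 200, 10, 12)
flagged <- evt_threshold_sketch(scores)
```

With these toy scores the search stops at the jump from 1 to 10, so `flagged` holds the positions of the two large scores. A smaller `alpha` demands a larger gap before anything is flagged.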
Let $X_1, X_2, \ldots, X_n$ be a sample from a distribution function $F$.
Let $X_{1:n} \ge X_{2:n} \ge \cdots \ge X_{n:n}$ be the order statistics.
The available data are $X_{1:n}, X_{2:n}, \ldots, X_{k:n}$ for some fixed $k$.
Let $D_{i,n} = X_{i:n} - X_{i+1:n}$, $(i = 1, 2, \ldots, k)$ be the spacing between successive order statistics.
If $F$ is in the maximum domain of attraction of the Gumbel distribution, then the spacings $D_{i,n}$ are asymptotically independent and exponentially distributed with mean proportional to $i^{-1}$.
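A short simulation can make the spacing result concrete. The sketch below uses the standard exponential distribution, which lies in the Gumbel maximum domain of attraction and for which the i-th spacing between descending order statistics has mean exactly 1/i. The sample size, number of replications and seed are arbitrary choices for illustration.

```r
set.seed(1)
n <- 100       # sample size
k <- 5         # number of upper spacings to inspect
reps <- 2000   # number of simulated samples

# Each column holds the k upper spacings of one simulated sample
spacings <- replicate(reps, {
  x <- sort(rexp(n), decreasing = TRUE)   # X_{1:n} >= X_{2:n} >= ...
  x[1:k] - x[2:(k + 1)]                   # D_i = X_{i:n} - X_{i+1:n}
})

# If the mean of D_i is proportional to 1/i, then i * mean(D_i) is roughly constant
scaled_means <- (1:k) * rowMeans(spacings)
```

The entries of `scaled_means` should all be close to 1, which is the behaviour the threshold search relies on when it compares a gap with the gaps just below it.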
outliers <- find_HDoutliers(data)
display_HDoutliers(data, outliers)
Detects clusters of outlying points
Applies to both uni- and multi-dimensional data
Handles large datasets by using an approximate KNN search algorithm
Does not require a training set to build the decision model
Deals with multimodal typical classes
Outlier threshold has a probabilistic interpretation
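One way to obtain scores that stay large for a small cluster of outlying points is to look at the largest jump in each point's sorted nearest-neighbour distances, not just a single k-th distance. The base-R sketch below illustrates that idea with an exact, brute-force neighbour search via `dist()`. It is not the stray implementation, which the slides describe as using an approximate KNN search, and the function name `knn_maxgap_scores` and the default `k` are assumptions.

```r
# Hypothetical helper: score each point by the neighbour distance at which the
# largest jump in its sorted k-nearest-neighbour distances occurs.
knn_maxgap_scores <- function(x, k = 10) {
  x <- as.matrix(x)
  stopifnot(k >= 1, k < nrow(x))
  d <- as.matrix(dist(x))            # all pairwise Euclidean distances
  diag(d) <- Inf                     # a point is not its own neighbour
  scores <- apply(d, 1, function(row) {
    nn <- sort(row)[seq_len(k)]      # distances to the k nearest neighbours
    gaps <- diff(c(0, nn))           # jumps between successive neighbours
    nn[which.max(gaps)]              # distance reached by the largest jump
  })
  unname(scores)
}

# Toy data: a 7 x 7 grid of typical points and a micro-cluster of three outliers
grid <- as.matrix(expand.grid(0:6, 0:6)) / 10
micro <- rbind(c(10, 10), c(10.05, 10), c(10, 10.05))
scores <- knn_maxgap_scores(rbind(grid, micro), k = 10)
```

The three micro-cluster points sit close to each other, so their first two neighbour distances are tiny, yet the jump to the third neighbour is large and all three receive high scores. A plain nearest-neighbour distance would have missed them.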
Mean
Variance
Changing variance in remainder
Level shift using rolling window
Variance change
Strength of linearity
Strength of curvature
Strength of spikiness
Burstiness of time series (Fano Factor)
Minimum
Maximum
The ratio between 50% trimmed mean and the arithmetic mean
Moment
Ratio of means of data that is below and above the global mean
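A few of the features listed above can be computed directly in base R. The sketch below is an illustration under stated assumptions, not the code behind `oddstream::extract_tsfeatures`: the helper name `simple_features`, the feature names, and the choice to compute the Fano factor as variance divided by mean over the whole series are all assumptions.

```r
# Hypothetical helper computing a handful of per-series summary features
simple_features <- function(x) {
  m <- mean(x)
  c(
    mean = m,
    variance = var(x),
    minimum = min(x),
    maximum = max(x),
    fano_factor = var(x) / m,                             # burstiness
    below_above_ratio = mean(x[x < m]) / mean(x[x > m])   # means below vs above the global mean
  )
}

# Toy input: one time series per column
ts_data <- cbind(a = c(1, 2, 3, 4, 10), b = c(5, 5, 6, 6, 8))

# One row of features per series, ready for an outlier detector
features <- t(apply(ts_data, 2, simple_features))
```

Each series becomes a single point in feature space, so a detector for high-dimensional data can be applied to the rows of `features`.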
tsfeatures <- oddstream::extract_tsfeatures(ts_data)
outliers <- stray::find_HDoutliers(tsfeatures)
stray::display_HDoutliers(tsfeatures, outliers)
oddstream (Outlier Detection in Data STREAMs)
devtools::install_github("pridiltal/oddstream")
load(train_data)
tsfeatures <- oddstream::extract_tsfeatures(train_data)
pc <- oddstream::get_pc_space(tsfeatures)
oddstream::plotpc(pc$pcnorm)
oddstream::find_odd_streams(train_data, test_stream)
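The projection step in the snippet above maps the feature matrix to a low-dimensional principal component space. A rough base-R analogue, assuming a feature matrix with one row per series, is sketched below with `prcomp()`. The simulated `features` matrix and the choice of two components are assumptions for illustration, and the internals of `oddstream::get_pc_space` may differ.

```r
set.seed(123)

# Stand-in feature matrix: 50 series described by 6 features (simulated values)
features <- matrix(rnorm(50 * 6), nrow = 50,
                   dimnames = list(NULL, paste0("feature_", 1:6)))

# Centre and scale each feature, then rotate onto the principal components
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Keep the first two components as a 2D feature space
pc_space <- pca$x[, 1:2]
```

Calling `plot(pc_space)` then gives a scatterplot in which each point is one series, and series with unusual feature combinations tend to sit away from the bulk.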
Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, (2020) Anomaly detection in high-dimensional data. Journal of Computational & Graphical Statistics, to appear
on CRAN
Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, Sevvandi Kandanaarachchi and Mario A Munoz (2020) Anomaly detection in streaming nonstationary temporal data. Journal of Computational & Graphical Statistics, 29(1), 13-27.
on CRAN
A stack of images or a video - Image Time Series (ITS)
An ITS is a set of images of the same scene, ordered chronologically.
It can be encoded as a data cube with two spatial dimensions and one temporal dimension.
An ITS can be acquired with one or multiple sensors to obtain a longer data series with a high temporal frequency.
The resulting 2D+t data carry rich spatial and temporal information that must be taken into account to understand phenomena that cannot be observed from a single image of the sequence.
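The data-cube encoding can be sketched with a three-dimensional R array. The dimensions and simulated pixel values below are arbitrary assumptions; the point is only to show how frames and per-pixel time series are read from the same object.

```r
set.seed(42)
n_row <- 4    # image height in pixels
n_col <- 5    # image width in pixels
n_time <- 6   # number of frames

# Data cube: two spatial dimensions and one temporal dimension
cube <- array(rnorm(n_row * n_col * n_time), dim = c(n_row, n_col, n_time))

frame_3 <- cube[, , 3]        # the image captured at time step 3
pixel_series <- cube[2, 4, ]  # the time series of the pixel at row 2, column 4

# Reshape to a time x pixel matrix: one column per pixel time series
series_matrix <- matrix(aperm(cube, c(3, 1, 2)), nrow = n_time)
```

In `series_matrix`, the pixel at row `i` and column `j` is stored in column `i + (j - 1) * n_row`, so per-series feature extraction can be applied column by column.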
priyangad@uom.lk
pridiltal
prital.netlify.app
(Slides and papers available)
The slides are powered by the xaringan R package
This work was supported in part by RETINA research lab funded by the OWSD, a program unit of United Nations Educational, Scientific and Cultural Organization (UNESCO).