Changepoints detection in time series in R

Solution 1:

If the signal isn't too noisy, you could use diff to detect changepoints in slope instead of mean:

library(changepoint)

set.seed(1)
slope <- rep(sample(10,10)-5,sample(100,10))
sig <- cumsum(slope)+runif(n=length(slope),min = -1, max = 1)
cpt =cpt.mean(diff(sig),method="PELT")

# Show change point
(cpt.pts <- attributes(cpt)$cpts)
#> [1]  58 109 206 312 367 440 447 520 599

plot(sig,type="l")
lines(x=cpt.pts,y=sig[cpt.pts],type="p",col="red",cex=2)

enter image description here

Another option which seems to work better with the data you provided is to use piecewise linear segmentation:

library(ifultools)
changepoints <- linearSegmentation(x=1:length(data),y=data,angle.tolerance = 90,n.fit=10,plot=T)
changepoints
#[1]  13  24  36  58  72 106

enter image description here

Solution 2:

In R, there are many packages available for time series changepoint detection. changepoint is definitely a very useful one. A partial list of the packages is summarized in CRAN Task View:

Change point detection is provided in strucchange (using linear regression models), and in trend (using nonparametric tests). The changepoint package provides many popular changepoint methods, and ecp does nonparametric changepoint detection for univariate and multivariate series. changepoint.np implements the nonparametric PELT algorithm, while changepoint.mv detects changepoints in multivariate time series. InspectChangepoint uses sparse projection to estimate changepoints in high-dimensional time series. robcp provides robust change-point detection using Huberized cusum tests, and Rbeast provides Bayesian change-point detection and time series decomposition.

Here is also a great blog comparing several alternative packages: https://www.marinedatascience.co/blog/2019/09/28/comparison-of-change-point-detection-methods/. Another impressive comparison is from Dr. Jonas Kristoffer Lindeløv who developed the mcp package: https://lindeloev.github.io/mcp/articles/packages.html.

Below I used your sample time series to generate some quick results using the Rbeast package developed by myself (chosen here apparently for ego of self-promoting as well as perceived relvance). Rbeast is a Baysian changepoint detection algorithm and it can estimate the probability of changepoint occurrence. It can also be used for decomposing time series into seasonality and trend, but apparently, your time series is trend-only, so in the beast function below, season='none' is specified.

y = c(10.695,10.715,10.700,10.665,10.830,10.830,10.800,11.070,11.145,11.270,11.015,11.060,10.945,10.965,10.780,10.735,10.705,
    10.680,10.600,10.335,10.220,10.125,10.370,10.595,10.680,11.000,10.980,11.065,11.060,11.355,11.445,11.415,11.350,11.310,11.330,
    11.360,11.445,11.335,11.275,11.300,11.295,11.470,11.445,11.325,11.300,11.260,11.200,11.210,11.230,11.240,11.300,11.250,11.285,
    11.215,11.260,11.395,11.410,11.235,11.320,11.475,11.470,11.685,11.740,11.740,11.700,11.905,11.720,12.230,12.285,12.505,12.410,
    11.995,12.110,12.005,11.915,11.890,11.820,11.730,11.700,11.660,11.685,11.615,11.360,11.425,11.185,11.275,11.265,11.375,11.310,
    11.250,11.050,10.880,10.775,10.775,10.805,10.755,10.595,10.700,10.585,10.510,10.290,10.255,10.395,10.290,10.425,10.405,10.365,
    10.010,10.305,10.185,10.400,10.700,10.725,10.875,10.750,10.760,10.905,10.680,10.670,10.895,10.790,10.990,10.925,10.980,10.975,
    11.035,10.895,10.985,11.035,11.295,11.245,11.535 ,11.510,11.430,11.450,11.390,11.520,11.585)

library(Rbeast)
out=beast(y, season='none')
plot(out)
print(out)

Sample out from the script above

In the figure above, dashed vertical lines mark the most likely locations of changepoints; the green curve of Pr(tcp) shows the point-wise probability of changepoint occurrence over time. The order_t curve gives the estimated mean order of the piecewise polynomials needed to adequately fit the trend (the 0-th order is constant and the 1st order is linear): An average order toward 0 means that the trend is more likely to be flat and an order close to 1 means that the trend is linear. The output can be also printed as some ascii outputs, as shown below. Again, it says that the time series is most likely to have 8 changepoints; their most probable locations are given in out$trend$cp.

Result for time series #1 (total number of time series in 'out': 1)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                     SEASONAL CHANGEPOINTS                    +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


 No seasonal/periodic component present (i.e., season='none')


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                     TREND CHANGEPOINTS                       +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


An ascii plot of the probability dist for number of chgpts(ncp)
---------------------------------------------------------------
Pr(ncp=0 )=0.000|*                                            |
Pr(ncp=1 )=0.000|*                                            |
Pr(ncp=2 )=0.000|*                                            |
Pr(ncp=3 )=0.000|*                                            |
Pr(ncp=4 )=0.000|*                                            |
Pr(ncp=5 )=0.000|*                                            |
Pr(ncp=6 )=0.055|*****                                        |
Pr(ncp=7 )=0.074|******                                       |
Pr(ncp=8 )=0.575|******************************************** |
Pr(ncp=9 )=0.240|*******************                          |
Pr(ncp=10)=0.056|*****                                        |
---------------------------------------------------------------
Max ncp : 10   | A parameter you set (e.g., maxTrendKnotNum)  |
Mode ncp: 8    | Pr(ncp= 8)=0.57; there is a 57.5% probability|
           | that the trend componet has  8 chngept(s).   |
Avg ncp : 8.17 | Sum[ncp*Pr(ncp)]                             |
---------------------------------------------------------------

List of most probable trend changepoints (avg number of changpts: 8.17) 
--------------------------------.
tcp# |time (cp)      |prob(cpPr)|
-----|---------------|----------|
1    |8.0000         |   0.92767|
2    |112.0000       |   0.91433|
3    |68.0000        |   0.84213|
4    |21.0000        |   0.80188|
5    |32.0000        |   0.78171|
6    |130.0000       |   0.76938|
7    |101.0000       |   0.66404|
8    |62.0000        |   0.61171|
--------------------------------'