Pitfalls of time-series correlation

One of the common pitfalls in time-series analysis is spurious correlation. Let's take an example of sales vs social media impressions:

Visually, both sales and impressions appear to be moving together and correlated. Here is a scatterplot to further visualise this:

Boom! A near-perfect correlation, with an R-squared of 87%. All the more reason to spend more on social media to get more impressions and drive sales! Well, not so fast, as time-series correlation is more nuanced than this.

We have ignored the ‘trend’ part of the equation. As both series are trending upwards, it is the trend that is correlated, not necessarily the period-to-period movements in the data. In statistical terms, this is known as non-stationarity, where the distribution (mean and variance) changes with time. To get the actual correlation in non-stationary cases, the data needs to be made stationary. A common way to do that is ‘first-differencing’.

First differencing is the change in value between consecutive time periods:

First five observations
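
As a minimal sketch of the idea, first differencing can be written in a few lines of plain Python. The function name `first_difference` and the sales figures below are made up for illustration; they are not the data used in this post.

```python
def first_difference(values):
    """Return the change between each pair of consecutive observations."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


# Hypothetical figures, invented for illustration only (not the post's data).
sales = [100, 104, 109, 112, 118]

print(first_difference(sales))  # [4, 5, 3, 6]
```

Note that the differenced series is one observation shorter than the original, since the first period has nothing to be compared against.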

If we plot the differenced series over time, the trend part is neutralised and we are now looking at much more random data, where the correlation doesn’t appear to be strong.

A scatter plot confirms this:

Weak correlation here, with an R-squared of only 11%.
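
The same effect can be reproduced with simulated data. The sketch below is not the analysis behind the charts in this post: it assumes two series generated independently of each other, each a random walk with upward drift, and every name and parameter in it (`drifting_walk`, the drift and noise values, the seed) is hypothetical. It relies on the fact that, for a simple one-variable linear regression, R-squared is the square of the Pearson correlation.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def first_difference(values):
    """Change in value between consecutive time periods."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def drifting_walk(rng, n, drift, noise_sd):
    """Simulated series: each step adds a fixed drift plus Gaussian noise."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, noise_sd)
        path.append(level)
    return path


# Simulated, unrelated series (assumed parameters, not real sales data).
rng = random.Random(42)
sales = drifting_walk(rng, 200, drift=1.0, noise_sd=1.0)
impressions = drifting_walk(rng, 200, drift=2.0, noise_sd=2.0)

# R-squared of a one-variable regression equals the squared correlation.
r2_levels = pearson_r(sales, impressions) ** 2
r2_diffs = pearson_r(first_difference(sales), first_difference(impressions)) ** 2

print(f"R-squared on levels:            {r2_levels:.2f}")
print(f"R-squared on first differences: {r2_diffs:.2f}")
```

Because the two simulated series share nothing but an upward drift, the R-squared on the raw levels should come out high while the R-squared on the first differences should sit close to zero, which is the pattern seen in the two scatter plots.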

Slight change in analysis but 180-degree change in conclusion!