Exponential smoothing for time series forecasting in R

Last Updated: 23 Apr 2019

Learn about exponential smoothing and the characteristics of time series data in order to create and validate time series forecasts in R.

In this blog post, I will show you how to create and validate exponential smoothing time series forecasts with the statistical software R. The post uses website traffic data from organic search as a real-world example and includes R code snippets throughout.

Depending on the type of online business, web traffic forecasts are usually of interest with regard to customer acquisition, advertisement revenue, and brand awareness. As part of business planning, web traffic forecasts can inform resource allocation across traffic channels such as paid campaigns, content partnerships, email campaigns and other online marketing channels.

Exponential smoothing is one of the most popular time series forecasting techniques. It uses historical data with its inherent characteristics (more on that later) as input for the forecasting model, which means that time series forecasting techniques like this are generally most suitable and accurate if:

  • sufficient historical data is available,
  • the forecasted metric is relatively stable and not subject to extreme fluctuations, and
  • the forecasting horizon is short-term such as the next quarter or potentially six months.

Also, remember that even a sophisticated and comprehensive forecast remains an educated guess. Forecasting is as much an art as it is a science. However, this post will also address how the validity of an exponential smoothing forecast and its underlying assumptions can be evaluated and understood. In summary, the post aims to help you:

  • understand the characteristics of time series data,
  • create exponential smoothing models, and
  • assess their validity and applicability to a given business scenario.

Now, let’s get to it and see how you can use exponential smoothing to produce forecasts. The post is organized as follows:

1. Characteristics of time series data

This chapter addresses the characteristics of time series data. A time series is a sequence of data points recorded at successive, equally spaced points in time. Examples from the business world include annual profits, quarterly sales revenue and monthly website traffic from organic search, which is used for the forecasts later in this post. The described characteristics and components of time series data are important because they are key inputs to the exponential smoothing forecasting techniques introduced later.

1.1 Time series components

Real-world time series data commonly contains several components or recognizable patterns. These components can be divided into systematic and non-systematic components. Systematic components are those that can be modeled because they follow some consistent or recurring pattern. Examples of systematic components are (1) trend and (2) seasonality. Any other variation in the data is categorized as non-systematic and is called (3) error or remainder. The overall magnitude of the time series is also referred to as its “level”.

In the following, we will look at how to recognize each component in time series data.

Trend

A trend is a consistent increase or decrease in the units of the underlying data. Consistent in this context means that there are consecutively higher peaks during a so-called uptrend and consecutively lower lows during a so-called downtrend. If there is no trend, the series can be called stationary.

The trend component can also sometimes additionally reflect a cycle. A cyclic pattern is usually longer than a trend. In other words, a trend can be a pattern within a longer cycle. Hence, the trend component is sometimes also called “trend-cycle” component. Note that the dotted Uptrend, No trend and Downtrend lines in image 1 are not fitted regression lines but only loosely plotted lines for illustration.

Img. 1: Trend in time series


Seasonality

Seasonality describes cyclical effects that depend on the time of the year. Daily data may show valleys on weekends, weekly data may show peaks in the first week of every (or certain) month(s), and monthly data may show lower levels during the winter or summer months. It is also possible to have multiple seasonalities.

Img. 2: Seasonality in time series


Error or Remainder

Error or remainder is the component of a time series that is not described by trend and seasonality.

Img. 3: Error/Remainder in time series

This has been a generic description of the individual time series components; the next section introduces some real-world data and shows the process of time series decomposition.

1.2 Time series decomposition

Time series decomposition describes the process of decomposing time series data into its components: (1) Trend, (2) Seasonality and (3) Error. It is used to better understand data and builds a basis for some time series forecasting approaches.

The data we will use in this blog post is real-world organic traffic data. Before we get to the time series decomposition, we will start with some required prework and plot the data for exploration. The data "organic-traffic.csv" (image 4) is a file from a Google Analytics export that has been formatted to consist of two columns.

Img. 4: Google Analytics export “organic-traffic.csv”

The first column displays the month of the year and the second column displays the volume of website traffic from organic search (e.g. Google) measured in Google Analytics sessions. Image 4 shows only the first 5 months, but overall the file has 54 rows excluding the header, which equates to 4.5 years (54 months) of monthly data.

The following R code converts the Month column into the appropriate format, turns the Organic Sessions column into a time series object of monthly data (frequency = 12) and plots the data as a line graph. Below that, in image 5, you can see the plot of the organic traffic .csv file that we retrieved from Google Analytics.
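A minimal sketch of what this snippet could look like is shown below; the file and column layout follow image 4, while the exact column names and the plot labels are assumptions.

```r
# Minimal sketch, assuming "organic-traffic.csv" with a Month column and an
# Organic Sessions column as shown in image 4 (exact column names may differ)
organic <- read.csv("organic-traffic.csv", stringsAsFactors = FALSE)

# Monthly time series starting in January 2014 with frequency = 12
Organic_Traffic <- ts(organic[[2]], start = c(2014, 1), frequency = 12)

plot(Organic_Traffic, xlab = "Year", ylab = "Organic Sessions")
```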

If you would like to see the whole R code shared in snippets throughout this post, you can do so here.

Img. 5: Plot of organic traffic data

The first step of the time series decomposition is to determine whether the underlying seasonality in the time series data is additive or multiplicative. Additive seasonality means that the magnitude of the seasonal swings remains approximately the same regardless of the level of the time series. Multiplicative seasonality means that the magnitude of the seasonal swings changes in proportion to the level of the time series. The difference can be observed in the two plots in image 6. Also, note that the two time series in image 6 also have an additive and a multiplicative trend, respectively, which may appear as the more visible feature at first glance.

Img. 6: Additive vs. multiplicative seasonality

In the case of the “Organic_Traffic” data, the magnitude of seasonality increases proportionally with level suggesting multiplicative seasonality as can be seen in image 7.

Img. 7: Increasing magnitude of seasonality with level

Knowing that the seasonality of the “Organic_Traffic” data is multiplicative, we can proceed to the actual time series decomposition. There are various methodologies for time series decomposition, including classical, X11, SEATS and STL, which come with different advantages and disadvantages as well as requirements for the time unit of the data. In this case, we will use classical time series decomposition because it is widely used, although STL (Seasonal Decomposition of Time Series by Loess) is more sophisticated. It is executed in R with decompose, which requires “additive” or “multiplicative” as input for the type argument, referring to the seasonal component of the time series. The following R code decomposes the “Organic_Traffic” data, prints each component’s contribution (image 8) and plots it (image 9).
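A minimal sketch of this decomposition step, assuming the Organic_Traffic time series object created earlier:

```r
# Classical multiplicative decomposition of the monthly time series
Organic_Traffic_decomposed <- decompose(Organic_Traffic, type = "multiplicative")

# Print each component's contribution (as in image 8) and plot it (as in image 9)
print(Organic_Traffic_decomposed$trend)     # trend component (centered moving average)
print(Organic_Traffic_decomposed$seasonal)  # seasonal factors, repeating every 12 months
print(Organic_Traffic_decomposed$random)    # error / remainder component
plot(Organic_Traffic_decomposed)
```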

Img. 8: Output of classical time series decomposition

 

Img. 9: Plot of time series decomposition

Since the applied time series decomposition was multiplicative, the individual component values for a month can be multiplied to yield the observed organic sessions count for that month shown in image 4. The mathematical formula is:

\(y_{t} = S_{t} * T_{t} * R_{t}\)

For instance, see below the rounded calculation for \(t = 13\) representing January 2015 with 251,927 organic sessions:

\(251927_{13} \approx 1.0479508_{13} * 242858.6_{13} * 0.9898751_{13}\)
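As a quick check, the components returned by decompose can be multiplied back together; here is a minimal sketch for January 2015, assuming the decomposition object from the previous snippet.

```r
# Multiply seasonal, trend and remainder components for January 2015 (t = 13);
# the product should approximately reproduce the observed 251,927 sessions
jan2015 <- c(2015, 1)
window(Organic_Traffic_decomposed$seasonal, start = jan2015, end = jan2015) *
  window(Organic_Traffic_decomposed$trend,  start = jan2015, end = jan2015) *
  window(Organic_Traffic_decomposed$random, start = jan2015, end = jan2015)
```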

The steps of the decompose command that have led to those calculated values are as follows:

(1) Calculate the trend with a centered moving average

A moving average of order 3, for instance, calculates the average of the 3 nearest neighboring values of the time series. A centered moving average is only calculated when the neighboring values are arranged symmetrically around the given observation of the time series. For order 3, this means that there is no moving average for the first observation, but there is one for the second observation, since the latter is centered between the first and third observations. For order 5, the first calculated data point of the centered moving average exists only for the third observation, which is centered among observations 1 to 5. Because of this centering, the moving average produces no data points at the beginning and end of the underlying time series. This delayed start and early end of the trend line can be seen in image 9. Given that the first calculated data point of the trend line is in July 2014, the 7th observation of the time series, we can conclude that decompose used an order of 13 (6 observations to the left of the centered moving average data point and 6 observations to its right).
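For illustration, here is a minimal sketch of a centered moving average of order 3 computed with stats::filter(); it is not the exact filter used internally by decompose, but it shows the missing values at the start and end described above.

```r
# Centered moving average of order 3: no values for the first and last observation
cma3 <- stats::filter(Organic_Traffic, filter = rep(1/3, 3), sides = 2)
head(cma3)  # first value is NA, second is the mean of observations 1 to 3
```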

(2) Remove the trend from the time series

In the second step, the trend line (based on the centered moving average with order 13 in this case) is removed from the time series. This is also called detrending. The remaining components in the time series are therefore seasonality and remainder.

(3) Calculate the average for each time unit over all time periods

With the trend removed from the time series, the average for each time unit over all time periods is taken. For the monthly “Organic_Traffic” data, this means that the average is calculated for all January observations, all February observations and so on across all time periods. All time periods span 4.5 years in this case, which equates to 5 observations for January to June and 4 observations for July to December. These average values for each time unit are then centered to provide a baseline of the time series level without seasonality. In the case of a multiplicative time series, the effect of seasonality for each time unit is expressed by a factor. For example, the seasonality factor for December is 0.8500177 and for November 0.9979517. You can observe that the values of the seasonal component repeat every 12 months, as the seasonality is assumed to exhibit the same annual effect for every year of the dataset. The calculated effect of the seasonality is therefore sensitive to the time window spanned by the dataset; longer time spans of data will accordingly yield a better model of the seasonality effect.
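A minimal sketch of this step, which should approximately reproduce the seasonal factors printed by decompose (it assumes the decomposition object from the earlier snippet):

```r
# Detrend the series (step 2), average the detrended values per calendar month
# and scale the monthly averages so they average to 1 (multiplicative factors)
detrended <- Organic_Traffic / Organic_Traffic_decomposed$trend
monthly_means <- tapply(detrended, cycle(Organic_Traffic), mean, na.rm = TRUE)
seasonal_factors <- monthly_means / mean(monthly_means)
round(seasonal_factors, 7)
```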

The seasonality factors have very practical implications if you would like to know whether traffic really decreased in one month compared to the previous one, beyond seasonality.

You can deseasonalize your traffic using the seasonality factors. For December 2014, the calculation looks like this:

\(Dec\thinspace 2014\thinspace deseasonalized = \frac{206609_{12}} {0.8500177_{12}} = 243064.4\)

Accordingly, for November 2014 this looks like this:

\(Nov\thinspace 2014\thinspace deseasonalized =\frac{237982_{11}}{0.9979517_{11}} = 238470.5\)

If you look at the “Organic_Traffic” numbers, you will see that December 2014 had 206,609 organic sessions while November 2014 had 237,982. So you might conclude that December was an overall weaker month, but once you factor out seasonality, December was actually stronger than November in terms of organic sessions.
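A minimal sketch of this deseasonalization in R, assuming the decomposition object from earlier:

```r
# Divide observed values by their seasonal factors to remove seasonality
deseasonalized <- Organic_Traffic / Organic_Traffic_decomposed$seasonal

# Compare November and December 2014 after removing seasonality
window(deseasonalized, start = c(2014, 11), end = c(2014, 12))
```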

(4) Calculate the error component

In the fourth step, the trend and seasonality are removed from the original time series to yield the error component.

In conclusion, time series decomposition allows you to understand your time series data at a deeper level. For instance, you can quickly identify the underlying trend and take it as the “growth rate” of the organic traffic or you can de-seasonalize data points to identify whether traffic really went down in a given month or whether it was due to seasonality only.


2. Exponential smoothing for time series forecasts

Exponential smoothing is a popular forecasting method for short-term predictions. Such forecasts of future values are based on past data, whereby the most recent observations are weighted more heavily than less recent observations. This weighting is controlled by so-called smoothing constants. This differs from the simple moving average method, in which every data point has equal weight in the average calculation. Exponential smoothing thus builds a forecasted value as an average of differently weighted past data points.

There are different exponential smoothing methods that differ from each other in the components of the time series that are modeled.

For instance, simple exponential smoothing (SES) uses only one smoothing constant, double exponential smoothing or Holt exponential smoothing uses two smoothing constants and triple exponential smoothing or Holt-Winters exponential smoothing accordingly uses three smoothing constants.

2.1 Simple exponential smoothing

Simple exponential smoothing assumes that the time series data has only a level and some error (or remainder), but no trend or seasonality. It is therefore not applicable to the “Organic_Traffic” data, as that data has underlying seasonality and trend. Nevertheless, for illustration purposes, we will apply simple exponential smoothing to our organic traffic data in this section purely to explore its mechanics. In exponential smoothing, all past observations contribute to the calculation of the forecasted value. The smoothing parameter \(\alpha\) determines the distribution of weights across past observations and with that how heavily a given time period is factored into the forecasted value. If the smoothing parameter is close to 1, recent observations carry most of the weight; if it is closer to 0, the weights of older and recent observations are more balanced. For example, with \(\alpha\) = 0.5, the weight halves with every next older observation: 0.5, 0.25, 0.125, 0.0625, etc. As you can see in image 10, the forecasted value for the next period is the sum of the products of Weight and past Organic Sessions for every previous time period.

Img. 10: Simple exponential smoothing

Here is also the mathematical notation for columns D and E respectively:

\(Weight_{t} = \alpha*(1-\alpha)^{t}\)

\(Forecast = \sum\limits_{t=0}^{n} (Organic~Sessions_{t} * Weight_{t})\)

In this case, our freely chosen smoothing parameter \(\alpha\) = 0.5 yields a forecast of 591,069 Organic Sessions for the next future period.
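A minimal sketch that replicates the weighted-sum calculation from image 10, assuming the Organic_Traffic time series object and the freely chosen \(\alpha\) = 0.5:

```r
# Manual simple exponential smoothing forecast using the weights from image 10
alpha <- 0.5
obs <- rev(as.numeric(Organic_Traffic))              # most recent observation first (t = 0)
weights <- alpha * (1 - alpha)^(seq_along(obs) - 1)  # 0.5, 0.25, 0.125, ...
sum(weights * obs)                                   # forecast for the next period
```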

While there are instances, in which a forecaster may want to set the smoothing parameter manually based on previous experience with a dataset, most commonly the smoothing parameter is automatically determined by optimizing for minimum forecasting error.

If we fit a simple exponential smoothing model to the “Organic_Traffic” data in R, the smoothing parameter \(\alpha\) is determined in exactly this way. The following R code fits a simple exponential smoothing model to the “Organic_Traffic” data and prints a summary including the calculated smoothing parameter \(\alpha\) and forecasting error measures. Note that instead of the ses(Organic_Traffic) function, you can also use ets(Organic_Traffic, model = "ANN") from the forecast package, but in this section we go with the simpler ses function.
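A minimal sketch of this snippet, assuming the forecast package and the Organic_Traffic object:

```r
library(forecast)

fit_ses <- ses(Organic_Traffic)  # simple exponential smoothing, default h = 10
summary(fit_ses)                 # alpha, error measures and forecasts (image 11)
plot(fit_ses)                    # forecast with 80% and 95% intervals (image 12)
```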

Img. 11: R output for simple exponential smoothing model

You can see in this output under Smoothing parameters that alpha = 0.6347. This means that 63.47% of the forecast is based on the most recent observation. Below that, various error measures that will be discussed later are shown, and the summary ends with the forecasts. Since simple exponential smoothing assumes there is no trend or seasonality, the forecasts for future time periods are at a certain level but flat over time. Lo 80 and Hi 80 are the boundaries of an 80% confidence interval, and accordingly the wider bounds of Lo 95 and Hi 95 represent the 95% confidence interval (the higher the confidence, the wider the bounds). Also, note that the confidence intervals widen with increasing distance into the future. The plot function produces the plot below, which shows the forecasted level for future time periods and visualizes the 80% and 95% confidence intervals.

Img. 12: Simple exponential smoothing plot

We know from our time series decomposition that simple exponential smoothing is too simplistic a forecasting method for our underlying data. A visual check of the plot above confirms this. The purpose of this section was to demonstrate the mechanics of simple exponential smoothing in contrast to the Holt and Holt-Winters exponential smoothing methods that follow next.

2.2 Holt exponential smoothing

Holt exponential smoothing is a time series forecasting approach that fits time series data with an overall level as well as a trend. In addition to the smoothing parameter \(\alpha\) used in simple exponential smoothing, there is also a \(\beta\) smoothing parameter for the exponential decay of the modeled trend component. This \(\beta\) smoothing parameter ranges between 0 and 1, with higher values giving more weight to recent observations. A \(\beta\) value of 0.5 means that the most recent observation’s trend component is weighted with 50% in the modeled trend slope. The mechanics of the \(\beta\) smoothing parameter are the same as described for the \(\alpha\) smoothing parameter previously; the difference is that they refer to different time series components, i.e. trend and level, respectively. The following R code does the same as the previous snippet, but this time a Holt exponential smoothing model is fitted to the “Organic_Traffic” data.
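A minimal sketch of this snippet, again assuming the forecast package and the Organic_Traffic object:

```r
library(forecast)

fit_holt <- holt(Organic_Traffic)  # level and trend, default h = 10
summary(fit_holt)                  # alpha, beta, error measures and forecasts (image 13)
plot(fit_holt)                     # forecast plot (image 14)
```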

Img. 13: Summary for Holt exponential smoothing model

The output of the summary function this time shows values for both the \(\alpha\) and \(\beta\) smoothing parameters, again determined by optimizing for minimum forecasting error. The forecasted values take the trend into account; hence you can see the increase in the values over time. The plot function produces the plot below based on the forecasted values and the confidence intervals.

Img. 14: Holt exponential smoothing plot

The inclusion of the trend component in the model produces a forecast that takes the uptrend of the “Organic_Traffic” data into account. However, the seasonal fluctuations in the data are not part of the model. The method that does so is called Holt-Winters exponential smoothing and is explained in the next section.

2.3 Holt-Winters exponential smoothing

Holt-Winters exponential smoothing is a time series forecasting approach that takes the overall level, trend and seasonality of the underlying dataset into account for its forecast. Hence, the Holt-Winters model has three smoothing parameters indicating the exponential decay from most recent to older observations: \(\alpha\) (alpha) for the level component, \(\beta\) (beta) for the trend component, and \(\gamma\) (gamma) for the seasonality component. The following R code fits the Holt-Winters model to the time series data. The argument h = 10 sets the number of forecasted time periods to 10, since the default of hw is 24, while the default for the previously used functions ses and holt is 10. Also, the argument seasonal = "multiplicative" indicates that the seasonal component in the data increases in proportion to the level. You can see in the earlier plots that the downward movement of the level before the beginning of the years 2015, 2016, 2017 and 2018 (the “Christmas dip”) increases in magnitude in proportion to the level. If the magnitude of this downward movement were roughly equal for those years and not in proportion to the level, we would choose the argument seasonal = "additive". If in doubt, you can print each summary, for additive and multiplicative seasonality, and choose the one with the lower forecast error values.
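A minimal sketch of this snippet with the arguments described above, assuming the forecast package and the Organic_Traffic object:

```r
library(forecast)

fit_hw <- hw(Organic_Traffic, seasonal = "multiplicative", h = 10)
summary(fit_hw)  # alpha, beta, gamma, error measures and forecasts (image 15)
plot(fit_hw)     # forecast including seasonality (image 16)
```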

Img. 15: Summary of Holt-Winters exponential smoothing

The summary function shows the estimates for \(\alpha\), \(\beta\), and also \(\gamma\) for the seasonality component. The plot function visualizes the forecast including seasonality.

Img. 16: Holt-Winters exponential smoothing plot

2.4 Automated exponential smoothing forecasts

The previous three sections introduced simple, Holt and Holt-Winters exponential smoothing. These three exponential smoothing models use different (combinations of) time series components (level, trend, seasonality) for their respective forecasts. There is also an automated exponential smoothing forecast that can be produced with the ets() function from the forecast package, already briefly mentioned in section 2.1. In this case, it is automatically determined whether each time series component is additive or multiplicative. The initial state variables and smoothing parameters that were shown earlier in images 11, 13 and 15 are optimized, and by default the model with the lowest AICc (Akaike information criterion with a correction for small sample sizes) is chosen as the automated forecast model. The previously used functions ses(), holt() and hw() are convenience wrappers around the ets() function. The ets() function takes several arguments, including model. The model argument takes three letters as input: the first letter denotes the error type (A, M or Z), the second letter denotes the trend type (N, A, M or Z), and the third letter denotes the seasonality type (N, A, M or Z). The letters stand for N = none, A = additive, M = multiplicative and Z = automatically selected. This means that ses(Organic_Traffic) is, for instance, the convenience wrapper for ets(Organic_Traffic, model = "ANN"), which fits a model with additive error and without trend or seasonality. The ets() function can also be used with Z for all time series components, or without the model argument, to fit the automated exponential smoothing model to the data; ets(Organic_Traffic) is the same as ets(Organic_Traffic, model = "ZZZ"). One drawback of the ets() function is that no forecast horizon can be chosen. For visual consistency of the upcoming plot (image 18) with the plots of the previous methods, we would like to use a horizon of 10 time periods again. For this, we use the forecast() function, which applied directly to the time series also fits an automated ets() model but additionally accepts a forecast horizon argument, as can be seen below.
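A minimal sketch of this automated forecast, assuming the forecast package and the Organic_Traffic object:

```r
library(forecast)

fit_auto <- forecast(Organic_Traffic, h = 10)  # fits an automated ets() model internally
summary(fit_auto)                              # selected ETS(.,.,.) model and AICc (image 17)
plot(fit_auto)                                 # automated forecast plot (image 18)
```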

Img. 17: Summary of automated exponential smoothing

From the output in image 17, you can see an AICc value of 1338.090, which is lower than for all other presented exponential smoothing models. Also, the headline of the plot below contains ETS(M,A,M), showing that the automated model characterized the error of the “Organic_Traffic” data as multiplicative (M), the trend as additive (A) and the seasonality as multiplicative (M).

Img. 18: Automated exponential smoothing plot


3. Evaluating and using exponential smoothing forecasts

As statistician George Box once wrote:

 “The most that can be expected from any model is that it can supply a useful approximation to reality: All models are wrong; some models are useful”.

While chapter 2 dealt with the creation of different exponential smoothing models for time series forecasting, this chapter focuses on understanding more about the “usefulness” of a model. This involves using suitable input data for the forecast model, understanding the specific mechanics of exponential smoothing, and validating the model.

3.1 Use suitable time series data for forecasting

Since any exponential smoothing forecast model is based on the data used to calibrate the model, input data is unsurprisingly crucially important.

If you are producing a forecast that includes seasonality, the input data should ideally contain several repetitions of the seasonal pattern.

In the case of annual seasonality, which means the monthly values within a year are subject to seasonal fluctuations, several years’ worth of data will allow a more balanced calculation of the seasonality component than only one year’s worth of data. In the latter case, the model would assume the seasonality present in that one year for any future year. This means the model is overly sensitive to the seasonal fluctuations of that one year, and the risk that these fluctuations are not representative of the actual seasonality is higher compared to a model based on several seasonal patterns.

Similarly, modeling the trend component requires a sufficiently long time span of input data. The additive or multiplicative nature of an underlying trend can be determined with higher confidence the longer the time span of the input data. If the time span of the input data is short (e.g. less than 6 months), a multiplicative trend could, for instance, be mistaken for an additive one.

Apart from being comprehensive (spanning the intended forecast period by at least a high single-digit multiple), the input data also needs to be complete, without any missing values. If there are missing values in the dataset, they should be filled with reasonable estimates – knowing that a naturally complete dataset is superior in the first place.

3.2 Understand the specifics of the exponential smoothing model

This section addresses and reiterates some of the mechanics of exponential smoothing models that the forecaster should be aware of in order to see the limitations of such a forecast.

Exponential smoothing models are built off of historical data points in reverse chronological order. This means the further you go back in time with your underlying data, the less this “information” is factored into the model.

In other words, there is by definition a certain recency bias in your model, by which more recent observations are factored in more heavily than less recent observations. This is particularly true for models with high values for the smoothing parameter(s). When forecasting one time period into the future with \(\alpha = 0.8\), the most recent observation of the level is factored in with a weight of 0.8 and the second most recent observation with a weight of 0.16, in accordance with \(Weight = \alpha*(1-\alpha)^t\). This means 96% of that forecasted level value is based on the last two observations alone. For \(\alpha = 0.2\), the most recent observation of the level is factored in with a weight of 0.2 and the second most recent observation also with a weight of 0.16, which means that only 36% of that forecasted level value is based on the last two observations. Note that the second most recent observation is weighted with 0.16 for both \(\alpha = 0.8\) and \(\alpha = 0.2\), which demonstrates how much faster the weights decay for the higher smoothing parameter, as illustrated in image 19.
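A minimal sketch that computes these decaying weights for both smoothing parameters:

```r
# Weights alpha * (1 - alpha)^t for t = 0 (most recent observation) to t = 5
t <- 0:5
round(0.8 * (1 - 0.8)^t, 4)  # 0.8, 0.16, 0.032, ... (fast decay)
round(0.2 * (1 - 0.2)^t, 4)  # 0.2, 0.16, 0.128, ... (slower decay)
```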

Img. 19: Exponential decay

Hence, momentum in a time series that is not recent may not be appropriately reflected in the model, all the more so with higher smoothing parameters. An extreme example would be a time series of the stock market (S&P 500) from 2007 to 2017. With the financial crisis of 2007/2008, the time series shows very strong downward momentum in its first 2 years, while the following 8 years generally show upward momentum. A fitted exponential smoothing model would likely forecast a continuous upward trend from 2017 onward for years to come. This shows the inherent recency bias and the importance of also being aware of multi-year cycles that may apply to a time series. This also relates to the criterion for forecasting suitability mentioned in the introduction: “the forecasted metric is relatively stable and not subject to extreme fluctuations”.

3.3 Validate the forecasting model

In order to validate a forecast and to select the most accurate forecasting model, we look at two criteria in this post. First, we test the forecasting models against “new” data looking for the minimum RMSE (root mean squared error), and second, we check that the residuals of the forecasting models (the differences between the observations and the corresponding fitted values) have a mean of zero and are approximately normally distributed. For this section, we will compare the Holt-Winters exponential smoothing model (section 2.3) with the automated exponential smoothing model (section 2.4).

In the previous sections of chapter 2, we used the entire “Organic_Traffic” data from January 2014 to June 2018 to fit a model and plotted its forecast for 10 months. In other words, we used all available data to create a forecast for unknown periods. In this section, we fit a model to a subset of the “Organic_Traffic” data (called the training dataset), create a forecast for the remaining “Organic_Traffic” period (called the test dataset) and calculate the forecasting error by comparing the original observations with the corresponding forecasted values.

The following R code first divides the “Organic_Traffic” data into a training dataset (January 2014 to March 2018) and a test dataset (April 2018 to June 2018). It is assumed we are interested in creating a short-term, three-month forecast (mentioned as one criterion for suitable forecasts in the introduction) and hence choose a matching three-month time window for the test set. Then the Holt-Winters as well as the automated exponential smoothing model are both fitted to the training dataset. Finally, the RMSE forecasting error (more below) of the forecasts against the test dataset is printed.
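A minimal sketch of this snippet, assuming the forecast package and the monthly Organic_Traffic object starting in January 2014:

```r
library(forecast)

train <- window(Organic_Traffic, end = c(2018, 3))    # January 2014 to March 2018
test  <- window(Organic_Traffic, start = c(2018, 4))  # April 2018 to June 2018

fc_hw   <- hw(train, seasonal = "multiplicative", h = length(test))  # Holt-Winters
fc_auto <- forecast(train, h = length(test))                         # automated ets() model

accuracy(fc_hw, test)["Test set", "RMSE"]    # RMSE of the Holt-Winters forecast
accuracy(fc_auto, test)["Test set", "RMSE"]  # RMSE of the automated forecast
```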

Img. 20: RMSE for Holt-Winters & automated exponential smoothing model

The RMSE (root mean squared error) for the Holt-Winters forecast is 26404.53 and for the automated exponential smoothing forecast 22995.27. The mathematical formula for RMSE is as follows:

\(RMSE = \sqrt{\frac{1}{n}\Sigma_{i=1}^{n}{({f_i -o_i})^2}}\)

The calculation steps, sketched as an R function after this list, are:

  1. Subtract observation value \(o_i\) from corresponding forecasted value \(f_i\) to yield error values.
  2. Square each error value and sum them up to yield the sum of squared errors (SSE).
  3. Divide the sum of squared errors by the number of forecasted periods \(n\).
  4. Take the square root of the previous quotient to yield the RMSE.
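A minimal sketch of these four steps as an R function; the vectors f and o are assumed to hold the forecasted and observed values for the test period.

```r
rmse <- function(f, o) {
  errors <- f - o              # step 1: forecasted minus observed values
  sse <- sum(errors^2)         # step 2: sum of squared errors
  mse <- sse / length(errors)  # step 3: divide by the number of forecasted periods
  sqrt(mse)                    # step 4: square root yields the RMSE
}

# e.g. rmse(as.numeric(fc_hw$mean), as.numeric(test))
```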

The lower RMSE (for the automated exponential smoothing model) indicates a more accurate forecast.

At this point, we want to keep in mind that the RMSE values we looked at reflect the relationship between our models fitted to the January 2014 to March 2018 training dataset and the April 2018 to June 2018 test dataset. There is a potential risk of overfitting, whereby the automated model (with the lower RMSE) demonstrates higher accuracy than the Holt-Winters model for this particular training and test dataset combination but not for other combinations. This can be addressed with time series cross-validation, whereby various training datasets are tested against various test datasets. In this post, however, we proceed with our chosen training and test datasets and verify the Holt-Winters and automated exponential smoothing forecasting models further by analyzing the residuals in order to select the more suitable model.

We know at this point that, measured by RMSE, the automated exponential smoothing model is more accurate. In the following, we will further analyze the residuals. If the mean of the residuals does not equal zero, there is still “information” left in the residuals that could be used to improve the model. Plotting a histogram of the residuals shows their distribution and potentially reveals anomalies such as particular outliers or skewed tendencies. Given that the residual mean is near zero, it is not strictly necessary for the residuals to be approximately normally distributed in order to validate a forecasting model.

The following R code calculates the mean of the residuals for the Holt-Winters and automated exponential smoothing model and plots histograms to show the distribution of the residuals.
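A minimal sketch of this snippet, assuming the fitted models fc_hw and fc_auto from the previous code (the histogram titles are illustrative):

```r
mean(residuals(fc_hw))    # residual mean of the Holt-Winters model
mean(residuals(fc_auto))  # residual mean of the automated model

hist(residuals(fc_hw),   main = "Residuals: Holt-Winters model")  # as in image 22
hist(residuals(fc_auto), main = "Residuals: automated model")     # as in image 23
```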

Img. 21: Residual means for both models

The residual mean for Holt-Winters is 639.4847 and for the automated exponential smoothing forecast 0.0013154197. This suggests that the automated exponential smoothing model has utilized more of the “information” in the training data for its fitted model than the Holt-Winters method, which thereby qualifies it as the more suitable forecasting model. By plotting a histogram of the residuals, we can inspect the residual distribution and anomalies for the Holt-Winters method.

Img. 22: Histogram of residuals from Holt-Winters model

In the above histogram, we see that the peak of residuals with the most common residual values falls in the range of -20,000 to -10,000 giving the distribution a non-central mode. Also, the distribution is skewed to the right with most residual values being positive, which is also reflected in the positive residual mean. Overall, this histogram provides more color around the non-zero mean of residuals for the Holt-Winters model. Now, we look at the distribution of residuals for the automated exponential smoothing model, which has a near-zero mean.

Img. 23: Histogram of residuals from automated model

For the automated model, with its near-zero residual mean, the spread of the residuals is much narrower. Also, the residual distribution is not particularly skewed but balanced between negative and positive values. This histogram of residuals underlines the integrity of the automated exponential smoothing model and validates it as the more suitable forecasting model.

If you would like to see the whole R code shared in snippets throughout this post, you can do so here.

4. Conclusion

As alluded to in chapter 1, time series data such as web sessions from organic traffic is very common in the real world. With the statistical software R, a time series can be decomposed into its components (error, trend, seasonality and level) in order to, for instance, deseasonalize website traffic and understand whether traffic really decreased in a given month beyond seasonal fluctuations.
Chapter 2 introduces different exponential smoothing models. These models differ from each other in the components of the time series that are modeled – from simple to more complex. The characteristics of each exponential smoothing model are shown as part of the forecast for the “Organic_Traffic” dataset and plotted for 10 future time periods.
Chapter 3 provides practical considerations around using suitable input data for the forecast and the mechanics of exponential smoothing. As part of the forecast validation, the “Organic_Traffic” data is divided into a training and a test dataset, and both a Holt-Winters and an automated exponential smoothing model are fitted to the training dataset. These models are then compared by the RMSE (root mean squared error) of their forecasts against the test dataset. Further validation is provided by analyzing the mean and distribution of the residuals for both models, with the automated exponential smoothing model qualifying as the more suitable forecasting model.
