
Using dates with timeseries models

In [1]: import statsmodels.api as sm

In [2]: import numpy as np

In [3]: import pandas

Getting started

In [4]: data = sm.datasets.sunspots.load()

Currently, an annual date series must consist of datetimes stamped at the end of each year.

In [5]: from datetime import datetime

In [6]: dates = sm.tsa.datetools.dates_from_range('1700', length=len(data.endog))
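To show what such a year-end date series looks like, the sketch below builds one using only the standard library. `year_end_dates` is a hypothetical helper and is not part of statsmodels. Following the text above, it assumes that each annual observation is stamped December 31.

```python
from datetime import datetime

def year_end_dates(start_year, length):
    """Hypothetical helper: return `length` consecutive year-end
    datetimes (December 31), starting at `start_year`."""
    return [datetime(start_year + i, 12, 31) for i in range(length)]

dates = year_end_dates(1700, 5)
print(dates[0], dates[-1])  # 1700-12-31 00:00:00 1704-12-31 00:00:00
```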

Using Pandas

Make a pandas Series or DataFrame indexed by the dates

In [7]: endog = pandas.Series(data.endog, index=dates)

Then instantiate the model:

In [8]: ar_model = sm.tsa.AR(endog, freq='A')

In [9]: pandas_ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
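With maxlag=9 and no information criterion selecting the order, the fit is an AR(9). Assuming the default constant trend, the model being estimated is as follows, where c is the intercept and the φ_i are the lag coefficients:

```latex
y_t = c + \sum_{i=1}^{9} \phi_i \, y_{t-i} + \varepsilon_t
```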

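Let's predict 2005 through 2015. The sunspots sample runs from 1700 to 2008, so the first four values are in-sample predictions and the remaining seven are out-of-sample forecasts.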

In [10]: pred = pandas_ar_res.predict(start='2005', end='2015')

In [11]: print(pred)
2005-12-31    20.003279
2006-12-31    24.703990
2007-12-31    20.026125
2008-12-31    23.473652
2009-12-31    30.858577
2010-12-31    61.335458
2011-12-31    87.024706
2012-12-31    91.321273
2013-12-31    79.921651
2014-12-31    60.799541
2015-12-31    40.374888
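Past the end of the sample, an AR forecast is built recursively: each new prediction is fed back in as a lagged value for the next step. The sketch below shows that recursion for a generic AR(p) with a constant. It uses made-up toy parameters, not the sunspot estimates. It is not statsmodels' implementation and ignores how the model produces its in-sample values.

```python
def ar_forecast(history, const, coefs, steps):
    """Sketch of multi-step AR(p) forecasting with a constant.

    history: observed values, oldest first (at least len(coefs) of them)
    const:   intercept c
    coefs:   [phi_1, ..., phi_p]; phi_1 multiplies the most recent value
    steps:   number of periods to forecast past the end of history
    """
    p = len(coefs)
    values = list(history[-p:])  # only the last p observations matter
    forecasts = []
    for _ in range(steps):
        # c + phi_1 * y_{t-1} + phi_2 * y_{t-2} + ... + phi_p * y_{t-p}
        yhat = const + sum(phi * values[-i] for i, phi in enumerate(coefs, start=1))
        forecasts.append(yhat)
        values.append(yhat)  # feed the forecast back in as a lag
    return forecasts

# Toy AR(2) with illustrative parameters
print(ar_forecast([1.0, 2.0], const=0.5, coefs=[0.5, 0.25], steps=3))
# [1.75, 1.875, 1.875]
```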

Using explicit dates

In [12]: ar_model = sm.tsa.AR(data.endog, dates=dates, freq='A')

In [13]: ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)

In [14]: pred = ar_res.predict(start='2005', end='2015')

In [15]: print(pred)
[ 20.0032793   24.70399005  20.02612536  23.47365212  30.85857664
  61.33545847  87.02470608  91.32127258  79.92165091  60.79954052
  40.37488761]
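The start and end strings refer to dates in the model's annual index, and they ultimately pick out integer positions in the series. The sketch below makes that mapping explicit. `year_to_position` is a hypothetical helper. It assumes the sample starts in 1700 and has 309 annual observations (1700-2008), which matches the 11 values printed above.

```python
# Assumptions (illustrative only): annual observations stamped Dec 31,
# the first in 1700, and 309 observations in total (1700-2008).
FIRST_YEAR = 1700
NOBS = 309

def year_to_position(year_str):
    """Hypothetical helper: map a year string such as '2005' to its
    integer offset from the first observation."""
    return int(year_str) - FIRST_YEAR

start = year_to_position('2005')
end = year_to_position('2015')
positions = list(range(start, end + 1))              # end is inclusive
in_sample = [p for p in positions if p < NOBS]       # 2005-2008
out_of_sample = [p for p in positions if p >= NOBS]  # 2009-2015

print(start, end)                          # 305 315
print(len(in_sample), len(out_of_sample))  # 4 7
```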

This returns a plain NumPy array. Because the model has date information attached, however, you can still recover the prediction dates, albeit in a roundabout way:

In [16]: print(ar_res._data.predict_dates)
[2005-12-31 00:00:00 2006-12-31 00:00:00 2007-12-31 00:00:00
 2008-12-31 00:00:00 2009-12-31 00:00:00 2010-12-31 00:00:00
 2011-12-31 00:00:00 2012-12-31 00:00:00 2013-12-31 00:00:00
 2014-12-31 00:00:00 2015-12-31 00:00:00]

This attribute exists only after predict has been called, and it holds the dates associated with the most recent call to predict.
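To get a date-labeled result from the plain array, one option is to pair the values with those dates yourself. The sketch below does this in plain Python. `pred` and `predict_dates` are hand-built stand-ins that use the first three values printed above. They are not objects returned by statsmodels.

```python
from datetime import datetime

# Stand-ins for `pred` and `ar_res._data.predict_dates` from the session above
pred = [20.0032793, 24.70399005, 20.02612536]
predict_dates = [datetime(year, 12, 31) for year in range(2005, 2008)]

# Pair each prediction with its date, preserving order
labeled = list(zip(predict_dates, pred))
for date, value in labeled:
    print(date.date(), value)
```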
