In [1]: import statsmodels.api as sm
In [2]: import numpy as np
In [3]: import pandas
In [4]: data = sm.datasets.sunspots.load()
Right now an annual date series must consist of datetimes that fall on the last day of each year (December 31).
In [5]: from datetime import datetime
In [6]: dates = sm.tsa.datetools.dates_from_range('1700', length=len(data.endog))
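To make the year-end requirement concrete, here is a minimal sketch, assuming you wanted to build the annual dates by hand. The helper name year_end_dates is hypothetical and is not part of statsmodels; it only illustrates the shape of the dates the model expects.

```python
from datetime import datetime


def year_end_dates(start_year, length):
    """Return `length` consecutive year-end datetimes starting at `start_year`.

    Hypothetical helper for illustration only; not part of statsmodels.
    """
    return [datetime(start_year + i, 12, 31) for i in range(length)]


dates_by_hand = year_end_dates(1700, 3)
print(dates_by_hand[0])   # first annual observation, stamped December 31
print(dates_by_hand[-1])  # last of the three, also December 31
```

Every entry falls on December 31, which is the convention the annual frequency relies on.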
Make a pandas Series or DataFrame indexed by these dates
In [7]: endog = pandas.Series(data.endog, index=dates)
and instantiate the model.
In [8]: ar_model = sm.tsa.AR(endog, freq='A')
In [9]: pandas_ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
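Let’s do some prediction that runs past the end of the sample. The sunspot data end in 2008, so the years 2009 through 2015 are out-of-sample forecasts.
In [10]: pred = pandas_ar_res.predict(start='2005', end='2015')
In [11]: print(pred)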
2005-12-31 20.003293
2006-12-31 24.703991
2007-12-31 20.026129
2008-12-31 23.473652
2009-12-31 30.858570
2010-12-31 61.335450
2011-12-31 87.024687
2012-12-31 91.321249
2013-12-31 79.921624
2014-12-31 60.799522
2015-12-31 40.374882
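For readers wondering how an autoregressive model produces values beyond the end of the sample, the following is a minimal sketch of recursive forecasting, assuming an AR(p) model with a constant, y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p}. The function name ar_forecast and the coefficient values are made up for illustration; this is not the statsmodels implementation, and the numbers are unrelated to the sunspot fit above.

```python
def ar_forecast(history, const, phis, steps):
    """Forecast `steps` values from an AR(p) model by feeding each
    prediction back in as if it were an observation.

    history : observed values, oldest first (needs at least len(phis) values)
    const   : intercept c
    phis    : lag coefficients [phi_1, ..., phi_p]
    """
    p = len(phis)
    if len(history) < p:
        raise ValueError("need at least p observations to start forecasting")
    values = list(history)
    for _ in range(steps):
        # The most recent value pairs with phi_1, the one before with phi_2, ...
        recent = values[-1:-p - 1:-1]
        values.append(const + sum(phi * y for phi, y in zip(phis, recent)))
    return values[len(history):]


# Illustrative AR(1) with made-up coefficients: each step halves the last value.
forecasts = ar_forecast([8.0], const=0.0, phis=[0.5], steps=3)
print(forecasts)
```

Because later forecasts are built on earlier forecasts rather than on observed data, they carry more uncertainty the further they reach past the sample.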
The same model can also be built from a plain array by passing the dates explicitly.
In [12]: ar_model = sm.tsa.AR(data.endog, dates=dates, freq='A')
In [13]: ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
In [14]: pred = ar_res.predict(start='2005', end='2015')
In [15]: print(pred)
[ 20.00329321 24.70399074 20.02612936 23.47365216 30.85857014
61.33545027 87.02468702 91.32124879 79.92162358 60.79952187
40.37488189]
This just returns a regular array, but since the model has date information attached, you can get the prediction dates in a roundabout way, through the private _data attribute of the results.
In [16]: print(ar_res._data.predict_dates)
[2005-12-31 00:00:00 2006-12-31 00:00:00 2007-12-31 00:00:00
2008-12-31 00:00:00 2009-12-31 00:00:00 2010-12-31 00:00:00
2011-12-31 00:00:00 2012-12-31 00:00:00 2013-12-31 00:00:00
2014-12-31 00:00:00 2015-12-31 00:00:00]
This attribute only exists if predict has been called. It holds the dates associated with the last call to predict.
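Since the array and the dates come back separately, one way to line them up is to pair them element by element. The sketch below uses the first three forecasts and dates from the output above, typed in by hand so that the block runs on its own; the names pred_values, pred_dates and dated_pred are illustrative. In a real session you would use pred and the predict_dates array instead.

```python
from datetime import datetime

# First three forecasts and their dates, copied from the output above.
pred_values = [20.00329321, 24.70399074, 20.02612936]
pred_dates = [
    datetime(2005, 12, 31),
    datetime(2006, 12, 31),
    datetime(2007, 12, 31),
]

# Pair each date with its forecast; both sequences must have the same length.
assert len(pred_values) == len(pred_dates)
dated_pred = dict(zip(pred_dates, pred_values))

for date, value in sorted(dated_pred.items()):
    print(date.date(), round(value, 2))
```

With pandas available, pandas.Series(pred_values, index=pred_dates) gives the same pairing as a date-indexed series.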