
2_ARIMA and GARCH Models

ChannelCMT edited this page Jun 25, 2019 · 2 revisions

ARIMA and GARCH Models

Contents

  1. What is the AR model?
  2. What is the MA model?
  3. ARIMA model example
  4. What is ARCH?
  5. GARCH model example

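What is the AR model?

The AR(p) model tries to capture (explain) momentum and mean-reversion effects.

AR stands for autoregressive.

Autoregression is easy to understand: the series is regressed on its own past values. The simplest case, AR(1), regresses the series on its value at the previous time step, $x_t = \beta x_{t-1} + \epsilon_t$, where $\epsilon_t$ is the error term.

Below we simulate AR(1) series with beta set to 0.5, -0.5, 0.9 and -0.9.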

import numpy as np
import matplotlib.pyplot as plt
e = np.random.normal(0, 1, 100)
beta = [0.5, -0.5, 0.9, -0.9]
for n, j in enumerate(beta):
    ar1=np.zeros(101)
    ax=plt.subplot(2, 2, n+1)
    for i in np.arange(100):
        ar1[i+1]=j*ar1[i]+e[i]
    ax.plot(ar1, label=j)
    plt.legend(loc='upper left')
plt.show()
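
"Regressing the series on its own previous value" can be made concrete by recovering beta from a simulated series. The sketch below is a minimal illustration in plain Python, not part of the original notebook; the helper names `simulate_ar1` and `estimate_beta` are hypothetical, and the estimate is the no-intercept least-squares slope of x[t] on x[t-1].

```python
import random

def simulate_ar1(beta, n, seed=0):
    """Simulate x[t] = beta * x[t-1] + e[t] with standard normal noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(beta * x[-1] + rng.gauss(0.0, 1.0))
    return x

def estimate_beta(x):
    """Least-squares slope (no intercept) of x[t] on x[t-1]."""
    num = sum(cur * prev for cur, prev in zip(x[1:], x[:-1]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den

# The estimates should land close to the betas used in the simulation
for beta in (0.5, -0.5, 0.9, -0.9):
    series = simulate_ar1(beta, 5000, seed=42)
    print(beta, round(estimate_beta(series), 3))
```

A positive beta carries the previous move forward (momentum), while a negative beta tends to reverse it, which is the mean-reversion reading mentioned above.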

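What is the MA model?

The MA(q) model tries to capture (explain) the effect of shocks that enter through the white-noise terms. These shocks can be thought of as unexpected events affecting the observed process, such as panic-driven sentiment.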

import numpy as np
import matplotlib.pyplot as plt

e = np.random.normal(0,1,100)
theta = [0.5, -0.5, 0.9, -0.9]
for n, j in enumerate(theta):
    ma1=np.zeros(101)
    ax=plt.subplot(2, 2, n+1)
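    # Start at i=1 so that e[i-1] is the previous shock; at i=0, e[-1] would wrap around to the last element
    for i in np.arange(1, 100):
        ma1[i+1] = j*e[i-1]+e[i]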
    ax.plot(ma1, label=j)
    plt.legend()
plt.show()
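
One way to see what an MA(1) shock does is to look at the autocorrelation: in theory it equals theta / (1 + theta^2) at lag 1 and is zero at every higher lag. The sketch below checks this on a simulated series; it is an illustration in plain Python with hypothetical helper names (`simulate_ma1`, `sample_acf`), not code from the original notebook.

```python
import random

def simulate_ma1(theta, n, seed=0):
    """Simulate x[t] = e[t] + theta * e[t-1] with standard normal shocks."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] + theta * e[t - 1] for t in range(1, n + 1)]

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    den = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / den

theta = 0.9
x = simulate_ma1(theta, 20000, seed=7)
print("theoretical lag-1 ACF:", round(theta / (1 + theta ** 2), 3))
print("sample lag-1 ACF:     ", round(sample_acf(x, 1), 3))
print("sample lag-2 ACF:     ", round(sample_acf(x, 2), 3))
```

The sharp cut-off after lag q is what the ACF plot is used for when choosing the MA order.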

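What is the ARIMA model?

The ARMA model simply combines the AR and MA models; ARIMA additionally differences the series first. Consider an ARMA(1,1) model: $x_t = \beta x_{t-1} + \epsilon_t + \theta \epsilon_{t-1}$.

ARIMA modelling steps:

  1. Test for stationarity; a non-stationary series must be differenced and detrended.
  2. Once the series is stationary, choose the model order.
  3. Evaluate the model's performance and analyse the residuals; if the fit is poor, rebuild the model.
  4. Generate forecasts and transform them back to the original series.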
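
Steps 1 and 4 are mirror images: differencing removes the level, and a cumulative sum from a known starting level restores it. The sketch below shows the round trip in plain Python. The function names are hypothetical, and the five closing prices are the ones printed in the data preview on this page.

```python
def difference(levels):
    """First difference: levels[t] - levels[t-1]."""
    return [cur - prev for prev, cur in zip(levels[:-1], levels[1:])]

def undifference(first_level, diffs):
    """Rebuild the levels from the starting level and the first differences."""
    levels = [first_level]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels

close = [3764.4, 3766.8, 3761.6, 3758.8, 3755.4]
diffs = difference(close)
rebuilt = undifference(close[0], diffs)

print([round(d, 1) for d in diffs])
print([round(v, 1) for v in rebuilt])
```

In a forecast, the differences come from the model rather than from the data, so each forecast level is the previous level plus the predicted change.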

# Import modules
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
import scipy.stats as scs
import matplotlib.pyplot as plt
%matplotlib inline
# Load the data
data = pd.read_excel('./HFData.xlsx').set_index('datetime')
print(data.close.head())
datetime
2018-06-01 09:30:00    3764.4
2018-06-01 09:35:00    3766.8
2018-06-01 09:40:00    3761.6
2018-06-01 09:45:00    3758.8
2018-06-01 09:50:00    3755.4
Name: close, dtype: float64
# Take the first difference and keep the last 500 observations
diffClose = data.close.diff().dropna()[-500:]

print(diffClose.head())
datetime
2019-02-14 13:20:00    1.0
2019-02-14 13:25:00    2.6
2019-02-14 13:30:00   -2.6
2019-02-14 13:35:00    6.8
2019-02-14 13:40:00   -2.8
Name: close, dtype: float64
# Inspect the ACF and PACF of the differenced data
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(diffClose, lags=10, ax=ax1)  # autocorrelation plot
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(diffClose, lags=10, ax=ax2)  # partial autocorrelation plot
plt.show()

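from statsmodels.tsa.stattools import adfuller
# Stationarity test
print('ADF test p-value: ', adfuller(diffClose)[1])
# After first differencing the series fluctuates fairly steadily around its mean and its autocorrelation is strongly short-range; the unit-root test p-value must be below 0.05
ADF test p-value:  0.0007177360623283547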

Order selection with AIC/BIC

import warnings
warnings.filterwarnings("ignore") 

order_a=sm.tsa.arma_order_select_ic(diffClose,max_ar=6,max_ma=4,ic='aic')['aic_min_order']  # AIC
order_b=sm.tsa.arma_order_select_ic(diffClose,max_ar=6,max_ma=4,ic='bic')['bic_min_order']  # BIC
print(order_a)
print (order_b)
(4, 3)
(0, 0)
xLen = 500
predictLag = 50
train=diffClose[-xLen : -predictLag]

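# Note: sm.tsa.ARMA was removed in statsmodels 0.13; on newer versions use
# statsmodels.tsa.arima.model.ARIMA(train, order=(4, 0, 3), trend='n').fit() instead
Model = sm.tsa.ARMA(train, order=(4,3))
arma = Model.fit(trend='nc', disp=-1)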
print(arma.summary())
                              ARMA Model Results                              
==============================================================================
Dep. Variable:                  close   No. Observations:                  450
Model:                     ARMA(4, 3)   Log Likelihood               -1603.887
Method:                       css-mle   S.D. of innovations              8.487
Date:                Mon, 24 Jun 2019   AIC                           3223.773
Time:                        13:33:22   BIC                           3256.647
Sample:                    02-14-2019   HQIC                          3236.730
                         - 02-27-2019                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
ar.L1.close    -1.4240      0.048    -29.398      0.000      -1.519      -1.329
ar.L2.close    -1.4644      0.069    -21.216      0.000      -1.600      -1.329
ar.L3.close    -0.9685      0.069    -14.116      0.000      -1.103      -0.834
ar.L4.close    -0.0207      0.048     -0.431      0.667      -0.115       0.074
ma.L1.close     1.4717      0.016     94.815      0.000       1.441       1.502
ma.L2.close     1.4728      0.036     41.367      0.000       1.403       1.543
ma.L3.close     0.9950      0.030     33.613      0.000       0.937       1.053
                                    Roots                                    
=============================================================================
                 Real           Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1           -0.2321           -0.9735j            1.0008           -0.2873
AR.2           -0.2321           +0.9735j            1.0008            0.2873
AR.3           -1.0658           -0.0000j            1.0658           -0.5000
AR.4          -45.2011           -0.0000j           45.2011           -0.5000
MA.1           -1.0040           -0.0000j            1.0040           -0.5000
MA.2           -0.2381           -0.9718j            1.0005           -0.2882
MA.3           -0.2381           +0.9718j            1.0005            0.2882
-----------------------------------------------------------------------------
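
The AIC and BIC in the summary above can be reproduced from the reported log likelihood, which makes it clearer why BIC favours smaller models: its penalty per parameter is ln(n) instead of 2. The sketch below assumes k = 8 estimated parameters (4 AR coefficients, 3 MA coefficients and the innovation variance); that count is inferred from the printed numbers, not taken from the original text.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n_obs) - 2 * log_likelihood

log_likelihood = -1603.887  # "Log Likelihood" in the summary
n_obs = 450                 # "No. Observations" in the summary
k = 8                       # assumption: 4 AR + 3 MA + innovation variance

print("AIC:", round(aic(log_likelihood, k), 3))
print("BIC:", round(bic(log_likelihood, k, n_obs), 3))
```

With 450 observations each extra parameter costs about 6.1 in BIC but only 2 in AIC, which is consistent with AIC selecting (4, 3) and BIC selecting (0, 0) here.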

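Computing the model score

The score is one minus the ratio of the fitting-error variance to the variance of the training data; the closer it is to 1, the better.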

delta = arma.fittedvalues - train
score = 1 - delta.var()/train.var()
print(score)
0.05854440432146335
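
To calibrate what a score of about 0.06 means, it helps to evaluate the same formula on two extreme cases: a perfect fit and a model that always predicts zero. The sketch below is a plain-Python restatement of the score with hypothetical function names; the sample values are the first differences printed earlier on this page.

```python
def variance(values):
    """Sample variance with the n - 1 denominator (the pandas .var() default)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def model_score(fitted, actual):
    """1 - Var(fitted - actual) / Var(actual)."""
    errors = [f - a for f, a in zip(fitted, actual)]
    return 1 - variance(errors) / variance(actual)

actual = [1.0, 2.6, -2.6, 6.8, -2.8]
perfect = list(actual)          # fitted values equal to the data
flat = [0.0] * len(actual)      # a model that always predicts zero

print("perfect fit:", model_score(perfect, actual))
print("always zero:", model_score(flat, actual))
```

A score near 0 therefore means the fitted values explain almost none of the variance in the differenced series.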

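Residual analysis

resid = arma.resid
# Inspect the ACF and PACF of the residuals to judge whether the ARMA model has captured all of the autocorrelation in the series
# If it has, the residuals should show no autocorrelation
# If the residuals are autocorrelated, the ARMA order is wrong or the ARMA model itself is inadequate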
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid, lags=5, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=5, ax=ax2)
plt.show()
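
Reading the ACF plot by eye can be complemented with a single number. A common choice is the Ljung-Box statistic Q = n(n+2) * sum(rho_k^2 / (n-k)), which is compared with a chi-square critical value. The sketch below is an illustration in plain Python on simulated series, not on the notebook's residuals; the helper names are hypothetical. When the test is applied to ARMA residuals, the degrees of freedom are usually reduced by the number of fitted AR and MA terms, which this sketch does not do.

```python
import random

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    den = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / den

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        sample_acf(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

CHI2_95_DF5 = 11.07  # 95% critical value of chi-square with 5 degrees of freedom

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

# A strongly autocorrelated AR(1) series built from the same shocks
ar_series = [0.0]
for e in noise:
    ar_series.append(0.9 * ar_series[-1] + e)

print("white noise Q:", round(ljung_box_q(noise, 5), 2))
print("AR(1) Q:      ", round(ljung_box_q(ar_series, 5), 2))
print("critical value:", CHI2_95_DF5)
```

A Q value above the critical value suggests that autocorrelation is still present in the series being tested.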

Forecasting with ARIMA

Y = diffClose[-predictLag:]
predict = arma.predict(start=(len(diffClose[-xLen:])-predictLag-1),end=len(diffClose[-xLen:]),dynamic=True)[-predictLag:]

predict.index = Y.index
chart = pd.concat([predict,Y], axis=1, keys=['predict', 'actual'])
print (chart)
                      predict  actual
datetime                             
2019-02-27 14:50:00  0.575431    17.6
2019-02-27 14:55:00  3.145757    -0.4
2019-02-28 09:30:00 -4.403016     9.2
2019-02-28 09:35:00  1.124328   -22.4
2019-02-28 09:40:00  1.788006     9.0
2019-02-28 09:45:00  0.006694     4.6
2019-02-28 09:50:00 -3.625513     3.8
2019-02-28 09:55:00  3.397799     6.2
2019-02-28 10:00:00  0.427204   -10.2
2019-02-28 10:05:00 -2.072739     7.0
2019-02-28 10:10:00 -0.889752     7.6
2019-02-28 10:15:00  3.818065    -0.6
2019-02-28 10:20:00 -2.135249    -6.0
2019-02-28 10:25:00 -1.645845     8.8
2019-02-28 10:30:00  1.791016   -15.6
2019-02-28 10:35:00  1.848672     7.8
2019-02-28 10:40:00 -3.616883     4.0
2019-02-28 10:45:00  0.742666     1.8
2019-02-28 10:50:00  2.411343    -1.6
2019-02-28 10:55:00 -1.056522    -9.4
2019-02-28 11:00:00 -2.670973     3.0
2019-02-28 11:05:00  2.999706    -9.6
2019-02-28 11:10:00  0.613095     3.4
2019-02-28 11:15:00 -2.656938   -13.0
2019-02-28 11:20:00  0.035691    10.6
2019-02-28 11:25:00  3.183961     2.0
2019-02-28 13:00:00 -2.025545    -2.2
2019-02-28 13:05:00 -1.757695     5.6
2019-02-28 13:10:00  2.384596     1.8
2019-02-28 13:15:00  1.074113     8.4
2019-02-28 13:20:00 -3.277104    -2.8
2019-02-28 13:25:00  0.820501     2.6
2019-02-28 13:30:00  2.540816    -6.8
2019-02-28 13:35:00 -1.667895    -1.0
2019-02-28 13:40:00 -2.072420   -15.0
2019-02-28 13:45:00  2.915654     0.0
2019-02-28 13:50:00  0.445719     3.2
2019-02-28 13:55:00 -2.862554   -10.8
2019-02-28 14:00:00  0.642582    12.6
2019-02-28 14:05:00  2.784714    -3.2
2019-02-28 14:10:00 -2.143132     1.2
2019-02-28 14:15:00 -1.589129    -6.6
2019-02-28 14:20:00  2.690855     0.6
2019-02-28 14:25:00  0.513327     5.4
2019-02-28 14:30:00 -3.087861     3.2
2019-02-28 14:35:00  1.072115    -2.4
2019-02-28 14:40:00  2.442187    -8.2
2019-02-28 14:45:00 -2.067569     9.4
2019-02-28 14:50:00 -1.606482    -1.0
2019-02-28 14:55:00  2.927753    -2.2
# Compare the predicted values with the actual values
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15,8))
ax.plot(chart.predict.values, label='predict')
ax.plot(chart.actual.values, label='actual')
ax.legend(loc='upper left')
plt.show()

# Compare the fitted values with the training data
fig, ax = plt.subplots(figsize=(15,8))
plt.plot(arma.fittedvalues.values)
plt.plot(train[0:].values)
plt.show()

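The ARMA fit shows pronounced volatility clustering, so a heteroskedastic model should be considered.

Forecasting prices with the ARIMA model

# Undo the differencing: previous close plus the predicted change
transformDiff = data.close.shift(1)+chart.predict
dataPredict = pd.DataFrame({'transformDiff': transformDiff, 'close': data.close.iloc[-xLen:]})
dataPredictPlot = dataPredict.reset_index()
# Plot the predicted price levels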
fig, ax = plt.subplots(figsize=(15,8))
plt.plot(dataPredictPlot['close'])
plt.plot(dataPredictPlot['transformDiff'])
plt.show()

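Exercise

Use the ARIMA model to forecast returns over other time periods.

What is the ARCH model?

The basic idea of the ARCH model is that, conditional on the past information set, the noise at a given time follows a normal distribution with zero mean and a time-varying variance (conditional heteroskedasticity). This time-varying variance is a linear combination of a finite number of past squared noise terms (the autoregressive part). Together these give the autoregressive conditional heteroskedasticity model.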
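
The definition above can be simulated directly. In the simplest case, ARCH(1), the conditional variance is sigma2[t] = omega + alpha * eps[t-1]^2. The sketch below is a plain-Python illustration with hypothetical names and parameter values (omega = 1.0, alpha = 0.3); it shows the typical signature of ARCH effects: the noise itself is nearly uncorrelated, while its squares are clearly autocorrelated.

```python
import math
import random

def simulate_arch1(omega, alpha, n, seed=0):
    """Simulate eps[t] = sqrt(omega + alpha * eps[t-1]^2) * z[t], z ~ N(0, 1)."""
    rng = random.Random(seed)
    eps = []
    prev_sq = omega / (1 - alpha)  # start at the unconditional variance
    for _ in range(n):
        sigma2 = omega + alpha * prev_sq
        e = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        eps.append(e)
        prev_sq = e * e
    return eps

def lag1_autocorr(x):
    """Sample autocorrelation of x at lag 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)

omega, alpha = 1.0, 0.3  # hypothetical parameters
eps = simulate_arch1(omega, alpha, 20000, seed=3)
squared = [e * e for e in eps]

print("sample variance:       ", round(sum(squared) / len(squared), 3))
print("unconditional variance:", round(omega / (1 - alpha), 3))
print("lag-1 ACF of eps:      ", round(lag1_autocorr(eps), 3))
print("lag-1 ACF of eps^2:    ", round(lag1_autocorr(squared), 3))
```

This is why the next step inspects the ACF and PACF of the squared residuals rather than of the residuals themselves.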

# Inspect the autocorrelation of the squared residuals
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid**2, lags=10, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid**2, lags=10, ax=ax2)
plt.show()

Building an AR(4)+GARCH(1, 1) model

from arch import arch_model
garchModle = arch_model(diffClose,mean='ARX',lags=4, vol='Garch',p=1,q=1,dist='t')
res = garchModle.fit(update_freq=5)
print(res.summary())
Iteration:      5,   Func. Count:     64,   Neg. LLF: 1708.047543829128
Iteration:     10,   Func. Count:    126,   Neg. LLF: 1702.7651460135385
Iteration:     15,   Func. Count:    186,   Neg. LLF: 1700.6819142853956
Iteration:     20,   Func. Count:    242,   Neg. LLF: 1700.4243537149878
Optimization terminated successfully.    (Exit mode 0)
            Current function value: 1700.42415845727
            Iterations: 22
            Function evaluations: 264
            Gradient evaluations: 22
                              AR - GARCH Model Results                              
====================================================================================
Dep. Variable:                        close   R-squared:                      -0.002
Mean Model:                              AR   Adj. R-squared:                 -0.011
Vol Model:                            GARCH   Log-Likelihood:               -1700.42
Distribution:      Standardized Student's t   AIC:                           3418.85
Method:                  Maximum Likelihood   BIC:                           3456.71
                                              No. Observations:                  496
Date:                      Mon, Jun 24 2019   Df Residuals:                      487
Time:                              13:39:20   Df Model:                            9
                                  Mean Model                                  
==============================================================================
                  coef    std err          t      P>|t|       95.0% Conf. Int.
------------------------------------------------------------------------------
Const           0.2086      0.316      0.660      0.509      [ -0.411,  0.828]
close[1]   -7.9658e-03  4.582e-02     -0.174      0.862 [-9.777e-02,8.184e-02]
close[2]       -0.0144  4.297e-02     -0.334      0.738 [-9.858e-02,6.985e-02]
close[3]   -7.0037e-03  4.671e-02     -0.150      0.881 [-9.856e-02,8.455e-02]
close[4]   -5.0018e-03  4.573e-02     -0.109      0.913 [-9.463e-02,8.463e-02]
                              Volatility Model                             
===========================================================================
                 coef    std err          t      P>|t|     95.0% Conf. Int.
---------------------------------------------------------------------------
omega          0.7628      0.818      0.932      0.351    [ -0.840,  2.366]
alpha[1]       0.0561  4.240e-02      1.324      0.186 [-2.698e-02,  0.139]
beta[1]        0.9356  4.802e-02     19.484  1.491e-84    [  0.842,  1.030]
                              Distribution                              
========================================================================
                 coef    std err          t      P>|t|  95.0% Conf. Int.
------------------------------------------------------------------------
nu             5.1912      1.480      3.508  4.513e-04 [  2.291,  8.091]
========================================================================

Covariance estimator: robust
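
The volatility estimates in the summary above define a simple recursion, sigma2[t+1] = omega + alpha * resid[t]^2 + beta * sigma2[t], from which the persistence and the long-run variance follow. The sketch below plugs in the printed point estimates; the function names and the example inputs (a residual of 10 and a variance of 50) are hypothetical. Note that omega and alpha are not statistically significant in the summary, so these numbers are indicative only.

```python
omega, alpha, beta = 0.7628, 0.0561, 0.9356  # point estimates from the summary

def next_variance(prev_resid, prev_variance):
    """One-step-ahead conditional variance of a GARCH(1,1) model."""
    return omega + alpha * prev_resid ** 2 + beta * prev_variance

def multi_step_variance(one_step, horizon):
    """Forecast path: beyond one step, resid^2 is replaced by its expected
    value, which is the previous variance forecast."""
    path = [one_step]
    for _ in range(horizon - 1):
        path.append(omega + (alpha + beta) * path[-1])
    return path

persistence = alpha + beta
long_run = omega / (1 - persistence)

print("persistence:      ", round(persistence, 4))
print("long-run variance:", round(long_run, 2))
print("one step ahead:   ", round(next_variance(10.0, 50.0), 2))
print("after 200 steps:  ", round(multi_step_variance(50.0, 200)[-1], 2))
```

A persistence close to 1 means volatility shocks die out slowly, and the forecast path drifts towards the long-run variance as the horizon grows.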
forecasts = res.forecast()
garchResid = res.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf((garchResid.dropna()), lags=10, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf((garchResid.dropna()), lags=10, ax=ax2)
plt.show()

fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf((garchResid.dropna()**2), lags=10, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf((garchResid.dropna()**2), lags=10, ax=ax2)
plt.show()

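The squared residuals of this model still show volatility that it has not explained.

Rolling-forecast strategy ideas

  1. Forecast the mean: hold a long position when it is above 0 and close the position when it is below 0.
  2. Use the conditional variance: when it is too large, reduce position size and widen the stop-loss and take-profit levels.
  3. Use the residual volatility to forecast abnormal volatility.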
index = diffClose.index
nLen = 50
start_loc = 0
end_loc = np.where(index >= index[-nLen])[0].min()
forecastsDict = {'forecastMean' : [], 'forecastVariance' : [], 'forecastDateTime': []}


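for i in range(nLen):
    # Refit the model on a rolling window that advances one bar per iteration
    res = garchModle.fit(first_obs=i, last_obs=i+end_loc, disp='off')
    # Forecast once and reuse the result for both the mean and the variance
    fcast = res.forecast(horizon=1)
    varPredict = fcast.variance
    meanPredict = fcast.mean
    # Timestamp of the last observation in the window, from which the forecast is made
    fcastTime = varPredict.iloc[i+end_loc-1].name
    forecastsDict['forecastDateTime'].append(fcastTime)
    forecastsDict['forecastVariance'].append(varPredict.loc[fcastTime].values[0])
    forecastsDict['forecastMean'].append(meanPredict.loc[fcastTime].values[0])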

forecastsDf = pd.DataFrame(forecastsDict).set_index('forecastDateTime')
print(forecastsDf)
                     forecastMean  forecastVariance
forecastDateTime                                   
2019-02-27 14:45:00      0.375441         97.404606
2019-02-27 14:50:00      0.626383        109.917894
2019-02-27 14:55:00     -0.275001        103.516831
2019-02-28 09:30:00      0.734192        102.582255
2019-02-28 09:35:00     -0.896586        129.534316
2019-02-28 09:40:00      1.004591        127.730149
2019-02-28 09:45:00      0.082879        120.325856
2019-02-28 09:50:00      0.185926        113.517117
2019-02-28 09:55:00      0.213696        108.921339
2019-02-28 10:00:00     -0.280423        108.655401
2019-02-28 10:05:00      0.525741        105.046201
2019-02-28 10:10:00      0.143881        103.008271
2019-02-28 10:15:00      0.031996         96.512156
2019-02-28 10:20:00      0.078959         92.513342
2019-02-28 10:25:00      0.655022         92.481375
2019-02-28 10:30:00     -0.422346        104.298276
2019-02-28 10:35:00      0.669809        101.232775
2019-02-28 10:40:00      0.406357         96.338026
2019-02-28 10:45:00     -0.027869         89.654771
2019-02-28 10:50:00      0.229978         83.373531
2019-02-28 10:55:00      0.126718         84.211979
2019-02-28 11:00:00      0.534856         78.646795
2019-02-28 11:05:00      0.229282         80.620851
2019-02-28 11:10:00      0.539062         75.762255
2019-02-28 11:15:00      0.320032         83.792501
2019-02-28 11:20:00      0.580332         85.669273
2019-02-28 11:25:00      0.475009         79.820854
2019-02-28 13:00:00      0.317949         74.713723
2019-02-28 13:05:00      0.308057         71.176361
2019-02-28 13:10:00      0.353087         66.110586
2019-02-28 13:15:00      0.291773         66.438652
2019-02-28 13:20:00      0.268873         62.159386
2019-02-28 13:25:00      0.350848         57.583946
2019-02-28 13:30:00      0.530538         57.325015
2019-02-28 13:35:00      0.578680         52.870445
2019-02-28 13:40:00      0.796254         69.226748
2019-02-28 13:45:00      0.755668         64.710037
2019-02-28 13:50:00      0.821036         60.680783
2019-02-28 13:55:00      0.808116         66.831402
2019-02-28 14:00:00      0.358170         73.156217
2019-02-28 14:05:00      0.547716         69.072733
2019-02-28 14:10:00      0.561676         64.494573
2019-02-28 14:15:00      0.601228         64.072003
2019-02-28 14:20:00      0.607006         59.436437
2019-02-28 14:25:00      0.536895         57.205551
2019-02-28 14:30:00      0.626306         53.682997
2019-02-28 14:35:00      0.573439         50.958985
2019-02-28 14:40:00      0.695452         53.773563
2019-02-28 14:45:00      0.404629         56.167642
2019-02-28 14:50:00      0.791904         52.277182
# Take the square root of the forecast variance to get the forecast volatility (standard deviation)
forecastsDf['forecastVolatility'] = np.sqrt(forecastsDf['forecastVariance'])
volPredict = pd.DataFrame({
                           'close': data.close.values[-500:],
                           'residual': res.resid,
                           'forecastVolatility': forecastsDf['forecastVolatility'],
                           'conditional_volatility': res.conditional_volatility
                          })
volChart = volPredict.reset_index()[-300:]
plt.figure(figsize=(12,8))
plt.subplot(311)
plt.plot(volChart['close'], color='b', label='close')
plt.legend()
plt.subplot(312)
plt.plot(volChart['residual'], color='y', label='residual')
plt.plot(volChart['forecastVolatility'], color='r', label='forecastVolatility')
plt.legend()
plt.subplot(313)
plt.plot(volChart['conditional_volatility'], color='g', label='conditional_volatility')
plt.legend()
plt.show()

# Hold a long position when the model's forecast mean is above 0
meanValues = forecastsDf['forecastMean'].values
plt.figure(figsize=(12,8))
plt.subplot(211)
plt.plot(data.close.iloc[-nLen:].values)
plt.subplot(212)
plt.plot(meanValues, color='b')
plt.hlines(0, 0, nLen, linestyles='dashed')
plt.show()
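
The first two strategy ideas (go long when the forecast mean is positive, and scale down when the forecast variance is high) can be combined into one position rule. The sketch below is only one possible reading of those ideas, not the author's strategy: the volatility target of 8.0 and the cap of 1.0 are hypothetical, and the inputs are the first five rows of the rolling forecast table above.

```python
import math

# (forecastMean, forecastVariance) from the first rows of the forecast table
forecasts = [
    (0.375441, 97.404606),
    (0.626383, 109.917894),
    (-0.275001, 103.516831),
    (0.734192, 102.582255),
    (-0.896586, 129.534316),
]

TARGET_VOL = 8.0  # hypothetical volatility target, in price points per bar

def position(mean, variance, target_vol=TARGET_VOL):
    """Flat when the forecast mean is not positive; otherwise long, with the
    size scaled by target_vol / forecast volatility and capped at 1.0."""
    if mean <= 0:
        return 0.0
    return min(1.0, target_vol / math.sqrt(variance))

positions = [round(position(m, v), 3) for m, v in forecasts]
print(positions)
```

Under this rule, a higher forecast variance shrinks the long position, and a non-positive forecast mean closes it.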

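Ideas for improvement

  1. Forecast the spread between two highly correlated instruments.
  2. Use the conditional heteroskedasticity to adjust the weights and position sizes of different strategy types.
  3. Use the conditional heteroskedasticity directly as a stock-selection factor.
  4. For low-volatility strategies, filter trades on the forecast to guard against abnormally large residuals.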