2. ARIMA and GARCH Models
- What is an AR model?
- What is an MA model?
- An ARIMA model example
- What is ARCH?
- A GARCH model example
The AR(p) model tries to capture (explain) momentum and mean-reversion effects.
AR stands for autoregression: the series is regressed on its own past values.
The idea is easy to grasp. The simplest case, AR(1), regresses the series on its value at the previous time step: $X_t = \beta X_{t-1} + \epsilon_t$, where $\epsilon_t$ is the error term.
Below we simulate AR(1) series with beta set to 0.5, -0.5, 0.9 and -0.9.
import numpy as np
import matplotlib.pyplot as plt
e = np.random.normal(0, 1, 100)
beta = [0.5, -0.5, 0.9, -0.9]
# simulate four AR(1) paths with different beta, sharing one noise sequence
for n, j in enumerate(beta):
    ar1 = np.zeros(101)
    ax = plt.subplot(2, 2, n+1)
    for i in np.arange(100):
        ar1[i+1] = j*ar1[i] + e[i]
    ax.plot(ar1, label=j)
    plt.legend(loc='upper left')
plt.show()
The MA(q) model tries to capture (explain) the effect of shocks in the white-noise terms. These shocks can be thought of as unexpected events that affect the observed process, such as panic sentiment. The simplest case, MA(1), takes the form $X_t = \epsilon_t + \theta \epsilon_{t-1}$. Below we simulate MA(1) series with theta set to 0.5, -0.5, 0.9 and -0.9.
import numpy as np
import matplotlib.pyplot as plt
e = np.random.normal(0, 1, 100)
theta = [0.5, -0.5, 0.9, -0.9]
# simulate four MA(1) paths with different theta, sharing one noise sequence
for n, j in enumerate(theta):
    ma1 = np.zeros(101)
    ax = plt.subplot(2, 2, n+1)
    for i in np.arange(1, 100):
        ma1[i+1] = j*e[i-1] + e[i]
    ax.plot(ma1, label=j)
    plt.legend()
plt.show()
An ARMA model simply combines the AR and MA parts; for example, an ARMA(1,1) model is $X_t = \beta X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}$ (a short simulation sketch follows below).
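A minimal ARMA(1,1) simulation sketch in the same style as the AR(1) and MA(1) examples above; the beta and theta values here are illustrative choices, not taken from the original text.
import numpy as np
import matplotlib.pyplot as plt
e = np.random.normal(0, 1, 100)
beta, theta = 0.5, 0.5  # illustrative ARMA(1,1) coefficients
arma11 = np.zeros(101)
for i in np.arange(1, 100):
    arma11[i+1] = beta*arma11[i] + e[i] + theta*e[i-1]
plt.plot(arma11, label='ARMA(1,1)')
plt.legend(loc='upper left')
plt.show()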
ARIMA modeling steps:
1. Test for stationarity; if the series is non-stationary, difference and detrend it.
2. Once the series is stationary, select the model order.
3. Evaluate model performance and analyze the residuals; if the fit is poor, re-specify the model.
4. Simulate/forecast the stationary data and transform the result back into forecasts of the original series.
The worked example below walks through these steps.
# Import modules
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
import scipy.stats as scs
import matplotlib.pyplot as plt
%matplotlib inline
# Load the data
data = pd.read_excel('./HFData.xlsx').set_index('datetime')
print(data.close.head())
datetime
2018-06-01 09:30:00 3764.4
2018-06-01 09:35:00 3766.8
2018-06-01 09:40:00 3761.6
2018-06-01 09:45:00 3758.8
2018-06-01 09:50:00 3755.4
Name: close, dtype: float64
# Take the first difference and keep the last 500 observations
diffClose = data.close.diff().dropna()[-500:]
print(diffClose.head())
datetime
2019-02-14 13:20:00 1.0
2019-02-14 13:25:00 2.6
2019-02-14 13:30:00 -2.6
2019-02-14 13:35:00 6.8
2019-02-14 13:40:00 -2.8
Name: close, dtype: float64
# Inspect the ACF and PACF of the differenced series
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(diffClose, lags=10, ax=ax1)  # autocorrelation plot
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(diffClose, lags=10, ax=ax2)  # partial autocorrelation plot
plt.show()
from statsmodels.tsa.stattools import adfuller
# Stationarity (unit-root) test
print('ADF test p-value: ', adfuller(diffClose)[1])
# After first-differencing, the series fluctuates fairly steadily around its mean and shows strong short-term autocorrelation; the unit-root test p-value should be below 0.05
ADF test p-value:  0.0007177360623283547
import warnings
warnings.filterwarnings("ignore")
order_a=sm.tsa.arma_order_select_ic(diffClose,max_ar=6,max_ma=4,ic='aic')['aic_min_order'] # AIC
order_b=sm.tsa.arma_order_select_ic(diffClose,max_ar=6,max_ma=4,ic='bic')['bic_min_order'] # BIC
print(order_a)
print(order_b)
(4, 3)
(0, 0)
AIC favors the richer order (4, 3), while the more parsimonious BIC selects (0, 0); the fit below proceeds with the AIC choice.
xLen = 500
predictLag = 50
train = diffClose[-xLen:-predictLag]
# Fit an ARMA(4, 3) without a constant term (trend='nc') on the training window
Model = sm.tsa.ARMA(train, order=(4, 3))
arma = Model.fit(trend='nc', disp=-1)
print(arma.summary())
ARMA Model Results
==============================================================================
Dep. Variable: close No. Observations: 450
Model: ARMA(4, 3) Log Likelihood -1603.887
Method: css-mle S.D. of innovations 8.487
Date: Mon, 24 Jun 2019 AIC 3223.773
Time: 13:33:22 BIC 3256.647
Sample: 02-14-2019 HQIC 3236.730
- 02-27-2019
===============================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------
ar.L1.close -1.4240 0.048 -29.398 0.000 -1.519 -1.329
ar.L2.close -1.4644 0.069 -21.216 0.000 -1.600 -1.329
ar.L3.close -0.9685 0.069 -14.116 0.000 -1.103 -0.834
ar.L4.close -0.0207 0.048 -0.431 0.667 -0.115 0.074
ma.L1.close 1.4717 0.016 94.815 0.000 1.441 1.502
ma.L2.close 1.4728 0.036 41.367 0.000 1.403 1.543
ma.L3.close 0.9950 0.030 33.613 0.000 0.937 1.053
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 -0.2321 -0.9735j 1.0008 -0.2873
AR.2 -0.2321 +0.9735j 1.0008 0.2873
AR.3 -1.0658 -0.0000j 1.0658 -0.5000
AR.4 -45.2011 -0.0000j 45.2011 -0.5000
MA.1 -1.0040 -0.0000j 1.0040 -0.5000
MA.2 -0.2381 -0.9718j 1.0005 -0.2882
MA.3 -0.2381 +0.9718j 1.0005 0.2882
-----------------------------------------------------------------------------
The model score below is defined as 1 minus the ratio of the residual variance to the training-data variance (similar to $R^2$); the closer it is to 1, the better.
delta = arma.fittedvalues - train
score = 1 - delta.var()/train.var()
print(score)
0.05854440432146335
Analyze the residuals
resid = arma.resid
# Inspect the ACF and PACF of the residuals to check whether the ARMA model has fully captured the autocorrelation in the series
# If it has, the residuals should show no autocorrelation
# If the residuals are autocorrelated, the ARMA order is wrong or the ARMA model itself is inadequate
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid, lags=5, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=5, ax=ax2)
plt.show()
Y = diffClose[-predictLag:]
predict = arma.predict(start=(len(diffClose[-xLen:])-predictLag-1),end=len(diffClose[-xLen:]),dynamic=True)[-predictLag:]
predict.index = Y.index
chart = pd.concat([predict,Y], axis=1, keys=['predict', 'actual'])
print (chart)
predict actual
datetime
2019-02-27 14:50:00 0.575431 17.6
2019-02-27 14:55:00 3.145757 -0.4
2019-02-28 09:30:00 -4.403016 9.2
2019-02-28 09:35:00 1.124328 -22.4
2019-02-28 09:40:00 1.788006 9.0
2019-02-28 09:45:00 0.006694 4.6
2019-02-28 09:50:00 -3.625513 3.8
2019-02-28 09:55:00 3.397799 6.2
2019-02-28 10:00:00 0.427204 -10.2
2019-02-28 10:05:00 -2.072739 7.0
2019-02-28 10:10:00 -0.889752 7.6
2019-02-28 10:15:00 3.818065 -0.6
2019-02-28 10:20:00 -2.135249 -6.0
2019-02-28 10:25:00 -1.645845 8.8
2019-02-28 10:30:00 1.791016 -15.6
2019-02-28 10:35:00 1.848672 7.8
2019-02-28 10:40:00 -3.616883 4.0
2019-02-28 10:45:00 0.742666 1.8
2019-02-28 10:50:00 2.411343 -1.6
2019-02-28 10:55:00 -1.056522 -9.4
2019-02-28 11:00:00 -2.670973 3.0
2019-02-28 11:05:00 2.999706 -9.6
2019-02-28 11:10:00 0.613095 3.4
2019-02-28 11:15:00 -2.656938 -13.0
2019-02-28 11:20:00 0.035691 10.6
2019-02-28 11:25:00 3.183961 2.0
2019-02-28 13:00:00 -2.025545 -2.2
2019-02-28 13:05:00 -1.757695 5.6
2019-02-28 13:10:00 2.384596 1.8
2019-02-28 13:15:00 1.074113 8.4
2019-02-28 13:20:00 -3.277104 -2.8
2019-02-28 13:25:00 0.820501 2.6
2019-02-28 13:30:00 2.540816 -6.8
2019-02-28 13:35:00 -1.667895 -1.0
2019-02-28 13:40:00 -2.072420 -15.0
2019-02-28 13:45:00 2.915654 0.0
2019-02-28 13:50:00 0.445719 3.2
2019-02-28 13:55:00 -2.862554 -10.8
2019-02-28 14:00:00 0.642582 12.6
2019-02-28 14:05:00 2.784714 -3.2
2019-02-28 14:10:00 -2.143132 1.2
2019-02-28 14:15:00 -1.589129 -6.6
2019-02-28 14:20:00 2.690855 0.6
2019-02-28 14:25:00 0.513327 5.4
2019-02-28 14:30:00 -3.087861 3.2
2019-02-28 14:35:00 1.072115 -2.4
2019-02-28 14:40:00 2.442187 -8.2
2019-02-28 14:45:00 -2.067569 9.4
2019-02-28 14:50:00 -1.606482 -1.0
2019-02-28 14:55:00 2.927753 -2.2
# Compare the predicted values with the actual values
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(15,8))
ax.plot(chart.predict.values, label='predict')
ax.plot(chart.actual.values, label='actual')
ax.legend(loc='upper left')
plt.show()
# Compare the fitted values with the training data
fig, ax = plt.subplots(figsize=(15,8))
plt.plot(arma.fittedvalues.values)
plt.plot(train[0:].values)
plt.show()
The ARMA residuals show pronounced volatility clustering, so a conditional-heteroskedasticity model should be considered.
# Transform the predicted differences back into price levels
transformDiff = data.close.shift(1)+chart.predict
dataPredict = pd.DataFrame({'transformDiff': transformDiff, 'close': data.close.iloc[-xLen:]})
dataPredictPlot = dataPredict.reset_index()
# Plot the actual close prices and the reconstructed predictions
fig, ax = plt.subplots(figsize=(15,8))
plt.plot(dataPredictPlot['close'])
plt.plot(dataPredictPlot['transformDiff'])
plt.show()
The same ARIMA workflow can be used to forecast returns for other time periods (a small sketch follows below).
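A minimal sketch, assuming the same data and the ARMA(4, 3) specification used above; the window boundaries are arbitrary illustrative choices, and in practice the order would be re-selected for each new window.
# Refit the same specification on an earlier window and forecast ahead (illustrative sketch)
diffAll = data.close.diff().dropna()
trainEarly = diffAll[-1500:-1000]  # an earlier window, chosen only for illustration
armaEarly = sm.tsa.ARMA(trainEarly, order=(4, 3)).fit(trend='nc', disp=-1)
print(armaEarly.forecast(steps=10)[0])  # point forecasts of the next 10 differences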
The basic idea of the ARCH model is that, conditional on the information available so far, the noise at a given time follows a normal distribution with mean zero and a variance that changes over time (conditional heteroskedasticity), and this time-varying variance is a linear combination of the squares of a finite number of past noise terms (the autoregressive part). Together these assumptions define the autoregressive conditional heteroskedasticity model.
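In standard notation (these are the textbook definitions, not formulas from the original page), the shock is written as $\epsilon_t = \sigma_t z_t$ with $z_t$ i.i.d. standard normal, and the conditional variance follows
$$\mathrm{ARCH}(q):\quad \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2$$
A GARCH(p, q) model adds lagged variances, so the GARCH(1,1) fitted below is
$$\sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$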
# Inspect the autocorrelation of the squared residuals (a sign of ARCH effects)
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid**2, lags=10, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid**2, lags=10, ax=ax2)
plt.show()
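Besides the visual ACF/PACF check, a formal ARCH-LM test can be run on the same residuals; a minimal sketch using statsmodels' het_arch, with the lag length left at its default:
from statsmodels.stats.diagnostic import het_arch
lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(resid)
print('ARCH-LM p-value:', lm_pvalue)  # a small p-value indicates remaining ARCH effects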
from arch import arch_model
# AR(4) mean model with GARCH(1,1) volatility and Student's t innovations
garchModel = arch_model(diffClose, mean='ARX', lags=4, vol='Garch', p=1, q=1, dist='t')
res = garchModel.fit(update_freq=5)
print(res.summary())
Iteration: 5, Func. Count: 64, Neg. LLF: 1708.047543829128
Iteration: 10, Func. Count: 126, Neg. LLF: 1702.7651460135385
Iteration: 15, Func. Count: 186, Neg. LLF: 1700.6819142853956
Iteration: 20, Func. Count: 242, Neg. LLF: 1700.4243537149878
Optimization terminated successfully. (Exit mode 0)
Current function value: 1700.42415845727
Iterations: 22
Function evaluations: 264
Gradient evaluations: 22
AR - GARCH Model Results
====================================================================================
Dep. Variable: close R-squared: -0.002
Mean Model: AR Adj. R-squared: -0.011
Vol Model: GARCH Log-Likelihood: -1700.42
Distribution: Standardized Student's t AIC: 3418.85
Method: Maximum Likelihood BIC: 3456.71
No. Observations: 496
Date: Mon, Jun 24 2019 Df Residuals: 487
Time: 13:39:20 Df Model: 9
Mean Model
==============================================================================
coef std err t P>|t| 95.0% Conf. Int.
------------------------------------------------------------------------------
Const 0.2086 0.316 0.660 0.509 [ -0.411, 0.828]
close[1] -7.9658e-03 4.582e-02 -0.174 0.862 [-9.777e-02,8.184e-02]
close[2] -0.0144 4.297e-02 -0.334 0.738 [-9.858e-02,6.985e-02]
close[3] -7.0037e-03 4.671e-02 -0.150 0.881 [-9.856e-02,8.455e-02]
close[4] -5.0018e-03 4.573e-02 -0.109 0.913 [-9.463e-02,8.463e-02]
Volatility Model
===========================================================================
coef std err t P>|t| 95.0% Conf. Int.
---------------------------------------------------------------------------
omega 0.7628 0.818 0.932 0.351 [ -0.840, 2.366]
alpha[1] 0.0561 4.240e-02 1.324 0.186 [-2.698e-02, 0.139]
beta[1] 0.9356 4.802e-02 19.484 1.491e-84 [ 0.842, 1.030]
Distribution
========================================================================
coef std err t P>|t| 95.0% Conf. Int.
------------------------------------------------------------------------
nu 5.1912 1.480 3.508 4.513e-04 [ 2.291, 8.091]
========================================================================
Covariance estimator: robust
forecasts = res.forecast()
garchResid = res.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf((garchResid.dropna()), lags=10, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf((garchResid.dropna()), lags=10, ax=ax2)
plt.show()
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf((garchResid.dropna()**2), lags=10, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf((garchResid.dropna()**2), lags=10, ax=ax2)
plt.show()
We can see that the residual variance still contains volatility the model has not explained.
- Forecast the mean: hold a long position when it is above 0 and close the position when it is below 0
- Use the conditional variance: when it is very large, reduce position size and widen stop-loss/take-profit levels
- Use the residual volatility to anticipate abnormal volatility
(A small signal sketch based on these ideas appears after the plots below.)
index = diffClose.index
nLen = 50
start_loc = 0
end_loc = np.where(index >= index[-nLen])[0].min()
forecastsDict = {'forecastMean': [], 'forecastVariance': [], 'forecastDateTime': []}
# Roll the estimation window forward one bar at a time and store the one-step-ahead forecasts
for i in range(nLen):
    res = garchModel.fit(first_obs=i, last_obs=i+end_loc, disp='off')
    varPredict = res.forecast(horizon=1).variance
    meanPredict = res.forecast(horizon=1).mean
    fcastTime = varPredict.iloc[i+end_loc-1].name
    forecastsDict['forecastDateTime'].append(fcastTime)
    forecastsDict['forecastVariance'].append(varPredict.loc[fcastTime].values[0])
    forecastsDict['forecastMean'].append(meanPredict.loc[fcastTime].values[0])
forecastsDf = pd.DataFrame(forecastsDict).set_index('forecastDateTime')
print(forecastsDf)
forecastMean forecastVariance
forecastDateTime
2019-02-27 14:45:00 0.375441 97.404606
2019-02-27 14:50:00 0.626383 109.917894
2019-02-27 14:55:00 -0.275001 103.516831
2019-02-28 09:30:00 0.734192 102.582255
2019-02-28 09:35:00 -0.896586 129.534316
2019-02-28 09:40:00 1.004591 127.730149
2019-02-28 09:45:00 0.082879 120.325856
2019-02-28 09:50:00 0.185926 113.517117
2019-02-28 09:55:00 0.213696 108.921339
2019-02-28 10:00:00 -0.280423 108.655401
2019-02-28 10:05:00 0.525741 105.046201
2019-02-28 10:10:00 0.143881 103.008271
2019-02-28 10:15:00 0.031996 96.512156
2019-02-28 10:20:00 0.078959 92.513342
2019-02-28 10:25:00 0.655022 92.481375
2019-02-28 10:30:00 -0.422346 104.298276
2019-02-28 10:35:00 0.669809 101.232775
2019-02-28 10:40:00 0.406357 96.338026
2019-02-28 10:45:00 -0.027869 89.654771
2019-02-28 10:50:00 0.229978 83.373531
2019-02-28 10:55:00 0.126718 84.211979
2019-02-28 11:00:00 0.534856 78.646795
2019-02-28 11:05:00 0.229282 80.620851
2019-02-28 11:10:00 0.539062 75.762255
2019-02-28 11:15:00 0.320032 83.792501
2019-02-28 11:20:00 0.580332 85.669273
2019-02-28 11:25:00 0.475009 79.820854
2019-02-28 13:00:00 0.317949 74.713723
2019-02-28 13:05:00 0.308057 71.176361
2019-02-28 13:10:00 0.353087 66.110586
2019-02-28 13:15:00 0.291773 66.438652
2019-02-28 13:20:00 0.268873 62.159386
2019-02-28 13:25:00 0.350848 57.583946
2019-02-28 13:30:00 0.530538 57.325015
2019-02-28 13:35:00 0.578680 52.870445
2019-02-28 13:40:00 0.796254 69.226748
2019-02-28 13:45:00 0.755668 64.710037
2019-02-28 13:50:00 0.821036 60.680783
2019-02-28 13:55:00 0.808116 66.831402
2019-02-28 14:00:00 0.358170 73.156217
2019-02-28 14:05:00 0.547716 69.072733
2019-02-28 14:10:00 0.561676 64.494573
2019-02-28 14:15:00 0.601228 64.072003
2019-02-28 14:20:00 0.607006 59.436437
2019-02-28 14:25:00 0.536895 57.205551
2019-02-28 14:30:00 0.626306 53.682997
2019-02-28 14:35:00 0.573439 50.958985
2019-02-28 14:40:00 0.695452 53.773563
2019-02-28 14:45:00 0.404629 56.167642
2019-02-28 14:50:00 0.791904 52.277182
# Convert the forecast variance into a forecast volatility (standard deviation)
forecastsDf['forecastVolatility'] = np.sqrt(forecastsDf['forecastVariance'])
volPredict = pd.DataFrame({
'close': data.close.values[-500:],
'residual': res.resid,
'forecastVolatility': forecastsDf['forecastVolatility'],
'conditional_volatility': res.conditional_volatility
})
volChart = volPredict.reset_index()[-300:]
plt.figure(figsize=(12,8))
plt.subplot(311)
plt.plot(volChart['close'], color='b', label='close')
plt.legend()
plt.subplot(312)
plt.plot(volChart['residual'], color='y', label='residual')
plt.plot(volChart['forecastVolatility'], color='r', label='forecastVolatility')
plt.legend()
plt.subplot(313)
plt.plot(volChart['conditional_volatility'], color='g', label='conditional_volatility')
plt.legend()
plt.show()
# Hold a long position when the model's forecast mean is above 0
meanValues = forecastsDf['forecastMean'].values
plt.figure(figsize=(12,8))
plt.subplot(211)
plt.plot(data.close.iloc[-nLen:].values)
plt.subplot(212)
plt.plot(meanValues, color='b')
plt.hlines(0, 0, nLen, linestyles='dashed')
plt.show()
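As a rough illustration of the ideas listed earlier (not part of the original workflow), the forecast mean can be turned into a long/flat signal and the forecast volatility used to shrink position size; the zero threshold and the scaling rule below are illustrative assumptions.
signal = (forecastsDf['forecastMean'] > 0).astype(int)  # 1 = long, 0 = flat
avgVol = forecastsDf['forecastVolatility'].mean()
position = signal * avgVol / forecastsDf['forecastVolatility']  # smaller size when forecast volatility is high
print(position.head())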
- The forecast target can be the spread between two highly correlated instruments
- Use the conditional variance to adjust the weights and position sizes of different strategy types
- Use the conditional variance directly as a stock-selection factor
- For strategies that profit from low volatility, use the forecasts as a filter to guard against abnormally large residuals