如何用numpy获得指数加权移动平均线,就像在pandas中一样:
import pandas as pd
import pandas_datareader as pdr
from datetime import datetime
#declare variables
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
windowSize = 20
#get PANDAS exponential weighted moving average
ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().as_matrix()
print(ewm_pd)
使用numpy库如下面这样:
import numpy as np
import pandas_datareader as pdr
from datetime import datetime
# From this post : http://stackoverflow.com/a/40085052/3293881 by @Divakar
def strided_app(a, L, S): # Window len = L, Stride len/stepsize = S
nrows = ((a.size - L) // S) + 1
n = a.strides[0]
return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))
def numpyEWMA(price, windowSize):
weights = np.exp(np.linspace(-1., 0., windowSize))
weights /= weights.sum()
a2D = strided_app(price, windowSize, 1)
returnArray = np.empty((price.shape[0]))
returnArray.fill(np.nan)
for index in (range(a2D.shape[0])):
returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]
return np.reshape(returnArray, (-1, 1))
#declare variables
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
windowSize = 20
#get NUMPY exponential weighted moving average
ewma_np = numpyEWMA(ibm, windowSize)
print(ewma_np)
但结果与pandas库得出的结果不一致。
是否有更好的方法可以直接在numpy中计算指数加权移动平均值,并获得与pandas.ewm.mean()完全相同的结果?
在使用pandas处理60000个计算,我得到大约230秒。 我估计在numpy中,时间可以显著减少。
numpy_ewma函数:
def numpy_ewma(data, window):
returnArray = np.empty((data.shape[0]))
returnArray.fill(np.nan)
e = data[0]
alpha = 2 / float(window + 1)
for s in range(data.shape[0]):
e = ((data[s]-e) *alpha ) + e
returnArray[s] = e
return returnArray
自定义一个vectorized版本的numpy_ewma函数numpy_ewma_vectorized:
def numpy_ewma_vectorized(data, window):
alpha = 2 /(window + 1.0)
alpha_rev = 1-alpha
scale = 1/alpha_rev
n = data.shape[0]
r = np.arange(n)
scale_arr = scale**r
offset = data[0]*alpha_rev**(r+1)
pw0 = alpha*alpha_rev**(n-1)
mult = data*pw0*scale_arr
cumsums = mult.cumsum()
out = offset + cumsums*scale_arr[::-1]
return out
为了numpy_ewma_vectorized函数的重用性,自定义numpy_ewma_vectorized_v2函数:
def numpy_ewma_vectorized_v2(data, window):
alpha = 2 /(window + 1.0)
alpha_rev = 1-alpha
n = data.shape[0]
pows = alpha_rev**(np.arange(n+1))
scale_arr = 1/pows[:-1]
offset = data[0]*pows[1:]
pw0 = alpha*alpha_rev**(n-1)
mult = data*pw0*scale_arr
cumsums = mult.cumsum()
out = offset + cumsums*scale_arr[::-1]
return out
运行时间测试:
在相同循环函数中处理大数据集,结果如下:
In [97]: data = np.random.randint(2,9,(5000))
...: window = 20
...:
In [98]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized(data, window))
Out[98]: True
In [99]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized_v2(data, window))
Out[99]: True
In [100]: %timeit numpy_ewma(data, window)
100 loops, best of 3: 6.03 ms per loop
In [101]: %timeit numpy_ewma_vectorized(data, window)
1000 loops, best of 3: 665 µs per loop
In [102]: %timeit numpy_ewma_vectorized_v2(data, window)
1000 loops, best of 3: 357 µs per loop
In [103]: 6030/357.0
Out[103]: 16.89075630252101
速度大约提升了17倍。
DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)[source]Provides exponential weighted functions New in version 0.18.0.Exactly one of center of mass, span, half-life, and alpha must be provided. Allowed values and relationship betwe
加权移动平均法,是对观察值分别给予不同的权数,按不同权数求得移动平均值,并以最后的移动平均值为基础,确定预测值的方法。采用加权移动平均法,是因为观察期的近期观察值对预测值有较大影响,它更能反映近期变化的趋势。指数移动加权平均法,是指各数值的加权系数随时间呈指数式递减,越靠近当前时刻的数值加权系数就越大。指数移动加权平均较传统的平均法来说,一是不需要保存过去所有的数值;而是计算量显著减小。