A NumPy version of the "exponentially weighted moving average" in Python, equivalent to pandas.ewm.mean()

How do I get the exponentially weighted moving average in NumPy, just like the following in pandas?

import pandas as pd    
import pandas_datareader as pdr
from datetime import datetime

#declare variables
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
windowSize = 20

#get PANDAS exponential weighted moving average
ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().to_numpy()  # as_matrix() was removed in pandas 1.0

print(ewm_pd)

With NumPy, I tried the following:

import numpy as np
import pandas_datareader as pdr
from datetime import datetime

# From this post : http://stackoverflow.com/a/40085052/3293881 by @Divakar
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def numpyEWMA(price, windowSize):
    weights = np.exp(np.linspace(-1., 0., windowSize))
    weights /= weights.sum()

    a2D = strided_app(price, windowSize, 1)

    returnArray = np.empty((price.shape[0]))
    returnArray.fill(np.nan)
    for index in (range(a2D.shape[0])):
        returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]
    return np.reshape(returnArray, (-1, 1))

#declare variables
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
windowSize = 20

#get NUMPY exponential weighted moving average
ewma_np = numpyEWMA(ibm, windowSize)

print(ewma_np)

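But the results do not match those from pandas.

Is there a better way to compute the exponentially weighted moving average directly in NumPy and get exactly the same result as pandas.ewm.mean()?

For 60,000 calculations with pandas I get about 230 seconds. I expect the time could be reduced significantly with NumPy.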

 
Tags: python, performance, pandas, numpy, vectorization

Recommended answer

The numpy_ewma function, written as a plain loop:

def numpy_ewma(data, window):
    returnArray = np.empty((data.shape[0]))
    returnArray.fill(np.nan)
    e = data[0]
    alpha = 2 / float(window + 1)
    for s in range(data.shape[0]):
        e =  ((data[s]-e) *alpha ) + e
        returnArray[s] = e
    return returnArray
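
To make explicit what this loop computes, here is a minimal sketch that unrolls the recursion into weights on the input samples and checks that both forms agree. The names ewma_loop and ewma_explicit_weights are hypothetical helpers for illustration, not part of the answer. As far as I can tell from the pandas documentation, this recursion corresponds to ewm(..., adjust=False), whereas the call in the question uses the default adjust=True, so the early samples should not be expected to match the question's output exactly.

```python
import numpy as np

def ewma_loop(data, window):
    """Same recursion as numpy_ewma: e_t = e_{t-1} + alpha * (x_t - e_{t-1})."""
    data = np.asarray(data, dtype=np.float64)
    alpha = 2.0 / (window + 1.0)
    out = np.empty(data.shape[0])
    e = data[0]  # the first sample seeds the average
    for t in range(data.shape[0]):
        e = e + alpha * (data[t] - e)
        out[t] = e
    return out

def ewma_explicit_weights(data, window):
    """Hypothetical helper: evaluate the unrolled sum directly (O(n^2), for checking only)."""
    data = np.asarray(data, dtype=np.float64)
    alpha = 2.0 / (window + 1.0)
    out = np.empty(data.shape[0])
    for t in range(data.shape[0]):
        # Weight alpha * (1 - alpha)**(t - k) for sample x_k, k = 0..t
        weights = alpha * (1.0 - alpha) ** np.arange(t, -1, -1)
        # x_0 is the seed, so it carries the whole remaining weight (1 - alpha)**t
        weights[0] = (1.0 - alpha) ** t
        out[t] = np.dot(weights, data[: t + 1])
    return out
```

The weights for each output sum to 1, and the weight on a sample shrinks by a factor of (1 - alpha) for every step it lies in the past.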

A vectorized version of numpy_ewma, numpy_ewma_vectorized:

def numpy_ewma_vectorized(data, window):

    alpha = 2 /(window + 1.0)
    alpha_rev = 1-alpha

    scale = 1/alpha_rev
    n = data.shape[0]

    r = np.arange(n)    
    scale_arr = scale**r
    offset = data[0]*alpha_rev**(r+1)
    pw0 = alpha*alpha_rev**(n-1)

    mult = data*pw0*scale_arr
    cumsums = mult.cumsum()
    out = offset + cumsums*scale_arr[::-1]
    return out
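
Why this works can be sketched by unrolling the loop. Writing x_t for data[t], β for alpha_rev = 1 - α, and n for the length of data (my notation, not the answer's), each output is a fixed offset plus a cumulative sum that is rescaled per position:

```latex
\begin{aligned}
e_t &= \beta^{t+1} x_0 + \alpha \sum_{k=0}^{t} \beta^{t-k} x_k \\
    &= \underbrace{x_0\,\beta^{t+1}}_{\texttt{offset}[t]}
     + \underbrace{\Big(\sum_{k=0}^{t} x_k\,\alpha\,\beta^{\,n-1}\,\beta^{-k}\Big)}_{\texttt{cumsums}[t]}
       \cdot \underbrace{\beta^{-(n-1-t)}}_{\texttt{scale\_arr[::-1]}[t]}
\end{aligned}
```

The factor β^{n-1} in pw0 cancels against the reversed scale array, leaving α β^{t-k} as the weight on x_k, which is what the loop produces.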

With some reuse, numpy_ewma_vectorized_v2 computes the powers of alpha_rev once and derives both scale_arr and offset from them:

def numpy_ewma_vectorized_v2(data, window):

    alpha = 2 /(window + 1.0)
    alpha_rev = 1-alpha
    n = data.shape[0]

    pows = alpha_rev**(np.arange(n+1))

    scale_arr = 1/pows[:-1]
    offset = data[0]*pows[1:]
    pw0 = alpha*alpha_rev**(n-1)

    mult = data*pw0*scale_arr
    cumsums = mult.cumsum()
    out = offset + cumsums*scale_arr[::-1]
    return out
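
One caveat with both vectorized versions: scale_arr grows like (1/alpha_rev)**n, so for long inputs it overflows float64 (by my estimate somewhere around 7,000 samples for window = 20) and the result turns into inf or nan. A minimal sketch of one way around this, assuming window > 1, is to apply the same cumulative-sum trick block by block and carry the last value forward. The function name ewma_chunked and the chunk_size parameter are my own assumptions, and chunk_size has to stay small enough that alpha_rev**chunk_size remains representable.

```python
import numpy as np

def ewma_chunked(data, window, chunk_size=512):
    """Hypothetical block-wise variant of the vectorized EWMA (assumes window > 1)."""
    data = np.asarray(data, dtype=np.float64)
    alpha = 2.0 / (window + 1.0)
    alpha_rev = 1.0 - alpha
    out = np.empty_like(data)
    if data.size == 0:
        return out
    # Powers alpha_rev**0 .. alpha_rev**chunk_size, reused for every block
    pows = alpha_rev ** np.arange(chunk_size + 1)
    prev = data[0]  # seed, as in numpy_ewma
    for start in range(0, data.size, chunk_size):
        block = data[start:start + chunk_size]
        m = block.size
        # sum_k x_k / alpha_rev**k within the block, rescaled by alpha_rev**t below
        cums = np.cumsum(block / pows[:m])
        out[start:start + m] = prev * pows[1:m + 1] + alpha * cums * pows[:m]
        prev = out[start + m - 1]  # carry the last average into the next block
    return out
```

The Python loop now runs once per block rather than once per sample, so most of the work still happens inside NumPy.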

Runtime test:

Timing these against the loop-based function on a larger dataset gives the following:

In [97]: data = np.random.randint(2,9,(5000))
    ...: window = 20
    ...: 

In [98]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized(data, window))
Out[98]: True

In [99]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized_v2(data, window))
Out[99]: True

In [100]: %timeit numpy_ewma(data, window)
100 loops, best of 3: 6.03 ms per loop

In [101]: %timeit numpy_ewma_vectorized(data, window)
1000 loops, best of 3: 665 µs per loop

In [102]: %timeit numpy_ewma_vectorized_v2(data, window)
1000 loops, best of 3: 357 µs per loop

In [103]: 6030/357.0
Out[103]: 16.89075630252101

That is a speedup of roughly 17x.
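
Since the question calls ewm(span=windowSize, min_periods=windowSize) with the default adjust=True, here is a minimal sketch of that variant as I understand it from the pandas documentation: a weighted average with weights (1 - alpha)**i, renormalized at every step, with the first min_periods - 1 outputs set to NaN. The function name ewma_adjusted is hypothetical, the input is assumed to contain no NaN values, and I have not checked it against pandas here.

```python
import numpy as np

def ewma_adjusted(data, span, min_periods=0):
    """Hypothetical sketch of the adjust=True form: y_t = sum(w_i * x_{t-i}) / sum(w_i)."""
    data = np.asarray(data, dtype=np.float64)
    alpha = 2.0 / (span + 1.0)
    decay = 1.0 - alpha
    out = np.empty(data.shape[0])
    num = 0.0  # running weighted sum of samples
    den = 0.0  # running sum of weights
    for t in range(data.shape[0]):
        num = data[t] + decay * num
        den = 1.0 + decay * den
        out[t] = num / den
    if min_periods > 1:
        # Fewer than min_periods observations are available for these outputs
        out[: min_periods - 1] = np.nan
    return out
```

For long series the denominator approaches 1 / alpha, so this form and the seeded recursion above converge to the same values; they differ mainly in the first few window lengths.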

