Sunday, December 29, 2013

Price Spread based Mean Reversion Strategy within R and Python

Below piece of code within R and Python show how to apply basic mean reversion strategy based on price spread (also log price spread) for Gold and USD Oil ETFs.

#
# R code
#
# load price data of Gold and Usd Oil ETF 
g=read.csv("gold.csv", header=F)
o=read.csv("uso.csv", header=F)

# one month window length
wLen=22 

len=dim(g)[1]
hedgeRatio=matrix(rep(0,len),len)

# to verify if spread is stationary 
adfResP=0
# flag to enable log price
isLogPrice=0
for (t in wLen:len){
  g_w=g[(t-wLen+1):t,1]
  o_w=o[(t-wLen+1):t,1]
  
  if (isLogPrice==1){
    g_w=log(g_w)
    o_w=log(o_w)
  }
# linear regression
  reg=lm(o_w~g_w)
# get hedge ratio 
  hedgeRatio[t]=reg$coefficients[2];  
# verify if spread (residual) is stationary 
 adfRes=adf.test(reg$residuals, alternative='stationary')
# sum of p values  
  adfResP=adfResP+adfRes$p.value
}
# estimate mean p value
avgPValue=adfResP/(len-wLen)
# > 0.5261476
# as avg p value (0.5261476) indicates, actually, spread is not stationary, so strategy wont make much return. 


portf=cbind(g,o)
sportf=portf
if (isLogPrice==1){
  sportf=log(portf)
}
# estimate spread of portfolio = oil - headgeRatio*gold
spread=matrix(rowSums(cbind(-1*hedgeRatio,1)*sportf))

plot(spread[,1],type='l')

# trim N/A sections
start=wLen+1
hedgeRatio=hedgeRatio[start:len,1]
portf=portf[start:len,1:2]
spread=matrix(spread[start:len,1])

# negative Z score will be used as number of shares
# runmean and runsd are in caTools package
meanSpread=runmean(spread,wLen,endrule="constant") 
stdSpread=runsd(spread,wLen,endrule="constant")
numUnits=-(spread-meanSpread)/stdSpread #

positions=cbind(numUnits,numUnits)*cbind(-1*hedgeRatio,1)*portf

# daily profit and loss
lagPortf=lags(portf,1)[,3:4]
lagPos=lags(positions,1)[,3:4]
pnl=rowSums(lagPos*(portf-lagPortf)/lagPortf);

# return is P&L divided by gross market value of portfolio
ret=tail(pnl,-1)/rowSums(abs(lagPos))
plot(cumprod(1+ret)-1,type='l')

# annual percentage rate
APR=prod(1+ret)^(252/length(ret)) 
# > 1.032342 
sharpRatio=sqrt(252)*mean(ret)/stdev(ret)
# > 0.3713589

'''

Python code

Created on 29 Dec 2013

@author: deniz turan (denizstij@gmail.com)
'''

import numpy as np
import pandas as pd
from scipy.stats import linregress

o=pd.read_csv("uso.csv",header=0,names=["price"])
g=pd.read_csv("gold.csv",header=0,names=["price"])

len=o.price.count()
wLen=22
hedgeRatio= np.zeros((len,2))

for t in range(wLen, len):
    o_w=o.price[t-wLen:t]
    g_w=g.price[t-wLen:t]

    slope, intercept, r, p, stderr = linregress(g_w, o_w)
    hedgeRatio[t,0]=slope*-1
    hedgeRatio[t,1]=1


portf=np.vstack((g.price,o.price)).T
# spread 
spread=np.sum(np.multiply(portf,hedgeRatio),1)

# negative Z score will be used as number of shares
meanSpread=pd.rolling_mean(spread,wLen); 
stdSpread=pd.rolling_std(spread,wLen); 
numUnits=-(spread-meanSpread)/stdSpread #

#drop NaN values
start=wLen
g=g.drop(g.index[:start])
o=o.drop(o.index[:start])
hedgeRatio=hedgeRatio[start:,]
portf=portf[start:,]
spread=spread[start:,]
# number of units

numUnits=numUnits[start:,]
# position
positions=np.multiply(np.vstack((numUnits,numUnits)).T,np.multiply(portf,hedgeRatio))

# get lag 1
lagPortf=np.roll(portf,1,0);
lagPortf[0,]=lagPortf[1,];
lagPos=np.roll(positions,1,0);
lagPos[0,]=lagPos[1,];

spread=np.sum(np.multiply(portf,hedgeRatio),1)
pnl=np.sum(np.divide(np.multiply(lagPos,(portf-lagPortf)),lagPortf),1)

# return
ret=np.divide(pnl,np.sum(np.abs(lagPos),1))

APR=np.power(np.prod(1+ret),(252/float(np.size(ret,0)))) 
sharpRatio=np.sqrt(252)*float(np.mean(ret))/float(np.std(ret))
print " APR %f, sharpeRatio=%f" %( APR,sharpRatio)

Although, p-value for ADF test, APR (annual percentage rate) and sharpe ratio indicate, this strategy is not profitable, it is very basic strategy to apply.

Tuesday, December 03, 2013

Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) within R

Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) is a kind of ARMA (p,q) model to represent volatility:
$$ \sigma_t^2=\alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \cdots + \alpha_q \epsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p\sigma_{t-p}^2 = \alpha_0 + \sum_{i=1}^q \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^p \beta_i \sigma_{t-i}^2 $$
where $\epsilon$ is a i.i.d random variable generally from a normal N(0,1) or student distribution and $\alpha_0 >0 , \alpha_t ≥ 0 , \beta_t ≥0, \alpha_t + \beta_t <1 $. In practice, generally low order of GARCH models are used in many application such as GARCH(1,1), GARCH (1,2), GARCH (2,1).

By utilising GARCH (1,1), to forecaste variance in k step ahead, following formula can be used:
$$ \sigma_t^2 (k)=\frac{\alpha_0}{1-\alpha_1 - \beta_1} ; k \to \infty$$


Lets have a look how to apply GARCH over monthly return of SP 500 from 1926 within R.
>library("fGarch") ## Library for GARCH
# get data
>data=read.table("http://www.mif.vu.lt/~rlapinskas/DUOMENYS/Tsay_fts3/sp500.dat",header=F)
> dim(data)
[1] 792   1
## First step is to model monthly return.  
>data=read.table("http://www.mif.vu.lt/~rlapinskas/DUOMENYS/Tsay_fts3/m-intc7308.txt",header=T)
# lets find out lag 
>ret=pacf(data)
> which.max(abs(ret$acf))
[1] 3
## Lets use ARMA(3,0) and GARCH(1,1) to model return 
> m1=garchFit(~arma(3,0)+garch(1,1),data=data,trace=F)
> summary(m1)

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~arma(3, 0) + garch(1, 1), data = data, trace = F) 

Mean and Variance Equation:
 data ~ arma(3, 0) + garch(1, 1)

 [data = data]

Conditional Distribution:
 norm 

Coefficient(s):
         mu          ar1          ar2          ar3        omega       alpha1  
 7.7077e-03   3.1968e-02  -3.0261e-02  -1.0649e-02   7.9746e-05   1.2425e-01  
      beta1  
 8.5302e-01  

Std. Errors:
 based on Hessian 

Error Analysis:
         Estimate  Std. Error  t value Pr(>|t|)    
mu      7.708e-03   1.607e-03    4.798 1.61e-06 ***
ar1     3.197e-02   3.837e-02    0.833  0.40473    
ar2    -3.026e-02   3.841e-02   -0.788  0.43076    
ar3    -1.065e-02   3.756e-02   -0.284  0.77677    
omega   7.975e-05   2.810e-05    2.838  0.00454 ** 
alpha1  1.242e-01   2.247e-02    5.529 3.22e-08 ***
beta1   8.530e-01   2.183e-02   39.075  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So, monthly return can be modeled as:
$$ r_t= 0.0078+0.032r_{t-1}-0.03r_{t-2}-0.01r_{t-3} + a_t$$ $$ \sigma_t^2=0.000080 + 0.12\alpha_{t-1}+0.85(\sigma_{t-1})^2$$
But since as t values of ARMA models are insignificant, we can use directly GARCH(1,1) to model it
> m2=garchFit(~garch(1,1),data=data,trace=F)
> summary(m2)

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = data, trace = F) 

Mean and Variance Equation:
 data ~ garch(1, 1)

 [data = data]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1       beta1  
7.4497e-03  8.0615e-05  1.2198e-01  8.5436e-01  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     7.450e-03   1.538e-03    4.845 1.27e-06 ***
omega  8.061e-05   2.833e-05    2.845  0.00444 ** 
alpha1 1.220e-01   2.202e-02    5.540 3.02e-08 ***
beta1  8.544e-01   2.175e-02   39.276  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So we can reduce model to more simpler form such as:
$$ r_t= 0.0074+a_t$$ $$ \sigma_t^2=0.000080 + 0.12a{t-1}+0.85(\sigma_{t-1})^2$$
Hence, variance of $a_t$ is:
$$ \frac{0.000080}{1-0.12 - 0.85} = 0.0027$$
Lets analyse if residuals are serially independent.
>stress=residuals(m2,standardize=T)
> Box.test(stress,12,type='Ljung')

 Box-Ljung test

data:  stress
X-squared = 11.9994, df = 12, p-value = 0.4457
So, above test indicates that there is not significant serial correlation in residuals, so we can conclude GARCH(1,1) is a good model for SP 500. We can also predict next 5 volatility of monthly return of SP 500 as:
> predict(m2,5)
  meanForecast  meanError standardDeviation
1  0.007449721 0.05377242        0.05377242
2  0.007449721 0.05388567        0.05388567
3  0.007449721 0.05399601        0.05399601
4  0.007449721 0.05410353        0.05410353
5  0.007449721 0.05420829        0.05420829