## Sunday, November 10, 2013

### Cointegration Tests (ADF and Johansen) within R

In pair trading, cointegration can be a very useful tool, alongside correlation, for deciding which securities to pick. In this post, I demonstrate two approaches to testing cointegration between two financial time series within R, using the ETFs EWA and EWC. As the price chart for the last 5 years shows, EWA and EWC appear to be cointegrated.

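### ADF Test

In this test, we use linear regression to estimate the spread between the two securities. We then use the Augmented Dickey-Fuller (ADF) test to check whether the spread is stationary. For two securities, this is in effect a test of cointegration (the Engle-Granger approach).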
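For the case where EWC is dependent, the two steps can be sketched as the equations below. They assume one lagged difference and no constant, matching `type="nc"` and the lag order of 1 reported in the output:

$$
\mathrm{EWC}_t = \beta\,\mathrm{EWA}_t + \varepsilon_t
$$

$$
\Delta\hat{\varepsilon}_t = \gamma\,\hat{\varepsilon}_{t-1} + \phi\,\Delta\hat{\varepsilon}_{t-1} + u_t, \qquad H_0:\ \gamma = 0 \ \text{(unit root in the spread)}
$$

The Dickey-Fuller value reported by `adfTest` is the t-statistic of $\hat{\gamma}$. The more negative it is, the stronger the evidence that the spread is stationary.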
```r
library("quantmod")   # to download price data for symbols
library("fUnitRoots") # unit root tests (adfTest)

# First, get data for EWA and EWC from Yahoo Finance and extract the adjusted close prices
getSymbols("EWA")
getSymbols("EWC")
ewaAdj = unclass(EWA$EWA.Adjusted)
ewcAdj = unclass(EWC$EWC.Adjusted)
```

```r
# Now run a linear regression with the intercept (drift) set to zero. Since we don't know
# which security is dependent and which is independent, we run it both ways.

# EWC is dependent here
reg = lm(ewcAdj ~ ewaAdj + 0)

# Now run the ADF test on the spread (the residuals of the regression above)
adfTest(reg$residuals, type = "nc")
# Title:
#  Augmented Dickey-Fuller Test
# Test Results:
#   PARAMETER:
#     Lag Order: 1
#   STATISTIC:
#     Dickey-Fuller: -1.8082
#   P VALUE:
#     0.07148

# EWA is dependent this time
reg = lm(ewaAdj ~ ewcAdj + 0)
adfTest(reg$residuals, type = "nc")
# Title:
#  Augmented Dickey-Fuller Test
# Test Results:
#   PARAMETER:
#     Lag Order: 1
#   STATISTIC:
#     Dickey-Fuller: -1.7656
#   P VALUE:
#     0.07793
```


We pick the regression with the more negative Dickey-Fuller statistic (-1.8082 vs. -1.7656), so we take EWC as the dependent variable. At the 90% confidence level (p-value ≈ 7%), we can reject the null hypothesis of a unit root. We can therefore assume the spread (residual) is stationary, which means EWA and EWC are cointegrated. The function below wraps these steps:

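```r
library("quantmod")   # getSymbols(), Ad()
library("fUnitRoots") # adfTest()

cointegrationTestLM_ADF <- function(A, B, startDate) {
  cat("Processing stock:", A, " and ", B, " start date:", startDate)

  # Download both series (the call for A is assumed to mirror the one for B)
  aData = getSymbols(A, from = startDate, auto.assign = FALSE)
  bData = getSymbols(B, from = startDate, auto.assign = FALSE)
  lenA = nrow(aData)
  lenB = nrow(bData)
  N = min(lenA, lenB)
  # If the lengths differ, keep only the last N observations of each series
  # (assumes both end on the same date). R drops index 0, so startA = 0
  # simply means "start at the first observation".
  startA = 0
  startB = 0
  if (lenA != N || lenB != N) {
    startA = lenA - N + 1
    startB = lenB - N + 1
  }
  cat("\nIndex start", A, ":", startA, " Length ", lenA)
  cat("\nIndex start", B, ":", startB, " Length ", lenB)

  # Adjusted closes (assumed: extracted with quantmod's Ad())
  aAdj = as.numeric(Ad(aData))[startA:lenA]
  bAdj = as.numeric(Ad(bData))[startB:lenB]

  # Regressions without intercept, in both directions (assumed)
  regA = lm(aAdj ~ bAdj + 0)  # A is dependent
  regB = lm(bAdj ~ aAdj + 0)  # B is dependent
  # summary() is not auto-printed inside a function; wrap it in print() to see it
  summary(regA)
  summary(regB)

  coA = adfTest(regA$residuals, type = "nc")
  coB = adfTest(regB$residuals, type = "nc")

  cat("\n", A, " p-value", coA@test$p.value, " statistics:", coA@test$statistic)
  cat("\n", B, " p-value", coB@test$p.value, " statistics:", coB@test$statistic)

  # Choose the direction with the most negative ADF statistic
  if (coA@test$statistic < coB@test$statistic) {
    cat("\nStock ", A, " is dependent on stock ", B)
    cat("\np-value", coA@test$p.value, " statistics:", coA@test$statistic)
    p = coA@test$p.value
    s = coA@test$statistic
  } else {
    cat("\n Stock ", B, " is dependent on stock:", A)
    cat("\n p-value", coB@test$p.value, " statistics:", coB@test$statistic)
    p = coB@test$p.value
    s = coB@test$statistic
  }
  return(c(s, p))
}
```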
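The function aligns the two series by length, keeping the last N observations. This assumes both series end on the same date and have no gaps. An alternative is to inner-join the series on their dates. The sketch below does this with base R's `merge()` and assumes prices held in plain data frames with `date` and `price` columns. `alignByDate` and the column names are hypothetical:

```r
# Hypothetical helper: align two price series on their common dates.
# a, b: data frames with columns `date` (Date) and `price` (numeric).
alignByDate <- function(a, b) {
  m <- merge(a, b, by = "date", suffixes = c(".a", ".b"))  # inner join on date
  m[order(m$date), ]                                      # chronological order
}

# Usage with made-up numbers: b has no row for 2013-01-03, so 3 common dates remain
a <- data.frame(date = as.Date("2013-01-01") + 0:3, price = c(10, 11, 12, 13))
b <- data.frame(date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04")),
                price = c(20, 22, 26))
ab <- alignByDate(a, b)
print(ab)

# The aligned columns can then go into the same no-intercept regression
reg <- lm(price.a ~ price.b + 0, data = ab)
```

For the xts objects returned by `getSymbols`, xts's own `merge()` with `join = "inner"` should achieve the same alignment.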

How to run it:

```r
res = cointegrationTestLM_ADF("EWA", "EWC", '2007-01-01')
# Processing stock: EWA  and  EWC  start date: 2007-01-01
# Index start EWA : 0  Length  1731
# Index start EWC : 0  Length  1731
# EWA  p-value 0.0501857  statistics: -1.948774
# EWC  p-value 0.04719164  statistics: -1.981454
# Stock  EWC  is dependent on stock: EWA
# p-value 0.04719164  statistics: -1.981454

res
# -1.98145360    0.04719164
```
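To try the procedure end to end without downloading data, here is a minimal base-R sketch on simulated prices. It assumes a made-up pair in which `y` is 1.5 times a random walk `x` plus a stationary AR(1) spread. `adfStatNC` and `egTest` are hypothetical helpers. `adfStatNC` computes the ADF t-statistic by hand with no constant and one lagged difference, which should correspond to `adfTest(..., type = "nc")` with its default single lag. Strictly, residuals from an estimated regression call for Engle-Granger critical values, which are more negative than the standard Dickey-Fuller ones, so p-values taken from a plain ADF table are only approximate.

```r
# Hypothetical helper: ADF t-statistic with no constant and no trend ("nc").
# Regression: d(e)_t = gamma * e_{t-1} + sum_{k=1..lags} phi_k * d(e)_{t-k} + u_t
adfStatNC <- function(e, lags = 1) {
  de <- diff(e)
  n <- length(de)
  idx <- (lags + 1):n
  y <- de[idx]                  # d(e)_t
  X <- cbind(lev = e[idx])      # e_{t-1}
  for (k in seq_len(lags)) {
    X <- cbind(X, de[idx - k])  # d(e)_{t-k}
  }
  fit <- lm(y ~ X - 1)
  summary(fit)$coefficients[1, "t value"]  # t-statistic of gamma
}

# Hypothetical helper: regress both ways (no intercept) and keep the direction
# whose residuals give the more negative ADF statistic.
egTest <- function(x, y) {
  statYonX <- adfStatNC(residuals(lm(y ~ x + 0)))  # y dependent
  statXonY <- adfStatNC(residuals(lm(x ~ y + 0)))  # x dependent
  if (statYonX < statXonY) {
    list(dependent = "y", stat = statYonX)
  } else {
    list(dependent = "x", stat = statXonY)
  }
}

# Simulated pair (made-up data): y = 1.5 * x + stationary AR(1) spread
set.seed(42)
n <- 500
x <- 50 + cumsum(rnorm(n))                              # random-walk "price"
spread <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # stationary spread
y <- 1.5 * x + spread
eg <- egTest(x, y)
print(eg)
```

With a stationary spread, the statistic comes out far below the usual Dickey-Fuller critical values. Replacing `spread` with an independent random walk typically gives a statistic close to zero.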


### Johansen Test

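As you can see above, the ADF approach has some drawbacks:
- It is not clear which security is dependent and which is independent, so the regression has to be run both ways.
- It cannot easily test more than two instruments at once.

The Johansen test avoids both problems. It treats all series symmetrically, so there is no dependent variable to choose, and it can test several series at once for the number of cointegrating relations.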
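The following base-R sketch is a from-scratch illustration of what the Johansen trace test computes. It is not urca's implementation. It assumes a VAR with `K` lags in levels and an unrestricted constant (the "linear trend" case), and `johansenTrace` is a hypothetical helper. The code concentrates out the constant and the lagged differences, then solves the eigenvalue problem $|\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0$. Finally, it forms the trace statistics $-T \sum_{i > r} \log(1 - \lambda_i)$.

```r
# Hypothetical helper: Johansen trace statistics for a T x p price matrix Y.
# Assumed model: dY_t = Pi Y_{t-1} + sum_{i=1}^{K-1} G_i dY_{t-i} + mu + e_t
johansenTrace <- function(Y, K = 2) {
  Y <- as.matrix(Y)
  p <- ncol(Y)
  dY <- diff(Y)                        # row i holds dY_{i+1}
  idx <- K:nrow(dY)                    # rows with K-1 lagged differences available
  Z0 <- dY[idx, , drop = FALSE]        # dY_t
  Z1 <- Y[idx, , drop = FALSE]         # Y_{t-1}
  Z2 <- matrix(1, length(idx), 1)      # unrestricted constant mu
  for (j in seq_len(K - 1)) {
    Z2 <- cbind(Z2, dY[idx - j, , drop = FALSE])  # dY_{t-j}
  }
  # Concentrate out mu and the G_i: residuals of Z0 and Z1 regressed on Z2
  q <- qr(Z2)
  R0 <- qr.resid(q, Z0)
  R1 <- qr.resid(q, Z1)
  Tn <- nrow(R0)
  S00 <- crossprod(R0) / Tn
  S11 <- crossprod(R1) / Tn
  S01 <- crossprod(R0, R1) / Tn
  # Eigenvalues of S11^-1 S10 S00^-1 S01, via the symmetric form C'^-1 (...) C^-1
  Ci <- solve(chol(S11))               # chol() gives S11 = C'C
  M <- t(Ci) %*% t(S01) %*% solve(S00, S01) %*% Ci
  lambda <- sort(eigen((M + t(M)) / 2, symmetric = TRUE, only.values = TRUE)$values,
                 decreasing = TRUE)
  # Trace statistic for H0: cointegration rank <= r
  traceStat <- sapply(0:(p - 1), function(r) -Tn * sum(log(1 - lambda[(r + 1):p])))
  names(traceStat) <- paste0("r <= ", 0:(p - 1))
  list(lambda = lambda, traceStat = traceStat)
}

# Made-up cointegrated pair: y = 1.5 * x + stationary AR(1) noise
set.seed(7)
n <- 500
x <- 50 + cumsum(rnorm(n))
y <- 1.5 * x + as.numeric(arima.sim(list(ar = 0.5), n = n))
jt <- johansenTrace(cbind(x, y), K = 2)
print(jt)
```

Each trace statistic is compared with tabulated critical values, such as the ones urca prints. Here the `r <= 0` statistic should be large because the pair is cointegrated by construction.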

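```r
library("urca") # ca.jo() implements the Johansen procedure

# Assumed call (the line creating coRes is missing here): "trace statistic, with linear
# trend" in the output corresponds to type = "trace", ecdet = "none"; K = 2 is urca's
# default lag order and may differ from the value actually used.
coRes = ca.jo(data.frame(ewaAdj, ewcAdj), type = "trace", ecdet = "none", K = 2)
summary(coRes)
```

```
######################
# Johansen-Procedure #
######################

Test type: trace statistic , with linear trend

Eigenvalues (lambda):
 0.004881986 0.001200577

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.07  6.50  8.18 11.65
r = 0  | 10.51 15.66 17.95 23.52

Eigenvectors, normalised to first column:
(These are the cointegration relations)

Weights W:
```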