Sunday, January 19, 2014

Mean reversion with Linear Regression and Bollinger Band for Spread Trading within Python

The following code demonstrates how to use linear regression to estimate the hedge ratio and a Bollinger band for spread trading. The code can be backtested at Quantopian.com
#   Mean reversion Spread Trading  with Linear Regression
#
#   Deniz Turan, (denizstij AT gmail DOT com), 19-Jan-2014
import numpy as np
from scipy.stats import linregress

R_P = 1 # refresh period in days
W_L = 30 # window length in days
def initialize(context):
    context.y=sid(14517) # EWC
    context.x=sid(14516) # EWA
    
    
    # for long and shorting 
    context.max_notional = 1000000
    context.min_notional = -1000000.0
    # set a fixed slippage
    set_slippage(slippage.FixedSlippage(spread=0.01))
        
    context.long=False;
    context.short=False;
    
    
def handle_data(context, data):
    xpx=data[context.x].price
    ypx=data[context.y].price
    
    retVal=linearRegression(data,context)    
    # don't do anything if we don't have enough data yet
    if retVal is None:
        return None
    
    hedgeRatio,intercept=retVal;
    spread=ypx-hedgeRatio*xpx      
    data[context.y]['spread'] = spread

    record(ypx=ypx,spread=spread,xpx=xpx)

    # find moving average 
    rVal=getMeanStd(data, context)       
    # don't do anything if we don't have enough data yet
    if rVal is None:
        return   
    
    meanSpread,stdSpread = rVal
    # zScore measures how many standard deviations the spread is from its mean
    zScore=(spread-meanSpread)/stdSpread;
    QTY=1000
    qtyX=-hedgeRatio*QTY*xpx;        
    qtyY=QTY*ypx;        

    entryZscore=1;
    exitZscore=0;

    if zScore < -entryZscore and canEnterLong(context):
        # enter long the spread
        order(context.y, qtyY)
        order(context.x, qtyX)
        context.long=True
        context.short=False    
 
    if zScore > entryZscore and canEnterShort(context):
        #  enter short the spread
        order(context.y, -qtyY)
        order(context.x, -qtyX)
        context.short=True
        context.long=False
        
    record(cash=context.portfolio.cash, stock=context.portfolio.positions_value)
    
@batch_transform(window_length=W_L, refresh_period=R_P) 
def linearRegression(datapanel, context):
    xpx = datapanel['price'][context.x]
    ypx = datapanel['price'][context.y]

    beta, intercept, r, p, stderr = linregress(xpx, ypx)  # regress y on x: y = beta*x + intercept
#    record(beta=beta, intercept=intercept)
    return (beta, intercept)
        
@batch_transform(window_length=W_L, refresh_period=R_P) 
def getMeanStd(datapanel, context):    

    spread = datapanel['spread'][context.y]
    meanSpread=spread.mean()
    stdSpread=spread.std()
    if meanSpread is not None and stdSpread is not None :
        return (meanSpread, stdSpread)
    else:
        return None

def canEnterLong(context):
    notional=context.portfolio.positions_value

    if notional < context.max_notional and not context.long: # and not context.short:
        return True
    else:
        return False

def canEnterShort(context):
    notional=context.portfolio.positions_value

    if notional > context.min_notional and not context.short:
        return True
    else:
        return False
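Outside Quantopian, the core of the logic above (hedge ratio by regression, then a z-score of the spread against its rolling mean and std) can be sketched in plain Python. The prices below are made-up illustrations, not real EWA/EWC data:

```python
import numpy as np
from scipy.stats import linregress

def spread_zscore(xpx, ypx):
    """Estimate the hedge ratio by regressing y on x, then return the
    z-score of the latest spread against the window's mean and std."""
    beta, intercept, r, p, stderr = linregress(xpx, ypx)  # y = beta*x + intercept
    spread = np.asarray(ypx) - beta * np.asarray(xpx)
    z = (spread[-1] - spread.mean()) / spread.std()
    return beta, z

# Hypothetical prices: y tracks 1.5*x plus noise
rng = np.random.default_rng(0)
x = np.linspace(20, 30, 60) + rng.normal(0, 0.1, 60)
y = 1.5 * x + rng.normal(0, 0.2, 60)
beta, z = spread_zscore(x, y)
entry = 1.0
signal = "long spread" if z < -entry else "short spread" if z > entry else "flat"
print(beta, z, signal)
```

With real data the window would roll forward bar by bar, as the batch_transform decorator does in the algorithm above.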

Mean reversion with Kalman Filter as Dynamic Linear Regression for Spread Trading within Python

The following code demonstrates how to use a Kalman filter to estimate the hedge ratio for spread trading. The code can be backtested at Quantopian.com
#   Mean reversion with Kalman Filter as Dynamic Linear Regression
#
#   Following algorithm trades based on mean reversion logic of spread
#   between cointegrated securities  by using Kalman Filter as 
#   Dynamic Linear Regression. Kalman filter is used here to estimate hedge (beta)
#
#   Kalman Filter structure 
# 
# - measurement equation (linear regression):
#   y = beta*x + err   # err is Gaussian noise
#  
# - Prediction model:
#   beta(t) = beta(t-1) + w(t-1) # w is Gaussian noise
#   Beta is here our hedge unit.
# 
# - Prediction section
#   beta_hat(t|t-1)=beta_hat(t-1|t-1)  # beta_hat is expected value of beta
#   P(t|t-1)=P(t-1|t-1) + V_w          # prediction error, which is cov(beta-beta_hat)
#   y_hat(t)=beta_hat(t|t-1)*x(t)      # measurement prediction
#   err(t)=y(t)-y_hat(t)                 # forecast error
#   Q(t)=x(t)'*P(t|t-1)*x(t) + V_e     # variance of forecast error, var(err(t))
#
# - Update section
#   K(t)=P(t|t-1)*x(t)/Q(t)                       # Kalman gain, between 0 and 1
#   beta_hat(t|t)=beta_hat(t|t-1)+ K*err(t)       # State update
#   P(t|t)=P(t|t-1)(1-K*x(t))                     # State covariance update
#   
#   Deniz Turan, (denizstij AT gmail DOT com), 19-Jan-2014
#   
import numpy as np

# Initialization logic 
def initialize(context):
    context.x=sid(14517) # EWC
    context.y=sid(14516) # EWA
    
    # for long and shorting 
    context.max_notional = 1000000
    context.min_notional = -1000000.0
    # set a fixed slippage
    set_slippage(slippage.FixedSlippage(spread=0.01))
    
    # delta is between 0 and 1, where values near 1 allow the fastest change in beta,
    # whereas small values approach a static linear regression
    
    delta = 0.0001 
    context.Vw=delta/(1-delta)*np.eye(2);
    # default prediction error variance
    context.Ve=0.001;

    # beta, holds slope and intersection
    context.beta=np.zeros((2,1));    
    context.postBeta=np.zeros((2,1));   # posterior (updated) beta
    
    
    # covariance of error between projected beta and  beta
    # cov (beta-priorBeta) = E[(beta-priorBeta)(beta-priorBeta)']
    context.P=np.zeros((2,2));
    context.priorP=np.ones((2,2));    
    
    context.started=False;
    context.warmupPeriod=3
    context.warmupCount=0
    
    context.long=False;
    context.short=False;
     
# Will be called on every trade event for the securities specified. 
def handle_data(context, data):
    ##########################################
    # Prediction 
    ##########################################    
    if context.started:    
        # state prediction 
        context.beta=context.postBeta;
        #prior P prediction 
        context.priorP=context.P+context.Vw
    else:        
        context.started=True;
    
    
    xpx=np.mat([[1,data[context.x].price]])
    ypx=data[context.y].price
    
    # projected y
    yhat=np.dot(xpx,context.beta)[0,0]    
    # prediction error
    err=(ypx-yhat);
    # variance of err, var(err)
    Q=(np.dot(np.dot(xpx,context.priorP),xpx.T)+context.Ve)[0,0]

    # Kalman gain (2x1 vector)
    K=np.dot(context.priorP,xpx.T)/Q
    
    ##########################################
    # Update section
    ##########################################    
    context.postBeta=context.beta+K*err
    # state covariance update: P(t|t)=(I-K*x(t))*P(t|t-1)
    context.P=context.priorP-np.dot(np.dot(K,xpx),context.priorP)

    context.warmupCount+=1
    if context.warmupPeriod > context.warmupCount:
        return
    
    #order(sid(24), 50)
    message='started: {st}, xprice: {xpx}, yprice: {ypx},\
            yhat:{yhat} beta: {b}, postBeta: {pBeta} err: {e}, Q: {Q}, K: {K}'
    message= message.format(st=context.started,xpx=xpx,ypx=ypx,\
                            yhat=yhat, b=context.beta, \
                            pBeta=context.postBeta, e=err, Q=Q, K=K)     
    log.info(message)  
   
#    record(xpx=data[context.x].price, ypx=data[context.y].price,err=err, yhat=yhat, beta=context.beta[1,0])
    ##########################################
    # Trading section
    # Spread (y-beta*x) is traded
    ##########################################    

    QTY=1000
    qtyX=-context.beta[1,0]*xpx[0,1]*QTY;        
    qtyY=ypx*QTY;        

    # similar to zscore in bollinger band 
    stdQ=np.sqrt(Q)

    if err < -stdQ and canEnterLong(context):
        # enter long the spread
        order(context.y, qtyY)
        order(context.x, qtyX)
        context.long=True
        
    if err > -stdQ and canExitLong(context):
        # exit long the spread
        order(context.y, -qtyY)
        order(context.x, -qtyX) 
        context.long=False        
 
    if err > stdQ and canEnterShort(context):
        #  enter short the spread
        order(context.y, -qtyY)
        order(context.x, -qtyX)
        context.short=True
    
    if err < stdQ and canExitShort(context):
        # exit short the spread
        order(context.y,qtyY)
        order(context.x,qtyX) 
        context.short=False
    
    record(cash=context.portfolio.cash, stock=context.portfolio.positions_value)

def canEnterLong(context):
    notional=context.portfolio.positions_value

    if notional < context.max_notional \
       and not context.long and not context.short:
        return True
    else:
        return False

def canExitLong(context):
    if context.long and not context.short:
        return True
    else:
        return False
    
def canEnterShort(context):
    notional=context.portfolio.positions_value

    if notional > context.max_notional \
       and not context.long and not context.short:
        return True
    else:
        return False

def canExitShort(context):
    if  context.short and not  context.long:
        return True
    else:
        return False
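The same filter can be sketched outside Quantopian as a standalone function. The data below is synthetic (a slope drifting from 1.0 to 2.0), chosen only to show the dynamic beta tracking; it is not market data:

```python
import numpy as np

def kalman_beta(x, y, delta=1e-4, Ve=1e-3):
    """Track [intercept, slope] of y ~ beta*x online with a random-walk
    state model, mirroring the prediction/update steps above."""
    Vw = delta / (1 - delta) * np.eye(2)
    beta = np.zeros(2)        # state: [intercept, slope]
    P = np.zeros((2, 2))      # state covariance
    betas = []
    for xt, yt in zip(x, y):
        H = np.array([1.0, xt])           # observation vector
        P = P + Vw                        # predict covariance
        yhat = H @ beta                   # measurement prediction
        e = yt - yhat                     # forecast error
        Q = H @ P @ H + Ve                # variance of forecast error
        K = P @ H / Q                     # Kalman gain (2-vector)
        beta = beta + K * e               # state update
        P = P - np.outer(K, H) @ P        # covariance update
        betas.append(beta.copy())
    return np.array(betas)

# Hypothetical data: true slope drifts from 1.0 to 2.0 over the sample
rng = np.random.default_rng(1)
n = 400
x = rng.normal(10, 1, n)
true_slope = np.linspace(1.0, 2.0, n)
y = true_slope * x + rng.normal(0, 0.1, n)
betas = kalman_beta(x, y)
print(betas[-1])  # final [intercept, slope] estimate
```

Because the state follows a random walk, the slope estimate keeps adapting, which is exactly what makes the hedge ratio "dynamic" compared with a fixed-window regression.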

Sunday, December 29, 2013

Price Spread based Mean Reversion Strategy within R and Python

The R and Python code below shows how to apply a basic mean reversion strategy based on the price spread (or log-price spread) of the Gold and USD Oil ETFs.

#
# R code
#
# load price data of Gold and Usd Oil ETF 
g=read.csv("gold.csv", header=F)
o=read.csv("uso.csv", header=F)

# one month window length
wLen=22 

len=dim(g)[1]
hedgeRatio=matrix(rep(0,len),len)

# to verify if spread is stationary 
adfResP=0
# flag to enable log price
isLogPrice=0
for (t in wLen:len){
  g_w=g[(t-wLen+1):t,1]
  o_w=o[(t-wLen+1):t,1]
  
  if (isLogPrice==1){
    g_w=log(g_w)
    o_w=log(o_w)
  }
# linear regression
  reg=lm(o_w~g_w)
# get hedge ratio 
  hedgeRatio[t]=reg$coefficients[2];  
# verify if spread (residual) is stationary 
 adfRes=adf.test(reg$residuals, alternative='stationary')
# sum of p values  
  adfResP=adfResP+adfRes$p.value
}
# estimate mean p value
avgPValue=adfResP/(len-wLen)
# > 0.5261476
# as the avg p-value (0.5261476) indicates, the spread is actually not stationary, so the strategy won't make much return. 


portf=cbind(g,o)
sportf=portf
if (isLogPrice==1){
  sportf=log(portf)
}
# estimate spread of portfolio = oil - hedgeRatio*gold
spread=matrix(rowSums(cbind(-1*hedgeRatio,1)*sportf))

plot(spread[,1],type='l')

# trim N/A sections
start=wLen+1
hedgeRatio=hedgeRatio[start:len,1]
portf=portf[start:len,1:2]
spread=matrix(spread[start:len,1])

# negative Z score will be used as number of shares
# runmean and runsd are in caTools package
meanSpread=runmean(spread,wLen,endrule="constant") 
stdSpread=runsd(spread,wLen,endrule="constant")
numUnits=-(spread-meanSpread)/stdSpread #

positions=cbind(numUnits,numUnits)*cbind(-1*hedgeRatio,1)*portf

# daily profit and loss
lagPortf=lags(portf,1)[,3:4]
lagPos=lags(positions,1)[,3:4]
pnl=rowSums(lagPos*(portf-lagPortf)/lagPortf);

# return is P&L divided by gross market value of portfolio
ret=tail(pnl,-1)/rowSums(abs(lagPos))
plot(cumprod(1+ret)-1,type='l')

# annual percentage rate
APR=prod(1+ret)^(252/length(ret)) 
# > 1.032342 
sharpRatio=sqrt(252)*mean(ret)/stdev(ret)
# > 0.3713589

'''

Python code

Created on 29 Dec 2013

@author: deniz turan (denizstij@gmail.com)
'''

import numpy as np
import pandas as pd
from scipy.stats import linregress

o=pd.read_csv("uso.csv",header=0,names=["price"])
g=pd.read_csv("gold.csv",header=0,names=["price"])

len=o.price.count()
wLen=22
hedgeRatio= np.zeros((len,2))

for t in range(wLen, len):
    o_w=o.price[t-wLen:t]
    g_w=g.price[t-wLen:t]

    slope, intercept, r, p, stderr = linregress(g_w, o_w)
    hedgeRatio[t,0]=slope*-1
    hedgeRatio[t,1]=1


portf=np.vstack((g.price,o.price)).T
# spread 
spread=np.sum(np.multiply(portf,hedgeRatio),1)

# negative Z score will be used as number of shares
meanSpread=pd.rolling_mean(spread,wLen); 
stdSpread=pd.rolling_std(spread,wLen); 
numUnits=-(spread-meanSpread)/stdSpread #

#drop NaN values
start=wLen
g=g.drop(g.index[:start])
o=o.drop(o.index[:start])
hedgeRatio=hedgeRatio[start:,]
portf=portf[start:,]
spread=spread[start:,]
# number of units

numUnits=numUnits[start:,]
# position
positions=np.multiply(np.vstack((numUnits,numUnits)).T,np.multiply(portf,hedgeRatio))

# get lag 1
lagPortf=np.roll(portf,1,0);
lagPortf[0,]=lagPortf[1,];
lagPos=np.roll(positions,1,0);
lagPos[0,]=lagPos[1,];

spread=np.sum(np.multiply(portf,hedgeRatio),1)
pnl=np.sum(np.divide(np.multiply(lagPos,(portf-lagPortf)),lagPortf),1)

# return
ret=np.divide(pnl,np.sum(np.abs(lagPos),1))

APR=np.power(np.prod(1+ret),(252/float(np.size(ret,0)))) 
sharpRatio=np.sqrt(252)*float(np.mean(ret))/float(np.std(ret))
print " APR %f, sharpeRatio=%f" %( APR,sharpRatio)
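The P&L and return arithmetic above (yesterday's positions earn today's percent price move, and return is P&L over gross market value) can be checked on a tiny made-up two-asset example:

```python
import numpy as np

# Toy two-asset example: prices and dollar positions over three bars.
prices = np.array([[100., 50.],
                   [102., 49.],
                   [101., 50.]])
positions = np.array([[ 200., -100.],   # dollar positions held into the next bar
                      [ 150.,  -80.],
                      [ 100.,  -60.]])

lag_prices = prices[:-1]
lag_pos = positions[:-1]
pct_move = (prices[1:] - lag_prices) / lag_prices
pnl = (lag_pos * pct_move).sum(axis=1)       # daily P&L
ret = pnl / np.abs(lag_pos).sum(axis=1)      # P&L over gross market value
print(pnl, ret)
```

On day 1 the long leg earns 200*0.02 = 4 and the short leg earns -100*(-0.02) = 2, so the P&L is 6 on a gross exposure of 300, a 2% return.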

Although the ADF test p-value, APR (annual percentage rate), and Sharpe ratio indicate that this strategy is not profitable, it is a very simple strategy to apply.

Tuesday, December 03, 2013

Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) within R

Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) is a kind of ARMA (p,q) model to represent volatility:
$$ \sigma_t^2=\alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \cdots + \alpha_q \epsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p\sigma_{t-p}^2 = \alpha_0 + \sum_{i=1}^q \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^p \beta_i \sigma_{t-i}^2 $$
where $\epsilon$ is an i.i.d. random variable, generally from a normal N(0,1) or Student's t distribution, and $\alpha_0 > 0, \alpha_i \ge 0, \beta_i \ge 0, \sum_i \alpha_i + \sum_i \beta_i < 1$. In practice, low-order GARCH models such as GARCH(1,1), GARCH(1,2) and GARCH(2,1) are generally used in many applications.

With GARCH(1,1), the variance forecast k steps ahead converges to the unconditional variance, given by the following formula:
$$ \sigma_t^2 (k)=\frac{\alpha_0}{1-\alpha_1 - \beta_1} ; k \to \infty$$


Let's have a look at how to apply GARCH to the monthly returns of the S&P 500 from 1926 within R.
>library("fGarch") ## Library for GARCH
# get data
>data=read.table("http://www.mif.vu.lt/~rlapinskas/DUOMENYS/Tsay_fts3/sp500.dat",header=F)
> dim(data)
[1] 792   1
## First step is to model monthly return.  
>data=read.table("http://www.mif.vu.lt/~rlapinskas/DUOMENYS/Tsay_fts3/m-intc7308.txt",header=T)
# lets find out lag 
>ret=pacf(data)
> which.max(abs(ret$acf))
[1] 3
## Lets use ARMA(3,0) and GARCH(1,1) to model return 
> m1=garchFit(~arma(3,0)+garch(1,1),data=data,trace=F)
> summary(m1)

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~arma(3, 0) + garch(1, 1), data = data, trace = F) 

Mean and Variance Equation:
 data ~ arma(3, 0) + garch(1, 1)

 [data = data]

Conditional Distribution:
 norm 

Coefficient(s):
         mu          ar1          ar2          ar3        omega       alpha1  
 7.7077e-03   3.1968e-02  -3.0261e-02  -1.0649e-02   7.9746e-05   1.2425e-01  
      beta1  
 8.5302e-01  

Std. Errors:
 based on Hessian 

Error Analysis:
         Estimate  Std. Error  t value Pr(>|t|)    
mu      7.708e-03   1.607e-03    4.798 1.61e-06 ***
ar1     3.197e-02   3.837e-02    0.833  0.40473    
ar2    -3.026e-02   3.841e-02   -0.788  0.43076    
ar3    -1.065e-02   3.756e-02   -0.284  0.77677    
omega   7.975e-05   2.810e-05    2.838  0.00454 ** 
alpha1  1.242e-01   2.247e-02    5.529 3.22e-08 ***
beta1   8.530e-01   2.183e-02   39.075  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So, the monthly return can be modeled as:
$$ r_t= 0.0078+0.032r_{t-1}-0.03r_{t-2}-0.01r_{t-3} + a_t$$ $$ \sigma_t^2=0.000080 + 0.12a_{t-1}^2+0.85\sigma_{t-1}^2$$
But since the t-values of the ARMA coefficients are insignificant, we can use GARCH(1,1) directly to model it:
> m2=garchFit(~garch(1,1),data=data,trace=F)
> summary(m2)

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = data, trace = F) 

Mean and Variance Equation:
 data ~ garch(1, 1)

 [data = data]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1       beta1  
7.4497e-03  8.0615e-05  1.2198e-01  8.5436e-01  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     7.450e-03   1.538e-03    4.845 1.27e-06 ***
omega  8.061e-05   2.833e-05    2.845  0.00444 ** 
alpha1 1.220e-01   2.202e-02    5.540 3.02e-08 ***
beta1  8.544e-01   2.175e-02   39.276  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So we can reduce the model to a simpler form:
$$ r_t= 0.0074+a_t$$ $$ \sigma_t^2=0.000080 + 0.12a_{t-1}^2+0.85\sigma_{t-1}^2$$
Hence, the unconditional variance of $a_t$ is:
$$ \frac{0.000080}{1-0.12 - 0.85} = 0.0027$$
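As a sanity check, the long-run variance formula can be evaluated in a couple of lines of Python, using the rounded coefficients above:

```python
def garch_longrun_variance(omega, alpha1, beta1):
    """Unconditional (long-run) variance of a GARCH(1,1) process,
    valid when alpha1 + beta1 < 1 (covariance stationarity)."""
    assert alpha1 + beta1 < 1, "process must be covariance-stationary"
    return omega / (1 - alpha1 - beta1)

# Rounded coefficients from the GARCH(1,1) fit above
v = garch_longrun_variance(0.000080, 0.12, 0.85)
vol = v ** 0.5  # implied long-run monthly volatility
print(v, vol)
```

This matches the 0.0027 figure above; with the unrounded coefficients the value would differ somewhat, since alpha1 + beta1 is close to 1 and the denominator is sensitive to rounding.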
Let's analyse whether the residuals are serially independent.
>stress=residuals(m2,standardize=T)
> Box.test(stress,12,type='Ljung')

 Box-Ljung test

data:  stress
X-squared = 11.9994, df = 12, p-value = 0.4457
So, the test above indicates there is no significant serial correlation in the residuals, and we can conclude GARCH(1,1) is a good model for the S&P 500. We can also predict the volatility of the next 5 monthly returns:
> predict(m2,5)
  meanForecast  meanError standardDeviation
1  0.007449721 0.05377242        0.05377242
2  0.007449721 0.05388567        0.05388567
3  0.007449721 0.05399601        0.05399601
4  0.007449721 0.05410353        0.05410353
5  0.007449721 0.05420829        0.05420829

Friday, November 29, 2013

Testing AutoRegressive Conditional Heteroskedasticity (ARCH) Effect within R

Even though the title itself, "AutoRegressive Conditional Heteroskedasticity (ARCH)", sounds scary, the basic idea is simple: the conditional volatility (heteroskedasticity) of a time series is itself (auto) regressive (ARMA(p,q)); in other words, volatility is not constant over time:
$$ \sigma_t^2=\alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \cdots + \alpha_q \epsilon_{t-q}^2 = \alpha_0 + \sum_{i=1}^q \alpha_i \epsilon_{t-i}^2 $$

In this post, I explain how to test whether a time series has an ARCH effect. In the formula above, if the null hypothesis is chosen as $\alpha_i = 0$ for all $i$, we can conclude there is an ARCH effect whenever that null hypothesis is rejected. The Box.test command in R can be used for this purpose. Note that Box.test computes the Ljung-Box test statistic for examining the null hypothesis of independence in a given time series. Below we analyse the monthly log returns of Intel from 1973 to 2008.

>data=read.table("http://www.mif.vu.lt/~rlapinskas/DUOMENYS/Tsay_fts3/m-intc7308.txt",header=T)
>ret=log(data[,2]+1)
# lets find out lag 
>a=pacf(ret)
>summary(a)
       Length Class  Mode     
acf    26     -none- numeric  
type    1     -none- character
n.used  1     -none- numeric  
lag    26     -none- numeric  
series  1     -none- character
snames  0     -none- NULL  
# now test if there is a serial correlation
>Box.test(ret,lag=26,type='Ljung')

 Box-Ljung test

data:  ret
X-squared = 37.0152, df = 26, p-value = 0.07452

## As we can not reject the null hypothesis (independence) , we assume there is no serial correlation. 
## So we can now test if variance is constant or not.
> var=(ret-mean(ret))^2
> Box.test(var,lag=26,type='Ljung')

 Box-Ljung test

data:  var
X-squared = 104.7286, df = 26, p-value = 2.073e-11
Box.test shows we can reject the null hypothesis (independence) for the variance, so it has significant serial correlation, in other words an ARCH effect.
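The Ljung-Box statistic that Box.test computes is straightforward to reproduce. A minimal Python sketch (the series below are synthetic stand-ins, not the Intel returns):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, lags):
    """Ljung-Box Q statistic and p-value for lags 1..lags:
    Q = n(n+2) * sum_k rho_k^2 / (n-k), compared against chi2(lags)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom   # lag-k autocorrelation
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    p = chi2.sf(q, df=lags)
    return q, p

rng = np.random.default_rng(2)
white = rng.normal(size=500)             # i.i.d. noise: independence holds
ar = np.zeros(500)
for t in range(1, 500):                  # strongly autocorrelated series
    ar[t] = 0.8 * ar[t - 1] + rng.normal()
q_w, p_w = ljung_box(white, 12)
q_a, p_a = ljung_box(ar, 12)
print(p_w, p_a)
```

The autocorrelated series yields a huge Q and a vanishing p-value, while the white-noise series does not, which is exactly the contrast between the return and squared-return tests above.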

Friday, November 22, 2013

Java Garbage Collection Log File Scraper

'''
 
Created by Deniz Turan
 
Scraper for Oracle Java (tested on 7) young-generation garbage collection activity.
Extracts the following fields from a GC log file and saves them to a CSV file.

Count,LogTime,logGCOffsetTime,logGCOffsetTime2, 
YGPreSize,YGPostSize,YGTotalSize, YGElapsedTime,     # Young generation
OLDPreSize,OLDPostSize,OLDTotalSize,OLDElapsedTime   # Old generation 

 
'''
from subprocess import call
import glob
import os
 
logDir="C:\\temp\\gc\\"
finalResultFileName=logDir+"finalResults.csv"
filterExtension="*.log";
 
def getLogFileList(search_dir):
        files = filter(os.path.isfile, glob.glob(search_dir + filterExtension))
        files.sort(key=lambda x: os.path.getmtime(x))       
        return files
 
def openResultFile():
    print "Creating result file : %s"% (finalResultFileName)
    # remove previous file
    call("rm "+finalResultFileName,shell=True)
    resultFileFD = open( finalResultFileName ,"a")
    ## create header
    resultFileFD.write("Count,LogTime,logGCOffsetTime,logGCOffsetTime2,")
    resultFileFD.write("YGPreSize,YGPostSize,YGTotalSize, YGElapsedTime,")
    resultFileFD.write("OLDPreSize,OLDPostSize,OLDTotalSize,OLDElapsedTime\n")
    return resultFileFD
       
def closeResultFile(resultFileFD):   
    print "Closing result file "
    resultFileFD.close();
 
def getFieldValue(strVal):
    index=strVal.index("K")
    index2=strVal.index("K", index+1)
    index3=strVal.index("K", index2+1)
    
    part1=strVal[:index]
    part2=strVal[index+3:index2]
    part3=strVal[index2+2:index3]
    return (part1,part2,part3)
 
#####################################################
# Main
#####################################################
if __name__ == '__main__':
    # prepare result file   
    resultFileFD=openResultFile ()
    
    print "Started to process log files"
    logFileList=getLogFileList(logDir)
    count=0
    for f in logFileList:
        print "Processing GC Log file %s"%f
        logFD = open(f)
        line = logFD.readline()
        while (line != "" ):           
            if "ParNew" in line :
                    count=count+1                   
                    fields=line.split(" ")
                    logTime=fields[0]
                    logGCOffsetTime=fields[1]
                    logGCOffsetTime2=fields[3]
                    res=getFieldValue(fields[5])
                    YGPreSize,YGPostSize,YGTotalSize=res
                    YGElapsedTime=fields[6]
                    res=getFieldValue(fields[8])
                    OLDPreSize,OLDPostSize,OLDTotalSize=res
                    OLDElapsedTime=fields[9]
                    print line
                   
                    print "%d %s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n"%(count,logTime,logGCOffsetTime,logGCOffsetTime2, \
                                                                   YGPreSize,YGPostSize,YGTotalSize, YGElapsedTime,\
                                                                   OLDPreSize,OLDPostSize,OLDTotalSize,OLDElapsedTime)
                    # print to file as CSV now
                    resultFileFD.write("%d,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n"%(count,logTime,logGCOffsetTime,logGCOffsetTime2, \
                                                                   YGPreSize,YGPostSize,YGTotalSize, YGElapsedTime,\
                                                                   OLDPreSize,OLDPostSize,OLDTotalSize,OLDElapsedTime))                   
                    
            line = logFD.readline()
        logFD.close();
    closeResultFile(resultFileFD);           
    print "finished processing log files"
    pass
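The token-splitting logic in getFieldValue assumes GC size fields of the form `19136K->2112K(19136K)` (pre-size, post-size, total size); that piece can be unit-tested standalone:

```python
def get_field_value(s):
    """Split a GC size token of the form '19136K->2112K(19136K)' into
    (pre, post, total) strings, mirroring getFieldValue above."""
    i1 = s.index("K")               # end of pre-GC size
    i2 = s.index("K", i1 + 1)       # end of post-GC size
    i3 = s.index("K", i2 + 1)       # end of total size
    return (s[:i1], s[i1 + 3:i2], s[i2 + 2:i3])

pre, post, total = get_field_value("19136K->2112K(19136K)")
print(pre, post, total)
```

The offsets `i1 + 3` and `i2 + 2` skip the `K->` and `K(` separators respectively, so the function breaks if the log format changes (e.g. sizes reported in M), which is worth keeping in mind when pointing it at other JVM versions.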

Monday, November 18, 2013

Simple Passive Momentum Trading with Bollinger Band

Below, you can see a simple trading algorithm based on momentum and a Bollinger band on Quantopian.com

# Simple Passive Momentum Trading with Bollinger Band
import numpy as np
import statsmodels.api as stat
import statsmodels.tsa.stattools as ts

# globals for batch transform decorator
R_P = 1 # refresh period in days
W_L = 30 # window length in days
lookback=22
def initialize(context):
    context.stock = sid(24) # Apple (ignoring look-ahead bias)
    # for long and shorting 
    context.max_notional = 1000000
    context.min_notional = -1000000.0
    # set a fixed slippage
    set_slippage(slippage.FixedSlippage(spread=0.01))
                
def handle_data(context, data):
    # find moving average 
    rVal=getMeanStd(data)

    # don't do anything if we don't have enough data yet
    if rVal is None:
        return    
    
    meanPrice,stdPrice = rVal
    price=data[context.stock].price
    notional = context.portfolio.positions[context.stock].amount * price
    
    # Passive momentum trading where for trading signal, Z-score is estimated
    h=((price-meanPrice)/stdPrice)
    # Bollinger band: if price is more than 2 std from the moving mean, then trade
    if h>2 and notional < context.max_notional  :
       # long
       order(context.stock,h*1000)
    if h<-2 and notional > context.min_notional:
       # short
       order(context.stock,h*1000)
     
@batch_transform(window_length=W_L, refresh_period=R_P) 
def getMeanStd(datapanel):
    prices = datapanel['price']
    meanPrice=prices.mean()
    stdPrice=prices.std()
    if meanPrice is not None and stdPrice is not None :
        return (meanPrice, stdPrice)
    else:
        return None

A screenshot of the backtesting result:
Click here to run algorithm on Quantopian.com.
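Outside Quantopian, the trading signal above reduces to a rolling z-score against the Bollinger band. A minimal sketch with a made-up price path:

```python
import numpy as np

def bollinger_signal(prices, window=30, band=2.0):
    """Z-score of the latest price against a rolling window; the strategy
    above goes long when z > band and short when z < -band (momentum)."""
    w = np.asarray(prices[-window:], dtype=float)
    z = (w[-1] - w.mean()) / w.std()
    if z > band:
        return z, "long"
    if z < -band:
        return z, "short"
    return z, "flat"

# Hypothetical price path ending with a sharp upward move
prices = [100.0] * 29 + [110.0]
z, side = bollinger_signal(prices)
print(z, side)
```

Note the momentum flavour: unlike the mean-reversion posts above, a breakout beyond the band is followed rather than faded.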

Sunday, November 10, 2013

Cointegration Tests (ADF and Johansen) within R

In pair trading, in addition to correlation, cointegration can be a very useful tool to determine which securities to pick. In this post, I demonstrate two approaches to testing cointegration between two financial time series, the ETFs EWA and EWC, within R. As the chart below shows, there has been cointegration between EWA and EWC over the last 5 years.


ADF Test

In this test, we use linear regression to estimate the spread between the two securities and then the ADF test to check whether the spread is stationary, which in effect also tests the two securities for cointegration.
>library("quantmod")  # To get data for symbols
> library("fUnitRoots") # Unit test 

## First let's get data for EWA and EWC from Yahoo Finance and extract the adjusted close prices
>getSymbols("EWA")
>getSymbols("EWC")
>ewaAdj=unclass(EWA$EWA.Adjusted)
>ewcAdj=unclass(EWC$EWC.Adjusted)

## Now let's do linear regression, assuming the drift is zero. Since we are not sure which security is dependent and which is independent, we apply the regression both ways

## EWC is dependent here 
> reg=lm (ewcAdj~ewaAdj+0)

## And now let's run the ADF test on the spread (which is actually the residuals of the regression above)
> adfTest(reg$residuals, type="nc")

Title:
 Augmented Dickey-Fuller Test

Test Results:
  PARAMETER:
    Lag Order: 1
  STATISTIC:
    Dickey-Fuller: -1.8082
  P VALUE:
    0.07148 

## EWA is dependent here this time
> reg=lm (ewaAdj~ewcAdj+0)
> adfTest(reg$residuals, type="nc") 

Title:
 Augmented Dickey-Fuller Test

Test Results:
  PARAMETER:
    Lag Order: 1
  STATISTIC:
    Dickey-Fuller: -1.7656
  P VALUE:
    0.07793 

We use the most negative Dickey-Fuller statistic (-1.8082 vs -1.7656) to choose which regression to use; based on that, we choose EWC as the dependent variable. At the 90% confidence level (p-value ~7%), we can reject the null hypothesis (unit root), so we can assume the spread (residual) is stationary and therefore that there is cointegration. Below is a function coded for this purpose:

cointegrationTestLM_ADF <-function(A, B, startDate) {
  cat("Processing stock:",A ," and ", B, " start date:",startDate)
  
  aData=getSymbols(A,from=startDate,auto.assign = FALSE)
  aAdj=unclass(aData[,6])
  bData=getSymbols(B,from=startDate,auto.assign = FALSE)
  bAdj=unclass(bData[,6])
  lenA=length(aAdj)
  lenB=length(bAdj)
  N= min(lenA,lenB) 
  startA=0
  startB=0
  if (lenA!=N || lenB!=N){
    startA=lenA-N+1
    startB=lenB-N+1
  }
  cat("\nIndex start",A,":",startA," Length ",lenA )
  cat("\nIndex start",B,":",startB," Length ",lenB)
  aAdj=aAdj[startA:lenA,]
  bAdj=bAdj[startB:lenB,]
  
  regA=lm(aAdj~bAdj+0)
  
  summary(regA)
  regB=lm(bAdj~aAdj+0)
  summary(regB)
  
  coA <- adfTest(regA$residuals, type="nc")
  coB=adfTest(regB$residuals, type="nc")   
  
  
  cat("\n",A," p-value",coA@test$p.value," statistics:",coA@test$statistic)     
  cat("\n",B," p-value",coB@test$p.value," statistics:",coB@test$statistic)     
  
  
  # Let's choose the most negative statistic
  if (coA@test$statistic < coB@test$statistic){
   cat("\nStock ",A, " is dependent on stock ",B)
    cat("\np-value",coA@test$p.value," statistics:",coA@test$statistic)     
    p=coA@test$p.value
    s=coA@test$statistic
  }else {
    cat("\n Stock ",B, " is dependent on stock:",A)
    cat("\n p-value",coB@test$p.value," statistics:",coB@test$statistic)     
    p=coB@test$p.value
    s=coB@test$statistic     
   }   
  return(c(s,p))
}
How to run it:
res=cointegrationTestLM_ADF("EWA","EWC",'2007-01-01')
Processing stock: EWA  and  EWC  start date: 2007-01-01
Index start EWA : 0  Length  1731
Index start EWC : 0  Length  1731
 EWA  p-value 0.0501857  statistics: -1.948774
 EWC  p-value 0.04719164  statistics: -1.981454
 Stock  EWC  is dependent on stock: EWA
 p-value 0.04719164  statistics: -1.981454

res
  -1.98145360    0.04719164 

Johansen Test

As you see, the ADF approach above has some drawbacks:
- It is not clear which security is dependent and which is independent
- It cannot test more than two instruments

Johansen test addresses these points.
> library("urca") # For cointegration 

> coRes=ca.jo(data.frame(ewaAdj,ewcAdj),type="trace",K=2,ecdet="none", spec="longrun")
> summary(coRes)

###################### 
# Johansen-Procedure # 
###################### 

Test type: trace statistic , with linear trend 

Eigenvalues (lambda):
[1] 0.004881986 0.001200577

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.07  6.50  8.18 11.65
r = 0  | 10.51 15.66 17.95 23.52

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                EWA.Adjusted.l2 EWC.Adjusted.l2
EWA.Adjusted.l2        1.000000       1.0000000
EWC.Adjusted.l2       -1.253545      -0.3702406

Weights W:
(This is the loading matrix)

               EWA.Adjusted.l2 EWC.Adjusted.l2
EWA.Adjusted.d     0.007172485    -0.003894786
EWC.Adjusted.d     0.011970316    -0.001504604

The Johansen test estimates the rank (r) of a given matrix of time series at a confidence level. In our example we have two time series, so Johansen tests the null hypotheses r=0 (no cointegration at all) and r<=1 (and so on up to n-1, where n=2 in our example). If, say, the r<=1 test value (2.07) had been greater than a critical value (e.g. 6.50 at 10%), we would conclude there is cointegration of rank r. But as you see, none of our test values exceed the critical values at r=0 or r<=1, therefore there is no cointegration. This is the opposite of the ADF result we found above. From my research, I've found that the Johansen test can be misleading in some extreme cases (see that discussion for more info). Once cointegration is established, the eigenvector (normalised first column) would be used as the weights of a portfolio.

In addition to the above methods, the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test can also be used to test stationarity.

Sunday, November 03, 2013

Stationary Tests : Augmented Dickey–Fuller (ADF), Hurst Exponent, Variance Ratio (VRTest) of Time Series within R

Several trading strategies (momentum, mean reverting, ...) depend on whether data is stationary or not. In this post, I demonstrate how to test this statistically.
library("quantmod") # for downloading fx data
library("pracma") # for hurst exponent 
library("vrtest") # variance ratio test
library("tseries") # for adf test
library("fUnitRoots")  # for adf test

## first let's fetch USD/CAD data for the last 5 years
getFX("USD/CAD")
usdCad=unclass(USDCAD) # unwrap price column
# estimate log return
n=length(usdCad)
usdcadLog=log(usdCad[1:n])

## First use the Augmented Dickey-Fuller test (adfTest) to check whether USD/CAD is stationary
>adfTest(usdCad, lag=1)

Title:
 Augmented Dickey-Fuller Test

Test Results:
  PARAMETER:
    Lag Order: 1
  STATISTIC:
    Dickey-Fuller: 0.2556
  P VALUE:
    0.6978 

Description:
 Sun Nov 03 16:47:27 2013 by user: deniz

## As you see above, the null hypothesis (unit root) cannot be rejected, with a p-value of ~70%

## So we demonstrated it is not stationary. But is it trending or mean reverting? The Hurst exponent (H) can be used for this purpose. (Note the Hurst exponent relies on the fact that a random walk diffuses in proportion to the square root of time.)
# The value of H can be interpreted as:
# H=0.5: Brownian motion (random walk)
# H<0.5: mean reverting
# H>0.5: trending
> hurst(usdcadLog)
Hurst exponent
[1] 0.9976377

# So, USDCAD is in a trending phase. 
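For illustration, a Hurst exponent can be estimated with plain NumPy from how the spread of lagged differences scales with the lag. This is a rough sketch on synthetic data (my own implementation, not pracma's), but it reproduces the interpretation table above:

```python
import numpy as np

def hurst(ts, max_lag=100):
    """Estimate the Hurst exponent from the scaling law
    std(x[t+tau] - x[t]) ~ tau**H, via a log-log fit."""
    lags = np.arange(2, max_lag)
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]  # slope = H

rng = np.random.default_rng(42)
# Random walk: H should come out near 0.5
random_walk = np.cumsum(rng.normal(size=5000))
# Strongly mean-reverting AR(1): H should come out well below 0.5
mean_rev = np.zeros(5000)
eps = rng.normal(size=5000)
for t in range(1, 5000):
    mean_rev[t] = 0.5 * mean_rev[t - 1] + eps[t]
```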

## Another way to test stationarity is the variance ratio test:
> vrtest::Auto.VR(usdcadLog)
[1] 83.37723
> vrtest::Lo.Mac(usdcadLog,c(2,4,10))
$Stats
           M1       M2
k=2  22.10668 15.98633
k=4  35.03888 25.46031
k=10 56.21660 41.58861
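The intuition behind the variance ratio test can be sketched in NumPy: for a random walk, the variance of k-period returns should be about k times the variance of 1-period returns, so the ratio sits near 1. This plain statistic omits the finite-sample and heteroscedasticity corrections that Lo.Mac applies:

```python
import numpy as np

def variance_ratio(prices, k):
    """Plain variance ratio on log prices: Var of k-period returns over
    k times Var of 1-period returns. Near 1 for a random walk,
    above 1 suggests trending, below 1 suggests mean reversion."""
    log_p = np.log(prices)
    r1 = np.diff(log_p)          # 1-period log returns
    rk = log_p[k:] - log_p[:-k]  # overlapping k-period log returns
    return np.var(rk) / (k * np.var(r1))

# Geometric random walk: the ratio should be close to 1 for any k
rng = np.random.default_rng(1)
prices = np.exp(np.cumsum(rng.normal(scale=0.01, size=10000)))
```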

## Another way to analyse stationarity is via linear regression, in which we try to establish whether the one-step change diff(data) depends on the lagged level data(t-1)

> deltaUsdcadLog=c(0,usdcadLog[2:n]-usdcadLog[1:(n-1)])
> r=lm(deltaUsdcadLog ~ usdcadLog)
> summary(r)

Call:
lm(formula = deltaUsdcadLog ~ usdcadLog)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0121267 -0.0013094 -0.0000982  0.0012327  0.0103982 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) -7.133e-05  1.306e-04  -0.546   0.5853  
usdcadLog    8.772e-03  5.234e-03   1.676   0.0944 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.002485 on 498 degrees of freedom
Multiple R-squared:  0.005608, Adjusted R-squared:  0.003611 
F-statistic: 2.808 on 1 and 498 DF,  p-value: 0.0944

> r$coefficients[2]
  usdcadLog 
0.008771754 

## The coefficient (beta) indicates whether there is mean reversion: if it is negative, the series mean reverts. As you see above, it is positive, so, as we already concluded, the series is trending. If it were negative, we would use the following to find the half-life of the mean reversion:

>-log(2)/r$coefficients[2]
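The same regression and half-life arithmetic can be sketched in Python on a simulated mean-reverting series (a toy AR(1) process, not USD/CAD; function names are mine):

```python
import numpy as np

def mean_reversion_half_life(ts):
    """Regress the one-step change on the lagged level; a negative slope
    implies mean reversion with half-life -ln(2)/slope."""
    slope, _ = np.polyfit(ts[:-1], np.diff(ts), 1)
    return slope, -np.log(2) / slope

# Simulated AR(1) with phi=0.9, so the true slope is 0.9 - 1 = -0.1
# and the theoretical half-life is ln(2)/0.1, roughly 7 steps
rng = np.random.default_rng(7)
x = np.zeros(5000)
eps = rng.normal(size=5000)
for t in range(1, 5000):
    x[t] = 0.9 * x[t - 1] + eps[t]

slope, half_life = mean_reversion_half_life(x)
```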

Wednesday, October 30, 2013

Downloading daily USD/CAD fx rate within R

The code below illustrates how to download daily USD/CAD fx rates within R from http://ratedata.gaincapital.com
downloadAndExtract_USD_CAD <- function(years,outFile) {
  mStr=c("01","02","03","04","05","06","07","08","09","10","11","12")

  if (file.exists(outFile)) {
    file.remove(outFile)
  }
  
  URL="http://ratedata.gaincapital.com"
  for (y in years) {
    for (m in mStr) {
      for (w in 1:5){
        mName=paste(m,month.name[as.numeric(m)])
        wName=paste("USD_CAD_Week",w,".zip",sep="");
        fullName=paste(URL,y,mName,wName,sep="/")
        a=paste("Downloading:",fullName,"\n" ,sep=" ",collapse="")
        cat(a)
        try(downloadAndExtractData(fullName,outFile),silent=TRUE)
      }
    }
  }
}

downloadAndExtractData <- function(zipfile,outFile) {
  # Create a name for the dir where we'll unzip
  zipdir <- tempfile()
  # Create the dir using that name
  dir.create(zipdir)
  dFile=paste(zipdir,"\\zzz.zip",sep="");
  print (dFile)
  try(download.file(zipfile,destfile=dFile, mode="wb"),silent=T)
  # Unzip the file into the dir
  unzip(dFile, exdir=zipdir)
  # Get the files into the dir
  files <- list.files(zipdir)
  # Throw an error if the zip contains more than one data file
  if(length(files)>1) stop("More than one data file inside zip")
  # Get the full name of the file
  f<- paste(zipdir, files[1], sep="/")
  size=file.info(f)$size
  if (size==0) { 
    cat("Zero file size\n")
    return()
  }
  # Read the file
  cat("\n")
  print(c("Downloaded tmp file:",f))
  cat("\n")
  dat=read.csv(f,header=F)
  len=dim(dat)
  print(c("Downloaded #nrows:",len[1]))
   
  #we are just interested in prices at 16:59
  index=grepl(" 16:59",dat[,"V3"])
  d=dat[index,]
  
  # append to output file
  write.table(d,outFile,append=TRUE,row.names=FALSE,col.names=FALSE) 
  
  # lets read all written data so far
  dat=read.csv(outFile)  
  cat("\n")
  print(c("Total rows:",dim(dat)[1]))
  
  # tidy up a bit
  file.remove(f)
}

Friday, October 25, 2013

fBasics (Basic Stats) library in R

# Load the package fBasics.
> library(fBasics)

# Load the data.
# header=T means the 1st row of the data file contains variable names. The default is header=F, i.e., no names.
> da=read.table("http://www.mif.vu.lt/~rlapinskas/DUOMENYS/Tsay_fts3/d-ibm3dx7008.txt",header=T) 

> ibm=da[,2] # Obtain IBM simple returns
> sibm=ibm*100 # Percentage simple returns

> basicStats(sibm) # Compute the summary statistics

# Turn to log returns in percentages
> libm=log(ibm+1)*100
> t.test(libm) # Test whether the mean is zero.
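For readers without R, the core of basicStats (mean, standard deviation, skewness, excess kurtosis) can be approximated with NumPy; the function name and output layout here are my own, not the fBasics API:

```python
import numpy as np

def basic_stats(x):
    """Summary statistics similar in spirit to fBasics::basicStats:
    mean, sample standard deviation, skewness and excess kurtosis."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = x.std(ddof=1)
    z = (x - m) / s
    return {
        "mean": m,
        "stdev": s,
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4) - 3.0),  # excess kurtosis
    }

# Sanity check on simulated normal returns: skewness and excess
# kurtosis should both be near zero
rng = np.random.default_rng(3)
stats = basic_stats(rng.normal(loc=0.05, scale=1.2, size=100000))
```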

Sunday, September 29, 2013

Efficient Frontier Portfolio Monte Carlo Simulation in Python

'''
Created on 29 Sep 2013
@author: deniz turan (denizstij AT gmail DOT com)
'''

import numpy as np
import QSTK.qstkutil.qsdateutil as du
import QSTK.qstkutil.tsutil as tsu
import QSTK.qstkutil.DataAccess as da

import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd


class PortfolioEfficientFrontierMonteCarloSimulator():
    
    
    def simulate(self, dt_start, dt_end, ls_symbols, weights):        
        dt_timeofday = dt.timedelta(hours=16)
        ldt_timestamps = du.getNYSEdays(dt_start, dt_end, dt_timeofday)

        c_dataobj = da.DataAccess('Yahoo')
        ls_keys = ['open', 'high', 'low', 'close', 'volume', 'actual_close']
        ldf_data = c_dataobj.get_data(ldt_timestamps, ls_symbols, ls_keys)
        d_data = dict(zip(ls_keys, ldf_data))        
        closePrices = d_data['close'].values
        
        rows = closePrices.shape[0]
        
        # normalize prices to the first day (cumulative return relative to day one)
        normalizedPrices = (closePrices[0:rows, :] / closePrices[0, :])
        dailyPortfolioCum = normalizedPrices * weights
        dailyPortfolioSum = np.sum(dailyPortfolioCum, axis=1)
        dailyPortfolioRet = (dailyPortfolioSum[1:rows] / dailyPortfolioSum[0:rows - 1]) - 1
        dailyPortfolioRet = np.append(0, dailyPortfolioRet) 

        mean = np.mean(dailyPortfolioRet, axis=0)
        vol = np.std(dailyPortfolioRet, axis=0)
        cumReturn = dailyPortfolioSum[-1]
        T = 252  # trading days per year
        riskFree = 0
        sharpeRatio = np.sqrt(T) * (mean - riskFree) / vol
        return [mean, vol, sharpeRatio, cumReturn]

    def generateShortWeights(self, numWeight):        
        weights = np.zeros(numWeight)
        weights[0:numWeight - 1] = np.random.randint(-101, 101, numWeight - 1)
        weights[numWeight - 1] = 100 - np.sum(weights)
        np.random.shuffle(weights) # eliminate bias on last element
        weights = weights / np.sum(weights)    
        return weights

    def generateNoShortWeights(self, numWeight):        
        weights = np.random.rand(numWeight)
        weights = weights / np.sum(weights)
        return weights

    
    def genareteWeight(self, numWeight, isShortAllowed):
        if isShortAllowed :
            return self.generateShortWeights(numWeight)
        else:
            return self.generateNoShortWeights(numWeight)
        
                
myClass = PortfolioEfficientFrontierMonteCarloSimulator()
dt_start = dt.datetime(2011, 1, 1)
dt_end = dt.datetime(2011, 12, 31)
stockList = ['AAPL', 'GLD', 'GOOG', 'XOM']
# weight=[0.4, 0.4, 0.0, 0.2]
numTrial=10000
res= np.zeros((numTrial,4))
weights= np.zeros((numTrial,4))

for i in range(0, numTrial):
    weights[i] = myClass.genareteWeight(4, True)    
    res[i,:]=myClass.simulate(dt_start, dt_end, stockList, weights[i])
    
#  find index of min/max of mean and vol
minMeanIndex=res[:,0].argmin()
maxMeanIndex=res[:,0].argmax()
minVolIndex=res[:,1].argmin()
maxVolIndex=res[:,1].argmax()

# min and max mean and vol
maxVol=res[maxVolIndex,1]
maxMean=res[maxMeanIndex,0]
minVol=res[minVolIndex,1]
minMean=res[minMeanIndex,0]

# lets plot now 
plt.clf()
plt.scatter(res[:,1],res[:,0],marker="+", linewidths=0.5)
# Plot global mean variance portfolio
plt.scatter(res[minVolIndex,1],res[minVolIndex,0],c='m',marker='x',linewidths=3) 
plt.xlim([minVol*0.8,maxVol*1.2])
plt.ylim([minMean*0.8,maxMean*1.2])
plt.ylabel('Return')
plt.xlabel('Vol')
plt.savefig('efficientFrontier.png', format='png')

# lets print some stats now
print "Global Mean-Variance, mean=%s, vol=%s, weights: %s" %( res[minVolIndex,0],res[minVolIndex,1], weights[minVolIndex,:])
print "minMeanIndex  mean=%s, vol=%s, weights: %s" %( res[minMeanIndex,0],res[minMeanIndex,1], weights[minMeanIndex,:])
print "maxMeanIndex  mean=%s, vol=%s, weights: %s" %( res[maxMeanIndex,0],res[maxMeanIndex,1], weights[maxMeanIndex,:])
print "maxVolIndex mean=%s, vol=%s, weights: %s" %( res[maxVolIndex,0],res[maxVolIndex,1], weights[maxVolIndex,:])

Efficient Frontier Portfolio (no short allowed)

Sunday, April 22, 2012

JVM Command Line Options for Low Latency Applications


In this article, I elaborate on Java HotSpot JVM command line options for low latency applications.

Garbage Collection (GC)

When tuning JVM garbage collection following principles should be taken into consideration:

  • Minor GC Reclaim Principle: Maximize the number of objects reclaimed in each minor GC. This reduces the number and frequency of full garbage collections.
  • GC Maximize Memory Principle: The larger the Java heap size, the better the garbage collector and application performance.
  • Constraints: Tune the JVM garbage collector based on two of the following performance attributes: throughput, footprint and latency.

Minimum JVM command line options for GC are as follows:

-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-Xloggc:

To add a calendar date and time stamp to GC events, so they can be correlated with the application's own logs:

-XX:+PrintGCDateStamps

To monitor and analyse pause events arising from VM safe point operations for low latency applications:

-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintSafepointStatistics

Note, safepoint operations can happen for many reasons, not only GC (for example, biased lock revocation, deoptimization, thread stopping/resuming, exit). So analysing the reasons behind stop-the-world pause times is very important.

Java Heap Size

The amount of data in the Java heap after a full garbage collection, when the application is in a steady state, is the size of the live data. Live data size determines the size of the heap sections as follows:


  • Overall heap size (-Xms and -Xmx) should be 3 to 4 times the live data size.
  • Permanent generation (-XX:PermSize, -XX:MaxPermSize) size should be 1.2 to 1.5 times the permanent generation space occupancy.
  • Young generation (-Xmn) size should be bigger than 1.5 times the live data size and not less than 10% of the heap size. The size of the young generation should be tuned to balance the frequency and duration of minor GCs: if minor GCs happen very frequently, the young generation should be increased, whereas if minor GC duration is too high, it should be decreased.
  • Old generation (heap size - young generation) size should be 2 to 3 times the live data size.
The initial heap size (-Xms) and maximum heap size (-Xmx) should be the same for low latency applications in order to avoid dynamic resizing of the heap. The same applies to the young generation, so its size should be specified with the -Xmn flag (which sets both the initial and max size).

With the concurrent garbage collector (CMS) (-XX:+UseConcMarkSweepGC), special attention should be given to the survivor ratio (-XX:SurvivorRatio) and the tenuring threshold.

The survivor ratio (-XX:SurvivorRatio) is the ratio of eden space to a survivor space. It must be greater than 0 and determines the size of the survivor and eden spaces relative to the overall young generation size:

survivor space size = -Xmn / (SurvivorRatio + 2)

For example if -XX:SurvivorRatio=6 and -Xmn=512m are specified, each survivor space will be 64m and eden will be 384m.
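As a sanity check on this arithmetic, here is a tiny Python sketch (function name is mine, sizes in megabytes; not any JVM API):

```python
def young_gen_layout(xmn_mb, survivor_ratio):
    """Split -Xmn into eden and two survivor spaces using
    survivor = Xmn / (SurvivorRatio + 2)."""
    survivor = xmn_mb // (survivor_ratio + 2)
    eden = xmn_mb - 2 * survivor
    return survivor, eden

# -XX:SurvivorRatio=6 with -Xmn512m -> 64m survivor spaces, 384m eden
print(young_gen_layout(512, 6))
```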

As explained above, the survivor ratio (-XX:SurvivorRatio) determines the size of the eden space: the smaller the eden space, the higher the frequency of minor GC, which leads to shorter minor GC durations and more objects being promoted to the survivor spaces. When objects are promoted from eden to the survivor spaces, their age is increased too. Objects older than a certain age (the tenuring threshold) are promoted to the old generation. Therefore, the survivor ratio (-XX:SurvivorRatio) affects the tenuring of objects.

The tenuring threshold is calculated dynamically by the JVM by analysing the aging activity of objects, but -XX:MaxTenuringThreshold (between 0 and 31 in Java 6) can be used to cap it. Objects older than this value are promoted to the old generation space. Hence, an optimum tenuring threshold should be used to prevent prematurely promoting objects to the old generation.

Tenuring activity can be monitored by enabling -XX:+PrintTenuringDistribution, which prints out statistics for each age:

Desired survivor size 8388608 bytes, new threshold 1 (max 15)
- age 1: 16690480 bytes, 16690480 total

This output should be checked for two things: whether the number of bytes surviving at each object age decreases as the object age increases, and whether the tenuring threshold calculated by the JVM stays equal or close to the value set for the max tenuring threshold. For example, if the internal threshold is consistently less than the max tenuring threshold and the desired survivor size is consistently less than the aged bytes, the survivor space should be increased.

Another point with the CMS garbage collector is when to initiate the cycle. The following command line options control when a CMS garbage collection should be started, based on the occupancy of the old generation space:

-XX:CMSInitiatingOccupancyFraction=
-XX:+UseCMSInitiatingOccupancyOnly

An optimum value of -XX:CMSInitiatingOccupancyFraction should be used to minimize the frequency of CMS cycles and the duration and risk of stop-the-world garbage collections. If a high value of -XX:CMSInitiatingOccupancyFraction is used, CMS cycles start later, and there is a chance that the old generation fills up before CMS can reclaim space, if the application generates more objects than are reclaimed.

Explicit Garbage Collections

Unless it is specifically required by the application, garbage collection triggered by System.gc() should be disabled, or executed as a concurrent GC, with the following options:

-XX:+DisableExplicitGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses

Aggressive Optimization Techniques

Many new and slightly risky optimization techniques can be enabled by the -XX:+AggressiveOpts option. This option is not recommended for applications in which stability is important, as it enables new optimization techniques which are not fully proven stable yet.

Biased Locking

HotSpot Java 6 has biased locking enabled by default, which biases a lock towards the last thread holding it. This assumption holds for many applications, but in some cases it does not; for example, a resource locked alternately by producer and consumer threads. Since revoking a biased lock requires a full stop-the-world safepoint operation, it is better to disable this feature for that sort of application with the -XX:-UseBiasedLocking option. Statistics regarding stop-the-world times and activities can be monitored with the following options:

-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintSafepointStatistics

Example JVM parameters:

$JAVA_HOME/bin/java -Xmx10G -Xms10G -Xmn1200m -XX:SurvivorRatio=4 -XX:PermSize=128m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=1000 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=90 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:+AggressiveOpts -XX:+UseCompressedOops -XX:+OptimizeStringConcat -XX:+UseStringCache -XX:+UseFastAccessorMethods

Commands to analyse heap

Getting histogram of objects in JVM heap

PID= |Process id of a java process|
FILE_NAME_HIST=heapHist.dump
FILE_NAME_HIST_LIVE=heapHist_Live.dump 

%% All objects 
$JAVA_HOME/bin/jmap -histo $PID > $FILE_NAME_HIST
%% Live objects 
$JAVA_HOME/bin/jmap -histo:live $PID > $FILE_NAME_HIST_LIVE 

Getting a memory dump of a java process:

pid= |Process id of a java process| 
FILE_NAME_LIVE=heapDump_Live_$pid.dump
FILE_NAME=heapDump_$pid.dump 

%% dump everything in heap to $FILE_NAME 
$JAVA_HOME/bin/jmap  -F -dump:format=b,file=$FILE_NAME $pid 
%%For only Live objects, we need to do following. Note this will force a full GC first, therefore first try above dump. 
$JAVA_HOME/bin/jmap  -F -dump:live,format=b,file=$FILE_NAME_LIVE $pid 

Starting jstatd (to connect a remote java process via visualVM)

# start jstatd (with same user name/functional id)
# an example of tools.policy file is provided below
jstatd  -J-Djava.security.policy=tools.policy  &

# example of tools.policy file
grant codebase "file:${java.home}/../lib/tools.jar" {
   permission java.security.AllPermission;
};

Sunday, March 11, 2012

Performance Monitoring on Operating System Level

It is that time of year again in my office: we have to fill in forms for this year's goals. I cannot say that I enjoy filling in these forms, but it is one of those things you don't like doing yet have to do, because it is part of your job.

Some of our goals must be aligned with the business's direction, and one of them is to make our application faster. So, performance and performance related tasks are the main project in my work for this year. We aim at boosting our performance to single-digit milliseconds, if not nanoseconds.

In software development, performance enhancement is not the focus of the first release or iterations, unless performance is a vital feature of the application. Of course, as part of best practices, the performance side is generally considered during the design stage. But the main goal is initially to ship the product with a reasonable performance. Once the application is released, along with other new features, the performance of the application is improved (provided that the application is a success and some people are still using it). At the moment, at my work, we are in the performance tuning stage of our application.

The first step in performance enhancement is to gather performance related measurements, so you have some quantitative data to determine which component of the application needs to be addressed. This initial data will also be used as a benchmark for the performance enhancement. The measurements are collected by profiling the application, looking at OS level data, and benchmarking.

OS level data provides first-hand knowledge of the utilization of your resources. In the first place, you may resolve some issues by increasing your resources (CPU, memory, disk, network) within a given budget. In my current company, new hardware or resources are not bought unless either something is broken or a given utilization threshold for that resource is exceeded. For example, to get a new server with a better CPU specification, CPU utilization must generally be more than 40% on a normal trading day. Fair enough! No need to waste money on new toys if applications do not already utilise the resources they have. So, in performance analysis, the first step is to measure how an application utilises resources such as CPU, memory, network and disk.

In this article, I will list commonly used tools for monitoring performance related data at the OS level (Windows and Linux). I am not going to dive into how to use these tools, as there are many related online/offline resources (man pages, docs ...). If you want to get into more detail, I can also recommend a new book, Java Performance by C. Hunt and B. John, on which this and upcoming performance related texts will be based.

CPU Utilization
Monitoring the OS allows us to see how an application utilizes CPU cycles. For example,
if a multithreaded application saturates CPU resources, that issue needs to be resolved before considering increasing the number of CPUs.

When monitoring CPU utilization, two measurements have to be collected: user and kernel (or system, "sys") CPU utilization. The impact of an application is displayed as user utilization in the tools.

Windows
To monitor performance of CPU in Windows, there are three tools. Task Manager, perfmon, typeperf.

Task Manager is widely recognized and one click away on the desktop. In the Performance tab of Task Manager, you can see CPU related graphs along with memory measurements. In order to display kernel utilization, you need to enable it via View → Show Kernel Times. By doing so, the kernel utilization measurement will be shown as a red line in the charts. User utilization is the difference between the two curves.

Task Manager does not provide low level measurements. For that, the perfmon tool is needed. This is a very advanced tool and you can display many measurements in a graph by selecting them from the Add Counters... window. For example, to display the user and kernel time of a processor, you need to select % User Time and % Privileged Time under the Processor item.

Perfmon is a graphical interface. To automate monitoring in a batch script, there is the typeperf command, a command line interface to perfmon. For example, to show user, kernel and total CPU utilization at a 5 second interval, the following command is used:
typeperf -si 5 "\Processor(_Total)\% User Time" "\Processor(_Total)\% Privileged Time" "\Processor(_Total)\% Processor Time"

Linux
For linux type operating system, there are two main tools to display CPU utilizations (and other measurements too): vmstat, top.
In vmstat, CPU related measurements are shown under the us, sy, id and wa columns of the cpu header:
  • us: Time spent (%) running non-kernel code. (user time, including nice time)
  • sy: Time spent (%) running kernel code. (system time)
  • id: Time spent (%) idle. Prior to Linux 2.5.41, this includes IO-wait time.
  • wa: Time spent (%) waiting for IO.

By using the top command, similar CPU related measurements can be seen at the top of the screen.

CPU Schedule Run Queue
CPU schedule run queues hold lightweight processes which are ready to run but are waiting for a CPU to execute them. The queue size increases when more lightweight processes are ready to be executed than the system can handle, so queue size is an indication of performance issues in the system. Generally, if the size of a queue is 3 or 4 times bigger than the number of processors (in Java this is Runtime.availableProcessors()), then it can be assumed that the system cannot keep up with the lightweight processes. As a solution to this issue, either increase the number of CPUs or optimize the source code of the applications to reduce CPU cycles.

In Windows, the following typeperf command will show the run queue:
typeperf -si 5 "\System\Processor Queue Length"
In vmstat, the first column, r, shows the actual number of runnable lightweight processes.

Memory Utilization
For memory utilization, paging or swapping, locking, and voluntary and involuntary context switching activities should be monitored.

If a system's main memory is not enough to process a request, then disk space (swap space) is used to hold the overflowing contents of memory. Therefore, an application causing a lot of swapping or paging will slow down. For example, for a Java application, if part of the heap is paged out, then during the garbage collection phase that part has to be paged back into memory. That would increase garbage collection time (and if no concurrent garbage collector is used, the application stops for a longer time).

In Windows, the following command shows available memory and paging (per second) activity:
typeperf -si 5 "\Memory\Available Mbytes" "\Memory\Pages/sec"

In Linux, the top and vmstat commands are our best friends again. In vmstat, the memory, si and so columns show memory related measurements:
  • swpd: the amount of virtual memory used.
  • free: the amount of idle memory.
  • buff: the amount of memory used as buffers.
  • cache: the amount of memory used as cache.
  • inact: the amount of inactive memory. (-a option)
  • active: the amount of active memory. (-a option)
  • si: Amount of memory swapped in from disk (/s).
  • so: Amount of memory swapped to disk (/s).
In the case of low memory and rising paging activity, increasing RAM should be considered as an appropriate step. Please note that paging can happen frequently when you launch an application.

To monitor locking and voluntary and involuntary context switching activities in Linux, the pidstat command (part of the sysstat package) has to be installed.

Network Utilization
A system with heavy network communication must utilise its network resources (network bandwidth or network IO) well, otherwise application performance will degrade.

In Linux, we can use netstat to display network communications. But this tool does not report the total utilization of resources; a manual estimate has to be made from the capacity of the network resource and the current network activity reported by netstat. The following formula can be used for utilization:

Network Utilization = (Bytes Total/sec) / (Current Bandwidth / 8) * 100

Note: the current bandwidth is in bits, therefore we convert it to bytes by dividing by 8.
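This estimate is easy to script; here is a small Python sketch of the formula (the function name is mine):

```python
def network_utilization_pct(bytes_per_sec, bandwidth_bits_per_sec):
    """Utilization (%) = observed bytes/sec divided by the link capacity
    in bytes/sec (bandwidth in bits divided by 8), times 100."""
    return bytes_per_sec / (bandwidth_bits_per_sec / 8.0) * 100.0

# A 100 Mbit/s link moving 6.25 MB/s is at 50% utilization
print(network_utilization_pct(6.25e6, 100e6))
```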

Similar estimation can be done in Windows by using following command:

typeperf -si 5 "\Network Interface(*)\Bytes Total/sec"

Disk Utilization
Apart from the network, disk IO is another factor in performance. An application with significant IO interaction, such as a database, must consider disk IO utilization.

In Linux, the iostat command is used for monitoring disk IO activities. For example, when iostat is used with the extended statistics argument (-x), it provides utilization for each device.

SAR
The tools described above provide information about the current state of the system. But if we are interested in historical performance data, the sar command can be useful. This command, on Linux, provides measurement data for an extended period of time (e.g. the last 10 days).

CPU/Cache Utilization
To gather CPU and cache (L1, L2, L3) related statistics (such as loads, stores, misses, number of cycles and instructions), perf and likwid can be used. For example, to collect detailed L1 cache statistics of a Java process, the following command can be used:

$ perf stat -d -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores,L1-dcache-store-misses,cache-references,cache-misses,cycles,instructions java myProcess




Sunday, January 01, 2012

Affordable House prices in London

What is the biggest issue in London? I think it is affordable housing for working class people. Getting a secure, habitable, accessible (close to transport) house in London is a big issue. Compared to many cities in the world, London is perhaps more secure and accessible (the transport infrastructure is good), but the perception among Londoners is not that optimistic, as far as I can see from my experience. In this text, I will explore the state of housing in London in terms of affordability for local, common people, and try to answer whether house prices in London will decrease or increase in 2012, given supply and demand factors. My conclusion is that they will decrease, but they won't crash, yet. I will try to establish this in terms of demand and supply factors.

Foremost, I am not an expert on real estate. But over the last year, I have been doing my own research to find a suitable house in London, so this text will be based on my own research and personal experiences. So, don't forget to do your own research. I don't work for any real estate related company and have no relation whatsoever to one. I know, statistics can be white lies, but in order to justify my conclusion, I will use some statistics. Most of the statistics in this text are based on the following publications from the Greater London Authority and the Land Registry:

http://www.london.gov.uk/sites/default/files/Housing%20in%20London%20Dec11.pdf

http://www.landreg.gov.uk/upload/documents/HPI_Report_Nov_11_ws13pm4.pdf

I think the housing issue is a time bomb waiting to explode. At the moment, I cannot see a sound policy to address it from either local or governmental authorities. They hope the issue will resolve itself. Would it resolve itself for common people in a free market capitalist economy, where rich foreigners can buy properties and which is open to migration?

In Maslow's hierarchy of needs, shelter is at the bottom, alongside breathing, food, water, sleep, sex and excretion. People need a shelter, a home. That is very basic. We need a safe place to nurture. We either need to rent or buy a house to live in. In the worst cases, some of us are forced onto the streets or into squatting. Actually, because of the current economic crisis, the number of homeless people on London's streets has increased (around 4,000 people, up almost 33% since the economic crisis started in 2008).

According to statistics, in 2011, 53% of people in London owned their house, down from a high of 57% in 2001. Around 25% of people are in private or social rental, and private rental is increasing.

So let's assume that you are one of those lucky people in London who has a job and can at least afford to rent a place. Actually, most of these people would prefer to buy a house rather than rent: statistically, 85% of Londoners would prefer to buy. So these people somehow have to save money to buy a house while they are renting.

The average house price in London in July 2011 was £347,000, which includes very expensive houses in exclusive parts of London. In this text, since I am considering affordable houses for Londoners, I assume an average house price in London of £250,000, which is closer to the price of a typical house and also what I have observed over the last year.

For a first time buyer, you need to provide at least a 20% deposit for a mortgage, which would be around £50K. In addition, you need at least £5K for other services (mortgage, legal and surveyor fees; I am not including the 3% stamp duty tax, as I am considering houses less than £250K). For someone saving £500 per month, it would take around 10 years to put together a 20% deposit, which leads people not to save but to spend, as it seems a very long term investment. Given that people in the UK are bombarded by the capitalist system with encouragement to spend rather than save, most people do not save, but enjoy today rather than thinking of tomorrow.

So I think most first time buyers cannot afford to buy a house because of mortgage restrictions. Therefore, in terms of first time buyers, demand is low.

There is also another type of buyer, who prefers to work in London but not to live in London. They prefer to commute 2-3 hours every day rather than live in London. For example, some of my colleagues and friends commute from Brighton and Oxford (a 2-3 hour journey every day) rather than living in London. So, another low demand factor.

But we also have the foreigner factor in London in terms of demand. Foreigners investing in London and buying houses here has been one of the biggest factors in the increase of house prices over the last decade. Rich foreigners see London as a safe place to invest. They think they will make a good profit, and actually they do. In exclusive neighbourhoods in zone 1, house prices increased almost 9% last year, while the overall increase across London was 1%. So, simply, some rich people will invest in zone 1, but I wonder if they will invest in houses in zone 2 or zone 3? For example Stratford, which is going to host the 2012 Olympics.

I doubt it. I don't think any rich foreigner will invest in zones apart from zone 1 or zone 2. I was walking around Stratford last week and noticed many empty flats. There seems to be not much demand for these new small flats built on Olympic pipe dreams. For example, house prices in Newham (which hosts the Olympics, in zone 3) fell 0.3% in the last year. Besides that, I don't think rich foreigners will see London as a safe place while the EU is in a credit crisis. Remember the attitude of the Chinese government regarding buying EU zone debt? They refused. They are not investing in the EU as they do not have faith in it. Sure, the UK and the EU are not exactly the same, but they have strong trading ties. I think these rich people will invest more in emerging markets, such as Brazil, Turkey and Poland, or in zone 1 in London. So rich foreign investment won't affect house prices for common people, who cannot afford to buy a house in zone 1 or 2 anyway; they are already priced out of these zones.

Now let's have a look at possible sellers. Let's start with local Londoners who bought houses a very long time ago and would like to sell and leave the city. Actually, in 2011, net migration was negative: for example, between mid 2009 and mid 2010, 11,200 more people moved out of London than moved in. And I think in 2011 that number increased due to the economic crisis. So that would increase housing supply.

Another type of seller is the owner who bought a big house before the boom and now wants to sell at a profit and move to a smaller or bigger place. That is very common: a friend of mine recently sold his house to get a bigger place, settle down and start a family, and many websites describe people selling big houses to take the profit, buy smaller but more central places, and invest in property abroad. Since these buys and sells roughly offset each other in terms of demand and supply, they are negligible, if they do not slightly increase the supply.

There is also an emotional factor in buying and selling. Many owners feel their house is still worth as much as it was at the 2007 peak. For example, one owner got very emotional when I did not offer as much as she expected; after our discussion and my email explaining why the house was not worth it, she cut her price by £15K (about 5%), but it was still too high for that house. Statistically, the average asking price in the UK is £236,597, while the average selling price is £168,205 according to the Land Registry. In other words, houses sell for almost 30% less than the asking price, and since owners' expectations are so high, a very good negotiation (the Middle Eastern or Asian way) has to be done.

Let's also look at the housing supply in London. Over the last 3-4 years, around 25,000 new homes have been delivered per year on average. A total of 40,870 affordable homes were delivered over the three years 2008/09 to 2010/11. Looking ahead, there are 172,000 homes in London with outstanding planning permission, of which just over a third are under construction. Most of these (55,000) will be built in Greenwich, Tower Hamlets and Newham; Camden and Richmond and Kingston upon Thames will get the fewest new homes, 550 and 1,200 respectively. London's population is estimated to have grown by 71,600 between 2009 and 2010, mostly through natural change (births and deaths). So the 25,000 new homes delivered each year fall well short of the roughly 71,000 extra people per year in the long term, though in the short term demand and supply look roughly balanced.

As mentioned above, almost a third of the new homes will be built in Greenwich, Tower Hamlets and Newham, the most deprived parts of London. With the 2012 Olympics, the Royal Docks Enterprise Zone, Crossrail and several private initiatives (for example the Westfield shopping centre in Stratford), east London is getting attention. But Londoners remain emotionally distant from these areas; west and north London are still far more popular. To draw people east, house prices there must be lower, and in fact they already are lower compared with north and west London.

Although house prices in London have risen 1%, once you account for inflation they are actually falling. The Royal Institution of Chartered Surveyors reckons house prices will fall 3% in 2012. Given that prices are still 40% higher than in 2006, I think they will fall further; the question is by how much and for how long. I don't think prices will fall very far, as that would be a disaster for the many people who bought at the peak. According to the Council of Mortgage Lenders, around 13% of mortgages approved after 2005 are in negative equity, while prices have fallen around 10% over the same period. The government will therefore take action to stabilise house prices, or it will have a problem at the election.

So the government has a dilemma: house prices cannot fall much, because that would scare off foreign investors and push much of the public into negative equity. I reckon the government will instead use inflation to erode its own debt and the public's debt; with higher inflation, house prices fall in real terms, as they did in 2011. It will also control the supply side to make sure not too many new houses are built.

Overall, I think house prices will fall in 2012, especially once you account for inflation. The factors pushing prices up are the low level of house building compared with population growth, high rents, and the basic need for shelter. On the other hand, I can list many factors pushing prices down: houses are already overpriced, mortgage approvals for first-time buyers are low, foreign investors are reluctant, and housing quality is poor. But I don't think prices will crash, thanks to historically low interest rates. Once rates start rising, people cannot pay their mortgages and repossessions begin, supply will jump and we may see a property crash. But policy makers won't sit idle; they will come up with something, because the public is very emotional about housing and many people also see their house as a retirement investment.


Monday, December 26, 2011

Types of Market Orders

The type of order is very important when executing a trade: using the wrong order type can lead to unwanted extra trading costs. In this post, I will briefly describe the main types of market orders.

But first, let's shed some light on what an order is and its common properties. An order is an instruction to the market on how to execute a trade under certain criteria. Every order has an instrument id, a size, and a side (buy or sell). In addition, orders can carry attached conditions such as a limit price, an expiry date, and market-type or trend constraints. Let's dive into these types now.
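To make those common properties concrete, here is a minimal sketch of an order object in Python. The field names are illustrative, not taken from any particular trading API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Order:
    # properties every order has
    instrument_id: str
    quantity: int
    side: str                      # "buy" or "sell"
    # optional attached conditions
    limit_price: Optional[float] = None
    stop_price: Optional[float] = None
    expiry: Optional[str] = None   # e.g. "DAY", "GTC"

# a day-limited buy limit order
o = Order("EWC", 1000, "buy", limit_price=28.50, expiry="DAY")
```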

1- Price Related Orders

These orders are executed once some price-related criterion is met in the order book.

1.1 Market Orders
This is the most straightforward order type: the order is executed against the available market bid and ask prices. A buy order is matched with the lowest ask price, and a sell order with the highest bid price.

Impatient traders use this order type to get filled immediately, accepting some degree of price uncertainty: execution is guaranteed, but the price may move. A market order can also have market impact; if a large quantity is demanded or offered, it can cause large unwanted losses. This order type demands liquidity from the market. Because of the spread between bid and ask prices, an immediate round trip of sequential buy/sell market orders locks in a loss equal to the spread (ask minus bid). Sometimes the bid/ask price is improved when a market maker wants priority in the order book; under some exchange rules, a market maker must improve the ask/bid price when a retail customer order is in the book.
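The matching and price-impact behaviour above can be sketched by walking the order book. The book levels here are made-up numbers, and the function is a simplified illustration, not real matching-engine logic:

```python
# Book entries are (price, size). A market buy takes the lowest asks first;
# a market sell takes the highest bids first.
asks = [(100.2, 500), (100.3, 800), (100.5, 300)]
bids = [(100.0, 400), (99.9, 600)]

def execute_market(side, qty, bids, asks):
    """Walk the book, filling at successive price levels.

    A large order consumes several levels, so the average fill price
    worsens as quantity grows: that is the market impact."""
    book = sorted(asks) if side == "buy" else sorted(bids, reverse=True)
    fills, remaining = [], qty
    for price, size in book:
        take = min(size, remaining)
        fills.append((price, take))
        remaining -= take
        if remaining == 0:
            break
    return fills

fills = execute_market("buy", 1000, bids, asks)
# the 1000-share buy exhausts the 100.2 level and spills into 100.3
```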

1.2 Limit Orders
If a trader wants to limit the risk of a market move, a limit order can be used. The trader specifies a maximum price to buy or a minimum price to sell; in other words, the highest buy price or the lowest sell price the trader is happy to accept. Limit prices cap the cost of a market move by ensuring the execution price is never higher than the predefined price for a buy, and never lower for a sell.

A limit order can be very aggressive or passive. A limit order at or through the current bid/ask (a marketable limit order) will be filled faster than one far away from the bid/ask prices (behind the market). A limit order exactly equal to the best bid/ask is called an at-the-market limit order.

Standing limit orders give other traders an option to trade for free. A standing sell limit order is effectively a call option for other traders; similarly, a standing buy limit order is a put option. The depth and structure (symmetric or asymmetric) of the order book's limit orders can be exploited in trading strategies.

There are two risks in a limit order: execution uncertainty and ex-post regret. If the limit price is far from the bid/ask, the order may never execute at all. The second risk is that the order is triggered and filled, and the price then keeps moving in the same direction; in other words, the trader picked the wrong exit point and regrets exiting the trade early.
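The aggressiveness categories above can be expressed as a small check. The function name and return labels are illustrative:

```python
def classify_limit(side, limit_price, best_bid, best_ask):
    """Classify a limit order's aggressiveness relative to the market."""
    if side == "buy":
        if limit_price >= best_ask:
            return "marketable"        # crosses the spread, fills immediately
        if limit_price == best_bid:
            return "at the market"
        return "behind the market"     # passive: execution uncertain
    else:
        if limit_price <= best_bid:
            return "marketable"
        if limit_price == best_ask:
            return "at the market"
        return "behind the market"

classify_limit("buy", 100.3, best_bid=100.0, best_ask=100.2)  # -> "marketable"
```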

1.3 Stop Orders
Say you want an order to be executed once the bid/ask price reaches a pre-defined level. A stop order is mostly used to stop losses when the price moves against a position. For example, a trader may place a sell stop order that triggers if the price of a stock drops to a pre-defined level (60p) below the current price (65p).

Although stop and limit orders look similar, they are different. With a limit order, the execution price won't be lower (sell) or higher (buy) than the predefined limit. With a stop order, once the order is triggered it is filled at the market price, which can be better or worse than the pre-defined stop price.
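That difference can be sketched for a sell stop, using the 60p/65p example above. This is a simplified illustration, not real matching-engine logic:

```python
def check_sell_stop(last_price, stop_price, best_bid):
    """Once the last trade touches or crosses the stop price, fill at the
    prevailing market price (the best bid). Unlike a limit order, the fill
    can be worse than the stop price itself."""
    if last_price <= stop_price:
        return best_bid    # triggered: executes at the market
    return None            # not triggered yet

# stock bought at 65p, sell stop at 60p; price gaps down to 59p
check_sell_stop(last_price=0.59, stop_price=0.60, best_bid=0.58)
# -> fills at 0.58, two pence below the stop price
```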

Stop orders are generally combined with other order types. For example, a trader may place a buy stop just above a resistance level to buy a stock, or a sell stop just below a support level to short it.

Stop orders may accelerate price changes. Especially in illiquid markets, if there are many asymmetric stop orders, market makers or well-informed traders can exploit them to trigger a sudden wave of selling or buying, acquiring otherwise unavailable stock (by pushing the price down) or dumping stock (by pushing it up). This practice is called tree shaking: from time to time a market maker drops the price of a stock suddenly, triggers the resting stop orders (shakes the tree), and then picks up the stock (the apples) from the public.

1.4 Limit Stop Orders
When a stop order is combined with a limit order, it is called a stop-limit order. The trader provides two prices: a stop price, used for activation, and a limit price, used to cap the cost of the market move. For example, a trader may predict a bull market once a resistance point is broken, but also want to limit exposure to the move. He would place a stop-limit order with the stop price at (or slightly above) the resistance point and the limit price, say, 5% above the stop price.
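A rough sketch of that stop-limit logic, with illustrative numbers for the resistance example:

```python
def stop_limit_buy(last_price, stop_price, limit_price, best_ask):
    """Stop-limit buy: the stop price activates the order; once activated,
    the limit price caps what we are willing to pay."""
    if last_price < stop_price:
        return None          # stop not activated yet
    if best_ask <= limit_price:
        return best_ask      # fill at or better than the limit
    return "working"         # activated, resting as an ordinary limit order

# resistance around 120p: stop slightly above it, limit ~5% above the stop
stop_limit_buy(last_price=121.0, stop_price=120.5,
               limit_price=126.5, best_ask=121.2)
```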

1.5 Market-If-Touched Orders
This order type is similar to a limit order with one difference: once the specified price is touched, the order executes at the market price. In that sense it resembles a stop order, but it sits on the opposite side of the current market price. These orders are not common; traders mostly use limit orders.

2- Trend Related Orders
These orders utilise the trend or movement of the price. For example, tick-sensitive orders use the previous tick price: if the previous price is higher than the current price, the trade is a downtick; if lower, an uptick; and if the price did not change between the previous and the last trade, it is a zerotick. An order can then be structured so that it executes only on an uptick (or only on a downtick).
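The tick test described above is easy to express in code:

```python
def classify_tick(prev_price, last_price):
    """Tick test: compare the last trade price with the previous one."""
    if last_price > prev_price:
        return "uptick"
    if last_price < prev_price:
        return "downtick"
    return "zerotick"

# a sell "uptick only" order would be allowed to execute here:
classify_tick(99.9, 100.0)  # -> "uptick"
```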

Trend-related orders are generally used to neutralise market impact. For example, a sell uptick order executes only after the price has ticked up, so each sell follows a price rise rather than adding to a fall; it won't execute while the price is going down.

In fact, trend-related orders are dynamic limit orders. In a buy downtick order, for example, the limit price is dynamically adjusted to just below the last price. These orders are more effective when the tick size is large, so they lost popularity after the decimalization of the US stock market in 2000: the tick size fell from one-sixteenth of a dollar (6.25 cents) to 1 cent.

3- Expiry Related Orders
In addition to price-related constraints, orders generally include expiry-related conditions, especially limit and stop orders, since these orders wait in the order book to be matched.

Day orders are valid for the trading day and are the most common expiry condition. When the market closes, the order expires.

Good-til-cancelled (GTC) orders stay in the order book until cancelled manually by the trader.

Good-until orders stay in the order book until a predefined date or for a predefined period. The most common periods are a week (good-this-week) and a month (good-this-month).

Fill-or-kill orders are valid only at the moment they are presented to the market; any unfilled part of the order is cancelled immediately.

Good-after orders are activated only after a specified time in the order book.

Market-on-open orders are filled only in the opening session of the market, at the market open price.

Market-on-close orders are filled only in the closing session of the market, taking the closing price as the basis.
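The expiry conditions above can be sketched as a small time-in-force check. The labels (DAY, GTC, GTD) and the simplified date model are illustrative:

```python
from datetime import date

def is_active(tif, placed, today, expiry=None):
    """Tiny time-in-force check for resting orders (illustrative only)."""
    if tif == "DAY":
        return today == placed     # expires when the trading day ends
    if tif == "GTC":
        return True                # stays until cancelled manually
    if tif == "GTD":               # good-til-date: week, month, any date
        return today <= expiry
    raise ValueError("unknown time in force: %s" % tif)

# a day order placed Friday is dead by Monday
is_active("DAY", placed=date(2011, 12, 23), today=date(2011, 12, 26))
```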

4- Others

4.1 Market-not-Held Orders
Sometimes a trader leaves the trading strategy to a broker or floor specialist who is more experienced than the trader; the broker comes up with an execution plan to minimise trading costs for the client. With this order type, unlike a market order, the broker has no legal obligation to fill the order at the best prices.

4.2 All-or-None
With this order type, either the whole requested size is executed at once or nothing is. A very large all-or-none trade is generally negotiated between traders. As an alternative, an order can specify a minimum accepted quantity.

4.3 Spread Orders
When two different but correlated instruments are traded (one sold and one bought), a spread order can be used. Since the instruments are related, there is a spread between them, and the order can be constructed to buy one and sell the other only while the spread between the two does not exceed (or does exceed) a pre-defined limit.
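A spread condition like this can be sketched as follows. The hedge ratio and entry threshold mirror the spread-trading idea from my earlier post, but the instrument names and the simple two-leg logic here are illustrative:

```python
def spread_order_triggered(px_a, px_b, hedge_ratio, entry_spread):
    """Fire the two legs only when the spread crosses a pre-defined limit:
    short the expensive leg and buy the cheap one, or vice versa."""
    spread = px_a - hedge_ratio * px_b
    if spread > entry_spread:
        return [("sell", "A"), ("buy", "B")]   # A rich relative to B
    if spread < -entry_spread:
        return [("buy", "A"), ("sell", "B")]   # A cheap relative to B
    return []                                  # spread within limits: no trade

# spread = 20.5 - 1.0 * 19.0 = 1.5, above the 1.0 entry limit
spread_order_triggered(px_a=20.5, px_b=19.0, hedge_ratio=1.0, entry_spread=1.0)
```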

4.4 Iceberg Orders
A very large order would have an adverse effect on the market price if its full size were displayed, as other traders would take advantage of it. To prevent this, the displayed quantity of a large order can be limited to a smaller size until the full order is filled.
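Slicing a large order into displayed chunks can be sketched as:

```python
def iceberg_slices(total_qty, display_qty):
    """Split a large order into visible child slices; only one slice shows
    in the book at a time until the whole size is filled."""
    slices = []
    remaining = total_qty
    while remaining > 0:
        take = min(display_qty, remaining)
        slices.append(take)
        remaining -= take
    return slices

iceberg_slices(100_000, 15_000)  # six slices of 15,000, then one of 10,000
```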

Summary

- Standard order types are used for trading. For example, in the FIX protocol, tag 40 (OrdType) expects the following values for an order:
1 = Market
2 = Limit
3 = Stop
4 = Stop limit
J = Market If Touched (MIT)
- Market orders are executed immediately at current market prices and can cause market impact for large orders.
- Limit orders supply liquidity by providing free trading option to other traders.
- While market and limit orders do not destabilize prices, stop orders can accelerate price changes and destabilize them.
- Various electronic automated trading strategies (for example iceberg, closing, open, spread) utilise these order types to minimize trading costs.
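The FIX tag 40 values listed above can be kept in a simple lookup, for example:

```python
# FIX tag 40 (OrdType) values mentioned above
ORD_TYPE = {
    "1": "Market",
    "2": "Limit",
    "3": "Stop",
    "4": "Stop limit",
    "J": "Market If Touched",
}

def ord_type_name(fix_value):
    """Translate a raw tag 40 value into a human-readable name."""
    return ORD_TYPE.get(fix_value, "Unknown")

ord_type_name("J")  # -> "Market If Touched"
```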

The table below summarizes the order types.

Order Type        | Usage      | Effect on Liquidity                                                | Price Contingencies                                                                                    | Advantages                            | Disadvantages
------------------|------------|--------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|---------------------------------------|------------------------
Market            | Common     | Demands immediate liquidity                                        | None                                                                                                   | Immediate execution                   | Uncertain price impact
Standing Limit    | Common     | Supplies liquidity                                                 | Hard limit on price; execution at or better than the limit price                                       | Limited price, no market impact       | Uncertain execution
Marketable Limit  | Common     | Demands immediate liquidity                                        | Hard limit on price; execution at or better than the limit price                                       | Limited price impact                  | Uncertain execution
Trend Related     | Occasional | Supplies liquidity                                                 | Must sell on an uptick or buy on a downtick                                                            | No price impact; dynamic with market  | Uncertain execution
Stop Market       | Occasional | Demands liquidity when it is least available                       | Triggered when price touches or moves through the stop price                                           | Used to stop losses                   | Large price impact
Stop Limit        | Rare       | Demands liquidity when it is least available; supplies liquidity on the side not needed | Triggered when price touches or moves through the stop price; trade must be at or better than the limit price | Limited price impact upon trigger     | Uncertain execution
Market-if-Touched | Very Rare  | Demands immediate liquidity and supplies resiliency                | Triggered when price touches or moves through the touch price                                          | Fast execution upon trigger           | Uncertain price impact