Deniz's Notes: 2015

Saturday, November 14, 2015

Monty Hall problem simulation within Kdb+ q

Monty Hall problem is one of the paradox it would easily make you puzzled. If you are one of these puzzled one and still wondering how you would have 0.66 chance when you change your mind after a door is opened, following simulation in KDB+ q may help to persuade you:

mh:{[N]
        d:{.[000b;(1?3);:;1b]} each  til N; // create simulation with randomly setting car positions
        s:N?3; // choice of player
        i:(til N;s);  // index
        changed: sum not d @' s; // let see how many wins if player changed his mind
        :changed%N   // ratio  
  
 }

q)mh[100000]
0.66424

Tuesday, June 09, 2015

Mash-up technologies for a simple market data consumer

In this text i use following open source java libraries to simulate a simple market data consumer application:

Spring : Dependency injection
JeroMQ(ZeroMQ) : Pub/Sub between Provider and Aggregator
Chronicle Queue: Messaging between Aggregator and Writer
Smile Jackson: Object Serialization
Apache Commons Pool2 : Object Pooling
Guava: AtomicDouble
SLF4J and logback : Logging framework
Java 7 and Maven

Market Data Consumer

This application simulates a market data consumer. The ecosystem consists of two providers of market data, one aggregator service that consumes data from the providers and one file writer that writes the results to a CSV file. The communication between the components should occur using a messaging framework(JeroMQ and Chronice Queue).

Provider

The two providers will publish data that conforms to the following interface:

interface IMarketValue

{

String getProviderId ();

String getName();

double getValue();

}

Lets assume that providers are named “Orion“ and “Polaris”. The lists of instrument names for Orion are:

AAA

BBB

CCC

DDD

EEE

FFF

The instrument names for Polaris are:

AAA

BBB

CCC

GGG

HHH

III

JJJ

Providers generate random numbers (from 5.5 to 10.5) for the values as quick as possible without no throttling.

Aggregator

Aggregator service listens to data from both providers. For instruments that are only available in one provider, the aggregator should pass the data down as is to the file writer. For instruments that are available from both providers, the aggregator should take the average of the last incoming data from both providers. E.g.

After receiving:

Orion – AAA – 6.3

Orion – FFF – 7.0

Polaris – AAA – 7.0

Polaris – BBB – 8.0

the aggregator should pass down:

AAA – 6.65

FFF – 7.0

BBB – 8.0

However, if Orion then publishes:

Orion – BBB – 7.0

the next aggregator output should contain:

BBB – 7.5

The aggregator contains a dependency while processing messages which introduces a 1 second delay to the processing loop which must be simulated. Newest available data should be always processed for a given Name in the aggregator at any time. I.e. if the aggregator is stuck in a processing loop and 5 updates come through for AAA, the next processing run should use the latest available data for AAA.

File Writer

The file writer outputs in the following format and the entries should be in alphabetical order. Each new set of values should create a new file and there should be only one entry for each instrument.

Instrument,Value

AAA,6.65

BBB,5.5

FFF,7.0

Every time the file writer receives a new set of values it should create a new version of the file. Should the file writer not be able to write to the output file, then an alternate filename should be used, but once the file is once more accessible it should return to the original filename. It should also not create many files and only a new file when a file is not accessible.

Source Code

Full implementation of this application is checked in github:

https://github.com/denizstij/provider_aggregator_writer

Sunday, May 03, 2015

Which Concurrent Hash Map?

There are several open source java collection libraries which provide concurrent hash map functionality. But which one is faster and suitable for a low latency application?

In this text, following concurrent hash map classes are investigated in terms of their performance:

ConcurrentHashMap by Oracle (JDK 8)
ConcurrentHashMap and ConcurrentHashMapUnsafe by Goldman Sachs (GS) Collection (6.1.0)
FastMap by Javolution (6.0.0)
ConcurrentHashMap by Guava (18.0)

Wherever it is required, these parameters are used:

Initial Capacity: 16
Concurrency level: 16
Load factor: 0.75

Setup

Two kind of tests are done to benchmark Map#put and Map#get() methods of these classes. Following setup is applied:

As key for maps random generated strings with length of 8 are used.
For warm up 100K operation is done before benchmarks are done for each tests.
Each main benchmark is done with various number of tasks (10,100,1K,10K,100K,1M,10M)
Each benchmark is repeated 5 times and its average is used as result.
10 producer (thread) is used to do put/get operations to simulate concurrency.
To reduce garbage collection (GC) activities, before each benchmark, GC is forced , System.gc() and a large heap size is set.
Oracle JDK 1.8.0.25 is used
Test are done on a Redhat , Intel G6 @ 2.93Ghz with 12 cores server.

Results

Following tables and graphs illustrates elapsed time (macroSeconds) of put/get benchmarks.

*Put : Classes/Time (macroSecond)*	10	100	1000	10000	100000	1000000	10000000
Javolution_FastMap	188	157	292	1684	16857	297025	4228273
Oracle_ConcurrentHashMap	91	119	170	970	12388	207857	2684529
GS_ConcurrentHashMapUnsafe	85	111	259	1017	13290	252151	3053912
GS_ConcurrentHashMap	78	112	205	1060	14051	258071	3104562
Gauva_ConcurrentHashMap	74	93	158	911	12485	205433	2574604

*Get:* *Classes/Time (macroSecond)*	10	100	1000	10000	100000	1000000	10000000
Javolution_FastMap	98	105	215	1630	15732	215260	2296100
Oracle_ConcurrentHashMap	78	78	115	619	7411	144583	1866310
GS_ConcurrentHashMapUnsafe	86	81	149	960	10342	172114	2204728
GS_ConcurrentHashMap	69	77	146	1119	10322	168585	2027240
Gauva_ConcurrentHashMap	70	71	114	621	7861	129094	1581683

Conclusion

As tables and figures shows that ConcurrentHashMap implementation of Guava and Oracle performs best for put and get operations. This analysis also exposes myths about performance of FastMap by Javolution.

For source code of benchmark, please click here.

Saturday, March 21, 2015

Multiple or Single Exit Point in a Java Methods

Recently, during a code review, a question popped up: Whether a method should have a single or multiple exit point once some conditions are satisfied. As discussed in here, there is not a clear, one cut answer for this question.

In terms of readability and exposing responsibility of method immediately, my personal preference is to return immediately from a method once task is done. Based on my experience with dealing complex, long methods with limited documentation, multiple returns in a method helped me to understand code easier in past.

But readability is not always the only factor when I write code in my current job. Performance (low latency) beats readability. So, I decided to benchmark these two methods:

Multiple return points

public Number multipleReturnPoints(Number num) {
	if (num instanceof Integer) {
		return Integer.valueOf(num.intValue());
	}

	if (num instanceof Float) {
		return Float.valueOf(num.floatValue());
	}

	if (num instanceof Double) {
		return Double.valueOf(num.doubleValue());
	}
	return null;
}

Single returns in the end

public Number singleReturnPoint(Number num) {
	Number val = null;
	if (num instanceof Integer) {
		val = Integer.valueOf(num.intValue());
	}

	if (num instanceof Float) {
		val = Float.valueOf(num.floatValue());
	}

	if (num instanceof Double) {
		val = Double.valueOf(num.doubleValue());
	}
	return val;
}

I used following setup for testing:

Both method is called 5 million times, after 500K warm up
In order to asses JIT compile's affect, two different schemes is used for methods parameter selection:

Clustered (Batch): Same type parameter called in batch sequentially. In other words first Double, then Integer and finally Float type is chosen.
Random : Type parameter is chosen from a uniform distribution. This is more realistic case would happen in a system.

Following JRE are tested:

Oracle JRE 1.6.0_20
Oracle JRE 1.7.0_15
Oracle JRE 1.8.0_11
Azur Zing 1.7.0-5.9.5.0

Garbage collection is prevented by allocating large amount of heap size
Tests are run on Redhat Linux (64 bit,2.93GHz).

Results

Above benchmark test is done 7 times and in below charts, average latency (mSec) is illustrated .

Oracle JDK, although single returns perform marginally better , there is not much difference between multiple or single returns for both clustered random parameter selection. Whereas for Zing JVM, there is not a definitive conclusion. But given that random parameter selection is more realistic, single return is the winner, I think.

In conclusion, I could not find a definitive answer which method exit case performs better in terms of low latency.

Source Code is as below:

package com.denizstij;

public interface Simulation {
	 Number run(Number num);
	 String getName();
}

////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
package com.denizstij;

import java.util.Arrays;
import java.util.Random;

public class SingleOrMultipleExitPoints {

	public static class MultipleReturns implements Simulation {


		public Number run(Number num) {
			if (num instanceof Integer) {
				return Integer.valueOf(num.intValue());
			}

			if (num instanceof Float) {
				return Float.valueOf(num.floatValue());
			}

			if (num instanceof Double) {
				return Double.valueOf(num.doubleValue());
			}
			return null;
		}
		@Override
		public String getName() {
			return "Multiple Returns";
		}

	}

	public static class SingleReturn implements Simulation {
		public Number run(Number num) {
			Number val = null;
			if (num instanceof Integer) {
				val = Integer.valueOf(num.intValue());
			}

			if (num instanceof Float) {
				val = Float.valueOf(num.floatValue());
			}

			if (num instanceof Double) {
				val = Double.valueOf(num.doubleValue());
			}

			return val;
		}
		@Override
		public String getName() {
			return "Single Return";
		}
	}
	

	public static void main(String[] args) {
		
		Random random = new Random(0);
		int warmUpLoopSize = 50000;
		int loopSize = warmUpLoopSize * 10;
		if (args.length==2){
			warmUpLoopSize=Integer.parseInt(args[0]);
			loopSize=Integer.parseInt(args[1]);
			System.out.println("Number of warmUpLoopSize:"+warmUpLoopSize);
			System.out.println("Number of loopSize  :"+loopSize);
		}		
		// Arrays for warming up JVM 
		Integer warmUpIntArr[] = new Integer[warmUpLoopSize];
		Float warmUpFloatArr[] = new Float[warmUpLoopSize];
		Double warmUpDoubleArr[] = new Double[warmUpLoopSize];
		// Simulation arrays
		Integer intArr[] = new Integer[loopSize];
		Float floatArr[] = new Float[loopSize];
		Double doubleArr[] = new Double[loopSize];
		// Creatring random numbers 
		createRandomNumbers(random, warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,warmUpLoopSize);
		createRandomNumbers(random, intArr, floatArr, doubleArr, loopSize);

		// Creating test cases 
		MultipleReturns multipleReturns = new MultipleReturns();
		SingleReturn singleReturn= new SingleReturn();
		
		int[] warmUpRandomCallSeq = generateRandomCallSequence(warmUpLoopSize,3,random);
		int[] warmUpClusteredCallSeq = generateClusteredCallSequence(warmUpLoopSize,3,random);
		int[] randomCallSeq = generateRandomCallSequence(loopSize,3,random);
		int[] clusteredCallSeq = generateClusteredCallSequence(loopSize,3,random);
		
		//System.out.println(Arrays.toString(warmUpRandomCallSeq));
		//System.out.println(Arrays.toString(warmUpClusteredCallSeq));
		
		String simName="Clustered Warmup";
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, multipleReturns,warmUpClusteredCallSeq);
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, singleReturn,warmUpClusteredCallSeq);
		
		simName="Clustered Calls";
		callSimulation(intArr, floatArr, doubleArr,simName, multipleReturns,clusteredCallSeq);
		callSimulation(intArr, floatArr, doubleArr,simName, singleReturn,clusteredCallSeq);
		System.out.println("==============================");
		
		simName="Random Warmup";
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, multipleReturns,warmUpRandomCallSeq);
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, singleReturn,warmUpRandomCallSeq);
		
		simName="Random Calls";
		callSimulation(intArr, floatArr, doubleArr,simName, multipleReturns,randomCallSeq);
		callSimulation(intArr, floatArr, doubleArr,simName, singleReturn,randomCallSeq);
		System.out.println("==============================");
		

	}

	private static void callSimulation(Integer[] intArr, Float[] floatArr, Double[] doubleArr, String msg, Simulation sim, int[] callSeq) {
		long now = System.currentTimeMillis();
		int clusterIndexes[]= new int[3];
		int index=0;
		for (int i = 0; i < callSeq.length; i++) {
			int callerIndex = callSeq[i];
			index=clusterIndexes[callerIndex]++;
			if (index>=intArr.length){
				continue;
			}		
			switch (callerIndex){
				case 0:
					doubleArr[index] = (Double) sim.run(doubleArr[index]);
					break;
				case 1:
					intArr[index] = (Integer) sim.run(intArr[index]);
					break;
				case 2:
					floatArr[index] = (Float) sim.run(floatArr[index]);
					break;
				default:
					System.out.println("Unknown cluster :("+callerIndex);
			}
		}

		long finish = System.currentTimeMillis();
		long elapsedTime = finish - now;
		System.out.println(elapsedTime + ": " +msg+" :"+ sim.getName());
		System.out.println(Arrays.toString(clusterIndexes));
		System.out.flush();
		
	}	
	
	static void createRandomNumbers(Random random, Integer intArr[],
			Float floatArr[], Double doubleArr[], int size) {
		for (int i = 0; i < size; i++) {
			intArr[i] = Integer.valueOf(random.nextInt(size));
			floatArr[i] = Float.valueOf(random.nextFloat() * size);
			doubleArr[i] = Double.valueOf(random.nextDouble() * size);
		}
	}
	
	static int[] generateClusteredCallSequence(int simSize,int numOfCluster,Random random){
		int totalSize=numOfCluster*simSize;
		int callSeq[]= new int[totalSize];
		int index=0;
		for (int j=0;j < numOfCluster;j++){
			for (int i=0;i < simSize;i++){
				callSeq[index++]=j;
			}
		}
		return callSeq;
	}
	
	static int[] generateRandomCallSequence(int simSize,int numOfCluster,Random random){
		int totalSize=numOfCluster*simSize;
		int callSeq[]= new int[totalSize];
		for (int i=0;i < totalSize;i++){
			callSeq[i]=random.nextInt(numOfCluster);
		}
		return callSeq;
	}
}

Wednesday, February 11, 2015

Knight Moves Problem in Java

In this text, i represent a solution for Knight Moves Problem in java. The problem is defined as follows:

Pictured here is a keypad:

We want to find all sequences of length n that can be keyed into the keypad in the following manner:

The initial keypress can be any of the keys.
Each subsequent keypress must be a knight move from the previous keypress.
There can be at most 2 vowels in the sequence.
Run your solution for n = 1 to 32

A knight move is made in one of the following ways:

Move two steps horizontally and one step vertically.
Move two steps vertically and one step horizontally.
There is no wrapping allowed on a knight move.

Here are some examples of knight moves:

Given a value for n, the program should write the number of valid n-key sequences on a single line to standard out.

Knight Moves Solution


package com.denizstij.km;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

import com.denizstij.km.Keyboard.Key;
/**
 * 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com
 * 
 */
public class KnightMoveMain {
 private final int maxVowelCount;
 
 public KnightMoveMain(int maxVowelCount){
  this.maxVowelCount=maxVowelCount;
 }
 
 public long findNumberOfSequence(int lenOfSequence) {
  if (lenOfSequence<=0){   
   return 1;// number of empty set is 1
  }
  Map <PathInfoBean, Long> pathInfoMap= new LinkedHashMap<PathInfoBean, Long>();
  for (Key key : Keyboard.INSTANCE.getKeySet()) {   
   PathInfoBean pathinfo= new PathInfoBean(key, key.isVowel()?1:0);
   pathInfoMap.put(pathinfo, 1L);
  }
  
  long numberOfPaths = findNumberOfSequence1(pathInfoMap,1,lenOfSequence);
  return numberOfPaths;
 }
 
 private long findNumberOfSequence1(Map <PathInfoBean, Long> pathInfoMap ,int sequenceLen, int lenOfSequence) {

  if (sequenceLen>=lenOfSequence){
   long numberOfPaths=0;
   for (Long pathCount : pathInfoMap.values()) {   
    numberOfPaths+=pathCount;
   }      
   return numberOfPaths;
  }   
   
  Map <PathInfoBean, Long> pathInfoMapNew= new LinkedHashMap<PathInfoBean, Long>();
  for (Entry<PathInfoBean, Long> pathInfoBeanEntry : pathInfoMap.entrySet()) {
   PathInfoBean pathInfoBean = pathInfoBeanEntry.getKey();

   Long pathCount = pathInfoBeanEntry.getValue();
   Set<key> possibleMoveSet =pathInfoBeanEntry.getKey().getKey().getKnightMoves();
     
   for (Key key : possibleMoveSet) {
    int vowelCount=pathInfoBean.getVowelCount();
    if (key.isVowel()){
     if (vowelCount>=maxVowelCount){
      continue;
     } 
     vowelCount++;
    }
    PathInfoBean newPathInfoBean = new PathInfoBean(key,vowelCount);  
    Long currentPath = pathInfoMapNew.get(newPathInfoBean);
    if (currentPath==null)  // first time 
     pathInfoMapNew.put(newPathInfoBean,pathCount);
    else {
     // we have already, so just lets aggregate both branch
     pathInfoMapNew.put(newPathInfoBean,currentPath+pathCount);
    }
   }   
  }
  return findNumberOfSequence1(pathInfoMapNew,sequenceLen+1,lenOfSequence);
 }  
 
 public static void main(String[] args) {  
  KnightMoveMain km= new KnightMoveMain(2);
  long now = System.currentTimeMillis();
  for (int n=0;n<=32;n++) {
   long numberOfPaths = km.findNumberOfSequence(n);
   System.out.println(n+"==>"+numberOfPaths);   
  }
  long elapsedTime = System.currentTimeMillis()-now;
  System.out.println("Total Elapsed Time (msec):"+elapsedTime);
 }
}

//###############################################################
//###############################################################
//###############################################################

package com.denizstij.km;

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
/**
 * 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com
 * 
 */
public class Keyboard{

 final private  Set <key> KEYSET= new LinkedHashSet<key>();    
 final public static Key A= new Key('A', true);
 final public static Key B= new Key('B', false);
 final public static Key C= new Key('C', false);
 final public static Key D= new Key('D', false);
 final public static Key E= new Key('E', true);
 final public static Key F= new Key('F', false);
 final public static Key G= new Key('G', false);
 final public static Key H= new Key('H', false);
 final public static Key I= new Key('I', true);
 final public static Key J= new Key('J', false);
 final public static Key K= new Key('K', false);
 final public static Key L= new Key('L', false);
 final public static Key M= new Key('M', false);
 final public static Key N= new Key('N', false);
 final public static Key O= new Key('O', true);
 final public static Key _1= new Key('1', false);
 final public static Key _2= new Key('2', false);
 final public static Key _3= new Key('3', false);

 final public static Keyboard INSTANCE= new Keyboard();

  private void init(){  
  KEYSET.add(A.setKnightMoves(H,L));
  KEYSET.add(B.setKnightMoves(I,K,M));
  KEYSET.add(C.setKnightMoves(F,J,L,N));
  KEYSET.add(D.setKnightMoves(G,M,O));
  KEYSET.add(E.setKnightMoves(H,N));
  KEYSET.add(F.setKnightMoves(C,M,_1));
  KEYSET.add(G.setKnightMoves(D,N,_2));
  KEYSET.add(H.setKnightMoves(A,E,K,O,_1,_3));
  KEYSET.add(I.setKnightMoves(B,L,_2));
  KEYSET.add(J.setKnightMoves(C,M,_3));
  KEYSET.add(K.setKnightMoves(B,H,_2));
  KEYSET.add(L.setKnightMoves(A,C,I,_3));
  KEYSET.add(M.setKnightMoves(B,D,F,J));
  KEYSET.add(N.setKnightMoves(C,E,G,_1));
  KEYSET.add(O.setKnightMoves(D,H,_2));
  KEYSET.add(_1.setKnightMoves(F,H,N));
  KEYSET.add(_2.setKnightMoves(G,I,K,O));
  KEYSET.add(_3.setKnightMoves(H,J,L));   
 }
 
 // Singleton
 private Keyboard(){
  init();
 }

public Set <key> getKeySet() {
  return KEYSET;
 }

public static class Key{
 final private char key;
 final private boolean isVowel;
 private Set<key> knightMoves;
 
 Key(char key, boolean isVowel){
  this.key=key;
  this.isVowel=isVowel;
 }

 public Key setKnightMoves(Key ...knightMoves) {
  this.knightMoves=Collections.unmodifiableSet(new HashSet<key>(Arrays.asList(knightMoves)));
  return this;
 }

 public Set<key> getKnightMoves() {
  return knightMoves;
 }

 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + key;
  return result;
 }

 @Override
 public boolean equals(Object obj) {
  if (this == obj)
   return true;
  if (obj == null)
   return false;
  if (getClass() != obj.getClass())
   return false;
  Key other = (Key) obj;
  if (key != other.key)
   return false;
  return true;
 }
 public char getKey() {
  return key;
 }
 
 public boolean isVowel() {
  return isVowel;
 }
 @Override
 public String toString() {
  return "Key [key=" + key + ", isVowel=" + isVowel +"]";
 }
}
}

//###############################################################
//###############################################################
//###############################################################

package com.denizstij.km;

import com.denizstij.km.Keyboard.Key;

/**
 * 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com
 * 
 */
public class PathInfoBean {
 final private Key key; 
 final private int vowelCount;
 
 public PathInfoBean(Key key, int vowelCount) {   
  this.key = key;
  this.vowelCount = vowelCount;
 }
 
 public Key getKey() {
  return key;
 }
 
 public int getVowelCount() {
  return vowelCount;
 }


 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + ((key == null) ? 0 : key.hashCode());
  result = prime * result + vowelCount;
  return result;
 }

 @Override
 public boolean equals(Object obj) {
  if (this == obj)
   return true;
  if (obj == null)
   return false;
  if (getClass() != obj.getClass())
   return false;
  PathInfoBean other = (PathInfoBean) obj;
  if (key == null) {
   if (other.key != null)
    return false;
  } else if (!key.equals(other.key))
   return false;
  if (vowelCount != other.vowelCount)
   return false;
  return true;
 }
 
 @Override
 public String toString() {
  return "PathInfoBean [key=" + key + ", vowelCount=" + vowelCount + "]\n";
 }
}

Output

1==>18
2==>60
3==>214
4==>732
5==>2486
6==>8392
7==>27912
8==>93204
9==>306288
10==>1013398
11==>3302640
12==>10842528
13==>35108325
14==>114492546
15==>368772182
16==>1195650888
17==>3833937659
18==>12367469662
19==>39505386940
20==>126863960276
21==>403894417802
22==>1291845025388
23==>4100838894764
24==>13069438426724
25==>41380854190503
26==>131455706155352
27==>415265140628246
28==>1315326269976050
29==>4146551571701468
30==>13098978945964762
31==>41218021719932582
32==>129891093550589788

Total Elapsed Time (msec):450

Tuesday, February 10, 2015

Estimating Pi number with Monte Carlo Simulation in Java and Kdb+ q

package com.denizstij.montecarlo.sim;
 
import java.util.Random;
 
/**
*  This class estimates Pi with monte carlo simulation.
* 
 *  We assume we have a square with dimension 1x1 which contains a disk in middle with radious of 0.5
*  where it touches the square on 4 edges. We generate N number of pairs (x,y) between 0 and 1.
 *  Then we check if each pair is in the circle or not. 4 times ratio of total pairs in circle
*  to total simulation is an estimation of Pi. 
 * 
 *  pi=4*(total pairs in circle)/total pairs
* 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk, denizstij AT gmail.com
*
 */
public class PIEstimationWithMonteCarloSimulation {
       // number of total simulation
       private static final long TOTAL_SIMULATION=10000000L;
      
       private double estimatePI(){            
              double pi=0;
              Random randomGen= new Random();
              long inCircleCounter=0;
              for (int i=0;i<TOTAL_SIMULATION;i++){
                     // generate pairs
                     double x = randomGen.nextDouble();
                     double y = randomGen.nextDouble();
                     // check if the pair  in the circle
                     inCircleCounter+=isInCircle(x,y);
              }
              double ratio=(double)inCircleCounter/TOTAL_SIMULATION;
              pi=ratio*4;
              return pi;
       }
      
       /*
       *  Check given two points in a circle: x^2+y^2<=1
       */
       private int isInCircle(double x, double y) {
              double z=Math.sqrt(x*x+y*y);            
              return z<=1?1:0;
       }
 
       public static void main(String[] args) {
              PIEstimationWithMonteCarloSimulation est= new PIEstimationWithMonteCarloSimulation();
              double estimatedPI = est.estimatePI();
              System.out.println("Estimated PI:"+estimatedPI);
       }
}

And now in Kdb+ q :

piEst:{[N]
        sqrArea:4;                                  / area of square with length 2 and containing a circle with radius 1
        xy:flip {2?1.0} each til N;                 / xy points
        p:sqrt sum xy*xy;                           / sqrt(x^2+y^2) <=1 to be in a circle
        totalIncircle: sum p<=1;
        piEst:sqrArea*totalIncircle%N;                    / that’s the estimation of PI
        :piEst
}
q)piEst[1000000]
3.141476

Above code can be reduced to one line such as:

q)N:10000000
q)(1%N)*4*sum 1>=sqrt sum ({x*x}') each flip {2?1.0} each til N
3.141842

Monday, February 09, 2015

Generating random numbers from a given set of numbers according to a distribution in java

Recently i am asked to implement following in Java based on the below interface:

Implement the method nextNum() and a minimal but effective set of unit tests. As a quick check, given Random Numbers are [-1, 0, 1, 2, 3] and Probabilities are [0.01, 0.3, 0.58, 0.1, 0.01] if we call nextNum() 100 times we may get the following results. As the results are random, these particular results are unlikely.

-1: 1 times

0: 22 times

1: 57 times

2: 20 times

3: 0 times

public class RandomGen { // Values that may be returned by nextNum() private int[] randomNums; // Probability of the occurence of randomNums private float[] probabilities; /** Returns one of the randomNums. When this method is called multiple times over a long period, it should return the numbers roughly with the initialized probabilities. */ public int nextNum() { } }
Solution:

package com.denizstij.rand; import java.util.Arrays; import java.util.Random; /** * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com */ public class RandomGen { // Values that may be returned by nextNum() private int[] randomNums; // Probability of the occurence of randomNums private float[] probabilities; private float[] cumProbabilities; private int[] validRandomNums; private int totalNonZeroProbElement=0; private Random random; public RandomGen(int[] randomNums,float[] probabilities){ validateProcessInputs(randomNums,probabilities); random= new Random(); } public RandomGen(int[] randomNums,float[] probabilities, long seed){ validateProcessInputs(randomNums,probabilities); random= new Random(seed); } private void validateProcessInputs(int[] randomNums,float[] probabilities){ if (randomNums==null || probabilities==null || randomNums.length!= probabilities.length || probabilities.length==0){ throw new IllegalArgumentException("RandomNums and probabilities must be non empty and same size"); } int len=probabilities.length; cumProbabilities= new float[len]; validRandomNums= new int[len]; for (int i=0;i < len;i++) { float prob = probabilities[i]; int randomNum = randomNums[i]; if (MathUtil.isLess(prob,0.0f)){ throw new IllegalArgumentException("Probability can not be negative"); } if (MathUtil.isGreater(prob,1.0f)){ throw new IllegalArgumentException("Probability can not be greater than 1"); } if (MathUtil.checkAnyIsNanOrInfinite(prob)){ throw new IllegalArgumentException("Nan or Infite prob is not accepted"); } // not processing elements with zero probabilities as they can not occur if (MathUtil.isEqual(prob, 0.0f)){ continue; } // store valid random number validRandomNums[totalNonZeroProbElement]=randomNum; if (totalNonZeroProbElement==0){ cumProbabilities[totalNonZeroProbElement]=prob; } else { cumProbabilities[totalNonZeroProbElement]=prob+cumProbabilities[totalNonZeroProbElement-1]; } if (MathUtil.isGreater(cumProbabilities[totalNonZeroProbElement],1.0f)){ throw new IllegalArgumentException("Cumilative total probability can not be greater than 1"); } totalNonZeroProbElement++; } if (totalNonZeroProbElement==0){ throw new IllegalArgumentException("All probabilities are zero :("); } if (!MathUtil.isEqual(cumProbabilities[totalNonZeroProbElement-1],1.0f)){ throw new IllegalArgumentException("Total probability must be 1"); } // let's shrink arrays to save memory this.validRandomNums=Arrays.copyOf(validRandomNums, totalNonZeroProbElement); this.cumProbabilities=Arrays.copyOf(cumProbabilities, totalNonZeroProbElement); this.randomNums=randomNums; this.probabilities=probabilities; // some logging System.out.println("RandomNums :"+Arrays.toString(this.randomNums)); System.out.println("probabilities :"+Arrays.toString(this.probabilities)); System.out.println("totalNonZeroProbElement:"+totalNonZeroProbElement); System.out.println("validRandomNums :"+Arrays.toString(validRandomNums)); System.out.println("cumProbabilities :"+Arrays.toString(cumProbabilities)); } /** Returns one of the randomNums. When this method is called multiple times over a long period, it should return the numbers roughly with the initialized probabilities. */ public int nextNum() { float nextFloat = random.nextFloat(); return nextNum(nextFloat); } /** Returns one of the randomNums. When this method is called multiple times over a long period, it should return the numbers roughly with the initialized probabilities. */ // For testing purposes protected int nextNum(float nextFloat) { int index = Arrays.binarySearch(cumProbabilities,nextFloat); // if not found in the cum probability array, we need to infer index from insertation index if (index<0 code="" index="-1*index-1;" int="" nextrandomval="" return="">

package com.denizstij.rand; /** * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij@gmail.com */ public class MathUtil { public static final float FLOAT_COMPARE_RESOLUTION=0.00001f; public static boolean isEqual(float val1, float val2) { if (Math.abs(val1-val2)<FLOAT_COMPARE_RESOLUTION){ return true; } return false; } public static boolean isLess(float val1, float val2) { if (isEqual(val1,val2)){ return false; } return val1<val2; } public static boolean isLessEqual(float val1, float val2) { if (isEqual(val1,val2)){ return true; } return val1<val2; } public static boolean isGreater(float val1, float val2) { if (isEqual(val1,val2)){ return false; } return val1>val2; } public static boolean isGreaterEqual(float val1, float val2) { if (isEqual(val1,val2)){ return true; } return val1>val2; } public static boolean checkAnyIsNanOrInfinite(float ...values){ for (float val : values) { if (Float.isNaN(val) || Float.isInfinite(val)){ return true; } } return false; } }
Junit Test:
package com.denizstij.rand.test; import java.util.Arrays; import java.util.LinkedHashMap; import java.util.Map; import java.util.Map.Entry; import org.junit.Assert; import org.junit.Test; import com.denizstij.rand.RandomGen; /** * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij@gmail.com */ public class RandomGenTest { private static final float SIM_PROB_THRESHOLD = 0.001f; @Test(expected=IllegalArgumentException.class) public void testNullInputs() { int[] randomNums=null; float[] probabilities=null; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testEmptyArray() { int[] randomNums= new int[0]; float[] probabilities=new float[0]; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testDifferentSizeNumsAndProbs() { int[] randomNums= {1}; float[] probabilities={0.3f,0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsAndNumberArrayDifferentSize() { int[] randomNums= {1}; float[] probabilities={0.3f,0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsIsNotNegative() { int[] randomNums= {1,2}; float[] probabilities={0.3f,-0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsAllZero() { int[] randomNums= {1,2}; float[] probabilities={0.0f,0.0f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbArrayHasNaN() { int[] randomNums= {1,2}; float[] probabilities={Float.NaN,0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsIsNotGreaterThanOne() { int[] randomNums= {1,2}; float[] probabilities={0.3f,1.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsTotalIsNotOne() { int[] randomNums= {1,2}; float[] probabilities={0.3f,0.4f}; new RandomGen(randomNums, probabilities); } @Test public void testCumulativeProbLogic() { int[] randomNums= {-1,-2,-4,3410,893,9,343,10}; float[] probabilities={0.0f,0.0f,0.0f,0.3f,0.5f,0.0f,0.2f,0.0f}; //validRandomNums :[3410, 893, 343] //cumProbabilities :[0.3, 0.8, 1.0] RandomGenWrapper randomGen = new RandomGenWrapper(randomNums, probabilities); // checking random number:3410 which has cum prob of 0<= and <=0.3 int nextRandomNum = randomGen.nextNum(0.2f); Assert.assertTrue(nextRandomNum==3410); nextRandomNum = randomGen.nextNum(0.299f); Assert.assertTrue(nextRandomNum==3410); nextRandomNum = randomGen.nextNum(0.3f); Assert.assertTrue(nextRandomNum==3410); // checking random number:893 which has cum prob of 0.3< and <=0.8 nextRandomNum = randomGen.nextNum(0.30001f); Assert.assertTrue(nextRandomNum==893); nextRandomNum = randomGen.nextNum(0.51f); Assert.assertTrue(nextRandomNum==893); nextRandomNum = randomGen.nextNum(0.799999f); Assert.assertTrue(nextRandomNum==893); nextRandomNum = randomGen.nextNum(0.8f); Assert.assertTrue(nextRandomNum==893); // checking random number:343 which has cum prob of 0.8< and <=1 nextRandomNum = randomGen.nextNum(0.800001f); Assert.assertTrue(nextRandomNum==343); nextRandomNum = randomGen.nextNum(0.9f); Assert.assertTrue(nextRandomNum==343); nextRandomNum = randomGen.nextNum(0.9999999f); Assert.assertTrue(nextRandomNum==343); nextRandomNum = randomGen.nextNum(1); Assert.assertTrue(nextRandomNum==343); } @Test public void testNonOccuringEventsWithProbZero() { // -1 event should never happen here int[] randomNums= {5,-1,2,4}; float[] probabilities={0.1f,0.0f,0.6f,0.3f}; RandomGen randomGen = new RandomGen(randomNums, probabilities); //lets try several times if this true int maxTry=100000; for (int i=0;i<maxTry;i++){ int nextNum = randomGen.nextNum(); // make sure we never get -1 which has prob 0 Assert.assertNotEquals(nextNum, -1); } } @Test public void testAlwaysOccuringEventsWithProbOne() { // -1 event should always happen here int[] randomNums= {5,-1,2,4}; float[] probabilities={0.0f,1.0f,0.0f,0.0f}; RandomGen randomGen = new RandomGen(randomNums, probabilities); //lets try several times if this true int maxTry=100000; for (int i=0;i<maxTry;i++){ int nextNum = randomGen.nextNum(); // make sure we always get -1 which has prob 1 Assert.assertEquals(nextNum, -1); } } /* * Generate huge amount of random next number, given probability and random number arrays. * Then, frequency of each random number is estimated. * Finally it is asserted that frequency is close to the input probability within a threshold (0.001) */ @Test public void testProvidedQuickCheck() { int[] randomNums= {-1,0,1,2,3}; float[] probabilities={0.01f, 0.3f, 0.58f, 0.1f, 0.01f}; float[] simFrequency= new float[probabilities.length]; Map <Integer,Long> counterMap= new LinkedHashMap<Integer,Long>(); // reset counter map for (int randomNum : randomNums) { counterMap.put(randomNum, 0L); } RandomGen randomGen = new RandomGen(randomNums, probabilities,1234L); //lets generate many times random numbers int maxTry=10000000; for (int i=0;i<maxTry;i++){ int nextNum = randomGen.nextNum(); Long counter = counterMap.get(nextNum); counter++; counterMap.put(nextNum, counter); } // estimate frequency int index=0; for (Entry<Integer, Long> entry : counterMap.entrySet()) { Long counter = entry.getValue(); // simulation frequencies simFrequency[index]=counter/(float)maxTry; float simProbDifference = Math.abs(simFrequency[index]-probabilities[index]); // Assert.assertTrue(simProbDifference<SIM_PROB_THRESHOLD); index++; } System.out.println("Counter Map "+counterMap); System.out.println("Freq map:"+Arrays.toString(simFrequency)); } private class RandomGenWrapper extends RandomGen{ public RandomGenWrapper(int[] randomNums, float[] probabilities) { super(randomNums, probabilities); } public int nextNum(float nextFloat){ return super.nextNum(nextFloat); } } }

Posted by Deniz Turan, PhD at 11:20 am 1 comment:

Saturday, January 03, 2015

Stationarity Test with KPSS (Kwiatkowski–Phillips–Schmidt–Shin) in Python

In addition to augmented Dickey–Fuller test (ADF), KPSS (Kwiatkowski–Phillips–Schmidt–Shin) is widely used to verify stationarity of a signal. Python statsmodels contains ADF test, but I could not find any implementation of KPSS in python. Therefore, I converted R implementation of kpss.test in tseries library to python as follows (Click here to download the source code) :

""" Created on Sat Jan-03-2015 @author: Deniz Turan (http://denizstij.blogspot.co.uk/) """ import numpy as np import statsmodels.api as sm from scipy.interpolate import interp1d def kpssTest(x, regression="LEVEL",lshort = True): """ KPSS Test for Stationarity Computes the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for the null hypothesis that x is level or trend stationary. Parameters ---------- x : array_like, 1d data series regression : str {'LEVEL','TREND'} Indicates the null hypothesis and must be one of "Level" (default) or "Trend". lshort : bool a logical indicating whether the short or long version of the truncation lag parameter is used. Returns ------- stat : float Test statistic pvalue : float the p-value of the test. usedlag : int Number of lags used. Notes ----- Based on kpss.test function of tseries libraries in R. To estimate sigma^2 the Newey-West estimator is used. If lshort is TRUE, then the truncation lag parameter is set to trunc(3*sqrt(n)/13), otherwise trunc(10*sqrt(n)/14) is used. The p-values are interpolated from Table 1 of Kwiatkowski et al. (1992). If the computed statistic is outside the table of critical values, then a warning message is generated. Missing values are not handled. References ---------- D. Kwiatkowski, P. C. B. Phillips, P. Schmidt, and Y. Shin (1992): Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root. Journal of Econometrics 54, 159--178. Examples -------- x=numpy.random.randn(1000) # is level stationary kpssTest(x) y=numpy.cumsum(x) # has unit root kpssTest(y) z=x+0.3*arange(1,len(x)+1) # is trend stationary kpssTest(z,"TREND") """ x = np.asarray(x,float) if len(x.shape)>1: raise ValueError("x is not an array or univariate time series") if regression not in ["LEVEL", "TREND"]: raise ValueError(("regression option %s not understood") % regression) n = x.shape[0] if regression=="TREND": t=range(1,n+1) t=sm.add_constant(t) res=sm.OLS(x,t).fit() e=res.resid table=[0.216, 0.176, 0.146, 0.119] else: t=np.ones(n) res=sm.OLS(x,t).fit() e=res.resid table=[0.739, 0.574, 0.463, 0.347] tablep=[0.01, 0.025, 0.05, 0.10] s=np.cumsum(e) eta=np.sum(np.power(s,2))/(np.power(n,2)) s2 = np.sum(np.power(e,2))/n if lshort: l=np.trunc(3*np.sqrt(n)/13) else: l=np.trunc(10*np.sqrt(n)/14) usedlag =int(l) s2=R_pp_sum(e,len(e),usedlag ,s2) stat=eta/s2 pvalue , msg=approx(table, tablep, stat) print "KPSS Test for ",regression," Stationarity\n" print ("KPSS %s=%f" % (regression, stat)) print ("Truncation lag parameter=%d"% usedlag ) print ("p-value=%f"%pvalue ) if msg is not None: print "\nWarning:",msg return ( stat,pvalue , usedlag ) def R_pp_sum (u, n, l, s): tmp1 = 0.0 for i in range(1,l+1): tmp2 = 0.0 for j in range(i,n): tmp2 += u[j]*u[j-i] tmp2 = tmp2*(1.0-(float(i)/((float(l)+1.0)))) tmp1 = tmp1+tmp2 tmp1 = tmp1/float(n) tmp1 = tmp1*2.0 return s + tmp1 def approx(x,y,v): if (v>x[0]): return (y[0],"p-value smaller than printed p-value") if (v

Posted by Deniz Turan, PhD at 12:07 pm 3 comments:

Labels: Augmented Dickey-Fuller (ADF), finance, Financial Time Series Analysis, Johansen, Johansen Test, Variance Ratio: VRTest

Blog Archive

► 2022 (1)

► October (1)

► 2021 (8)

► February (8)

▼ 2015 (8)

▼ November (1)

Monty Hall problem simulation within Kdb+ q

► June (1)

Mash-up technologies for a simple market data cons...

► May (1)

Which Concurrent Hash Map?

► March (1)

Multiple or Single Exit Point in a Java Methods

► February (3)

Knight Moves Problem in Java

Estimating Pi number with Monte Carlo Simulation i...

Generating random numbers from a given set of numb...

► January (1)

Stationarity Test with KPSS (Kwiatkowski–Phillips–...

► 2014 (12)

► March (8)

► February (2)

► January (2)

► 2013 (10)

► December (2)

► November (5)

► October (2)

► September (1)

► 2012 (3)

► April (1)

► March (1)

► January (1)

► 2011 (3)

► December (2)

► January (1)

► 2010 (2)

► December (1)

► April (1)

► 2009 (15)

► October (3)

► September (3)

► August (3)

► July (3)

► May (1)

► April (1)

► February (1)

► 2008 (14)

► October (3)

► September (2)

► August (1)

► June (4)

► May (4)

About Me

Deniz Turan, PhD

London, UK, United Kingdom

View my complete profile