Saturday, November 14, 2015

Monty Hall problem simulation within Kdb+ q

Monty Hall problem is one of the paradox it would easily make you puzzled. If you are one of these puzzled one and still wondering how you would have 0.66 chance when you change your mind after a door is opened, following simulation in KDB+ q may help to persuade you:
mh:{[N]
        d:{.[000b;(1?3);:;1b]} each  til N; // create simulation with randomly setting car positions
        s:N?3; // choice of player
        i:(til N;s);  // index
        changed: sum not d @' s; // let see how many wins if player changed his mind
        :changed%N   // ratio  
  
 }

q)mh[100000]
0.66424

Tuesday, June 09, 2015

Mash-up technologies for a simple market data consumer

In this text i use following open source java libraries to simulate a  simple market data consumer application:

Market Data Consumer 


This application simulates a market data consumer. The ecosystem consists of two providers of market data, one aggregator service that consumes data from the providers and one file writer that writes the results to a CSV file.  The communication between the components should occur using a messaging framework(JeroMQ and Chronice Queue).


Provider

The two providers will publish data that conforms to the following interface:

interface IMarketValue
{
  String getProviderId ();
  String getName();
  double getValue();
}

Lets assume that providers are named “Orion“ and “Polaris”. The lists of instrument names for Orion are:
AAA
BBB
CCC
DDD
EEE
FFF
The instrument names for Polaris are:
AAA
BBB
CCC
GGG
HHH
III
JJJ

Providers generate random numbers (from 5.5 to 10.5) for the values as quick as possible without no throttling.

Aggregator 

Aggregator service listens to data from both providers. For instruments that are only available in one provider, the aggregator should pass the data down as is to the file writer. For instruments that are available from both providers, the aggregator should take the average of the last incoming data from both providers. E.g.
After receiving:
Orion – AAA – 6.3
Orion – FFF – 7.0
Polaris – AAA – 7.0
Polaris – BBB – 8.0
the aggregator should pass down:
AAA – 6.65
FFF – 7.0
BBB – 8.0

However, if Orion then publishes:
Orion – BBB – 7.0
the next aggregator output should contain:
BBB – 7.5

The aggregator contains a dependency while processing messages which introduces a 1 second delay to the processing loop which must be simulated. Newest available data should be always processed  for a given Name in the aggregator at any time. I.e. if the aggregator is stuck in a processing loop and 5 updates come through for AAA, the next processing run should use the latest available data for AAA.

File Writer

The file writer outputs in the following format and the entries should be in alphabetical order. Each new set of values should create a new file and there should be only one entry for each instrument.
Instrument,Value
AAA,6.65
BBB,5.5
FFF,7.0

Every time the file writer receives a new set of values it should create a new version of the file.  Should the file writer not be able to write to the output file, then an alternate filename should be used, but once the file is once more accessible it should return to the original filename.  It should also not create many files and only a new file when a file is not accessible.

Source Code

Full implementation of this application is checked in github:

Sunday, May 03, 2015

Which Concurrent Hash Map?


There are several open source java collection libraries which provide concurrent hash map functionality. But which one is faster and suitable for a low latency application?

In this text, following concurrent hash map classes are investigated in terms of their performance:

Wherever it is required, these parameters are used:
  • Initial Capacity: 16
  • Concurrency level: 16
  • Load factor: 0.75

Setup

Two kind of tests are done to benchmark Map#put and Map#get() methods of these classes. Following setup is applied:
  • As key for maps random generated strings with length of 8 are used.
  • For warm up 100K operation is done before benchmarks are done for each tests.
  • Each main benchmark is done with various number of tasks (10,100,1K,10K,100K,1M,10M)
  • Each benchmark is repeated 5 times and its average is used as result.
  • 10 producer (thread) is used to do put/get operations to simulate concurrency.
  • To reduce garbage collection (GC) activities, before each benchmark, GC is forced , System.gc() and a large heap size is set.
  • Oracle JDK 1.8.0.25 is used
  • Test are done on a Redhat , Intel G6 @ 2.93Ghz with 12 cores server.

Results

Following tables and graphs illustrates elapsed time (macroSeconds) of put/get benchmarks.

Put : Classes/Time (macroSecond) 10 100 1000 10000 100000 1000000 10000000
Javolution_FastMap 188 157 292 1684 16857 297025 4228273
Oracle_ConcurrentHashMap 91 119 170 970 12388 207857 2684529
GS_ConcurrentHashMapUnsafe 85 111 259 1017 13290 252151 3053912
GS_ConcurrentHashMap 78 112 205 1060 14051 258071 3104562
Gauva_ConcurrentHashMap 74 93 158 911 12485 205433 2574604

 


Get: Classes/Time (macroSecond) 10 100 1000 10000 100000 1000000 10000000
Javolution_FastMap 98 105 215 1630 15732 215260 2296100
Oracle_ConcurrentHashMap 78 78 115 619 7411 144583 1866310
GS_ConcurrentHashMapUnsafe 86 81 149 960 10342 172114 2204728
GS_ConcurrentHashMap 69 77 146 1119 10322 168585 2027240
Gauva_ConcurrentHashMap 70 71 114 621 7861 129094 1581683



Conclusion

As tables and figures shows that ConcurrentHashMap implementation of Guava and Oracle performs best for put and get operations. This analysis also exposes myths about performance of FastMap by Javolution.

For source code of benchmark, please click here.

Saturday, March 21, 2015

Multiple or Single Exit Point in a Java Methods

Recently, during a code review, a question popped up: Whether a method should have a single or multiple exit point once some conditions are satisfied. As discussed in here, there is not a clear, one cut answer for this question.

In terms of readability and exposing responsibility of method immediately, my personal preference is to return immediately from a method once task is done. Based on my experience with dealing complex, long methods with limited documentation, multiple returns in a method helped me to understand code easier in past.

But readability is not always the only factor when I write code in my current job. Performance (low latency) beats readability. So, I decided to benchmark these two methods:

Multiple return points 
public Number multipleReturnPoints(Number num) {
	if (num instanceof Integer) {
		return Integer.valueOf(num.intValue());
	}

	if (num instanceof Float) {
		return Float.valueOf(num.floatValue());
	}

	if (num instanceof Double) {
		return Double.valueOf(num.doubleValue());
	}
	return null;
}

Single returns in the end
public Number singleReturnPoint(Number num) {
	Number val = null;
	if (num instanceof Integer) {
		val = Integer.valueOf(num.intValue());
	}

	if (num instanceof Float) {
		val = Float.valueOf(num.floatValue());
	}

	if (num instanceof Double) {
		val = Double.valueOf(num.doubleValue());
	}
	return val;
}

I used following setup for testing:

  • Both method is called 5 million times, after 500K warm up  
  • In order to asses JIT compile's affect, two different schemes is used for methods parameter selection: 
    • Clustered (Batch): Same type parameter called in batch sequentially. In other words first Double, then Integer and finally Float type is chosen. 
    • Random : Type parameter is chosen from a uniform distribution. This is more realistic case would happen in a system. 
  • Following JRE are tested: 
    •  Oracle JRE 1.6.0_20 
    • Oracle JRE 1.7.0_15 
    • Oracle JRE 1.8.0_11
    • Azur Zing 1.7.0-5.9.5.0 
  • Garbage collection is prevented by allocating large amount of heap size
  • Tests are run on Redhat Linux  (64 bit,2.93GHz).

Results

Above benchmark test is done 7 times and in below charts, average latency (mSec) is illustrated .


 

Oracle JDK, although single returns perform marginally better , there is not much difference between multiple or single returns for both clustered random parameter selection. Whereas for Zing JVM, there is not a definitive conclusion. But given that random parameter selection is more realistic, single return is the winner, I think.
In conclusion, I could not find a definitive answer which method exit case performs better in terms of low latency. 
Source Code is as below:
package com.denizstij;

public interface Simulation {
	 Number run(Number num);
	 String getName();
}

////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
package com.denizstij;

import java.util.Arrays;
import java.util.Random;

public class SingleOrMultipleExitPoints {

	public static class MultipleReturns implements Simulation {


		public Number run(Number num) {
			if (num instanceof Integer) {
				return Integer.valueOf(num.intValue());
			}

			if (num instanceof Float) {
				return Float.valueOf(num.floatValue());
			}

			if (num instanceof Double) {
				return Double.valueOf(num.doubleValue());
			}
			return null;
		}
		@Override
		public String getName() {
			return "Multiple Returns";
		}

	}

	public static class SingleReturn implements Simulation {
		public Number run(Number num) {
			Number val = null;
			if (num instanceof Integer) {
				val = Integer.valueOf(num.intValue());
			}

			if (num instanceof Float) {
				val = Float.valueOf(num.floatValue());
			}

			if (num instanceof Double) {
				val = Double.valueOf(num.doubleValue());
			}

			return val;
		}
		@Override
		public String getName() {
			return "Single Return";
		}
	}
	

	public static void main(String[] args) {
		
		Random random = new Random(0);
		int warmUpLoopSize = 50000;
		int loopSize = warmUpLoopSize * 10;
		if (args.length==2){
			warmUpLoopSize=Integer.parseInt(args[0]);
			loopSize=Integer.parseInt(args[1]);
			System.out.println("Number of warmUpLoopSize:"+warmUpLoopSize);
			System.out.println("Number of loopSize  :"+loopSize);
		}		
		// Arrays for warming up JVM 
		Integer warmUpIntArr[] = new Integer[warmUpLoopSize];
		Float warmUpFloatArr[] = new Float[warmUpLoopSize];
		Double warmUpDoubleArr[] = new Double[warmUpLoopSize];
		// Simulation arrays
		Integer intArr[] = new Integer[loopSize];
		Float floatArr[] = new Float[loopSize];
		Double doubleArr[] = new Double[loopSize];
		// Creatring random numbers 
		createRandomNumbers(random, warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,warmUpLoopSize);
		createRandomNumbers(random, intArr, floatArr, doubleArr, loopSize);

		// Creating test cases 
		MultipleReturns multipleReturns = new MultipleReturns();
		SingleReturn singleReturn= new SingleReturn();
		
		int[] warmUpRandomCallSeq = generateRandomCallSequence(warmUpLoopSize,3,random);
		int[] warmUpClusteredCallSeq = generateClusteredCallSequence(warmUpLoopSize,3,random);
		int[] randomCallSeq = generateRandomCallSequence(loopSize,3,random);
		int[] clusteredCallSeq = generateClusteredCallSequence(loopSize,3,random);
		
		//System.out.println(Arrays.toString(warmUpRandomCallSeq));
		//System.out.println(Arrays.toString(warmUpClusteredCallSeq));
		
		String simName="Clustered Warmup";
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, multipleReturns,warmUpClusteredCallSeq);
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, singleReturn,warmUpClusteredCallSeq);
		
		simName="Clustered Calls";
		callSimulation(intArr, floatArr, doubleArr,simName, multipleReturns,clusteredCallSeq);
		callSimulation(intArr, floatArr, doubleArr,simName, singleReturn,clusteredCallSeq);
		System.out.println("==============================");
		
		simName="Random Warmup";
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, multipleReturns,warmUpRandomCallSeq);
		callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, singleReturn,warmUpRandomCallSeq);
		
		simName="Random Calls";
		callSimulation(intArr, floatArr, doubleArr,simName, multipleReturns,randomCallSeq);
		callSimulation(intArr, floatArr, doubleArr,simName, singleReturn,randomCallSeq);
		System.out.println("==============================");
		

	}

	private static void callSimulation(Integer[] intArr, Float[] floatArr, Double[] doubleArr, String msg, Simulation sim, int[] callSeq) {
		long now = System.currentTimeMillis();
		int clusterIndexes[]= new int[3];
		int index=0;
		for (int i = 0; i < callSeq.length; i++) {
			int callerIndex = callSeq[i];
			index=clusterIndexes[callerIndex]++;
			if (index>=intArr.length){
				continue;
			}		
			switch (callerIndex){
				case 0:
					doubleArr[index] = (Double) sim.run(doubleArr[index]);
					break;
				case 1:
					intArr[index] = (Integer) sim.run(intArr[index]);
					break;
				case 2:
					floatArr[index] = (Float) sim.run(floatArr[index]);
					break;
				default:
					System.out.println("Unknown cluster :("+callerIndex);
			}
		}

		long finish = System.currentTimeMillis();
		long elapsedTime = finish - now;
		System.out.println(elapsedTime + ": " +msg+" :"+ sim.getName());
		System.out.println(Arrays.toString(clusterIndexes));
		System.out.flush();
		
	}	
	
	static void createRandomNumbers(Random random, Integer intArr[],
			Float floatArr[], Double doubleArr[], int size) {
		for (int i = 0; i < size; i++) {
			intArr[i] = Integer.valueOf(random.nextInt(size));
			floatArr[i] = Float.valueOf(random.nextFloat() * size);
			doubleArr[i] = Double.valueOf(random.nextDouble() * size);
		}
	}
	
	static int[] generateClusteredCallSequence(int simSize,int numOfCluster,Random random){
		int totalSize=numOfCluster*simSize;
		int callSeq[]= new int[totalSize];
		int index=0;
		for (int j=0;j < numOfCluster;j++){
			for (int i=0;i < simSize;i++){
				callSeq[index++]=j;
			}
		}
		return callSeq;
	}
	
	static int[] generateRandomCallSequence(int simSize,int numOfCluster,Random random){
		int totalSize=numOfCluster*simSize;
		int callSeq[]= new int[totalSize];
		for (int i=0;i < totalSize;i++){
			callSeq[i]=random.nextInt(numOfCluster);
		}
		return callSeq;
	}
}

Wednesday, February 11, 2015

Knight Moves Problem in Java

In this text, i represent a solution for Knight Moves Problem in java. The problem is defined as follows:


Pictured here is a keypad:

We want to find all sequences of length n that can be keyed into the keypad in the following manner:
  • The initial keypress can be any of the keys.
  • Each subsequent keypress must be a knight move from the previous keypress. 
  • There can be at most 2 vowels in the sequence.
  • Run your solution for n = 1 to 32
A knight move is made in one of the following ways:
  • Move two steps horizontally and one step vertically.
  • Move two steps vertically and one step horizontally.
  • There is no wrapping allowed on a knight move.
Here are some examples of knight moves:


Given a value for n, the program should write the number of valid n-key sequences on a single line to standard out.


Knight Moves Solution

package com.denizstij.km;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

import com.denizstij.km.Keyboard.Key;
/**
 * 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com
 * 
 */
public class KnightMoveMain {
 private final int maxVowelCount;
 
 public KnightMoveMain(int maxVowelCount){
  this.maxVowelCount=maxVowelCount;
 }
 
 public long findNumberOfSequence(int lenOfSequence) {
  if (lenOfSequence<=0){   
   return 1;// number of empty set is 1
  }
  Map <PathInfoBean, Long> pathInfoMap= new LinkedHashMap<PathInfoBean, Long>();
  for (Key key : Keyboard.INSTANCE.getKeySet()) {   
   PathInfoBean pathinfo= new PathInfoBean(key, key.isVowel()?1:0);
   pathInfoMap.put(pathinfo, 1L);
  }
  
  long numberOfPaths = findNumberOfSequence1(pathInfoMap,1,lenOfSequence);
  return numberOfPaths;
 }
 
 private long findNumberOfSequence1(Map <PathInfoBean, Long> pathInfoMap ,int sequenceLen, int lenOfSequence) {

  if (sequenceLen>=lenOfSequence){
   long numberOfPaths=0;
   for (Long pathCount : pathInfoMap.values()) {   
    numberOfPaths+=pathCount;
   }      
   return numberOfPaths;
  }   
   
  Map <PathInfoBean, Long> pathInfoMapNew= new LinkedHashMap<PathInfoBean, Long>();
  for (Entry<PathInfoBean, Long> pathInfoBeanEntry : pathInfoMap.entrySet()) {
   PathInfoBean pathInfoBean = pathInfoBeanEntry.getKey();

   Long pathCount = pathInfoBeanEntry.getValue();
   Set<key> possibleMoveSet =pathInfoBeanEntry.getKey().getKey().getKnightMoves();
     
   for (Key key : possibleMoveSet) {
    int vowelCount=pathInfoBean.getVowelCount();
    if (key.isVowel()){
     if (vowelCount>=maxVowelCount){
      continue;
     } 
     vowelCount++;
    }
    PathInfoBean newPathInfoBean = new PathInfoBean(key,vowelCount);  
    Long currentPath = pathInfoMapNew.get(newPathInfoBean);
    if (currentPath==null)  // first time 
     pathInfoMapNew.put(newPathInfoBean,pathCount);
    else {
     // we have already, so just lets aggregate both branch
     pathInfoMapNew.put(newPathInfoBean,currentPath+pathCount);
    }
   }   
  }
  return findNumberOfSequence1(pathInfoMapNew,sequenceLen+1,lenOfSequence);
 }  
 
 public static void main(String[] args) {  
  KnightMoveMain km= new KnightMoveMain(2);
  long now = System.currentTimeMillis();
  for (int n=0;n<=32;n++) {
   long numberOfPaths = km.findNumberOfSequence(n);
   System.out.println(n+"==>"+numberOfPaths);   
  }
  long elapsedTime = System.currentTimeMillis()-now;
  System.out.println("Total Elapsed Time (msec):"+elapsedTime);
 }
}

//###############################################################
//###############################################################
//###############################################################

package com.denizstij.km;

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
/**
 * 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com
 * 
 */
public class Keyboard{

 final private  Set <key> KEYSET= new LinkedHashSet<key>();    
 final public static Key A= new Key('A', true);
 final public static Key B= new Key('B', false);
 final public static Key C= new Key('C', false);
 final public static Key D= new Key('D', false);
 final public static Key E= new Key('E', true);
 final public static Key F= new Key('F', false);
 final public static Key G= new Key('G', false);
 final public static Key H= new Key('H', false);
 final public static Key I= new Key('I', true);
 final public static Key J= new Key('J', false);
 final public static Key K= new Key('K', false);
 final public static Key L= new Key('L', false);
 final public static Key M= new Key('M', false);
 final public static Key N= new Key('N', false);
 final public static Key O= new Key('O', true);
 final public static Key _1= new Key('1', false);
 final public static Key _2= new Key('2', false);
 final public static Key _3= new Key('3', false);

 final public static Keyboard INSTANCE= new Keyboard();

  private void init(){  
  KEYSET.add(A.setKnightMoves(H,L));
  KEYSET.add(B.setKnightMoves(I,K,M));
  KEYSET.add(C.setKnightMoves(F,J,L,N));
  KEYSET.add(D.setKnightMoves(G,M,O));
  KEYSET.add(E.setKnightMoves(H,N));
  KEYSET.add(F.setKnightMoves(C,M,_1));
  KEYSET.add(G.setKnightMoves(D,N,_2));
  KEYSET.add(H.setKnightMoves(A,E,K,O,_1,_3));
  KEYSET.add(I.setKnightMoves(B,L,_2));
  KEYSET.add(J.setKnightMoves(C,M,_3));
  KEYSET.add(K.setKnightMoves(B,H,_2));
  KEYSET.add(L.setKnightMoves(A,C,I,_3));
  KEYSET.add(M.setKnightMoves(B,D,F,J));
  KEYSET.add(N.setKnightMoves(C,E,G,_1));
  KEYSET.add(O.setKnightMoves(D,H,_2));
  KEYSET.add(_1.setKnightMoves(F,H,N));
  KEYSET.add(_2.setKnightMoves(G,I,K,O));
  KEYSET.add(_3.setKnightMoves(H,J,L));   
 }
 
 // Singleton
 private Keyboard(){
  init();
 }

public Set <key> getKeySet() {
  return KEYSET;
 }

public static class Key{
 final private char key;
 final private boolean isVowel;
 private Set<key> knightMoves;
 
 Key(char key, boolean isVowel){
  this.key=key;
  this.isVowel=isVowel;
 }

 public Key setKnightMoves(Key ...knightMoves) {
  this.knightMoves=Collections.unmodifiableSet(new HashSet<key>(Arrays.asList(knightMoves)));
  return this;
 }

 public Set<key> getKnightMoves() {
  return knightMoves;
 }

 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + key;
  return result;
 }

 @Override
 public boolean equals(Object obj) {
  if (this == obj)
   return true;
  if (obj == null)
   return false;
  if (getClass() != obj.getClass())
   return false;
  Key other = (Key) obj;
  if (key != other.key)
   return false;
  return true;
 }
 public char getKey() {
  return key;
 }
 
 public boolean isVowel() {
  return isVowel;
 }
 @Override
 public String toString() {
  return "Key [key=" + key + ", isVowel=" + isVowel +"]";
 }
}
}

//###############################################################
//###############################################################
//###############################################################

package com.denizstij.km;

import com.denizstij.km.Keyboard.Key;

/**
 * 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com
 * 
 */
public class PathInfoBean {
 final private Key key; 
 final private int vowelCount;
 
 public PathInfoBean(Key key, int vowelCount) {   
  this.key = key;
  this.vowelCount = vowelCount;
 }
 
 public Key getKey() {
  return key;
 }
 
 public int getVowelCount() {
  return vowelCount;
 }


 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + ((key == null) ? 0 : key.hashCode());
  result = prime * result + vowelCount;
  return result;
 }

 @Override
 public boolean equals(Object obj) {
  if (this == obj)
   return true;
  if (obj == null)
   return false;
  if (getClass() != obj.getClass())
   return false;
  PathInfoBean other = (PathInfoBean) obj;
  if (key == null) {
   if (other.key != null)
    return false;
  } else if (!key.equals(other.key))
   return false;
  if (vowelCount != other.vowelCount)
   return false;
  return true;
 }
 
 @Override
 public String toString() {
  return "PathInfoBean [key=" + key + ", vowelCount=" + vowelCount + "]\n";
 }
}

Output
1==>18
2==>60
3==>214
4==>732
5==>2486
6==>8392
7==>27912
8==>93204
9==>306288
10==>1013398
11==>3302640
12==>10842528
13==>35108325
14==>114492546
15==>368772182
16==>1195650888
17==>3833937659
18==>12367469662
19==>39505386940
20==>126863960276
21==>403894417802
22==>1291845025388
23==>4100838894764
24==>13069438426724
25==>41380854190503
26==>131455706155352
27==>415265140628246
28==>1315326269976050
29==>4146551571701468
30==>13098978945964762
31==>41218021719932582
32==>129891093550589788

Total Elapsed Time (msec):450

Tuesday, February 10, 2015

Estimating Pi number with Monte Carlo Simulation in Java and Kdb+ q

package com.denizstij.montecarlo.sim;
 
import java.util.Random;
 
/**
*  This class estimates Pi with monte carlo simulation.
* 
 *  We assume we have a square with dimension 1x1 which contains a disk in middle with radious of 0.5
*  where it touches the square on 4 edges. We generate N number of pairs (x,y) between 0 and 1.
 *  Then we check if each pair is in the circle or not. 4 times ratio of total pairs in circle
*  to total simulation is an estimation of Pi. 
 * 
 *  pi=4*(total pairs in circle)/total pairs
* 
 * @author Deniz Turan, http://denizstij.blogspot.co.uk, denizstij AT gmail.com
*
 */
public class PIEstimationWithMonteCarloSimulation {
       // number of total simulation
       private static final long TOTAL_SIMULATION=10000000L;
      
       private double estimatePI(){            
              double pi=0;
              Random randomGen= new Random();
              long inCircleCounter=0;
              for (int i=0;i<TOTAL_SIMULATION;i++){
                     // generate pairs
                     double x = randomGen.nextDouble();
                     double y = randomGen.nextDouble();
                     // check if the pair  in the circle
                     inCircleCounter+=isInCircle(x,y);
              }
              double ratio=(double)inCircleCounter/TOTAL_SIMULATION;
              pi=ratio*4;
              return pi;
       }
      
       /*
       *  Check given two points in a circle: x^2+y^2<=1
       */
       private int isInCircle(double x, double y) {
              double z=Math.sqrt(x*x+y*y);            
              return z<=1?1:0;
       }
 
       public static void main(String[] args) {
              PIEstimationWithMonteCarloSimulation est= new PIEstimationWithMonteCarloSimulation();
              double estimatedPI = est.estimatePI();
              System.out.println("Estimated PI:"+estimatedPI);
       }
}
And now in Kdb+ q :
piEst:{[N]
        sqrArea:4;                                  / area of square with length 2 and containing a circle with radius 1
        xy:flip {2?1.0} each til N;                 / xy points
        p:sqrt sum xy*xy;                           / sqrt(x^2+y^2) <=1 to be in a circle
        totalIncircle: sum p<=1;
        piEst:sqrArea*totalIncircle%N;                    / that’s the estimation of PI
        :piEst
}
q)piEst[1000000]
3.141476
Above code can be reduced to one line such as:
q)N:10000000
q)(1%N)*4*sum 1>=sqrt sum ({x*x}') each flip {2?1.0} each til N
3.141842

Monday, February 09, 2015

Generating random numbers from a given set of numbers according to a distribution in java

Recently i am asked to implement following in Java based on the below interface:

Implement the method nextNum() and a minimal but effective set of unit tests.  As a quick check, given Random Numbers are [-1, 0, 1, 2, 3] and Probabilities are [0.01, 0.3, 0.58, 0.1, 0.01] if we call nextNum() 100 times we may get the following results. As the results are random, these particular results are unlikely.

-1: 1 times
0: 22 times
1: 57 times
2: 20 times
3: 0 times
public class RandomGen {
 // Values that may be returned by nextNum()
 private int[] randomNums;
 // Probability of the occurence of randomNums
 private float[] probabilities;

 /**
 Returns one of the randomNums. When this method is called
 multiple times over a long period, it should return the
 numbers roughly with the initialized probabilities.
 */
 public int nextNum() {

 }
}
Solution:
package com.denizstij.rand;

import java.util.Arrays;
import java.util.Random;
/**
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com
 */
public class RandomGen {
	// Values that may be returned by nextNum()
	private int[] randomNums;
	// Probability of the occurence of randomNums
	private float[] probabilities;
	private float[] cumProbabilities;
	private int[] validRandomNums;
	private int totalNonZeroProbElement=0;
	private Random random;
	
	public RandomGen(int[] randomNums,float[] probabilities){
		validateProcessInputs(randomNums,probabilities);
		random= new Random();
	}
	
	public RandomGen(int[] randomNums,float[] probabilities, long seed){
		validateProcessInputs(randomNums,probabilities);
		random= new Random(seed);
	}
		
	private void validateProcessInputs(int[] randomNums,float[] probabilities){			
		if (randomNums==null || probabilities==null || randomNums.length!= probabilities.length || probabilities.length==0){
			throw new IllegalArgumentException("RandomNums and probabilities must be non empty and same size");
		}
		
		int len=probabilities.length;
		cumProbabilities= new float[len];
		validRandomNums= new int[len];
				
		for (int i=0;i < len;i++) {			
			float prob = probabilities[i];
			int randomNum = randomNums[i];
			if (MathUtil.isLess(prob,0.0f)){
				throw new IllegalArgumentException("Probability can not be negative");
			}
			
			if (MathUtil.isGreater(prob,1.0f)){
				throw new IllegalArgumentException("Probability can not be greater than 1");
			}
			
			if (MathUtil.checkAnyIsNanOrInfinite(prob)){
				throw new IllegalArgumentException("Nan or Infite prob is not accepted");
			}

			// not processing elements with zero probabilities as they can not occur
			if (MathUtil.isEqual(prob, 0.0f)){
				continue;
			}
			// store valid random number
			validRandomNums[totalNonZeroProbElement]=randomNum;
			if (totalNonZeroProbElement==0){
				cumProbabilities[totalNonZeroProbElement]=prob;				
			} else {
				cumProbabilities[totalNonZeroProbElement]=prob+cumProbabilities[totalNonZeroProbElement-1];				
			}			
			if (MathUtil.isGreater(cumProbabilities[totalNonZeroProbElement],1.0f)){
				throw new IllegalArgumentException("Cumilative total probability can not be greater than 1");
			}
			totalNonZeroProbElement++;
		}
		
		if (totalNonZeroProbElement==0){
			throw new IllegalArgumentException("All probabilities are zero :(");
		}
		
		if (!MathUtil.isEqual(cumProbabilities[totalNonZeroProbElement-1],1.0f)){
			throw new IllegalArgumentException("Total probability must be 1");
		}

		// let's shrink arrays to save memory
		this.validRandomNums=Arrays.copyOf(validRandomNums, totalNonZeroProbElement);
		this.cumProbabilities=Arrays.copyOf(cumProbabilities, totalNonZeroProbElement); 
		this.randomNums=randomNums;
		this.probabilities=probabilities;
		
		// some logging 
		System.out.println("RandomNums :"+Arrays.toString(this.randomNums));
		System.out.println("probabilities :"+Arrays.toString(this.probabilities));
		System.out.println("totalNonZeroProbElement:"+totalNonZeroProbElement);
		System.out.println("validRandomNums :"+Arrays.toString(validRandomNums));
		System.out.println("cumProbabilities :"+Arrays.toString(cumProbabilities));				
	}
	
	/**
	Returns one of the randomNums. When this method is called
	multiple times over a long period, it should return the
	numbers roughly with the initialized probabilities.
	*/
	public int nextNum() {
		float nextFloat = random.nextFloat();
		return nextNum(nextFloat);
	}
	
	/**
	Returns one of the randomNums. When this method is called
	multiple times over a long period, it should return the
	numbers roughly with the initialized probabilities.
	*/
	// For testing purposes 
	protected int nextNum(float nextFloat) {		
		int index = Arrays.binarySearch(cumProbabilities,nextFloat);
		// if not found in the cum probability array, we need to infer index from insertation index 
		if (index<0 code="" index="-1*index-1;" int="" nextrandomval="" return="">

package com.denizstij.rand;

/**
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij@gmail.com
 */
public class MathUtil {
	public static final float FLOAT_COMPARE_RESOLUTION=0.00001f;	
	
	public static boolean isEqual(float val1, float val2) {		
		if (Math.abs(val1-val2)<FLOAT_COMPARE_RESOLUTION){
			return true;
		}				
		return false;
	}
	
	public static boolean isLess(float val1, float val2) {
		if (isEqual(val1,val2)){
			return false;
		}
		return val1<val2;
	}
	
	public static boolean isLessEqual(float val1, float val2) {
		if (isEqual(val1,val2)){
			return true;
		}
		return val1<val2;
	}
	
	public static boolean isGreater(float val1, float val2) {
		if (isEqual(val1,val2)){
			return false;
		}
		return val1>val2;
	}
	
	public static boolean isGreaterEqual(float val1, float val2) {
		if (isEqual(val1,val2)){
			return true;
		}
		return val1>val2;
	}
	
	public static boolean checkAnyIsNanOrInfinite(float ...values){
		for (float val : values) {
			if (Float.isNaN(val) || Float.isInfinite(val)){
				return true;
			}
		}
		return false;
	}
}
Junit Test:

package com.denizstij.rand.test;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;

import org.junit.Assert;
import org.junit.Test;

import com.denizstij.rand.RandomGen;

/**
 * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij@gmail.com
 */
public class RandomGenTest {
	private static final float SIM_PROB_THRESHOLD = 0.001f;

	@Test(expected=IllegalArgumentException.class)
	public void testNullInputs() {
		int[] randomNums=null;
		float[] probabilities=null;
		new RandomGen(randomNums, probabilities);		
	}

	@Test(expected=IllegalArgumentException.class)
	public void testEmptyArray() {
		int[] randomNums= new int[0];
		float[] probabilities=new float[0];
		new RandomGen(randomNums, probabilities);		
	}

	@Test(expected=IllegalArgumentException.class)
	public void testDifferentSizeNumsAndProbs() {
		int[] randomNums= {1};
		float[] probabilities={0.3f,0.4f};
		new RandomGen(randomNums, probabilities);		
	}
	
	@Test(expected=IllegalArgumentException.class)
	public void testProbsAndNumberArrayDifferentSize() {
		int[] randomNums= {1};
		float[] probabilities={0.3f,0.4f};
		new RandomGen(randomNums, probabilities);		
	}
	
	@Test(expected=IllegalArgumentException.class)
	public void testProbsIsNotNegative() {
		int[] randomNums= {1,2};
		float[] probabilities={0.3f,-0.4f};
		new RandomGen(randomNums, probabilities);		
	}
	
	@Test(expected=IllegalArgumentException.class)
	public void testProbsAllZero() {
		int[] randomNums= {1,2};
		float[] probabilities={0.0f,0.0f};
		new RandomGen(randomNums, probabilities);		
	}
	
	@Test(expected=IllegalArgumentException.class)
	public void testProbArrayHasNaN() {
		int[] randomNums= {1,2};
		float[] probabilities={Float.NaN,0.4f};
		new RandomGen(randomNums, probabilities);		
	}	
	
	@Test(expected=IllegalArgumentException.class)
	public void testProbsIsNotGreaterThanOne() {
		int[] randomNums= {1,2};
		float[] probabilities={0.3f,1.4f};
		new RandomGen(randomNums, probabilities);		
	}
	
	@Test(expected=IllegalArgumentException.class)
	public void testProbsTotalIsNotOne() {
		int[] randomNums= {1,2};
		float[] probabilities={0.3f,0.4f};
		new RandomGen(randomNums, probabilities);		
	}	
	
	@Test
	public void testCumulativeProbLogic() {
		int[] randomNums= {-1,-2,-4,3410,893,9,343,10};
		float[] probabilities={0.0f,0.0f,0.0f,0.3f,0.5f,0.0f,0.2f,0.0f};
		//validRandomNums :[3410, 893, 343]
		//cumProbabilities :[0.3, 0.8, 1.0]
		RandomGenWrapper randomGen = new RandomGenWrapper(randomNums, probabilities);
		// checking random number:3410 which has cum prob of 0<= and <=0.3
		int nextRandomNum = randomGen.nextNum(0.2f);		
		Assert.assertTrue(nextRandomNum==3410);
		nextRandomNum = randomGen.nextNum(0.299f);		
		Assert.assertTrue(nextRandomNum==3410);
		nextRandomNum = randomGen.nextNum(0.3f);		
		Assert.assertTrue(nextRandomNum==3410);

		// checking random number:893 which has cum prob of 0.3< and <=0.8		
		nextRandomNum = randomGen.nextNum(0.30001f);		
		Assert.assertTrue(nextRandomNum==893);
		nextRandomNum = randomGen.nextNum(0.51f);		
		Assert.assertTrue(nextRandomNum==893);
		nextRandomNum = randomGen.nextNum(0.799999f);		
		Assert.assertTrue(nextRandomNum==893);
		nextRandomNum = randomGen.nextNum(0.8f);		
		Assert.assertTrue(nextRandomNum==893);
		
		// checking random number:343 which has cum prob of 0.8< and <=1		
		nextRandomNum = randomGen.nextNum(0.800001f);
		Assert.assertTrue(nextRandomNum==343);
		nextRandomNum = randomGen.nextNum(0.9f);
		Assert.assertTrue(nextRandomNum==343);
		nextRandomNum = randomGen.nextNum(0.9999999f);
		Assert.assertTrue(nextRandomNum==343);
		nextRandomNum = randomGen.nextNum(1);
		Assert.assertTrue(nextRandomNum==343);
	}
	
	@Test
	public void testNonOccuringEventsWithProbZero() {
		// -1 event should never happen here	
		int[] randomNums= {5,-1,2,4};
		float[] probabilities={0.1f,0.0f,0.6f,0.3f};
		RandomGen randomGen = new RandomGen(randomNums, probabilities);
		//lets try several times if this true
		int maxTry=100000;
		for (int i=0;i<maxTry;i++){
			int nextNum = randomGen.nextNum();
			// make sure we never get -1 which has prob 0
			Assert.assertNotEquals(nextNum, -1);
		}		
	}
	
	@Test
	public void testAlwaysOccuringEventsWithProbOne() {
		// -1 event should always happen here	
		int[] randomNums= {5,-1,2,4};
		float[] probabilities={0.0f,1.0f,0.0f,0.0f};
		RandomGen randomGen = new RandomGen(randomNums, probabilities);
		//lets try several times if this true
		int maxTry=100000;
		for (int i=0;i<maxTry;i++){
			int nextNum = randomGen.nextNum();
			// make sure we always get -1 which has prob 1
			Assert.assertEquals(nextNum, -1);
		}		
	}
	
	/*
	 * Generate huge amount of random next number, given probability and random number arrays. 
	 * Then, frequency of each random number is estimated.
	 * Finally it is asserted that frequency is close to the input probability within a threshold (0.001)   
	 */
	@Test
	public void testProvidedQuickCheck() {
		int[] randomNums= {-1,0,1,2,3};
		float[] probabilities={0.01f, 0.3f, 0.58f, 0.1f, 0.01f};
		float[] simFrequency= new float[probabilities.length];
		
		Map <Integer,Long> counterMap= new LinkedHashMap<Integer,Long>();
		// reset counter map
		for (int randomNum : randomNums) {
			counterMap.put(randomNum, 0L);
		}
		
		RandomGen randomGen = new RandomGen(randomNums, probabilities,1234L);
		//lets generate many times random numbers
		int maxTry=10000000;
		for (int i=0;i<maxTry;i++){
			int nextNum = randomGen.nextNum();
			Long counter = counterMap.get(nextNum);			
			counter++;
			counterMap.put(nextNum, counter);			
		}
		
		// estimate frequency 
		int index=0;
		for (Entry<Integer, Long> entry : counterMap.entrySet()) {			
			Long counter = entry.getValue();
			// simulation frequencies
			simFrequency[index]=counter/(float)maxTry;
			float simProbDifference = Math.abs(simFrequency[index]-probabilities[index]);
			// 
			Assert.assertTrue(simProbDifference<SIM_PROB_THRESHOLD);
			index++;
		}

		System.out.println("Counter Map "+counterMap);
		System.out.println("Freq map:"+Arrays.toString(simFrequency));
		
	}
	private class RandomGenWrapper extends RandomGen{

		public RandomGenWrapper(int[] randomNums, float[] probabilities) {
			super(randomNums, probabilities);		
		}
		
		public int nextNum(float nextFloat){
			return super.nextNum(nextFloat);
		}
	}
}


Saturday, January 03, 2015

Stationarity Test with KPSS (Kwiatkowski–Phillips–Schmidt–Shin) in Python

In addition to augmented Dickey–Fuller test (ADF), KPSS (Kwiatkowski–Phillips–Schmidt–Shin) is widely used to verify stationarity of a signal. Python statsmodels contains ADF test, but I could not find any implementation of KPSS in python. Therefore, I converted R implementation of kpss.test in tseries library to python as follows (Click here to download the source code) :
"""
Created on Sat Jan-03-2015
@author: Deniz Turan (http://denizstij.blogspot.co.uk/)

"""
import numpy as np
import statsmodels.api as sm
from scipy.interpolate import interp1d

def kpssTest(x, regression="LEVEL",lshort = True):
    """
    KPSS Test for Stationarity

    Computes the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for the null hypothesis that x is level or trend stationary.

    Parameters
    ----------
    x : array_like, 1d
        data series
    regression : str {'LEVEL','TREND'} 
        Indicates the null hypothesis and must be one of "Level" (default) or "Trend". 
    lshort : bool
        a logical indicating whether the short or long version of the truncation lag parameter is used.

    Returns
    -------
    stat : float
        Test statistic
    pvalue : float
        the p-value of the test.
    usedlag : int
        Number of lags used.

    Notes
    -----
    Based on kpss.test function of tseries libraries in R.
    
    To estimate sigma^2 the Newey-West estimator is used. If lshort is TRUE, then the truncation lag parameter is set to trunc(3*sqrt(n)/13), otherwise trunc(10*sqrt(n)/14) is used. The p-values are interpolated from Table 1 of Kwiatkowski et al. (1992). If the computed statistic is outside the table of critical values, then a warning message is generated.

    Missing values are not handled.

    References
    ----------
    D. Kwiatkowski, P. C. B. Phillips, P. Schmidt, and Y. Shin (1992): Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root. Journal of Econometrics 54, 159--178.

    Examples
    --------
    x=numpy.random.randn(1000)  #   is level stationary    
    kpssTest(x)

    y=numpy.cumsum(x)           # has unit root    
    kpssTest(y)

    z=x+0.3*arange(1,len(x)+1)   # is trend stationary
    kpssTest(z,"TREND")

    """
    x = np.asarray(x,float)
    if len(x.shape)>1:
        raise ValueError("x is not an array or univariate time series")
    if regression not in ["LEVEL", "TREND"]:
        raise ValueError(("regression option %s not understood") % regression)

    n = x.shape[0]
    if regression=="TREND":
        t=range(1,n+1)
        t=sm.add_constant(t)
        res=sm.OLS(x,t).fit()
        e=res.resid
        table=[0.216, 0.176, 0.146, 0.119]
    else:
        t=np.ones(n)
        res=sm.OLS(x,t).fit()
        e=res.resid
        table=[0.739, 0.574, 0.463, 0.347]

    tablep=[0.01, 0.025, 0.05, 0.10]
    s=np.cumsum(e)
    eta=np.sum(np.power(s,2))/(np.power(n,2))
    s2 = np.sum(np.power(e,2))/n
    if lshort:
        l=np.trunc(3*np.sqrt(n)/13)
    else:
        l=np.trunc(10*np.sqrt(n)/14)
    usedlag =int(l)
    s2=R_pp_sum(e,len(e),usedlag ,s2)
    
    stat=eta/s2

    pvalue , msg=approx(table, tablep, stat)
    
    print "KPSS Test for ",regression," Stationarity\n"
    print ("KPSS %s=%f" % (regression, stat))
    print ("Truncation lag parameter=%d"% usedlag )
    print ("p-value=%f"%pvalue )

    if msg is not None:
        print "\nWarning:",msg 
    
    return ( stat,pvalue , usedlag )


def R_pp_sum (u, n, l, s):
    tmp1 = 0.0
    for i in range(1,l+1):
        tmp2 = 0.0
        for j in range(i,n):
            tmp2 += u[j]*u[j-i]
        tmp2 = tmp2*(1.0-(float(i)/((float(l)+1.0))))
        tmp1 = tmp1+tmp2
    
    tmp1 = tmp1/float(n)
    tmp1 = tmp1*2.0
    return s + tmp1

def approx(x,y,v):
    if (v>x[0]):
        return (y[0],"p-value smaller than printed p-value")
    if (v