mh:{[N] d:{.[000b;(1?3);:;1b]} each til N; // create simulation with randomly setting car positions s:N?3; // choice of player i:(til N;s); // index changed: sum not d @' s; // let see how many wins if player changed his mind :changed%N // ratio } q)mh[100000] 0.66424
Saturday, November 14, 2015
Monty Hall problem simulation within Kdb+ q
Monty Hall problem is one of the paradox it would easily make you puzzled. If you are one of these puzzled one and still wondering how you would have 0.66 chance when you change your mind after a door is opened, following simulation in KDB+ q may help to persuade you:
Tuesday, June 09, 2015
Mash-up technologies for a simple market data consumer
In this text i use following open source java libraries to simulate a simple market data consumer application:
- Spring : Dependency injection
- JeroMQ(ZeroMQ) : Pub/Sub between Provider and Aggregator
- Chronicle Queue: Messaging between Aggregator and Writer
- Smile Jackson: Object Serialization
- Apache Commons Pool2 : Object Pooling
- Guava: AtomicDouble
- SLF4J and logback : Logging framework
- Java 7 and Maven
Market Data Consumer
This application simulates a market data consumer. The ecosystem consists of two providers of market data, one aggregator service that consumes data from the providers and one file writer that writes the results to a CSV file. The communication between the components should occur using a messaging framework(JeroMQ and Chronice Queue).
Provider
The two providers will publish data that conforms to the following interface:
interface IMarketValue
{
String getProviderId ();
String getName();
double getValue();
}
Lets assume that providers are named “Orion“ and “Polaris”. The lists of instrument names for Orion are:
AAA
BBB
CCC
DDD
EEE
FFF
The instrument names for Polaris are:
AAA
BBB
CCC
GGG
HHH
III
JJJ
Providers generate random numbers (from 5.5 to 10.5) for the values as quick as possible without no throttling.
Aggregator
Aggregator service listens to data from both providers. For instruments that are only available in one provider, the aggregator should pass the data down as is to the file writer. For instruments that are available from both providers, the aggregator should take the average of the last incoming data from both providers. E.g.
After receiving:
Orion – AAA – 6.3
Orion – FFF – 7.0
Polaris – AAA – 7.0
Polaris – BBB – 8.0
the aggregator should pass down:
AAA – 6.65
FFF – 7.0
BBB – 8.0
However, if Orion then publishes:
Orion – BBB – 7.0
the next aggregator output should contain:
BBB – 7.5
The aggregator contains a dependency while processing messages which introduces a 1 second delay to the processing loop which must be simulated. Newest available data should be always processed for a given Name in the aggregator at any time. I.e. if the aggregator is stuck in a processing loop and 5 updates come through for AAA, the next processing run should use the latest available data for AAA.
File Writer
The file writer outputs in the following format and the entries should be in alphabetical order. Each new set of values should create a new file and there should be only one entry for each instrument.
Instrument,Value
AAA,6.65
BBB,5.5
FFF,7.0
Every time the file writer receives a new set of values it should create a new version of the file. Should the file writer not be able to write to the output file, then an alternate filename should be used, but once the file is once more accessible it should return to the original filename. It should also not create many files and only a new file when a file is not accessible.
Source Code
Full implementation of this application is checked in github:
Sunday, May 03, 2015
Which Concurrent Hash Map?
There are several open source java collection libraries which provide concurrent hash map functionality. But which one is faster and suitable for a low latency application?
In this text, following concurrent hash map classes are investigated in terms of their performance:
- ConcurrentHashMap by Oracle (JDK 8)
- ConcurrentHashMap and ConcurrentHashMapUnsafe by Goldman Sachs (GS) Collection (6.1.0)
- FastMap by Javolution (6.0.0)
- ConcurrentHashMap by Guava (18.0)
Wherever it is required, these parameters are used:
- Initial Capacity: 16
- Concurrency level: 16
- Load factor: 0.75
Setup
Two kind of tests are done to benchmark Map#put and Map#get() methods of these classes. Following setup is applied:- As key for maps random generated strings with length of 8 are used.
- For warm up 100K operation is done before benchmarks are done for each tests.
- Each main benchmark is done with various number of tasks (10,100,1K,10K,100K,1M,10M)
- Each benchmark is repeated 5 times and its average is used as result.
- 10 producer (thread) is used to do put/get operations to simulate concurrency.
- To reduce garbage collection (GC) activities, before each benchmark, GC is forced , System.gc() and a large heap size is set.
- Oracle JDK 1.8.0.25 is used
- Test are done on a Redhat , Intel G6 @ 2.93Ghz with 12 cores server.
Results
Following tables and graphs illustrates elapsed time (macroSeconds) of put/get benchmarks.Put : Classes/Time (macroSecond) | 10 | 100 | 1000 | 10000 | 100000 | 1000000 | 10000000 |
Javolution_FastMap | 188 | 157 | 292 | 1684 | 16857 | 297025 | 4228273 |
Oracle_ConcurrentHashMap | 91 | 119 | 170 | 970 | 12388 | 207857 | 2684529 |
GS_ConcurrentHashMapUnsafe | 85 | 111 | 259 | 1017 | 13290 | 252151 | 3053912 |
GS_ConcurrentHashMap | 78 | 112 | 205 | 1060 | 14051 | 258071 | 3104562 |
Gauva_ConcurrentHashMap | 74 | 93 | 158 | 911 | 12485 | 205433 | 2574604 |
Get: Classes/Time (macroSecond) | 10 | 100 | 1000 | 10000 | 100000 | 1000000 | 10000000 |
Javolution_FastMap | 98 | 105 | 215 | 1630 | 15732 | 215260 | 2296100 |
Oracle_ConcurrentHashMap | 78 | 78 | 115 | 619 | 7411 | 144583 | 1866310 |
GS_ConcurrentHashMapUnsafe | 86 | 81 | 149 | 960 | 10342 | 172114 | 2204728 |
GS_ConcurrentHashMap | 69 | 77 | 146 | 1119 | 10322 | 168585 | 2027240 |
Gauva_ConcurrentHashMap | 70 | 71 | 114 | 621 | 7861 | 129094 | 1581683 |
Conclusion
As tables and figures shows that ConcurrentHashMap implementation of Guava and Oracle performs best for put and get operations. This analysis also exposes myths about performance of FastMap by Javolution.For source code of benchmark, please click here.
Saturday, March 21, 2015
Multiple or Single Exit Point in a Java Methods
Recently, during a code review, a question popped up: Whether a method should have a single or multiple exit point once some conditions are satisfied. As discussed in here, there is not a clear, one cut answer for this question.
In terms of readability and exposing responsibility of method immediately, my personal preference is to return immediately from a method once task is done. Based on my experience with dealing complex, long methods with limited documentation, multiple returns in a method helped me to understand code easier in past.
But readability is not always the only factor when I write code in my current job. Performance (low latency) beats readability. So, I decided to benchmark these two methods:
Multiple return points
Single returns in the end
Source Code is as below:
In terms of readability and exposing responsibility of method immediately, my personal preference is to return immediately from a method once task is done. Based on my experience with dealing complex, long methods with limited documentation, multiple returns in a method helped me to understand code easier in past.
But readability is not always the only factor when I write code in my current job. Performance (low latency) beats readability. So, I decided to benchmark these two methods:
Multiple return points
public Number multipleReturnPoints(Number num) { if (num instanceof Integer) { return Integer.valueOf(num.intValue()); } if (num instanceof Float) { return Float.valueOf(num.floatValue()); } if (num instanceof Double) { return Double.valueOf(num.doubleValue()); } return null; }
Single returns in the end
public Number singleReturnPoint(Number num) { Number val = null; if (num instanceof Integer) { val = Integer.valueOf(num.intValue()); } if (num instanceof Float) { val = Float.valueOf(num.floatValue()); } if (num instanceof Double) { val = Double.valueOf(num.doubleValue()); } return val; }
I used following setup for testing:
- Both method is called 5 million times, after 500K warm up
- In order to asses JIT compile's affect, two different schemes is used for methods parameter selection:
- Clustered (Batch): Same type parameter called in batch sequentially. In other words first Double, then Integer and finally Float type is chosen.
- Random : Type parameter is chosen from a uniform distribution. This is more realistic case would happen in a system.
- Following JRE are tested:
- Oracle JRE 1.6.0_20
- Oracle JRE 1.7.0_15
- Oracle JRE 1.8.0_11
- Azur Zing 1.7.0-5.9.5.0
- Garbage collection is prevented by allocating large amount of heap size
- Tests are run on Redhat Linux (64 bit,2.93GHz).
Results
Above benchmark test is done 7 times and in below charts, average latency (mSec) is illustrated .
Oracle JDK, although single
returns perform marginally better , there is not much difference between
multiple or single returns for both clustered random parameter selection. Whereas for Zing JVM, there is not a definitive conclusion. But given that random parameter selection is more realistic, single return is the winner, I think.
In conclusion, I could not find a definitive answer which
method exit case performs better in terms of low latency.
package com.denizstij; public interface Simulation { Number run(Number num); String getName(); } //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// package com.denizstij; import java.util.Arrays; import java.util.Random; public class SingleOrMultipleExitPoints { public static class MultipleReturns implements Simulation { public Number run(Number num) { if (num instanceof Integer) { return Integer.valueOf(num.intValue()); } if (num instanceof Float) { return Float.valueOf(num.floatValue()); } if (num instanceof Double) { return Double.valueOf(num.doubleValue()); } return null; } @Override public String getName() { return "Multiple Returns"; } } public static class SingleReturn implements Simulation { public Number run(Number num) { Number val = null; if (num instanceof Integer) { val = Integer.valueOf(num.intValue()); } if (num instanceof Float) { val = Float.valueOf(num.floatValue()); } if (num instanceof Double) { val = Double.valueOf(num.doubleValue()); } return val; } @Override public String getName() { return "Single Return"; } } public static void main(String[] args) { Random random = new Random(0); int warmUpLoopSize = 50000; int loopSize = warmUpLoopSize * 10; if (args.length==2){ warmUpLoopSize=Integer.parseInt(args[0]); loopSize=Integer.parseInt(args[1]); System.out.println("Number of warmUpLoopSize:"+warmUpLoopSize); System.out.println("Number of loopSize :"+loopSize); } // Arrays for warming up JVM Integer warmUpIntArr[] = new Integer[warmUpLoopSize]; Float warmUpFloatArr[] = new Float[warmUpLoopSize]; Double warmUpDoubleArr[] = new Double[warmUpLoopSize]; // Simulation arrays Integer intArr[] = new Integer[loopSize]; Float floatArr[] = new Float[loopSize]; Double doubleArr[] = new Double[loopSize]; // Creatring random numbers createRandomNumbers(random, warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,warmUpLoopSize); createRandomNumbers(random, intArr, floatArr, doubleArr, loopSize); // Creating test cases MultipleReturns multipleReturns = new MultipleReturns(); SingleReturn singleReturn= new SingleReturn(); int[] warmUpRandomCallSeq = generateRandomCallSequence(warmUpLoopSize,3,random); int[] warmUpClusteredCallSeq = generateClusteredCallSequence(warmUpLoopSize,3,random); int[] randomCallSeq = generateRandomCallSequence(loopSize,3,random); int[] clusteredCallSeq = generateClusteredCallSequence(loopSize,3,random); //System.out.println(Arrays.toString(warmUpRandomCallSeq)); //System.out.println(Arrays.toString(warmUpClusteredCallSeq)); String simName="Clustered Warmup"; callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, multipleReturns,warmUpClusteredCallSeq); callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, singleReturn,warmUpClusteredCallSeq); simName="Clustered Calls"; callSimulation(intArr, floatArr, doubleArr,simName, multipleReturns,clusteredCallSeq); callSimulation(intArr, floatArr, doubleArr,simName, singleReturn,clusteredCallSeq); System.out.println("=============================="); simName="Random Warmup"; callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, multipleReturns,warmUpRandomCallSeq); callSimulation(warmUpIntArr, warmUpFloatArr, warmUpDoubleArr,simName, singleReturn,warmUpRandomCallSeq); simName="Random Calls"; callSimulation(intArr, floatArr, doubleArr,simName, multipleReturns,randomCallSeq); callSimulation(intArr, floatArr, doubleArr,simName, singleReturn,randomCallSeq); System.out.println("=============================="); } private static void callSimulation(Integer[] intArr, Float[] floatArr, Double[] doubleArr, String msg, Simulation sim, int[] callSeq) { long now = System.currentTimeMillis(); int clusterIndexes[]= new int[3]; int index=0; for (int i = 0; i < callSeq.length; i++) { int callerIndex = callSeq[i]; index=clusterIndexes[callerIndex]++; if (index>=intArr.length){ continue; } switch (callerIndex){ case 0: doubleArr[index] = (Double) sim.run(doubleArr[index]); break; case 1: intArr[index] = (Integer) sim.run(intArr[index]); break; case 2: floatArr[index] = (Float) sim.run(floatArr[index]); break; default: System.out.println("Unknown cluster :("+callerIndex); } } long finish = System.currentTimeMillis(); long elapsedTime = finish - now; System.out.println(elapsedTime + ": " +msg+" :"+ sim.getName()); System.out.println(Arrays.toString(clusterIndexes)); System.out.flush(); } static void createRandomNumbers(Random random, Integer intArr[], Float floatArr[], Double doubleArr[], int size) { for (int i = 0; i < size; i++) { intArr[i] = Integer.valueOf(random.nextInt(size)); floatArr[i] = Float.valueOf(random.nextFloat() * size); doubleArr[i] = Double.valueOf(random.nextDouble() * size); } } static int[] generateClusteredCallSequence(int simSize,int numOfCluster,Random random){ int totalSize=numOfCluster*simSize; int callSeq[]= new int[totalSize]; int index=0; for (int j=0;j < numOfCluster;j++){ for (int i=0;i < simSize;i++){ callSeq[index++]=j; } } return callSeq; } static int[] generateRandomCallSequence(int simSize,int numOfCluster,Random random){ int totalSize=numOfCluster*simSize; int callSeq[]= new int[totalSize]; for (int i=0;i < totalSize;i++){ callSeq[i]=random.nextInt(numOfCluster); } return callSeq; } }
Wednesday, February 11, 2015
Knight Moves Problem in Java
In this text, i represent a solution for Knight Moves Problem in java. The problem is defined as follows:
Pictured here is a keypad:
Pictured here is a keypad:
We want to find
all sequences of length n that can be
keyed into the keypad in the following manner:
- The initial keypress can be any of the
keys.
- Each subsequent keypress must be a knight move from the previous
keypress.
- There can be at most 2 vowels in the
sequence.
- Run your solution for n = 1 to 32
A knight move is
made in one of the following ways:
- Move two steps horizontally and one
step vertically.
- Move two steps vertically and one step horizontally.
- There is no wrapping allowed on a knight move.
Here are some
examples of knight moves:
Given a value for n, the program should write the number
of valid n-key sequences on a single
line to standard out.
Knight Moves Solution
package com.denizstij.km; import java.util.LinkedHashMap; import java.util.Map; import java.util.Map.Entry; import java.util.Set; import com.denizstij.km.Keyboard.Key; /** * * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com * */ public class KnightMoveMain { private final int maxVowelCount; public KnightMoveMain(int maxVowelCount){ this.maxVowelCount=maxVowelCount; } public long findNumberOfSequence(int lenOfSequence) { if (lenOfSequence<=0){ return 1;// number of empty set is 1 } Map <PathInfoBean, Long> pathInfoMap= new LinkedHashMap<PathInfoBean, Long>(); for (Key key : Keyboard.INSTANCE.getKeySet()) { PathInfoBean pathinfo= new PathInfoBean(key, key.isVowel()?1:0); pathInfoMap.put(pathinfo, 1L); } long numberOfPaths = findNumberOfSequence1(pathInfoMap,1,lenOfSequence); return numberOfPaths; } private long findNumberOfSequence1(Map <PathInfoBean, Long> pathInfoMap ,int sequenceLen, int lenOfSequence) { if (sequenceLen>=lenOfSequence){ long numberOfPaths=0; for (Long pathCount : pathInfoMap.values()) { numberOfPaths+=pathCount; } return numberOfPaths; } Map <PathInfoBean, Long> pathInfoMapNew= new LinkedHashMap<PathInfoBean, Long>(); for (Entry<PathInfoBean, Long> pathInfoBeanEntry : pathInfoMap.entrySet()) { PathInfoBean pathInfoBean = pathInfoBeanEntry.getKey(); Long pathCount = pathInfoBeanEntry.getValue(); Set<key> possibleMoveSet =pathInfoBeanEntry.getKey().getKey().getKnightMoves(); for (Key key : possibleMoveSet) { int vowelCount=pathInfoBean.getVowelCount(); if (key.isVowel()){ if (vowelCount>=maxVowelCount){ continue; } vowelCount++; } PathInfoBean newPathInfoBean = new PathInfoBean(key,vowelCount); Long currentPath = pathInfoMapNew.get(newPathInfoBean); if (currentPath==null) // first time pathInfoMapNew.put(newPathInfoBean,pathCount); else { // we have already, so just lets aggregate both branch pathInfoMapNew.put(newPathInfoBean,currentPath+pathCount); } } } return findNumberOfSequence1(pathInfoMapNew,sequenceLen+1,lenOfSequence); } public static void main(String[] args) { KnightMoveMain km= new KnightMoveMain(2); long now = System.currentTimeMillis(); for (int n=0;n<=32;n++) { long numberOfPaths = km.findNumberOfSequence(n); System.out.println(n+"==>"+numberOfPaths); } long elapsedTime = System.currentTimeMillis()-now; System.out.println("Total Elapsed Time (msec):"+elapsedTime); } } //############################################################### //############################################################### //############################################################### package com.denizstij.km; import java.util.Arrays; import java.util.Collections; import java.util.HashSet; import java.util.LinkedHashSet; import java.util.Set; /** * * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com * */ public class Keyboard{ final private Set <key> KEYSET= new LinkedHashSet<key>(); final public static Key A= new Key('A', true); final public static Key B= new Key('B', false); final public static Key C= new Key('C', false); final public static Key D= new Key('D', false); final public static Key E= new Key('E', true); final public static Key F= new Key('F', false); final public static Key G= new Key('G', false); final public static Key H= new Key('H', false); final public static Key I= new Key('I', true); final public static Key J= new Key('J', false); final public static Key K= new Key('K', false); final public static Key L= new Key('L', false); final public static Key M= new Key('M', false); final public static Key N= new Key('N', false); final public static Key O= new Key('O', true); final public static Key _1= new Key('1', false); final public static Key _2= new Key('2', false); final public static Key _3= new Key('3', false); final public static Keyboard INSTANCE= new Keyboard(); private void init(){ KEYSET.add(A.setKnightMoves(H,L)); KEYSET.add(B.setKnightMoves(I,K,M)); KEYSET.add(C.setKnightMoves(F,J,L,N)); KEYSET.add(D.setKnightMoves(G,M,O)); KEYSET.add(E.setKnightMoves(H,N)); KEYSET.add(F.setKnightMoves(C,M,_1)); KEYSET.add(G.setKnightMoves(D,N,_2)); KEYSET.add(H.setKnightMoves(A,E,K,O,_1,_3)); KEYSET.add(I.setKnightMoves(B,L,_2)); KEYSET.add(J.setKnightMoves(C,M,_3)); KEYSET.add(K.setKnightMoves(B,H,_2)); KEYSET.add(L.setKnightMoves(A,C,I,_3)); KEYSET.add(M.setKnightMoves(B,D,F,J)); KEYSET.add(N.setKnightMoves(C,E,G,_1)); KEYSET.add(O.setKnightMoves(D,H,_2)); KEYSET.add(_1.setKnightMoves(F,H,N)); KEYSET.add(_2.setKnightMoves(G,I,K,O)); KEYSET.add(_3.setKnightMoves(H,J,L)); } // Singleton private Keyboard(){ init(); } public Set <key> getKeySet() { return KEYSET; } public static class Key{ final private char key; final private boolean isVowel; private Set<key> knightMoves; Key(char key, boolean isVowel){ this.key=key; this.isVowel=isVowel; } public Key setKnightMoves(Key ...knightMoves) { this.knightMoves=Collections.unmodifiableSet(new HashSet<key>(Arrays.asList(knightMoves))); return this; } public Set<key> getKnightMoves() { return knightMoves; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + key; return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; Key other = (Key) obj; if (key != other.key) return false; return true; } public char getKey() { return key; } public boolean isVowel() { return isVowel; } @Override public String toString() { return "Key [key=" + key + ", isVowel=" + isVowel +"]"; } } } //############################################################### //############################################################### //############################################################### package com.denizstij.km; import com.denizstij.km.Keyboard.Key; /** * * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com * */ public class PathInfoBean { final private Key key; final private int vowelCount; public PathInfoBean(Key key, int vowelCount) { this.key = key; this.vowelCount = vowelCount; } public Key getKey() { return key; } public int getVowelCount() { return vowelCount; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((key == null) ? 0 : key.hashCode()); result = prime * result + vowelCount; return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; PathInfoBean other = (PathInfoBean) obj; if (key == null) { if (other.key != null) return false; } else if (!key.equals(other.key)) return false; if (vowelCount != other.vowelCount) return false; return true; } @Override public String toString() { return "PathInfoBean [key=" + key + ", vowelCount=" + vowelCount + "]\n"; } }
Output
1==>18 2==>60 3==>214 4==>732 5==>2486 6==>8392 7==>27912 8==>93204 9==>306288 10==>1013398 11==>3302640 12==>10842528 13==>35108325 14==>114492546 15==>368772182 16==>1195650888 17==>3833937659 18==>12367469662 19==>39505386940 20==>126863960276 21==>403894417802 22==>1291845025388 23==>4100838894764 24==>13069438426724 25==>41380854190503 26==>131455706155352 27==>415265140628246 28==>1315326269976050 29==>4146551571701468 30==>13098978945964762 31==>41218021719932582 32==>129891093550589788 Total Elapsed Time (msec):450
Tuesday, February 10, 2015
Estimating Pi number with Monte Carlo Simulation in Java and Kdb+ q
package com.denizstij.montecarlo.sim; import java.util.Random; /** * This class estimates Pi with monte carlo simulation. * * We assume we have a square with dimension 1x1 which contains a disk in middle with radious of 0.5 * where it touches the square on 4 edges. We generate N number of pairs (x,y) between 0 and 1. * Then we check if each pair is in the circle or not. 4 times ratio of total pairs in circle * to total simulation is an estimation of Pi. * * pi=4*(total pairs in circle)/total pairs * * @author Deniz Turan, http://denizstij.blogspot.co.uk, denizstij AT gmail.com * */ public class PIEstimationWithMonteCarloSimulation { // number of total simulation private static final long TOTAL_SIMULATION=10000000L; private double estimatePI(){ double pi=0; Random randomGen= new Random(); long inCircleCounter=0; for (int i=0;i<TOTAL_SIMULATION;i++){ // generate pairs double x = randomGen.nextDouble(); double y = randomGen.nextDouble(); // check if the pair in the circle inCircleCounter+=isInCircle(x,y); } double ratio=(double)inCircleCounter/TOTAL_SIMULATION; pi=ratio*4; return pi; } /* * Check given two points in a circle: x^2+y^2<=1 */ private int isInCircle(double x, double y) { double z=Math.sqrt(x*x+y*y); return z<=1?1:0; } public static void main(String[] args) { PIEstimationWithMonteCarloSimulation est= new PIEstimationWithMonteCarloSimulation(); double estimatedPI = est.estimatePI(); System.out.println("Estimated PI:"+estimatedPI); } }And now in Kdb+ q :
piEst:{[N] sqrArea:4; / area of square with length 2 and containing a circle with radius 1 xy:flip {2?1.0} each til N; / xy points p:sqrt sum xy*xy; / sqrt(x^2+y^2) <=1 to be in a circle totalIncircle: sum p<=1; piEst:sqrArea*totalIncircle%N; / that’s the estimation of PI :piEst } q)piEst[1000000] 3.141476Above code can be reduced to one line such as:
q)N:10000000 q)(1%N)*4*sum 1>=sqrt sum ({x*x}') each flip {2?1.0} each til N 3.141842
Monday, February 09, 2015
Generating random numbers from a given set of numbers according to a distribution in java
Recently i am asked to implement following in Java based on the below interface:
Implement the method nextNum() and a minimal but effective set of unit tests. As a quick check, given Random Numbers are [-1, 0, 1, 2, 3] and Probabilities are [0.01, 0.3, 0.58, 0.1, 0.01] if we call nextNum() 100 times we may get the following results. As the results are random, these particular results are unlikely.
Implement the method nextNum() and a minimal but effective set of unit tests. As a quick check, given Random Numbers are [-1, 0, 1, 2, 3] and Probabilities are [0.01, 0.3, 0.58, 0.1, 0.01] if we call nextNum() 100 times we may get the following results. As the results are random, these particular results are unlikely.
-1: 1 times
0: 22 times
1: 57 times
2: 20 times
3: 0 times
public class RandomGen { // Values that may be returned by nextNum() private int[] randomNums; // Probability of the occurence of randomNums private float[] probabilities; /** Returns one of the randomNums. When this method is called multiple times over a long period, it should return the numbers roughly with the initialized probabilities. */ public int nextNum() { } }Solution:
package com.denizstij.rand; import java.util.Arrays; import java.util.Random; /** * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij AT gmail.com */ public class RandomGen { // Values that may be returned by nextNum() private int[] randomNums; // Probability of the occurence of randomNums private float[] probabilities; private float[] cumProbabilities; private int[] validRandomNums; private int totalNonZeroProbElement=0; private Random random; public RandomGen(int[] randomNums,float[] probabilities){ validateProcessInputs(randomNums,probabilities); random= new Random(); } public RandomGen(int[] randomNums,float[] probabilities, long seed){ validateProcessInputs(randomNums,probabilities); random= new Random(seed); } private void validateProcessInputs(int[] randomNums,float[] probabilities){ if (randomNums==null || probabilities==null || randomNums.length!= probabilities.length || probabilities.length==0){ throw new IllegalArgumentException("RandomNums and probabilities must be non empty and same size"); } int len=probabilities.length; cumProbabilities= new float[len]; validRandomNums= new int[len]; for (int i=0;i < len;i++) { float prob = probabilities[i]; int randomNum = randomNums[i]; if (MathUtil.isLess(prob,0.0f)){ throw new IllegalArgumentException("Probability can not be negative"); } if (MathUtil.isGreater(prob,1.0f)){ throw new IllegalArgumentException("Probability can not be greater than 1"); } if (MathUtil.checkAnyIsNanOrInfinite(prob)){ throw new IllegalArgumentException("Nan or Infite prob is not accepted"); } // not processing elements with zero probabilities as they can not occur if (MathUtil.isEqual(prob, 0.0f)){ continue; } // store valid random number validRandomNums[totalNonZeroProbElement]=randomNum; if (totalNonZeroProbElement==0){ cumProbabilities[totalNonZeroProbElement]=prob; } else { cumProbabilities[totalNonZeroProbElement]=prob+cumProbabilities[totalNonZeroProbElement-1]; } if (MathUtil.isGreater(cumProbabilities[totalNonZeroProbElement],1.0f)){ throw new IllegalArgumentException("Cumilative total probability can not be greater than 1"); } totalNonZeroProbElement++; } if (totalNonZeroProbElement==0){ throw new IllegalArgumentException("All probabilities are zero :("); } if (!MathUtil.isEqual(cumProbabilities[totalNonZeroProbElement-1],1.0f)){ throw new IllegalArgumentException("Total probability must be 1"); } // let's shrink arrays to save memory this.validRandomNums=Arrays.copyOf(validRandomNums, totalNonZeroProbElement); this.cumProbabilities=Arrays.copyOf(cumProbabilities, totalNonZeroProbElement); this.randomNums=randomNums; this.probabilities=probabilities; // some logging System.out.println("RandomNums :"+Arrays.toString(this.randomNums)); System.out.println("probabilities :"+Arrays.toString(this.probabilities)); System.out.println("totalNonZeroProbElement:"+totalNonZeroProbElement); System.out.println("validRandomNums :"+Arrays.toString(validRandomNums)); System.out.println("cumProbabilities :"+Arrays.toString(cumProbabilities)); } /** Returns one of the randomNums. When this method is called multiple times over a long period, it should return the numbers roughly with the initialized probabilities. */ public int nextNum() { float nextFloat = random.nextFloat(); return nextNum(nextFloat); } /** Returns one of the randomNums. When this method is called multiple times over a long period, it should return the numbers roughly with the initialized probabilities. */ // For testing purposes protected int nextNum(float nextFloat) { int index = Arrays.binarySearch(cumProbabilities,nextFloat); // if not found in the cum probability array, we need to infer index from insertation index if (index<0 code="" index="-1*index-1;" int="" nextrandomval="" return="">0>
package com.denizstij.rand; /** * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij@gmail.com */ public class MathUtil { public static final float FLOAT_COMPARE_RESOLUTION=0.00001f; public static boolean isEqual(float val1, float val2) { if (Math.abs(val1-val2)<FLOAT_COMPARE_RESOLUTION){ return true; } return false; } public static boolean isLess(float val1, float val2) { if (isEqual(val1,val2)){ return false; } return val1<val2; } public static boolean isLessEqual(float val1, float val2) { if (isEqual(val1,val2)){ return true; } return val1<val2; } public static boolean isGreater(float val1, float val2) { if (isEqual(val1,val2)){ return false; } return val1>val2; } public static boolean isGreaterEqual(float val1, float val2) { if (isEqual(val1,val2)){ return true; } return val1>val2; } public static boolean checkAnyIsNanOrInfinite(float ...values){ for (float val : values) { if (Float.isNaN(val) || Float.isInfinite(val)){ return true; } } return false; } }Junit Test:
package com.denizstij.rand.test; import java.util.Arrays; import java.util.LinkedHashMap; import java.util.Map; import java.util.Map.Entry; import org.junit.Assert; import org.junit.Test; import com.denizstij.rand.RandomGen; /** * @author Deniz Turan, http://denizstij.blogspot.co.uk denizstij@gmail.com */ public class RandomGenTest { private static final float SIM_PROB_THRESHOLD = 0.001f; @Test(expected=IllegalArgumentException.class) public void testNullInputs() { int[] randomNums=null; float[] probabilities=null; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testEmptyArray() { int[] randomNums= new int[0]; float[] probabilities=new float[0]; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testDifferentSizeNumsAndProbs() { int[] randomNums= {1}; float[] probabilities={0.3f,0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsAndNumberArrayDifferentSize() { int[] randomNums= {1}; float[] probabilities={0.3f,0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsIsNotNegative() { int[] randomNums= {1,2}; float[] probabilities={0.3f,-0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsAllZero() { int[] randomNums= {1,2}; float[] probabilities={0.0f,0.0f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbArrayHasNaN() { int[] randomNums= {1,2}; float[] probabilities={Float.NaN,0.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsIsNotGreaterThanOne() { int[] randomNums= {1,2}; float[] probabilities={0.3f,1.4f}; new RandomGen(randomNums, probabilities); } @Test(expected=IllegalArgumentException.class) public void testProbsTotalIsNotOne() { int[] randomNums= {1,2}; float[] probabilities={0.3f,0.4f}; new RandomGen(randomNums, probabilities); } @Test public void testCumulativeProbLogic() { int[] randomNums= {-1,-2,-4,3410,893,9,343,10}; float[] probabilities={0.0f,0.0f,0.0f,0.3f,0.5f,0.0f,0.2f,0.0f}; //validRandomNums :[3410, 893, 343] //cumProbabilities :[0.3, 0.8, 1.0] RandomGenWrapper randomGen = new RandomGenWrapper(randomNums, probabilities); // checking random number:3410 which has cum prob of 0<= and <=0.3 int nextRandomNum = randomGen.nextNum(0.2f); Assert.assertTrue(nextRandomNum==3410); nextRandomNum = randomGen.nextNum(0.299f); Assert.assertTrue(nextRandomNum==3410); nextRandomNum = randomGen.nextNum(0.3f); Assert.assertTrue(nextRandomNum==3410); // checking random number:893 which has cum prob of 0.3< and <=0.8 nextRandomNum = randomGen.nextNum(0.30001f); Assert.assertTrue(nextRandomNum==893); nextRandomNum = randomGen.nextNum(0.51f); Assert.assertTrue(nextRandomNum==893); nextRandomNum = randomGen.nextNum(0.799999f); Assert.assertTrue(nextRandomNum==893); nextRandomNum = randomGen.nextNum(0.8f); Assert.assertTrue(nextRandomNum==893); // checking random number:343 which has cum prob of 0.8< and <=1 nextRandomNum = randomGen.nextNum(0.800001f); Assert.assertTrue(nextRandomNum==343); nextRandomNum = randomGen.nextNum(0.9f); Assert.assertTrue(nextRandomNum==343); nextRandomNum = randomGen.nextNum(0.9999999f); Assert.assertTrue(nextRandomNum==343); nextRandomNum = randomGen.nextNum(1); Assert.assertTrue(nextRandomNum==343); } @Test public void testNonOccuringEventsWithProbZero() { // -1 event should never happen here int[] randomNums= {5,-1,2,4}; float[] probabilities={0.1f,0.0f,0.6f,0.3f}; RandomGen randomGen = new RandomGen(randomNums, probabilities); //lets try several times if this true int maxTry=100000; for (int i=0;i<maxTry;i++){ int nextNum = randomGen.nextNum(); // make sure we never get -1 which has prob 0 Assert.assertNotEquals(nextNum, -1); } } @Test public void testAlwaysOccuringEventsWithProbOne() { // -1 event should always happen here int[] randomNums= {5,-1,2,4}; float[] probabilities={0.0f,1.0f,0.0f,0.0f}; RandomGen randomGen = new RandomGen(randomNums, probabilities); //lets try several times if this true int maxTry=100000; for (int i=0;i<maxTry;i++){ int nextNum = randomGen.nextNum(); // make sure we always get -1 which has prob 1 Assert.assertEquals(nextNum, -1); } } /* * Generate huge amount of random next number, given probability and random number arrays. * Then, frequency of each random number is estimated. * Finally it is asserted that frequency is close to the input probability within a threshold (0.001) */ @Test public void testProvidedQuickCheck() { int[] randomNums= {-1,0,1,2,3}; float[] probabilities={0.01f, 0.3f, 0.58f, 0.1f, 0.01f}; float[] simFrequency= new float[probabilities.length]; Map <Integer,Long> counterMap= new LinkedHashMap<Integer,Long>(); // reset counter map for (int randomNum : randomNums) { counterMap.put(randomNum, 0L); } RandomGen randomGen = new RandomGen(randomNums, probabilities,1234L); //lets generate many times random numbers int maxTry=10000000; for (int i=0;i<maxTry;i++){ int nextNum = randomGen.nextNum(); Long counter = counterMap.get(nextNum); counter++; counterMap.put(nextNum, counter); } // estimate frequency int index=0; for (Entry<Integer, Long> entry : counterMap.entrySet()) { Long counter = entry.getValue(); // simulation frequencies simFrequency[index]=counter/(float)maxTry; float simProbDifference = Math.abs(simFrequency[index]-probabilities[index]); // Assert.assertTrue(simProbDifference<SIM_PROB_THRESHOLD); index++; } System.out.println("Counter Map "+counterMap); System.out.println("Freq map:"+Arrays.toString(simFrequency)); } private class RandomGenWrapper extends RandomGen{ public RandomGenWrapper(int[] randomNums, float[] probabilities) { super(randomNums, probabilities); } public int nextNum(float nextFloat){ return super.nextNum(nextFloat); } } }
Saturday, January 03, 2015
Stationarity Test with KPSS (Kwiatkowski–Phillips–Schmidt–Shin) in Python
In addition to augmented Dickey–Fuller test (ADF), KPSS (Kwiatkowski–Phillips–Schmidt–Shin) is widely used to verify stationarity of a signal. Python statsmodels contains ADF test, but I could not find any implementation of KPSS in python. Therefore, I converted R implementation of kpss.test in tseries library to python as follows (Click here to download the source code) :
""" Created on Sat Jan-03-2015 @author: Deniz Turan (http://denizstij.blogspot.co.uk/) """ import numpy as np import statsmodels.api as sm from scipy.interpolate import interp1d def kpssTest(x, regression="LEVEL",lshort = True): """ KPSS Test for Stationarity Computes the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for the null hypothesis that x is level or trend stationary. Parameters ---------- x : array_like, 1d data series regression : str {'LEVEL','TREND'} Indicates the null hypothesis and must be one of "Level" (default) or "Trend". lshort : bool a logical indicating whether the short or long version of the truncation lag parameter is used. Returns ------- stat : float Test statistic pvalue : float the p-value of the test. usedlag : int Number of lags used. Notes ----- Based on kpss.test function of tseries libraries in R. To estimate sigma^2 the Newey-West estimator is used. If lshort is TRUE, then the truncation lag parameter is set to trunc(3*sqrt(n)/13), otherwise trunc(10*sqrt(n)/14) is used. The p-values are interpolated from Table 1 of Kwiatkowski et al. (1992). If the computed statistic is outside the table of critical values, then a warning message is generated. Missing values are not handled. References ---------- D. Kwiatkowski, P. C. B. Phillips, P. Schmidt, and Y. Shin (1992): Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root. Journal of Econometrics 54, 159--178. Examples -------- x=numpy.random.randn(1000) # is level stationary kpssTest(x) y=numpy.cumsum(x) # has unit root kpssTest(y) z=x+0.3*arange(1,len(x)+1) # is trend stationary kpssTest(z,"TREND") """ x = np.asarray(x,float) if len(x.shape)>1: raise ValueError("x is not an array or univariate time series") if regression not in ["LEVEL", "TREND"]: raise ValueError(("regression option %s not understood") % regression) n = x.shape[0] if regression=="TREND": t=range(1,n+1) t=sm.add_constant(t) res=sm.OLS(x,t).fit() e=res.resid table=[0.216, 0.176, 0.146, 0.119] else: t=np.ones(n) res=sm.OLS(x,t).fit() e=res.resid table=[0.739, 0.574, 0.463, 0.347] tablep=[0.01, 0.025, 0.05, 0.10] s=np.cumsum(e) eta=np.sum(np.power(s,2))/(np.power(n,2)) s2 = np.sum(np.power(e,2))/n if lshort: l=np.trunc(3*np.sqrt(n)/13) else: l=np.trunc(10*np.sqrt(n)/14) usedlag =int(l) s2=R_pp_sum(e,len(e),usedlag ,s2) stat=eta/s2 pvalue , msg=approx(table, tablep, stat) print "KPSS Test for ",regression," Stationarity\n" print ("KPSS %s=%f" % (regression, stat)) print ("Truncation lag parameter=%d"% usedlag ) print ("p-value=%f"%pvalue ) if msg is not None: print "\nWarning:",msg return ( stat,pvalue , usedlag ) def R_pp_sum (u, n, l, s): tmp1 = 0.0 for i in range(1,l+1): tmp2 = 0.0 for j in range(i,n): tmp2 += u[j]*u[j-i] tmp2 = tmp2*(1.0-(float(i)/((float(l)+1.0)))) tmp1 = tmp1+tmp2 tmp1 = tmp1/float(n) tmp1 = tmp1*2.0 return s + tmp1 def approx(x,y,v): if (v>x[0]): return (y[0],"p-value smaller than printed p-value") if (v
Subscribe to:
Posts (Atom)