Spare Time Labs 2.0

12.11.2013

Benchmarking JNA method calls

A year or so ago, while optimizing my PureJavaComm library, I wrote a micro benchmark to evaluate some details of JNA method-call overhead. Some recent discussion on the rxtx@qbang.org mailing list prompted me to dig out that benchmark and write this page to document my findings.

You can find the complete benchmark code here.

To run it you need a Unix-like operating system; I don't think it will run without modifications on Windows, but I believe the results and conclusions from my Mac OS X machine are valid for Windows and Linux too.

The benchmark calls three different standard POSIX C-library functions, htonl(), memset() and memcpy(), using both the 'classic' JNA calling method and the newer and faster 'direct' call method. The purpose of these tests is to evaluate the cost of just calling a C function and the cost of passing non-trivial data/parameters to it.

The code also measures the speed of calling memset() and memcpy() with the test data allocated on the C heap, which gives a measurement without the overhead of copying the data between the JVM heap and C memory.
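For reference, the two calling styles look roughly like this in JNA code. This is a minimal sketch, not the actual benchmark code: the class names are mine, it assumes jna.jar on the classpath, and size_t is declared as a Java long (i.e. a 64-bit platform):

```java
import com.sun.jna.Library;
import com.sun.jna.Memory;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class JnaStyles {

    // 'Classic' interface mapping: calls go through a reflective proxy
    // that marshals the arguments on every invocation.
    public interface CLib extends Library {
        CLib INSTANCE = (CLib) Native.loadLibrary("c", CLib.class);

        int htonl(int hostlong);
        // byte[] lives on the Java heap, so JNA copies it to C memory
        // for the call and copies it back afterwards
        Pointer memset(byte[] buf, int c, long count);
    }

    // 'Direct' mapping: the native methods are registered once via JNI,
    // which skips most of the per-call marshalling overhead.
    public static class CLibDirect {
        static {
            Native.register(CLibDirect.class, "c");
        }
        public static native int htonl(int hostlong);
        // Memory extends Pointer and lives on the C heap, so no copy is needed
        public static native Pointer memset(Pointer buf, int c, long count);
    }

    public static void main(String[] args) {
        byte[] javaBuf = new byte[512];   // test data on the Java heap
        Memory cBuf = new Memory(512);    // test data on the C heap

        CLib.INSTANCE.memset(javaBuf, 0x55, javaBuf.length);
        CLibDirect.memset(cBuf, 0x55, cBuf.size());

        System.out.println(javaBuf[0] == 0x55 && cBuf.getByte(0) == 0x55);
    }
}
```

Note that Memory can be passed wherever a Pointer parameter is declared, which is how the C-heap variants of the tests avoid the Java-heap round trip.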

The total time to execute each call 10,000 times is recorded and averaged per call.

To prime the Java HotSpot JIT compiler the tests are repeated ad nauseam, and only after a couple of minutes did I sample the output.
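The measurement loop can be sketched as below. This is a simplified stand-in, not the benchmark itself: the dummy workload takes the place of the actual JNA calls and the names are mine:

```java
import java.util.Arrays;
import java.util.Locale;

public class MicroBench {
    static final int CALLS = 10000;

    // Time CALLS invocations of 'work' and return the average
    // per-call cost in microseconds.
    static double averageMicros(Runnable work) {
        long t0 = System.nanoTime();
        for (int i = 0; i < CALLS; i++)
            work.run();
        return (System.nanoTime() - t0) / 1000.0 / CALLS;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[512];
        // dummy workload standing in for a JNA call
        Runnable work = () -> Arrays.fill(buf, (byte) 0x55);

        // Repeat the measurement over and over so the HotSpot JIT has
        // compiled the hot paths before the sample we actually report.
        double avg = 0;
        for (int round = 0; round < 100; round++)
            avg = averageMicros(work);

        System.out.printf(Locale.ROOT, "%.3f usec/call%n", avg);
    }
}
```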

Here is a representative sample of the results:

 
standard:	1.350	5.322	3.727	6.080	3.719
direct:  	0.147	0.444	0.243	0.689	0.276

From left to right the results, in microseconds, for standard JNA calls are:


	htonl() 
	memset() with byte[] parameter, data on Java-heap
	memset() with Memory/Pointer parameter, data on C-heap
	memcpy() with byte[] parameter, data on Java-heap
	memcpy() with Memory/Pointer parameter, data on C-heap

and then the same using the 'direct' call method.

The standard deviation of the results is in the ballpark of 2%.

Dividing the standard call times by the direct call times gives us a measure of how much speed-up we get from direct calls:

 
8.36	11.69	15.32	10.41	13.22

So we can say that for these types of calls (simple 'primitive' types supported by the direct call method) the speed-up is roughly tenfold.

Subtracting the memset() times from the memcpy() times we get an estimate of how long it takes to copy 512 bytes from the Java heap to C memory and back. For standard calls that overhead is about 3 usec and for direct calls about 200 nsec.

The htonl() test shows us that simply crossing the Java-to-C boundary takes about 1.4 usec for standard calls and 0.15 usec for direct calls, on my test machine:

 
	MacBook Pro Retina 
	2.6 GHz Intel Core i7 
	8 GB 1600 MHz DDR3 
	Mac OS X 10.8.5 
	jdk1.7.0_07

Comparing the execution times with the data on the Java heap versus the C heap, we see that memset() and memcpy() take about 200 nsec to 'process' 512 bytes.

Note that this sort of comparison is full of pitfalls and is not directly applicable to all use cases, but I thought sharing these simple measurements would help people quantify their statements instead of just characterizing this and that as fast or slow.

For me the take-away from this exercise was that for PureJavaComm the standard calls were fast enough on a fast computer and did not warrant even the relatively simple and easy task of converting to the direct call method. On the other hand, for the Linux backend running on the low-powered Raspberry Pi it was well worth the effort and made the difference between usable and useless.

That's all folks, Kusti