What is the best macro-benchmarking tool / framework to measure a single-threaded complex algorithm in Java? [closed]

I want to take some performance measurements (mainly runtime) of my Java code, a single-threaded, local, complex algorithm. (So I do not want a micro-benchmark that measures a JVM implementation.)

With the tool, I would like to

  • analyse the complexity, i.e. see how my code scales with a parameter n (the search depth). (I already have a JUnit test parameterized in n.)
  • do some trend analysis to get warned if some change to the code base makes the code slower.

For this, I would like to use a tool or framework that

  • does the statistics for me, ideally computing the mean, standard deviation and confidence intervals. This is very important.
  • can be parameterized (see parameter n above). This is also very important.
  • can produce a fancy plot. This would be nice, but is not required.
  • can be used in an automated (JUnit) test to warn me if my program slows down. This is also not required, just a plus.

What tools/frameworks fulfill these requirements? Which one would be well suited for complexity and trend analysis, and why?


Solution 1:

Below is an alphabetical list of all the tools I found. The aspects mentioned are:

  • is it easily parameterizable
  • is it a Java library, or at least easy to integrate into your Java program
  • can it handle JVM micro benchmarking, e.g. use a warmup phase
  • can it plot the results visually
  • can it store the measured values persistently
  • can it do trend analysis to warn that a new commit caused a slow down
  • does it provide and use statistics (at least max, min, average and standard deviation).

Auto-pilot

parameterizable; Perl library; no JVM micro benchmarking; plotting; persistence; trend analysis!?; good statistics (run a given test until results stabilize; highlight outliers).

Benchmarking framework

not parameterizable; Java library; JVM micro benchmarking; no plotting; no persistence; no trend analysis; statistics.

Handles the statistics extremely well: besides average, max, min and standard deviation, it also computes the 95% confidence interval (via bootstrapping) and serial correlation (e.g. to warn about oscillating execution times, which can occur if your program behaves nondeterministically, e.g. because you use HashSets). It decides how many times the program must be run to get accurate measurements, and interprets the results for reporting and warnings (e.g. about outliers and serial correlation).

Also does the micro-benchmarking extremely well (see Create quick/reliable benchmark with java? for details).

Unfortunately, the framework comes in a util-package bundled together with a lot of other helper-classes. The benchmark classes depend on JSci (A science API for Java) and Mersenne Twister (http://www.cs.gmu.edu/~sean/research/). If the author, Brent Boyer, finds time, he will boil the library down and add a simpler grapher so that the user can visually inspect the measurements, e.g. for correlations and outliers.
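Independent of any particular framework, the statistics the question asks for (mean, standard deviation, bootstrap 95% confidence interval) are straightforward to compute yourself. A minimal sketch in plain Java; the timing values and class name are made up for illustration:

```java
import java.util.Arrays;
import java.util.Random;

public class BenchmarkStats {

    // Sample mean of timing measurements (e.g. milliseconds per run).
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Sample standard deviation (n - 1 denominator).
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = Arrays.stream(xs).map(x -> (x - m) * (x - m)).sum();
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Bootstrap 95% confidence interval for the mean: resample with
    // replacement, collect the resample means, and take the 2.5th and
    // 97.5th percentiles.
    static double[] bootstrapCi95(double[] xs, int resamples, long seed) {
        Random rnd = new Random(seed);
        double[] means = new double[resamples];
        for (int i = 0; i < resamples; i++) {
            double sum = 0;
            for (int j = 0; j < xs.length; j++) {
                sum += xs[rnd.nextInt(xs.length)];
            }
            means[i] = sum / xs.length;
        }
        Arrays.sort(means);
        return new double[] {
            means[(int) (0.025 * resamples)],
            means[(int) (0.975 * resamples)]
        };
    }

    public static void main(String[] args) {
        // Made-up timings for eight runs of the algorithm under test.
        double[] timings = {12.1, 11.8, 12.4, 13.0, 11.9, 12.2, 12.6, 12.0};
        System.out.printf("mean=%.3f stddev=%.3f%n",
                mean(timings), stdDev(timings));
        double[] ci = bootstrapCi95(timings, 10_000, 42L);
        System.out.printf("ci95=[%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```

This is only the statistics half, of course; what a framework like the one above adds on top is the decision of how many runs to make and the warnings about outliers and serial correlation.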

Caliper

parameterizable; Java library; JVM micro benchmarking; plotting; persistence; no trend analysis; statistics.

Relatively new project, tailored towards Android apps. Looks young but promising. Depends on Google Guava :(

Commons monitoring

not parameterizable!?; Java library; no JVM micro benchmarking!?; plotting; persistence through a servlet; no trend analysis!?; no statistics!?.

Supports AOP instrumentation.

JAMon

not parameterizable; Java library; no JVM micro benchmarking; plotting, persistence and trend analysis with additional tools (Jarep or JMX); statistics.

Good monitoring, intertwined with log4j, data can also be programmatically accessed or queried and your program can take actions on the results.

Java Simon

not parameterizable!?; Java library; no JVM micro benchmarking; plotting only with Jarep; persistence only with JMX; no trend analysis; no statistics!?.

Competitor of Jamon, supports a hierarchy of monitors.

JETM

not parameterizable; Java library; JVM micro benchmarking; plotting; persistence; no trend analysis; no statistics.

Nice lightweight monitoring tool with no dependencies :) Does not offer sufficient statistics (no standard deviation), and extending the plugin accordingly looks quite difficult (Aggregators and Aggregates only have fixed getters for min, max and average).

JMeter

parameterizable!?; Java library; no JVM micro benchmarking!?; plotting; persistence; trend analysis!?; statistics!?.

Good monitoring library that is tailored towards load testing web applications.

Java Microbenchmark Harness (jmh)

parameterizable (custom invokers via Java API); Java library; JVM micro benchmarking; no plotting; no persistence; no trend analysis; statistics.

The benchmarking harness built by Oracle's HotSpot experts, and thus very well suited to microbenchmarking on HotSpot; it is used in OpenJDK performance work. Extreme measures are taken to provide a reliable benchmarking environment. Besides human-readable output, jmh provides a Java API to process the results, e.g. for third-party plotters and persistence providers.
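A minimal sketch of a parameterized jmh benchmark (it needs the jmh library on the classpath and is run via jmh's runner, so it is not standalone; the class name and the mySearch method are placeholders for your algorithm, while the annotations are jmh's own):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class SearchDepthBenchmark {

    // jmh runs the benchmark once per parameter value and reports
    // mean and error for each, which covers the "parameterized in n"
    // requirement from the question.
    @Param({"4", "6", "8"})
    int n;

    @Benchmark
    public long runSearch() {
        return mySearch(n); // placeholder for the algorithm under test
    }

    private long mySearch(int depth) {
        long count = 0;
        for (long i = 0; i < (1L << depth); i++) {
            count += i;
        }
        return count;
    }
}
```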

junit-Benchmarks

parameterizable; Java library; JVM micro benchmarking; plotting; persistence (to console, XML or an H2 database); graphical trend analysis; statistics (max, min, average, standard deviation; but not easily extensible with further statistics).

Simply add a junit-4-rule to your junit tests :)

junit-Benchmarks is open source, under the Apache 2 licence.

Update: project moved to jmh
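A minimal junit-Benchmarks sketch (it needs JUnit 4 and the junit-benchmarks jar, so it is not standalone; the test class and mySearch are placeholders, while AbstractBenchmark and @BenchmarkOptions come from the com.carrotsearch.junitbenchmarks package):

```java
import org.junit.Test;

import com.carrotsearch.junitbenchmarks.AbstractBenchmark;
import com.carrotsearch.junitbenchmarks.BenchmarkOptions;

public class MyAlgorithmBenchmarkTest extends AbstractBenchmark {

    // Each @Test method is measured; the warmup rounds are excluded
    // from the reported average and standard deviation.
    @BenchmarkOptions(benchmarkRounds = 20, warmupRounds = 5)
    @Test
    public void timeSearch() {
        mySearch(8); // placeholder for the algorithm under test
    }

    private long mySearch(int depth) {
        long count = 0;
        for (long i = 0; i < (1L << depth); i++) {
            count += i;
        }
        return count;
    }
}
```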

junitperf

Mainly for doing trend analysis for performance (with the JUnit test decorator TimedTest) and scalability (with the JUnit test decorator LoadTest).

parameterizable; Java library; no JVM micro benchmarking; no plotting; no persistence; no statistics.

perf4j

not parameterizable; Java library; no JVM micro benchmarking; plotting; persistence via JMX; trend analysis via a log4j appender; statistics.

Builds upon a logging framework, can use AOP.

Project Broadway

Very general concept: monitors observe predefined conditions and specify how to react when they are met.

speedy-mcbenchmark

Main focus is on parameterizability: check whether your algorithm scales, i.e. check if it's O(n), O(n log(n)), O(n²)...

Java library; JVM micro benchmarking; no plotting; persistence; trend analysis; no statistics.
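The scaling check itself is easy to approximate by hand once you can measure t(n) for two problem sizes. A hedged sketch (the measurement values are made up, and growthExponent is a hypothetical helper, not part of any of the tools above):

```java
public class ComplexityCheck {

    // Empirical growth exponent b in t(n) ~ c * n^b, estimated from
    // two measurements (n1, t1) and (n2, t2):
    //   b = log(t2/t1) / log(n2/n1)
    // b near 1 suggests O(n), near 2 suggests O(n^2), and so on
    // (log factors blur the picture slightly).
    static double growthExponent(long n1, double t1, long n2, double t2) {
        return Math.log(t2 / t1) / Math.log((double) n2 / n1);
    }

    public static void main(String[] args) {
        // Hypothetical measurements: doubling n quadruples the runtime,
        // which suggests roughly quadratic scaling.
        double b = growthExponent(1000, 50.0, 2000, 200.0);
        System.out.printf("estimated exponent: %.2f%n", b);
        if (b > 1.8) {
            System.out.println("warning: looks superlinear");
        }
    }
}
```

In practice you would average several measurements per n (the statistics again) before trusting the exponent, since a single noisy timing can easily shift it.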

The Grinder

parameterizable; Jython library; no JVM micro benchmarking; plotting; persistence; no trend analysis; no good statistics, but easily extensible.

Depends on Jython, HTTPClient, JEditSyntax, ApacheXMLBeans, PicoContainer.

TPTP

parameterizable!?; Java tool platform; no JVM micro benchmarking!?; plotting; persistence; graphical trend analysis; no statistics!?

The Test & Performance Tools Platform is a huge generic and extensible tool platform (based on Eclipse and four EMF models). Hence it is powerful but quite complex, can slow Eclipse down, and extending it for your own needs (e.g. with statistics so that they influence the number of iterations) seems to be very difficult.

Usemon

parameterizable!?; Java library; no JVM micro benchmarking; plotting; persistence; trend analysis!?; statistics!?.

Tool is tailored towards monitoring in large clusters.

Solution 2:

Another alternative is Caliper from Google. It allows parameterized testing.

Solution 3:

Try using http://labs.carrotsearch.com/junit-benchmarks.html. This is an extension to JUnit 4 with the following features:

  • Records execution time average and standard deviation.
  • Garbage collector activity recording.
  • Per-benchmark JVM warm-up phase.
  • Per-run and historical chart generation.
  • Optional results persistence in the H2 SQL database (advanced querying, historical analysis).