erlperf 2.2 improvements – Maxim Fedorov

What’s the best way to demotivate an engineer? Request written documentation. I procrastinated long enough, but the realisation came late: I was repeating myself too often, explaining the same concepts over and over instead of documenting them once and then referring to the documentation.

Major change: documentation

The most important change in 2.2 is the documentation update. Or rather, a full rewrite. Starting with 2.1, erlperf documentation is hosted on hexdocs.pm and generated with ExDoc. It gives a much better user experience and powerful navigation.

The command line API got its own reference, separate from the overview.

Reporting format

Starting with 2.2, erlperf supports basic, extended and full reports. The basic format is the default when fewer than 10 samples are requested:

./erlperf 'rand:uniform().'
Code                    ||        QPS       Time
rand:uniform().          1   17002 Ki      58 ns

Extended format adds a bunch of extra statistics:

./erlperf 'rand:uniform().' -s 20
Code            ||  Samples       Avg  StdDev    Median      P99  Iterat
rand:uniform().  1       20  16788 Ki   0.05%  16787 Ki 16806 Ki   59 ns

Beyond the average number of calls per second, it prints the standard deviation (as a percentage of the average), the median and the 99th percentile.
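To make these columns concrete, here is a minimal sketch of how such statistics can be derived from a list of per-sample QPS values. It is not erlperf's own code: the module name stats_sketch is hypothetical, and the sample variance (N - 1 denominator) and nearest-rank percentiles are assumptions that may differ from what erlperf does internally.

```erlang
-module(stats_sketch).
-export([summary/1]).

%% Summarises a list of at least two numeric samples (e.g. QPS per sample).
summary(Samples) when length(Samples) > 1 ->
    N = length(Samples),
    Avg = lists:sum(Samples) / N,
    %% Assumption: sample variance, with N - 1 in the denominator.
    Var = lists:sum([(S - Avg) * (S - Avg) || S <- Samples]) / (N - 1),
    Sorted = lists:sort(Samples),
    #{average => Avg,
      %% Standard deviation expressed as a percentage of the average.
      stddev_percent => math:sqrt(Var) / Avg * 100,
      median => percentile(Sorted, N, 50),
      p99 => percentile(Sorted, N, 99)}.

%% Assumption: nearest-rank percentile, i.e. the value at the
%% 1-based position ceil(P / 100 * N) of the sorted list.
percentile(Sorted, N, P) ->
    Rank = max(1, ceil(P * N / 100)),
    lists:nth(Rank, Sorted).
```

With nearest-rank percentiles and an even number of samples, the median is the lower of the two middle values; an interpolating definition would give a slightly different number.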

The full report adds system information: OS, Erlang VM version and the CPU string, if accessible. See the documentation for more details.

To reflect the changes in the reporting options, the programmatic API also received support for the full report, including extra statistics and system information.

Multiple samples in timed mode

The timed mode implementation, introduced in 2.0, lacked an option to repeat the measurement multiple times: only one sample was taken and reported. It worked well for tiny functions, but benchmarking complex code required running erlperf several times to ensure reproducibility. Starting with 2.2, the -s option works in timed mode too:

./erlperf 'rand:uniform().' -l 10M -s 20
Code            ||  Samples      Avg  StdDev    Median      P99  Iterat
rand:uniform().  1       20   557 ms   0.50%    556 ms   563 ms   55 ns

The example above asks erlperf to do 20 runs of 10 million rand:uniform() calls. On average, it takes 557 ms (on my machine!) to do a single 10-million-randoms errand.
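The per-call figure in the last column follows from simple arithmetic on these numbers. A minimal sketch, with a hypothetical module and function name that are not part of erlperf:

```erlang
-module(timed_sketch).
-export([per_call_ns/2]).

%% Converts the duration of one timed sample (in milliseconds)
%% into the time spent per single call (in nanoseconds).
per_call_ns(SampleMs, Calls) when Calls > 0 ->
    SampleMs * 1000000 / Calls.
```

Here timed_sketch:per_call_ns(557, 10000000) evaluates to 55.7, in line with the 55 ns reported above (the 557 ms average is itself a rounded figure).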

Standalone benchmarks

Supplying Erlang code through the command line is really cumbersome, especially when the shell escapes it in a weird way. Starting with 2.2, it’s possible to write an escript leveraging erlperf modules for benchmarking:

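#!/usr/bin/env escript
%%! +pc unicode -pa /home/erlperf/_build/default/lib/erlperf/ebin
-mode(compile).

%% Function under test, passed to erlperf as a fun.
measure_me() ->
    rand:uniform().

main(_) ->
    %% Benchmark two runners side by side: a fun and a source code string.
    %% The last argument (concurrency estimation options) is left undefined.
    Report = erlperf:benchmark([
        #{runner => fun measure_me/0},
        #{runner => "rand:uniform()."}
    ], #{report => full}, undefined),
    %% Render the report in the extended format, 120 columns wide.
    Out = erlperf_cli:format(Report, #{format => extended,
        viewport_width => 120}),
    %% put_chars prints the text verbatim; io:format/1 would interpret any ~ in it.
    io:put_chars(Out),
    halt(0).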

This script can be re-run as many times as needed while working on a better implementation of the measure_me function:

./bench.erl 
Code        ||  Samples       Avg  StdDev    Median      P99  Iterat   Rel
rand:unif    1        3  17013 Ki   0.06%  17013 Ki 17024 Ki   58 ns  100%
#Fun<benc    1        3  16394 Ki   0.17%  16390 Ki 16424 Ki   61 ns   96%

Bugfixes and minor API improvements

Specifying the --warmup argument for a concurrency estimation benchmark had no effect in versions between 1.1.2 and 2.1.0. This has been fixed.

Several programmatic APIs (history, monitor and cluster history) received sensible defaults.

Some improvements were made to cluster-wide benchmarking. Before 2.2, monitoring jobs in a cluster barely worked. While it’s still an experimental feature, recent bugfixes at least made it usable.

Deprecations and breaking changes

Starting with 2.2, call trace recording is deprecated and should not be used. It will be removed in 3.0. This feature did not do a good job of capturing the trace anyway. Replaying a captured trace is still supported, although only as an experiment. The erlperf command line interface treats runner code that does not end with . (period) or } (MFA tuple) as a file name. This file is expected to contain a list of MFA tuples serialised with term_to_binary.
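Since trace recording is going away, such a file can also be produced by hand. The sketch below uses only standard library calls; the module name trace_sketch, the file name calls.trace and the list of calls are hypothetical and chosen purely for illustration.

```erlang
-module(trace_sketch).
-export([write/2, read/1]).

%% Serialises a list of {Module, Function, Args} tuples into a file.
write(FileName, Calls) when is_list(Calls) ->
    ok = file:write_file(FileName, term_to_binary(Calls)).

%% Reads the file back, to check that it decodes to the same list.
read(FileName) ->
    {ok, Bin} = file:read_file(FileName),
    binary_to_term(Bin).

%% Example (from the Erlang shell):
%%   Calls = [{rand, uniform, []}, {rand, uniform, [100]}],
%%   ok = trace_sketch:write("calls.trace", Calls),
%%   Calls = trace_sketch:read("calls.trace").
```

The name calls.trace ends with neither a period nor a closing brace, so passing it to erlperf in place of the runner code should make the command line interface treat it as a file name, as described above.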

Due to changes in the monitor sample structure, it is not possible to monitor a cluster running previous versions of erlperf. Given the experimental nature of the clustering feature, this change is unlikely to affect your deployment.