I discovered that I failed to properly upload documentation for erlperf to hexdocs.pm. While I’m working to fix that (and going through the pain of the edoc → ex_doc migration), here are answers to your questions:
- Does erlperf:run/2 return the number of Queries Per Second?
Yes, if you requested a simple report, erlperf:run/2 returns the average of all collected (non-warmup) samples. Consider this configuration: #{samples => 30, sample_duration => 1000, warmup => 3}. It tells erlperf to run for 33 seconds in total. Seconds, because sample_duration is set to 1000 (ms). The first 3 samples are discarded (they are “warm-up samples”). For the remaining 30 samples, an average is returned.
You can request the actual samples by setting report => extended. In the example below (started from rebar3 shell in the erlperf folder), I run rand:uniform() for 5 samples, and get a list of 5 values (iterations per second):
(erlperf@ubuntu22)1> erlperf:run({rand, uniform, []}, #{samples => 5, report => extended}).
[15064123,15092843,15148119,15105712,15145791]
- How can I specify the duration for my test (e.g. run it for 30 seconds)?
You can specify it this way: #{samples => 30}. Because the default sample_duration is 1000 ms, the actual test should run for 30 seconds (and return the average QPS).
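For example, reusing the rand:uniform/0 runner from the sample above, a 30-second run is just the following call (shell output omitted):
erlperf:run({rand, uniform, []}, #{samples => 30}).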
- When setting concurrency => 1000, the test never finishes. What do you advise setting concurrency to?
I found a deficiency (*) in erlperf that caused it to work incorrectly under extreme scheduler utilisation (which is almost guaranteed under such a heavy load). I just fixed it, could you please update erlperf to version 2.1.0 or above? With that update, it will report correctly for any reasonable concurrency (including 1000 or 10000).
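If you pull erlperf in via rebar3 (an assumption on my side), bumping the dependency is a one-line change in rebar.config:
{deps, [{erlperf, "2.1.0"}]}.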
One more note: erlperf has built-in support for estimating how concurrent your code is. It’s called “concurrency estimation”, or “squeeze” mode. You can use it either from the command line (-q argument), or by calling run/3 and specifying the corresponding options. It will tell you how many concurrent processes saturate your code. See this blogpost for a few more hints.
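A rough sketch of both entry points (assuming the erlperf command-line escript is built; the empty third map in run/3 simply takes the default estimation options):
./erlperf 'rand:uniform().' -q
erlperf:run({rand, uniform, []}, #{}, #{}).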
- Could you please explain the samples option and how it relates to duration?
erlperf runs your runner function in an endless loop, bumping a counter every time the function is invoked. This counter grows monotonically. Every sample_duration (there is no duration option), the value of the counter is recorded. Once erlperf has collected samples counter values, it considers the test complete, calculates the differences between the recorded counter values, and returns the average (you can also get the actual samples and perform your own statistical operations, e.g. take the median). In some cases (e.g. to warm up the cache) it’s necessary to discard the first few samples; this is done with the warmup argument.
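To make the arithmetic concrete, here is a small sketch (not erlperf internals; the counter readings are made up to be consistent with the extended-report output shown above):
Readings = [0, 15064123, 30156966, 45305085],   %% counter value recorded every sample_duration
Samples  = lists:zipwith(fun(Next, Prev) -> Next - Prev end,
                         tl(Readings), lists:droplast(Readings)),
%% Samples = [15064123, 15092843, 15148119], i.e. iterations per second
Average  = lists:sum(Samples) div length(Samples).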
(*) Update erlperf to 2.1.0
When I tried running the example I suggested (writing to an ETS table) with high concurrency (more than the number of CPU cores I had available), I noticed that the benchmark results did not look right. The benchmark itself was taking longer than expected, and the results fluctuated quite a bit. At first, I suspected that the many workers running concurrently were stealing CPU time from the process that was taking samples. Which was, indeed, true, but it was easily solved by setting process_flag(priority, high) before calling erlperf:run/2. It wasn’t enough to provide stable measurements, though: when the VM experiences heavy lock contention, it won’t schedule high-priority processes either. Hence timer:sleep(1000) resulted in a significantly larger delay, up to 2 seconds.
Fortunately, it is easy to detect this and switch to a busy-wait loop (constantly checking the monotonic clock value) when timer:sleep precision gets too low for benchmarking purposes. Essentially, that’s the main body of the 2.1.0 update. This should only happen when lock contention is really high (otherwise ERTS scheduling should not skew too far from the expected sample duration), hence the busy loop should not affect the result.
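As a minimal sketch of the idea (not the actual erlperf code), a busy wait simply spins on the monotonic clock until the target time is reached:
busy_wait(Until) ->
    case erlang:monotonic_time(millisecond) >= Until of
        true -> ok;
        false -> busy_wait(Until)
    end.
%% used instead of timer:sleep(1000), e.g.:
%% busy_wait(erlang:monotonic_time(millisecond) + 1000)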
(**) To resolve the problem with ETS table concurrency, apply the {read_concurrency, true}, {write_concurrency, auto} options to the ets:new/2 call. In my tests it bumps QPS with high concurrency (at the expense of the total function latency):
Code                                            ||       QPS      Time
run(_, S) -> R = rand:mwc59(S), true = ets:     10   5173 Ki   1930 ns
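For reference, a minimal sketch of creating such a table with those options (the table name and remaining flags are my own placeholders; {write_concurrency, auto} needs a reasonably recent OTP release):
Tab = ets:new(bench_table, [public, {read_concurrency, true}, {write_concurrency, auto}]).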