Alternative to Benchee for Erlang?

zabrane · January 24, 2023, 10:12pm

@max-au how can I use the new max option to figure out the best concurrency number for my cache? I tried this, but not sure:

config() ->
    #{max => 500, concurrency => 200}.

When I bench using #{max => 500} without setting concurrency, i get very small numbers in comparison to when i use #{concurrency => 200}.

Thanks.

max-au · January 24, 2023, 10:59pm

It depends on what you deem as the “best”: higher throughput (total number of cache operations per second achieved by all concurrently running workers) or lowest latency (the fastest single iteration of a single worker).

config() →
#{max => 500, concurrency => 200}.

Are you passing this as run_options, or as concurrency estimation mode settings?

If it’s the former, it does not have max argument, and the latter does not have concurrency.

Let’s consider the concurrency estimation mode example:

to_benchmark() ->
   mycache:put(1, 2),
   mycache:get(100, 200).


main([]) ->
    erlperf:run(fun to_benchmark/0, #{samples => 10}, #{min => 2, max => 500}).

What this test does:

Starts with 2 workers that try to make as many to_benchmark/0 calls as possible for 10 seconds (samples). It is remembered as the best result.
Continue with 3 workers doing the same. If this resulted in a higher total throughput (that is, 3 workers can do more calls than 2), the best result is replaced with this one.
Continue increasing concurrency, until at some point (say, 8 workers) the total average throughput per second is no longer increasing. For example, it could be that with 9, 10 and 11 workers total throughput is lower than with 8. It means that local maximum was achieved with 8 workers. This result is reported as the result of the entire concurrency estimation test.

Effectively, I recommend thinking of concurrency estimation mode as a sequence of usual runs, every time with increased -c argument.

To give just one more example, here are 3 simple runs of persistent_term:put/2 function done with 1 worker, 2 workers and 3 workers:

./erlperf 'persistent_term:put(1, "string").' -c 1
Code                                      ||        QPS       Time
persistent_term:put(1, "string").          1    8035 Ki     124 ns
./erlperf 'persistent_term:put(1, "string").' -c 2
Code                                      ||        QPS       Time
persistent_term:put(1, "string").          2    5050 Ki     396 ns
./erlperf 'persistent_term:put(1, "string").' -c 3
Code                                      ||        QPS       Time
persistent_term:put(1, "string").          3    4358 Ki     688 ns

As you can see, the function is not at all concurrent - it has less throughput with 2 (and 3) workers than with just one.

Now instead of manually adding concurrency, you can run a concurrency estimation test:

./erlperf 'persistent_term:put(1, "string").' -q
Code                                      ||        QPS       Time
persistent_term:put(1, "string").          1    8788 Ki     113 ns

It basically says the same - from the througput perspective, a single worker is better than multiple.

zabrane · January 25, 2023, 10:55am

@max-au worked now. Thanks again.