Pre-allocate large binary

I need to assemble a very large binary from sub-binaries and I want to avoid copying the smaller binaries every time. In my scenario I know the size of the final binary before assembling. My ‘long term’ goal is to pass a refc between processes instead of a bunch of small chunks of binaries.

I believe I read somewhere that binary comprehensions have the ability to pre-compute the required space and avoid the intermediate copies. Is this a recommend method to achieve the final outcome?

4 Likes

Yes, that is correct, but it is difficult to predict whether the compiler will generate code that pre-allocates the binary. When the compiler is not sure that it is safe and efficient, it will not emit code that pre-allocates the binary. A binary comprehension will still be slightly more efficient than the corresponding “hand-written” code because it can use a slightly more efficient instruction for appending to a binary.

Note that appending to a binary is relatively efficient and that there will not be any copying every time. For example:

bin_append(Bs) ->
    bin_append(Bs, <<>>).

bin_append(<<Sz:8,SubBin:Sz/binary,Rest/binary>>, Result) ->
    bin_append(Rest, <<Result/binary,SubBin/binary>>);
bin_append(<<>>, Result) ->
    Result.

The growing binary in Result will not be copied every time in the loop. Instead, SubBin will be appended to the Result binary. If there is no more space in the Result binary, a new binary twice as large will be allocated and the previous binary copied. (If the binary is larger than 16Mb, the new binary will only be 20% larger.)

Alternatively, the sub-binaries could be collected into a list and converted to a binary at the end:

list_append(Bs) ->
    list_append(Bs, []).

list_append(<<Sz:8,SubBin:Sz/binary,Rest/binary>>, Result) ->
    list_append(Rest, [Result|SubBin]);
list_append(<<>>, Result) ->
    iolist_to_binary(Result).

Which one is faster depends on the number of sub-binaries and their sizes. For a “reasonable” number of sub-binaries, list_append/1 is probably faster.

To find out what a reasonable number of sub-binaries is, I ran erlperf. On my computer, it seems that bin_append/1 is faster for roughly 20000 sub-binaries (of size 8) or more.

To run the benchmark, I first defined this function for generating test data:

data(N, Sz) ->
    iolist_to_binary(lists:duplicate(N, <<Sz,0:Sz/unit:8>>)).

I then ran the benchmark, varying the number of sub-binaries:

$ (INIT='t:data(10, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:list_append(D).          1    1849 Ki     540 ns    100%
run(D) -> t:bin_append(D).           1    1632 Ki     612 ns     88%
$ (INIT='t:data(100, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:list_append(D).          1     275 Ki    3632 ns    100%
run(D) -> t:bin_append(D).           1     195 Ki    5131 ns     70%
$ (INIT='t:data(1000, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:list_append(D).          1      29362   34057 ns    100%
run(D) -> t:bin_append(D).           1      21861   45743 ns     74%
$ (INIT='t:data(10000, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:list_append(D).          1       2878     347 us    100%
run(D) -> t:bin_append(D).           1       2270     441 us     78%
$ (INIT='t:data(100_000, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:bin_append(D).           1        188    5319 us    100%
run(D) -> t:list_append(D).          1        132    7576 us     70%
$ (INIT='t:data(50_000, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:bin_append(D).           1        414    2415 us    100%
run(D) -> t:list_append(D).          1        296    3378 us     71%
$ (INIT='t:data(25_000, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:bin_append(D).           1        892    1121 us    100%
run(D) -> t:list_append(D).          1        859    1164 us     96%
$ (INIT='t:data(20_000, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:bin_append(D).           1       1180     847 us    100%
run(D) -> t:list_append(D).          1       1096     912 us     92%
$ (INIT='t:data(15_000, 8).'; erlperf --init_runner "$INIT" 'run(D) -> t:bin_append(D).' --init_runner "$INIT" 'run(D) -> t:list_append(D).')
Code                                ||        QPS       Time     Rel
run(D) -> t:list_append(D).          1       1939     516 us    100%
run(D) -> t:bin_append(D).           1       1586     631 us     81%
15 Likes

Thank you for the detailed, informative answer.

3 Likes