Properly benchmark NIF’s memory consumption?

jackalcooper · December 27, 2024, 10:36am

The use case

I’m using LLVM/MLIR to generate native code from an Elixir DSL. It looks like this:

len = Pointer.load(i32(), len_ptr)
arr = Pointer.allocate(Term.t(), len)

The problem

Recently I have been trying to improve the memory management of the native code.

Calls to C’s malloc and free have been replaced with calls to enif_alloc and enif_free. I thought this would let benchee compare memory consumption, but it doesn’t seem to help: the JIT version always reports a memory usage of around 768 B.

The benchmark result from benchee:

##### With input array size 1000000 #####
Name                      ips        average  deviation         median         99th %
enif_quick_sort          9.49      105.37 ms    ±11.79%      102.73 ms      172.08 ms
Enum.sort                5.78      173.00 ms    ±34.82%      154.64 ms      416.53 ms

Comparison: 
enif_quick_sort          9.49
Enum.sort                5.78 - 1.64x slower +67.64 ms

Memory usage statistics:

Name                    average  deviation         median         99th %
enif_quick_sort      0.00073 MB     ±0.00%     0.00073 MB     0.00073 MB
Enum.sort             203.08 MB     ±0.00%      203.08 MB      203.09 MB

Comparison: 
enif_quick_sort      0.00073 MB
Enum.sort             203.08 MB - 277278.58x memory usage +203.08 MB

PS: I’m pretty sure the calls to enif_alloc and enif_free are actually generated:

It crashes if free is called to deallocate memory allocated by enif_alloc.
I checked the LLVM IR.

starbelly · December 28, 2024, 4:42pm

A quick glance at benchee shows that it measures process heap memory, which would surely be your problem if the NIF in question allocates and frees within the lifetime of a single NIF call. That is, it is not returning anything measurable back to the process. Even if it did, you’d only be measuring the data structure returned to the process, and thus placed on that process’s heap, not the temporary allocations done inside the NIF.

@PragTob would know better when it comes to benchee, of course. However, it’s going to be quite difficult to account for NIF memory for the run of a specific function, at least in a very generic way. Perhaps with the excellent work done on instrument and Mtags, it would be easier these days. There may also be a feature in benchee that allows you to see the difference in system memory between runs.
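
To make the "difference in system memory" idea concrete, here is a rough sketch that works outside of benchee. It samples :erlang.memory(:total) from a second process while the function runs and reports the highest value seen above the starting point. MemProbe and peak_total are hypothetical names made up for this example, not part of benchee or OTP. It also assumes that enif_alloc memory shows up in :erlang.memory/1 because it goes through the VM's allocators; that is my understanding, but worth verifying on your OTP version.

```elixir
defmodule MemProbe do
  # Hypothetical helper: runs `fun` and returns {result, peak_delta_bytes},
  # where peak_delta_bytes is the largest VM-wide :erlang.memory(:total)
  # value observed during the call, minus the value just before it.
  def peak_total(fun, interval_ms \\ 1) when is_function(fun, 0) do
    baseline = :erlang.memory(:total)
    parent = self()

    # The sampler runs in its own process so it can keep polling on another
    # scheduler while the measured call is busy.
    sampler = spawn_link(fn -> sample_loop(parent, baseline, interval_ms) end)

    result = fun.()

    send(sampler, {:stop, parent})

    receive do
      {:peak, peak} -> {result, peak - baseline}
    end
  end

  defp sample_loop(parent, peak, interval_ms) do
    receive do
      {:stop, ^parent} ->
        # Take one last sample before reporting.
        send(parent, {:peak, max(peak, :erlang.memory(:total))})
    after
      interval_ms ->
        sample_loop(parent, max(peak, :erlang.memory(:total)), interval_ms)
    end
  end
end

# Usage: compare the peak for a pure Elixir call with the NIF version.
{sorted, delta} = MemProbe.peak_total(fn -> Enum.sort([3, 1, 2]) end)
IO.inspect({sorted, delta})
```

Keep in mind this is VM-wide and therefore noisy: anything else allocating at the same time is counted too, and a short-lived allocation can fall between two samples. A plain before/after difference would miss memory that the NIF frees before returning, which is why the sketch polls during the call instead.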