Size differences between iolists and binaries

ruslandoga · November 28, 2022, 3:16pm

I’m trying to understand what causes memory usage differences between equivalent iolists and binaries, in particular, when stored into ets.

https://gist.github.com/ruslandoga/959aa9244a70040b9938690f9da8f427

ets-memory-iodata.md

```elixir
iex(1)> tab = :ets.new(:bin, [])
#Reference<0.658737734.4062838786.175931>

iex(2)> :erlang.memory
[
  total: 26914645,
  processes: 9866336,
  processes_used: 9865160,
  system: 17048309,
```

This file has been truncated.

Is the 4x memory usage caused just by linked list references? Or is there something more to it? It seems like all binaries in my test are heap binaries (less than 64 bytes), so it’s not about the refc/heap distinction, or is it?

mpope · November 28, 2022, 5:12pm

I like to refer to this great slideshow when I have memory questions: Efficient Erlang - Performance and memory efficiency of your data by …

The list in the second example has relatively high overhead because it is composed of many small elements. IOlists can be improper lists of binaries and integers. Consider this example:

```elixir
iex(33)> a = [1 | ",some,string,imagine,its,csv\n"]
[1 | ",some,string,imagine,its,csv\n"]

iex(34)> :erts_debug.flat_size(a)
8

iex(35)> b = [1, "some", ?,, "string", ?,, "imagine", ?,, "its", ?,, "csv", ?\n]
[1, "some", 44, "string", 44, "imagine", 44, "its", 44, "csv", 10]

iex(36)> :erts_debug.flat_size(b)
52
```

The first list is an improper list of 1+2 words on top of the 29 bytes that the base binary requires, as opposed to the code in the gist, which requires something like 1+11 words of list overhead.
- :erts_debug.flat_size returns words, not bytes, so the numbers might look a bit funky.
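
If it helps to see those numbers in bytes, here is a minimal sketch that multiplies the word count by the VM's word size and prints it next to the payload size. `TermSize` is a hypothetical helper name, not something from the gist, and the exact figures will depend on the OTP version and on how the binaries were created.

```elixir
# Hypothetical helper module; the name is illustrative, not from the thread.
defmodule TermSize do
  # :erts_debug.flat_size/1 reports words, so multiply by the VM word size
  # (8 on a 64-bit system) to get bytes.
  def flat_bytes(term) do
    :erts_debug.flat_size(term) * :erlang.system_info(:wordsize)
  end
end

# The same two lists as in the iex session above.
improper = [1 | ",some,string,imagine,its,csv\n"]
proper = [1, "some", ?,, "string", ?,, "imagine", ?,, "its", ?,, "csv", ?\n]

# Print the payload size (bytes the iodata would emit) next to the
# size of the term itself.
for {label, term} <- [improper: improper, proper: proper] do
  IO.puts(
    "#{label}: #{IO.iodata_length(term)} payload bytes, " <>
      "#{TermSize.flat_bytes(term)} bytes as a term"
  )
end
```

The gap between the two columns is the per-element overhead: every list element costs a cons cell, and every small binary carries its own header words.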

This is a bit contrived because CSVs are usually built with dynamic values and not from a static binary. It was more to highlight memory usage. Consing all of the elements would make more sense and would result in the same amount of memory used.
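
To tie this back to the ETS question, one way to check the difference yourself is to insert the same row once as an iolist and once as a flattened binary, then compare how much the table grew. This is only a rough sketch: the row contents and the `measure` function are made up for illustration, and `:ets.info(tab, :memory)` reports words, so it is multiplied by the word size here.

```elixir
word = :erlang.system_info(:wordsize)

# Hypothetical helper: returns how many bytes a fresh table grows by
# when a single {:row, value} tuple is inserted.
measure = fn value ->
  tab = :ets.new(:sketch, [:set])
  empty = :ets.info(tab, :memory)
  :ets.insert(tab, {:row, value})
  grown = :ets.info(tab, :memory)
  :ets.delete(tab)
  (grown - empty) * word
end

# Illustrative row, built from many small parts.
parts = [1, ",", "some", ",", "string", "\n"]

as_iolist = measure.(parts)
as_binary = measure.(IO.iodata_to_binary(parts))

IO.puts("iolist row: #{as_iolist} bytes, binary row: #{as_binary} bytes")
```

If the goal is just to keep the table small, flattening with `IO.iodata_to_binary/1` before inserting trades some CPU time at write time for less memory per row.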