Why do ei_x_format and erlang:term_to_binary produce different binaries?

Hi all!

The question is in the title.

Erlang/OTP 24 [erts-12.2.1] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]
Eshell V12.2.1  (abort with ^G)
1> erlang:term_to_binary({a,[1,2,3]}). 
Erlang/OTP 26 [erts-14.0.2] [source-672bd95480] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit:ns]
Eshell V14.0.2 (press Ctrl+G to abort, type help(). for help)
1> erlang:term_to_binary({a,[1,2,3]}).
ei_x_buff buf;
ei_x_new(&buf);                      // allocate the dynamic buffer first
ei_x_format(&buf, "{a,[1,2,3]}");    // writes the version byte, then the term
// buf.buff: 131,104,2,119,1,97,108,0,0,0,3,97,1,97,2,97,3,106

Why not use a single interface across all of OTP to “binarize” terms?

If you call binary_to_term on either binary, both give the same term, {a,[1,2,3]}. The difference is that the one generated by term_to_binary is a more compact form which recognises that the list consists of 3 small integer elements. Why ei_x_format does not do this, I don’t know.
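To illustrate why both binaries decode to the same list, here is a minimal sketch of a decoder for just the list part of the term. It is not the libei or ERTS implementation; the function name `decode_small_int_list` is made up for this example, and it only handles lists of small integers. The tag values (97, 106, 107, 108) are the ones from the external term format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a list of small integers (0..255) that was
 * encoded either in the compact form (tag 107) or the general form (tag 108).
 * Returns the number of elements written to out, or -1 for anything this
 * sketch does not handle. */
long decode_small_int_list(const uint8_t *buf, size_t len,
                           uint8_t *out, size_t cap)
{
    if (len < 1)
        return -1;

    if (buf[0] == 106)                      /* NIL_EXT: the empty list */
        return 0;

    if (buf[0] == 107) {                    /* STRING_EXT: 2-byte length, raw bytes */
        if (len < 3)
            return -1;
        size_t n = ((size_t)buf[1] << 8) | buf[2];
        if (n > cap || len < 3 + n)
            return -1;
        memcpy(out, buf + 3, n);
        return (long)n;
    }

    if (buf[0] == 108) {                    /* LIST_EXT: 4-byte length, elements, tail */
        if (len < 5)
            return -1;
        size_t n = ((size_t)buf[1] << 24) | ((size_t)buf[2] << 16) |
                   ((size_t)buf[3] << 8)  |  (size_t)buf[4];
        if (n > cap || len < 5 + 2 * n + 1)
            return -1;
        for (size_t k = 0; k < n; k++) {
            if (buf[5 + 2 * k] != 97)       /* each element: SMALL_INTEGER_EXT */
                return -1;
            out[k] = buf[5 + 2 * k + 1];
        }
        if (buf[5 + 2 * n] != 106)          /* tail must be NIL_EXT (proper list) */
            return -1;
        return (long)n;
    }

    return -1;
}
```

Fed the bytes 108,0,0,0,3,97,1,97,2,97,3,106 or the bytes 107,0,3,1,2,3, this sketch yields the same three elements 1, 2, 3.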

The Erlang implementation seems to be using STRING_EXT, whereas libei seems to be using LIST_EXT.

Yes, I see that binary_to_term produces the same, correct result. But why do the binary forms differ? Why use different algorithms to produce the binary representation? That is a question for the maintainers, I guess.

Both are valid binary representations, as you can see with binary_to_term. The difference is that the form term_to_binary generates is an optimised format which only works for a list of byte-sized characters, a classic Erlang string; this is the STRING_EXT tag that @asabil mentioned. libei uses the general list format, which can hold a list of anything and doesn’t even have to be a proper list; this is the LIST_EXT tag that @asabil mentioned. The last byte in that format, the 106, is NIL_EXT, which shows that it is a proper list ending in [] (nil).
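The two layouts described above can be sketched by hand, without libei. This is only an illustration of the byte layout from the external term format, not how ERTS or libei implement it; the function names `encode_as_list_ext` and `encode_as_string_ext` are made up for this example, and the caller is assumed to provide a large enough output buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag values from the Erlang external term format. */
enum {
    SMALL_INTEGER_EXT = 97,
    NIL_EXT           = 106,
    STRING_EXT        = 107,
    LIST_EXT          = 108
};

/* General form: tag, 4-byte big-endian length, every element as its own
 * tagged term, then the tail (NIL_EXT for a proper list).
 * Needs 5 + 2*n + 1 bytes in out. Returns the number of bytes written. */
size_t encode_as_list_ext(const uint8_t *elems, uint32_t n, uint8_t *out)
{
    size_t i = 0;
    out[i++] = LIST_EXT;
    out[i++] = (uint8_t)(n >> 24);
    out[i++] = (uint8_t)(n >> 16);
    out[i++] = (uint8_t)(n >> 8);
    out[i++] = (uint8_t)n;
    for (uint32_t k = 0; k < n; k++) {
        out[i++] = SMALL_INTEGER_EXT;
        out[i++] = elems[k];
    }
    out[i++] = NIL_EXT;
    return i;
}

/* Compact form: tag, 2-byte big-endian length, then the raw bytes.
 * Needs 3 + n bytes in out. Returns the number of bytes written. */
size_t encode_as_string_ext(const uint8_t *elems, uint16_t n, uint8_t *out)
{
    size_t i = 0;
    out[i++] = STRING_EXT;
    out[i++] = (uint8_t)(n >> 8);
    out[i++] = (uint8_t)n;
    memcpy(out + i, elems, n);
    return i + n;
}
```

For the elements 1, 2, 3 the first function writes the 12 bytes 108,0,0,0,3,97,1,97,2,97,3,106, which matches the tail of the libei buffer in the question, while the second writes the 6 bytes 107,0,3,1,2,3.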

It is just that term_to_binary goes a bit deeper into inspecting the list and can generate a specialised format, while libei doesn’t. Why it doesn’t, I don’t know.

Wild guess: when was STRING_EXT introduced?
This smells like an algorithm used in multiple places being
updated in only a subset of the places, and nobody noticing
because (a) nothing broke and (b) nothing started performing
worse – it just failed to perform better.
Alternatively, using STRING_EXT takes extra time to check
that it’s appropriate, and maybe someone decided that the
tradeoff was worth it in some places but not others.
Looking at revision control logs around the time that
STRING_EXT was introduced might help.
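To make the "extra time to check" concrete, here is a rough sketch of the kind of test an encoder would need before choosing the compact form. It is an assumption about the shape of the check, not code taken from ERTS or libei, and `fits_string_ext` is a made-up name. It models only the value range and the 16-bit length limit of STRING_EXT; a real encoder would also have to confirm that every element is an integer and that the list is proper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: can a list with these integer elements be written
 * as STRING_EXT? The check costs one extra pass over the whole list. */
bool fits_string_ext(const long *elems, size_t n)
{
    /* The empty list is written as NIL_EXT, and the STRING_EXT length
     * field is only 16 bits wide. */
    if (n == 0 || n > 65535)
        return false;

    for (size_t i = 0; i < n; i++) {
        if (elems[i] < 0 || elems[i] > 255)   /* every element must fit in a byte */
            return false;
    }
    return true;
}
```

An encoder that skips this pass and always emits LIST_EXT stays correct; it just produces a larger binary for byte lists.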

The more important question is whether all copies of the
algorithm should be made the same NOW.