Compiler bug or feature? (Atoms in a literal not appearing in atoms section of beam file)

fadushin · May 25, 2022, 1:14am

I have found that if an atom occurs in a literal that can be fully known to the compiler (e.g., a map), then the atom does not appear in the atoms section of the beam file.

Example:

-module(test).

-export([test/0]).

test() ->
    Example = #{
        foo => bar
    },
    erlang:display(Example).

Snippet of generated byte code:

{function, test, 0, 2}.
  {label,1}.
    {line,[{location,"test.erl",5}]}.
    {func_info,{atom,test},{atom,test},0}.
  {label,2}.
    {move,{literal,#{foo => bar}},{x,0}}.
    {line,[{location,"test.erl",9}]}.
    {call_ext_only,1,{extfunc,erlang,display,1}}.

Erlang shell:

1> c(test).
{ok,test}
2> {ok, Bin} = file:read_file("test.beam").
{ok,<<70,79,82,49,0,0,2,56,66,69,65,77,65,116,85,56,0,0,
      0,52,0,0,0,5,4,116,101,...>>}
3> beam_lib:chunks(Bin, [atoms]).
{ok,{test,[{atoms,[{1,test},
                   {2,erlang},
                   {3,display},
                   {4,module_info},
                   {5,get_module_info}]}]}}
4> test:test().
#{foo=>bar}
true

Feature or bug?

(I am trying to use the atoms section of a BEAM file to find recursive dependencies from a root node. A dependent module may be referenced only as an atom, so I use the occurrences of atoms to widen the list of potentially dependent BEAM files. For my purposes, at least, I would call this an oversight.)

bjorng · May 25, 2022, 3:02am

Neither. It is an implementation detail.

The atom chunk in the BEAM file contains those atoms directly referenced by instructions in the code chunk.

You could read the literal chunk (“LitT”) as well and extract the atoms it contains.
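
A minimal sketch of reading that chunk follows. It assumes the layout that the code later in this thread also relies on: a 32-bit uncompressed size, then a payload holding a 32-bit literal count and, for each literal, a 32-bit length and a term in external term format. The module name litt_sketch and its helper functions are hypothetical, and the handling of a zero size field (uncompressed payload in newer OTP releases) is an assumption rather than something stated in this thread.

```erlang
-module(litt_sketch).
-export([literals/1]).

%% Beam is a file name or a binary holding a BEAM file.
%% Returns the decoded literal terms, or [] if there is no "LitT" chunk.
literals(Beam) ->
    case beam_lib:chunks(Beam, ["LitT"], [allow_missing_chunks]) of
        {ok, {_Module, [{"LitT", missing_chunk}]}} ->
            [];
        {ok, {_Module, [{"LitT", Chunk}]}} ->
            <<Count:32, Table/binary>> = inflate(Chunk),
            decode(Count, Table)
    end.

%% Assumption: a size field of 0 marks an uncompressed payload
%% (newer OTP releases); otherwise the payload is zlib-compressed.
inflate(<<0:32, Plain/binary>>) ->
    Plain;
inflate(<<_UncompressedSize:32, Compressed/binary>>) ->
    zlib:uncompress(Compressed).

%% Each entry is a 32-bit length followed by that many bytes
%% in external term format.
decode(0, _Rest) ->
    [];
decode(N, <<Size:32, Ext:Size/binary, Rest/binary>>) ->
    [binary_to_term(Ext) | decode(N - 1, Rest)].
```

For the test module above, litt_sketch:literals("test.beam") should return a list containing the map #{foo => bar}, from which the atoms can then be collected.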

fadushin · May 28, 2022, 2:18pm

Thanks for that, @bjorng.

Here is a solution, then, using your approach:

%% @private
%% UncompressedLiterals is the payload of the literal chunk after
%% decompression: a 32-bit literal count followed by the entries.
get_atom_literals(undefined) ->
    [];
get_atom_literals(UncompressedLiterals) ->
    <<NumLiterals:32, Rest/binary>> = UncompressedLiterals,
    get_atom_literals(NumLiterals, Rest, []).

%% @private
%% Each entry is a 32-bit length followed by that many bytes
%% in external term format.
get_atom_literals(0, <<"">>, Accum) ->
    Accum;
get_atom_literals(I, Data, Accum) ->
    <<Length:32, StartData/binary>> = Data,
    <<EncodedLiteral:Length/binary, Rest/binary>> = StartData,
    Literal = binary_to_term(EncodedLiteral),
    ExtractedAtoms = extract_atoms(Literal, []),
    get_atom_literals(I - 1, Rest, ExtractedAtoms ++ Accum).

%% @private
extract_atoms(Term, Accum) when is_atom(Term) ->
    [Term|Accum];
extract_atoms(Term, Accum) when is_tuple(Term) ->
    extract_atoms(tuple_to_list(Term), Accum);
extract_atoms(Term, Accum) when is_map(Term) ->
    extract_atoms(maps:to_list(Term), Accum);
extract_atoms([H|T], Accum) ->
    HeadAtoms = extract_atoms(H, Accum),
    extract_atoms(T, HeadAtoms ++ Accum);
extract_atoms(_Term, Accum) ->
    Accum.

Remind me again why people program in any other language than Erlang?

fadushin · June 18, 2022, 8:19pm

And oops, there is a bug in the above code. The recursive call to extract_atoms on the head of the list (HeadAtoms = extract_atoms(H, Accum),) should not be passing in Accum, but should pass in the empty list to start the recursive search.
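
For reference, here is a sketch of extract_atoms with that one change applied. With the original list clause, Accum is carried into the head search and then appended again, so extract_atoms([a, b], []) returns [b,a,a]. The module name extract_atoms_sketch and the one-argument wrapper extract_atoms/1 are hypothetical additions for illustration; everything else follows the code posted earlier.

```erlang
-module(extract_atoms_sketch).
-export([extract_atoms/1]).

%% Hypothetical convenience wrapper: start with an empty accumulator.
extract_atoms(Term) ->
    extract_atoms(Term, []).

extract_atoms(Term, Accum) when is_atom(Term) ->
    [Term | Accum];
extract_atoms(Term, Accum) when is_tuple(Term) ->
    extract_atoms(tuple_to_list(Term), Accum);
extract_atoms(Term, Accum) when is_map(Term) ->
    extract_atoms(maps:to_list(Term), Accum);
extract_atoms([H | T], Accum) ->
    %% Search the head with an empty accumulator so that Accum
    %% is appended only once, below.
    HeadAtoms = extract_atoms(H, []),
    extract_atoms(T, HeadAtoms ++ Accum);
extract_atoms(_Term, Accum) ->
    Accum.
```

An atom that occurs more than once in a literal is still reported once per occurrence; lists:usort/1 on the result removes those repeats if only the set of atoms matters.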