erlang:atom_prefix/2 - zero-allocation atom prefix matching BIF

dkuku · March 7, 2026, 10:06pm

Checking whether an atom starts with a given prefix is a common pattern, but today it requires converting to a list/binary first:

lists:prefix(“erl_”, atom_to_list(Module))

This allocates on every call and creates GC pressure unnecessarily — the atom’s bytes are already sitting in the atom table.

I have a working BIF implementation of erlang:atom_prefix/2 that compares directly against the atom table’s internal UTF-8 bytes. Zero allocation, zero GC.

erlang:atom_prefix(hello_world, <<“hello”>>). %% true
erlang:atom_prefix(hello_world, “world”). %% false

The pattern already appears in at least 5 places in OTP (diameter_info, diameter_service, ct_netconfc, dialyzer tests) and 5 places in Elixir (code formatter, Mix help, type checker, autocomplete) via Atom.to_string/1 |> String.starts_with?/2.

Would this be a useful addition?

nzok · March 9, 2026, 8:14am

In Erlang 28, I find 3 occurrences in diameter and one in common test.
In my own programming, testing for a suffix is about half as common as
testing for a prefix.

I suggest lower level building blocks.
(1) size(<<“foo”>>) is already defined. It’s 3.
Extend size/1 to work on atoms, so that size(‘’) = 0, size(foo) = 3,
size(vexatious_kibitzing) = 19.
(2) atom_compare(Atom1, Drop1, Take1, Atom2, Drop2, Take2)
compare portions of two atoms.

With these primitives,
atom_prefix(Prefix, Whole)
when is_atom(Prefix), is_atom(Whole) →
Prefix_Size = size(Prefix),
Whole_Size = size(Whole),
Whole_Size >= Prefix_Size and
atom_compare(Prefix, 0, Prefix_Size, Whole, 0, Prefix_Size) =:= 0.

atom_suffix(Suffix, Whole)
when is_atom(Suffix), is_atom(Whole) →
Suffix_Size = size(Suffix),
Whole_Size = size(Whole),
Whole_Size >= Suffix_Size and
atom_compare(Prefix, 0, Prefix_Size,
Whole, Whole_Size - Prefix_Size, Prefix_Size) =:= 0.

and several other useful functions are definable.
(1) just removes an arbitrary restriction.
(2) isn’t that much more complicated than the BIF you already want.

sverker · March 9, 2026, 1:18pm

has_my_prefix(Atom) ->
    Atom >= my_prefix andalso Atom < my_prefiy.

nzok · March 9, 2026, 10:41pm

You win. Of course, that assumes you know what the successor to prefix is without creating it, which the examples I’ve seen don’t.

Benjamin-Philip · March 10, 2026, 4:25am

Eh, that’s trivial fix:

has_prefix(Atom, Prefix) ->
    Atom >= Prefix andalso Atom < Prefix.

vances · March 10, 2026, 5:26am

You just created the Prefix atom(), and now it lives forever in the atom table.

Schultzer · March 10, 2026, 3:32pm

Since the atom table contains all the binaries, would be cool if we could match binaries with atoms and even have a contains function.

Led · March 10, 2026, 11:42pm

It looks like a problem XY. In 15 years, this has never been needed. Please describe the full problem, because it looks like an architectural error.

bjorng · March 11, 2026, 5:27am

Since a few releases ago, atom_to_binary/1 will not produce any garbage. Each atom in the atom table is a binary, and atom_to_binary/1 will return that binary directly.

Unfortunately, I can’t think of a way to test whether a binary starts with a certain sequence of characters without creating garbage. Doing binary matching will create a match context/sub binary:

has_prefix(Atom, Prefix) ->
    case atom_to_binary(Atom) of
        <<Prefix:(byte_size(Prefix))/binary, _/binary>> -> true;
        _ -> false
    end.

Other tricks using the binary module that I can think of also produce garbage. For example:

has_prefix(Atom, Prefix) ->
    binary:longest_common_prefix([Prefix, atom_to_binary(Atom)]) =:=
        byte_size(Prefix).

However, testing whether an atom starts with a certain character without producing garbage is possible using binary:first/1.

jhogberg · March 11, 2026, 8:08am

I strongly suspect this too. If you want to peek at the contents then atoms are likely the wrong tool for the job, their very name means indivisibility. They’re meant to be used as constants in code and nothing more.

Why aren’t bitstrings good enough?

dkuku · March 11, 2026, 8:30am

On Kubernetes, the module name autocomplete in Elixir was killing the pods for me when the allowed memory was set too low. I was looking into optimizing this. One of the patterns I spotted was converting Atom.to_string and then running String.starts_with? on it. After looking into it, this pattern is used in multiple places in the standard library — probably even more so in tools that work with source code.

jhogberg · March 11, 2026, 8:37am

That sounds like a problem with how Elixir is doing things, and it’s probably good to look into that separately, I’ll talk to Jose about it. Thanks!

josevalim · March 11, 2026, 9:17am

I guess this is my bat signal!

This is not an Elixir specific issue, as it does happen in Erlang as well, under the same circumstances. When you are autocompleting module names in the Erlang shell, you invoke this code:

github.com/erlang/otp

lib/stdlib/src/edlin_expand.erl

723de8d5f


      
          expand_module_name("",_) ->
              {no, [], []};
          expand_module_name(Prefix,CC) ->
              Alts = [{list_to_atom(M),""} || {M,_,_} <- code:all_available()],
              case match(Prefix, Alts, CC) of
                  {_Res,_Expansion,[]}=M -> M;
                  {Res,Expansion, Matches} -> {Res,Expansion,[#{title=>"modules", elems=>Matches, options=>[highlight_all]}]}
              end.

Which calls `code:all_available/0`:

github.com/erlang/otp

lib/kernel/src/code.erl

63f365b64


      
          all_available([], AllModules) ->
              AllLoaded = [{atom_to_list(M),Path,true} || {M,Path} <- all_loaded()],

In interactive mode, `all_available` will return modules from all load paths by traversing the disk, plus all loaded code. In `embedded` mode, it runs only the branch above, which we can focus on right now.

The code above calls `all_loaded` and then returns `{ModString, Path, true}` lists. So when autocompleting modules in embedded mode, you can see the code above can be quite wasteful:

It loads `{Module, Path}` data from ETS (including modules and paths) and sends it across processes
Then it traverses all pairs, converts atoms into lists, and returns triplets
Then `edlin` converts all lists back into atoms
Then the `match` function converts the atoms back into lists

There is probably a lot that could be simplified in here! You could probably change `edlin` expand to check if the mode is embedded and, if so, bypass `code:all_available/0` altogether and settle on this core loop:

[M || {M, _} <- code:all_loaded(), lists:prefix(Hint, list_to_atom(M))]

(or use `erlang:all_loaded/0` directly, which should be fine in embedded mode). And this shows exactly the problem statement here: you potentially have thousands of modules, you are allocating now thousands of lists, only to find 30 that starts with `my_app_foo_`. A solution similar to the one here would allow you to bypass this allocation altogether:

[M || {M, _} <- code:all_loaded(), erlang:atom_prefix(Hint, M)]

TL;DR: I’d say core problem happens in Erlang but, because the conversions are hidden through a few different modules, it is not obvious that Erlang also is converting between atom and lists/binaries when there isn’t a need to. I hope this makes sense!

jhogberg · March 11, 2026, 9:24am

Looking at it, the entire thing could be reduced to this step by refactoring the code server a bit. No need to even load data into process memory, just pure ETS operations.

josevalim · March 12, 2026, 7:50am

Expanding on this after talking to @jhogberg and sleeping on it.

Option 1: add prefix option to `code:all_loaded/1`

John’s idea is to have the code server store the binary name of the module in its ets table, so we could ordered_set and do the prefix search as suggested by @sverker. This the whole selection and filtering happens on ets and efficiently due to the ordered set. I believe this requires the binary module name to be the key in ets (I am not sure if this could impact overall insertion/lookup performance)
Assuming the above is acceptable, we could then add code:all_loaded([{prefix, ~"my_app_"}]) and code:all_available([{prefix, "my_app_"}]) APIs to be used by Erlang/Elixir autocoplete
The above still has the pitfall that code:all_loaded/0 return module names and paths, and the paths will be allocated and unused by edlin_expand. But perhaps this is fine, since the module names will be filtered anyway to only the relevant ones

Option 2: add `erlang:all_loaded([{prefix, "my_app"}])` instead

As mentioned above, code:all_loaded/0 return module names and paths and it requires changing the code server. An alternative solution would be to add erlang:loaded([{prefix, "my_app"}]), which does the filtering on a NIF. Then we can change edlin to have its own version of code:all_available that only lists/filters modules, reducing the amount of indirection and back and forth between modules. Roughly this:

all_modules(Prefix) ->
    case code:get_mode() of
        interactive -> all_modules(code:get_path(), Prefix, []);
        embedded -> all_modules([], Prefix, [])
    end.

all_modules([Path|Tail], Prefix, Acc) ->
    case erl_prim_loader:list_dir(Path) of
        {ok, Files} ->
            %% IIRC the ++ after a comprehension is optimized by the compiler
            %% to avoid building list only to concatenate it
            NewAcc = [Module || File <- Files,
                                filename:extension(File) == ".beam",
                                Module = filename:rootname(File),
                                lists:prefix(Prefix, Module)] ++ Acc,
            all_modules(Tail, Prefix, NewAcc);
        _Error ->
            all_modules(Tail, Prefix, Acc)
    end;
all_modules([], Prefix, Acc) ->
    AllLoaded = [atom_to_list(M) || M <- erlang:loaded([{prefix, Prefix}])] ++ Acc,
    lists:usort(AllLoaded).

I’d probably take option 2, because having edlin compute its own modules will be clearer than depending on code APIs.

jhogberg · March 12, 2026, 9:01am

Another variant is to add a function that does exactly what edlin_expand wants to do (be it for option 1 or 2). We’re not stuck with current APIs.

erlang:atom_prefix/2 - zero-allocation atom prefix matching BIF

Option 1: add prefix option to code:all_loaded/1

Option 2: add erlang:all_loaded([{prefix, "my_app"}]) instead

Option 1: add prefix option to `code:all_loaded/1`

Option 2: add `erlang:all_loaded([{prefix, "my_app"}])` instead