Discussion on compile-time standard library code expansion

When your code calls a standard library function and passes in only literal values, why can't the call be evaluated and expanded at compile or preprocessing time?

For example:

list_seq() ->
    lists:seq(1, 5).

map_set() ->
    sets:from_list([1,2,3,4,5], [{version, 2}]).

This would expand into the following code during the compile or preprocessing phase:

list_seq() ->
    [1,2,3,4,5].

map_set() ->
    #{1 => [], 2 => [], 3 => [], 4 => [], 5 => []}.

Is there any reason that prevents Erlang from doing this?

When you compile an Erlang module, it does not know which version of stdlib will be used at run time. Your example is actually rather good at showing why this is a problem: depending on the version of stdlib, sets:from_list/1 would return either a map or a record.
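A small illustration of that difference, assuming an OTP release where the {version, 2} option exists (OTP 24 or later) and version 1 is still the default (version_demo is a made-up name):

version_demo() ->
    V1 = sets:from_list([1,2,3]),                  %% opaque #set{} record ("version 1")
    V2 = sets:from_list([1,2,3], [{version, 2}]),  %% plain map ("version 2")
    {is_map(V1), is_map(V2)}.                      %% {false, true}

Same logical set, two different terms, and which one you get depends on the stdlib in effect when the call actually runs.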


lists:seq(1, 5) gets its nose in the tent,
lists:seq(1, 500000000) comes next.
Why spend space on something that might never be evaluated?

This is actually a classic issue that has nipped at the heels of
Lisp-family programming language implementors for years.

A couple of decades ago I proposed a way of binding a group of
Erlang modules together, so that while modules would remain the
unit of encapsulation they’d no longer necessarily be the unit
of hot-loading. A bound group would allow cross-module inlining
in a fairly simple way. This idea was inspired by Interlisp-10’s
earlier “block compilation”, which actually bound a group of
functions together. Of course recompilation and hot-loading are
not the heart of the issue; as Lukas Larsson pointed out, it's late
binding. A call to lists:seq/2 means "I want this to call whatever
version of lists:seq/2 is current when that call happens", and to
get any kind of expansion you have to commit to some particular
version at some particular time, and then expansion is safe (well,
for some value of "safe" approaching zero, recall the data bloat
problem I started with).
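To make that commitment concrete: short of a parse transform, committing today means writing the chosen value down yourself, for example via the preprocessor (macro name is made up):

-define(SEQ_1_5, [1,2,3,4,5]).

list_seq() ->
    %% Frozen at compile time; lists:seq/2 is never consulted for this function.
    ?SEQ_1_5.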

I'm reminded of the way the Common Lisp people dithered about #. and #,.
#. evaluates at READ time.
#, evaluates at LOAD time.
Ordinary code evaluates at RUN time.
Both went in, obviously a great way to handle things with no printed
representation, &c, and then they added *read-eval* so you could turn
#. into an error, and they deleted #, from the language.

I think it would be beneficial if calls to the most commonly used modules in the standard library (lists, maps, sets) could be expanded. After all, these modules have not changed for a long time; they are relatively stable and backward compatible. If an attribute or compilation flag could be provided to tell the compiler whether to expand a call and to limit the size of the expanded result, it would work much like function inlining.
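For comparison, what already exists for local functions is the inline option, which can be combined with a size threshold; a minimal sketch (module and function names are made up):

-module(cfg).
-compile({inline, [bonus/1]}).   %% ask the compiler to inline this local function
-compile({inline_size, 24}).     %% default size threshold, shown explicitly
-export([total/1]).

bonus(Level) -> Level * 10.

total(Level) -> 100 + bonus(Level).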

This optimization could also be applied to static template modules generated by domain-specific languages (DSLs), for example game configuration tables, significantly improving execution efficiency.

You are absolutely right, and I agree with your perspective. However, we could decide case by case whether to expand a call's result. In fact, the same effect can already be achieved at compile time with a parse_transform.
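For anyone curious, here is a minimal sketch of such a parse transform (module name, whitelist, and structure are my own simplification, not code from any existing library): it folds calls to a small whitelist of pure stdlib functions whose arguments are all literals into the resulting literal term, and leaves everything else untouched.

-module(const_fold).
-export([parse_transform/2]).

%% Whitelist of pure functions considered safe to evaluate at compile time.
-define(PURE, [{lists, seq, 2}, {lists, reverse, 1}]).

parse_transform(Forms, _Options) ->
    [transform(Form) || Form <- Forms].

%% Only rewrite function definitions; pass attributes and other forms through.
transform({function, _, _, _, _} = Form) ->
    erl_syntax:revert(erl_syntax_lib:map(fun fold/1, Form));
transform(Other) ->
    Other.

%% Replace whitelisted remote calls with all-literal arguments by the
%% literal result; on any mismatch or failure, keep the original call.
fold(Node) ->
    case erl_syntax:type(Node) of
        application -> try fold_call(Node) catch _:_ -> Node end;
        _ -> Node
    end.

fold_call(Node) ->
    Op = erl_syntax:application_operator(Node),
    Args = erl_syntax:application_arguments(Node),
    module_qualifier = erl_syntax:type(Op),
    M = erl_syntax:atom_value(erl_syntax:module_qualifier_argument(Op)),
    F = erl_syntax:atom_value(erl_syntax:module_qualifier_body(Op)),
    true = lists:member({M, F, length(Args)}, ?PURE),
    true = lists:all(fun erl_syntax:is_literal/1, Args),
    erl_syntax:abstract(apply(M, F, [erl_syntax:concrete(A) || A <- Args])).

A module opts in with -compile({parse_transform, const_fold}). (the transform module has to be compiled and on the code path first), after which list_seq/0 from the first post compiles down to the literal list.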

The compiler has an option to inline some functions in the lists module.

The compiler always inlines some functions from the maps module. See Using the Functions in the maps Module for details.
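For the lists case, the option being referred to is inline_list_funcs, which inlines calls to the higher-order functions in lists; a minimal sketch of opting in (module name is made up):

-module(demo).
-compile(inline_list_funcs).   %% inline lists:map/2, lists:foldl/3, and friends
-export([double_all/1]).

double_all(L) ->
    lists:map(fun(X) -> 2 * X end, L).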


I used to work on the https://github.com/Ledest/lopt parse transforms (const_pt in this case),
and they have been partially tested on several fairly large projects.
But I didn’t quite like how it was done, and there was no time to do it better.

This is a good attempt :clap: