Core Erlang: Why have <e_1,...,e_n>?

I am looking into Core Erlang.

According to the syntax of Core Erlang described here: https://www.it.uu.se/research/group/hipe/cerl/doc/core_erlang-1.0.3.pdf it is possible to have “exprs” in the body of a function, i.e. I could write a function like this: fun() -> <1,2> if I wanted, but that is not accepted by Erlang due to wrong number of arguments.

I am aware that the spec document has not been updated in a long time. If I check out otp/lib/compiler/src/core_parse.yrl at commit d24f0573199e2eb7698afba9e383b0eb183df3b5 of erlang/otp on GitHub, I can see that the syntax still supports the function I wrote above.

Furthermore <> is used in clauses for pattern matching, but I’m having a hard time understanding why to use <> instead of tuples. What can <> do that tuples cannot do? Why does Core Erlang have <> ?

Help me understand it. I really want to understand it. Every case I can come up with is not accepted by the Core Erlang to Beam compiler. I am sure <…> has been added for a reason.

Thank you so much in advance!

Out of curiosity why are you interested in Core Erlang?

What is the equivalent Erlang function to the one you have been trying to write in Core Erlang, and what does your complete Core function look like?

Academic

To the best of my knowledge it is not possible to write a function that can return “<…>” in Erlang, since erlang does not have <>.

Which is the source of my question, why is <…> a part of the Core Erlang syntax?

When do you need <> ? Why is it a part of Core Erlang? I do not believe it was added for no reason. There must have been a reason for including <…> in Core Erlang.

I think the thing to realise here is that Core Erlang is not Erlang, it is basically an intermediate language of the compiler. The semantics of Core Erlang are not really the same as Erlang’s either, e.g. the handling of funs is different. That is why the syntax is not the same. Now the compiler interface allows you to both output the generated Core and also to use Core as input to the compiler.

If you want to get a feel for Core Erlang, my suggestion is to write some Erlang modules and then compile them and generate the Core forms for them. IIRC the compiler options for doing this are dcore and to_core0 and to_core. I cannot remember which ones return the core forms and which output core syntax.
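For reference, here is a minimal sketch of how the Core for a module can be dumped from Erlang code. It assumes that compile:file/2 with the options to_core and binary returns the Core module as a term instead of writing a file, and that core_pp:format/1 (an internal, undocumented module in the compiler application) pretty-prints it. The module name show_core and the function print/1 are made up for illustration.

```erlang
-module(show_core).
-export([print/1]).

%% Compile File (e.g. "foo.erl") only as far as Core Erlang and print
%% the result in Core Erlang syntax.
%% Assumption: [to_core, binary] makes compile:file/2 return the Core
%% module term rather than write a .core file.
print(File) ->
    {ok, _Module, Core} = compile:file(File, [to_core, binary]),
    %% core_pp is internal to the compiler application and may change
    %% between OTP releases.
    io:put_chars([core_pp:format(Core), $\n]).
```

From the command line, erlc +to_core foo.erl should write the same text to foo.core, and erlc foo.core compiles such a file back to BEAM code.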

The basic passes of the compiler are

parse input → AST → core erlang → (gazillion optimisations here) → kernel erlang → beam instructions.

I am fully aware that Core Erlang is not Erlang. Everything you can do in Erlang, you can do in Core Erlang - and more. I want to understand Core Erlang and why it is constructed as it is. I want to understand this intermediate language. Most documentation I can find focuses on Erlang, not on any of the intermediate languages. Even the compiler passes you mention are hard to find. I have found the Erlang Abstract Format (AST) to be the best documented intermediate format due to this third-party git repo: GitHub - zuiderkwast/erlang_abstract_format: Documentation of Erlang Abstract Format

Thanks, I only knew to_core, not the other two. I have already experimented with to_core, but I have not figured out why Core erlang has both <…> and {…}. I already know that experimentation and trying to understand the source code properly is the only way forward as I have been unable to find Core Erlang documentation.

I have written here in hope that someone in this forum knows something about the design principles of Core Erlang.

Which is the source of my question, why is <…> a part of the Core Erlang syntax?

My understanding is that this is part of “multiple-values” which is a way for compilers to talk about an expression yielding multiple values (at the same time!, I’m not talking about generators) or a binding binding to those multiple values, without involving source-language constructs like tuples. Apart from that, they’re essentially tuples.

@richardc might have more info.

Yes, that is correct.

Here follows an example where it is used. First consider this Erlang function:

foo(0, 0) -> invalid;
foo(A, B) -> A + B.

In Core Erlang, a function cannot have multiple clauses. Therefore, multiple clauses in Erlang are translated to a case in Core Erlang:

'foo'/2 =
    fun (_0,_1) ->
          case <_0,_1> of
              <0,0> when 'true' ->
                  'invalid'
              <A,B> when 'true' ->
                  call 'erlang':'+'(A, B)
          end

The Core Erlang optimization passes also take advantage of the values construct to eliminate unnecessary tuple construction. For example:

bar(N) ->
    {A,B} = case N of
                0 -> {0,1};
                _ -> {-N,N}
            end,
    A * B.

After optimization, the Core Erlang code looks like this:

'bar'/1 =
    fun (_0) ->
        let <_7,_8> =
            case _0 of
              <0> when 'true' ->
                  <0,1>
              <_6> when 'true' ->
                  let <_1> =
                      call 'erlang':'-'(_0)
                  in  <_1,_0>
            end
        in
            call 'erlang':'*'(_7, _8)
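
For contrast, here is a hand-written sketch of roughly what the same function looks like when the tuples are kept, i.e. without the values optimization. This is not actual compiler output; the variable names and the exact shape of the badmatch clause are assumptions made for illustration.

```erlang
'bar'/1 =
    fun (_0) ->
        %% Each branch builds a real tuple on the heap ...
        let <_3> =
            case _0 of
              <0> when 'true' ->
                  {0,1}
              <_6> when 'true' ->
                  let <_1> =
                      call 'erlang':'-'(_0)
                  in  {_1,_0}
            end
        in
            %% ... which is then immediately taken apart again.
            case _3 of
              <{A,B}> when 'true' ->
                  call 'erlang':'*'(A, B)
              <_4> when 'true' ->
                  primop 'match_fail'({'badmatch',_4})
            end
```

In the optimized version above, the inner case yields the two values directly as <_1,_0>, so no tuple is built and no second case is needed to destructure it.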

Not sure whether you know about it, but there is a series of blog posts at the Erlang/OTP Blog about the compiler. One of the blog posts describes Core Erlang: Core Erlang by Example.

A final note: if you have serious plans for creating your own BEAM-based language, we recommend using the Erlang Abstract Format because of the stronger compatibility guarantees. We consider Core Erlang to be an internal format that we will not change without good reason, but we will change it if there is a good reason. Note that we have made changes to how the compiler uses Core Erlang in the past.

Thanks for the thorough reply.

Short version: Your reply helped me get a better understanding than before, but I still have some unanswered questions. I think that I understand why a value list makes sense now. What is the purpose of an expression list?

Long version:

FYI: I wanted to put in links for everything, but since I am a new user I cannot add more than five links in a post. I have been forced to remove most of my references. Full version with links can be found here: Erlang Forum Response · GitHub

Intro

My initial post would have been very long if I tried to explain everything that I have tried to understand Core Erlang, which is why I kept it brief.
Furthermore, I wanted to focus on Core Erlang, rather than what I did with it. But I realize now that more context is needed. Sadly I am not sure how to keep it brief while explaining what I know and the source for my confusion with part of the Core Erlang syntax, so the long version is quite long.

My Academic work is about Gradual Session types. We want to continue the work of Igarashi et al. by closing the gap between academic work and usable implementations. Our product will be a type system with an implemented type checker that supplements Dialyzer like other tools already do today (eqWAlizer, Gradualizer etc.)

My introduction to Erlang was with Learn You Some Erlang for Great Good! supplemented by Google and YouTube. As for Core Erlang: I looked at the Erlang Blog post you linked, Core Erlang by Example, including the follow-up, Core Erlang Wrap Up. Furthermore, several websites give some introduction to Core Erlang, e.g. 8thlight [dot] com and baha [dot] github [dot] io. I also read a bit on the Elixir forum to figure out where Core Erlang belonged in the compile process, since I did not fully trust the third-party websites' unreferenced descriptions of the compile process. I have studied several research articles that use Core Erlang to learn more and hopefully find references to more recent official documentation. The most official Core Erlang documentation I can find is consistently on the HIPE CERL research group website, with the latest version obviously outdated. Then I looked at Dialyzer, both the Dialyzer research group website and the Dialyzer source code, since I got hints that Dialyzer might use Core Erlang somehow. I have even explored a Haskell package that works with Core Erlang.

Why Core Erlang?

I did not know about Erlang before I started working on this research project last summer (~August 2023). I have found the more strict nature of Core Erlang to be an advantage when building an analysis tool, which is why I have chosen to move on with Core Erlang rather than Erlang or the Erlang Abstract Format. The Erlang Abstract Format “just” represents Erlang code as tagged tuples, it does not reduce the scope of the language. The sheer possibilities of tagged tuples and their use are overwhelming.

I have already seen EEP 52. There is also EEP 43. Core Erlang changing is an acceptable risk, especially as I think the stricter construction makes static analysis easier. I would have to spend significantly more time understanding the flow of data in Erlang, as the syntax is much less strict and has many ways to do similar things. Core Erlang is more verbose and less readable, but at the same time much simpler and more straightforward in general. I feel that I am close to fully understanding Core Erlang. A key part is to understand why expression lists are allowed almost everywhere in the syntax.

Expression lists

I did not call <…> value lists, as the same syntactical construct can be used to contain expressions (refer to Core Erlang 1.0.3 Appendix A.7, see exprs). This is probably what put me off the most with regard to understanding the syntax of Core Erlang.

It is syntactically possible to write an expression list in both the Core Erlang spec and the Core Erlang parser. Expression lists are syntactically valid almost everywhere in Core Erlang (since a normal expression can always become an expression list), but it seems to me that their use is very limited. I only remember seeing expression lists being used like patterns, i.e. variables and/or values mixed in a single list.

You could put a full expression in the head of the case ... of if you wanted:

module 'test' ['foo'/1]
    attributes []
'foo'/1 =
     fun (_0) -> case <let X = 5 in X,42> of <A,B> when 'true' -> B end
end

However, even though expression lists are allowed almost everywhere in the syntax, there are several cases where I am not sure how it would make sense to have an expression list. E.g. in apply the local function name can be an expression list according to the syntax. For example:

module 'test' ['foo'/1]
    attributes []
'foo'/1 =
     fun (_0) -> let <Q,S> = <fun() -> 42,fun() -> 43> in apply Q ()
end

But when I use <Q> instead of just Q in apply I get an illegal expression in foo/1:

module 'test' ['foo'/1]
    attributes []
'foo'/1 =
     fun (_0) -> let <Q,S> = <fun() -> 42,fun() -> 43> in apply <Q> ()
end

Similarly, I do not understand why the syntax of the guard clause in the case of allows an expression list.

module 'test' ['foo'/1]
    attributes []
'foo'/1 =
     fun (_0) -> case <_0> of <X> when 'true' -> X end
end

Is accepted, but yet again if I put 'true' in an expression list it is no longer ok due to illegal guard expression in foo/1:

module 'test' ['foo'/1]
    attributes []
'foo'/1 =
     fun (_0) -> case <_0> of <X> when <'true'> -> X end
end

This is consistent with section 6.7 of Core Erlang 1.0.3:

If a clause guard evaluates to a value other than true or false, the behaviour is undefined

Why not limit the syntax to only support a single expression here? Why allow an expression list?

Yet another case where I do not fully understand the logic of expression lists is the do sequence notation. The syntax allows both elements of the sequence to be expression lists. The following code is accepted with a warning (since I do not use 42):

module 'test' ['foo'/1]
    attributes []
'foo'/1 =
     fun (_0) -> do 42 <_0>
end

However, if I wrap 42 in <> to make an expression list:

module 'test' ['foo'/1]
    attributes []
'foo'/1 =
     fun (_0) -> do <42> <_0>
end

I get an illegal expression error:

$ erlc test.core
test: illegal expression in foo/1

Are expression lists allowed everywhere to keep the syntax simpler and defer the decision of where expression lists are used to a higher level in the compilation process? Or is there some other reason?

I’ve built my own Core Erlang parser, and I need to understand the meaning of <…> to ensure that the analysis of Core Erlang is correct. Until now the specifics were not important, but they have become important. I had hoped that I would come to understand why expression lists exist as I worked more with the other parts of Core Erlang. Since I still do not fully understand it, I asked here hoping to get help from someone who knows the motivation behind the design of Core Erlang.

Best internal format?

You say it is “better” to use the Erlang Abstract format. After I got the hint about to_core0 and dcore I started to read the compile.erl file (as I found them here).

There are two aspects here. One is the stability, the comments in the source code suggest they are equal: abstr and core. Another is documentation, where compile.erl clearly states that core erlang is not documented.

If I understand the original motivation for the Core Erlang format, as documented in the cerl spec version 1.0.3, Core Erlang is clearly meant to be usable externally, even to be edited by hand:

During its evolution, the syntax of Erlang has become somewhat complicated, making it difficult to develop programs that operate on the source. Such programs might be new parsers, optimisers that transform source code, and various instrumentations on the source code, for example, profilers and debuggers

I started to work with Core Erlang for the very reasons that motivated the creation of Core Erlang in the first place. I want a simpler way of working with Erlang for analysis with my tool.

The comments in the source code contradict these goals (e.g. lack of documentation). When did Erlang leave these goals behind? Or maybe Erlang never adopted them in the first place?

Maybe it is for the same reason that the Core Erlang specification never got updated after EEP 43 and EEP 52. Furthermore, I wonder why receive wasn’t completely removed if the format is considered to be internal only. I got the impression from EEP 52 that the receive was kept for backward compatibility reasons.

I feel that I am very close to fully understanding Core Erlang. Thank you for your time.

Yes. Anywhere that you can have an expression that produces a single value, the Core Erlang spec allows you to write a sequence of expressions producing a (mathematical) vector of values. This makes the syntax uniform with no special cases for where you may or may not write a vector. Quoting the spec:

… while in Erlang a function call evaluates
to a single value, in Core Erlang the result of evaluating an expression is
an ordered sequence, written <x1, . . ., xn>, of zero, one or more values xi. A
sequence is not in itself a value; thus it is not possible to create sequences of
sequences. For simplicity we denote any single-value sequence by x where
no confusion can ensue. If an expression e always evaluates to a sequence of
values <x1, . . ., xn>, then we define the degree of e to be the length n of this
sequence

The spec does not require an implementation to check whether you are matching the degrees of producers and consumers in your code. It could, but if it doesn’t (as in the Beam currently), you’re on your own, and a degree mismatch could result in a VM crash, just like if you generate low level Beam code yourself.

For example, syntactically you cannot write <1, <2, 3>, 4>, but you can write <1, call 'foo':'bar'(...), 4>, and there’s nothing that checks that the call produces exactly one value. If it produces more than one, the result would probably just be that all but the first are dropped. However, you could also have something like case call 'foo':'bar'(...) of <v1, v2> -> ... end where nothing ensures that v2 is in fact initialized. If the call returns only 1 value, the result could be a crash, possibly even an emulator segfault. Hence, multiple-value optimizations need to be very carefully implemented.

Since Core Erlang was created to support the Beam compiler, it is not so strange that while the parser might accept all legal programs, the actual processing of the parse tree is less general and in some cases fails to understand the code because there has never been an actual program that has tried to use that particular form before. If you find such cases and they seem worth fixing, you’re very welcome to try to submit a PR to the compiler - it might be a very simple thing to fix - or open a new issue.

Thanks, it is much clearer now.