Re-visiting EEP-0055

starbelly · April 21, 2022, 2:10am

EEP-0055 was submitted on 21-Dec-2020.

An accompanying implementation was submitted as a PR, and a lot of conversation ensued there.

It was decided that the EEP would not be set for inclusion in OTP-24, per the timetable at that juncture, and that it would be revisited prior to OTP-25. OTP-25 is now at a point where this is not possible.

That said, I wanted to start a topic here about the EEP and gun for inclusion in OTP-26.

I would point to @kennethL’s last comment on the PR as a starting point for discussion.

I suppose my overarching question here is: Is this still on the table? And if so, what are the roadblocks? Kenneth pointed out some possible roadblocks that needed investigation, but it’s not clear to me what happened after that.

Of course, since I’m raising this topic, I’m obviously in favor of the operator. I’d also be happy to work to drive it forward.

LeonardB · April 21, 2022, 12:32pm

I’m copying the Erlang Questions ML with this post since there was
significant and heated discussion regarding this EEP and not all ML
subscribers have joined the forum.

nzok · April 21, 2022, 12:58pm

I was one of the people who was very much against this syntactic
form. There are better things to do.

michalmuskala · April 21, 2022, 1:04pm

With the feature mechanism introduced in OTP 25, this could be implemented as an optional, experimental syntax extension relatively easily. To me this makes it a no-brainer to move forward with the EEP and implementation - it’s a conceptually simple feature that improves ergonomics and makes pattern matching against already-bound variables possible in fun heads:

foo(List, B) ->
    lists:filter(fun
        ({element, ^B}) -> true;
        (_) -> false
    end, List).

is much nicer than:

foo(List, B) ->
    lists:filter(fun
        ({element, Tmp}) when Tmp =:= B -> true;
        (_) -> false
    end, List).
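To illustrate why the guard version is needed today, here is a minimal sketch in current Erlang (no `^` syntax). The module and function names are made up for illustration. It contrasts a fun head that reuses the name B, which silently shadows the outer B, with the guard-based workaround from the post.

```erlang
-module(shadow_demo).
-export([shadowed/2, guarded/2]).

%% B in the fun head is a NEW variable that shadows the argument B,
%% so the first clause matches every {element, _} tuple.
%% The compiler only warns about the shadowing.
shadowed(List, B) ->
    lists:filter(fun({element, B}) -> true;
                    (_) -> false
                 end, List).

%% Today's workaround: bind a temporary and compare it in a guard.
guarded(List, B) ->
    lists:filter(fun({element, Tmp}) when Tmp =:= B -> true;
                    (_) -> false
                 end, List).
```

With the input `[{element, 1}, {element, 2}, other]` and `B = 1`, `shadowed/2` keeps both `{element, _}` tuples, while `guarded/2` keeps only `{element, 1}`.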

Similarly, it improves pattern matching in list comprehensions:

[Element || {element, ^B} = Element <- List]
% instead of
[Element || {element, Tmp} = Element <- List, Tmp =:= B]

Finally, it allows emitting much more optimised match specs from ets:fun2ms:

ets:fun2ms(fun({^Key, Value}) -> Value end).

would probably emit:

[{{Key, '$1'}, [], ['$1']}]

instead of today’s:

ets:fun2ms(fun({Tmp, Value}) when Tmp =:= Key -> Value end).

That emits:

[{{'$1', '$2'}, [{'=:=', '$1', Key}], ['$2']}]

This is much worse in execution, since it cannot take advantage of the key matching optimisation. It is even worse in ordered_set tables, where prefix key matches are optimised, but not if guards are used. Missing this optimisation is easy and can lead to surprising performance behaviour. With ^ matching, the simplest way to write the match spec is also the most performant way.
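The two match spec shapes can be tried by hand in current Erlang. The following is a small sketch with a hypothetical module name and table contents. It writes both specs out manually (the guard variant with `{const, Key}`) and shows that, for an ordinary key, they select the same rows. It does not measure the performance difference described above.

```erlang
-module(ms_demo).
-export([run/0]).

run() ->
    T = ets:new(demo, [set]),
    true = ets:insert(T, [{a, 1}, {b, 2}, {c, 3}]),
    Key = b,
    %% Key placed directly in the match head: ETS can do a key lookup.
    KeyBound = [{{Key, '$1'}, [], ['$1']}],
    %% Key compared in a guard: every object is visited and tested.
    Guarded = [{{'$1', '$2'}, [{'=:=', '$1', {const, Key}}], ['$2']}],
    R1 = ets:select(T, KeyBound),
    R2 = ets:select(T, Guarded),
    true = ets:delete(T),
    {R1, R2}.
```

Calling `ms_demo:run()` returns `{[2], [2]}`.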

starbelly · April 21, 2022, 2:42pm

Ya know, I didn’t even think about the optimization gains from it. Thanks for calling this out!

Jacob · April 21, 2022, 4:55pm

This is not the same as the comparison, since Key is interpreted as a pattern here which can contain '_' and '$1' etc. So the semantics of the match spec would in general be different from the semantics of the transformed function.

It actually emits (replacing Key by its value and wrapping it with const)

[{{'$1','$2'},[{'=:=','$1',{const,Key}}],['$2']}]

to make sure that every term is compared as-is and not evaluated (e.g. if Key was {'+', 1, 1}).
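A small sketch of the difference, using a hypothetical table whose keys are tuples. When the key value itself contains the atom '_', placing it in the match head treats it as a wildcard, whereas the guard with `{const, Key}` compares the term literally.

```erlang
-module(ms_wildcard_demo).
-export([run/0]).

run() ->
    T = ets:new(demo, [set]),
    true = ets:insert(T, [{{a, 1}, x}, {{b, 1}, y}, {{'_', 1}, z}]),
    Key = {'_', 1},
    %% In the head, '_' inside Key acts as a wildcard.
    InHead = [{{Key, '$1'}, [], ['$1']}],
    %% In the guard, {const, Key} is compared as a literal term.
    InGuard = [{{'$1', '$2'}, [{'=:=', '$1', {const, Key}}], ['$2']}],
    R1 = lists:sort(ets:select(T, InHead)),
    R2 = ets:select(T, InGuard),
    true = ets:delete(T),
    {R1, R2}.
```

Here `ms_wildcard_demo:run()` returns `{[x, y, z], [z]}`: the head version matches all three keys, the guard version only the literal `{'_', 1}`.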

kennethL · April 22, 2022, 11:37am

I can just say that this is one of many in a list of potential extensions to the language that the OTP team will consider starting work on after the release of OTP 25.

ebengt · April 23, 2022, 4:07pm

Greetings,

I remember from the mail list discussion that somebody suggested that it was a waste to use ^ for this one thing only. Since we are talking about annotating a variable with some extra information, it would be better to use ^ for annotations. All annotations that we might want to have. Not only variables but functions, too.

Example (not from the mailing list, this one is on me):
^pin X = 123,

Best Wishes,
bengt

tsloughter · April 25, 2022, 11:53am

Just throwing in my 2 cents: I would very much love to see this added if it were only a part of anonymous functions somehow. Being able to do f(X) -> fun(^X) -> ... instead of f(X) -> fun(OtherX) when OtherX =:= X -> ... would be very nice. It helps prevent some mistakes that lead to bugs and makes code more readable when X is a useful name and not just X

jhogberg · April 25, 2022, 12:24pm

+1. Letting us phase out shadowing over time is very nice, it’s one of the ugliest warts we have.

MononcQc · April 25, 2022, 2:18pm

I don’t think my personal views on that EEP have changed since the github thread:

I like the current idea that I can reuse a previously bound variable in a case (or any other conditional) expression’s clauses. It gives more power to pattern matching, in a way that I prefer to the accidental risk of blocking a match unintentionally.
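For readers less familiar with this behaviour, a minimal sketch in current Erlang (module and function names are illustrative only): a variable that is already bound is compared, not rebound, when it appears in a case clause pattern.

```erlang
-module(rematch_demo).
-export([check/2]).

check(Expected, Msg) ->
    case Msg of
        %% Expected is already bound, so this clause only matches
        %% when the second element equals its value.
        {reply, Expected} -> matched;
        %% Other is unbound here, so it is bound by the match.
        {reply, Other} -> {unexpected, Other}
    end.
```

`rematch_demo:check(1, {reply, 1})` returns `matched`, and `rematch_demo:check(1, {reply, 2})` returns `{unexpected, 2}`.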

I would in fact prefer if the ^ operator was used to say “in this specific case, I want you to rebind the value” – allow control to rebind, but keep a pattern match as the default.

I do agree that the shadowing in funs is annoying and working the guard is more work than ideal.

My sense about the language is that scoping rules in general are a bigger burden, namely that the variables bound within a case (or if, begin, try...catch, receive, and now maybe) expression are bound outside of that expression. At least the compiler catches times where it’s not safe to reuse, but I find that more confusing than the re-matching
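A short sketch of the scoping behaviour described here, with made-up names: variables bound inside the clauses of a case expression remain visible after the expression ends.

```erlang
-module(scope_demo).
-export([classify/1]).

classify(X) ->
    case X of
        {ok, V} -> Result = {value, V};
        error   -> Result = none
    end,
    %% Result is bound in every clause, so it is safe to use here.
    %% V is bound in only one clause; using it at this point would be
    %% rejected by the compiler as unsafe.
    Result.
```

`scope_demo:classify({ok, 1})` returns `{value, 1}` and `scope_demo:classify(error)` returns `none`.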

I think that addressing the ability to rebind in these conditional clauses patterns but without addressing the current scope (or lack thereof) of these expressions actually increases language complexity rather than decreasing it.

Giving these block expressions their own scope would spread that fun ghosting issue, which I think should be tackled independently, and in that world, using ^Var to mean “Grab the value from the parent scope” would possibly make more sense.

But with the absence of a fixed scope, I don’t think the ^ from the EEP should be allowed anywhere but in places where you currently get shadowing warnings, which I believe means funs and list comprehension generators and nowhere else; in both cases it mostly spares you from having to pin the value with a guard.

The advantage of also keeping it only with warnings is that you could introduce the feature almost purely via these warnings. Of course you’d want documentation and whatnot, but it creates an interesting self-contained mechanism where shadowing tells you about the ^, and so does removing the ^ end up explaining what it does.

You could also safely add a warning or error when the ^ is used outside of where it makes sense ("the '^' captures elements from the parent scope when there would otherwise be shadowing, but there is either no parent scope or no shadowing occurring"). This is a bit more discoverable and usable than allowing it on any random pattern or head.

But in the form the EEP had, I’m unexcited about it and feel it makes things more complex than they are.

max-au · April 25, 2022, 3:04pm

My vote as well.

lehoff · April 26, 2022, 7:13am

I have never felt that single assignment has been a big problem.
Yes, it can be annoying to come up with new names, but often that is just a matter of appropriate helper functions.

I agree with @MononcQc that scoping is a bigger issue. “Exporting” a variable out of a case-statement is a source of crazy pain when trying to understand code.

But one example from the EEP really puts the nail in the coffin for me:

f(X, Y) ->
    F = fun ({a, ^Y, Y})  -> {ok, Y};
            (_) -> error
        end,
    F(X).

Here Y has multiple interpretations. In the anonymous function’s header ^Y refers to Y from f's arguments, but inside the anonymous function Y refers to the newly bound Y.
That is just too complicated to work with. This dwarfs all the upsides from having a pinning operator.
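For comparison, this is roughly how the same function can be written in current Erlang without the pin operator, based on the reading of the example given above (^Y is the outer Y, the plain Y in the head is a fresh binding). The names Outer and Inner are chosen here for illustration.

```erlang
-module(pin_today).
-export([f/2]).

f(X, Y) ->
    %% Outer plays the role of ^Y (compared against f's Y in a guard);
    %% Inner plays the role of the newly bound Y.
    F = fun ({a, Outer, Inner}) when Outer =:= Y -> {ok, Inner};
            (_) -> error
        end,
    F(X).
```

`pin_today:f({a, 1, 2}, 1)` returns `{ok, 2}`, while `pin_today:f({a, 3, 2}, 1)` returns `error`.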

OvermindDL1 · April 26, 2022, 4:44pm

Yeah, I honestly really would prefer it to be all scoped to the main function, so the Y in f would be the same Y as in the fun, no pinning needed. Now if only that could be done in a backwards compatible way… Compiler switch? Or perhaps an opt-in -compile(something). or so per file? There really is no need for pinning.

starbelly · April 27, 2022, 12:38am

That’s definitely the one part I didn’t like and would ask to be removed.

starbelly · April 27, 2022, 1:10am

That sounds much more preferable to me, but if I’m not mistaken, this would be a huge breaking change. How would we get there?

michalmuskala · April 27, 2022, 3:59pm

Are you proposing that to pass in a variable into a function I’d have to “pin” it?

Effectively:

X = 1,
fun() -> X end

would not be allowed, I’d have to write:

X = 1,
fun() -> ^X end

This seems like solving a completely different issue and rather unrelated to the original proposal, unless I did not understand your comment.

BTW, just earlier today we had another bug caused by accidentally re-using variables in patterns, that the pin operator solves. It is an actual issue that affects real codebases leading to serious bugs and a big foot-gun in the language affecting especially newcomers.

MononcQc · April 28, 2022, 1:39am

No, I mean specifically the idea of going:

f() ->
    Key = whatever,
    fun({^Key, Val}) -> {ok, Val};
       ({Key, _}) -> {error, {unknown, Key}}
    end.

g(Key, Val) ->
    F = f(),
    F({Key, Val}).

Here the anonymous function has two clauses. The top one uses ^ and matches on whatever because it reuses the value from the parent scope within the match pattern, with no shadowing. The second clause works as it does today and matches to anything (and likely warns about shadowing still).

LeonardB · April 28, 2022, 11:55am

The way I see it, one of the first things I learned when starting with Erlang is that it has “single assignment”.

My expectation was that all variables inside a function were therefore, “assign on first use and match thereafter”.

This seems to be what most expect to happen and the primary cause of the class of bugs discussed in the EEP.

As I was learning I was bitten by this a few times until I learned that funs and comprehensions both have slightly strange scoping in that “parts” of them inherit variables from the parent scope, and others do not.

For these specific cases ‘^’ pinning could help eliminate a class of bugs.

I may be a bit nuts, but my opinion is that the issue is almost always caused by a misunderstanding of the scoping rules, and the only “proper” way to fix it is by fixing the scoping to include all bound variables in the parent scope in the fun/comprehension scope.

Without understanding the complexities and knock-on effects of such a change;

Existing code that is using variables from the parent scope in matching should already be using a guard with a temporary variable and would be fully backward compatible.
Compatibility would only be broken for cases where shadowing is happening.
By ‘fixing’ the scoping and turning shadowing into an error instead of a warning this should resolve all the confusion without having to introduce pinning at all.

tldr;

being able to use ‘^’ pinning to indicate a match against a previously bound variable from the parent scope would be great.
Not having to do anything to existing code would be even better.

MononcQc · April 28, 2022, 5:32pm

A lot of the use cases that could be considered buggy in terms of using clauses in case expressions seem to come from cases where the variable is mistakenly already bound (this happened to me a few times but was pretty much always caught real fast).

I sort of always liked the thing DrRacket (previously DrScheme) did as a programming environment for variable tracking:

This sort of handling is done automatically when hovering over a variable and gives a visual explanation of where variables go.

Either way, I felt this was interesting to mention because that sort of threading-the-needle to track where values come from can be supported by the language, but does not necessarily need to be. This isn’t necessarily easy to do in all editors, but it works around semantic gotchas without changes to the language.