EEP 70: Non-filtering generators

dszoboszlay · August 24, 2024, 11:04pm

This is the abstract of EEP-70 (written by me), which aims to address generators silently skipping non-matching elements of collections.

EEP Link: Add proposal for non-skipping generators - Pull Request #62 - erlang/eep - GitHub

Abstract

This EEP proposes the addition of a new, non-skipping variant of all existing generators (list, bit string and map). Currently existing generators are skipping: they ignore terms in the right-hand side expression that do not match the left-hand side pattern. Non-skipping generators on the other hand shall fail with exception badmatch.

For example consider the below snippet:

[{User, Email} || #{user := User, email := Email} <- all_users()]

This list comprehension would skip users that don’t have an email address. This may be an issue if we suspect potentially incorrect input data, like in case all_users/0 would read the users from a JSON file. Therefore cautious code that would prefer crashing instead of silently skipping incorrect input would have to use a more verbose map function:

lists:map(fun(#{user := User, email := Email}) -> {User, Email} end,
          all_users())

Unlike the generator, the anonymous function would crash on a user without an email address. Non-skipping generators would allow similar semantics in comprehensions too:

[{User, Email} || #{user := User, email := Email} <:- all_users()]

This generator would crash (with a badmatch error) if the pattern wouldn’t match an element of the list.
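To make the difference concrete, here is a minimal sketch that runs without the new operator. The module name and the sample data are hypothetical, not taken from the EEP. It contrasts the existing generator with the lists:map/2 workaround. Note that a fun with no matching clause fails with function_clause, whereas the proposed generator would fail with badmatch.

```erlang
-module(relaxed_demo).
-export([all_users/0, relaxed/0, strict_via_map/0]).

%% Hypothetical sample data: the second user has no email key.
all_users() ->
    [#{user => alice, email => "alice@example.com"},
     #{user => bob}].

%% Existing generator: bob does not match the pattern and is dropped silently.
relaxed() ->
    [{User, Email} || #{user := User, email := Email} <- all_users()].

%% lists:map/2 with a single-clause fun: bob makes the fun fail.
%% The error is caught here only so the outcome can be inspected.
strict_via_map() ->
    try
        lists:map(fun(#{user := User, email := Email}) -> {User, Email} end,
                  all_users())
    catch
        error:Reason -> {crashed, Reason}
    end.
```

Calling relaxed_demo:relaxed() returns only alice's tuple, while relaxed_demo:strict_via_map() returns {crashed, function_clause}.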

nhpip · August 25, 2024, 2:11pm

I like this idea.

franbrau · August 25, 2024, 2:12pm

That’s 3 different characters in a single operator that looks like an emoticon instead of an arrow. Why not something like…

[{User, Email} || #{user := User, email := Email} <<- all_users()]

dszoboszlay · August 28, 2024, 10:29am

Well, finding syntax that fits everybody’s taste is always hard… I originally proposed <-:- but in the EEP PR it was suggested to change to <:-. I’m not strongly attached to either.

My slight issue with <<- is that it’s already valid syntax (as in <<-1>> for example), which would make parsing harder.
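A small sketch of why that character sequence is already taken (the module name is hypothetical): in a binary constructor, << may be followed directly by a negative integer literal.

```erlang
-module(bin_minus_demo).
-export([run/0]).

run() ->
    %% "<<" followed directly by "-1": a one-byte binary built from -1.
    %% The default segment is an 8-bit integer, so -1 is stored as 255.
    <<255>> = <<-1>>,
    ok.
```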

schnef · August 29, 2024, 8:51am

Yes, I know I’m starting to sound a bit like a grumpy old man who especially dislikes change, but a key point why I’m a fan of Erlang is the limited, simple, syntax of the language. Recently, several things have been added to the language that in themselves are very legitimate, but meanwhile make Erlang increasingly complex. For me, the increasing complexity does not outweigh something like being able to distinguish between filtering or not filtering in generators. After running into this once, I have always taken it into account in my code, especially since non-filtering generators also often require additional actions and the “regular” comprehension does not fit anyway. My choice is not to implement EEP70.

dszoboszlay · August 29, 2024, 11:29am

I understand your point. On the other hand, I see the introduction of features to Erlang as a call for more experimentation for extending the language (and therefore inevitably making it more complex), without committing to those experiments until they are proven to have a positive effect.

Therefore I proposed implementing EEP 70 as a feature that can be discarded if it doesn’t stand the test of time. I hope this would help you too in accepting it!

nzok · August 29, 2024, 11:32pm

I agree with Frans Schneider that this feature fills a much-needed gap (a phrase that goes back to 1857, I believe, although I first met it in a conference report by Dijkstra, who mentioned a speaker using it).

TL;DR: I do not believe that EEP 70 actually gets to the heart of the problem that it is meant to address, and find the proposed notation obscure.

There’s something I often say to my granddaughter, which I knew I got from T.S.Eliot but I just discovered I misremembered. What I say is “You gotta use words when you talk to me”. The correct quotation, from Fragment of an Agon, is “I gotta use words when I talk to you”.

There are two separable issues.

Do we need an error-on-mismatch generator at all?

(Digression: I did not find ‘non-skipping’ to be a helpful way to describe it, because I have never thought of the existing constructions as ‘skipping’. I’m using error-on-mismatch here as being understandable in its own right without reference to what any other form of generator does.)

Possibly because I met list comprehension in Haskell long before Erlang got it, and a range of analogous constructions in Pop-11 before Haskell existed, it has long been “intuitive” to me that IF you are allowed to put patterns in a generator at all, then you OBVIOUSLY want candidates that don’t match those patterns to be silently passed over.

If for some reason my program expects the elements of a list to have a certain form and they don’t, I certainly don’t want the error message coming from deep in the guts of a list comprehension. I want to know about it before it gets that far. I’m going to want to document, test, and probably Dialyze the thing that constructs the list.

This really seems like a very comprehension-specific way to address a problem that isn’t actually about list comprehension at all.

Is making semantic distinctions by mashing up strings of non-semantically-related characters a good approach?

In my view, no. A programming language may imbue a punctuation mark with a semantics it does not normally have. I am happy with the use of “!” for “definitely not NULL” and “?” for “possibly NULL” in some languages because they’re consistent about it. I am happy with the distinctions SETL draws between () and {} because they are fundamental to that language and are used consistently. But wanting more than one kind of arrow, and distinguishing between them in a way that is NOT done elsewhere in the language, no. That just makes things harder to understand.

Now that Erlang is a Unicode language, there is an abundance of arrow shapes to choose from, but no pair of shapes suggests a select-matching vs error-on-mismatch distinction.

So how about using words to convey meanings?

And of course, the question remains, will EEP 70 fix the problem?
Now my idea of what the real problem is may differ from yours.
Let’s agree that EEP 70 thinks that the problem is the way existing Erlang list comprehensions work.
EEP 70 doesn’t actually change that. It doesn’t change all the training material in existence, like LYSE, which points people to the current syntax. And it doesn’t make the new syntax easier to use than the old.
Just how many people will find the new syntax useful enough to adopt?
Remember, it doesn’t let you DO anything you couldn’t do before.
It doesn’t make it EASIER to do something you could do before.
It doesn’t make your code FASTER.
There is no reason to change existing code to the new form.
Just how many people will find the new syntax useful enough to adopt?

dszoboszlay · August 30, 2024, 11:25pm

> (Digression: I did not find ‘non-skipping’ to be a helpful way to describe it, because I have never thought of the existing constructions as ‘skipping’. I’m using error-on-mismatch here as being understandable in its own right without reference to what any other form of generator does.)

This is a very valid point. When I first thought about the problem, I used the term “filtering”, because a comprehension contains generators and filters, and the current generators also act as an implicit filter when they contain patterns. I switched to the term “skipping” during implementation, because the compiler calls the respective clause of the generated anonymous fun the skip clause. But this name is completely internal to the compiler, so it would be easy to change to something that makes sense to users.

> it has long been “intuitive” to me that IF you are allowed to put patterns in a generator at all, then you OBVIOUSLY want candidates that don’t match those patterns to be silently passed over

For me at least this is not intuitive at all. It isn’t even documented how generators with patterns treat non-matching elements. At the very least this should be documented, but I believe having a different kind of generator that errors on non-matching elements would be even better, because nobody reads the fine print in the documentation, but I hope most people at least remember what operators are in a language. And once you remember that there are two different generators, but you don’t remember what the difference is, then at least the problem has become a known-unknown for you, which is much better than an unknown-unknown.

> But wanting more than one kind of arrow, and distinguishing between them in a way that is NOT done elsewhere in the language, no. That just makes things harder to understand.

I think that ship has long sailed: Erlang already uses ->, <-, =>, <= and :=, not to mention << and >> which also sort of look like arrows. And it doesn’t make things hard to understand. If you don’t like the proposed <:- and <:= arrows (or my original proposal of <-:- and <=:=, which by the way are not random character sequences; the : was chosen to represent a match, similar to how =:= is a test for matching vs. == which is not), please propose something better! But just introducing a new arrow that can be used in a specific context won’t make the language too hard to understand. Just look at how the maybe expression introduced the ?= operator, without exploding Erlang’s complexity.

> Let’s agree that EEP 70 thinks that the problem is the way existing Erlang list comprehensions work.
> EEP 70 doesn’t actually change that.

Yes, it intentionally avoids messing with existing comprehensions. Adding new language elements instead of changing the semantics of existing ones gives us backward compatibility, and makes it easier to read the code (if you encounter syntax that you haven’t seen before, you can go and check in the docs; if you encounter syntax that unknowingly to you changed its semantics, you’ll just be very confused). Both of these are desired properties.

> It doesn’t change all the training material in existence, like LYSE, which points people to the current syntax.

This is not an argument against EEP 70, it’s an argument against any kind of change. Also, pretty much out of scope for an EEP.

> And it doesn’t make the new syntax easier to use than the old.

Unfortunately, this is true, because the new operators will be longer than the old ones. But this will be just one more entry on the long list of language elements where the thing you typically want is not the thing that is easiest to type. See == vs. =:= in Erlang, or == vs. === in JavaScript, or C requiring you to mark const arguments instead of mutable ones, etc.

> Just how many people will find the new syntax useful enough to adopt?
> Remember, it doesn’t let you DO anything you couldn’t do before.
> It doesn’t make it EASIER to do something you could do before.
> It doesn’t make your code FASTER.

I respectfully disagree. I just want to cite my example: writing [{User, Email} || #{user := User, email := Email} <:- all_users()] is definitely easier and faster than typing out a lists:map/2 call with an anonymous function and whatnot. (And one may even argue that since the comprehension compiles into a fun that uses only local calls, with just one call per element, it will be faster than lists:map/2, which needs one local call plus one qualified call per element.)

schnef · September 1, 2024, 10:48am

In case you have a generator that errors on non-matching elements, you will have to deal with those errors. In case you have a nested comprehension, which is exactly what current comprehensions excel at, how do you deal with those errors? In case you want to catch non-matching elements, the most logical approach is to use an (anonymous) function and a function from the list module. Old training material I found even explicitly stated that you should clean up your data first before pushing it through a comprehension to process the data. In other words, in such situations comprehensions are not the appropriate solution and you need to do more than using a syntactically good looking compact construction.

Maybe this says more about me than about the use of punctuation marks, but I still regularly have to look up the meaning of =>, :=, <-, <= and so on. As with, say, Haskell, all those punctuation combinations don’t work for me.

dszoboszlay · September 1, 2024, 11:32am

> In case you have a generator that errors on non-matching elements, you will have to deal with those errors.

No, my proposal is not for the case when you want to handle errors. It’s for the (very typical) scenario when there shouldn’t be any bad data in the input, and if there is, it’s probably a bug. So you want an assertion and you want to just crash when the assertion fails instead of hiding the problem and continue.

I agree that if you know you’re dealing with unsanitized input you probably shouldn’t go with a comprehension at all.
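For comparison, a sketch of how such an assertion can be spelled with today’s syntax. This is an illustration only, not something proposed in the EEP or in this thread, and the module and function names are hypothetical. The generator binds each element to a plain variable, which always matches, and the shape is then asserted with an explicit match in the comprehension body.

```erlang
-module(assert_demo).
-export([pairs/1]).

%% U always matches, so no element is skipped by the generator.
%% The explicit = match raises {badmatch, U} for an element
%% that lacks the user or email key.
pairs(Users) ->
    [begin
         #{user := User, email := Email} = U,
         {User, Email}
     end || U <- Users].
```

This crashes on the first malformed element, at the cost of more ceremony than a pattern placed directly in the generator.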

nzok · September 4, 2024, 10:30am

I reiterate my point that ensuring that there are no mismatches
in a comprehension is best done by documentation, testing, and
type checking.

<< and >> look to me like ASCII approximations to
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK and
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
and are used in balanced pairs consistent with the use of guillemets.
So they don’t trigger my “arrow recognition” neurons at all at all.

As for all the other sigils, they DO cause confusion.
I for one would be much happier with
max(X, Y) when X > Y = X;
max(X, Y) = Y.
And given that Erlang does use -> in function definitions,
I’m not thrilled with its use in “if” and “case”.
But just because I have to live with old problems doesn’t mean
I am eager to face new ones.

As for the distinction between “:=” and “=>” in maps, I find it
a serious pain. I appreciate the utility of the distinction, but
it is NOT visually intuitive and I still keep getting it wrong.
Like I said, I appreciate why the distinction is made, so I blame
myself rather than Erlang for the frequent mistakes.

However, there’s an interesting distinction.
The multiple readings of “->” don’t actually confuse me because
in each context where “->” is allowed there is no alternative.
In each context where “<-” is allowed, there is no alternative.
There is no confusion between “<-” and “->” because neither can
occur where the other does.
But “:=” and “=>” CAN occur in the same place and DO cause
confusion because they are semantically opaque.

As T. S. Eliot’s Sweeney put it, “I gotta use words when I talk to you.”

Note that BOTH kinds of comprehension (list and binary) have
pattern-selection semantics. Here’s an example:

> [X || <<X,0>> <= <<1,0,2,1,3,0>>].
[1,3]

So you don’t need ONE extra arrow, you need TWO.
One for error-on-mismatch generation from a list, and
one for error-on-mismatch generation from a binary.
And then we will have four visually confusing and
subtly different arrows that can occur in the same place.

At least with binary comprehensions, you get a compile-time
error if you use <= with a pattern that is not a binary pattern.

But in any case I don’t buy the argument “Erlang already has
a lot of things that look like arrows including some that don’t
so it’s no big deal to add another semantically opaque one.”
It’s like saying “Mrs Proust’s face has so many warts that
it would be improved by a scar.” No, it wouldn’t.

Maria-12648430 · September 4, 2024, 3:33pm

I have long pondered whether I like this proposal or not, and why or why not. ATM, I’m inclined towards “don’t like”, also (but not only) for the reasons @schnef and @nzok pointed out.

One reason that was (AFAICS) not mentioned before is that this pattern concerns only a very narrow use case, namely that you expect all elements of a list (or binary or map) to adhere to a specific shape. I would argue that (current) comprehensions are a convenience feature, which work under certain assumptions, restrictions, and peculiarities (which, I agree, should be clearly documented). If your use case doesn’t fit those assumptions/restrictions/peculiarities under which the convenience feature works, you can’t use it and will have to go the explicit way, not introduce yet another similar convenience feature with slightly different assumptions/etc.

As an example, binary generators (<=) will drop any non-fitting remainders (eg, [X || <<X:3>> <= <<2#0101010101010101:16>>] will result in [2,5,2,5,2], the remaining 1 being dropped). I often wished for a convenient binary generator to also deliver the remainder which does not fit <<X:3>>. Neither <= nor the proposed <:= offers that, ie the former will drop it, the latter will fail with badmatch. But it would be so very convenient to have it, a common use case, so should we not introduce yet another generator for the sake of convenience, like, say, <~=?
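A sketch of what “going the explicit way” could look like for this case. The module and function names are hypothetical. The first function is the existing binary generator, the second a hand-written loop that also returns the bits that did not fit.

```erlang
-module(chunk_demo).
-export([relaxed/1, with_rest/1]).

%% Existing binary generator: trailing bits that cannot fill
%% <<X:3>> are dropped.
relaxed(Bits) ->
    [X || <<X:3>> <= Bits].

%% Explicit loop: returns the 3-bit values and the leftover bits.
with_rest(Bits) when is_bitstring(Bits) ->
    with_rest(Bits, []).

with_rest(<<X:3, Rest/bitstring>>, Acc) ->
    with_rest(Rest, [X | Acc]);
with_rest(Rest, Acc) ->
    {lists:reverse(Acc), Rest}.
```

With the 16-bit input from the example above, relaxed/1 returns [2,5,2,5,2] and with_rest/1 returns {[2,5,2,5,2], <<1:1>>}.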

Another objection I have is that it is difficult to spot when skimming existing code. Others have complained about the operators becoming longer (or more verbose). I object to it being not verbose enough, ie that the behavior of being lenient or strict entirely hinges on there being a single : in the comprehension, which is very difficult to spot, more so since it is bound to occur way down a line of code (vs at the beginning), buried somewhere between the expression and generator code of a comprehension.

Hm. I can’t be sure, but I don’t think feature-ing is meant to be used that way, in the sense of “let’s see if people use and like it, and drop it if they don’t”. The feature mechanism as I understand it is meant to be testing and working out bugs in features that are still unstable but definitely to be in the language for good, ie not to see if people like it and remove it again if they don’t. For one, I would not use such a feature, like, put it in only to maybe remove it later, why bother? For another, how would anyone even measure if and how many people were using it?

bjorng · October 3, 2024, 2:58pm

The OTP Technical Board has discussed EEP 70 after having read the public feedback and decided to approve EEP 70.

We have a further suggestion regarding terminology of the operators. We suggest calling the traditional generator operators (<- and <=) relaxed, while the new operators are strict (<:- and <:=).

Also, this EEP 70 should not be implemented as a feature. We find that unnecessary.

josevalim · October 3, 2024, 6:15pm

I think framing it as relaxed/strict and piggybacking on the existing =:= is a great idea to make them feel more integrated with the rest of the language.

nzok · October 4, 2024, 11:37pm

“It’s your ball, Dr Naismith.”

Can I beg you for an ‘antifeature’, a flag I can set that says “I declare to human and machine readers of this module that I do not intend to use this and would like it reported as a typing mistake if it shows up”?

Not just for this particular wart on the language, but there are other features of Erlang one might want to avoid. For example, I should be able to say “warn if I use ‘and’ or ‘or’”.

elbrujohalcon · October 7, 2024, 7:15am

If not part of OTP itself, you can achieve this goal by writing custom Elvis rules (or not custom… if you feel like contributing to the elvis_core project with a PR), as long as you use elvis or rebar3_lint to check your project.

arcusfelis · October 10, 2024, 12:49am

Oh, yes please

hanssv · October 17, 2024, 2:49pm

I am late to the party, but I am curious: is there positive feedback elsewhere, or why was it decided that this was a feature that needed to be included? Apart from one (1) “I like this idea”, the rest of the feedback leans towards this being at best unnecessary, and at worst a rather poor idea.

dszoboszlay · October 17, 2024, 5:08pm

If I counted correctly, there were 5 people expressing some concerns in this thread. There is only one “I like it” comment, but don’t forget about the 6 likes on the opening post either, plus some more on the GitHub PRs.

I know it’s not a lot of feedback, but it isn’t mostly negative either.

juhlig · October 17, 2024, 7:20pm

I’m impartial to this, but just let me point out that if you are taking the likes on the opening post into account, you also have to take the likes on the critical ones into account.