EEP 70: Non-filtering generators

dszoboszlay · October 17, 2024, 7:39pm

That is definitely true, and I’m sorry for my methodological error! My only excuse is that I was replying from my phone, travelling on the underground, which posed a limit on my analytical capabilities.

If I didn’t mess up counting, the 41 likes seem to divide 21:20 among pro and contra posts. I’m not going to count the number of unique users behind each set, because I’m too lazy for that.

bjorng · October 18, 2024, 8:20am

We don’t count the number of likes and dislikes.

We consider whether the good that comes from a new feature outweighs the negative consequences resulting from the feature. In this case, in the opinion of the OTB, the pros outweigh the cons.

Personally, in the beginning the pros and cons were pretty evenly balanced for me. After discussing with the team and thinking about it, my opinion is now that the positive outweighs the negative.

So here are my thoughts about what good comes from this feature. In huge code bases with many developers, a developer can use the new operators to make their intention perfectly clear that every element in a generator must match a certain pattern. If another developer comes along later and adds elements to the list that don’t match the pattern, it will be noticed when running the test suite (or possibly by Dialyzer before that), instead of much later.

The Erlang compiler application is not a huge application and the compiler team is fairly small, but it is important for the entire BEAM community that the compiler has as few bugs as possible. We welcome anything that can help us find bugs quicker. Therefore, we look forward to starting to use the strict generator operators.

In fact, I couldn’t resist the temptation to try it out immediately:

So this branch, based on PR-8625, uses strict generators in many comprehensions across many modules in the compiler. Note that the compiler is an application where we deliberately filter lists using patterns. For example, we have lists containing a mix of #b_var{} and #b_literal{} records. We often want to collect only the variables (or vice versa) from such a list. Having different generator operators makes the intent (filtering vs processing all elements) clear at a glance.
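To make the filtering vs processing-all-elements distinction concrete, here is a minimal sketch. It assumes Erlang/OTP 28 or later, where the EEP 70 operators are available. The {var, Name} and {literal, Value} tuples are hypothetical stand-ins for the compiler's #b_var{} and #b_literal{} records, and the module and function names are made up for illustration.

```erlang
-module(strict_vs_relaxed).
-export([vars/1, all_vars/1]).

%% Relaxed generator: elements that do not match {var, Name} are
%% silently skipped, so this deliberately filters the list.
vars(Args) ->
    [Name || {var, Name} <- Args].

%% Strict generator (EEP 70, assumed OTP 28+): every element is expected
%% to match {var, Name}; a non-matching element makes the comprehension
%% fail instead of being dropped.
all_vars(Args) ->
    [Name || {var, Name} <:- Args].
```

With a mixed list such as [{var, x}, {literal, 1}, {var, y}], vars/1 collects only the variable names, while all_vars/1 is expected to fail on the {literal, 1} element rather than skip it.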

dszoboszlay · October 18, 2024, 8:45am

Wow, that’s amazing! You found 95 comprehensions worth updating to the strict operators. That’s way more than what I expected.

hanssv · October 18, 2024, 9:01am

I personally think that the board severely underestimates the cons of bloating a language that has as one of its key selling points that it has a small footprint. As I mentioned I was very surprised that this was accepted, given that no-one (apart of course from the contributions by @dszoboszlay) had managed to write down any substantial support for this.

Now I guess I have to look forward to the discussion on how to properly mix EEP73 (the zipping) with this. Surprisingly the default over there seems to be the non-filtering version with a syntax that “at a glance” suggests it is filtering…

juhlig · October 18, 2024, 9:48am

I wasn’t entirely serious

LostKobrakai · October 18, 2024, 10:34am

I don’t have any hard evidence, but in the Elixir space there’ve been people asking for generators, which fail if the pattern wasn’t matched, for as long as I’ve been active in the community. But given Elixir compiles to Erlang the answer has always been “no, Erlang doesn’t provide this”.
I personally have rewritten uses of for in Elixir to Enum calls for exactly those reasons, which always felt unfortunate.

On the one hand there’s an expressive alternative to chaining a bunch of function calls around iteration and on the other hand you can only use it if you have confidence your data matches the expected patterns or are fine with silently dropping any non-matching ones. The complexity of syntax question is not to be played down for sure. Though list comprehensions existing is already quite the signal of the alternative not being great either.

bjorng · October 18, 2024, 10:58am

The zip comprehension implementation and the EEP will be updated to support the strict operators when PR-8625 has been merged. The colon-less generator operators will do filtering to be consistent with the old non-zipping generators. Currently in the compiler, none of the zip generators would do any filtering, but I have a branch I’m working on where a filtering zip comprehension would be useful.

Here is an example from yecc of zipping and filtering:

github.com

bjorng/otp/blob/8a743c1fd5f54fc6a186c3879ba2f1b36887bb76/lib/parsetools/src/yecc.erl#L2341-L2345


      
          find_partial_shift_states(StateActionsL, StateReprs) ->
              L = [{State, Actions} ||
                      {State,Actions} <- StateActionsL &&
                          {State,State} <- StateReprs,
                      shift_actions_only(Actions)],

rlipscombe · October 21, 2024, 8:11pm

I’ve only just seen this proposal. I’ve not looked at the details, but I raised a similar request here: Is there a way to make list comprehensions “strict”?, so I’m broadly in favour of it.

josevalim · October 22, 2024, 6:08am

There has been some discussion about this feature in a separate thread, with some rationale from the Erlang/OTP team. I am moving them back to this thread because I am mostly interested in the discussion around this feature.

There are two arguments here:

Binary comprehensions are confusing and prone to mistakes (which I agree with, and I will add that using <= in general for comprehensions is confusing)
The footguns in comprehensions are meant to supplant rather than extend

To the second point, @bjorng said “there are 105 strict generator operators and 73 relaxed generator operators”. That tells me that, what we will see in practice, is that at least three operators will be used: <-, <:-, and <:=. So if the goal is to supplant, I don’t believe the current proposal has enough tools to make that a reality (please correct me if I am wrong). To do so, we need to either add a “strict” match to comprehensions or a “relaxed” match. I will outline both options below.

Option 1: Add strict operators and relaxed match

This option suggests adding <:- and <:=, as in EEP 70, and then adding ?= (which is used in maybe) for relaxed matching in comprehensions. This way, we can argue that all generators should be strict by default, <- and <= can be deprecated in the long term, and effectively supplanted. For example, if I have a list with ok/error tuples and I want to filter the ok ones, I could write:

Oks = [Value || Pair <:- List, {ok, Value} ?= Pair].

But if I am expecting all of them to be ok, I would write:

Oks = [Value || {ok, Value} <:- List].

Option 2: Add a new operator for binary generators and a strict match

This option suggests adding a new operator for binary comprehensions only (either <:- or something else), which is relaxed but raises on trailing entries. And then allowing = in comprehensions. This way, <= can be supplanted.

If I have a list with ok/error tuples and I want to filter the ok ones, I could write (as of today):

Oks = [Value || {ok, Value} <- List].

But if I am expecting all of them to be ok, I would write:

Oks = [Value || Pair <- List, {ok, Value} = Pair].

Summary

While I would prefer Option 2, because that would require fewer changes, I also understand the goal of making generators strict by default, so I am perfectly fine with any of the options above. My main goal is to point out we might not have enough to replace the existing generators. If that’s one of the rationales behind accepting this EEP, then I’d suggest to:

Update the EEP to mention the goal of replacement and include examples of how existing relaxed generators can be written from strict ones
Include a rough estimate of when the current generators will be deprecated (we can keep them around forever but we could emit a warning - which one could turn off if desired).

Thank you for your time.

hanssv · October 22, 2024, 8:54am

I think you forgot Option 0, the one where we don’t add any of this

Regarding Option 1 and Option 2, I am not sure Option 1 is really a realistic option since it would invalidate all existing comprehension code?!

josevalim · October 22, 2024, 11:37am

We wouldn’t remove any of the existing operators. They would be kept working for a long time, as other deprecated functionality that still exists in Erlang.

hanssv · October 22, 2024, 12:24pm

I read words like deprecate and ‘emit a warning’ - that would invalidate any sane build (warnings_as_errors) of existing code, no?

josevalim · October 22, 2024, 12:31pm

Sure, but I would also hope that any sane codebase would prefer to update to the patterns encouraged by the language, rather than relying on functionality that is considered to have pitfalls and has been marked in the documentation as stale several years prior. And if you really really really want to stick with the no-longer-recommended constructs, you would be able to add -compile(nowarn_deprecated_generators) or similar.

I am not proposing to remove things tomorrow. But if the goal is to replace the existing constructs, a long term plan on how to get there would be an important part of the discussion.

hanssv · October 22, 2024, 12:53pm

Unfortunately that is not how it works, even though it should in the best of worlds

schnef · October 22, 2024, 1:39pm

Iff, and only if, non-filtering generators are to be added, option 2 definitely is the best option

dszoboszlay · October 22, 2024, 10:11pm

This came up a couple of times, but I think it’s unfortunately a necessary piece of complexity, unless we want to completely change the comprehension syntax as suggested by EEP 12. The problem is that list, binary and map generators should be statically distinguishable.

List generators are the default case: everything that doesn’t look like a map or binary generator is a list generator.
Map generators are easy to identify from their left hand side, because K := V is not a term, thus cannot be a member of a list or binary.
This trick doesn’t work with binary generators, because their left hand side is a binary pattern, which could also match the element of a list. So binary generators have to be marked explicitly somehow, and using a different arrow is a handy tool for that.

Without statically distinguishable generators we’d have problems with expressions like [X || <<X>> <- [<<1>>, <<2>> | <<3, 4>>]]. What should this evaluate to: [1, 2, 3, 4] or a bad_generator error?
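A small sketch of how the arrow, not the pattern, selects the generator kind in current Erlang. The module and function names are hypothetical; the same <<X>> pattern is used on the left-hand side of both generators.

```erlang
-module(generator_kinds).
-export([from_list/1, from_binary/1]).

%% List generator (<-): <<X>> is matched against each *element* of the
%% list, so every element must be a one-byte binary to be kept.
from_list(List) ->
    [X || <<X>> <- List].

%% Binary generator (<=): <<X>> is matched repeatedly against the
%% binary itself, consuming one byte per step.
from_binary(Bin) ->
    [X || <<X>> <= Bin].
```

Here from_list([<<1>>, <<2>>]) gives [1, 2] and from_binary(<<3, 4>>) gives [3, 4]. Passing the improper list [<<1>>, <<2>> | <<3, 4>>] to from_list/1 fails with a bad_generator error today, because the list generator never reinterprets the binary tail as something to iterate over.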

I can’t speak for the OTP team’s plans, but I didn’t write EEP 70 with the intention to completely kill current generators. I read the original comment about supplanting vs extending in line with this: strict generators should be your default choice, because most of the time they describe your intentions better. But relaxed operators will remain there for the scenarios where your actual intent matches the relaxed semantics (plus backward compatibility, of course). My preferred analogy is =:= vs ==: 99% of the time you want =:= but == is still there for you.

That said, out of your two options I find the 1st one doable, but the 2nd one seems problematic.

Matching with = is already allowed in comprehensions as long as it works as a filter (that is: the result is a boolean). Deprecating this usage, then completely disallowing it and then finally reintroducing the same syntax with different semantics appears to be next to impossible due to the backward (in)compatibility issues.
You leave the foot gun of relaxed generators silently filtering out elements in place. It doesn’t matter that you could write Pair <- List, {ok, Value} = Pair to avoid the problem, people will forget to do so. Just like they currently forget to use the workarounds that exist today, such as Pair <- List, case Pair of {ok, Value} -> true; Value -> exit({badmatch, Value}) end.

Regarding matching in comprehensions: I think it would be worth a discussion of its own. I personally felt the need for matching in comprehensions many times, and would love to have this feature. But first of all, the proposed ?= can be easily achieved with current language elements: {ok,Value} <- [Pair] is exactly what {ok, Value} ?= Pair would do. On the other hand you may need strict matching too (stupid example: [{Name, Phone} || #{first_name := FirstName, last_name := LastName} <:- Persons, Name = FirstName ++ " " ++ LastName, #{name := Name, phone := Phone} <- Phonebook]), but matching with = is already allowed with different semantics, and we cannot practically change it. Using a different operator for strict matching inside a comprehension sounds wrong. So maybe just use {ok, Value} <:- [Pair] for that? I don’t know.
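The singleton-list trick can be sketched with today's syntax as follows, next to one possible way of getting a strict match without any new operator. The module and function names are hypothetical, and doing the strict match in the template with begin ... end is only an illustration, not something proposed in this thread.

```erlang
-module(match_in_comprehension).
-export([oks_relaxed/1, oks_strict/1]).

%% Relaxed match: wrapping Pair in a one-element list turns the match
%% into an ordinary filtering generator, so non-ok pairs are skipped.
oks_relaxed(List) ->
    [Value || Pair <- List, {ok, Value} <- [Pair]].

%% Strict match using only existing syntax: the match happens in the
%% template, so a non-ok pair raises {badmatch, Pair}.
oks_strict(List) ->
    [begin {ok, Value} = Pair, Value end || Pair <- List].
```

For [{ok, 1}, {error, x}, {ok, 3}], oks_relaxed/1 returns [1, 3], while oks_strict/1 fails with a badmatch error on {error, x}.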

josevalim · October 23, 2024, 5:44am

I agree. If it is done, it would probably have to be done over the course of several years. I’d still propose for erl_lint or similar to warn on this syntax today anyway, even if we don’t plan to use it, as it has no practical use. It only accepts booleans and, if it succeeds, you know it has returned true anyway.

Good point. You can always wrap in a list today if you want either strict or relaxed matching.

Maria-12648430 · October 23, 2024, 6:49am

I have seen (and ignored) this statement that filters must result in a boolean a few times lately. However, it is not true: Rather, it is the case that if and only if a filter results in true it succeeds, any other result means false:

> [ X || X <- [a, b, c], foo ].
[]

AFAIK, ~~this is not documented anywhere~~, and it is a strictness issue in itself

Maria-12648430 · October 23, 2024, 7:03am

And quite surprisingly, even this “works”:

> [ X || X <- [a, b, c], X + 1 ].
[]

X + 1, which in this case is “an atom plus 1”, should raise a badarith error.

However, raising that error explicitly does not “work”:

> [ X || X <- [a, b, c], error(badarith) ].
** exception error: an error occurred when evaluating an arithmetic expression

But if you, say, try to assign the result…:

> [ X || X <- [a, b, c], Foo = X + 1 ].
** exception error: an error occurred when evaluating an arithmetic expression
     in operator  +/2
        called as a + 1

It looks as if there were some magic that turns filters into guards if possible? (@bjorng?)

Maria-12648430 · October 23, 2024, 7:32am

Oh boy… ok, so turns out filter results must indeed be booleans if the filters are not suitable to be expressed as guards…

> [X || X <- [1, 2, 3], timer:sleep(X)].
** exception error: bad filter ok

Did I just open another can of worms?
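The filter behaviour observed in the shell sessions above can be put side by side in a small module. This is a sketch only: the module, the function names and the returns_ok/1 helper are hypothetical, and returns_ok/1 stands in for timer:sleep/1 as a call that is not allowed in a guard and returns ok.

```erlang
-module(filter_semantics).
-export([guard_filter/1, non_guard_filter/1, match_filter/1]).

%% X + 1 is a valid guard expression, so the filter is treated like a
%% guard: a failing or non-true result just skips the element.
guard_filter(List) ->
    [X || X <- List, X + 1].

%% A local function call is not allowed in a guard, so the filter is an
%% ordinary expression and must return a boolean.
non_guard_filter(List) ->
    [X || X <- List, returns_ok(X)].

%% A match is not a guard expression either, so errors raised while
%% evaluating it propagate out of the comprehension.
match_filter(List) ->
    [X || X <- List, _ = X + 1].

%% Hypothetical helper: always returns the non-boolean atom ok.
returns_ok(_) -> ok.
```

Under these assumptions, guard_filter/1 returns [] for both [a, b, c] and [1, 2, 3], non_guard_filter([1, 2, 3]) fails with a bad filter error, and match_filter([a]) fails with badarith.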