EEP 73: Zip generators

juhlig · October 18, 2024, 2:20pm

Doesn’t it contain one already? Not a section for themselves, but at least they are mentioned.

Maria-12648430 · October 18, 2024, 2:24pm

Sly

… and thanks for the praise

hanssv · October 18, 2024, 2:51pm

Too late to change opinion, quite obviously you are happy to bloat the language - and, then, seriously, I see no reason not to do so in a consistent manner?! If it is concluded that zip-functions working on lists of different lengths is a good idea (I personally think it is, good job there!), then not adding it here is missing an opportunity to make things symmetric! The best syntax suggestion so far is probably to use { ... } for the case where non-equal lenght:ed lists doesn’t crash (relaxed) and {: ... :} when they do crash (strict).

As of now, of course, there are fewer zip:s with different length:ed lists to replace since it is a newer thing. Possibly it is a less common use case also in the long run but only time will tell.

hanssv · October 18, 2024, 3:04pm

Yes, this part is a good improvement! In my eyes the pros doesn’t outweigh the cons but it is pretty close and I can see the arguments for including it.

Maria-12648430 · October 21, 2024, 9:43am

I can see the benfits of having zipping generators in comprehensions. More often than not, I encountered cases where I would wish for a “take X from A and take Y from B” instead of “for each X from A, take all Ys from B” behavior. Personal experience and the use cases especially @dgud presented are enough to convince me of the benefits that such an addition provides.

You’re making a good point there IMO. The behavior of current (pre-EEP 70) comprehensions/generators is to be relaxed, ie to silently skip if the patterns don’t match. The behavior of zip generators however, as outlined in the current state of EEP 73, is to be strict, ie to fail if the lengths of the lists (binaries, maps) don’t match up.

IMO it would make sense to have relaxed/strict options here too, with relaxed being the default (meaning, the less wordy) option.

This can’t be achieved with infix operators like && AFAICS, and therefore I like your suggestions of {...} (relaxed, skip surplus elements, akin to the trim option for lists:zip and friends) and {:...:} (strict, fail on surplus elements, akin to the fail option).

One of the reasons we had for proposing lists: enable zip functions to work on lists of different lengths by Maria-12648430 · Pull Request #6347 · erlang/otp · GitHub was to make it possible to zip up lists of different lengths without the need to trim (or pad) them to equal lengths beforehand. Without the option for relaxed zip generators, this requirement would arise again.

josevalim · October 22, 2024, 6:29am

One alternative is to use lists:zipwith/3 and also introduce a lists:zipreduce/4 (here is the Elixir version). The reduce version would allow you to do filtering as well.

I understand zipreduce/4 would be a bit more verbose than the comprehension version with filters (but faster) but it would probably take way fewer lines of code to optimize some comprehensions as lists:zipreduce/4 than it would take to support the whole zip generator machinery in the compiler?

Comprehensions are one of my favorite language features in general. One of my favorite papers are about comprehensions. I have been tempted about adding similar features to Elixir for a really long time but I truly think the need for zip generators are rare.

I would argue the goal of the current generators is not to combine but to be a flatmap, which is a feature that I use all the time. The combining aspect is simply a consequence of when one of the generators are not nested in another.

schnef · October 22, 2024, 1:36pm

Yes please, that would be great!

bjorng · October 23, 2024, 6:06am

I forgot to mention this kind of code (actual example from the compiler):

fix_exit_phi_args(Vs, Rms, Exit, Blocks) ->
    [{V,Pred} ||
        V <- Vs && Rm <- Rms,
        Pred <- exit_predecessors(beam_ssa:rpo([Rm], Blocks), Exit, Blocks)].

While you could write this example with lists:zipreduce/4, I don’t think you can eliminate the building of a temporary list.

Personally, I find that for most languages, learning the language syntax is the easy part. Learning enough of the libraries to be productive is the hard part.

I think that the lists module already has grown to the point that it can be difficult to navigate and find the function one needs. Further additions to the lists module need to have very good motivations.

Why would zipreduce/4 be faster than the comprehension?

The zip generator machinery is not that many extra lines of code, and it is located in a relatively simple and straightforward part of the compiler. I would not expect having to do much work to keep zip comprehensions working.

josevalim · October 23, 2024, 2:18pm

Would this work?

fix_exit_phi_args(Vs, Rms, Exit, Blocks) ->
    lists:zipreduce(Fun(V, RM, Acc) ->
      Vals = exit_predecessors(beam_ssa:rpo([Rm], Blocks), Exit, Blocks),
      lists:reduce(fun(H, T) -> [H | T] end, Acc, Vals)
    end, [], Vs, Rms).

You could also use ++ or lists:reverse/2. The above should build and traverse the same amount of lists as zip comprehensions, right? Or can the zip comprehension be smarter?

Sorry, I meant to say faster than today’s comprehensions (which must use lists:zip/2).

bjorng · October 24, 2024, 4:49am

Yes, with a few tweaks it will work, and it will not build an extra list:

fix_exit_phi_args(Vs, Rms, Exit, Blocks) ->
    lists:zipreduce(fun(V, Rm, Acc) ->
                      Preds = exit_predecessors(beam_ssa:rpo([Rm], Blocks), Exit, Blocks),
                      lists:foldl(fun(Pred, T) -> [{V,Pred} | T] end, Acc, Preds)
              end, [], Vs, Rms).

However, the resulting list is reversed compared to version the using the zip generators. In this particular case, this is fine as the order of the elements doesn’t matter.

By the way, if we were to add this function to the lists module, it should be called zipfoldl/4 to be consistent with the naming of the existing functions in the lists module, and there should also be a zipfoldr/4 function.

Personally, I find the version using zip generators easier to read.

Maria-12648430 · October 24, 2024, 6:59am

Not only that. To be fully consistent with what the existing zip functions offer, there would have to be zipfoldl3/5 and zipfoldr3/5 functions for zipping 3 lists, and also zipfoldl/5, zipfoldr/5, zipfoldl3/6 and zipfoldr3/6 functions for dealing with lists of non-equal lengths.

This would mean adding 8 new functions to the lists module, and most of them with an excessive number of arguments (4/5/6).

I fully agree

zipfolds are more flexible than zip generators (process left-to-right or right-to-left, build anything instead of just a list, etc), in fact they offer the highest degree of freedom short of going for explicit recursion. But I guess (I have been wrong before) that such use cases are rare.

Finally, IMO zipfold is a misnomer. It does not necessarily zip the input lists in the sense of producing a list where each element is a combination of the respective elements in the input lists. It processes the input lists in zip-style, but the result can be anything. Therefore, I would argue that foldl2/foldr2/foldl3/foldr3 would be better names to express “folding over 2 (or 3) lists” (note that I’m not saying that we should do this )

josevalim · October 24, 2024, 8:31am

Agreed as well. The main question is if there could be an alternative that has a smaller impact in the language/stdlib surface. Although at this point it is pretty clear that adding 8 different functions to the lists module is not one of them.

bjorng · November 6, 2024, 11:25am

Zip generators updated to support strict generators (as in EEP 70) have been approved by the OTP Technical Board.

This PR updates EEP 73 to describe how the strict generator option is handled in zip generators, and the PR with the implementation has been updated to support strict zip generators.