Uses cases? for proposed new lists functions

kennethL · February 23, 2022, 8:39am

There are proposals for a number of new functions in the lists module in stdlib.

For adding new functions in lists and other stdlib modules a number of criteria must be fulfilled:

There must be convincing and common enough uses case where use of the new function will make a difference i terms of better, shorter, nicer easier to understand code.
The function as such is known from other languages libraries and kind of expected in a standard library.
The functions fits in the suggested module and follows the same principles as other functions in that module.

This is a request focusing on use cases for the proposed functions. So please send us
use cases with examples and motivation.

For example snippets of existing code and how it can be rewritten in a nicer way with the use of one of the proposed functions.

Other Pros and Cons for the functions are also welcome.

The proposed new functions I like use cases for are:

lists:groupby

Splits the list into groups using a function as discriminator.
Example:

1> lists:groupby(fun(X) -> X rem 2 end, [1,2,3]).
#{0 => [2], 1 => [1, 3]}

lists:foldlwhile

Like foldl but stops when a certain criteria is met.

lists:uniq

github.com/erlang/otp

lists:uniq

opened 09:46AM - 14 Jan 22 UTC

meox

team:VM enhancement

Seems that an important function is missing in the `lists` module: `unique` The… `uniq` function should takes a list and returns the unique elements from that list preserving the order. Possible implementation: ```erlang uniq([]) -> []; uniq(List) when is_list(List) -> uniq(List, []). uniq([], Acc) -> lists:reverse(Acc); uniq([H | Tail], Acc) -> case lists:member(H, Acc) of true -> uniq(Tail, Acc); false -> uniq(Tail, [H | Acc]) end. ```

The uniq function should takes a list and returns the unique elements from that list preserving the order. There is already the usort function which returns unique elements in sorted order.

1> L = [x,y,z,b,a,a,c].
2> lists:uniq(L).
[x,y,z,b,a,c]
3> list:usort(L).
[a,b,c,x,y,z]

lists:transpose

github.com/erlang/otp

Add lists:transpose/1, useful for list of lists transposition

erlang:master ← manuel-rubio:lists-zip-1

opened 05:34AM - 06 Dec 21 UTC

manuel-rubio

+77 -1

When I was developing the Advent of Code 2021, using Elixir and Erlang, I just r…ealized the rotating of a matrix like this: ```erlang L = [ [1, 2, 3], [4, 5, 6] ], ``` Could be rotated with `lists:zip/2` if we pass the lists as: ```erlang L1 = [1, 2, 3], L2 = [4, 5, 6], lists:zip(L1, L2). % [{1, 4}, {2, 5}, {3, 6}] ``` But, what if we have the need to rotate a matrix of arbitrary size? Then I've implemented `lists:zip/1` which is accepting the `L` parameter as a list of lists (matrix) and it's returning the rotated matrix: ```erlang L = [ [1, 2, 3], [4, 5, 6] ], lists:zip(L). % [[1, 4], [2, 5], [3, 6]] ``` And of course, we could use not only a matrix of 3x2 but also 9x7 or other completely different sizes.

Transposes a list of lists into one list of lists where the first element of each resulting list is taken from the first list and the rest of elements taken from the first element of the following lists.

1> lists:transpose([[1,2,3], [4,5,6]]).
[[1,4],[2,5],[3,6]]

bjorng · February 23, 2022, 3:06pm

In order to find use cases for the proposed lists:uniq/1 function, I grepped for all uses of Enum.uniq/1 in my Elixir solution for Advent of Code:

for i in {2015..2021}; do (cd advent-of-code-$i && git grep Enum.uniq); done | wc
      19      64     769

I then looked at all 19 occurrences and tested whether calling lists:usort/1 would work just as well as Enum.uniq/1. I found that in the solutions for two of the days lists:usort/1 would not work.

The first one was in the solution for day 11 of 2016:

    mapping = 1..4
    |> Enum.flat_map(fn floor ->
      {gen, chip} = Map.fetch!(state, floor)
      bit_numbers(gen ||| chip, 0, [])
    end)
    |> Enum.uniq
    |> Enum.with_index
    |> Map.new

Here the order was critical. But it turns out that I didn’t actual need to call Enum.uniq. When I removed it, I still got the correct solution for the problem.

The other one was in the solution for day 4 of 2017:

  defp is_valid(words) do
    words === Enum.uniq(words)
  end

Here, again, the order was critical. But if I there were no Enum.uniq, I could easily make it work using lists:usort/1:

  defp is_valid(words) do
    length(words) === length(:lists.usort(words))
  end

josevalim · February 23, 2022, 11:03pm

I use both group_by and uniq extensively. For group_by, here are three examples:

Group files based on a given path prefix: gettext/compile.gettext.ex at becf0585b12762bddb6dd04a9f0a307c8768fa1a · elixir-gettext/gettext · GitHub
Group modules based on a prefix: gettext/compiler.ex at ac112ee18cedb30e0a8429add4d158805e2f9ed3 · elixir-gettext/gettext · GitHub
Group warnings in Elixir by file: elixir/elixir.ex at 8d06ee819acec42f2a576077e2c9fdc2a92c52bb · elixir-lang/elixir · GitHub

For uniq, I prefer it by default instead of usort because I don’t need to think if it is safe to reorder a collection or not. To me, uniq is the “default” and usort should be the special case.

For foldwhile, I often prefer to write the recursion by hand and abort when necessary, so I am not sold. Especially because the cont/halt style is not used anywhere in the lists module.

transpose I used once during Advent of Code and I will likely never use it again.

michalmuskala · February 24, 2022, 12:24pm

I’m pretty sure there are also quite a few places in the Erlang compiler itself, where group_by could be useful that today use some combination of calls to the sofs module.

I agree with @josevalim as to the “uniq” being a base operation and “usort” as a more complex one - there’s conceptually one operation vs two operations going on (removing duplicates vs sorting and removing duplicates). It’s probably easier to reason about code that only removes duplicates if removing duplicates is what is important to the algorithm.

Finally, I’d like to advocate to use the name “unique” rather than just “uniq” - I don’t think we’re that size constrained on the function names

juhlig · February 24, 2022, 4:21pm

groupby

It sounds useful to some degree, but it can be done almost as easily with a fold.

Leaving it at a fold and not adding groupby comes with the benefit that we don’t have to bother about what the best output format is, too. Maps are probably the most natural and useful, but contradicts the style of the lists module by returning something that is not a list. If we return a list of tuples (converted from a map, which is probably what is internally used by the function), people will likely convert it to a map in the next step.

foldwhile

Here I agree 100% with @josevalim

uniq

The point @josevalim makes about not having to worry about if order is important or not is IMO valid, but I’ll point out that there is an ambiguity here, ie that for a list like [a, b, a], both [a, b] and [b, a] are valid results in that they are lists of unique elements which are in the original order. The statements "a came before b" and "b came before a" are both true.

transpose

I would have needed that once for an exercise on Exercism, I will probably never need it again in real life

josevalim · February 24, 2022, 10:38pm

Actually there are several functions in the lists module that do not return lists: all, any, unzip, keyfind, keymember, fold, min, max and likely more!

bjorng · February 25, 2022, 3:30am

Yes, it seems that even a huge sofs fan such as myself will have to admit that groupby could be a better choice than sofs in some places in the compiler.

juhlig · February 25, 2022, 10:49am

Ok, I was sloppy with my statement and over-generalized, so let me clarify

Obviously, there are functions in lists that can’t return lists (any, all, sum, …), and others that may return just about anything, including but not restricted to lists (min, max, fold, …) However, where the return value contains several items from or based on the items argument list, the return value is a list (map, filter, sort, …) or a structured term containing such lists (partition, split, unzip, …). It is this last category of functions that I meant when I said “the style of the lists module”. Satisfied?

That said, I have to admit that the existing functions in this category can’t sensibly return any other structure than a list or including such lists, ie it is circumstantial. groupby is the first such function where there is a choice and where a decision has to be made, which may set a precedent: either consistency (down to the finest detail) over general utility (an approach that will often get in the way of doing things), or general utility over negligible consistency details (which may raise the question what and what not constitutes a negligible detail, for every new function to be added where there is a choice).

Finally, let me say that if groupby should be accepted, I would like it to return a map

cmo · February 25, 2022, 12:17pm

Surely it’s about what the function operates on, not what it outputs.

Maria-12648430 · February 25, 2022, 12:19pm

foldlwhile would be somewhat useful, but what @josevalim said I don’t think we need shortcuts for just about anything.

groupby, for me at least, is in the same line. It can be done with a fold almost as easy but with more flexibility. A bit more verbose, yes, but again, IMO there is no need to shortcut everything.

transpose, I never needed that and can’t imagine that I ever will.

uniq however is interesting (but I agree with @michalmuskala that we should use an unabbreviated name).

The ambiguity @juhlig pointed out exists, but that can be documented like “the first is kept, subsequent duplicates will be removed”. There are many functions in the lists module doing that already, eg. delete/2. It is what people probably expect to happen, anyway.

Anyway, to make this function really useful, there should be a uniq/2 variant, accepting an unary function that maps an element to some other term. This way, something like deduplicating a list of key-value tuples would be possible, like this:

lists:uniq(
    fun ({Key, _Value}) -> Key end,
    [{a, 1}, {b, 2}, {a, 3}]
)

… which would result in [{a, 1}, {b, 2}]: only the first a-tuple is kept, even though the second a-tuple is a different term, ie not identical to the first one.

Equally, it would be possible to not deduplicate certain elements even if they are identical, like this:

lists:uniq(
    fun
        (keep_me) -> make_ref();
        (Other) -> Other
    end,
    [a, keep_me, b, keep_me, a]
)

… which would result in [a, keep_me, b, keep_me]: the duplicate a at the end is removed, but the duplicate keep_mes are still there.

juhlig · February 25, 2022, 12:23pm

I agree. I’m only elaborating on something that was brought up in the discussion following the respective PR.

(But I think the function would be decidedly misplaced in the maps module, as was suggested in the linked comment).

raimo · February 25, 2022, 2:41pm

cmo:

Surely it’s about what the function operates on, not what it outputs.

I agree. I’m only elaborating on something that was brought up in the discussion following the respective PR.

In the lists module there are functions that operate on lists. I.e they
take lists as input and return lists. Some return several lists in a
tuple, but the return values are several lists and the tuple is just
a container.

In other modules such as maps there are functions that convert to
and from lists, since the list is such a useful data type for
sequential processing.

When converting a list to a map there is an obvious interpretation and
that is [{Key, Value}] where duplicate keys e.g override previous.

Another possible interpretation would be to have values with duplicate
keys grouped by the keys and store all values in a list for the key.

Therefore I do not think such a function is necessarily misplaced
in the maps module. It would simply be a different useful way
to create a map from a list of 2-tuples.

(But I think the function would be decidedly misplaced in the maps module, as was suggested in the linked comment).

Cheers

juhlig · February 25, 2022, 3:32pm

Hm, you do have a point, but I believe it is a merely academic one… I like to do that myself a lot, so this is the pot calling the kettle black (or whatever the proverb is )

Anyway. The maps module wouldn’t be the first place that I (or @cmo I guess) would instinctively go to look for such a function. Rather, I would go searching in the module that best fits what is operated on: lists.

From a slightly different point of view, there is only 1 possible kind of input, a list, but 2 possible kinds of output, a tuple list or a map. To accomodate for this and be really very consistent, we would need maps:groupby (returning a map), and lists:groupby (returning a tuple list).

In that light, I really think that the lists module is the most fitting place, no matter what it returns.

meox · February 27, 2022, 5:54pm

it’s not a simple shortcut: it’s a way to avoid reinventing the wheel over and over

Maria-12648430 · February 28, 2022, 11:38am

Ok, granted, bad expression on my behalf Nevertheless, what I think this thread is about is deciding whether this specific type of wheel (foldlwhile for instance) is needed often enough that having it in stock (ie as a function) is justified or not. Myself, I would probably use it occasionally if it was accepted, but I also wouldn’t mind going by explicit recursion if it didn’t. So call me impartial on that part