# Uses cases? for proposed new lists functions

There are proposals for a number of new functions in the `lists` module in `stdlib`.

For adding new functions in lists and other stdlib modules a number of criteria must be fulfilled:

• There must be convincing and common enough uses case where use of the new function will make a difference i terms of better, shorter, nicer easier to understand code.
• The function as such is known from other languages libraries and kind of expected in a standard library.
• The functions fits in the suggested module and follows the same principles as other functions in that module.

This is a request focusing on use cases for the proposed functions. So please send us
use cases with examples and motivation.

For example snippets of existing code and how it can be rewritten in a nicer way with the use of one of the proposed functions.

Other Pros and Cons for the functions are also welcome.

The proposed new functions I like use cases for are:

lists:groupby

Splits the list into groups using a function as discriminator.
Example:

``````1> lists:groupby(fun(X) -> X rem 2 end, [1,2,3]).
#{0 => [2], 1 => [1, 3]}
``````

lists:foldlwhile

Like foldl but stops when a certain criteria is met.

lists:uniq

The `uniq` function should takes a list and returns the unique elements from that list preserving the order. There is already the usort function which returns unique elements in sorted order.

``````1> L = [x,y,z,b,a,a,c].
2> lists:uniq(L).
[x,y,z,b,a,c]
3> list:usort(L).
[a,b,c,x,y,z]
``````

lists:transpose

Transposes a list of lists into one list of lists where the first element of each resulting list is taken from the first list and the rest of elements taken from the first element of the following lists.

``````1> lists:transpose([[1,2,3], [4,5,6]]).
[[1,4],[2,5],[3,6]]
``````
10 Likes

In order to find use cases for the proposed lists:uniq/1 function, I grepped for all uses of Enum.uniq/1 in my Elixir solution for Advent of Code:

``````for i in {2015..2021}; do (cd advent-of-code-\$i && git grep Enum.uniq); done | wc
19      64     769
``````

I then looked at all 19 occurrences and tested whether calling `lists:usort/1` would work just as well as `Enum.uniq/1`. I found that in the solutions for two of the days `lists:usort/1` would not work.

The first one was in the solution for day 11 of 2016:

``````    mapping = 1..4
|> Enum.flat_map(fn floor ->
{gen, chip} = Map.fetch!(state, floor)
bit_numbers(gen ||| chip, 0, [])
end)
|> Enum.uniq
|> Enum.with_index
|> Map.new
``````

Here the order was critical. But it turns out that I didnāt actual need to call `Enum.uniq`. When I removed it, I still got the correct solution for the problem.

The other one was in the solution for day 4 of 2017:

``````  defp is_valid(words) do
words === Enum.uniq(words)
end
``````

Here, again, the order was critical. But if I there were no `Enum.uniq`, I could easily make it work using `lists:usort/1`:

``````  defp is_valid(words) do
length(words) === length(:lists.usort(words))
end
``````
9 Likes

I use both `group_by` and `uniq` extensively. For `group_by`, here are three examples:

For `uniq`, I prefer it by default instead of `usort` because I donāt need to think if it is safe to reorder a collection or not. To me, `uniq` is the ādefaultā and `usort` should be the special case.

For `foldwhile`, I often prefer to write the recursion by hand and abort when necessary, so I am not sold. Especially because the cont/halt style is not used anywhere in the lists module.

`transpose` I used once during Advent of Code and I will likely never use it again.

9 Likes

Iām pretty sure there are also quite a few places in the Erlang compiler itself, where `group_by` could be useful that today use some combination of calls to the `sofs` module.

I agree with @josevalim as to the āuniqā being a base operation and āusortā as a more complex one - thereās conceptually one operation vs two operations going on (removing duplicates vs sorting and removing duplicates). Itās probably easier to reason about code that only removes duplicates if removing duplicates is what is important to the algorithm.

Finally, Iād like to advocate to use the name āuniqueā rather than just āuniqā - I donāt think weāre that size constrained on the function names

6 Likes

groupby

It sounds useful to some degree, but it can be done almost as easily with a fold.

Leaving it at a fold and not adding `groupby` comes with the benefit that we donāt have to bother about what the best output format is, too. Maps are probably the most natural and useful, but contradicts the style of the lists module by returning something that is not a list. If we return a list of tuples (converted from a map, which is probably what is internally used by the function), people will likely convert it to a map in the next step.

foldwhile

Here I agree 100% with @josevalim

uniq

The point @josevalim makes about not having to worry about if order is important or not is IMO valid, but Iāll point out that there is an ambiguity here, ie that for a list like `[a, b, a]`, both `[a, b]` and `[b, a]` are valid results in that they are lists of unique elements which are in the original order. The statements "`a` came before `b`" and "`b` came before `a`" are both true.

transpose

I would have needed that once for an exercise on Exercism, I will probably never need it again in real life

7 Likes

Actually there are several functions in the lists module that do not return lists: all, any, unzip, keyfind, keymember, fold, min, max and likely more!

4 Likes

Yes, it seems that even a huge `sofs` fan such as myself will have to admit that `groupby` could be a better choice than `sofs` in some places in the compiler.

3 Likes

Ok, I was sloppy with my statement and over-generalized, so let me clarify

Obviously, there are functions in `lists` that canāt return lists (`any`, `all`, `sum`, ā¦), and others that may return just about anything, including but not restricted to lists (`min`, `max`, `fold`, ā¦) However, where the return value contains several items from or based on the items argument list, the return value is a list (`map`, `filter`, `sort`, ā¦) or a structured term containing such lists (`partition`, `split`, `unzip`, ā¦). It is this last category of functions that I meant when I said āthe style of the `lists` moduleā. Satisfied?

That said, I have to admit that the existing functions in this category canāt sensibly return any other structure than a list or including such lists, ie it is circumstantial. `groupby` is the first such function where there is a choice and where a decision has to be made, which may set a precedent: either consistency (down to the finest detail) over general utility (an approach that will often get in the way of doing things), or general utility over negligible consistency details (which may raise the question what and what not constitutes a negligible detail, for every new function to be added where there is a choice).

Finally, let me say that if `groupby` should be accepted, I would like it to return a map

3 Likes

Surely itās about what the function operates on, not what it outputs.

3 Likes

`foldlwhile` would be somewhat useful, but what @josevalim said I donāt think we need shortcuts for just about anything.

`groupby`, for me at least, is in the same line. It can be done with a fold almost as easy but with more flexibility. A bit more verbose, yes, but again, IMO there is no need to shortcut everything.

`transpose`, I never needed that and canāt imagine that I ever will.

`uniq` however is interesting (but I agree with @michalmuskala that we should use an unabbreviated name).

The ambiguity @juhlig pointed out exists, but that can be documented like āthe first is kept, subsequent duplicates will be removedā. There are many functions in the `lists` module doing that already, eg. `delete/2`. It is what people probably expect to happen, anyway.

Anyway, to make this function really useful, there should be a `uniq/2` variant, accepting an unary function that maps an element to some other term. This way, something like deduplicating a list of key-value tuples would be possible, like this:

``````lists:uniq(
fun ({Key, _Value}) -> Key end,
[{a, 1}, {b, 2}, {a, 3}]
)
``````

ā¦ which would result in `[{a, 1}, {b, 2}]`: only the first `a`-tuple is kept, even though the second `a`-tuple is a different term, ie not identical to the first one.

Equally, it would be possible to not deduplicate certain elements even if they are identical, like this:

``````lists:uniq(
fun
(keep_me) -> make_ref();
(Other) -> Other
end,
[a, keep_me, b, keep_me, a]
)
``````

ā¦ which would result in `[a, keep_me, b, keep_me]`: the duplicate `a` at the end is removed, but the duplicate `keep_me`s are still there.

4 Likes

I agree. Iām only elaborating on something that was brought up in the discussion following the respective PR.

(But I think the function would be decidedly misplaced in the `maps` module, as was suggested in the linked comment).

5 Likes

I agree. Iām only elaborating on something that was brought up in the discussion following the respective PR.

In the `lists` module there are functions that operate on lists. I.e they
take lists as input and return lists. Some return several lists in a
tuple, but the return values are several lists and the tuple is just
a container.

In other modules such as `maps` there are functions that convert to
and from lists, since the list is such a useful data type for
sequential processing.

When converting a list to a map there is an obvious interpretation and
that is [{Key, Value}] where duplicate keys e.g override previous.

Another possible interpretation would be to have values with duplicate
keys grouped by the keys and store all values in a list for the key.

Therefore I do not think such a function is necessarily misplaced
in the maps module. It would simply be a different useful way
to create a map from a list of 2-tuples.

(But I think the function would be decidedly misplaced in the `maps` module, as was suggested in the linked comment).

Cheers

5 Likes

Hm, you do have a point, but I believe it is a merely academic oneā¦ I like to do that myself a lot, so this is the pot calling the kettle black (or whatever the proverb is )

Anyway. The `maps` module wouldnāt be the first place that I (or @cmo I guess) would instinctively go to look for such a function. Rather, I would go searching in the module that best fits what is operated on: `lists`.

From a slightly different point of view, there is only 1 possible kind of input, a list, but 2 possible kinds of output, a tuple list or a map. To accomodate for this and be really very consistent, we would need `maps:groupby` (returning a map), and `lists:groupby` (returning a tuple list).

In that light, I really think that the `lists` module is the most fitting place, no matter what it returns.

6 Likes

itās not a simple shortcut: itās a way to avoid reinventing the wheel over and over

6 Likes

Ok, granted, bad expression on my behalf Nevertheless, what I think this thread is about is deciding whether this specific type of wheel (`foldlwhile` for instance) is needed often enough that having it in stock (ie as a function) is justified or not. Myself, I would probably use it occasionally if it was accepted, but I also wouldnāt mind going by explicit recursion if it didnāt. So call me impartial on that part

3 Likes