Every now and then, I like to reverse engineer the wheel. To better understand wheels you know, and just to see if I can implement something exhibiting the same behavior (scrap the syntactic sugar, though), like “Assume X didn’t exist in Erlang: Given this set of features/characteristics, how would X be implemented?”
By doing this, I often gain sometimes surprising insights and realize things along the way which, while obvious if you think about them, you are seldom consciously aware of.
This time, the wheel I busied myself with was the popular List Comprehension.
What are the obvious features and characteristics of list comprehensions?
From some experience, reading the docs, and trying things out, the following are some of the most prominent characteristics/features of list comprehensions, as seen from the outside.
There is an expression (the part before the ||
) which results in the value to be put in the result list. It can make use of the values produced by the generators. Nothing much to it.
(After the ||
) there are pairs of patterns and generators, and filters, in arbitrary order with the restriction that generators that depend on the output of other generators must come after the generators they depend on, and likewise filters using values generated by a generator must come after that generator.
All combinations of values the generators produce and pass the filters are handed to the expression part.
Can something like list comprehensions even be implemented in plain Erlang?
Short answer
Yes. And no. Or maybe. Depends
Long answer
Of course it is possible to rewrite any specific list comprehension as a combination of filter
s, map
s fold
s or explicit recursions, take your pick.
Getting something dynamic requires some macro magic, however, since there are patterns involved, which you can’t pass around in variables. See here for my reference implementation (demo usage at the end of the file).
Be warned, though: it is ugly as hell, and throws shadow and unreachable clause warnings all over the place (or so it would if I didn’t suppress them). However, it shows the same features and behavior as real list comprehensions.* I wasn’t aiming for prettiness, anyway.
In any case, after seeing this, you will be very thankful to the person who gave us the nice-looking list comprehensions (if you are that person, please raise your hand ) that we know, and cherish them forever.
- In my implementation, there can only be one filter expression, if you need to check several things you have to
andalso
them together. In fact, there could be more filter expressions, but I couldn't figure out how to use literal commas in macro arguments. - In a regular list comprehension, there may be filters even before the first generator. I didn't bother with that, though.
Conclusions
Pointless as the whole enterprise may seem, it gave me some valuable insights along the way.
Generator order matters
Up to today, I have attached no importance to the order in which I put the generators. Generally, I ordered them in the same order as their results appeared in the expression. But consider this:
[ {X, Y} || X <- lists:seq(1, 1000), Y <- lists:seq(1, 10) ].
[ {X, Y} || Y <- lists:seq(1, 10), X <- lists:seq(1, 1000) ].
Xs = lists:seq(1, 1000), [ {X, Y} || Y <- lists:seq(1, 10), X <- Xs ].
All three of the above snippets produce the exactly same output. However, the first snippet calls lists:seq
1,001 times (once for X
, once for Y
for each of the 1,000 X
es); the second snippet calls lists:seq
11 times (once for Y
, once for X
for each of the 10 Y
s); the third snippet calls lists:seq
2 times (once for Xs
, once for Y
).
Now, lists:seq
is very fast, you won’t notice a difference no matter which of the above snippets you use. But imagine you use a function that takes a second to return instead, and the difference will be very obvious.
Filter order matters
Like with generators, I have never attached any importance to the order in which I put the filters. By and large, I put all the generators at the beginning, and all the filters at the end. But consider this:
[ {X, Y} || Y <- lists:seq(1, 10), X <- lists:seq(1, 1000), Y =< 5 ].
[ {X, Y} || Y <- lists:seq(1, 10), Y =< 5, X <- lists:seq(1, 1000) ].
Both of the above snippets produce the exactly same output. However, the first snippet does 10,000 comparisons of Y =< 5
(once for each X
for each Y
); the second snippet does 10 comparisons (once for each Y
).
Here again, =<
is of course as fast as can be, but imagine a filter that takes a second to run, and the difference is noticeable.
Additionally, the first snippet calls lists:seq
11 times (once for Y
, once for X
for each Y
); the second snippet calls lists:seq
6 times (once for Y
, once for X
for each Y
that passes through the filter).
Side effects
As a rule of thumb, generators and filters shouldn’t produce side effects, instead the expression part (left of the ||
) should do the side effects, if you need them. OTOH, there is nothing to prevent you from having side-effectish generators and filters.
Extending from the above two paragraphs, order matters again. The order and number of side effects happening depends on the order of the respective generators and filters.