Proposal: adding ets:lookup_element_as_list/3

RMorisset · August 16, 2022, 9:44am

Hello,

I’d like to propose a small extension to the ets API, and would like some feedback before making the proposal official.

Motivation:

ets:lookup is pretty slow if the tuple to be copied is big, and only a small part of it is used.
ets:lookup_element is the recommended solution to this problem, unfortunately it throws an exception if there is no match for the key, which is significantly slower than the way that lookup returns nil.
While profiling some Erlang code, I found some cases where an ETS table has huge tuples (18 fields, some of which are themselves quite big), making lookup costly, but using lookup_element could cause performance regressions when the element is absent. In addition to being slower, having to wrap lookup_element with a try/catch also makes it a bit less ergonomic.

Proposal

Add a new function ets:lookup_element_as_list/3 to the standard library. It shares most of its implementation with ets:lookup_element/3, and takes the same arguments, but has the following returns:

If there is no match for the key, return [] like ets:lookup (instead of throwing an exception like ets:lookup_element)
If there is a single match for the key, return the corresponding element wrapped in a singleton list: [V]
If the table is a bag/duplicate_bag, and there is at least one match, returns them in a list

Shown with code

For a set/ordered_set table, the following are identical:

foo(Table, Key) ->
  try ets:lookup_element(Table, Key, 2) of
    Value -> [Value]
  catch
    _:badarg -> [].

And

foo(Table, Key) -> ets:lookup_element_as_list(Table, Key, 2).

Benchmarks

I have implemented this (it is rather short, about 100LoC, half of which are docs and tests), and benchmarked it on some micro-benchmark on my M1 laptop. I tested both the case where the key is present and when it is absent.
When it is present, lookup_element_as_list is roughly as fast as lookup_element, which is about 50% faster than lookup.
When it is absent, lookup_element_as_list is roughly as fast as lookup, which is about 50% faster than lookup_element.

The microbenchmark is obviously not representative of a real application, but I consider it evidence that there are situations where neither lookup nor lookup_element are ideal from a performance point of view.

Questions

I have a few questions for the community:

Do you agree that this is worth adding to ETS?
Do you have any suggestions of changes either to the name or the semantics?
Should I post this as a formal EEP, or is it sufficiently limited that I can just send a pull-request on github?

NelsonVides · August 16, 2022, 10:13am

TL;DR: Yes! I’m all in for this!

I’d definitely use this one, I’ve encounter myself having to do so many try-catches for this and wishing there was a proper way to do it from the standard API.
I’m only not sure about the naming, lookup_element_as_list is a very long function name, but ergonomics in syntax is nothing more than makeup to me, what to me really matters is ergonomics in semantics, and performance.
EEPs I don’t know, let’s see what the OTP people think about this one

juhlig · August 17, 2022, 6:28pm

You can try a PR, if an EEP is required the OTP team will tell you, and then you can use your PR as reference implementation.

LeonardB · August 17, 2022, 7:20pm

This would be extremely helpful.

I do have a suggestion though. Why not rather add a 4-arity for ets:lookup_element accepts a default which is returned when the record is not found.

Solves the issue of having to come up with a new function name

josevalim · August 17, 2022, 9:31pm

There is also prior art on maps:get/2 vs maps:get/3.

RMorisset · August 18, 2022, 1:44pm

That is a great suggestion, thank you.

I’ll replace my current implementation of lookup_element_as_list/3 by lookup_element/4, then submit the PR directly on Github.

juhlig · August 18, 2022, 5:10pm

Hm, consider that there is an ambiguity, namely that either the key may not exist, or that the key exists but that the found tuple may be too short.

LeonardB · August 18, 2022, 5:40pm

I would expect a badarg exit in that case since the provided index is bad in the context of the expected record/tuple?

edit:
This is how lookup_element/3 behaves if the provided index is invalid as well

1> ets:new(test,[public,named_table]).
test
2> ets:insert(test, {1,2}).
true
3> ets:lookup_element(test, 1, 3).
** exception error: bad argument
     in function  ets:lookup_element/3
        called as ets:lookup_element(test,1,3)
4> ets:new(test,[public,named_table]).
test
5> ets:insert(test, {1,2}).           
true
6> ets:lookup_element(test, 1, 2).    
2

edit 2:
So in the implementation, the default would only be returned in a case where desired record/tuple does not exist, while a ‘bad offset’ for the tuple would behave exactly like lookup_element/3

juhlig · August 18, 2022, 6:15pm

That again would take away half of the convenience you get out of the whole enterprise… And what about bag/duplicate_bag?

(Sorry, I can’t check the behavior of lookup_element/3 myself since I’m on vacation really )

LeonardB · August 18, 2022, 7:00pm

since you’re on vacay

ets:lookup_element(Table, Key, Pos)

badarg for all table types with Pos being > tuple size

Eshell V11.2.2.13  (abort with ^G)
1> ets:new(test,[public,named_table,bag]).
test
2> ets:insert(test, {1,2}).
true
3> ets:insert(test, {1,3}).
true
4> ets:tab2list(test).
[{1,2},{1,3}]
5> ets:lookup_element(test,1,2).
[2,3]
6> ets:lookup_element(test,1,3).
** exception error: bad argument
     in function  ets:lookup_element/3
        called as ets:lookup_element(test,1,3)
7> ets:new(test,[public,named_table,duplicate_bag]).
test
8> ets:insert(test, {1,2}).                         
true
9> ets:insert(test, {1,2}).
true
10> ets:tab2list(test).                              
[{1,2},{1,2}]
11> ets:lookup_element(test,1,2).                    
[2,2]
12> ets:lookup_element(test,1,3).                    
** exception error: bad argument
     in function  ets:lookup_element/3
        called as ets:lookup_element(test,1,3)

juhlig · August 19, 2022, 7:38pm

Thanks But I see that I was sloppy there… Can you try that with two objects with the same key, one with sufficient number of elements, one with too few elements, and see what happens? Is there an exception, or does the element from the sufficient tuple get returned?

LeonardB · August 19, 2022, 7:56pm

Friday and I’m bored…

Badarg if the Pos is out of range for any matching key (bag/duplicate_bag)

19> ets:new(test,[public,named_table,duplicate_bag]).
test
20> ets:insert(test, {1,2}).                         
true
21> ets:insert(test, {1,3,4}).
true
22> ets:tab2list(test).
[{1,2},{1,3,4}]
23> ets:lookup_element(test,1,2).
[2,3]
24> ets:lookup_element(test,1,3).
** exception error: bad argument
     in function  ets:lookup_element/3
        called as ets:lookup_element(test,1,3)
25> ets:new(test,[public,named_table,bag]).
test
26> ets:insert(test, {1,2}).
true
27> ets:insert(test, {1,3,4}).
true
28> ets:lookup_element(test,1,2).          
[2,3]
29> ets:lookup_element(test,1,3).
** exception error: bad argument
     in function  ets:lookup_element/3
        called as ets:lookup_element(test,1,3)

juhlig · August 20, 2022, 10:21am

Glad I can help

Then maybe this should be considered/discussed in the PR for the new function, while we @RMorisset is at it After all, we are trying to make things better over all.

Led · August 20, 2022, 3:09pm

What about ets:select/2? Or is it too slow for you?

michalmuskala · August 22, 2022, 8:16am

ets:select/2 is significantly slower than any version of lookup or lookup_element. It’s usually cheaper to do several lookup calls than one select call.

Led · August 22, 2022, 9:19am

significantly slower

About twice as slow.
That is 200000 vs 100000 operation per second. Does this really matter?

RMorisset · August 22, 2022, 9:46am

Unfortunately, it does matter quite a bit for us, ets:lookup is already the most expensive thing on some of our performance profiles.
Improving this was the original reason for my proposed change, the ergonomics improvement is just a cherry on top.

juhlig · August 22, 2022, 12:07pm

Btw @RMorisset, when you have made that PR, be sure to post a link here so all of us interested can have a look

RMorisset · August 22, 2022, 5:36pm

Here is the PR that implements ets:lookup_element/4 : Add lookup_element/4, which takes a default value by RobinMorisset · Pull Request #6234 · erlang/otp · GitHub

Maria-12648430 · August 25, 2022, 2:40pm

I like that this is being addressed, but I’m torn on whether lookup_element with a default value is a good way to do it. Actually, I liked the first suggestion (Proposal in the opening post) better. Well, except for the name, that is But returning a (possibly empty) list of lookup results seems to me preferable.

There exists an ambiguity in lookup_element/3, given that the key actually exists, which is related to the different table types. For the set types it returns the raw value that was found, for the bag types it returns a list of the values that were found. So, without always paying attention the table type, it is unclear if the value returned by the call was a list from a single object in a set table or a list of elements from multiple objects in a bagtable.

Example:

1> ets:new(t1, [named_table, set]).
t1
2> ets:insert(t1, {a, [z, y, x]}).
true
3> ets:lookup_element(t1, a, 2).
[z,y,x]

… and …

4> ets:new(t2, [named_table, bag]).
t2
5> ets:insert(t2, [{a, x}, {a, y}, {a, z}]).
true
6> ets:lookup_element(t2, a, 2).
[z,y,x]

… will return equal results (with the elements in undefined order in the latter case, yes).

With the original approach, the first snippet would return [[z,y,x]] instead, making it clear that the element returned stems from a single object, not from multiple ones.

lookup_element/4 may even add to the confusion:

7> ets:new(t3, [named_table, bag]).
t3
8> ets:lookup_element(t3, a, 2, [z, y, x]).
[z,y,x]
9> ets:insert(t2, [{a, x}, {a, y}, {a, z}]).
true
10> ets:lookup_element(t3, a, 2, [z, y, x]).
[z,y,x]

To be sure, I like the proposal itself. But alluding to what @juhlig said, we should try to make it good, not another function with slightly different peculiarities.