ETS: lookup multiple elements at once

Hey guys & Happy New Year.

I’m wondering why we can store multiple objects at once with ets:insert(Table, ObjectOrObjects), but we can’t look them up with a single call, e.g. ets:lookup(Table, KeyOrKeys)?

My ETS table is of type set, and I’m always searching for multiple keys.
What’s the most efficient way to perform a bulk search?

Thanks
/Z

1 Like

IMO, looking up a hash dict or tree is very fast, so looking up multiple items one by one should be fast enough.
But ETS does have lock contention. Do you have a performance bottleneck related to this?

1 Like

There is ets:select; have you tried that?

1 Like

@aramallo Yes, and it’s slow (a full scan). I know that all the keys I’m searching for are present in my ETS table.

1 Like

@dominic It’s a kind of performance issue. Currently I’m doing this, which doesn’t look great IMHO:

[ ets:lookup_element(Tab, Key, 2) || Key <- ListOfKeys ],

My table contains ~15 million entries where:

  • key: UUID-v4 as binary()
  • value: a binary() blob of up to 64 bytes
1 Like

I see. I think you will get much better results using an ordered_set table and either doing:

  1. a select (possibly ordering the Match conditions based on Key :: uuid()), or
  2. traversing the table yourself using next/2
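Option 2 might look roughly like this (a hedged sketch, not the poster’s actual code: `bulk_lookup/2` and `walk/3` are assumed helper names to live in a module, `Tab` is assumed to be an ordered_set, and every searched key is assumed to be present):

```erlang
%% Walk the ordered_set once with ets:next/2, collecting the value
%% for each key in the sorted key list as we pass it.
bulk_lookup(Tab, Keys) ->
    walk(Tab, lists:sort(Keys), ets:first(Tab)).

walk(_Tab, [], _EtsKey) ->
    [];
walk(_Tab, _Keys, '$end_of_table') ->
    [];
walk(Tab, [K | Rest], K) ->
    %% current table key matches the next wanted key: grab its value
    [ets:lookup_element(Tab, K, 2) | walk(Tab, Rest, ets:next(Tab, K))];
walk(Tab, Keys, EtsKey) ->
    %% not a wanted key: advance to the next key in term order
    walk(Tab, Keys, ets:next(Tab, EtsKey)).
```

Both ets ordered_set tables and lists:sort/1 use Erlang term order, so the sorted key list and the table traversal advance in lockstep.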
2 Likes

BTW, I am assuming you want to share data, hence the reason you are using ets. Otherwise I would agree with @dominic and would explore other in-process options, e.g. maps, gb_trees, or, if your list is stable (doesn’t change often), you could even go for a solution that leverages the binary heap, e.g. GitHub - krestenkrab/vbisect: variable-sized binary ordered dictionary
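For illustration, the in-process maps route could be as simple as this (a sketch with made-up sample data; it is only viable if the data lives in one process and doesn’t need to be shared):

```erlang
%% Build the map once up front, then a bulk lookup is just a
%% comprehension over the key list.
Map = maps:from_list([{N, N * N} || N <- lists:seq(1, 1000)]),
Values = [maps:get(K, Map) || K <- [3, 14, 159]].
%% Values =:= [9, 196, 25281]
```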

1 Like

A select with a match spec like this will not scan the entire table.

ets:select(T, [{{K,'$1'}, [], ['$1']} || K <- ListOfKeys]).

It will analyze the match spec and see that all keys are fully bound and then do a lookup for each key.

Don’t know how fast it will be compared to a loop over ets:lookup_element though.
And you will not be guaranteed a consistent snapshot, if that is what you are looking for.

3 Likes

@sverker thanks for the tip. Quick test:

1. [ ets:lookup_element(Tab, Key, 2) || Key <- ListOfKeys ]: 92μs
2. ets:select(T, [{{K,'$1'}, [], ['$1']} || K <- ListOfKeys]): 337μs
2 Likes

I used bisect because the API is more convenient:

1. [ ets:lookup_element(Tab, Key, 2) || Key <- ListOfKeys ]: 89μs
2. [ bisect:find(Bisect, Key) || Key <- ListOfKeys ]: 486μs
2 Likes

AFAIK, ets:lookup_element/3 is the most efficient ETS function, so if you do have a bottleneck caused by lock contention, IMO you may have to use something other than ets.

If you never change the data after insertion, maybe you can try persistent_term.

a simple test

erl -noshell -eval "test:run(100000, 1000), init:stop()."

-module(test).

-compile([export_all, nowarn_export_all]).

run(DataSize, LookupSize) ->
    %% init
    ets:new(test_binary, [named_table]),
    ets:new(test_int, [named_table]),
    PId =
        spawn(fun() ->
            M = lists:foldl(fun(N, Acc) -> Acc#{N => N} end, #{}, lists:seq(1, DataSize)), 
            receive
                {From, KeyL} ->
                    From ! [maps:get(Key, M) || Key <- KeyL]
            end
        end),
    lists:foreach(fun(N) ->
        ets:insert(test_int, {N, N}),
        %% build fixed-size binary keys/values of the form <<N:32, N:32>>
        ets:insert(test_binary, {<<N:32, N:32>>, <<N:32, N:32>>})
    end, lists:seq(1, DataSize)),
    %% int
    SeqLookupInt = lists:seq(1, LookupSize),
    {IntTime, _} = timer:tc(fun() -> [ ets:lookup_element(test_int, Key, 2) || Key <- SeqLookupInt ] end),
    %% binary
    SeqLookupBinary = [ <<N:32, N:32>> || N <- SeqLookupInt],
    {BinaryTime, _} = timer:tc(fun() -> [ ets:lookup_element(test_binary, Key, 2)  || Key <- SeqLookupBinary ] end),
    %% process
    {ProcessTime, _} = timer:tc(fun() -> PId ! {self(), SeqLookupInt}, receive Value -> Value end end),
    io:format("data size: ~p, lookup size: ~p, int cost: ~p, binary cost: ~p, process cost: ~p~n",
        [DataSize, LookupSize, IntTime, BinaryTime, ProcessTime]).
2 Likes