Is lookup and insert really constant time in Erlang ETS set?

Inforista · February 27, 2024, 12:56pm

The erlang doc states that:

Insert and lookup times in tables of type set are constant, regardless of the
table size.

How is erlang ets set key/value tables implemented internally? If it is by a hash table - will it not be nessasary to recalculate the hash adress space once in a while and thus it is not “pure” constant lookup time?

sverker · February 27, 2024, 1:10pm

Ets set, bag and duplicate_bag are hash tables that grow and shrink dynamically one bucket at a time using linear hashing.

Inforista · February 27, 2024, 1:14pm

Thank you @sverker Will read up on linear hashing right away

sverker · February 27, 2024, 1:31pm

The ETS hash implementation is a segmented array of buckets, where each segment contains 256 buckets. When the first initial segment, with buckets 0-255, is “full” we allocate a new segment, with buckets 256-511, and start splitting bucket #0 into bucket #0 and bucket #256 by using one more bit of the hash value. Then we split bucket #1 into #1 and #257 and so on.

When we have all 512 buckets used and the table is considered “full” again we allocate yet another segment and start over by splitting bucket #0 into #0 and #512 by using yet another bit of the hash value, and then bucket #1 into #1 and #513…

Inforista · February 27, 2024, 1:56pm

Thank you @sverker for this detailed explanation

Inforista · February 27, 2024, 2:04pm

Actually, with this ETS hash implementation - are there “edge” cases there can break the ETS table? For ETS set tables … in principal can you have an unlimited size table, where only limit is the RAM size of the system? Like TB of RAM … will the ETS set table still work in this extreme case? Or are there physicial limitations I need to concider for really extreme use of ETS set tables?

sverker · February 27, 2024, 3:21pm

Well, as you ask, I checked, and it seems we keep the total number of buckets in a 32-bit signed integer. So, if you can fill a table with more than 2 billion keys you might hit that limit. If I calculate correctly that would demand an absolute minimum of 128 Gb of memory.

That is easy to fix, and we should probably do it.

Inforista · February 27, 2024, 4:41pm

@sverker thank you for checking up on this Actually I am working on a new kind of AI and tuplespace information system, where it is highly likely that I will hit this limit (sometime within 1/2-2 years time from now). So If you can add this easy fix, and raise the limit, in a future release I will be very happy

sverker · February 27, 2024, 5:19pm

Write an issue at GitHub - erlang/otp: Erlang/OTP so we don’t forget it.

Inforista · February 27, 2024, 5:20pm

Ok, I will write an official issue at GitHub Erlang/OTP