Is lookup and insert really constant time in Erlang ETS set?

The erlang doc states that:

Insert and lookup times in tables of type set are constant, regardless of the
table size.

How is erlang ets set key/value tables implemented internally? If it is by a hash table - will it not be nessasary to recalculate the hash adress space once in a while and thus it is not “pure” constant lookup time?

Ets set, bag and duplicate_bag are hash tables that grow and shrink dynamically one bucket at a time using linear hashing.

3 Likes

Thank you @sverker :slight_smile: Will read up on linear hashing right away :blush:

The ETS hash implementation is a segmented array of buckets, where each segment contains 256 buckets. When the first initial segment, with buckets 0-255, is “full” we allocate a new segment, with buckets 256-511, and start splitting bucket #0 into bucket #0 and bucket #256 by using one more bit of the hash value. Then we split bucket #1 into #1 and #257 and so on.

When we have all 512 buckets used and the table is considered “full” again we allocate yet another segment and start over by splitting bucket #0 into #0 and #512 by using yet another bit of the hash value, and then bucket #1 into #1 and #513

2 Likes

Thank you @sverker for this detailed explanation :ok_hand:

Actually, with this ETS hash implementation - are there “edge” cases there can break the ETS table? For ETS set tables … in principal can you have an unlimited size table, where only limit is the RAM size of the system? Like TB of RAM … will the ETS set table still work in this extreme case? Or are there physicial limitations I need to concider for really extreme use of ETS set tables?

Well, as you ask, I checked, and it seems we keep the total number of buckets in a 32-bit signed integer. So, if you can fill a table with more than 2 billion keys you might hit that limit. If I calculate correctly that would demand an absolute minimum of 128 Gb of memory.

That is easy to fix, and we should probably do it.

2 Likes

@sverker thank you for checking up on this :blush: Actually I am working on a new kind of AI and tuplespace information system, where it is highly likely that I will hit this limit (sometime within 1/2-2 years time from now). So If you can add this easy fix, and raise the limit, in a future release I will be very happy :smiley:

Write an issue at GitHub - erlang/otp: Erlang/OTP so we don’t forget it.

1 Like

Ok, I will write an official issue at GitHub Erlang/OTP :slight_smile: