Insert and lookup times in tables of type set are constant, regardless of the
table size.
How is erlang ets set key/value tables implemented internally? If it is by a hash table - will it not be nessasary to recalculate the hash adress space once in a while and thus it is not “pure” constant lookup time?
The ETS hash implementation is a segmented array of buckets, where each segment contains 256 buckets. When the first initial segment, with buckets 0-255, is “full” we allocate a new segment, with buckets 256-511, and start splitting bucket #0 into bucket #0 and bucket #256 by using one more bit of the hash value. Then we split bucket #1 into #1 and #257 and so on.
When we have all 512 buckets used and the table is considered “full” again we allocate yet another segment and start over by splitting bucket #0 into #0 and #512 by using yet another bit of the hash value, and then bucket #1 into #1 and #513…
Actually, with this ETS hash implementation - are there “edge” cases there can break the ETS table? For ETS set tables … in principal can you have an unlimited size table, where only limit is the RAM size of the system? Like TB of RAM … will the ETS set table still work in this extreme case? Or are there physicial limitations I need to concider for really extreme use of ETS set tables?
Well, as you ask, I checked, and it seems we keep the total number of buckets in a 32-bit signed integer. So, if you can fill a table with more than 2 billion keys you might hit that limit. If I calculate correctly that would demand an absolute minimum of 128 Gb of memory.
That is easy to fix, and we should probably do it.
@sverker thank you for checking up on this Actually I am working on a new kind of AI and tuplespace information system, where it is highly likely that I will hit this limit (sometime within 1/2-2 years time from now). So If you can add this easy fix, and raise the limit, in a future release I will be very happy