Origins of Erlang collection data structures?

itsok · May 18, 2023, 2:12pm

This is a question for someone who is familiar with origins of Erlang or design of programming languages in general and reasoning behind some of the fundamental decisions behind Erlang. I’m secretly hoping I might be privileged to get an answer from Robert Virding himself .

Erlang has three basic collection data types - lists, tuples and maps (which was added most recently).

While designing a programming language, why is there a need for these types? Or to put it another way, why not some other basic types with some other properties? What alternatives were considered?

williamthome · May 18, 2023, 3:29pm

@rvirding

rvirding · May 19, 2023, 3:15pm

I do try to reply but sometimes you may have to wait a bit. Or ping me!

Of course we needed collection types. Lists and tuples are basic, and also very versatile and can be used for almost anything and everything, and have no restrictions to the types of their elements. From the very beginning we had decided there would be no user defined data types which very much limited the choices. Maps came later and they do provide a very useful and practical interface while still being very versatile in that the keys and values can be anything.

Records were added as way to define a named structure with access to named fields within them. They came very early and you need to be very careful in viewing them as user defined types. Seeing we didn’t have user defined types all record operations are translated by the compiler to operations on tuples.

Elixir structs are similar to erlang records except that they use maps instead of tuples. This has both benefits, you can better “see” what they are and fields in them, and disadvantages, access is slower.

So there are no real user defined types.

raimo · May 22, 2023, 12:17pm

The motivation for list and tuple types, the original ones, that I have heard, is that:

Tuples are for a fixed size collection, indexed by an integer, exhibiting fast lookup but slow construction, which is what you can expect for an array in a functional language.
Single linked lists (cons cells a.k.a pairs) are for variable size collections since fast to operate on the front, fast to iterate linearly, slow to search through and even slower to construct in whole.

So these are two collections with very different properties that are very useful in a functional language. Other alternatives could be arrays (tuples but would not fit as well in a functional language), double linked lists (very strange in a functional language), strings (not as essential as lists and not a priority back then, and not really a container type), associative arrays (indexed with any key, i.e maps, not deemed as essential back then.

Associative arrays first appeared as ETS tables, and as higher level data structures created from tuples and pairs (trees), so we did well for many years without maps.

Pairs and tuples are the bare minimum for a functional language. One could also do without lists and construct them from 2-tuples. But it is a huge improvement to have syntax for lists / pairs, so only tuples would have been a too bare minmum.