The need for "protocols" in Erlang

josevalim · October 23, 2023, 5:28pm

At Code BEAM Europe, I had several discussions that all pointed towards a need for protocols, or a similar solution, in Erlang. This post is an attempt to summarize them and start a discussion on the topic.

What are protocols?

In Erlang, we are familiar with polymorphic code:

negate(Int) when is_integer(Int) -> -Int;
negate(Bool) when is_boolean(Bool) -> not Bool.

The code above is poly (many) morphic (shapes) because it can handle different types of arguments (integers and booleans).

The limitation in Erlang’s polymorphism is that it is “closed to extension”. It is not possible to add additional clauses to negate/1 unless we change its original source.

Many programming languages provide a mechanism for “open polymorphism”: Typeclasses in Haskell (with a paper dating back to 1989), Protocols in Clojure, Interfaces in Go, Protocols in Elixir, etc. I am using the name “Protocols” for the rest of the post, as that is what they are called in Elixir, which inherited the name from Clojure, both of them being dynamic languages.

Example #1: pretty printing

One canonical example of protocols is pretty printing data structures. Many data structures in Erlang have complex implementations, which are private to the data structure but end up leaking in the shell:

1> gb_sets:from_list([foo, bar, 123, {tuple}]).
{4,{foo,{bar,{123,nil,nil},nil},{{tuple},nil,nil}}}
2> re:compile("(a|b)", []).
{ok,{re_pattern,1,0,0,
                <<69,82,67,80,86,0,0,0,0,0,0,0,1,0,0,0,255,255,255,255,
                  255,255,...>>}}

While Erlang could potentially provide mechanisms to customize the shell, the problem goes beyond the shell. For example, any logging mechanism may print the internals of a data structure.

3rd party libraries may also have the same constraints: they want to create new data types and abstractions without leaking their implementation details.

This is also problematic for user data types. If you have a user record with email and password fields, you want to make sure those are not written to logs, for data privacy reasons. Therefore, we need a mechanism that lets the data type tell Erlang how it should be pretty printed, and protocols are one possible mechanism.

Example #2: JSON

There is a discussion for adding JSON support to Erlang/OTP. Serialization is another area where protocols shine, as they provide an extensible mechanism for data types to express they can be converted to JSON (or any other serialization format).

This way, a JSON implementation in Erlang could support the native data types out of the box. Custom types, such as gb_trees, library types, and user-defined types can all opt-in to encoding by “implementing a protocol”.

Of course, it is possible to solve this problem by providing a “recursive custom encoder” function to the JSON encoding mechanism, but it forces developers to be responsible for stitching together how all the different data types should be encoded.

Example #3: String interpolation

There have been ongoing discussions about adding string interpolation to Erlang. Without going into the merits of the feature itself, it can be another use case for protocols. For example, imagine that we want to allow interpolation in Erlang:

"My name is ~{Name}"

Then you need to decide what values can be interpolated. If you say only strings, then that’s fine, but if you want to allow integers or floats to work naturally, then the same question arises: how to allow custom data types? Such as a library that implements arbitrary decimal precision?

Other examples

There are many other examples of where protocols can be handy. For example, Elixir uses protocols to provide a module, similar to the lists module, that works with a huge variety of data structures, including maps, file streams, etc. It doesn’t mean all of these features need to be added to Erlang, but protocols would enable doing so, if desired.

Different serialization formats may define their own protocols and libraries may define custom protocols for specific needs.

Protocol dispatching

There are many design decisions to be taken around implementing protocols (or similar) in Erlang. The main goal of this thread is to not explore solutions, but bring attention to problems and situations where protocols may be a good fit.

I will also be glad to document how the Elixir implementation works but I assume an Erlang solution could (should?) work differently. I would like to use this final section to discuss the main challenges here.

The core of a protocol is a dispatch mechanism. When you implement a protocol, the implementation is not part of the protocol nor of the data structure. For example, take a possible “json” protocol. If that protocol is defined as part of Erlang/OTP, then Erlang/OTP could implement the “json” protocol for all of its built-in data structures: maps, integers, floats, gb_trees, etc. However, a library author may also want to implement the “json” protocol for their own data structures as well.

I can also be a library author that defines a new “csv” protocol. In this case, I also want to define and implement the protocol for all built-in types (which I don’t know, they come as part of OTP). I also want to allow library and application authors to implement the “csv” protocol for their own data structures.

In other words, you may implement any protocol for any data structure at any time. And because code in Erlang always exists inside modules, the protocol dispatching mechanism most likely needs to be based around modules.

We can do so by breaking the protocol dispatching in two parts: a naming schema and a cached module lookup.

The naming schema

The naming schema defines how we are going to find the module that implements the protocol. For example, when giving a map to the json protocol, we could dictate the protocol implementation must be placed at the json_map module. For integers, in json_integer. For a custom type foo_bar_baz, at json_foo_bar_baz.

Therefore, when calling json:encode(SomeValue), we need to get the type of SomeValue, concatenate with the protocol name, and then invoke that module. Once more, if SomeValue is an integer, then json_integer. If SomeValue is a dict record, then json_dict.
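The naming schema above can be sketched roughly as follows. This is only an illustration, not a proposed API: the module name naming_sketch and the functions type_of/1 and impl_module/2 are made up for this example, and treating any tuple whose first element is an atom as a "record" is an assumption (it would also classify {ok, 1} as type ok).

```erlang
-module(naming_sketch).
-export([type_of/1, impl_module/2]).

%% Hypothetical helper: map a value to a type name used in the schema.
type_of(V) when is_integer(V) -> integer;
type_of(V) when is_float(V) -> float;
type_of(V) when is_atom(V) -> atom;
type_of(V) when is_binary(V) -> binary;
type_of(V) when is_list(V) -> list;
type_of(V) when is_map(V) -> map;
%% Assumption: a tuple tagged with an atom is treated as a record.
type_of(V) when is_tuple(V), tuple_size(V) > 0, is_atom(element(1, V)) ->
    element(1, V);
type_of(V) when is_tuple(V) -> tuple;
type_of(_) -> other.

%% Concatenate the protocol name and the type name, e.g. json + integer
%% gives json_integer. Note that list_to_atom/1 creates atoms dynamically.
impl_module(Protocol, Value) when is_atom(Protocol) ->
    list_to_atom(atom_to_list(Protocol) ++ "_" ++ atom_to_list(type_of(Value))).
```

For example, naming_sketch:impl_module(json, 42) yields json_integer, and a dict (which is a record tagged dict) yields json_dict. The sketch only computes the name; whether such a module exists is a separate lookup.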

The need for caching

The tricky part is that the json_integer module may be in three different states:

loaded in memory
not loaded but available on disk
not loaded and not available on disk

If json_integer is loaded in memory then it will be fast. If it is not loaded, then we need to go through the code server and traverse code paths. If none is available, we will traverse all code paths, which may be slow.

The trouble is that some protocols may also define “default implementations”. For example, a pretty printing protocol must be able to fallback to a default implementation if none is provided, so we always pretty print something. In other words, every time the protocol is invoked, we would traverse all code paths, find nothing, and then fallback to the default one. This will be extremely slow.

Therefore, we need to cache the result of the module lookup: if the module cannot be loaded, we shouldn’t try to load it next time around. This cache is tied to the module where the protocol is defined (think about it as a “module dictionary”). Furthermore, we would ideally cache both the computation of the naming schema and the module lookup.

Let’s imagine we want to dispatch a pretty_printing protocol to custom records, such as re_pattern or dict. The pretty_printing protocol would need to have code that looks like this:

-module(pretty_printing).
-export([to_string/1]).

%% erlang:module_get/2 and erlang:module_put/2 are hypothetical "module
%% dictionary" primitives; they do not exist in Erlang/OTP today.
to_string(Value) when is_tuple(Value), is_atom(element(1, Value)) ->
  Record = element(1, Value),

  case erlang:module_get(Record, undefined) of
    undefined ->
      Module = list_to_atom("pretty_printing_" ++ atom_to_list(Record)),
      case code:ensure_loaded(Module) of
        {module, Module} ->
          erlang:module_put(Record, Module),
          Module:to_string(Value);

        {error, _} ->
          erlang:module_put(Record, pretty_printing_fallback),
          pretty_printing_fallback:to_string(Value)
      end;

    Module ->
      Module:to_string(Value)
  end.
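To experiment with the cached lookup on a current Erlang/OTP release, one possible stand-in for the "module dictionary" is persistent_term. The sketch below is an assumption-laden illustration, not the proposed mechanism: the module name pp_cache_sketch, the cache key {pretty_printing_cache, Record}, and the in-module default printer are all invented for this example.

```erlang
-module(pp_cache_sketch).
-export([to_string/1, default_to_string/1]).

to_string(Value) when is_tuple(Value), tuple_size(Value) > 0,
                      is_atom(element(1, Value)) ->
    Record = element(1, Value),
    %% Assumed cache key; persistent_term stands in for a module dictionary.
    Key = {pretty_printing_cache, Record},
    case persistent_term:get(Key, undefined) of
        undefined ->
            Impl = lookup(Record),
            persistent_term:put(Key, Impl),
            call(Impl, Value);
        Impl ->
            call(Impl, Value)
    end;
to_string(Value) ->
    default_to_string(Value).

%% Apply the naming schema and ask the code server once.
lookup(Record) ->
    Module = list_to_atom("pretty_printing_" ++ atom_to_list(Record)),
    case code:ensure_loaded(Module) of
        {module, Module} -> Module;
        {error, _} -> fallback
    end.

call(fallback, Value) -> default_to_string(Value);
call(Module, Value) -> Module:to_string(Value).

%% Default implementation used when no implementation module is found.
default_to_string(Value) ->
    lists:flatten(io_lib:format("~p", [Value])).
```

After the first call for a given record name, later calls skip the code server entirely, including the negative case where no implementation exists. The sketch never expires its entries, so a module loaded later would not be picked up.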

There are many design decisions to be made around the code above. Ideally we would want to optimize it such that, at runtime, it is literally equivalent to either pretty_printing_re_pattern:to_string(Value) or pretty_printing_fallback:to_string(Value), but the Erlang/OTP compiler/runtime team will be the most capable of making these decisions.

Not only that, we also need to talk about cache expiration. After all, if pretty_printing_some_module was not defined at some point, but it is loaded in the future, it must expire its dispatch cache. This could be done by annotating the implementation modules:

-module(pretty_printing_re_pattern).
-expire(pretty_printing, re_pattern).

Of course, I am not advocating for the user to write these, but those are the rough lines of a low-level mechanism required to implement protocols.
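Since -expire is not an existing attribute, one way to approximate the expiration idea today is an -on_load hook in the implementation module. The following is a sketch under assumptions: it presumes the dispatch cache is kept in persistent_term under the key {pretty_printing_cache, Record}, which is a convention invented for this example, and the printed format is made up.

```erlang
-module(pretty_printing_re_pattern).
-export([to_string/1]).
-on_load(expire_cache/0).

%% Runs when this module is loaded: drop any cached dispatch decision
%% (for example a cached fallback) for the re_pattern record.
expire_cache() ->
    _ = persistent_term:erase({pretty_printing_cache, re_pattern}),
    ok.

%% Hide the compiled pattern internals when printing.
to_string({re_pattern, _, _, _, _}) ->
    "#re_pattern<>".
```

One caveat: the hook runs before the module becomes callable, so a concurrent dispatch could re-cache the fallback in between. A real mechanism would need to close that window, which is part of why runtime support is being discussed.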

Thanks for reading and hopefully it gives some ideas, even if I am barking up the wrong tree.

elbrujohalcon · October 24, 2023, 5:58am

I like this idea. I would like to present a possibly problematic scenario here, just for consideration (i.e., a problem to solve not a counterexample to the proposal).

What if I have a module like this…

-module my_users.

-export [new/2, to_string/1].

-opaque t() :: #{username := binary(), password := binary()}.
-export_type [t/0].

-spec new(binary(), binary()) -> t().
new(U, P) -> #{username => U, password => P}.

-spec to_string(t()) -> iodata().
to_string(#{username := U}) ->
    <<"#User<name:", U/binary, ">">>.

When the Erlang runtime has to dispatch the to_string/1 function… how will it know that #{username => <<"a">>, password => <<"b">>} is a my_users:t() and not any other map?

josevalim · October 24, 2023, 7:34am

Correct. I haven’t gone into detail about this part, but we would need to define what constitutes custom types in Erlang. For example, we already have records, but some custom types in the standard library are not tagged (such as queue, gb_trees, etc). In Elixir, we have structs, which are maps that use a special key to denote the type (similar to records, but on top of maps).
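To make the struct idea concrete, here is a rough sketch of what a tagged map could look like in Erlang. The '__type__' key and the module name tagged_map_sketch are hypothetical, chosen only for illustration (Elixir uses the key __struct__); nothing like this is defined in Erlang/OTP.

```erlang
-module(tagged_map_sketch).
-export([new_user/2, type_of/1]).

%% Build a user map carrying a type tag under an assumed special key.
new_user(Username, Password) ->
    #{'__type__' => my_users, username => Username, password => Password}.

%% A tagged map reports its tag; any other map is just a map.
type_of(#{'__type__' := Type}) when is_atom(Type) -> Type;
type_of(Map) when is_map(Map) -> map.
```

With such a tag, a dispatcher can tell a my_users value apart from an arbitrary map that merely has the same keys, which addresses the question above at the cost of a convention that every custom type must follow.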

mmin · October 24, 2023, 10:09am

Hi, I may be misunderstanding something, but isn’t all that achievable via behaviours?

What would be the mechanism for giving a name to custom data? Would that imply another convention or something? Or is this proposal limited to records/tagged data, where the tag would be its name?

Naming schema is something I’d personally like to avoid - I’m more into already used pattern of providing callback functions directly or via callback modules (of a custom name).

rfk23 · October 24, 2023, 10:35am

This proposal adds “magic” to the language, while Erlang’s main selling point to me is distinct simplicity and lack of magic. It’s like “old good C” of functional languages. The ability to read the code and be sure that behavior of function foo:bar is constant is severely underappreciated.
Dynamic dispatch is akin to adding a hidden global variable to every call, that can be modified by totally unrelated code, or, in the worst case, by simply loading a module. Spooky action at a distance.
Let’s go through motivational examples:

Pretty printing in the shell: when I’m troubleshooting a live system where some internal data structure went wrong, I want “internal implementation details” to be “leaked”. Hiding secrets by customizing pretty print function is security through obscurity.
JSON: there’s never going to be a 1-to-1 mapping between Erlang data types and JSON, because JSON is a very minimalist serialization format. The most trivial example: Erlang has bignums, while floats is all that JSON has to offer. It’s up to the business code to define whether it wants to lose precision and transform integers to floats, or encode them as strings. Treating this through a generic protocol is how you make money disappear. Edit: this is wrong.
String interpolation: iolists.

In other words, sometimes lack of a feature is a feature.

dischoen · October 24, 2023, 11:44am

I appreciate this feature (that you know exactly where a function call is defined) very much.
I also never use imports, so that it is clear where a function is from.
Last year I had to work with Haskell, and there I had situations where I could not know from a piece of
source code alone what it was doing. There were at least three dimensions which could overload the
meaning of functions.

kuna.prime · October 24, 2023, 12:34pm

IMHO the design of logger is a prime example of how one can design extensible, configurable, stateful/dynamic dispatch in Erlang. Of course there are many more examples of how it can be done, even in a stateless way, but callback modules/behaviors are, at least from my point of view, the correct way to handle this in general.

josevalim · October 24, 2023, 12:53pm

The behaviour has the specification but not the dispatching mechanism. I didn’t cover precisely where we will dispatch to because that’s a longer discussion.

Actually, I agree with you. I am fine with not having interpolation. I am fine always showing implementation details. And I am fine with limited JSON support.

My only concern is that, if we try to introduce some form of polymorphism to address some of these problems, then we should consider doing it in a consistent fashion, instead of ad-hoc approaches. After all, there are many other ways to introduce dynamic dispatching in Erlang, you just need to store closures or MFAs in any mutable storage (process state, ets, persistent term, etc), and each comes with a different set of trade-offs.
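As an illustration of one such ad-hoc approach, the sketch below stores {Module, Function} pairs in persistent_term and dispatches through them. The module registry_sketch and its functions are invented for this example, and the read-modify-write in register_impl/3 is not safe under concurrent registration.

```erlang
-module(registry_sketch).
-export([register_impl/3, dispatch/3]).

%% Record that Type implements Protocol via M:F/1.
register_impl(Protocol, Type, {M, F}) when is_atom(M), is_atom(F) ->
    Key = {?MODULE, Protocol},
    Impls = persistent_term:get(Key, #{}),
    persistent_term:put(Key, Impls#{Type => {M, F}}).

%% Look up the implementation for Type and call it; crashes with a
%% badmatch if nothing was registered for that type.
dispatch(Protocol, Type, Value) ->
    #{Type := {M, F}} = persistent_term:get({?MODULE, Protocol}, #{}),
    M:F(Value).
```

Here the caller has to pass the type explicitly and someone has to run the registration at startup, which is one of the trade-offs compared with a naming schema.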

My original proposal had a question mark in the title, removed at the last second, and now I partially regret taking it out.

EDIT: I partially disagree with “Hiding secrets by customizing pretty print function is security through obscurity”. You can call it obscurity, others will call “secure defaults”. Also, even if it is “security through obscurity”, one can argue if that’s the only mechanism you have, then that should be used. So ideally we should discuss what other options exist to avoid leakage of private data to developers, logs, etc., but that’s a whole separate conversation.

starbelly · October 24, 2023, 4:55pm

Speaking of features, this could be an optional and experimental feature using the new feature functionality. If it doesn’t pan out, it could be yanked.

This assumes no major changes would need to happen in terms of reworking the compiler and friends for optimizations.

schnef · October 25, 2023, 7:13am

To me, this seems like a specialized attribute that is useful in certain types of situations or projects. To me, the main characteristics of Erlang are simplicity and stability. Personally, I work in software engineering, creating applications to solve practical issues. I find that Erlang works exceptionally well for me. I think they call this feature creep or a solution looking for a problem.

Let’s concentrate on maintaining and building libraries, improving tools and optimising.

nzok · October 25, 2023, 7:27am

Please don’t call these “protocols”.
In Erlang, a protocol is the interface of a process.
Back in the 1990s when I was connected with SERC it
became clear to us that protocols were vital to
understanding Erlang code, arguably at least as
important as types, but rather harder to dig out.
Back then, there wasn’t a type system to build on.
If I recall correctly, there was the Wadler type
system, and Joe Armstrong was building a rival one
(which he demonstrated to me acting as a code
navigation tool), but neither of them was available,
and UBF wasn’t even a twinkle in Joe’s eye. As in
UBF and Sing#, a protocol in this sense is basically
a softly parametrically typed FSA.

I’m actually rather disappointed that this thread
isn’t about adding real Erlang-style protocols.

“Protocol” in this thread seems to be used in the sense
that Objective C gave it. Something like “extensible
abstract data type” would be more intention-revealing.

So. Wanted: a concurrent programming language running on
the BEAM with a strong polymorphic type system supporting
Haskell/Clean-style extensions. Found: Mercury.

Mercury has been around for, what, 25 years? I’m afraid
support for the Erlang back end was dropped, but I dare say
it could be revived if anyone wanted it. At any rate, it’s
an existence proof that a concurrent language with
Haskell-style types could work on the BEAM.

josevalim · October 25, 2023, 11:59am

I did give examples of problems where protocols can be a solution. I also provided examples of languages, such as Clojure, Elixir, and Haskell, where protocols are used to solve exactly these problems.

You could say it is the wrong solution to these problems in Erlang, as @rfk23 has argued, but I disagree it is “looking for a problem”.

Joe Armstrong said the same but, at that point, the word “protocol” was already widely used in Elixir. For the proposal, I went with the same name, but I wouldn’t necessarily argue we should use the same in Erlang.

That’s also another way to frame the proposal:

a behaviour is the interface of a module
a protocol is the interface of a process
but what is the interface of data?

Behaviours are by far the most commonly used though, with data interfaces coming second (in Elixir), and process interfaces coming third. Besides the I/O protocol, are there any public process interfaces in Erlang/OTP?

cevado · October 25, 2023, 3:53pm

can’t it be solved with dependency injection?
like the json case:

Formatter = fun(Data) ->
...
end,
json_encoder:encode(EncodeData, [{formatter, Formatter}]).

even interpolation could be done that way:

Interpolator = fun(Data) ->
...
end,
string:interpolate(<<"My name is ?">>, [Name], [{interpolator, Interpolator}]).

sure, it’s not as handy as being handled by the compiler/runtime itself. but it has way less friction and allows the caller to define the expected handling of such transformations in an explicit way.
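A minimal sketch of the injected-formatter approach might look like the following. It is not an existing library: json_inject_sketch and its formatter option are assumptions for illustration, it handles only a few term types, and it does not escape strings.

```erlang
-module(json_inject_sketch).
-export([encode/2]).

%% Encode Term; unknown terms are passed to the caller-supplied
%% formatter fun, which must return something encodable.
encode(Term, Opts) ->
    Default = fun(T) -> error({unsupported, T}) end,
    Formatter = proplists:get_value(formatter, Opts, Default),
    iolist_to_binary(enc(Term, Formatter)).

enc(true, _) -> <<"true">>;
enc(false, _) -> <<"false">>;
enc(null, _) -> <<"null">>;
enc(I, _) when is_integer(I) -> integer_to_binary(I);
enc(B, _) when is_binary(B) -> [$", B, $"];  % no escaping in this sketch
enc(L, F) when is_list(L) ->
    [$[, lists:join(<<",">>, [enc(E, F) || E <- L]), $]];
enc(Other, F) -> enc(F(Other), F).
```

For instance, calling encode/2 with the formatter fun({pair, A, B}) -> [A, B] end turns [1, <<"a">>, {pair, 2, 3}] into the binary [1,"a",[2,3]]. The single formatter fun has to know about every custom type that can appear in the data, which is the stitching work the caller takes on.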

nzok · October 25, 2023, 10:10pm

In languages like Mercury, Clean, and Haskell (all of which
have strong static polymorphic typing PLUS type extension
and also have concurrency), type extension is essentially
tied to the obligatory type system.

One of the things that amuses me is that C++'s type system
is a Turing complete functional programming language, so much
so that the standard allows compilers to give up if type
checking is too hard. We turn to a functional language, and
behold! Haskell’s type system is a Turing complete logic
programming language. I think, but I may be mistaken, because
the higher reaches of Haskell are a closed book to me, that
the Haskell type system may be a constraint logic programming
language.

Let’s not take Erlang there.

[By the way, when Joe Armstrong and the other founders
established the meaning of “protocol” within Erlang, the
BEAM itself did not yet exist, let alone any other BEAM
language, and Elixir was decades away from existing. It
simply isn’t right to force redefinitions of key terms
of art.]

The message that started this thread mixed up two things.
(A) there aren’t really any abstract data types in Erlang.
With the optional type system, you can check that a client
doesn’t care what the implementation is, but absent type
checking there is no way to stop a client finding out
exactly what the underlying data structure is.

We had/have exactly the same problem in Prolog, and the
Prolog community eventually decided not to care. Those
who did care found Mercury.

There was a proposal for ‘sealed’ terms for Prolog where
seal(Private_Key, Raw_Term, Sealed_Term)
would have wrapped Raw_Term to give Sealed_Term or
unwrapped Sealed_Term to give Raw_Term. (Yes, I proposed
this.) It was simply promoting for general use a technique
used in implementations for things like data-base references.

(B) Every built-in or user-defined operation is already
defined for all concrete data types, so there is no way to
specify what some operation does for an abstract data type.
In the Prolog world, this surfaced as the problem that you
could not make (1/2) = (2/4) work and you could not make a
version of set equality that worked with unification.

The fact that Prolog terms have function symbols eased
the pain in practice, see how portray/1 is extended, and
indeed in theory (look up ‘narrowing’). Erlang terms do
not have function symbols. I regret never asking Joe why.

One thing I see here is that we are asking our modular
language to do something it really doesn’t want to do.
Have an operation defined by multiple clauses in
DIFFERENT modules. In Prolog, where nondeterminism is
normal, this wasn’t much of a problem, and Quintus Prolog
for one actually allowed this. In Erlang, this is a big
problem, especially with hot loading.

And that’s the final issue (for now). Mercury, Clean,
and Haskell don’t have hot loading. This makes whole-
program optimisation feasible as well as desirable, and
makes typeclasses practical. It also means that there
are no issues about keeping processes running when the
representations of abstract data types change. Having
a type system in Erlang is nice, and works fine in
practice because almost all of a module’s types don’t
change when you load a new version. But nothing
repeat nothing should be allowed to interfere with
Erlang’s ability to do its basic job, and that means
that whatever solution is adopted HAS to work WITHOUT
types and WITH hot loading.

There’s another issue. Clause matching is
fundamental to Erlang, and we want to ensure that all
matching terminates in finite time, so we cannot run
arbitrary code in clause heads. (This is why there is
a distinction between guards and expressions.) Letting
programmers redefine == and =:= could be very troublesome.
The prospect of

f(X, Y) when X < Y -> g(X);
f(X, Y) -> h(Y).

and

f(X, Y) -> if X < Y -> g(X)
; true -> h(Y)
end.

behaving differently lacks appeal.

All things considered, I’m reminded of this quotation
about UK legislation degenerating into ‘gibberish’:

Dr. G. R. Y. Radcliffe, in the course of The Times’ correspondence suggested that the remedy was in the hands of M.P.s who should refuse to pass legislation which they did not understand. This solution, although certainly the ideal to be aimed at, would probably result in a complete cessation of parliamentary activity if introduced all at once. Thus, although attractive at first sight, it would, it is feared, merely result in ‘the much needed gap’ being filled by further delegated legislation.
1950 October, The Modern Law Review, Vol. 13, No. 4, Page 488, Blackwell Publishing.

If Erlang has managed to support great things for the last
37 years without this issue being a show-stopper, maybe we
don’t need to rush into anything but can take our time
thinking really deeply about it.

starbelly · October 26, 2023, 3:46am

I’m interested in this one. At work, which is an Elixir shop, we hot code load a lot and, by virtue of using Elixir itself, protocols are involved whenever we do. Perhaps you’re getting at a rub whereby the protocol itself has changed?

josevalim · October 26, 2023, 8:11am

It depends.

In pretty-printing, because pretty-printing happens anywhere in the system, you can’t really inject it. You could store the injected function somewhere but, by doing so, you are still doing “extensible dynamic dispatching”, just using something different from protocols.
For JSON, yes, this is possible. But it will be the responsibility of your application to stitch all the type dispatching together.
For interpolation, I don’t see much reason. Just invoke the function directly in place of the interpolation.

Agreed. The goal of the proposal was not to say “let’s add protocols now” but rather to start a discussion around the problems and possible solutions. It would also be valuable to get your feedback on the specific problems listed, whether they are worth solving, and other possible solutions.

Maria-12648430 · October 26, 2023, 2:52pm

tl;dr: I’m against this proposal

As others already pointed out, one of the strong points of Erlang is the absence of any behind-the-curtains magic that happens unasked for when the stars happen to be right.

As an analogy, if I go to a coffee shop, say “fill this” and hand over some object, I expect it filled if and only if the clerk I talk to knows how to fill that object with coffee.
Be it a cup or a bucket, he will likely know how to fill the vessel with coffee.
If it is, say, a dog, I would expect either a or a “WTF my girl, are you crazy?”. Even the dog being filled with coffee could be an expected outcome.
What I would not expect is that the dog would be taken for a walk or played around with or given candy (or whatever dogs like, I’m really more of a cat person), just because somebody who happened to be around knows how to fill dogs with joy.

Anyway, more seriously…

With the suggested dispatch mechanism of looking up a specially named module with a function with the same name and arity as the one that is to be extended, what if…

… you want to use a library that just happens to have such a module with such a function, for entirely unrelated purposes? Suddenly you can give otherwise invalid arguments to a function, and instead of failing, it would do something that neither you nor the author of the library thought of. Like taking the argument for a walk.
… you want to use two libraries that each want to implement the protocol for the specific type and for that reason? They will have to use the same named modules in them, ie module name clashes would become a common thing, meaning you can use only one of the two libraries you need, and no way around it.

Dispatching on records is another thing, how is this supposed to work? Records are tagged tuples, not a type of their own, how would you distinguish a record from a tuple in the dispatch?

IF this suggestion should be considered, I think it is imperative to at least “mark” functions as protocols, ie that they have to explicitly declare that they can be extended, and that functions that are intended to extend a protocol have to exactly specify that that is what they do. And the module-dispatching issues I outlined above need to be solved, of course.

And even then I won’t be a fan of it.

juhlig · October 26, 2023, 3:09pm

Hm. I work at a Clojure shop right now, and this magic extensible protocols/multimethod stuff is unpredictable, hard to follow up and debug.

josevalim · October 26, 2023, 3:47pm

To clarify, all of this indeed has to happen. You can’t just dynamically change the behaviour of any function. I will update the proposal to make this clear but you could see in my example the json:encode/1 fun explicitly implements the dispatching logic (edit: it turns out I cannot update the post :D). None of this would happen magically behind the scenes. All protocols are explicitly declared upfront and all implementations are explicitly declared as such.

You are 100% right though that the suggested implementation leads to naming collisions. And this is particularly problematic for records indeed, because record names are not unique. In Elixir this is a lesser issue since structs are tied to modules and most data structures have a clear and obvious protocol implementation.

starbelly · October 27, 2023, 3:58am

I am interested in this. I don’t know Clojure well, nor have I ever really used it, but I can comment on the experience in Elixir: I have never had a problem with protocols. I’ve never been in a situation, that I can remember, where I ran into some strange problem and protocols were the issue behind it all or made anything hard to debug for that matter. When something tries to utilize the string protocol and it’s invalid, it’s quite clear why, as an example.

It may be that protocols in Clojure are a bit more than what protocols in this proposal describe or as implemented in Elixir? Protocols in general have always felt pretty lightweight to me.

I’m not saying those cases do not exist, it’s just, I’ve never hit them, that I can remember. Of course, anything can be abused and maybe that’s where the snag you hit lives (i.e., abuse within third party libs and such).