The need for "protocols" in Erlang

crownedgrouse · November 1, 2023, 4:36pm

Yes, but that was ugly and also dictated how files were named.
Introducing namespaces would be backward compatible, as old code would bind to the empty namespace once compiled with a release that supports it.

Yes, declaring a namespace in a module is no guarantee, just as in XML, that the namespace is not used elsewhere, because it is only a declared FQDN; but in general it uses a domain name owned by the owner of the code.

But under the hood, the Erlang VM could simply prefix module basenames with the namespace as an atom (or a hash of the namespace as an atom), and do the same for calls to modules.
This would not impact the VM itself, only the names of the compiled objects.
So the same module name in two different namespaces would in fact be two different modules.
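A minimal sketch of that prefixing idea, assuming a hypothetical mangling scheme of "namespace.module" joined into one atom. The module name ns_mangle, the function qualified/2, and the "." separator are all illustrative; nothing like this exists in OTP.

```erlang
-module(ns_mangle).
-export([qualified/2]).

%% Hypothetical mangling scheme: "<namespace>.<module>" as a single atom.
%% Code without a namespace declaration binds to the empty namespace ''
%% and keeps its plain module name, so old code is unaffected.
-spec qualified(atom(), atom()) -> atom().
qualified('', Module) when is_atom(Module) ->
    Module;
qualified(Namespace, Module) when is_atom(Namespace), is_atom(Module) ->
    list_to_atom(atom_to_list(Namespace) ++ "." ++ atom_to_list(Module)).
```

With this scheme, foo in namespace 'example.org' and foo in namespace 'example.com' become two distinct atoms, hence two distinct modules.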

igorclark · November 1, 2023, 6:43pm

Personally I’m not a fan of complexity in the core language, I like how simple it is and I don’t see these cases as sufficient cause to complicate it. I realise this might make me some weird kind of ascetic/masochist, but I tend to think it’s an application, framework or environment’s job to staunch data leaks, rather than the core language.

I can certainly see a value in namespaces, but then I immediately feel like I’d really want them to provide some simple source directory hierarchy mapping, and I don’t think I’d want to have com/domain/subdomain/package/module stuff like in some other languages, and I wouldn’t want it to interfere with how search paths work in e.g. c() or to make things any more complex - which on balance would probably make me think that overall, eh, it ain’t broke, maybe let’s not fix it ¯\_(ツ)_/¯

crownedgrouse · November 2, 2023, 7:24am

I understand, but we have to keep in mind interoperability between BEAM languages.
I personally gave an Erlang application the same name as an Elixir one, simply because I do not use Elixir and wasn’t aware of it.
Namespaces would allow many more bridges between BEAM languages, IMHO.

josevalim · November 3, 2023, 11:32am

This proposal is not suggesting that foo:bar(Arg) becomes spread out across multiple modules. Since this was brought up more than once, I probably expressed myself poorly.

foo:bar(Arg) would still have some explicit source code as part of its implementation that does a dynamic dispatch. What I am proposing would not be different than doing this:

-module(foo).
-export([bar/1]).

bar(Arg) ->
  try ets:lookup_element(?MODULE, key_for(Arg), 2) of
    Fun -> Fun(Arg)
  catch
    _:_ -> error({badimpl, Arg})
  end.

The above is one possible implementation of dynamic dispatch that relies on ETS. A separate implementation could use persistent_term or even processes. In other words, there are other ways of doing dynamic dispatch in Erlang, but they are more verbose and less efficient than a solution where the runtime/VM knows exactly when new dispatch options are added (based on module loading) and removed (based on module removal).
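For comparison, a rough sketch of the persistent_term variant. The module name pt_dispatch, the register_impl/2 function, and the key_for/1 classifier are assumptions made up for illustration; only persistent_term:put/2 and persistent_term:get/2 are real OTP calls.

```erlang
-module(pt_dispatch).
-export([register_impl/2, bar/1]).

%% Hypothetical classifier: maps a value to the key its implementation
%% is registered under.
key_for(Arg) when is_list(Arg) -> list;
key_for(Arg) when is_map(Arg) -> map;
key_for(Arg) when is_integer(Arg) -> integer;
key_for(_) -> unknown.

%% Register an implementation for one key. Someone has to call this
%% explicitly, which is exactly the bookkeeping the VM could do on
%% module load/unload instead.
register_impl(Key, Fun) when is_function(Fun, 1) ->
    persistent_term:put({?MODULE, Key}, Fun).

%% Dispatch: look up the implementation for the argument's key.
bar(Arg) ->
    case persistent_term:get({?MODULE, key_for(Arg)}, undefined) of
        undefined -> error({badimpl, Arg});
        Fun -> Fun(Arg)
    end.
```

Note that nothing here removes an entry when the module that registered it is unloaded; that cleanup would also have to be written by hand.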

Not really. Remember you want to be able to write new implementations for existing data types and implementations of new data types for existing protocols.

Think about it as a table. New rows means adding new functionality for existing data types. New columns means adding new data types to existing functionality.

| | Lists | Maps | Integers | … |
|---|---|---|---|---|
| Pretty printing | | | | |
| JSON encoding | | | | |
| Interpolation | | | | |
| … | | | | |

In your suggestion, once a record namespace is defined, I cannot augment it with new functionality (rows) because that requires changing the module. And the way function clauses work today does not allow you to add new data types (columns), because that also requires changing the module. The trick is to have a solution that allows both. You have to decouple on both axes.

Here is an example from Rust: The Expression Problem in Rust | Chris Swierczewski’s Website - notice how, as you add new traits and new data types, you don’t have to change any of the previous trait definitions or the existing data types. Coupling a record to a single module would not provide this property.
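To make the two axes concrete, here is a small sketch that keeps the table as plain data keyed on {Operation, Type}. It is not the proposal itself, only an illustration; the module proto_table and the helper type_of/1 are hypothetical names.

```erlang
-module(proto_table).
-export([new/0, add/4, call/3]).

%% The table is a map from {Operation, Type} to a one-argument fun.
new() -> #{}.

%% Adding a cell never touches existing cells, so new rows (operations)
%% and new columns (types) can be added independently.
add(Op, Type, Fun, Table) when is_function(Fun, 1) ->
    Table#{ {Op, Type} => Fun }.

%% Dispatch on the operation and on the type of the argument.
call(Op, Arg, Table) ->
    case maps:find({Op, type_of(Arg)}, Table) of
        {ok, Fun} -> Fun(Arg);
        error -> error({badimpl, Op, Arg})
    end.

%% Hypothetical classifier for a few built-in types.
type_of(X) when is_list(X) -> list;
type_of(X) when is_map(X) -> map;
type_of(X) when is_integer(X) -> integer;
type_of(_) -> other.
```

The cost is that the table has to be threaded through or stored somewhere by hand, which is the verbosity the runtime-supported version would remove.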

igorclark · November 3, 2023, 6:23pm

Hiya!

I understand, but we have to keep in mind interoperability between BEAM languages.
I personally gave an Erlang application the same name as an Elixir one, simply because I do not use Elixir and wasn’t aware of it.

Yes, absolutely, for multi-language apps & systems it could be really problematic if there are modules with the same name. Personally I’ve not really struggled with this, even though there are plenty of e.g. JSON libraries that all manage to find different names, the only time was a couple of UUID projects and it worked fine with rebar.config aliases. I guess that’s because I end up not using Elixir code from Erlang because of the work involved in the interop. But the fact that people name their modules the same as modules from the other language without realising there’s a clash does seem to suggest that there’s not that much in-project interoperability happening anyway, if people are mostly sticking to one language and not checking the other for clashes.

Even so,

Namespaces would allow many more bridges between BEAM languages, IMHO.

Right on. If namespaces would allow the kind of interoperability whereby Erlang could use Elixir code as easily as vice versa, then I’d be all for it. That’d be fantastic - there are tons of cool Elixir modules I’d like to be able to use reliably, and without having to do all the work currently necessary. I’d just want to be really sure that it doesn’t have any negative impact on the Erlang-only user-dev experience, especially if it doesn’t actually provide that level of benefit to Erlang-only projects - and that it’s not solely a case of making big changes to that experience in order to avoid name clashes with modules in another language that, right now, involves a fair amount of work to use from Erlang.

eproxus · November 4, 2023, 8:48am

My 2 cents:

Please do not pollute (abuse) the global Erlang module namespace with protocol or type lookups. It is already crowded enough as it is, and we only have one module namespace. It is not the right place to put type-to-encoding mappings in.
I think the premise that there is a strong one-to-one mapping between type and some external representation is more often false than not.

Take for example the calendar:datetime() type in Erlang. Let’s consider a JSON protocol implementation for it. We could choose e.g. an ISO 8601 string or a Unix seconds integer. Which one is “correct”? I’m guessing we can find many different protocols (here I’m referring to the network kind, a sign that the name is already overloaded) that require one or the other. Therefore, the implementation of a JSON protocol (the type mapping kind) for datetime() cannot and should not choose one or the other. This is up to the implementation of a specific system.

In this light, any protocol-like implementation must be flexible enough that it can be fully parameterized at runtime (if, for example, building a system that needs to use both formats at different ends in different network protocols).
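As a sketch of what "parameterized at runtime" could mean for datetime(), the function below takes the wanted representation as an argument. The module name dt_json and the format atoms are made up for illustration, and the input is assumed to be UTC.

```erlang
-module(dt_json).
-export([encode/2]).

%% Gregorian seconds at 1970-01-01T00:00:00, i.e. the Unix epoch.
-define(UNIX_EPOCH, 62167219200).

%% The caller (or the system's configuration) picks the representation;
%% the datetime type alone does not decide it.
encode(DateTime, unix_seconds) ->
    calendar:datetime_to_gregorian_seconds(DateTime) - ?UNIX_EPOCH;
encode({{Y, Mo, D}, {H, Mi, S}}, iso8601) ->
    iolist_to_binary(
      io_lib:format("~4..0B-~2..0B-~2..0BT~2..0B:~2..0B:~2..0BZ",
                    [Y, Mo, D, H, Mi, S])).
```

The same value can then go out as an integer on one connection and as a string on another, within a single node.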

nzok · November 4, 2023, 1:30pm

I personally don’t care whether anything in my Erlang code has the same name as something in Elixir, any more than I care whether it has the same name as something in Ada or Ruby (or Mercury, when Mercury compiled to BEAM). Just because something uses the same VM doesn’t mean I want it in my address space. We have a namespace concept in Erlang: a namespace is a node. That’s what Lawrie Brown based Safe(r) Erlang on all those years ago.

What is the difference between a namespace and a node?
“Namespace” means many things to many people, so when we talk about “namespaces” none of us knows what the others think we mean. “Node” has a precise meaning in Erlang. Currently, nodes are realised as operating system processes, but Lawrie Brown showed us they don’t have to be.

What do virtual nodes (multiple software-isolated nodes that may but need not reside in the same operating system process) buy us?

Untrusted/third-party code may be isolated in a separate node
Different applications may use different versions of modules
When multiple applications happen to use the same version of a module, only one copy needs to exist
A new performance level for inter-node communication is added; to WAN-remote, LAN-remote, same-cluster, and same-board we add same-OS-process.
Java has already gone this route with the idea that running a new Java program can just drop into an existing JVM instance so that there’s no need for multiple copies of classes in memory, no need to keep on re-JITting the same methods, &c, reducing memory and startup time.

Let’s not introduce a new mechanism into the language.
Let’s enrich the implementation of what we have.

nzok · November 4, 2023, 2:22pm

“Think about it as a table.”
This is one of the classic arguments for OOP.
And it’s wrong.
DON’T think of it as a (two-dimensional) table.
Think of it as AT LEAST a three-dimensional space of

what abstract data type
what operation
what consumer
Pretty-printing: the combination of what data type and operation = pretty-print is not all we need to know. You might want different “pretty-printing” methods for
Braille devices
dumb terminals/printers
small LCD screens
large LCD screens
projectors
three-dimensional displays
and for any one of these, there might be many definitions of what counts as “pretty”. Think about whether the interface allows “folding”, for example.

JSON: we’ve already covered that and the idea that there is one right way to convert any Erlang data type to JSON is pretty much as dead as the phlogiston theory by now. For one thing, Erlang data types are already at the wrong level to make that kind of decision. Many different Erlang data structures might represent the same abstract data. Think of a Sudoku board; I could represent one as a string, as a tuple of strings, as a single 81-digit integer with 0 for blank, as a tuple of 9 9-digit integers, as a tuple of tuples of 1-digit integers, or as a list of lists of 1-digit integers. And I must be able to change my choice of representation inside my Erlang program without changing the JSON. And of course one Erlang data structure might represent different abstract data, needing different JSON encodings.

Interpolation was never clear to me as an example. Not least because tutorials I’ve seen on string interpolation in other languages are littered with things like “this is what you do when the default conversion for the data type you want to paste in is WRONG for your application”. It’s a combination of “what ABSTRACT data do I have here” (not “what CONCRETE data do I have”) and “What does the CONSUMER of this string want”.

The wrongness of the 2-D model is one of the classic arguments for multiple dispatch. The other classic argument is of course arithmetic. Consider Q + D where Q is a quaternion and D is a dual number. (An actual example in my Smalltalk, which being a single inheritance language does NOT do this easily.) Which argument should decide how it is done? If you say “the first argument”, then what about D + Q? If you say “the second argument”, then again, what about D + Q? The answer is that the implementation depends on BOTH types. Given that (a) Smalltalk is a single-inheritance language, rather like what ‘protocols’ would give us, and (b) my Smalltalk library includes 13 or 14 different kinds of “number”, making sure this all works is NOT easy. (2 * aMoney is OK, 2 + aMoney is NOT, aDate + 2 is OK, aDate * 2 is NOT, …) Oh, I didn’t count Money in the 13 or 14…
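For what it is worth, inside a single Erlang module function clauses can already look at both arguments. The sketch below assumes hypothetical tagged representations {money, Amount} and {date, Days}; the module name num_sketch is likewise made up.

```erlang
-module(num_sketch).
-export([add/2, mul/2]).

%% Each clause is selected by the shapes of BOTH arguments.
add({money, A}, {money, B}) -> {money, A + B};
add({date, Days}, N) when is_integer(N) -> {date, Days + N};
add(A, B) when is_number(A), is_number(B) -> A + B;
add(A, B) -> error({unsupported, add, A, B}).   % e.g. 2 + aMoney

mul(N, {money, A}) when is_number(N) -> {money, N * A};
mul({money, A}, N) when is_number(N) -> {money, A * N};
mul(A, B) when is_number(A), is_number(B) -> A * B;
mul(A, B) -> error({unsupported, mul, A, B}).   % e.g. aDate * 2
```

This only works while every combination lives in one module; adding a new kind of number still means editing add/2 and mul/2.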

‘Protocols’ as proposed just cannot deal with the problems they are supposed to solve. Adding a lot of complexity to NOT solve a problem doesn’t sound like a good trade-off.

starbelly · November 4, 2023, 3:26pm

This is a good point, yet I don’t think there is an argument for a default JSON encoder / decoder in the language. Those details should be handled by a library. In this case a library could offer an option to encode or decode a specific key/val in a particular way. Or one could opt to do the transformation prior to encoding.

I only see this as being extremely problematic if there were a default implementation of a JSON protocol in erlang/otp. Still, it is a very good point and gives me pause.

Edit:

Disregard. This thread has become long and unwieldy such that I forgot the piece that you’ve alluded to which is related to a discussion of a default implementation of a json encoder/decoder.

dischoen · November 4, 2023, 4:41pm

Hi,

you mentioned the virtual nodes several times.
Is this only a concept or is there also an implementation for it?
So far I only found a paper [Brown99] by L. Brown, which focuses on safe erlang, but nothing more current.

Thanks for any pointers.

–
[Brown99] EXTENDING ERLANG FOR SAFE MOBILE CODE EXECUTION

eproxus · November 4, 2023, 5:59pm

This has nothing to do with Elixir. I was referring to using a mapping scheme with module names like json_datetime which by definition would live in the Erlang VM global module namespace. The fact that different programs might want different implementations of such a mapping makes it impossible to use the module namespace in a node for that purpose.

nzok · November 5, 2023, 7:26am

At the time that Lawrie Brown was presenting papers like
https://link.springer.com/chapter/10.1007/978-3-540-47942-0_5
https://link.springer.com/chapter/10.1007/10718964_3
http://lpb.canb.auug.org.au/adfa/seminars/adfa303/adfa303.html
http://lpb.canb.auug.org.au/adfa/papers/tr9704.html
there was a small group of Erlang people saying “YES! This is great! This is the next step in scaling Erlang!” and there was a much larger group of people saying “meh.”

I remember there being great excitement at RMIT about the Magnus project. Then silence. The Erlang Compiler EC that was being developed at the time (this was before bit syntax, let alone maps) got to the point of being useful, and then disappeared. I haven’t heard anything from Maurice Castro or Dan Sahlin or Lawrie Brown since moving back to New Zealand.

I just found Lawrie Brown’s e-mail address and sent him a query.
He’s the Brown of Stallings & Brown on Computer Security, so he was pretty clued-up about the security aspects of SSErl.

nzok · November 5, 2023, 8:34am

“This has nothing to do with Elixir. I was referring to using a mapping scheme with module names like json_datetime which by definition would live in the Erlang VM global module namespace. The fact that different programs might want different implementations of such a mapping makes it impossible to use the module namespace in a node for that purpose.”

Why are two different programs running in the same node?

That’s not the problem. The problem is that ONE program will commonly need multiple mappings between its own data types and JSON, so that json_datetime isn’t a problem (to be solved by some limited namespace hack), it’s a mistake (to be solved by better design). What the program needs is a
talking_to_partner_foo
module with functions mapping between one set of data types and JSON in one way and another
talking_to_partner_bar
module with functions mapping between a different but probably overlapping set of data types and JSON in a different way.
If I call
talking_to_partner_foo:request(loan, Loan_Info)
then the caller module SHOULD NOT KNOW that JSON is involved (or that it is not). The caller is responsible for WHAT is said to partner foo. talking_to_partner_foo is responsible for HOW it is said.

If I call
talking_to_partner_bar:request(invoice, Invoice_Info)
then the caller module SHOULD NOT KNOW that JSON is involved (or that it is not). The caller is responsible for WHAT is said to partner bar. talking_to_partner_bar is responsible for HOW it is said.

I repeat, json_datetime is not a problem, it is a MISTAKE.
We do not, no, we should not, go out of our way to twist the language to support design mistakes.
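A minimal sketch of such a module, under stated assumptions: the shape of Loan_Info (a map with amount_cents and due), the wire format, and the send/1 transport stub are all invented for illustration.

```erlang
-module(talking_to_partner_foo).
-export([request/2]).

%% Callers say WHAT to send; this module alone decides HOW.
%% Partner foo (hypothetically) wants dates as "YYYY-MM-DD" strings.
request(loan, #{amount_cents := Cents, due := {Y, M, D}}) ->
    Body = iolist_to_binary(
             io_lib:format("{\"type\":\"loan\",\"amount_cents\":~B,"
                           "\"due\":\"~4..0B-~2..0B-~2..0B\"}",
                           [Cents, Y, M, D])),
    send(Body).

%% Hypothetical transport stub; a real one would talk to the partner
%% and translate the reply back into internal terms.
send(Body) -> {ok, Body}.
```

If partner foo switches to XML or to Unix timestamps tomorrow, only this module changes. The stub returns the body solely so the sketch can be inspected.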

I feel like crying. I left the IT industry to teach programming and software engineering for the next couple of decades, and I’m seeing lessons that the industry had learned more than 30 years ago still not heeded. It’s right there in “Criteria for Decomposing Systems into Modules.” Consider changeability. Practice information hiding. In particular, hide things like external representations in modules.

Let’s look at an actual JSON file I requested recently.
Or at least just enough to make my point.

{
  "disclaimer": "Usage subject to terms: https://openexchangerates.org/terms",
  "license": "https://openexchangerates.org/license",
  "timestamp": ,
  "base": "USD",
  "rates": {
    …
    "USD": 1,
    …
  }
}

We start off with some fixed fields, and then have a table of currency -> ratio maplets.

Loading this into my program,

the licence and disclaimer are strings, discard them
the timestamp is a number, read it as an Integer then convert it to a DateAndTime.
the base currency is a string, convert it to a Currency.
In the table,
the currencies are strings, convert them to Currency objects
the ratios are numbers, which are first read as exact ScaledDecimal numbers (not FloatD) and then converted, by attaching the base currency, to Money objects.

Two different treatments for strings (and in neither case is a String the result), two different treatments for numbers (and in neither case is a Number the result).
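The loading steps above were described for Smalltalk; an Erlang rendering might look roughly like the sketch below. It assumes the JSON has already been decoded into a map with binary keys by some decoder, and the module name oxr_load and the {currency, Code} and {money, Ratio, Base} shapes are hypothetical. Erlang has no built-in exact decimal type, so the ratio stays whatever number the decoder produced.

```erlang
-module(oxr_load).
-export([from_decoded/1]).

%% Gregorian seconds at 1970-01-01T00:00:00, i.e. the Unix epoch.
-define(UNIX_EPOCH, 62167219200).

%% Disclaimer and licence are simply not matched, i.e. discarded.
from_decoded(#{<<"timestamp">> := Ts, <<"base">> := Base, <<"rates">> := Rates})
  when is_integer(Ts), is_binary(Base), is_map(Rates) ->
    BaseCur = {currency, Base},                       % string -> Currency
    #{timestamp => calendar:gregorian_seconds_to_datetime(Ts + ?UNIX_EPOCH),
      base => BaseCur,
      rates => maps:fold(
                 fun(Code, Ratio, Acc) when is_binary(Code), is_number(Ratio) ->
                         %% string -> Currency, number -> Money in base currency
                         Acc#{ {currency, Code} => {money, Ratio, BaseCur} }
                 end, #{}, Rates)}.
```

None of these conversions is a property of "string" or "number"; each belongs to this one file format.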

I could generate this JSON file easily enough.
However, the conversion of a DateAndTime to JSON form is not the usual Javascripty representation of a timestamp. If there were a json_datetime module (analogue) it would be WRONG to use it here. What happens to a Money object isn’t the usual representation of Money in JSON either. If there were a json_money module (analogue) it would be WRONG to use it here.
The question is not “How to represent an instance of the Money abstract data type in JSON” but “How to represent an instance of the Money abstract data type in JSON when generating an exchange rate table in OpenExchange format”.

If you look at the GRASP patterns,

talking_to_partner_foo is the Information Expert
it’s arguably an Indirection in that it mediates between clashing representations of information
it achieves Low Coupling
and High Cohesion (its responsibility is there in its name: talking to partner foo; what is said to that partner and what is done with the partner’s replies are decided elsewhere)
it’s all about Protected Variation (decoupling its callers from changes to the external representation and the partner from changes to the internal representation)

crownedgrouse · November 5, 2023, 9:24am

This is not different from what I had in mind.

-module(foo).
-namespace(talking_to_partner).

request(…) -> … .

This is only a way to virtually prefix your own modules. That’s all. No other goal.

And if local code wants to access another implementation of a module with the same basename, declare it.

-module(foo).
-namespace(talking_to_partner).

-namespaces([{a, talking_to_other}]).

request(xxx)-> … ;
request(yyy)-> a::foo:request(yyy).

josevalim · November 5, 2023, 5:16pm

Yes, protocols do not solve multiple dispatch, it is two dimensional. Yet, we are still stuck on a single-dimensional space. If there are good solutions for efficiently solving the N-dimensional aspect in a dynamic language (perhaps Julia would be a good example), then I am all ears. Meanwhile I will keep thinking about the problem. If we assume we will change the VM, it at least opens up the solution space.

The proposal assumes the ability of defining/wrapping new types somehow.

Even then, let’s assume we do pick a solution with multiple-dispatch. I don’t see how to avoid both performance and ordering issues without making the solution type based.

If you have a SUDOKU string, then you would need to check, for every string, whether it is a SUDOKU string, and fall back otherwise. If others define FOO, BAR, BAZ strings, then we are now traversing 4 different types of strings. Extend this to every datatype and you end up with a very slow dispatching mechanism. Plus, if both FOO and BAR have similar shapes, then the order will matter. This sounds like weak typing and it is easy to make a mess. If you want to have a special representation for a string, asking to wrap it in a new type is not much (and arguably a best practice even if you are not looking for a custom representation).
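A small sketch of the wrapping idea, assuming a hypothetical {sudoku, Cells} tagged tuple; the module name sudoku_type and its functions are made up. The tag makes dispatch a single pattern match instead of a predicate tried on every string.

```erlang
-module(sudoku_type).
-export([new/1, to_text/1]).

%% Wrap once, validating the shape at the boundary.
new(Cells) when is_list(Cells), length(Cells) =:= 81 ->
    {sudoku, Cells}.

%% The tag selects the clause directly; plain strings are not inspected
%% to guess whether they might be Sudoku boards.
to_text({sudoku, Cells}) ->
    lists:flatten(lists:join("\n", rows(Cells)));
to_text(Str) when is_list(Str) ->
    Str.

%% Split the 81 cells into 9 rows of 9.
rows([]) -> [];
rows(Cells) ->
    {Row, Rest} = lists:split(9, Cells),
    [Row | rows(Rest)].
```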

At best, we could ditch the type mechanism for a subset of pattern matching + guards where we can detect overlaps, but even then I fail to see how it would handle generic tuples or strings efficiently.

PS: In any case, I believe we had more than enough feedback on this one, so thanks everyone for the convo. I am glad to withdraw the idea for now.