There seems to be some miscommunication happening.
“Having typeclasses/interfaces/protocols does not mean they belong to layer 2.”
Not only is that not what I said; it’s a category error. Protocol layer 2 is about abstract information. Typeclasses/interfaces/protocols are an implementation issue.
There are three things:

                   PROTOCOL LAYER TWO (this is conceptual)
                  /                    \
   DATA STRUCTURE            PROTOCOL LAYER ONE (UBF, BSON, JSON, ASN.1, XML, &c)
   IN CODE                             |
                             lower level transport system
When I say that protocol layer 2 is conceptual, I mean that it deals with concepts like “user credentials” or “geographic location”.
There are TWO mappings to/from protocol layer 2, and they have to be able to vary INDEPENDENTLY:
- the mapping between data structures in some programming language and the concepts of protocol layer 2; these might be utterly different in different programming languages, and would almost surely need to change with scale even within a single programming language
- the mapping between concepts in protocol layer 2 and forms in protocol layer 1. Indeed, you are quite likely to find yourself needing more than one version of protocol layer 1; JSON and XML, for example. This further stresses that NO HINT OF PROTOCOL LAYER ONE SHOULD BE VISIBLE in the mapping between your data structures and protocol layer 2.
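To make that independence concrete, here is a minimal Erlang sketch of mapping 1 (the module, record, and the {credentials, ...} term shape are all invented for illustration): the internal record can change with scale, and the layer-2 term it maps to never mentions JSON, XML, or any other layer-1 form.

    %% Hypothetical sketch of mapping 1: internal data structure <-> layer-2
    %% concept.  Nothing here knows that JSON (or XML, or XDR) exists.
    -module(layer2_credentials).
    -export([from_internal/1, to_internal/1]).

    %% Our internal representation; free to change with scale.
    -record(user, {name, token}).

    %% Internal record -> abstract layer-2 term ("user credentials").
    from_internal(#user{name = Name, token = Token}) ->
        {credentials, Name, Token}.

    %% Abstract layer-2 term -> internal record.
    to_internal({credentials, Name, Token}) ->
        #user{name = Name, token = Token}.

A Smalltalk or Common Lisp client would implement the same mapping against its own data structures; only the abstract shape is shared.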
Let me stress this again: we do not generate JSON just for the fun of it;
we generate JSON in order to communicate something else, and letting
JSON be visible in our own data-structure/layer two mapping just makes it
HARDER to adapt to other “wire protocols” (XDR, Protobufs, whatever).
And if you have to talk to two clients with different layer two protocols,
you need in general DIFFERENT data structure<->wire protocol translations.
This is why protocol layer 2 needs to be reified in your program, and IT needs to manage the translations, not your data structures. Your data structures shouldn’t “know” that JSON even EXISTS, let alone be tied to it.
"What dictates what are valid data structures for the JSON Protocol layer 1? We can say lists, maps, strings, integers, and floats. Let’s also say Erlang/OTP includes dict
(dual of maps) and array
, because an array
is a low-level data structure that maps to a JavaScript array.
Now let’s imagine that at my company I created a fast_array
module, that improves Erlang’s array
. Its API is the same as array
and it should be a drop-in replacement. Except that I cannot encode it to JSON without forking Erlang/OTP."
This couldn’t be more wrong. The point of JSON is that it is very simple. Seriously. The last JSON writer I wrote (in a Lispy language) was 42 raw lines + 16 lines to write strings. To generate output in a way that accepts your new type, you have to clone at most a page of code. The corresponding JSON reader is 124 raw lines: 35 deal with JSON strings, 24 deal with numbers, 17 deal with JSON5 comments, and 48 do everything else; looking at them, only 44 are really needed. Making it generate a different kind of sequence or a different kind of dictionary would be a one-line change.
(“Raw” lines means with blank lines and comments removed, but counting end brackets like “end” as full lines.)
If you want to encode/decode JSON differently, you need to clone and modify a JSON writer/reader. AND THIS IS TRIVIAL. Which is the point of JSON. The claim that you would have to fork the whole of Erlang/OTP is absurdly exaggerated.
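For a feel of the scale involved, here is a deliberately tiny Erlang writer in the same spirit (this is NOT the Lispy one described above, and fast_array:to_list/1 is assumed to exist, as any array-like module would provide one). Accepting the new type is the single marked clause; no fork of anything.

    %% A deliberately tiny JSON writer: one clause per type, emitting an
    %% iolist.  A sketch, not production code (no float edge cases, no
    %% validation of the input).
    -module(tiny_json).
    -export([encode/1]).

    encode(true)                 -> "true";
    encode(false)                -> "false";
    encode(null)                 -> "null";
    encode(N) when is_integer(N) -> integer_to_list(N);
    encode(N) when is_float(N)   -> io_lib:format("~w", [N]);
    encode(B) when is_binary(B)  -> [$", escape(B), $"];
    encode(L) when is_list(L)    -> [$[, join([encode(X) || X <- L]), $]];
    encode(M) when is_map(M)     ->
        [${,
         join([[encode(K), $:, encode(V)] || {K, V} <- maps:to_list(M)]),
         $}];
    %% The entire cost of accepting the hypothetical fast_array type:
    encode(A) when is_tuple(A), element(1, A) =:= fast_array ->
        encode(fast_array:to_list(A)).

    join([])       -> [];
    join([X])      -> [X];
    join([X | Xs]) -> [X, $, | join(Xs)].

    escape(B) -> [escape_char(C) || <<C>> <= B].

    escape_char($")               -> "\\\"";
    escape_char($\\)              -> "\\\\";
    escape_char(C) when C < 16#20 -> io_lib:format("\\u~4.16.0b", [C]);
    escape_char(C)                -> C.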
But remember, each layer two protocol will want its own way of decoding JSON. One application will want numbers in certain contexts to represent timestamps and never change strings. Another application will want strings in certain contexts to represent timestamps and never change numbers.
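A sketch of that divergence in Erlang, operating on already-decoded terms (the “when” field and both interpretations are invented for illustration; the calendar functions are standard OTP):

    %% Two layer-2 protocols interpreting the SAME decoded JSON map
    %% differently.  Hypothetical field name; real calendar functions.
    -module(two_readings).
    -export([interpret_a/1, interpret_b/1]).

    %% Application A: numbers in the "when" field are Unix timestamps;
    %% strings are left strictly alone.
    interpret_a(#{<<"when">> := Secs} = Map) when is_integer(Secs) ->
        Map#{<<"when">> := calendar:system_time_to_universal_time(Secs, second)}.

    %% Application B: strings in the "when" field are RFC 3339 timestamps;
    %% numbers are left strictly alone.
    interpret_b(#{<<"when">> := Str} = Map) when is_binary(Str) ->
        Map#{<<"when">> := calendar:rfc3339_to_system_time(binary_to_list(Str))}.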
The whole idea that there is, or should be, or even could be, a “one size fits all” (extensible) mapping between data structures in any programming language and JSON is wrong. Even in JavaScript, you sometimes have to convert something else to JSON-data before generating JSON-text from it.
In any case, it’s up to the protocol layer 2 code to decide what it is going to accept or generate. Lying to it, pretending to offer it what it wants but actually giving it something else, is “laying the foundations for a building made of fail.” I know about the Liskov Substitution Principle. I’ve never yet met one data type that’s an exact drop-in replacement for another, and I’ve struggled very hard to produce some. To get even close, you have to use very narrow interfaces, which means not being able to exploit composite operations that are especially efficient.
Here is an example I had recently. Suppose you have a Point.
{“x”:X,“y”:Y} is the obvious representation. Suppose, as many Smalltalk systems do, you have a PointArray. The default output for such a thing will be [{“x”:X1,“y”:Y1},…,{“x”:Xn,“y”:Yn}]. But suppose you have a lot of these to exchange. You can save a factor of 2 by using [[“x”,“y”],[X1,Y1],…,[Xn,Yn]]. But you DON’T want to tie this to PointArray, because some receivers are not expecting this optimisation. A revised protocol layer 2 where the sender and receiver agree on this optimisation: that’s where the code goes. It doesn’t matter whether the datum is an ordinary Array of Points, a specialised PointArray, or even a DoublyLinkedList of Points; at this point in the higher-level protocol the sender supplies and the receiver expects some sort of sequence of Points. It’s the (communication) protocol that decides, NOT the data type.
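In Erlang terms (a sketch; the {X, Y} point tuples are an assumption, and the output terms are meant to be fed to whatever JSON writer you use), the two layouts are just two functions in the layer-2 code, indifferent to how the points were stored:

    %% The protocol, not the data type, chooses the layout.  Both functions
    %% accept any list of {X, Y} points, however the sender stored them.
    -module(point_wire).
    -export([verbose/1, compact/1]).

    %% Default layout: [{"x":X1,"y":Y1}, ...]
    verbose(Points) ->
        [#{<<"x">> => X, <<"y">> => Y} || {X, Y} <- Points].

    %% Compact layout agreed in the revised layer 2: [["x","y"],[X1,Y1],...]
    compact(Points) ->
        [[<<"x">>, <<"y">>] | [[X, Y] || {X, Y} <- Points]].

A PointArray or a doubly linked list would simply be converted to (or iterated as) that list of pairs before the call; the layout decision never leaks into those types.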
What we really want for JSON is some sort of declarative notation that says
- THIS concept
- is represented THIS way in JSON
- and implemented THIS way in Erlang
- and THIS way in Common Lisp
- and THIS way in Smalltalk
- and so on.
I’m afraid I have a bad habit of whipping up little tabular DSLs with AWK scripts to generate the code, instead of implementing something I could report at a conference.
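As a made-up illustration of what one row of such a table might look like (every cell here is invented):

    concept    | json form           | erlang form          | smalltalk form
    lat_long   | {"lat":L,"long":G}  | {lat_long, L, G}     | L @ G
    time_stamp | "2024-01-02T03:04Z" | {{Y,Mo,D},{H,Mi,S}}  | DateAndTime

A small script then grinds each row into the reader/writer clauses for each language, so the table, not any one language’s type system, is the single source of truth.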