Buult-in JSON decoding keys as existing atoms

eproxus · July 26, 2024, 9:08am

A common feature that almost all existing JSON libraries support is converting the resulting keys to atoms. While the new json module technically supports this, the interface is quite cumbersome:

Push = fun(Key, Value, Acc) -> [{binary_to_existing_atom(Key), Value} | Acc] end,
json:decode(<<"{\"foo\": 1}">>, ok, #{object_push => Push}).

It would be nice with a simpler API for the most common use cases, such as:

json:decode(<<"{\"foo\": 1}">>, #{keys => existing_atom}).

Perhaps this has already been thought of. Are there any existing plans to add such an interface?

starbelly · July 27, 2024, 7:18pm

I’ve been busy I didn’t get a chance to look at the api, I’m surprised what you’re suggesting wasn’t done already and that the api of Jason or other libraries wasn’t essentially mirrored considering it’s success.

Sounds like an easy win to me.

michalmuskala · August 27, 2024, 4:21pm

This was a sort-of intentional omission. The new API offered by the json module is strictly more powerful, even if more verbose.
The complexity of supporting both APIs come when you pass some option from one and some options from the other. For example, it’s not clear what a call like this would mean:

json:decode(<<"{\"foo\": 1}">>, #{keys => existing_atom, object_push => ...}).

I would see two possible, and in my mind undesirable, outcomes:

Combining options from the two APIs is not supported - this means that as soon as you needed to use the “advanced” API, you’d need large code changes
We define some rules for how the options are combined - this means that the result could be not intuitive

For this reason, I decided to conservatively err on the side of smaller, simpler API, even if there’s some extra verbosity.

nzok · August 28, 2024, 6:18am

keys => existing_atom is already the wrong approach, by the way.
Why do I say that?
Why do we want a JSON object to arrive with atoms as keys?
Because we are expecting to MATCH those atoms.
There is a protocol, an agreement between sender and receiver,
according to which certain keys are expected to be present,
and atoms are being used in their Erlang-traditional role of
representing protocol tokens.
What we want then is something like
expected_keys => [plangent,ochre,triskelion,…],
other_keys => binary
where the JSON keys to be converted to atoms are NOT any old JSON
strings that happen to have a corresponding atom in Erlang, but
the keys we expect as part of the protocol.

This also addresses another issue. Consider four policies:
(A) Always convert JSON keys to atoms. This would be a good policy if Erlang either garbage collected the atom table or split the atoms as recommended by EEP 20.
(H) Convert JSON keys to atoms HAPHAZARDLY, in a way that depends on the History of the current node.
(P) Convert JSON keys that are part of the expected PROTOCOL to atoms
(N) Never convert JSON keys to atoms.
Out of these four policies, three of them guarantee that the same JSON object decoded by processes in different nodes will decode as the SAME Erlang term, and the fourth does not can cannot make any such guarantee. Three of those four policies guarantee that if you decode a JSON object and stuff it in ETS or DETS or anything that preserves Erlang structure, and at a later time convert the same JSON object with the same options and go looking for the result in ETS or DETS or whatever, you will find it. The fourth does not and cannot make any such guarantee.

Of course it’s existing_atom that’s the dangerous one.

With the advent of maps, option (P) became practical. It’s arguably better from a software engineering point of view.

This is true of list_to_existing_atom/1 generally. It’s a sticky bandage (to avoid trademarks). A stopgap. Is there any use of list_to_existing_atom/1 that would not be better expressed as list_to_expected_atom(List, Atoms), (Or for efficiency via a two-step process, first converting a list of atoms to an unspecified data structure – actually a map – and then looking a list of characters up in that unspecified data structure.) Perhaps this deserves an EEP. I do know that when list_to_existing_atom/1 was introduced I saw it as a temporary expedient that would go away soon. It’s not a good thing to build on.

jhogberg · August 28, 2024, 8:40am

Yes, I very much like the idea of list_to_expected_atom(List, Atoms) and its binary counterpart. In addition to the reasons you’ve listed, it also avoids a really silly corner case where one or more of the desired atoms have been optimized out, resulting in a surprising crash that is not obvious how to debug – the atom is there in the source code!

dch · August 28, 2024, 9:21am

Yea this sounds pretty good, where the exact schema is already known at compile time.

Generally this seems like the first step towards JSON schema validation ala GitHub - vstakhov/libucl: Universal configuration library parser & similar libraries using JSON schema. https://json-schema.org/ where the shape and structure, possibly nested, needs to be validated too.

I have written a NIF for libUCL parsing already, and it would be easy to extend this to validate against a provided JSON schema.

Dave

michalmuskala · August 28, 2024, 10:13am

The good thing about the API in the json module is that it allows building any of those options on top of it

LostKobrakai · August 28, 2024, 10:17am

This reminds me very much of:

To me wanting to get atoms back is the first tinies step in moving from
json text ↔ json object model towards json text ↔ your data model. I can certainly see the appeal, but keeping the step separate makes things more maintainable.

I was just about to suggest that. There could be APIs, where e.g. given a list of expected atoms it would return the necessary set of options for the json:decode call.