Ensuring jsone encodes/decodes to correct representations

We’re dealing with a lot of ingress/egress of data encoded as JSON. While JSON is normally quite lovely, its insistence that all numbers are floats is absolutely inane, especially when we deal with numeric values larger than a JSON float can properly represent. So we’re generally trying to store numeric values as strings inside JSON. We’re currently using the jsone library for JSON management, but are happy to change if there’s a better option.

What we want is strong certainty: when we decode JSON strings that we know should be numeric, or JSON numbers that we know should be small-value integers, how do we guarantee that the Erlang value we extract is a proper Erlang integer and doesn’t end up being a binary string or a float?
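For concreteness, the kind of round trip we’re after looks something like this (a sketch, assuming a recent jsone that encodes/decodes maps; the conversion back to an integer is explicit on our side):

1> Big = 102928172663637277182837373672726251415378495950397262.
2> Json = jsone:encode(#{<<"n">> => integer_to_binary(Big)}).
<<"{\"n\":\"102928172663637277182837373672726251415378495950397262\"}">>
3> #{<<"n">> := Bin} = jsone:decode(Json).
#{<<"n">> => <<"102928172663637277182837373672726251415378495950397262">>}
4> binary_to_integer(Bin) =:= Big.
true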


Since I don’t think jsone does it for you automatically, you should be able to rely on something like this:

1> list_to_integer(binary_to_list(<<"102928172663637277182837373672726251415378495950397262">>)).
102928172663637277182837373672726251415378495950397262

(or, more directly, binary_to_integer/1).

The JSON spec does not limit accuracy here; the problem is in the implementations of the various JSON parsers, which tend to use a float (as that may also be the language’s internal boxed format), JavaScript in particular, so it is an interop problem, as detailed in the RFC.

[erlang] 1> jsone:decode(<<"{\"moo\":47328914732981470983214709321740932174093217432174983217409832170}">>).
#{<<"moo">> =>
      47328914732981470983214709321740932174093217432174983217409832170}

[node] > JSON.parse("{\"moo\":47328914732981470983214709321740932174093217432174983217409832170}")
{ moo: 4.732891473298147e+64 }

[erlang] 1> floor(4.732891473298147e+64).
(an integer again, but not the original one; the precision was already lost when the value went through the float)

Of course, for numbers with fractional parts you are battling floating-point representation, as that is the language’s boxed format (including Erlang’s), so you cannot really avoid this short of doing your own fixed-point math.
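For example, a minimal fixed-point sketch (the {Digits, Exponent} pair representation here is made up for illustration, not a library feature): carry 123.45 as {12345, -2} and do all arithmetic on integers:

1> Mul = fun({D1, E1}, {D2, E2}) -> {D1 * D2, E1 + E2} end.
2> Mul({12345, -2}, {1055, -1}).   %% 123.45 * 105.5
{13023975,-3}                      %% i.e. 13023.975, exactly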

Another way is to pick your JSON parser carefully, or ‘cheat’ and use a macro, post-processing the encoded output to ‘fix up’ the JSON so it contains the unadulterated numbers.
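A sketch of such a fixup (the @BIG@ marker is made up for illustration; jsone as elsewhere in the thread):

1> Big = 47328914732981470983214709321740932174093217432174983217409832170.
2> Json0 = jsone:encode(#{<<"moo">> => <<"@BIG@">>}).
<<"{\"moo\":\"@BIG@\"}">>
3> binary:replace(Json0, <<"\"@BIG@\"">>, integer_to_binary(Big)).
<<"{\"moo\":47328914732981470983214709321740932174093217432174983217409832170}">>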

Depending on your options, another serialisation format like ASN.1 might work, as it supports arbitrary precision. It also avoids the pointlessness of protocol buffers, where nearly every implementation is just a C binding to the reference library, at which point you may as well drop binary structs straight onto the wire… but I digress :-/


You could also just be explicit about it:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "UnitValue.schema.json",
  "title": "UnitValue",
  "definitions": {
    "UnitValue": {
      "$id": "#UnitValue",
      "description": "Specifies a decimal value",
      "type": "object",
      "properties": {
        "valueDigits": {
          "description": "Contains the significant digits of the number.",
          "type": "integer",
          "minimum": -9223372036854775808,
          "maximum": 9223372036854775807
        },
        "exponent": {
          "description": "Contains the decimal exponent value to be applied.",
          "type": "integer",
          "minimum": -2147483648,
          "maximum": 2147483647
        }
      }
    }
  }
}
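A conforming instance such as {"valueDigits": 12345, "exponent": -2} then denotes 123.45, and both fields arrive as plain Erlang integers after decoding (a sketch, assuming jsone as elsewhere in the thread):

1> #{<<"valueDigits">> := D, <<"exponent">> := E} = jsone:decode(<<"{\"valueDigits\":12345,\"exponent\":-2}">>).
2> {D, E}.
{12345,-2}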

Alas, this JSON is for interoperability with a Go client. Not sure they have ASN.1; I’ll have to check. I did use ASN.1 a lot back in my telecom dev days in the ’90s. That’s when every byte counted, for sure.


Thanks - this seems to work. I just don’t trust JSON parsers when it comes to numbers.


The problem is that JSON gives too much leeway to the parsers. JSON’s definition of a number is “a bunch of digits, maybe with a dot in it”, and it is up to the parser to do what it wants with that. This means parsers tend to implement whatever makes sense to the author. It also means that if you want a specific meaning, your only recourse is to write your own parser. In a different world, most parsers would give you a way to specify how to map numbers onto your own representation, but that is a paaaaaaain to implement efficiently.

Said otherwise, the JSON spec is, on purpose, lax about it, and if you want a specific parsing of numbers, write your own serialization and deserialization, which is basically what making it a string and then doing post-processing amounts to.
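Sketched out, that post-processing step can be as small as this (decode_with_ints and IntKeys are made-up names for illustration; jsone as elsewhere in the thread):

decode_with_ints(Json, IntKeys) ->
    Map = jsone:decode(Json),
    maps:map(
      fun(K, V) ->
              case lists:member(K, IntKeys) andalso is_binary(V) of
                  true  -> binary_to_integer(V);  %% crashes loudly on non-digit values, which is what you want
                  false -> V
              end
      end,
      Map).

1> decode_with_ints(<<"{\"n\":\"102928172663637277182837373672726251415378495950397262\"}">>, [<<"n">>]).
#{<<"n">> => 102928172663637277182837373672726251415378495950397262}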