Thoas - Elixir's Jason JSON library converted to Erlang

Hello everyone! I’ve converted @michalmuskala’s wonderful Elixir Jason library to Erlang.

It has a smaller feature set than Jason; see the README for details.

Why convert Jason to Erlang?

  1. I wanted to use Jason from within Gleam and Erlang projects without pulling in the Elixir compiler and standard library.
  2. I wanted a fast JSON library to which I could add an API that doesn’t work by traversing maps and lists built in Erlang, as I think that would suit Gleam better.

How does it perform?

In my benchmarks Thoas uses the same amount of memory as Jason and is within a few percent of its speed, sometimes faster and sometimes slower, which I expect is down to my development machine being a bit noisy.

Here it is compared to jsone and jsx.

##### With input Pokedex #####
Name            ips        average  deviation         median         99th %
thoas        762.02        1.31 ms    ±21.99%        1.28 ms        1.61 ms
jsone        671.54        1.49 ms    ±17.81%        1.43 ms        1.93 ms
JSX          398.03        2.51 ms     ±9.54%        2.44 ms        2.91 ms

Comparison: 
thoas        762.02
jsone        671.54 - 1.13x slower +0.177 ms
JSX          398.03 - 1.91x slower +1.20 ms

Memory usage statistics:

Name     Memory usage
thoas         0.38 MB
jsone         1.23 MB - 3.23x memory usage +0.85 MB
JSX           2.59 MB - 6.80x memory usage +2.21 MB
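The Comparison lines can be reproduced from the ips column. Assuming Benchee's usual definitions (average run time is 1/ips, the "x slower" factor is the ratio of averages, which equals the inverse ratio of ips, and the "+" figure is the difference in average run time), the jsone line works out as:

```latex
\frac{762.02}{671.54} \approx 1.13
\qquad
\frac{1000}{671.54}\,\mathrm{ms} - \frac{1000}{762.02}\,\mathrm{ms}
  \approx 1.489\,\mathrm{ms} - 1.312\,\mathrm{ms} \approx 0.177\,\mathrm{ms}
```

The JSX line follows the same way: 762.02 / 398.03 ≈ 1.91 and 2.512 ms − 1.312 ms ≈ 1.20 ms.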

Thanks

All credit for this library goes to Michał and the Jason contributors; they did all the work.

20 Likes

I saw your thread on Twitter and was terribly excited about it! I can’t wait to try it out!

4 Likes

Let me know if you find any bugs :stuck_out_tongue:

3 Likes

It looks very impressive! :mechanical_arm:
Have you compared it with other libraries (jiffy, jsone, poison, jazz, jsx) for speed on maps, lists, strings, string escaping, large values, and pretty printing?

2 Likes

It’s roughly the same as Jason, so it’s generally slower than Jiffy but faster than the others.
There are benchmarks in the repo that you can run, but they take a couple of hours.

edit: I ran a portion of the benchmarks and added them above.

4 Likes

Nice. It really sounds cool! Thanks.

2 Likes

I found a couple of small issues and opened a PR for them.

3 Likes

Thank you @LeonardB !

2 Likes

On my MacBook (M1), for encoding I consistently see Jason on top, and sometimes even Poison. Poison in particular surprises me. I wonder what’s up :thinking:

Did you do the translation completely by hand? If so, it might be worth running mix decompile to spot differences; I would expect the results to be at worst the same as Jason’s.

2 Likes

On my M1 MacBook the results vary from run to run, with Thoas and Jason always within a few percent of each other. Poison sometimes seems to beat Jason on OTP 24; perhaps the JIT changed some things.

I did this:

  1. Fork Jason
  2. Remove all usage of the Elixir stdlib
  3. Remove Elixir specific features
  4. Compile to .beam
  5. Decompile to Erlang
  6. Fix compile errors
  7. Neaten up formatting
  8. Remove 10k lines of repeated generated code

I expected step 8 to have some performance impact, but when I benchmarked before and after I saw no change.

4 Likes

There were surely some calls into the Elixir stdlib that had optimizations.

I bet we can make it go faster!

6 Likes

How was the benchmark done? I have always wondered how the json module from jhn_stdlib (json.erl in JanHenryNystrom/jhn_stdlib on GitHub) compares to other libraries.

We use it in Nova today.

2 Likes

The benchmarks are run with Benchee, if that’s what you mean. You could add jhn_stdlib to the mix to compare:

  1. Add jhn_stdlib to the deps in mix.exs
  2. Modify bench/encode.exs to include jhn_stdlib json module
  3. Modify bench/decode.exs to include jhn_stdlib json module
  4. Run mix bench.encode, then mix bench.decode

You may need to remove the other json dependency from the Mix project; otherwise you might see a module name conflict.
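For steps 1 to 3 above, a minimal sketch might look like the fragment below. It assumes jhn_stdlib can be pulled straight from GitHub as a Mix dependency and that the bench script passes a map of jobs to Benchee.run/2; the job names and the `inputs` variable are hypothetical, and the real bench/encode.exs may be laid out differently. The warmup, time, and memory_time values (in seconds) mirror the configuration printed in a run further down the thread, and the `[maps, binary]` options are the ones suggested later for jhn_stdlib's json:encode/2.

```elixir
# mix.exs (excerpt): add jhn_stdlib next to the existing deps.
# Assumption: fetched straight from GitHub rather than Hex.
defp deps do
  [
    # ...existing deps stay as they are...
    {:jhn_stdlib, github: "JanHenryNystrom/jhn_stdlib"}
  ]
end

# bench/encode.exs (hypothetical excerpt): one Benchee job per library.
# `inputs` is assumed to already map input names to decoded terms,
# e.g. %{"Pokedex" => pokedex_term}.
jobs = %{
  "thoas" => fn input -> :thoas.encode(input) end,
  # Options from this thread: accept maps and return a binary.
  "jhn_stdlib-json" => fn input -> :json.encode(input, [:maps, :binary]) end
}

Benchee.run(jobs,
  inputs: inputs,
  warmup: 5,
  time: 30,
  memory_time: 1
)

# bench/decode.exs would follow the same pattern with raw JSON inputs,
# using :thoas.decode/1 and :json.decode(input, [:maps]).
```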

4 Likes

I’m running it now; give me a few minutes. :slight_smile:

edit: Here we go.

I ran this one on battery power, so it’s a bit slower than the results above.

##### With input Pokedex #####
Name            ips        average  deviation         median         99th %
thoas        658.88        1.52 ms    ±31.23%        1.44 ms        3.03 ms
json         388.80        2.57 ms    ±19.98%        2.46 ms        3.83 ms

Comparison: 
thoas        658.88
json         388.80 - 1.69x slower +1.05 ms

Memory usage statistics:

Name     Memory usage
thoas         0.79 MB
json          1.74 MB - 2.20x memory usage +0.95 MB

I wasn’t able to get the decode benchmarks to run because json:decode/1 crashed. I’m likely doing something wrong.

4 Likes

It looks like you need to pass the option {maps, true} for map support (i.e., json:encode(Value, [{maps, true}])).

I ran the encode benchmarks:

Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.13.1
Erlang 24.2

Benchmark suite executing with the following configuration:
warmup: 5 s
time: 30 s
memory time: 1 s
parallel: 1
inputs: Pokedex
Estimated total run time: 1.20 min

Benchmarking jhn_stdlib-json with input Pokedex...
Benchmarking thoas with input Pokedex...
Generated /Users/starbelly/devel/erlang/thoas/bench/output/encode.html
Generated /Users/starbelly/devel/erlang/thoas/bench/output/encode_pokedex_comparison.html
Generated /Users/starbelly/devel/erlang/thoas/bench/output/encode_pokedex_jhn_stdlib_json.html
Generated /Users/starbelly/devel/erlang/thoas/bench/output/encode_pokedex_thoas.html
Opened report using open

##### With input Pokedex #####
Name                      ips        average  deviation         median         99th %
thoas                  662.42        1.51 ms    ±25.45%        1.44 ms        2.95 ms
jhn_stdlib-json        408.06        2.45 ms     ±9.51%        2.46 ms        2.98 ms

Comparison:
thoas                  662.42
jhn_stdlib-json        408.06 - 1.62x slower +0.94 ms

Memory usage statistics:

Name               Memory usage
thoas                   0.79 MB
jhn_stdlib-json         1.74 MB - 2.20x memory usage +0.95 MB

Edit: I might have derped; you said it was the decode benchmarks you couldn’t run, not the encode ones :stuck_out_tongue:

Either way, here are the decode benchmarks, where there’s a much bigger difference:

##### With input Pokedex #####
Name                      ips        average  deviation         median         99th %
thoas                  775.07        1.29 ms     ±5.11%        1.27 ms        1.50 ms
jhn_stdlib-json        185.93        5.38 ms    ±17.89%        5.13 ms        8.04 ms

Comparison:
thoas                  775.07
jhn_stdlib-json        185.93 - 4.17x slower +4.09 ms

Memory usage statistics:

Name               Memory usage
thoas                   0.38 MB
jhn_stdlib-json         8.03 MB - 21.08x memory usage +7.65 MB
2 Likes

To encode, you can run json:encode(Map, [maps, binary]); to decode, run json:decode(Json, [maps]) to get the result as maps.

But I will try to run it myself if I figure out the Elixir stuff. :slight_smile:

encode:

##### With input Pokedex #####
Name                     ips        average  deviation         median         99th %
thoas                 441.98        2.26 ms    ±73.13%        1.05 ms        6.23 ms
jhn_stdlib - json     268.83        3.72 ms    ±42.94%        4.43 ms        6.97 ms

Comparison:
thoas                 441.98
jhn_stdlib - json     268.83 - 1.64x slower +1.46 ms

Memory usage statistics:

Name                 Memory usage
thoas                     0.79 MB
jhn_stdlib - json         1.74 MB - 2.20x memory usage +0.95 MB

decode:

##### With input Pokedex #####
Name                     ips        average  deviation         median         99th %
thoas                 1.21 K        0.82 ms    ±14.09%        0.81 ms        1.18 ms
jhn_stdlib - json   0.0996 K       10.04 ms     ±4.76%        9.95 ms       11.88 ms

Comparison:
thoas                 1.21 K
jhn_stdlib - json   0.0996 K - 12.17x slower +9.21 ms

Memory usage statistics:

Name                 Memory usage
thoas                     0.38 MB
jhn_stdlib - json         8.03 MB - 21.08x memory usage +7.65 MB

I will look into maybe switching from jhn_stdlib’s json to Thoas. :slight_smile:

4 Likes

Ah, binary was the option I was missing. Thanks!

2 Likes

We will switch to Thoas today if I have time to make a PR to Nova for it. The memory usage is really awesome.

4 Likes

Worth pointing out that Pokedex, to a certain extent, and Canada, far more visibly, suffer massive memory usage bloat due to floats. So far, the “to string” conversion used for floats returns a charlist rather than a binary. This cannot be fixed easily until OTP 25.
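To give a sense of scale, here is a rough per-float estimate on a 64-bit VM, assuming the sizes in the Erlang Efficiency Guide's memory table (a string, i.e. a list of integers, costs 1 word plus 2 words per character; a binary costs 3 to 6 words plus its data). For a float that formats to n characters, with a hypothetical n = 18 (roughly the length of a full-precision coordinate):

```latex
\text{charlist: } (1 + 2n)\ \text{words} = 8(1 + 2n)\ \text{bytes}
  \;\xrightarrow{\,n = 18\,}\; 37\ \text{words} = 296\ \text{bytes}
\qquad
\text{binary: } (3\text{ to }6)\ \text{words} + n\ \text{bytes}
  \;\xrightarrow{\,n = 18\,}\; 42\text{ to }66\ \text{bytes}
```

OTP 25 adds a `short` option to float_to_binary/2, which produces the shortest round-trip digits directly as a binary.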

3 Likes

@lpil Great work! Is there a streaming API for encoding/decoding large/asynchronous payloads?

3 Likes