Euneus - A JSON parser and generator in pure Erlang

Euneus

What is Euneus?

A JSON parser and generator in pure Erlang. Euneus is a rewrite of Thoas. Like Thoas, both the parser and generator fully conform to RFC 8259 and ECMA 404.

Why Euneus over Thoas?

Thoas is excellent and performs well, but Euneus is more flexible and permits more customization. The motivation for Euneus came from this PR, when I was looking for a way to customize terms without first traversing lists and maps.

How does it perform?

Note

  • Jason is the Elixir library that Thoas is based on.
  • See more benchmarks here.

Encode

##### With input Blockchain #####
Name             ips        average  deviation         median         99th %
euneus       10.56 K       94.69 μs    ±15.18%       95.53 μs      139.58 μs
Jason        10.35 K       96.58 μs    ±16.88%       85.48 μs      155.88 μs
thoas         7.42 K      134.68 μs     ±9.91%      135.49 μs      175.04 μs

Comparison: 
euneus       10.56 K
Jason        10.35 K - 1.02x slower +1.89 μs
thoas         7.42 K - 1.42x slower +39.99 μs

Memory usage statistics:

Name      Memory usage
euneus        83.10 KB
Jason         78.91 KB - 0.95x memory usage -4.19531 KB
thoas         89.41 KB - 1.08x memory usage +6.31 KB

Decode

##### With input Blockchain #####
Name             ips        average  deviation         median         99th %
euneus        6.98 K      143.35 μs     ±5.22%      141.99 μs      178.33 μs
Jason         6.93 K      144.26 μs    ±14.21%      140.26 μs      274.38 μs
thoas         5.65 K      177.01 μs     ±8.87%      175.34 μs      252.45 μs

Comparison: 
euneus        6.98 K
Jason         6.93 K - 1.01x slower +0.91 μs
thoas         5.65 K - 1.23x slower +33.66 μs

Memory usage statistics:

Name      Memory usage
euneus        51.41 KB
Jason         51.63 KB - 1.00x memory usage +0.21 KB
thoas         51.41 KB - 1.00x memory usage +0 KB

Installation

Euneus is available at Hex.

Erlang

% rebar.config
{deps, [{euneus, "0.7.0"}]}

Elixir

def deps do
  [{:euneus, "~> 0.7"}]
end

Customizations

Please see the README file in the repository for more info.

Encode

#{
    %% nulls defines what terms will be replaced with the null literal (default: ['undefined']).
    nulls => nonempty_list(),
    %% binary_encoder allows overriding the binary() encoding.
    binary_encoder => function((binary(), euneus_encoder:options()) -> iolist()),
    %% atom_encoder allows overriding the atom() encoding.
    atom_encoder => function((atom(), euneus_encoder:options()) -> iolist()),
    %% integer_encoder allows overriding the integer() encoding.
    integer_encoder => function((integer(), euneus_encoder:options()) -> iolist()),
    %% float_encoder allows overriding the float() encoding.
    float_encoder => function((float(), euneus_encoder:options()) -> iolist()),
    %% list_encoder allows overriding the list() encoding.
    list_encoder => function((list(), euneus_encoder:options()) -> iolist()),
    %% map_encoder allows overriding the map() encoding.
    map_encoder => function((map(), euneus_encoder:options()) -> iolist()),
    %% datetime_encoder allows overriding the calendar:datetime() encoding.
    datetime_encoder => function((calendar:datetime(), euneus_encoder:options()) -> iolist()),
    %% timestamp_encoder allows overriding the erlang:timestamp() encoding.
    timestamp_encoder => function((erlang:timestamp(), euneus_encoder:options()) -> iolist()),
    %% unhandled_encoder allows encoding any custom term (default: raise unsupported_type error).
    unhandled_encoder => function((term(), euneus_encoder:options()) -> iolist()),
    %% escaper allows overriding the binary escaping (default: json).
    escaper => json
             | html
             | javascript
             | unicode
             | function((binary(), euneus_encoder:options()) -> iolist()),
    error_handler => function(( error | exit | throw
                              , term()
                              , erlang:stacktrace() ) -> euneus_encoder:result())
}

Decode

#{
    %% null_term is the null literal override (default: 'undefined').
    null_term => term(),
    %% arrays allows overriding any array/list().
    arrays => function((list(), euneus_decoder:options()) -> term()),
    %% objects allows overriding any object/map().
    objects => function((map(), euneus_decoder:options()) -> term()),
    %% keys allows overriding the keys of JSON objects.
    keys => copy
          | to_atom
          | to_existing_atom
          | to_integer
          | function((binary(), euneus_decoder:options()) -> term()),
    %% values allows overriding any other term, such as an array item or an object value.
    values => copy
            | to_atom
            | to_existing_atom
            | to_integer
            | function((binary(), euneus_decoder:options()) -> term())
}

Decode resuming

Euneus permits resuming the decoding when an invalid token is found. Any value can replace the invalid token by overriding the error_handler option. Please see an example here.

Credits

Euneus is a rewrite of Thoas, so all credit goes to Michał Muskała, Louis Pilfold, and the contributors to both Jason and Thoas. Thanks for the hard work!

New release

v1.0.0 is released with a plugin mechanism implemented.

Changes

A mechanism to easily plug in encoders and decoders is now implemented. You can use the built-in plugins to handle common types or create your own in a module by implementing the euneus_plugin behavior.

Plugins Usage

1> {ok, JSON} = euneus:encode_to_binary({127,0,0,1}, #{plugins => [inet]}).
{ok,<<"\"127.0.0.1\"">>}

2> euneus:decode(JSON, #{plugins => [inet]}).
{ok,{127,0,0,1}}

Built-in Plugins

Currently, these are the built-in plugins:

Data Mapping

| **Erlang ->**                                                	| **Encode Options ->**     	| **JSON ->**                          	| **Decode Options ->**     	| **Erlang**                                                   	|
|--------------------------------------------------------------	|---------------------------	|--------------------------------------	|---------------------------	|--------------------------------------------------------------	|
| {{1970,1,1},{0,0,0}}                                         	| #{plugins => [datetime]}  	| "1970-01-01T00:00:00Z"               	| #{plugins => [datetime]}  	| {{1970,1,1},{0,0,0}}                                         	|
| {127,0,0,1}                                                  	| #{plugins => [inet]}      	| "127.0.0.1"                          	| #{plugins => [inet]}      	| {127,0,0,1}                                                  	|
| {16#3ffe,16#b80,16#1f8d,16#2,16#204,16#acff,16#fe17,16#bf38} 	| #{plugins => [inet]}      	| "3ffe:b80:1f8d:2:204:acff:fe17:bf38" 	| #{plugins => [inet]}      	| {16#3ffe,16#b80,16#1f8d,16#2,16#204,16#acff,16#fe17,16#bf38} 	|
| <0.92.0>                                                     	| #{plugins => [pid]}       	| "<0.92.0>"                           	| #{plugins => [pid]}       	| <0.92.0>                                                     	|
| #Port<0.1>                                                   	| #{plugins => [port]}      	| "#Port<0.1>"                         	| #{plugins => [port]}      	| #Port<0.1>                                                   	|
| [{foo, bar}]                                                 	| #{plugins => [proplist]}  	| {\"foo\":\"bar\"}                    	| #{}                          	| #{<<"foo">> => <<"bar">>}                                    	|
| #Ref<0.957048870.857473026.108035>                           	| #{plugins => [reference]} 	| "#Ref<0.957048870.857473026.108035>" 	| #{plugins => [reference]} 	| #Ref<0.957048870.857473026.108035>                           	|
| {0,0,0}                                                      	| #{plugins => [timestamp]} 	| "1970-01-01T00:00:00.000Z"           	| #{plugins => [timestamp]} 	| {0,0,0}                                                      	|

Custom Plugins

Custom plugins must be implemented using the euneus_plugin behavior, for example:

-module(euneus_test_plugin).
-behaviour(euneus_plugin).
-export([ encode/2, decode/2 ]).

encode({test, foo}, Opts) ->
    {halt, euneus_encoder:encode_binary(<<"test::foo">>, Opts)};
encode(_Term, _Opts) ->
    next.

decode(<<"test::foo">>, _Opts) ->
    {halt, {test, foo}};
decode(_Bin, _Opts) ->
    next.

1> {ok, JSON} = euneus:encode_to_binary({test, foo}, #{plugins => [euneus_test_plugin]}).
{ok,<<"\"test::foo\"">>}

2> euneus:decode(JSON, #{plugins => [euneus_test_plugin]}).
{ok,{test, foo}}
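
To make the halt/next contract concrete, here is a minimal sketch of how a chain of plugins could be walked. It is not taken from the Euneus source: the module name, `run/3`, and the use of one-argument funs (instead of modules with `encode/2`) are all assumptions made to keep the example self-contained.

```erlang
-module(plugin_chain_sketch).
-export([run/3, demo/0]).

%% Hypothetical dispatcher, not Euneus internals: try each plugin in
%% order and stop at the first {halt, Result}; if every plugin answers
%% 'next', fall back to Default.
run([], Term, Default) ->
    Default(Term);
run([Plugin | Rest], Term, Default) ->
    case Plugin(Term) of
        {halt, Result} -> Result;
        next -> run(Rest, Term, Default)
    end.

demo() ->
    %% Stand-in for euneus_test_plugin:encode/2, with the options dropped.
    Tagged = fun({test, foo}) -> {halt, <<"\"test::foo\"">>};
                (_Other) -> next
             end,
    Default = fun(Term) -> {unhandled, Term} end,
    {run([Tagged], {test, foo}, Default), run([Tagged], 42, Default)}.
```

In this sketch `demo()` returns the plugin's output for `{test, foo}` and the fallback's output for `42`, which is the behavior the `next` clause in a plugin is meant to allow.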

Deprecation

The datetime_encoder and the timestamp_encoder options are now deprecated in favor of the datetime and timestamp plugins.

Installation

Erlang

% rebar.config
{deps, [{euneus, "1.0.1"}]}

Elixir

def deps do
  [{:euneus, "~> 1.0"}]
end

Note

If you have a built-in plugin suggestion, feel free to open a new issue to discuss it.

New release

Version 1.0.2 was released with a great improvement in the plugin mechanism and a bunch of benchmark runs and comparisons.

Please see the benchmarks in the README file for more info.
All run summaries are in there, comparing:

All the benchmarks details are available here.

Edit

I fixed the decode benchmarks and improved the readability of the benchmarks.
The decode benchmark with parsed options was erroneously copied from the decode with plugins benchmarks.

New release

Version 1.1.0 is out with format functions.

Changes

A module called euneus_formatter is implemented with three major functions:

  • minify
  • prettify
  • format

Usage

Minify

1> iolist_to_binary(euneus:minify(<<"{\n  \"foo\": \"bar\",\n  \"baz\": {\n    \"foo\": \"bar\",\n    \"0\": [\n      \"foo\",\n      0\n    ]\n  }\n}">>)).
<<"{\"foo\":\"bar\",\"baz\":{\"foo\":\"bar\",\"0\":[\"foo\",0]}}">>

Prettify

1> io:format("~s~n", [euneus:prettify(<<"{\"foo\":\"bar\",\"baz\":{\"foo\":\"bar\",\"0\":[\"foo\",0]}}">>)]).
{
  "foo": "bar",
  "baz": {
    "foo": "bar",
    "0": [
      "foo",
      0
    ]
  }
}

Custom

1> Opts = #{spaces => <<$\t>>, indent => <<$\t, $\t>>, crlf => <<$\n>>}.

2> io:format("~s~n", [euneus:format(<<"{\"foo\":\"bar\",\"baz\":{\"foo\":\"bar\",\"0\":[\"foo\",0]}}">>, Opts)]).
{
                "foo":  "bar",
                "baz":  {
                                "foo":  "bar",
                                "0":    [
                                                "foo",
                                                0
                                ]
                }
}

This is awesome, thank you!

Just wondering, given the plugin/extensibility mechanisms, how does one go about ignoring fields when encoding a map into a JSON object? For example:

#{a => 1, b => undefined}

would be encoded as:

{"a": 1}

Would this require overriding the map_encoder altogether? Either way, an example would be greatly appreciated.

Thanks

I don’t think the plugin or encoder system currently supports a skip/drop return, but I think that would be nice/useful to have.

Currently the spec is

-callback encode(Input, Opts) -> Output when
    Input :: term(),
    Opts :: euneus_encoder:options(),
    Output :: {halt, iolist()} | next.

-callback decode(Input, Opts) -> Output when
    Input :: term(),
    Opts :: euneus_decoder:options(),
    Output :: {halt, term()} | next.

Adding support for a skip or drop return atom, to allow discarding the key/value pair, would make things more flexible.

A simple use case would be dropping properties with null values.

The default behavior is to encode undefined as null though

Thanks!

@paulo-f-oliveira also asked me about this.

TLDR, yes, it’s possible, e.g.:

-module(euneus_ignore_undefined_keys_plugin).
-behaviour(euneus_plugin).
-export([ encode/2, decode/2 ]).

encode(Map0, Opts) when is_map(Map0) ->
    Map = maps:filter(fun(_, V) -> V =/= undefined end, Map0),
    {halt, euneus_encoder:encode_map(Map, Opts)};
encode(_Term, _Opts) ->
    next.

decode(_Bin, _Opts) ->
    next.

1> euneus:encode_to_binary(#{a => 1, b => undefined}, #{plugins => [euneus_ignore_undefined_keys_plugin]}).
{ok,<<"{\"a\":1}">>}

Maybe worth having a built-in plugin for this? It will be a bit more performant.

Edit

Maybe using the ‘nulls’ option can be better:

encode(Map0, Opts) when is_map(Map0) ->
    % By default, the 'nulls' option is [undefined].
    Nulls = euneus_encoder:get_nulls_option(Opts),
    Map = maps:filter(fun(_, V) -> not lists:member(V, Nulls) end, Map0),
    {halt, euneus_encoder:encode_map(Map, Opts)};
encode(_Term, _Opts) ->
    next.

Having it as a default would be great. Would it maybe make more sense to just be an option flag though?

drop_nulls or something like that?

That would perform better, since it could be a clean function head match.

Actually, one important aspect for me is to make a clear distinction between null and undefined in the same way that JavaScript does.

> JSON.stringify({a: 1, b: undefined})
'{"a":1}'
> JSON.stringify({a: 1, b: null})
'{"a":1,"b":null}'

I’m thinking of changing the encode and decode functions of the euneus_plugin behavior to receive plugin options as an extra argument:

-module(euneus_plugin).

-callback encode(Input, PluginOpts, EncodeOpts) -> Output when
    Input :: term(),
    PluginOpts :: term(),
    EncodeOpts :: euneus_encoder:options(),
    Output :: {halt, iolist()} | next.

-callback decode(Input, PluginOpts, DecodeOpts) -> Output when
    Input :: term(),
    PluginOpts :: undefined | term(),
    DecodeOpts :: euneus_decoder:options(),
    Output :: {halt, term()} | next.

So a plugin can be given either as a name or as a 2-tuple whose first element is the name and whose second is the plugin options: #{plugins => [plugin_name]} or #{plugins => [{plugin_name, plugin_options}]}. If only the name is passed, the default plugin options could be ‘undefined’, or an empty list or map.
In this way, my example to ignore keys can be:

encode(Map0, Terms, Opts) when is_map(Map0) ->
    Map = maps:filter(fun(_, V) -> not lists:member(V, normalize_terms(Terms)) end, Map0),
    {halt, euneus_encoder:encode_map(Map, Opts)};
encode(_Term, _PluginOpts, _Opts) ->
    next.

normalize_terms(Terms) when is_list(Terms) -> 
    Terms;
normalize_terms(Term) -> 
    [Term].

1> euneus:encode_to_binary(#{a => 1, b => undefined}, #{plugins => [{euneus_ignore_undefined_keys_plugin, [undefined]}]}).
{ok,<<"{\"a\":1}">>}

I have not tested or implemented it; it is just an idea.
What do you think?
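
For illustration only, the two accepted shapes could be normalized up front so the rest of the code always sees a `{Name, Opts}` pair. This is a sketch of the idea, not an implementation in Euneus: the module and function names are hypothetical, and the choice of `undefined` as the default is just one of the options mentioned above.

```erlang
-module(plugin_spec_sketch).
-export([normalize/1]).

%% Turn a plugins list that mixes bare names and {Name, Opts} tuples
%% into a uniform list of {Name, Opts} pairs.
normalize(Plugins) when is_list(Plugins) ->
    [normalize_one(Plugin) || Plugin <- Plugins].

%% A tuple already carries its options.
normalize_one({Name, Opts}) when is_atom(Name) ->
    {Name, Opts};
%% A bare name gets the assumed default, 'undefined'.
normalize_one(Name) when is_atom(Name) ->
    {Name, undefined}.
```

With this, `normalize([inet, {drop_nulls, [nil]}])` gives `[{inet, undefined}, {drop_nulls, [nil]}]`, so every plugin callback can be invoked with three arguments regardless of how it was configured.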

Edit

There is a nulls option for the encode function, it is a list, so my first example can be:

encode(Map0, Opts) when is_map(Map0) ->
    % By default, the 'nulls' option is [undefined].
    Nulls = euneus_encoder:get_nulls_option(Opts),
    Map = maps:filter(fun(_, V) -> not lists:member(V, Nulls) end, Map0),
    {halt, euneus_encoder:encode_map(Map, Opts)};
encode(_Term, _Opts) ->
    next.

Edit 2

Just pushed a draft PR implementing the drop_nulls plugin.

New release

Version 1.2.0 is released with a new built-in plugin called drop_nulls, which drops every key of a map or proplist whose value is in the nulls list of the encode options.

drop_nulls example

Maps

1> euneus:encode_to_binary(#{a => 1, b => undefined}, #{plugins => [drop_nulls]}).
{ok,<<"{\"a\":1}">>}

2> euneus:encode_to_binary(#{a => 1, b => undefined, c => nil}, #{nulls => [undefined, nil], plugins => [drop_nulls]}).
{ok,<<"{\"a\":1}">>}

Proplists

1> euneus:encode_to_binary([{a, 1}, {b, undefined}], #{plugins => [proplist, drop_nulls]}).
{ok,<<"{\"a\":1}">>}

2> euneus:encode_to_binary([{a, 1}, {b, undefined}, {c, nil}], #{nulls => [undefined, nil], plugins => [proplist, drop_nulls]}).
{ok,<<"{\"a\":1}">>}

Installation

Erlang

% rebar.config
{deps, [{euneus, "1.2.0"}]}

Elixir

# mix.exs
def deps do
  [{:euneus, "~> 1.2"}]
end

Credits

Thanks @asabil, @LeonardB and also @paulo-f-oliveira!

I added euneus to the JSON parser benchmarks at saleyn/simdjsone (Erlang Fast JSON parser) on GitHub:

=== Benchmark (file size: 616.7K) ===
   simdjsone:   5539.670us
      euneus:   8435.540us
       thoas:   8902.160us
       jiffy:  13688.250us

=== Benchmark (file size: 1.3K) ===
   simdjsone:      8.030us
       jiffy:     14.950us
       thoas:     14.960us
      euneus:     19.830us

=== Benchmark (file size: 0.1K) ===
   simdjsone:      1.530us
       jiffy:      2.700us
      euneus:      3.220us
       thoas:      3.600us

That’s nice!

Do you know what’s the reason for this difference in the benchmarks of the README?

$ make benchmark
=== Benchmark (file size: 1.3K) ===
   simdjsone:      8.030us
       jiffy:     14.950us
       thoas:     14.960us
      euneus:     19.830us

VS

$ MIX_ENV=test make benchmark
=== Benchmark (file size: 1.3K) ===

Name                ips        average  deviation         median         99th %
simdjsone      128.43 K        7.79 μs   ±468.75%        5.60 μs       21.10 μs
euneus         106.19 K        9.42 μs    ±87.91%        8.80 μs       21.40 μs
poison          97.80 K       10.23 μs    ±74.31%        9.30 μs          23 μs
jason           96.92 K       10.32 μs    ±98.77%        9.50 μs       26.30 μs
jiffy           90.31 K       11.07 μs    ±97.74%        9.20 μs       45.30 μs
thoas           79.04 K       12.65 μs   ±133.82%       11.50 μs       26.70 μs

FWIW I’m unable to compile and run the benchmarks, I got this error:

make nif
make -C c_src
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/home/williamthome/Projects/erlang/simdjsone/c_src'
g++ simdjson.o simdjson_nif.o -L/home/williamthome/Projects/erlang/otp/lib/erl_interface/lib -lei -shared -o /home/williamthome/Projects/erlang/simdjsone/priv/simdjsone.so
/usr/bin/ld: cannot find -lei: No such file or directory
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:84: /home/williamthome/Projects/erlang/simdjsone/priv/simdjsone.so] Error 1
make[1]: Leaving directory '/home/williamthome/Projects/erlang/simdjsone/c_src'
make: *** [Makefile:19: nif] Error 2

Regarding null, please help me by answering my question below in this issue.

What do you see as the best approach to consider null, undefined, nil, etc from Erlang to JSON and null from JSON to Erlang?

Do you know what’s the reason for this difference in the benchmarks of the README?

The first one is hand-coded in Erlang and executes each test concurrently, and the second is using Elixir’s Benchee. I haven’t tried to compare the results.
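
As a rough illustration of what a hand-coded timing loop looks like (this is not the simdjsone benchmark code, and it runs sequentially rather than concurrently), a per-call average can be taken with `timer:tc/1`. The module and function names are made up for the sketch.

```erlang
-module(bench_sketch).
-export([avg_us/2]).

%% Average wall-clock microseconds per call of Fun over N sequential runs.
%% Unlike Benchee, there is no warmup, no deviation, and no percentiles,
%% which is one reason numbers from the two styles are hard to compare.
avg_us(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    {TotalUs, ok} = timer:tc(fun() -> repeat(Fun, N) end),
    TotalUs / N.

%% Call Fun N times, discarding its result.
repeat(_Fun, 0) ->
    ok;
repeat(Fun, N) ->
    _ = Fun(),
    repeat(Fun, N - 1).
```

A call such as `bench_sketch:avg_us(fun() -> lists:seq(1, 100) end, 1000)` returns a float; the value depends entirely on the machine and load.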

FWIW I’m unable to compile and run the benchmarks, I got this error.

Please provide more info about your environment and maybe submit an issue.

JavaScript can distinguish between ‘undefined’ and ‘null’, but it tries not to:

undefined === null
false
undefined == null
true

JSON cannot distinguish between undefined and null. This is one of the reasons why generic conversions between Erlang and JSON can never be perfect. The rules for conversion should be set by a higher-order protocol, and implemented by using a ‘streaming’ interface where the higher-order protocol module generates JSON syntax without ever building a JSON object. Do not trust any “always represent in Erlang as in JSON” advice; it certainly isn’t “best practice”.
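
A tiny sketch of what "the protocol module generates JSON syntax" might look like: the rule that `undefined` means "omit the member" while `null` means "emit the null literal" lives in the application code, not in a generic converter. Everything here is hypothetical (module name, function names, the rule itself), and it only handles atom keys with integer or `null` values, without any string escaping.

```erlang
-module(null_policy_sketch).
-export([encode_fields/1]).

%% Encode {Key, Value} pairs as a JSON object under one explicit policy:
%% 'undefined' drops the member, 'null' becomes the JSON null literal.
encode_fields(Fields) ->
    Members = [member(Key, Value) || {Key, Value} <- Fields, Value =/= undefined],
    iolist_to_binary([<<"{">>, lists:join(<<",">>, Members), <<"}">>]).

member(Key, null) ->
    [key(Key), <<"null">>];
member(Key, Value) when is_integer(Value) ->
    [key(Key), integer_to_binary(Value)].

%% Keys are assumed to be atoms that need no escaping.
key(Key) ->
    [<<"\"">>, atom_to_binary(Key, utf8), <<"\":">>].
```

Here `encode_fields([{a, 1}, {b, null}, {c, undefined}])` yields `<<"{\"a\":1,\"b\":null}">>`. A different protocol could pick a different rule without touching any general-purpose encoder.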

New release

Version v1.2.1 fixes a bug in the decoder’s values and keys options.

Credits

Thanks, @michalmuskala, for figuring it out.

New release

Euneus v2.0 is out!

This version completely rewrites Euneus on top of the new OTP json module. The json_polyfill lib is required for OTP versions below 27.

The minimum OTP version is 24.

Changes

All modules received a new interface.

Many changes were made based on this issue (many thanks to all contributors o/).

Encode

Encode functions available:

The euneus_encoder options:

-type options() :: #{
    codecs => [codec()],
    nulls => [term()],
    skip_values => [term()],
    key_to_binary => fun((term()) -> binary()),
    sort_keys => boolean(),
    proplists => boolean() | {true, is_proplist()},
    escape => fun((binary()) -> iodata()),
    encode_integer => encode(integer()),
    encode_float => encode(float()),
    encode_atom => encode(atom()),
    encode_list => encode(list()),
    encode_map => encode(map()),
    encode_tuple => encode(tuple()),
    encode_pid => encode(pid()),
    encode_port => encode(port()),
    encode_reference => encode(reference())
}.

-type codec() ::
    timestamp
    | datetime
    | ipv4
    | ipv6
    | {records, #{Name :: atom() := {Fields :: [atom()], Size :: pos_integer()}}}
    | codec_callback().
-export_type([codec/0]).

-type codec_callback() :: fun((tuple()) -> next | {halt, term()}).

-type is_proplist() :: fun((list()) -> boolean()).

-type encode(Type) :: fun((Type, json:encoder(), state()) -> iodata()).

Please see the encoder documentation for further information and examples.
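
As a hedged illustration of the codec_callback() type above, a callback could claim one tuple shape and pass on everything else. The `{point, X, Y}` shape and the module name are invented for the example, and I have not run this against Euneus itself; only the callback's own return values are shown.

```erlang
-module(point_codec_sketch).
-export([codec/1]).

%% Shaped like codec_callback(): fun((tuple()) -> next | {halt, term()}).
%% Rewrites a hypothetical {point, X, Y} tuple into a map and leaves
%% every other tuple to the next codec.
codec({point, X, Y}) when is_number(X), is_number(Y) ->
    {halt, #{x => X, y => Y}};
codec(Tuple) when is_tuple(Tuple) ->
    next.
```

Going by the options type, such a function would presumably be supplied as `#{codecs => [fun point_codec_sketch:codec/1]}`, with the returned map then encoded as a JSON object.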

Decode

Decode functions available:

The euneus_decoder options:

-type options() :: #{
    codecs => [codec()],
    null => term(),
    binary_to_float => json:from_binary_fun(),
    binary_to_integer => json:from_binary_fun(),
    array_start => json:array_start_fun(),
    array_push => json:array_push_fun(),
    array_finish =>
        ordered
        | reversed
        | json:array_finish_fun(),
    object_start => json:object_start_fun(),
    object_keys =>
        binary
        | copy
        | atom
        | existing_atom
        | json:from_binary_fun(),
    object_push => json:object_push_fun(),
    object_finish =>
        map
        | proplist
        | reversed_proplist
        | json:object_finish_fun()
}.

-type codec() ::
    copy
    | timestamp
    | datetime
    | ipv4
    | ipv6
    | pid
    | port
    | reference
    | codec_callback().

-type codec_callback() :: fun((binary()) -> next | {halt, term()}).

Please see the decoder documentation for further information and examples.
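
Several of these options reuse callback types from the OTP json module, so it may help to see those callbacks used directly. The sketch below calls `json:decode/3` (OTP 27 or later) with a `null` replacement and an `object_finish` callback that keeps objects as ordered proplists. That the Euneus options `null` and `object_finish => proplist` map onto the same callbacks is my reading of the types, not something confirmed here; the module name is made up.

```erlang
-module(json_callbacks_sketch).
-export([decode/1]).

%% Requires the OTP json module (OTP 27+).
decode(Bin) when is_binary(Bin) ->
    Decoders = #{
        %% Term returned for the JSON null literal.
        null => undefined,
        %% The default object_push prepends {Key, Value}, so reversing
        %% restores the member order; OldAcc is handed back unchanged.
        object_finish => fun(Acc, OldAcc) -> {lists:reverse(Acc), OldAcc} end
    },
    {Result, _Acc, _Rest} = json:decode(Bin, ok, Decoders),
    Result.
```

For example, `decode(<<"{\"a\":null,\"b\":[1,2]}">>)` returns `[{<<"a">>, undefined}, {<<"b">>, [1, 2]}]`.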

Formatter

Format functions available:

The euneus_formatter options:

-type options() :: #{
    indent_type := tabs | spaces,
    indent_width := non_neg_integer(),
    spaced_values := boolean(),
    crlf := crlf()
}.

-type crlf() :: r | n | rn | none.

Please see the formatter documentation for further information and examples.

Please give it a try, and don’t hesitate to join the project on the Euneus repository or jump into this thread and share your suggestions. Thank you!

New release

v2.2.0

Highlights

Please see the documentation for more information.

Changes
