Benchmarking the compressed option for ETS and results are unexpected - any thoughts?


I’ve been benchmarking the compressed option for ETS and results are unexpected.

I’m caching some HTTP data that looks like this (the term is truncated here):

 {#{<<"accept-encoding">> => <<"gzip">>},
  [{<<"content-type">>,<<"text/html; charset=utf-8">>},
   {<<"cache-control">>,<<"s-maxage=1200, public">>},
   {<<"x-xss-protection">>,<<"1; mode=block">>}],
  #{alternate_keys => [2,3,5,7,11,20],
    created => 1670342093,expires => 1670343293,grace => 1670343413,
    parsed_headers =>
        #{<<"cache-control">> =>
              #{<<"public">> => undefined,<<"s-maxage">> => 1200},
          <<"content-encoding">> => [<<"gzip">>],
          <<"content-type">> => ...},
    ttl_set_by => header}}

and using the compressed option when creating the ETS table brings memory usage down by a factor of about 2.5 compared with an uncompressed table.
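For reference, a minimal sketch of how such a comparison can be reproduced (the function name, table names and stored value are illustrative stand-ins, not my real cache data):

```erlang
%% Compare ETS memory usage (in words, via ets:info/2) with and
%% without the compressed option, using a simplified cache entry.
compare_memory() ->
    Value = {#{<<"accept-encoding">> => <<"gzip">>},
             [{<<"content-type">>, <<"text/html; charset=utf-8">>}],
             #{created => 1670342093, expires => 1670343293}},
    Plain = ets:new(plain, [set]),
    Comp  = ets:new(comp, [set, compressed]),
    [ets:insert(Plain, {N, Value}) || N <- lists:seq(1, 100000)],
    [ets:insert(Comp,  {N, Value}) || N <- lists:seq(1, 100000)],
    {ets:info(Plain, memory), ets:info(Comp, memory)}.
```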

Regarding performance, I’ve tested ets:lookup/2 and ets:update_element/3 (updating an integer) with set and ordered_set tables of 1 million elements. The only case where the uncompressed table is faster is lookup in a set (~3% faster). In all other cases, operations on the compressed table are faster by ~7%.
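A minimal sketch of the kind of micro-benchmark I mean (names are made up; timer:tc/1 returns microseconds, and a real benchmark should repeat runs and account for warm-up):

```erlang
%% Time 1M lookups and 1M integer updates on a table created with
%% the given options, e.g. [set] or [set, compressed].
bench(Opts) ->
    Tab = ets:new(bench, Opts),
    [ets:insert(Tab, {N, N}) || N <- lists:seq(1, 1000000)],
    {LookupUs, _} = timer:tc(fun() ->
        [ets:lookup(Tab, N) || N <- lists:seq(1, 1000000)]
    end),
    {UpdateUs, _} = timer:tc(fun() ->
        [ets:update_element(Tab, N, {2, N + 1}) || N <- lists:seq(1, 1000000)]
    end),
    ets:delete(Tab),
    {LookupUs, UpdateUs}.
```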

Documentation states:

If this option is present, the table data is stored in a more compact format to consume less memory. However, it will make table operations slower. Especially operations that need to inspect entire objects, such as match and select, get much slower. The key element is not compressed.

I’ve taken a look at the code here and there, and as far as I understand the data is not compressed (with gzip etc.) but stored in a more compact form in memory. Is that correct?

How could the benchmark results be explained? Better use of the hardware cache thanks to better data locality when data is stored in compressed form?

Apart from the functions cited in the documentation (match and select), are there some other trade-offs to be considered?

I never considered using this option before and now I feel like I’ve found a superpower, but I’d like to better understand it :blush:



Correct. It’s basically the external format produced by term_to_binary/1, but with some internal tweaks. Atoms are encoded with an internal atom table index, and big binaries are encoded with a pointer to the binary data (no data copy). It turned out to be surprisingly good both for speed and compactness.

Yes, I suppose so. CPU cycles are cheap, memory bandwidth is expensive.

Insertions and lookups of entire tuples are probably not much more expensive in relative terms, as the tuple must be copied to/from the process heap in the uncompressed case as well. For a compressed table, it’s just a slightly more complex copy operation.

The key element and “immediate” elements (small integers, atoms, local pids and port identifiers) are not compressed. The operations that might be relatively costly are those where a temporary tuple needs to be created in order to access non-key, non-immediate elements. That applies to all operations with a match spec, to ets:delete_object/2, and to insertion into a bag when a duplicate key is found.
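To illustrate the distinction, a small sketch (table name and values are made up). A plain key lookup only decodes the object while copying it to the process heap, whereas matching on a non-key element first has to build a temporary tuple to run the match against:

```erlang
demo() ->
    T = ets:new(demo, [set, compressed]),
    true = ets:insert(T, {key1, some_atom, 42}),
    %% Key access: the key is stored uncompressed.
    [{key1, some_atom, 42}] = ets:lookup(T, key1),
    %% Matching on the third element inspects the whole object,
    %% which requires building a temporary tuple first.
    [{key1, some_atom, 42}] = ets:match_object(T, {'_', '_', 42}),
    %% delete_object/2 also needs to compare entire objects.
    true = ets:delete_object(T, {key1, some_atom, 42}),
    ets:delete(T).
```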


All those small binaries will yield good compression, as the per-binary overhead is much smaller. Strings (lists of bytes) will also compress very well: one byte per element instead of 16 (a list cell is two 8-byte words on a 64-bit system).
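The string case is easy to check directly (a sketch; names are illustrative, and ets:info/2 reports the size in words):

```erlang
%% Compare stored size for a 1000-character string with and
%% without the compressed option.
string_memory() ->
    Str = lists:duplicate(1000, $a),
    Plain = ets:new(plain, [set]),
    Comp  = ets:new(comp, [set, compressed]),
    true = ets:insert(Plain, {key, Str}),
    true = ets:insert(Comp, {key, Str}),
    {ets:info(Plain, memory), ets:info(Comp, memory)}.
```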