Joe asked this on EF a few years ago:
Wondering whether people make use of it in Erlang - do you use it?
Here’s what he had to say about it:
When I’m about to use term_to_binary, the first thing that comes to mind is dets, and there’s a question there too.
Great idea!
Finally, I need term_to_binary
Not so much anymore. Mostly I use term_to_iovec/1 now.
In Zotonic we are using term_to_binary regularly to serialize arbitrary data structures. For backing stores, but also for data that is sent along with user requests (we are adding signatures for those).
It is a great way to “pass by reference” if copying messages between processes becomes a bottleneck. Admittedly with some overhead.
For example, you may have a pool of workers and a dispatcher, and a stream of relatively large messages coming in. In a naive implementation your message will be copied into the dispatcher only to be immediately copied again into one of the workers. If you, instead, transform the message into a binary all you copy between the processes is a reference. You just need to leave enough of a message as a term in order to dispatch it appropriately.
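A minimal sketch of that dispatcher pattern (the module name, message shapes, and routing-by-hash are made up for illustration, not taken from any real codebase):

```erlang
%% Sketch: the dispatcher keeps only the routing Key as a term and wraps
%% the large Payload in a binary, so forwarding copies just a reference.
-module(dispatch_sketch).
-export([dispatcher/1, worker/0]).

dispatcher(Workers) ->
    receive
        {msg, Key, Payload} ->
            Bin = term_to_binary(Payload),   % large part becomes a binary
            %% pick a worker by hashing the key (illustrative choice)
            Worker = lists:nth(erlang:phash2(Key, length(Workers)) + 1, Workers),
            Worker ! {job, Key, Bin},        % only a reference is copied
            dispatcher(Workers)
    end.

worker() ->
    receive
        {job, _Key, Bin} ->
            Payload = binary_to_term(Bin),   % unpack once, in the worker
            do_work(Payload),
            worker()
    end.

do_work(_Payload) -> ok.
```

Note that only binaries larger than 64 bytes are reference-counted and shared between processes; small heap binaries are copied along with the message anyway, so this trick pays off only for large payloads.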
Oh that’s clever! Thanks for the tip
Also, base64:encode(term_to_binary(Anything)) is a great way to piggyback Erlang things over almost any (text-based or string-capable) protocol.
For example, we had a need to ship Prometheus metrics from IoT devices in the field, which already had an existing semi-persistent web socket channel open talking JSON-RPC. What did we do? Exported the data from the Prometheus library on device, base64-encoded it and shipped it in a JSON field in a web socket packet. Then we just unpacked it on the server side and sent it to Prometheus when it was scraping for metrics.
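A round-trip sketch of that trick (the pack/unpack names are mine, not from any library):

```erlang
%% Pack any Erlang term into a JSON-safe base64 string, and recover it
%% on the other side.
pack(Term) ->
    base64:encode(term_to_binary(Term)).

unpack(B64) ->
    %% The 'safe' option refuses to create new atoms or funs, which
    %% matters when the encoded data arrives from an untrusted peer.
    binary_to_term(base64:decode(B64), [safe]).
```

The resulting string can be dropped into any JSON field and survives anything that can carry text.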
Apache CouchDB uses term_to_binary to store all its data on disk.
The only thing you have to be aware of here is that when you want to inspect the data, you have to unpack the binary into a copy of its original term. So while you might save copying the actual message, you still have to make a copy when you view the data. I personally think it is probably more efficient to let the BEAM copy the term instead of first making the binary and then unpacking it in every process.
However, one situation where it is more efficient is when you want to pass the message through a chain of processes that don’t need to unpack it, just pass it on. E.g. when processing pictures or video.
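As a sketch, a relay stage in such a chain just forwards the binary and never calls binary_to_term, so only the reference crosses each mailbox (the message shape here is invented for illustration):

```erlang
%% A pass-through stage: forwards the packed frame to the next process
%% without decoding it.
relay(Next) ->
    receive
        {frame, Bin} when is_binary(Bin) ->
            Next ! {frame, Bin},   % no binary_to_term here
            relay(Next)
    end.
```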
I’m interested in this. I saw that RabbitMQ is using it for writing to files, but did you do any benchmarks outside of that scope?
I did some recent benchmarks, and while the tool I used may be at fault (I need to retry with erlperf and write my own test independent of those tools), I found that term_to_iovec was slower than term_to_binary, which I did not expect; perhaps my expectation is wrong. Tests using term_to_iovec to write to files with a raw file handle were faster, but not significantly so.
Quick follow-up: benchmarking with erlperf, I found that term_to_iovec was on average 9% faster.
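For anyone wanting to reproduce a comparison like this, here is a crude timer:tc sketch (not erlperf; warmup, GC, and scheduler effects are ignored, so treat the numbers as rough):

```erlang
%% Time N iterations of term_to_binary vs term_to_iovec on the same term.
bench(Term, N) ->
    {T1, ok} = timer:tc(fun() -> loop(fun erlang:term_to_binary/1, Term, N) end),
    {T2, ok} = timer:tc(fun() -> loop(fun erlang:term_to_iovec/1, Term, N) end),
    io:format("term_to_binary: ~p us, term_to_iovec: ~p us~n", [T1, T2]).

loop(_F, _Term, 0) -> ok;
loop(F, Term, N) -> _ = F(Term), loop(F, Term, N - 1).
```

term_to_iovec/1 avoids copying large off-heap binaries into the result, so the gap should widen when the term contains big binaries.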
Sorry for necroposting (not sure about the forum rules on this):
We use term_to_binary for most of our Kafka messages, switching to JSON for interop with other languages (though I recently started digging into that subject; I think we could parse the term_to_binary format in Go directly with some packages).
Is ‘one file per user’ in this context to be in the format of .dat or .txt or …?
You lost me… what do you mean?
This is originally from Joe’s (quoted) text at the top. I assume that Joe meant that you could use term_to_binary/1 to serialize each user and then write each to a separate file.
This means that none of the files get too large, and you don’t need to worry about concurrent updates (as long as each user is treated serially). It’s basically Joe’s way of saying “shard by user”, I suspect.
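A minimal sketch of that approach, with a made-up directory layout and ".bin" naming (the function names are invented for illustration):

```erlang
%% "One file per user": serialize each user's term to its own file.
save_user(Dir, UserId, User) when is_binary(UserId) ->
    file:write_file(filename:join(Dir, <<UserId/binary, ".bin">>),
                    term_to_binary(User)).

load_user(Dir, UserId) when is_binary(UserId) ->
    {ok, Bin} = file:read_file(filename:join(Dir, <<UserId/binary, ".bin">>)),
    binary_to_term(Bin).
```

As long as all writes for a given user are serialized (e.g. through one process per user), no file-level locking is needed.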
I don’t understand this, though. I’ll take a guess.
Since term_to_binary/1 returns binary data (see External Term Format — erts v15.0.1 for details – I guess you know this already, @Maria-12648430), you’d probably want to name your files .bin or .dat (if that’s a convention you prefer).
Thanks, Roger. Your response has answered my question. Happy coding!
With ‘one file per user’ [.dat or .bin], would mnesia be suitable for metadata about those files? If so, is there a link to an example?
No.
Think about it like this: mnesia is a cluster-replicated data store. “one file per user” (being stored in the filesystem) isn’t.
So you have a mismatch between the intended use patterns of the two.
Thank you for your insightful and concise explanation, Roger.