AstonJ
January 17, 2024, 10:53pm
1
Thought this would be a nice way to start the year - have you seen or written an Erlang or BEAM language function or piece of code that you are particularly fond of? Perhaps you admired its simplicity (or complexity!), or maybe you surprised yourself or surpassed your own expectations of what is possible?
Whatever the reason, please share!
(If sharing code that is not in Erlang, please state which BEAM language it is)
5 Likes
I have recently written a bit of code for ReVault:
consult(Path) ->
    maybe
        {ok, Bin} ?= read_file(Path),
        {ok, Scanned} ?= scan_all(Bin),
        parse_all(Scanned)
    end.

read_file(Path) ->
    Res = aws_s3:get_object(client(), bucket(), Path),
    maybe
        ok ?= handle_result(Res),
        {ok, #{<<"Body">> := Contents}, _} = Res,
        {ok, Contents}
    end.
This isn’t particularly fancy or impressive code (and the module at large is full of TODOs), but I was really glad to be able to use maybe expressions in my own projects in a way that simplified the validation workflow a bunch.
The years of community work behind getting that feature into the language are part of the feeling, of course.
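For anyone who hasn’t tried maybe expressions yet, here is a sketch (my own, not from ReVault) of what the same flow looks like without them, as pre-OTP-25 nested case expressions:

```erlang
%% Equivalent of consult/1 without the maybe expression: every failure
%% branch has to be threaded through by hand, one nesting level per step.
consult(Path) ->
    case read_file(Path) of
        {ok, Bin} ->
            case scan_all(Bin) of
                {ok, Scanned} -> parse_all(Scanned);
                ScanError -> ScanError
            end;
        ReadError -> ReadError
    end.
```

The maybe version reads top to bottom, and any clause that doesn’t match its `?=` pattern simply becomes the value of the whole expression.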
7 Likes
I have mine:
%% @doc Fails if the option does not exist
-spec get_opt(key() | key_path()) -> value().
get_opt(KeyPath) when is_list(KeyPath) ->
    Opts = persistent_term:get(?MODULE),
    lists:foldl(fun maps:get/2, Opts, KeyPath);
get_opt(Key) ->
    get_opt([Key]).
The config data structure is a TOML tree, stored in a persistent_term. Those few lines of code are all I need to traverse the tree by a path and get the desired subtree or final leaf. Love the simplicity.
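As an illustration (with a made-up config shape, not the real tree), the foldl descends one map level per key in the path:

```erlang
%% Each key in the path selects the next level of the nested options map;
%% a shorter path returns the subtree, a full path returns the leaf.
Opts = #{db => #{pool => #{size => 10}}},
10 = lists:foldl(fun maps:get/2, Opts, [db, pool, size]),
#{size := 10} = lists:foldl(fun maps:get/2, Opts, [db, pool]).
```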
3 Likes
nickva
January 18, 2024, 7:10pm
4
One of my favorites last year was:
apache:main ← apache:replace-folsom (opened 06:54 AM, 11 Jul 23 UTC)
Folsom histograms are a major bottleneck under high concurrency, as described in #4650. This was noticed during performance testing, confirmed using Erlang VM lock counting, then verified by creating a test release with the histogram update logic commented out [1].
CouchDB doesn't use most of the Folsom statistics and metrics; we only use counters, gauges and one type of sliding window, sampling histogram. Instead of trying to re-design and update Folsom, which is a generic stats and metrics library, take a simpler approach and create just the three metrics we need, and then remove Folsom and Bear dependencies altogether.
All the metrics types we re-implement are based on two relatively new Erlang/OTP features: counters [2] and persistent terms [3]. Counters are mutable arrays of integers, which allow fast concurrent updates, and persistent terms allow fast, global, constant time access to Erlang terms.
Gauges and counters are implemented as counter arrays with one element. Histograms are represented as counter arrays where each array element is a histogram bin. Since we're dealing with sliding time window histograms, we have a tuple of counter arrays, where each time instant (each second) is a counter array. The overall histogram object then looks something like:
```
Histogram = {
1 = [1, 2, ..., ?BIN_COUNT]
2 = [1, 2, ..., ?BIN_COUNT]
...
TimeWindow = [1, 2, ..., ?BIN_COUNT]
}
```
To keep the structure immutable we need to set a limit on both the number of bins and the time window size. To limit the number of bins we need to set some minimum and maximum value limits. Since almost all our histograms record access times in milliseconds, we pick a range from 10 microseconds up to over one hour. Histogram bin widths increase exponentially in order to keep a reasonable precision across the whole range of values. This encoding is similar to how floating point numbers work. Additional details on how this works are described in the `couch_stats_histogram.erl` module.
To keep the histogram object structure immutable, the time window is used in a circular fashion. The time parameter to the histogram update/3 function is the monotonic clock time, and the histogram time index is computed as `Time rem TimeWindow`. So, as the monotonic time is advancing forward, the histogram time index will loop around. This comes with a minor annoyance of having to allocate a larger time window to accommodate some process which cleans stale (expired) histogram entries, possibly with some extra buffers to ensure the currently updated interval and the interval ready to be cleaned would not overlap. This periodic cleanup is performed in the couch_stats_server process.
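The circular indexing described above can be sketched roughly like this (a simplification with made-up names, not the actual CouchDB code):

```erlang
-define(TIME_WINDOW, 14).  % e.g. 10s of live data plus slack for cleanup
-define(BIN_COUNT, 64).

%% One counters array per second in the window, held in a tuple so the
%% whole structure can live unchanged in a persistent term.
new_histogram() ->
    list_to_tuple([counters:new(?BIN_COUNT, [write_concurrency])
                   || _ <- lists:seq(1, ?TIME_WINDOW)]).

%% Map the monotonic second onto a slot with rem; old slots are reused
%% as time wraps around, so a cleanup process must zero stale ones.
update(Histogram, TimeSec, BinIndex) ->
    Slot = (TimeSec rem ?TIME_WINDOW) + 1,  % element/2 is 1-indexed
    counters:add(element(Slot, Histogram), BinIndex, 1).
```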
Besides performance, the new histograms have two other improvements over the Folsom ones:
- They record every single value. Previous histograms did sampling and recorded mostly just the first 1024 values during each time instant (second).
- They are mergeable. Multiple histograms can be merged with corresponding bins summed together. This could allow cluster wide histogram summaries or gathering histograms from individual processes, then combining them at the end in a central process.
Another performance improvement in this commit is eliminating the need to periodically flush or scrape stats in the background in both the couch_stats and prometheus apps. Fetching stats from persistent terms and counters takes less than 4 milliseconds [4], and the sliding time window histogram will always return the last 10 seconds of data no matter when the stats are queried. Now that work is done only when the stats are actually queried.
Since the Folsom library was abstracted away behind a couch_stats API, the rest of the applications do not need to be updated. They still call `couch_stats:update_histogram/2`, `couch_stats:increment_counter/1`, etc.
Previously, couch_stats did not have any tests at all. Folsom and Bear had some tests, but I don't think we ever ran those test suites. To rectify the situation, I added tests to cover the functionality. All the newly added or updated modules should have near or exactly 100% test coverage.
[1] https://github.com/apache/couchdb/issues/4650#issue-1764685693
[2] https://www.erlang.org/doc/man/counters.html
[3] https://www.erlang.org/doc/man/persistent_term.html
[4] Comparison of the time it takes to fetch all the stats (in msec):
* New Stats:
```
> element(1, timer:tc(fun() -> lists:foreach(fun(_)-> couch_stats:fetch() end, lists:seq(1,100)) end)) / 100_000.
3.20703
```
* Folsom:
```
> element(1, timer:tc(fun() -> lists:foreach(fun(_)-> couch_stats_aggregator:flush(), couch_stats:fetch() end, lists:seq(1,100)) end)) / 100_000.
23.74713
```
Used counters and persistent terms to implement a replacement metrics system (counters, gauges, histograms) for Apache CouchDB. It also removed a significant concurrency bottleneck.
The histogram implementation was the most interesting part, I thought:
% This module implements windowed base-2 histograms using Erlang counters [1].
%
% Base-2 histograms use power of 2 exponentially increasing bin widths. This
% allows capturing a range of values from microseconds to hours with a
% relatively small number of bins. The same principle is used when encoding
% floating point numbers [2]. In fact, our histograms rely on the ease of
% constructing and manipulating binary representations of 64 bit floats in
% Erlang to do all of their heavy lifting.
%
% As a refresher, the standard (IEEE 754) 64 bit floating point representation
% looks something like:
%
% sign exponent mantissa
% [s64] [e63...e53] [m52...m1]
% <-1-> <---11----> <---52--->
%
%
% The simplest scheme might be to use the exponent to select the histogram bin
% and throw away the mantissa bits. However, in that case bin sizes end up
% growing a bit too fast and we lose resolution quickly. To increase the
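The comment is cut off there, but the core trick it describes - reading a float's exponent via the bit syntax - can be sketched like this (my own simplification, not the module's actual binning):

```erlang
%% Encode the value as an IEEE 754 double, pattern match out the 11-bit
%% exponent field, and unbias it: the result is floor(log2(Value)), i.e.
%% a base-2 bin index, with no math library calls needed.
bin_index(Value) when is_number(Value), Value > 0 ->
    <<_Sign:1, Exponent:11, _Mantissa:52>> = <<(float(Value))/float>>,
    Exponent - 1023.
```

With this, values in [1, 2) land in bin 0, [2, 4) in bin 1, and so on; the truncated comment goes on to explain how the too-fast bin growth is addressed, presumably by also using some mantissa bits to subdivide each power-of-2 bin.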
2 Likes
Fylke
January 19, 2024, 4:19pm
5
Here’s a little binary log parser that I made a long time ago. I often refer to it when telling other people about how powerful the binary notation (and pattern matching) is in Erlang. I mostly love just how compact and readable it is.
%%%----------------------------------------------------------------------------
%%% @private
%%% @doc Demo of binary log parsing in Erlang
%%% @end
%%%----------------------------------------------------------------------------
-module(binlog_parser_demo).
-export([decode/2]).
%%-----------------------------------------------------------------------------
%% @spec decode(LogBinary, Types) -> string()
%% LogBinary = binary()
%% Types = [atom()]
%% @doc
%% The binary stream is annotated by the list Types. Every call to decode
%% will look at the type and then pick the proper decode function based on
%% it. In the case of arrays or strings, the size of the field is given in
%% the first 32 bits.
%% @end
%%-----------------------------------------------------------------------------
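Since the file is truncated, here is a hypothetical sketch in the same spirit (not the original module's code): the head of the Types list selects the binary pattern to match, and strings carry their size in a leading 32-bit field, exactly as the doc comment describes.

```erlang
%% Decode a type-annotated binary stream into a list of values.
decode(<<>>, []) ->
    [];
decode(<<Int:32/signed, Rest/binary>>, [int32 | Types]) ->
    [Int | decode(Rest, Types)];
decode(<<Size:32, Str:Size/binary, Rest/binary>>, [string | Types]) ->
    [binary_to_list(Str) | decode(Rest, Types)].
```

The size field being pattern matched and then immediately used as a segment length (`Str:Size/binary`) in the very same pattern is, to me, the bit that shows off how powerful the binary syntax is.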
2 Likes
I really like Joe’s “Universal Server”:
universal_server() ->
    receive
        {become, F} ->
            F()
    end.
It’s not exactly complicated to understand, but the way Joe builds on this simple function in his post is what sells it for me. Using it, you can introduce people to some concepts of Erlang, but also change the way they think about processes and Erlang systems as a whole. Joe wrote a post about Erlang/Elixir web servers where he said that “We do not have ONE web-server handling 2 millions sessions. We have 2 million webservers handling one session each.” That is what really made processes click for me.
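For reference, the payoff in Joe’s post is that the process can be handed its behaviour after it has been spawned. A minimal version of his factorial demo, assuming universal_server/0 from above is in scope:

```erlang
%% A concrete server the universal server can "become".
factorial_server() ->
    receive
        {From, N} ->
            From ! factorial(N),
            factorial_server()
    end.

factorial(0) -> 1;
factorial(N) -> N * factorial(N - 1).

%% Spawn a process that could be anything, then decide later what it is.
demo() ->
    Pid = spawn(fun universal_server/0),
    Pid ! {become, fun factorial_server/0},
    Pid ! {self(), 10},
    receive Result -> Result end.   % 10! = 3628800
```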
4 Likes