Configuration of Erlang system

garazdawi · November 2, 2021, 9:33am

Hello!

I’ve long thought that the way we configure Erlang systems today is not ideal. I’ve made some attempts at improving it, but so far nothing has been good enough to warrant a PR.

Some of the things that I find lacking in the current setup are:

Support for validation
- Offline and online checking
- Good error indication
Support for documentation of configuration variables
More expressive configuration merge
- For example in the logger configuration today you cannot just add a new filter to the default handler, you need to copy the entire default configuration.
Configure erts via sys.config mechanism.
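To make the logger point concrete, here is a sketch of what adding a single filter to the default handler looks like in sys.config today. The first three filters are the default handler's filters as listed in the Logger documentation (an assumption here; they may differ between OTP versions), and my_filter and my_app_filters:drop_noise/2 are hypothetical names.

```erlang
%% sys.config sketch: to add ONE filter you restate the defaults as well,
%% because the 'filters' value replaces the default list instead of
%% being merged into it.
[{kernel,
  [{logger,
    [{handler, default, logger_std_h,
      #{filters =>
          [%% assumed defaults, copied from the Logger documentation
           {remote_gl, {fun logger_filters:remote_gl/2, stop}},
           {domain, {fun logger_filters:domain/2, {log, super, [otp, sasl]}}},
           {no_domain, {fun logger_filters:domain/2, {log, undefined, []}}},
           %% the only entry we actually wanted to add (hypothetical)
           {my_filter, {fun my_app_filters:drop_noise/2, []}}]}}]}]}].
```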

Some things that I think would be nice to have, but not strictly necessary:

Support custom config formats, e.g. TOML, YAML, etc.
Executable configs like in Elixir.

So I thought I would ask:

How do you configure your systems today?
How would you like to configure your systems in the future?

seriyps · November 2, 2021, 2:50pm

I have some experience developing an Erlang system that is expected to be deployed and configured on end users’ own servers, where the end users are not familiar with Erlang and quite often are non-programmers. I’m talking about “mtproto proxy”, a proxy server for the Telegram messenger that people use to bypass censorship: seriyps/mtproto_proxy on GitHub (“High performance Erlang MTProto proxy that powers https://t.me/socksy_bot”).

I have to say, more than 50% of the support questions I used to receive were due to users not being able to edit sys.config without breaking the syntax.

Also, I had to write my own “hot config reload” helpers, similar to the ones provided by OTP release handler.

This was partially needed because I decided to distribute it as a single OTP application, not as an OTP umbrella release, so it can be used as a dependency for other apps by a few power users. But even if I had an “umbrella” structure, there would be quite a lot of custom code to support some alternative config syntax and validation. And it still wouldn’t be possible to configure standard OTP apps (say, logger or kernel) this way, because those apps read their env early at VM start-up and are not always reconfigurable at runtime.

Basho’s cuttlefish was quite close to what I wanted, but the way it is implemented is quite hackish (it starts a VM, generates sys.config and then restarts the VM with the generated sys.config) and it did not allow alternative syntaxes; the syntax it offers was not really suitable for configs containing lists of complex structures, like {ports, [#{name => my_name, ip => "0.0.0.0", port => 1234}, #{name => my_name2...}]}.

Another issue with application:env is that it only allows atoms as keys and does not support nested structure lookups. Sometimes the configuration can be quite heavy (for example, look at the OTP logger configuration in kernel, which is even cached in a persistent term), and you may want to read one small field from it on some hot path. That means you need to look up the whole heavy structure from the env ETS table (creating lots of garbage on the heap) just to take a single field from it.
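One way to work around the copying today is to flatten the env into persistent_term, one entry per leaf, so that a hot path reads only the field it needs. This is a minimal sketch, not how OTP does it: the module name cfg_cache and the key layout {cfg_cache, App, Path} are made up for illustration, and it assumes nested values are maps.

```erlang
-module(cfg_cache).
-export([load/1, lookup/2]).

%% Flatten the env of App into persistent_term, one entry per leaf,
%% keyed by {cfg_cache, App, Path}. Nested maps are walked recursively.
load(App) ->
    lists:foreach(fun({Key, Val}) -> store(App, [Key], Val) end,
                  application:get_all_env(App)).

store(App, Path, Val) when is_map(Val) ->
    maps:fold(fun(K, V, ok) -> store(App, Path ++ [K], V) end, ok, Val);
store(App, Path, Val) ->
    persistent_term:put({?MODULE, App, Path}, Val).

%% Read a single leaf; persistent_term:get/2 does not copy the term
%% onto the caller's heap, unlike an ETS-backed env lookup.
lookup(App, Path) ->
    persistent_term:get({?MODULE, App, Path}, undefined).
```

The trade-off is that updating a persistent term is expensive, so this only fits configuration that changes rarely.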

rlipscombe · November 2, 2021, 3:44pm

Yeah, the fact that multiple configuration files only allow replacement at the granularity of the application is … annoying.

We have a heterogeneous-multi-node application which is installed in half a dozen different environments, each with their own foibles.

At one point we used ERB as a templating engine (because we’re using chef to manage the boxes, it made sense). That means that the configuration is correct at run time, but impossible to reason about locally, and changes are hard to test (because managing Ruby versions is industrial-scale yak-shaving…).

We came up with a custom solution which takes a set of input configuration files and then does a recursive merge of the configuration “tree” and then writes the result to an output file. The output sys.config files are then packaged in .deb archives (same as the app), and deployed to the destination host.

This allows us to generate the configuration files at build time, meaning that validation is easier. We can make sure the result is well-formed, at least. It occasionally makes reasoning about the results harder, because of the layering.

We support directives in the input configuration files, such as '$delete', which (you guessed it) causes it to completely leave out that portion of the configuration. This was sorta-inspired by ASP .NET’s XML transform stuff for Web.config.
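A recursive merge with a '$delete' directive can be sketched in a few lines. This is not the tool described above: the module name cfg_merge is hypothetical, and the placement of the directive as a value ({Key, '$delete'}) is an assumption made for the example.

```erlang
-module(cfg_merge).
-export([merge/2]).

%% Merge Overlay onto Base. Both are lists of {Key, Value} with atom
%% keys; values that are themselves such lists are merged recursively.
merge(Base, Overlay) ->
    lists:foldl(fun apply_entry/2, Base, Overlay).

%% Assumed directive shape: {Key, '$delete'} drops Key from the result.
apply_entry({Key, '$delete'}, Acc) ->
    lists:keydelete(Key, 1, Acc);
apply_entry({Key, New}, Acc) ->
    Old = case lists:keyfind(Key, 1, Acc) of
              {Key, V} -> V;
              false -> []
          end,
    Merged = case is_tree(New) andalso (Old =:= [] orelse is_tree(Old)) of
                 true -> merge(Old, New);   % descend into the subtree
                 false -> New               % leaf: overlay wins
             end,
    lists:keystore(Key, 1, Acc, {Key, Merged}).

%% A "tree" is a non-empty list of {atom(), term()} pairs.
is_tree(L) when is_list(L), L =/= [] ->
    lists:all(fun({K, _}) when is_atom(K) -> true; (_) -> false end, L);
is_tree(_) ->
    false.
```

Strings and other plain lists are treated as leaves, so an overlay value replaces them wholesale.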

On top of this, we have a mechanism inside the app which (for TCP listener options) merges the '_' listener (defaults) with per-listener overrides.
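The '_' defaults idea maps naturally onto maps:merge/2. The sketch below assumes the listeners are kept as a map of Name => Options with a '_' entry for the defaults; the real data layout is not described in this thread, and the module name listener_opts is hypothetical.

```erlang
-module(listener_opts).
-export([resolve/1]).

%% Listeners is a map of Name => Options (both maps). The '_' entry
%% holds defaults that every named listener inherits unless overridden.
resolve(Listeners) ->
    Defaults = maps:get('_', Listeners, #{}),
    maps:map(fun(_Name, Opts) -> maps:merge(Defaults, Opts) end,
             maps:remove('_', Listeners)).
```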

phild · November 2, 2021, 4:15pm

I have used rebar3’s dynamic configuration before to some success. I don’t know how that compares to Elixir’s “executable config”, having not used that.

elbrujohalcon · November 3, 2021, 7:14am

How do you configure your systems today?

If I can decide, I still stick to this principle: https://medium.com/hackernoon/system-settings-9ed72d5ef629
TL;DR: Only the most basic and technical stuff that’s needed for the system to boot up is on config files. The rest is in some database shared with the administrative tools which provide a nice interface for the admin user to adjust it (i.e. not just a text editor).

Now, at my current job, I work on a somewhat legacy system where we don’t do that… We move and merge relatively large config files that we personalize for each environment (and adjust/re-read while running in production) through some serious usage of file:consult/1 and file:write_file(io_lib:format(…)).
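For readers who have not seen the pattern, a read-modify-write cycle with file:consult/1 and io_lib:format/2 looks roughly like this. It is a sketch with a hypothetical module name (cfg_file), assuming the file holds a single sys.config-style term.

```erlang
-module(cfg_file).
-export([update/3]).

%% Read a sys.config-style file, set Key for App, and write it back.
update(Path, App, {Key, Value}) ->
    {ok, [Config]} = file:consult(Path),
    AppEnv = proplists:get_value(App, Config, []),
    NewEnv = lists:keystore(Key, 1, AppEnv, {Key, Value}),
    NewConfig = lists:keystore(App, 1, Config, {App, NewEnv}),
    %% ~p pretty-prints the term; the trailing "." makes the file
    %% readable by file:consult/1 again.
    ok = file:write_file(Path, io_lib:format("~p.~n", [NewConfig])),
    NewConfig.
```

Note that any comments in the original file are lost on the way, which is one reason this approach gets painful for hand-edited configs.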

How would you like to configure your systems in the future?

Just like I described above. I don’t want to care that much about sys.config and its friends, since I would only keep the very basic stuff in them anyway.

As an additional comment… A few years ago, some inakos and some other folks from Erlang Solutions developed inaka/nconf (on GitHub), a Nested Configuration Manager for Erlang Applications.
This might end up being useful for your project, @garazdawi … at least as some sort of inspiration.

garazdawi · November 15, 2021, 2:46pm

Thank you everyone for sharing!

When talking with some others about this problem before, @MononcQc brought up that there are different levels of configuration, and the different levels have different needs. The levels talked about were:

Compile-time - How should the code be compiled. This is today mostly solved by rebar3, erlang.mk.
Boot-time - This is today solved mostly via sys.config and vm args, which relx/cuttlefish/etc augments.
Run-time - Fetching config from external source at startup or going live-reconfiguration.

We also have different types of products that have different needs. For instance, configuring erlang_ls is very different from configuring a radio base station, which again is very different from configuring dialyzer.

I do not think that Erlang/OTP should include support for doing configuration using an external source as that very quickly becomes a very big problem with many different solutions. I also think that the best way to deal with live-reconfiguration is by doing it the way that logger does, i.e. the application should provide an API that exposes what can be reconfigured.

There are a couple of things with how we configure at boot-time today that I would like to change.

Be able to configure erts using sys.config.
Command-line arguments are great for some things, but I think many of the options that go into erts would be better configured through sys.config files.
Be able to validate configuration and provide user-friendly error reports.
Allow the application to do the merging of configuration variables, with some good configurable defaults.
Allow plugging in custom file parsers to parse the config file.

I did a prototype of this last year where you could do the following:

> erl -config sys.config -config sys.toml -erts break 'true' -fdconfig 3 3<<EOF
[{kernel,[{logger_level,alles}]}].
EOF
./sys.config:2 [erts.schedulers.normal.online] invalid value: a. Must be an integer.
./sys.config:3 [erts.schedulers.dirty.cpu.online] invalid value: 11.
  Number of online schedulers (11) must be less than or equal to available (8)
./sys.config:5 [erts.schedulers.bind_type] invalid value: default_binding.
  Valid values are: [default_bind,no_node_processor_spread,
                     no_node_thread_spread,no_spread,processor_spread,spread,
                     thread_spread,thread_no_node_processor_spread,unbound]
./sys.config:6 [erts.schedulers.topo] invalid key: topo.
  Valid keys are: [bind_type,dirty,forced_wakeup_interval,load,normal,
                   port_parallelism,topology,wake_cleanup_threashold,
                   wakeup_strategy]
./sys.toml: [erts.schedulers.dirty.io.online] invalid value: 10235. Must be 1 =< Value =< 1024.
command line: [erts.break] invalid value: true.
  Valid values are: [disable,ignore]
fd 3.config:1 [kernel.logger_level] invalid value: alles.
  Valid values are: [emergency,alert,critical,error,warning,notice,info,debug,
                     all,none]

Much of the validation logic is derived from the typespecs for the options, with some special code needed for when two configuration options depend on each other or some environmental thing.
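To give a feel for the kind of check involved, here is a hand-written sketch that produces messages in the same spirit as the output above. It is not the prototype: it uses an explicit schema map instead of typespecs, and the module name cfg_validate and the schema shape are assumptions.

```erlang
-module(cfg_validate).
-export([check/2]).

%% Schema: #{ {App, Key} => Spec } where Spec is
%%   {enum, [atom()]} | {integer, Min, Max}
%% Returns a list of human-readable error strings ([] means valid).
check(Config, Schema) ->
    [Msg || {App, Env} <- Config,
            {Key, Value} <- Env,
            {error, Msg} <- [check_value(App, Key, Value, Schema)]].

check_value(App, Key, Value, Schema) ->
    case maps:find({App, Key}, Schema) of
        error ->
            {error, fmt("[~p.~p] invalid key: ~p.", [App, Key, Key])};
        {ok, {enum, Allowed}} ->
            case lists:member(Value, Allowed) of
                true -> ok;
                false ->
                    {error, fmt("[~p.~p] invalid value: ~p. Valid values are: ~p",
                                [App, Key, Value, Allowed])}
            end;
        {ok, {integer, Min, Max}}
          when is_integer(Value), Value >= Min, Value =< Max ->
            ok;
        {ok, {integer, Min, Max}} ->
            {error, fmt("[~p.~p] invalid value: ~p. Must be ~p =< Value =< ~p.",
                        [App, Key, Value, Min, Max])}
    end.

fmt(Format, Args) ->
    lists:flatten(io_lib:format(Format, Args)).
```

Cross-option checks (such as online schedulers versus available ones) would still need the special-case code mentioned above.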

I stopped working on it because I did not like the solution I had come up with that allowed the user to create custom config parsers, and also the erts configuration validation was all done in Erlang code which meant that the startup times would suffer.

I’ve now recently started thinking about this again, which is the reason for this post. Configuration is an important part of any software, and I would like to make it easier for developers to provide a great configuration interface to their users (be they end-users or other developers).

seriyps · November 15, 2021, 5:21pm

For re-configuration at runtime (without restarting), I think the way today’s config_change/3 callback works might be more or less enough, but the problem with config_change/3 is that right now it is quite tightly coupled to application_controller, the release handler, or whatever. And there is no easy API provided to actually trigger config_change/3 from application code. Let’s say my app received some configuration update via HTTP or some kind of etcd/consul. Converting this request to the format that config_change likes would need some handwritten code to patch the application:env and to construct the arguments for config_change/3. When doing so, it feels like you are implementing hacks rather than using APIs that were designed for this type of usage. And then you know that if your app restarts, those changes will be overwritten by values from sys.config, and there is no easy way to add a hook that will re-fetch the updated config from the network or a disk cache.
Maybe partly because of that, not many open-source library applications implement config_change/3 callbacks (but maybe also because it’s often easier to just restart your app than to implement a live config update).
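The handwritten glue in question tends to look something like the sketch below: patch the env by hand, work out the Changed/New/Removed arguments, and call the application's callback module directly. The module name cfg_push and its functions are hypothetical; only application:get_all_env/1, set_env/3, unset_env/2, get_key/2 and the config_change/3 callback itself are OTP.

```erlang
-module(cfg_push).
-export([classify/2, apply_update/3]).

%% Split Updates into {Changed, New} relative to the current env;
%% entries whose value is unchanged are dropped.
classify(OldEnv, Updates) ->
    lists:foldr(
      fun({K, V} = KV, {Changed, New}) ->
              case lists:keyfind(K, 1, OldEnv) of
                  {K, V} -> {Changed, New};            % same value
                  {K, _} -> {[KV | Changed], New};     % value differs
                  false  -> {Changed, [KV | New]}      % not set before
              end
      end, {[], []}, Updates).

%% Patch the env by hand, then invoke config_change/3 ourselves.
apply_update(App, Updates, Removed) ->
    {Changed, New} = classify(application:get_all_env(App), Updates),
    lists:foreach(fun({K, V}) -> application:set_env(App, K, V) end,
                  Changed ++ New),
    lists:foreach(fun(K) -> application:unset_env(App, K) end, Removed),
    case application:get_key(App, mod) of
        {ok, {Mod, _StartArgs}} ->
            _ = code:ensure_loaded(Mod),
            %% The callback is optional, so only call it if exported.
            case erlang:function_exported(Mod, config_change, 3) of
                true -> Mod:config_change(Changed, New, Removed);
                false -> ok
            end;
        _ ->
            ok   % library application or application not loaded
    end.
```

As noted above, nothing here survives an application restart: the values from sys.config win again.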

afa · November 16, 2021, 9:57am

I just want to quickly chime in and say thank you for thinking about this, @garazdawi.

In VerneMQ we use Cuttlefish too, and I’m mostly happy with it. I like the per-application schema files that allow configuration over a central config file for the enduser.

But every aspect of this, from the end-user as well as the developer perspective, is so important… and taking complexity out of it is, I guess, always worth it.

Maria-12648430 · December 29, 2021, 3:57pm

I’m a bit late to the party, but here goes

I’m not too unhappy with the files, I must say. In terms of robustness, nothing beats files. They may be missing, unreadable or contain something invalid, but that is pretty much all that can go wrong.

for all of that

I don’t think this is a good idea TBH…

It might be nice for the first person deciding on a format if he can use the one he likes best, but it subsequently requires all people coming to the project later to know or learn that format.
People contributing to different projects may need to learn a different format for each project they’re involved in, like normal sys.config, but also toml, yaml, json, xml, ini (of which there is a myriad of variations), and things get worse if you add custom parsers.
In a nutshell: I would hate that

Also, I think the current sys.config format is suited very well to the language it is to be used in. It is regular Erlang, only with some limitations (eg no function definitions or calls).

What I would really like, however, is to be able to use maps instead of lists of KV-tuples, but that is mostly for aesthetic reasons; I like maps.

I haven’t used executable configs (or Elixir in general, for that matter), but it sounds both exciting and a bit dangerous

What I would like is an easy way to reload app configuration, outside of release upgrades, application-specific or even system-wide. I imagine something like application:reload_config(my_app), which would load the config the same way as it did on start, extract the part concerning my_app, and pass both the old and new app configs to an (optional?) application callback via which the application can perform any necessary validations or alterations and return the config to be used forthwith.

However, I realize that this poses a possible consistency/race problem, e.g. if another process retrieves two related config values one after the other, but the config changes in between.
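A rough sketch of the idea, to show how little is needed for the happy path. Nothing like application:reload_config/1 exists in OTP; the module cfg_reload, the explicit file argument, and the callback signature (fun(OldEnv, NewEnv) -> {ok, Env} | {error, Reason}) are all assumptions for illustration.

```erlang
-module(cfg_reload).
-export([reload_config/3]).

%% Hypothetical helper, NOT an OTP API.
%% Re-reads ConfigFile, extracts the part concerning App, and lets
%% Callback validate or alter it before it is applied.
reload_config(App, ConfigFile, Callback) ->
    {ok, [Config]} = file:consult(ConfigFile),
    NewEnv = proplists:get_value(App, Config, []),
    OldEnv = application:get_all_env(App),
    case Callback(OldEnv, NewEnv) of
        {ok, Env} ->
            %% Keys are written one at a time, so a concurrent reader
            %% can observe a mix of old and new values.
            lists:foreach(fun({K, V}) -> application:set_env(App, K, V) end,
                          Env),
            ok;
        {error, _Reason} = Error ->
            Error
    end.
```

The per-key writes in the success branch are exactly where the consistency problem shows up; this sketch does nothing to solve it, and it also never removes keys that disappeared from the file.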

starbelly · December 29, 2021, 6:07pm

A rebar script is pretty much the equivalent of an elixir script (.exs). The main difference I suppose would be something in standard lib to configure the system. The word executable here probably makes it sound a bit scary

You can read about it all here.

It aligns with what was mentioned above… compile-time, boot-time, and runtime.

garazdawi · December 30, 2021, 10:18am

I go back and forth regarding adding the possibility to configure using a custom config parser. I think that larger projects that get deployed by many users that do not really care about whether it is Erlang or not underneath (like ejabberd/MongooseIM, Riak, RabbitMQ, CouchDB etc) would benefit from being able to configure the system using a custom config parser. However, when the project gets to a certain size, the configuration becomes a major part of the system and many projects would most likely end up rolling their own anyway in order to get more control.

cmo · January 1, 2022, 6:21am

I think it’s a good idea to provide something like Elixir’s Config Providers. My software is run on clients machines so I provide a config.toml as I don’t want them having to grok anything Elixir/Erlang specific, or to break something by messing with the built-in configs.

garazdawi · January 3, 2022, 8:03am

Yes, something like the Elixir config providers is what I had in mind for reading custom configs. One problem with the Elixir Config Providers (as far as I know) is that you cannot configure erts with them, which is something that I would really want to be able to do with the custom config. However, the only way I have come up with so far to solve that is to do something very similar to what cuttlefish does. That is, start a separate VM to parse the config that needs to go into the release.

Maybe configuring erts is not as critical as I think for most use cases so that scenario can possibly be ignored and then things become a lot simpler, though not trivial.

tsloughter · January 6, 2022, 12:47pm

I think configuring erts is the most important part for a new configuration setup.

I have different priorities than others of course. I don’t think I’ve ever worked on a system (or at least not on one that ever made it anywhere, hehe) that needed configuration in anything besides sys.config and vm.args, because there weren’t end users running it.

But it is that separation between sys.config and vm.args that can be annoying, and at times confusing to new users.

Another aspect is the parsing we have to do in relx’s start script (extended_bin in erlware/relx on GitHub).

Some of this is now only needed for remote_console and maybe that can be resolved in a way separate from configuration changes – similar to how erl_call has removed the need for all those dist args to what we were doing before with nodetool for rpc, so something like erl_remote to open a shell directly without it needing to check epmd?

Oh, a new one is https://github.com/erlware/relx/blob/main/priv/templates/extended_bin#L679-L696. It parses out the IP from an Erlang tuple, because the only way to configure inet_dist_use_interface is a tuple IP, but we need the IP as a string to use as an address for erl_call.
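For contrast, inside a running VM the same conversion is a single call to inet:ntoa/1; the awkward part is that the start script has to do it in shell before any VM exists. A small sketch (the module name dist_ip is hypothetical):

```erlang
-module(dist_ip).
-export([to_string/1]).

%% Convert the tuple form used by inet_dist_use_interface into the
%% string form that command-line tools expect.
to_string(IpTuple) ->
    case inet:ntoa(IpTuple) of
        {error, einval} -> error({bad_ip, IpTuple});
        Str -> Str
    end.
```

inet:parse_address/1 goes the other way, from a string to a tuple.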

Regarding format I understand the need for some to use something besides Erlang terms, but it actually makes things a lot simpler that we have the ability to configure in Erlang terms, unlike many languages where it isn’t an option to use their data structures in configuration. Because what you work with in the code after reading in any configuration is Erlang terms.

But I do wish there were better ways to integrate dynamic runtime configuration (like replacing OS environment variables in sys.config.src). The main issue I see is you can easily have an unparsable config file, even if you set the environment variable correctly, because we can’t verify it before packing the release. Like a sys.config.src,

[{some_app, [{config_key, ${OS_VAR}}]].

A missing }, a missing , etc aren’t caught until runtime. You can of course use sys.config which is verified but then all your dynamic values have to be strings, and back to parsing non-Erlang term configuration:

[{some_app, [{config_key, "${OS_VAR}"}]}].
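One partial mitigation is to dry-run the template at build time: substitute sample values for the placeholders and check that the result still parses as a single Erlang term. The sketch below does that with erl_scan and erl_parse; the module name cfg_src_check is hypothetical, and the ${VAR} substitution is a simplified stand-in for whatever the release start script really does.

```erlang
-module(cfg_src_check).
-export([check/2]).

%% Substitute ${VAR} placeholders in Template using Vars (a map of
%% "VAR" => "replacement text"), then try to parse the result as one
%% Erlang term terminated by a dot.
check(Template, Vars) ->
    Filled = maps:fold(
               fun(Name, Value, Acc) ->
                       string:replace(Acc, "${" ++ Name ++ "}", Value, all)
               end, Template, Vars),
    Text = unicode:characters_to_list(Filled),
    case erl_scan:string(Text) of
        {ok, Tokens, _EndLoc} ->
            case erl_parse:parse_term(Tokens) of
                {ok, Term} -> {ok, Term};
                {error, Info} -> {error, describe(Info)}
            end;
        {error, Info, _ErrLoc} ->
            {error, describe(Info)}
    end.

%% Turn an error info tuple into {Location, Message}.
describe({Loc, Mod, Desc}) ->
    {Loc, lists:flatten(Mod:format_error(Desc))}.
```

With a sample value such as "42", the first template above (which, as written, is one } short) is rejected, while a balanced version is accepted. It only proves the template parses for the sample values, not for whatever ends up in the environment at runtime.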

I don’t know what to do about any of this but my hope was first there would be a combined sys.config and vm.args that was done on startup without the need for a full VM to first boot and parse like cuttlefish. And then a tiny toml parser in C could be inserted in there too or something

Last, I was going to bring up the configuration we have to do in OpenTelemetry (otel_configuration.erl in open-telemetry/opentelemetry-erlang on GitHub) to combine OS env variables and application env variables into a single configuration.

But, hmm, I just realized having a general solution for what we do in otel_configuration could potentially remove the need for having environment variables in sys.config…
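A generic version of that layering could be as small as the sketch below. It is not the otel_configuration code: the module cfg_layers, the mapping format [{Key, EnvVarName, Type}], and the choice to let OS variables override the application env are assumptions for illustration.

```erlang
-module(cfg_layers).
-export([merge/2]).

%% Mapping: [{Key, EnvVarName, Type}]. Start from the application env
%% and let any OS environment variable that is set override it.
merge(App, Mapping) ->
    AppEnv = maps:from_list(application:get_all_env(App)),
    lists:foldl(
      fun({Key, VarName, Type}, Acc) ->
              case os:getenv(VarName) of
                  false -> Acc;                          % variable not set
                  Str -> Acc#{Key => convert(Type, Str)}
              end
      end, AppEnv, Mapping).

%% OS variables are always strings, so each key declares its type.
convert(string, S) -> S;
convert(integer, S) -> list_to_integer(S);
convert(boolean, "true") -> true;
convert(boolean, "false") -> false.
```

Because the conversion happens on typed strings after the VM is up, sys.config itself can stay a plain, verifiable Erlang term.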

I’ll stop the ramble here, hopefully some of that made sense.

igorclark · January 6, 2022, 2:54pm

Regarding format I understand the need for some to use something besides Erlang terms, but it actually makes things a lot simpler that we have the ability to configure in Erlang terms, unlike many languages where it isn’t an option to use their data structures in configuration. Because what you work with in the code after reading in any configuration is Erlang terms.

100% agree. There’s often a tendency when it comes to config (and other stuff tbh) to do things in certain ways because other languages or systems do it that way, but if they’re doing it like that because they don’t have the option to do something better (e.g. because their language doesn’t have a native term notation), then Erlang doesn’t need to jump on board.

josevalim · January 6, 2022, 4:06pm

I would also add it is important to document and make the configuration type (compile, boot, or runtime) programmatically accessible. This would allow tools to check and potentially raise/warn in case of conflicts.

For example, Elixir has a rudimentary warning if someone tries to change a compile-time configuration after the code has been compiled or a release has been assembled. But it is still quite limited.

garazdawi · January 7, 2022, 7:51am

I too like that we configure systems using Erlang terms (and I think we should keep it as the main way to configure systems), but any user of a product that does not know Erlang will naturally find it a bit foreign. Though people seem to use json to configure all types of systems nowadays and that is arguably an even worse format…

One idea I had at some point was to write an Erlang term parser in C which could parse sys.config for erts. Then if you provided sys.toml, erlexec would automatically start a separate VM that parses that config into a sys.config format. So you only pay the cost of starting a separate VM if you actually use a non-native config.

tsloughter · January 7, 2022, 12:13pm

Why not a small C TOML parser (cktan/tomlc99 on GitHub, a TOML C library) instead of starting the VM?

eproxus · January 7, 2022, 12:51pm

This is what I would prefer. Even better would be if the parsing tool could be any executable and pass the resulting config to the Erlang VM via an argument (i.e. so that it doesn’t leave a stale sys.config lying around). Then one could use any format that has an external converter (Erlang-based or not).

garazdawi · January 7, 2022, 1:09pm

Yes, that would probably work as well.