Args in gen_*:init/1

I have a rule that I’ve followed for so many years that I cannot remember its origins: the Args argument to the init/1 function of a gen_* behavior module should always be a list. I realize that Args has a term() type spec; however, I know there used to be some feature interaction where it was assumed to be a list. Can anyone recall what that might be, or have been? Today I’m inclined to use a map().

For supervisors, when you use simple_one_for_one it needs to be a list, because spawning a new child joins that list onto the arguments already given: in the supervisor you define some static base args, and when you spawn later you can add dynamic extras.

Otherwise I think you can just use anything.

Yes, that’s it, thank you. Of course I knew that but I wasn’t making the connection to gen_server this morning.

@jimdigriz I think that is wrong, or misleading, or mixing together unrelated things. I might have missed or misunderstood something, though :sweat_smile:

The argument given to init/1 can always be anything (personally, I prefer tuples these days). It is solely the responsibility of the respective start/start_link/start_monitor call to provide the argument in the right form.

As far as (starting from) supervisors are concerned, in a simple_one_for_one supervisor the second argument to start_child/2 must be a list (for other supervisor types, it must be a child spec). The elements of that list are appended to the list of (fixed) arguments specified in the start {M, F, A} in the child_spec as returned from the supervisor’s init function. When a child is started via start_child(Sup, StartChildArgs), the call is made like apply(M, F, A++StartChildArgs).

For example, if your simple_one_for_one supervisor has something like start => {my_module, start_link, [a, b]} in the child_spec, a call like supervisor:start_child(Sup, [c]) will lead to Sup calling something like apply(my_module, start_link, [a, b, c]), equivalent to a call like my_module:start_link(a, b, c).

(See also the docs for supervisor:start_child/2).
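The argument appending described above can be tried out with a small, self-contained sketch. Everything named here (the module sofo_demo, the functions start_worker/3 and args_of/1, and the plain spawn_link worker) is hypothetical and chosen only to keep the example in one module; a real child would normally be a gen_server or another proc_lib-based process.

```erlang
-module(sofo_demo).
-behaviour(supervisor).

%% Hypothetical demo module: supervisor and a trivial worker in one file.
-export([start_link/0, start_worker/3, args_of/1]).
-export([init/1]).

start_link() ->
    %% A single, non-list term is passed down to init/1.
    supervisor:start_link(?MODULE, no_arguments).

init(no_arguments) ->
    SupFlags = #{strategy => simple_one_for_one},
    %% The fixed arguments [a, b] live in the child spec.
    ChildSpec = #{id => worker,
                  start => {?MODULE, start_worker, [a, b]},
                  restart => temporary},
    {ok, {SupFlags, [ChildSpec]}}.

%% Called by the supervisor as apply(?MODULE, start_worker, [a, b] ++ Extra).
start_worker(X, Y, Z) ->
    Pid = spawn_link(fun() -> worker_loop({X, Y, Z}) end),
    {ok, Pid}.

%% Ask a worker which arguments it was started with.
args_of(Pid) ->
    Pid ! {args, self()},
    receive
        {Pid, Args} -> Args
    after 1000 ->
        timeout
    end.

worker_loop(Args) ->
    receive
        {args, From} ->
            From ! {self(), Args},
            worker_loop(Args)
    end.
```

With this sketch, supervisor:start_child(Sup, [c]) should start a worker whose arguments are a, b and c, which sofo_demo:args_of/1 reports back as the tuple {a, b, c}.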

This is exactly why I have the exact opposite rule as @vances’ one:

Never use lists in the Args/InitArgs parameter for gen_*:start[_link](…) / init/1.

The erlang docs provide an example that includes…

start_link() ->
    gen_server:start_link({local, ch3}, ch3, [], []).

…and…

init(_Args) ->
    {ok, channels()}.

And I think that example, and other similar pieces of documentation, are misleading at best. Mainly for two reasons:

  1. Using an empty list if you don’t need to pass anything to init/1 may lead people to believe that you’re passing 0 arguments to the init function (because sometimes arguments are passed down to some functions, like supervisor:start_child/2, as lists). But that’s not true. With an empty list, you’re passing a single argument to init/1, as usual. It’s just that in this case it’s a list. If you don’t need to pass any external values down to init/1, I recommend using an empty map (#{}), an empty tuple ({}), the atom undefined, or something even more explicit, like the atom no_arguments.
  2. Calling the single parameter of init/1 Args. It’s not wrong, but again it may lead people to believe that since it’s written in plural, it has to be a list of arguments. When in reality it’s a single argument.

Using this example code to build servers has led more people than I can count to write stuff like this…

-module(my_server).
% …exports and everything else…
start_link(AParam) ->
  gen_server:start_link(?MODULE, [AParam], []).

% Here AParam is bound to the one-element list passed above, not to the original value
init(AParam) ->
  …

This code compiles, and then fails mysteriously when it’s executed. It has produced many headaches over the years since that example was written.
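For contrast, here is a minimal sketch of the same server with the mismatch removed: the value is passed to gen_server:start_link/3 as is, so init/1 receives exactly that term. The get_param/1 function and the choice to keep the parameter as the whole server state are assumptions made only so the result can be observed.

```erlang
-module(my_server).
-behaviour(gen_server).

-export([start_link/1, get_param/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(AParam) ->
    %% AParam is passed unchanged; no wrapping list.
    gen_server:start_link(?MODULE, AParam, []).

%% Hypothetical accessor, only here to make the state visible.
get_param(Server) ->
    gen_server:call(Server, get_param).

%% init/1 receives the single term given to start_link/3.
init(AParam) ->
    {ok, AParam}.

handle_call(get_param, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

If several values are needed, they can travel together in one tuple or map, for example gen_server:start_link(?MODULE, {A, B}, []) matched by init({A, B}).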

Uh, what is? :sweat_smile:

I dislike using a list as the init argument.
The main reason is the size of the git diff when I add a new parameter.

For the last few years I have preferred to pass the record that is used as the state:

start_link(A, B, C) ->
  State0 = seed_state(A, B, C), % Validate, parse and store arguments for later use
  gen_server:start_link(?MODULE, State0, []).

init(#worker{interval = T} = State0) ->
  State = State0#worker{timer = erlang:start_timer(T, self(), do_thing)},
  {ok, State}.

This has multiple advantages for me:

  • Initial validation is performed in caller context
  • Clean code (parameter parsing) and non-clean (initialization) are separated
  • Adding a parameter used in one specific case does not modify the gen_server:start_link call and the init/1 header, thus cleaner git history
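A self-contained sketch of the record-as-state approach might look like the following. The module name interval_worker, the record fields, the single Interval parameter, and the validation rule inside seed_state/1 are all assumptions made for illustration; the original post does not show them.

```erlang
-module(interval_worker).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical state record; the original post does not show its fields.
-record(worker, {interval, timer}).

start_link(Interval) ->
    %% Validation runs in the caller, before any process is spawned.
    State0 = seed_state(Interval),
    gen_server:start_link(?MODULE, State0, []).

%% Assumed validation rule: the interval is a positive number of milliseconds.
seed_state(Interval) when is_integer(Interval), Interval > 0 ->
    #worker{interval = Interval};
seed_state(Interval) ->
    error({bad_interval, Interval}).

%% Only the side-effecting part of the setup happens in the new process.
init(#worker{interval = T} = State0) ->
    {ok, State0#worker{timer = erlang:start_timer(T, self(), do_thing)}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Re-arm the timer each time it fires.
handle_info({timeout, TRef, do_thing},
            #worker{timer = TRef, interval = T} = State) ->
    {noreply, State#worker{timer = erlang:start_timer(T, self(), do_thing)}};
handle_info(_Other, State) ->
    {noreply, State}.
```

Adding another field to the record later would change seed_state/1 and the code that uses the field, while the gen_server:start_link/3 call and the init/1 head stay the same.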

The confusion between the single init/1 argument (sometimes being a list) and the lists of arguments used elsewhere (for instance on supervisor:start_child/2).

I always thought it was because [] is the smallest (and simplest) term possible: it’s one word. Or maybe it’s just a convention. I have used lists, tuples, and maps too, but never benchmarked them.

I stand corrected, that actually wasn’t it, although I often wish it were. I use simple_one_for_one supervisors often and usually want to add extra arguments to init/1; however, that’s not how it works. Instead I have to under-specify mfargs() in the child spec as {Module, Function, []} and later provide all the arguments with supervisor:start_child(Sup, [Args]).

If you’re building with OTP, which I always am, you think in terms of the arguments being those in start_link/3,4; however, if that were the case, we’d need to have specified the gen server type somewhere else. Maybe the supervisor should support OTP children explicitly?

I suspect confusion over this point is indeed why I established the Args :: list() rule so many years ago. I accept @elbrujohalcon’s point that maybe it just adds to the confusion!

This thread prompted @juhlig and me to take a closer look at the current supervisor documentation and try our hand at some improvements: “Improve `supervisor` specs and docs”, erlang/otp pull request #8015 by Maria-12648430*.
Comments welcome :raising_hand_woman:

* Disclaimer: The PR does not solve any of the causes for the confusion that has been discussed in this thread. The causes for that lie deeper.

I can’t remember where I heard this – it might have been in my early Erlang days on the mailing list – but I believe that passing an empty list to functions that mandated taking an argument was something done historically to save space.

The empty list ([]) is a bit of a null element and was likely smaller. You can see the old efficiency guide (pretty much similar to the current one) on the Wayback Machine, and you’ll see it mentions lists being 1 word per element + the size of each element.

Other copies of the efficiency guide state things such as:

According to the myth, recursive functions leave references to dead terms on the stack and the garbage collector will have to copy all those dead terms, while tail-recursive functions immediately discard those terms.

That used to be true before R7B. In R7B, the compiler started to generate code that overwrites references to terms that will never be used with an empty list, so that the garbage collector would not keep dead values any longer than necessary

Another set of assumptions came from mentally translating how one would build a linked list in C or C++: if the trailing element is null, you have an empty list, so one assumes the Erlang empty list also has size 0 (you can reinforce that impression today by calling erts_debug:flat_size([]), which returns 0).

Digging into the current erlang/otp implementation, though, we can see the following:

  1. the type definitions for lists and nil both are represented by an integer (the list is a tag, the nil is an immediate)
  2. a bunch of functions/macros to work with nils are also defined
  3. checks for empty lists are defined as calls to is_nil, at least in the NIFs.
  4. More interestingly though, checks for whether something is a list are defined as either a check to a list type or a nil type.

Anyway, this sort of cursory look and glance lends some credence to “this is just how linked lists are implemented”.

Doing a similar spot check on tuples (which by the way also have an erts_debug:flat_size({}) == 0) yields a different result. In the same erl_term.h files, tuples are dubbed arityval and share some structure with maps, but they’re more or less a boxed term including a tag, then a reference to a length, and if you want to know the tuple is empty, you have to check that length. This is arguably more work than checking for an immediate NIL value.

Anyway, picking the empty list in the early days of Erlang was probably an attempt to find the cheapest way to call for “a null value that is also not confused with a semantic one”:

  • it is/was smaller than an atom (no ref lookup)
  • it is/was smaller than an empty tuple (and faster to check)
  • it has/had similar benefits compared to numbers, but also can be expanded to contain elements

These were likely more valid reasons back then than they would be today, with records, tuples, or maps available, to use an empty list for a lack of arguments and then simply keep using a list as the container once you did have arguments.

It’s not that the list is significant anymore, but back in the day, [] could be understood as a more direct/light/faster null value. As computers got bigger and as Erlang got faster and more practical, the implementation details got rightfully ignored more and more by Erlang users and now you look at the empty list and go “what gives?” but the practicalities back then drove different decision-making.

Ah I forgot.

You can see this sort of decision-making (“lists are smaller and lighter”) be used in the implementation of older data structures like the dict module, where rather than using a {Key, Val} pattern, the whole thing was set up to use [Key | Val] – which often yielded improper lists, or cons cells – as a practice almost certainly borrowed from lisps:

1> dict:from_list([{1,2}, {a,[b,c,d]}, {3,4}, {[1,2,3],[]}]).
{dict,4,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],
        [[a,b,c,d],[3|4]],
        [],[],[],
        [[[1,2,3]]],
        [],[],[],[],[],
        [[1|2]],
        [],[],[],[]}}}

The dict as defined in that module was a sort of tuple-based set of buckets (with some metadata in the first elements), each bucket storing many key/value pairs, and each key-value pair as a cons cell is visible in the output above ({1,2} is [1|2], {a,[b,c,d]} is [a|[b,c,d]] or just [a,b,c,d], etc.).
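To make the cons-cell idea concrete, here is a minimal sketch of a single bucket that stores its pairs as [Key | Val] rather than {Key, Val}. It is not the dict module's actual code; the module name cons_pairs and the functions store/3 and find/2 are hypothetical.

```erlang
-module(cons_pairs).
-export([store/3, find/2]).

%% A bucket is a list of [Key | Val] cons cells; the empty bucket is [].

%% Replace the value when the key is already present, else append a new cell.
store(Key, Val, [[Key | _] | Rest]) ->
    [[Key | Val] | Rest];
store(Key, Val, [Pair | Rest]) ->
    [Pair | store(Key, Val, Rest)];
store(Key, Val, []) ->
    [[Key | Val]].

%% Look a key up by matching on the head of each cons cell.
find(Key, [[Key | Val] | _]) ->
    {ok, Val};
find(Key, [_ | Rest]) ->
    find(Key, Rest);
find(_Key, []) ->
    error.
```

Storing 1 => 2 and then a => [b,c,d] in an empty bucket yields [[1|2],[a,b,c,d]]: the first pair prints as an improper list, while the second looks like an ordinary list because its value happens to be a proper list.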

These have the advantage of once again easily defining both null and non-null values with a similar semantic type while being lighter and faster in use.

We tend to no longer bother with this, even less so with maps and optimizations being driven differently nowadays, but the benefits of small savings like that could compound back in the day.
