Using links/monitors to return the result of a short lived process/function

RoadRunnr · March 31, 2023, 8:37am

I’ve been copying a “trick” from the OTP libraries that uses links or monitors to return the result from short lived processes to the parent. Recently a colleague complained that this “violates” the purpose of monitors/links, that those should only be used for life cycle events and not for returning results, and that I should use plain message sending instead.

What do you think about the following code: is it a “proper” use of a monitor, or does it violate some Erlang mantra/concept/spirit?

Note: the blocking call from the example could also be done during init when enter_loop is used. However, that would mean that any request to the process while the init has not finished would block and that is not acceptable for my use case.

-module(async).

-behaviour(gen_statem).

[...]

callback_mode() -> [handle_event_function].

init([]) ->
    Now = erlang:monotonic_time(),
    {_, Mref} = erlang:spawn_monitor(fun() -> exit(blocking_call()) end),
    {ok, init, #{init => Now, startup => Mref}}.

handle_event({call, From}, get_state, State, _Data) ->
    {keep_state_and_data, [{reply, From, State}]};

handle_event(info, {'DOWN', Mref, _, _, ok}, init, #{init := Now, startup := Mref} = Data) ->
    ?LOG(info, "~s started in ~w ms",
         [?MODULE, erlang:convert_time_unit(erlang:monotonic_time() - Now, native, millisecond)]),
    {next_state, running, maps:remove(startup, Data)};

handle_event(info, {'DOWN', Mref, _, _, Reason}, init, #{startup := Mref}) ->
    ?LOG(critical, "~s failed with ~0p", [?MODULE, Reason]),
    {stop, {shutdown, Reason}};

[...]

blocking_call() ->
    %% do something that blocks and return a result
    ok.

elbrujohalcon · March 31, 2023, 9:47am

I think it’s not terribly bad, although your example has too much boilerplate. Hopefully, this tinier example represents the same idea, too…

{_, MRef} =
  spawn_monitor(
    fun() -> Result = do:something(), exit({ok, Result}) end),

receive
  {'DOWN', MRef, process, _, {ok, Result}} -> Result;
  {'DOWN', MRef, process, _, Error} -> exit(Error)
end.

The problem I see with that approach is that for the Erlang VM, your process exited abnormally even if it returned {ok, …}. That’s because the only reason that’s treated as a normal exit from the VM’s perspective is normal, IIRC.
That may not be an issue, but you need to be aware of it because it might have some unforeseen consequences, such as:

- Every process that’s linked to the one you spawned will crash as well if it’s not trapping exits. That might actually be your intention or you might be sure not to link to any other process from within blocking_call() and its auxiliary functions.
- SASL, the logger, or other tools may emit unwanted reports for your dying processes. But I think this is less of a concern if you use exit/1 instead of error/1.
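The first point can be checked with a small sketch. The module name `link_fate`, the function `linked_fate/1`, and the 500 ms timeout are all made up for illustration; the "bystander" stands in for any non-trapping process that happens to be linked to the short-lived worker.

```erlang
-module(link_fate).
-export([linked_fate/1]).

%% Returns {died, Reason} if a non-trapping process linked to a worker that
%% exits with WorkerReason is taken down with it, or the atom survived if it
%% is still alive after an (arbitrary) 500 ms wait.
linked_fate(WorkerReason) ->
    {Bystander, MRef} =
        spawn_monitor(
          fun() ->
                  %% Link to a worker that exits immediately.
                  spawn_link(fun() -> exit(WorkerReason) end),
                  %% Block until told to stop; exits are not trapped here.
                  receive stop -> ok end
          end),
    receive
        {'DOWN', MRef, process, Bystander, Reason} ->
            {died, Reason}
    after 500 ->
            %% No 'DOWN' within the timeout: the bystander is still alive.
            demonitor(MRef, [flush]),
            exit(Bystander, kill),
            survived
    end.
```

Under these assumptions, `link_fate:linked_fate({ok, 42})` should report that the bystander died with reason `{ok, 42}`, while `link_fate:linked_fate(normal)` should report `survived`.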

mikpe · March 31, 2023, 12:07pm

You want to exit({normal, Result}) from the nested process.

We use this pattern enough that we’ve packaged it up in one of our standard libraries. Our main motivation is to avoid having to scan potentially long message queues during the interactions between the spawning and spawned processes.
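A packaged version might look roughly like the sketch below. This is not the library mentioned above; the module name `short_lived` and the `{ok, _} | {error, _}` return shape are assumptions, and the `{normal, Result}` tag is simply taken from the suggestion in this post.

```erlang
-module(short_lived).
-export([run/1]).

%% Runs Fun in a freshly spawned, monitored process and hands back its
%% return value via the exit reason carried in the 'DOWN' message.
run(Fun) when is_function(Fun, 0) ->
    {Pid, MRef} = spawn_monitor(fun() -> exit({normal, Fun()}) end),
    receive
        %% The worker ran to completion and wrapped its result.
        {'DOWN', MRef, process, Pid, {normal, Result}} ->
            {ok, Result};
        %% Anything else: the worker crashed or was killed from outside.
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

For example, `short_lived:run(fun() -> 1 + 2 end)` would return `{ok, 3}`, and a fun that calls `error(badarg)` would come back as `{error, {badarg, Stacktrace}}`.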

max-au · April 3, 2023, 2:48am

I’d even argue that spawn_request pretty much leverages this approach.

phild · April 3, 2023, 8:12am

There is so much to OTP I never knew existed!

RoadRunnr · April 3, 2023, 8:34am

Actually, an EXIT is only picked up by supervisors and OTP behaviours if there is a link between the two processes (AFAIK). Plain Erlang processes will not generate SASL reports for EXITs (only for errors).

The sample uses spawn_monitor, so no link will be created and you will not get any SASL report regardless of the reason passed to exit.

Maria-12648430 · April 3, 2023, 8:34am

… and IMO he is right, that is not the purpose of monitors and links.

While there is nothing strictly speaking against this way of doing things, I wouldn’t recommend it unless you have a good reason. It is pretty much different from how everybody else does it. IMO you should stick to the usual ways that everybody understands.

This way of returning results also has some pitfalls that normal messaging simply doesn’t have, which you will either have to accept/ignore or provide for, for no other reason than employing this trick instead of normal messaging. Aside from what @elbrujohalcon already mentioned, one thing that comes to mind is that if someone calls exit(ThatProcess, ok) on your spawned process, you will also receive a 'DOWN' message with reason ok. And there may be more.

Finally, let me point out that using a monitor (vs a link) to watch the spawned process has a downside: while the gen_statem process will notice if the spawned process dies, the spawned process will not notice if the gen_statem process dies, and will run through to the end. In the best case, that is work needlessly done.

Maria-12648430 · April 3, 2023, 8:36am

… and if a process linked to the spawned process crashes, it will take it down with it, with the same exit reason.

1> {P1, _}=spawn_monitor(fun() -> timer:sleep(infinity) end).
{<0.85.0>,#Ref<0.367802681.2026897409.196202>}
2> exit(P1, foo).
true
3> flush().
Shell got {'DOWN',#Ref<0.367802681.2026897409.196202>,process,<0.85.0>,foo}
ok

4> {P2, _}=spawn_monitor(fun() -> timer:sleep(infinity) end).
{<0.90.0>,#Ref<0.367802681.2026897409.196228>}
5> spawn(fun() -> link(P2), exit(bar) end).
<0.92.0>
6> flush().
Shell got {'DOWN',#Ref<0.367802681.2026897409.196228>,process,<0.90.0>,bar}
ok

Maria-12648430 · April 3, 2023, 8:47am

There, there *pats*

RoadRunnr · April 3, 2023, 8:51am

Well, you shouldn’t link to that process in the first place. The whole purpose of the spawn_monitor is not to use links. If you wanted to have links, you could have used spawn_link in the first place, but that would defy the purpose of the code.

RoadRunnr · April 3, 2023, 8:57am

@Maria-12648430 the shown approach is not intended for long-running processes doing complex computation (where you would waste the effort) or situations where someone else could send you messages (if you don’t export the Pid, then no one can send you an exit signal).

jimdigriz · April 3, 2023, 9:12am

Unfortunately there are some languages (looking at you Python and Ruby) that tend to overly attract people that get stuck up on the “purity” of code.

There are so many things that are wrong with this attitude, but one of its effects is the lobbying that there is one ‘correct’ way to do something, and everything else is wrong. I think this probably arose from some oversimplification of “understandable”, a debate for another time.

Most other (non-straight jacketing, “you must conform or be exiled”, …) languages tend to let you discover and iterate on something to find something that works particularly well for your problem. ‘Well’ is something you and your colleagues get to decide and not by someone who is not in the room with you.

What is important is that ‘understandable’ element, fortunately there are many ways to do this. Some of that is commenting the ‘why’ and not the ‘what’, some of that is by not nesting ten levels of case ... of ... end statements, some of that is “do not use libraries that require learning something new for no tangible reason”, etc etc

Programming is a two part thing of solving problems and communicating ideas. Sure you can communicate ‘ideas’ through enforcement around the conformity of the solutions used (not the same as Coding Style guide) but the innovation you are going to see is likely to be on par with the output of a “designed by committee” government bureaucratic process…yikes!

By (extreme) example, why do non-Perl coders hate Perl? Is it the language, or the output of how it is used? I suspect it attracts people like me that think golfing belongs in production; sure, it is fun, but it tends to both tank the understandability of the code and, worse, set you up for being unable to understand your own code two weeks from now!

As already shown later in this thread your solution already matches something OTP recently formalised. So your take away from this should be to retort with “please explain why not using your feelings/opinions/some-blog-post-you-read”.

Maria-12648430 · April 3, 2023, 9:49am

That depends. I’m just pointing out the fact.

What I’m trying to say is, if there is a safe way to do things (normal messaging), why resort to a way laden with implications, even if your (current) code does not hit upon the unsafe parts? Why use a way to do things that, while it works, everybody else (and even you, in a few weeks) will have some difficulty to understand? In other words, what will you gain by doing that, except maybe saving a few lines of code?
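For comparison, the “normal messaging” variant could be sketched as follows. The module name `msg_result` and the return shape are hypothetical; the result travels in an ordinary message tagged with a fresh reference, and the monitor is used only to notice that the worker died before replying.

```erlang
-module(msg_result).
-export([run/1]).

%% Runs Fun in a monitored process; the result is sent back as a plain
%% message tagged with a unique reference.
run(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Tag = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Tag, Fun()} end),
    receive
        {Tag, Result} ->
            %% Got the reply; drop the monitor and any pending 'DOWN'.
            demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before it could send a reply.
            {error, Reason}
    end.
```

Here the worker exits with reason normal on success, so the exit reason carries only life cycle information, at the cost of a second reference and an explicit demonitor.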

Maria-12648430 · April 3, 2023, 10:00am

I’m not sure I grasp what you’re trying to say, but well =^^=

I’m not saying that there is one correct way of doing things. I’m saying there are ways that are safe and that people will understand intuitively, and there are other ways to do things correctly (and there are ways to do it wrong, for completeness’ sake). If you go for one of the other ways, you should have a reason to do so, and that should not be just “because I know this special trick, look at me” (don’t get me wrong here, @RoadRunnr, I’m not pointing at you). And, as you say yourself, it should be documented why this very special way of doing things is used.

I’ll not comment on “there is more than one way to do it ~~wrong~~” Perl

They didn’t “formalize” it, in the sense of “making it everyday use” or encouraging it. They are using it, presumably fully understanding the benefits and implications. Which underlines my point: if you know what you’re doing (and not just think you do), by all means do it (and document it). If you don’t, don’t.

jimdigriz · April 3, 2023, 10:34am

“Citation required”.

Nitpicking, looking at the manpage I see nothing there that says “thou shall not” and “the only and singular purpose of monitor/links is…”

You cannot tell someone “do the thing everyone else understands” without describing it.

By this I mean, I think we all understand Erlang passes messages around and some of those messages are special by being tagged with DOWN or EXIT, but not really sure what you are proposing here that is tangibly different?

I find it is more confusing to abstract and hide needlessly behind many layers something; unfortunately this seems to be a systemic problem in the programming community and akin to burying your head in the sand.

Do understand, I use ‘needless’ here as some metric that everyone gets to stake out in the sand themselves and is dependent on their circumstances and is something that is likely to be different for each team/individual.

As an example, a DNS server I am working on, because of the limitations of ETS, needs to punt transactional changes elsewhere. This means I have to copy the entire existing zone, make changes to it and then arrange for it to be atomically swapped so everything else querying it can see the same view in lockstep with the serial number.

Fortunately for me there is only a single transaction running at most because the problem space I operate in allows for it…this lets me take some liberties.

These assumptions let you take liberties that directly impact what is appropriate for the problem you are solving.

To exaggerate my argument, which is more understandable?

- monitors
  - spawn_monitor
  - watch for ets:give_away/3 followed by the DOWN message
  - this requires state/handle_info tracking and is all done in one gen_something
  - may require one extra separate module for when your spawned function is complex
- supervisor and children - assuming this is what you mean by ‘everyone understands’?
  - create a supervisor that is started before my main process to spawn processes into
  - create an ‘expected’ API to start the transaction in that supervisor
  - make sure the child runs as a ‘singleton’ (i.e. no more than one at a time)
  - involves at least an extra supervisor, a child gen_something in addition to your main module
    - code that someone has to read before using to make sure it is ‘expected’
  - …and you still do not get to avoid the arguably most difficult part in all this of tracking this all in your main process state and its callbacks

My point here is telling someone “do not do this” without knowing what liberties they can take will always result in an over-engineered solution that is too complex for purpose.

Tagging your messages, which I think is ‘expected’ and recommended (by that I mean you learn the hard way pretty fast), with something like {?MODULE,...}, or passing in a reference you expect to see returned, makes this a non-problem, surely?

If something is spoofing your tagging, then you have a different problem.

I think you just described “using a monitor”?

I see this as being no different to if I instead monitored a child process under some other supervisor…just maybe you can now abstract it away and do some more head/sand burying and call it ‘OTP-esque’?

The only downside (with both) I see is you have to figure out how to link() back to your main process but in a way that if you die unexpectedly that you do not want to kill the main process; particularly hairy if you want to avoid process_flag(trap_exit, ...) for some local site reason (ie. assumptions/liberties).

For me, I get the liberty that my DNS zone changes are really cheap, so if the main process dies, I do not care if it runs to completion needlessly and the results are discarded. After all, I am working with transactions, so they are by definition expected to do needless work.

Similarly, by an exaggerated example, does anyone care if we do work and then cast messages back over a distribution link? If you are using cast then you have already decided that your environment does not care if the result is ever seen, right?

Problems only arise when you do not regularly revisit those assumptions and decide if the liberties taken still hold. I think people incorrectly call this ‘technical debt’, when I see it as “nice high level problem to have” and all code was written for and with good reason at the time.

jimdigriz · April 3, 2023, 10:53am

“You say {'DOWN', MonitorReq, process, Pid, Reason}^Wpotato I say {ReplyTag, ReqId, error, Reason}^Wpotatoe”

“You must be at least {this} clever/experienced to use this…”

Sounds like a straight jacket.

Some of the best stuff I have done comes with the knowledge I gained through the (sometimes spectacular) mistakes I have made.

phild · April 3, 2023, 12:19pm

Not only that, but it is the subject of such lively debate!

vances · April 3, 2023, 2:32pm

Well the pattern has been part of OTP for a long time. So long that I couldn’t locate the change request I originally submitted to add support for {shutdown, Term} as an equivalent to shutdown. I use this for cleaning up after workers in many places.

It’s in the supervisor documentation.
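As a rough illustration of that point: the supervisor documentation lists normal, shutdown, and {shutdown, Term} as the exit reasons that do not count as abnormal termination for a transient child. The sketch below encodes that list and carries a result in the Term slot; the module name `shutdown_result`, both function names, and the `{result, _}` wrapper are hypothetical.

```erlang
-module(shutdown_result).
-export([run/1, is_clean_exit/1]).

%% Exit reasons the supervisor documentation treats as non-abnormal
%% termination for transient children.
is_clean_exit(normal) -> true;
is_clean_exit(shutdown) -> true;
is_clean_exit({shutdown, _Term}) -> true;
is_clean_exit(_Other) -> false.

%% Hypothetical helper: carry the result inside a {shutdown, Term} reason.
run(Fun) when is_function(Fun, 0) ->
    {Pid, MRef} =
        spawn_monitor(fun() -> exit({shutdown, {result, Fun()}}) end),
    receive
        {'DOWN', MRef, process, Pid, {shutdown, {result, Result}}} ->
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

A reason such as `{ok, Result}` falls into the last clause of `is_clean_exit/1`, which is the difference between this form and the exit reasons used earlier in the thread.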

Maria-12648430 · April 3, 2023, 3:19pm

There (probably) is no manpage for hammers telling you to not hit your thumb. Whack away then.

Also, I didn’t say anything about abstractions, AFAIR.

tl;dr, there may be good reasons for “taking liberties” as you put it, and that is all fine. Just don’t do things differently for the sole reason that you can do it differently. That is my opinion at least.

jimdigriz · April 3, 2023, 3:19pm

On a related note, has anyone any pointers on the history and uses of the {ok,Child,Info} from children starting on a supervisor?

Maybe I missed it (manpages and grepping the OTP source) but I could not see any way to get something for Info straight out of gen_whatever:init/1 other than something like:

start_link() ->
  {ok,Pid} = gen_whatever:start_link(?MODULE, ...),
  {ok,Info} = gen_whatever:call(Pid, ...),
  {ok,Pid,Info}.
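One possible alternative, offered as a sketch rather than as the historical answer: proc_lib:start_link/3 returns whatever the new process passes to proc_lib:init_ack/2, so a process started that way can hand back a three-tuple before entering the gen_server loop with gen_server:enter_loop/3. The module name `info_child` and the contents of Info are made up for illustration.

```erlang
-module(info_child).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    %% Returns the term the child passes to proc_lib:init_ack/2.
    proc_lib:start_link(?MODULE, init, [self()]).

%% Called via proc_lib here, not by gen_server, so it takes the parent pid.
init(Parent) ->
    Info = #{started_at => erlang:monotonic_time()},
    %% Acknowledge startup with the extra Info element.
    proc_lib:init_ack(Parent, {ok, self(), Info}),
    %% Become a regular gen_server with Info as its state.
    gen_server:enter_loop(?MODULE, [], Info).

handle_call(get_info, _From, Info) ->
    {reply, Info, Info}.

handle_cast(_Msg, Info) ->
    {noreply, Info}.
```

With this shape, `info_child:start_link()` returns `{ok, Pid, Info}` directly, without a follow-up call.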

I loves me a bit of history…