Pool of Processes vs One Process

Abdelghani · December 25, 2021, 6:04pm

Hi everybody, it’s recommended to use a pool of Erlang processes that are ready to accept connections on a listen socket, and Joe Armstrong says the same in his book; the stated benefit is being able to do PARALLEL accepts at the same time.

But when we look at this method more closely, a listen socket is an Erlang port. When Erlang processes distributed across scheduler threads (SMP) call gen_tcp:accept(ListenSocket), they communicate with that port, which results in running the network driver. Like all Erlang ports, this port is protected by a lock and can be used by only one thread at a time, so all calls to it are in fact executed in SEQUENCE.
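A minimal sketch of the pool pattern being discussed, in plain Erlang (the module name acceptor_pool and the echo handler are made up for illustration). Several processes block in gen_tcp:accept/1 on the same listen socket at once. As far as I know the inet driver simply queues these pending accepts, so this does not make the accepts themselves run in parallel; what the pool buys is that while one acceptor is busy handing a socket off, others are already waiting.

```erlang
-module(acceptor_pool).
-export([start/2, stop/1]).

%% Open one listen socket and start NumAcceptors processes that all block
%% in gen_tcp:accept/1 on it. The caller owns the listen socket; if the
%% caller dies, the socket is closed and the acceptors stop.
start(Port, NumAcceptors) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                        {reuseaddr, true}]),
    _ = [spawn(fun() -> accept_loop(LSock) end)
         || _ <- lists:seq(1, NumAcceptors)],
    {ok, LSock}.

%% Closing the listen socket makes every pending accept return
%% {error, closed}, which ends all acceptor loops.
stop(LSock) ->
    gen_tcp:close(LSock).

accept_loop(LSock) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            %% Hand the connection to a fresh handler so this acceptor can
            %% go straight back to accept/1.
            Handler = spawn(fun() -> receive {go, S} -> echo(S) end end),
            ok = gen_tcp:controlling_process(Sock, Handler),
            Handler ! {go, Sock},
            accept_loop(LSock);
        {error, _Reason} ->
            ok
    end.

%% Trivial per-connection handler: echo everything back until the peer closes.
echo(Sock) ->
    case gen_tcp:recv(Sock, 0) of
        {ok, Data} ->
            case gen_tcp:send(Sock, Data) of
                ok -> echo(Sock);
                {error, _} -> gen_tcp:close(Sock)
            end;
        {error, _} ->
            gen_tcp:close(Sock)
    end.
```

Passing port 0 lets the OS pick a free port, which inet:port/1 then reports.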

So why use a pool of processes that accept connections instead of just one process that accepts and spawns its replacement?

My research into this started with analyzing the source code of the ejabberd server, which uses just one process to accept connections.

domi · December 26, 2021, 6:54am

To spawn the process, the acceptor has to call a supervisor, which does the actual spawn, waits for the init callback to return, logs progress reports, and updates its internal state. This blocks the acceptor for a while, so it’s worth using multiple acceptors and supervisors to ensure other connections can still be accepted while this is happening.

Abdelghani · December 26, 2021, 3:10pm

So the issue may be in the code after the gen_tcp:accept() call. But in practice all acceptors share the same simple_one_for_one supervisor, and each accept results in supervisor:start_child(SharedSup, []) (Cowboy works similarly). I don’t know whether children can be started from the same supervisor in PARALLEL, but even if it’s done in SEQUENCE, I think logging and the other steps are fast enough to be negligible.
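To make the blocking point concrete, here is a hypothetical sketch (module name conn_sup_demo and the echo handler are invented for illustration) of several acceptors sharing one simple_one_for_one supervisor. Each accept is followed by a synchronous supervisor:start_child(Sup, []) call. A supervisor is a single process that handles these requests one at a time, so with one shared supervisor the child starts are serialized, and every acceptor waiting on it is blocked meanwhile.

```erlang
-module(conn_sup_demo).
-behaviour(supervisor).
-export([start_link/0, init/1, start_acceptors/3,
         conn_start_link/0, conn_init/0]).

%% One shared simple_one_for_one supervisor for all connection processes.
start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
    ConnSpec = #{id => conn,
                 start => {?MODULE, conn_start_link, []},
                 restart => temporary},
    {ok, {SupFlags, [ConnSpec]}}.

%% Start function from the child spec; must return {ok, Pid}.
conn_start_link() ->
    {ok, proc_lib:spawn_link(?MODULE, conn_init, [])}.

%% The connection process waits until the acceptor has transferred the socket.
conn_init() ->
    receive {socket, Sock} -> echo(Sock) end.

echo(Sock) ->
    case gen_tcp:recv(Sock, 0) of
        {ok, Data} -> _ = gen_tcp:send(Sock, Data), echo(Sock);
        {error, _} -> gen_tcp:close(Sock)
    end.

%% Open the listen socket and start NumAcceptors acceptors that all
%% hand their connections to the same supervisor Sup.
start_acceptors(Sup, Port, NumAcceptors) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                        {reuseaddr, true}]),
    _ = [spawn(fun() -> accept_loop(LSock, Sup) end)
         || _ <- lists:seq(1, NumAcceptors)],
    {ok, LSock}.

accept_loop(LSock, Sup) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            %% Synchronous call: this acceptor waits while the supervisor
            %% spawns the child and updates its own state.
            {ok, Pid} = supervisor:start_child(Sup, []),
            ok = gen_tcp:controlling_process(Sock, Pid),
            Pid ! {socket, Sock},
            accept_loop(LSock, Sup);
        {error, _Reason} ->
            ok
    end.
```

Giving each acceptor (or group of acceptors) its own supervisor removes this shared bottleneck.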

domi · December 27, 2021, 12:38am

Modern Cowboy uses multiple supervisors; in fact, it can even use multiple listen sockets (on the same port). See:
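For reference, a sketch of how this looks from the user side, assuming the Ranch 2.x transport options num_acceptors, num_conns_sups and num_listen_sockets described in the linked article. my_listener and my_protocol are placeholders, and the raw SO_REUSEPORT option values (1 and 15) are Linux-specific; check the Ranch manual for your version before relying on any of it.

```erlang
%% Hypothetical listener: my_listener and my_protocol are placeholders.
{ok, _} = ranch:start_listener(my_listener, ranch_tcp,
    #{socket_opts => [{port, 8080},
                      %% Linux-only SO_REUSEPORT (SOL_SOCKET = 1,
                      %% SO_REUSEPORT = 15), needed so several listen
                      %% sockets can bind the same port.
                      {raw, 1, 15, <<1:32/native>>}],
      num_acceptors => 16,      % processes blocked in accept
      num_conns_sups => 16,     % connection supervisors, spreading start_child load
      num_listen_sockets => 4}, % listen sockets on the same port
    my_protocol, []).
```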

https://ninenines.eu/articles/ranch-2.0.0/

Abdelghani · December 27, 2021, 1:10am

Exactly. I had read the code of ranch_acceptors_sup but not ranch_conns_sup, and this explains why there is a ranch_conns_sup_sup, so now I’ve confirmed where the parallelism in accepting connections is. Thank you @domi for your links, which always explain exactly what I’m looking for. I already knew about multiple listen sockets on the same port number. Just one question remains, which you don’t have to answer: why does ejabberd (known for its performance and scalability) use just one process to accept connections? (I confirmed this by reading the latest version of the code.)

domi · December 27, 2021, 2:44am

I’m not familiar with it, but I would guess its connections usually live for a long time, so the accept rate isn’t a bottleneck.

Abdelghani · December 27, 2021, 3:09am

This is exactly what I suspected: since HTTP often opens a new connection for each request, accepting has a much heavier impact on performance than with XMPP. Thank you @domi for everything.

juhlig · December 27, 2021, 9:31pm

There is another point besides performance that should be noted here, which is the separation of accepting from connection handling.

Ranch, the acceptor component of Cowboy, has acceptors, i.e. ranch_acceptor processes that do little more than run accept in a loop. When a connection is accepted, the acceptor process tells its associated ranch_conns_sup to start a connection-handling process and hands the connection off to it. It doesn’t matter to the acceptor process whether the connection process runs, crashes, or doesn’t even start. And even if an acceptor process crashes, the ranch_acceptors_sup above it will restart it, and everything will just continue working as before.
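A stripped-down sketch of that separation, assuming nothing about Ranch’s actual code: a supervisor owns the listen socket and keeps permanent acceptor children alive, and each connection runs in a separate, unlinked process. The module name acceptors_sup_demo and the Handler fun are hypothetical.

```erlang
-module(acceptors_sup_demo).
-behaviour(supervisor).
-export([start_link/3, init/1, start_acceptor/2, accept_loop/2]).

%% The supervisor owns the listen socket (it is opened in init/1, which runs
%% in the supervisor process) and restarts any acceptor that dies.
%% Handler is a fun(Socket) run in its own process for each connection.
start_link(Port, NumAcceptors, Handler) ->
    supervisor:start_link(?MODULE, {Port, NumAcceptors, Handler}).

init({Port, NumAcceptors, Handler}) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                        {reuseaddr, true}]),
    Acceptors = [#{id => {acceptor, N},
                   start => {?MODULE, start_acceptor, [LSock, Handler]},
                   restart => permanent}
                 || N <- lists:seq(1, NumAcceptors)],
    {ok, {#{strategy => one_for_one, intensity => 10, period => 5}, Acceptors}}.

start_acceptor(LSock, Handler) ->
    {ok, proc_lib:spawn_link(?MODULE, accept_loop, [LSock, Handler])}.

accept_loop(LSock, Handler) ->
    %% Any accept error crashes this acceptor; the supervisor restarts it.
    {ok, Sock} = gen_tcp:accept(LSock),
    %% spawn/1, not spawn_link/1: a crashing connection never takes the
    %% acceptor down with it.
    Conn = spawn(fun() -> receive {go, S} -> Handler(S) end end),
    ok = gen_tcp:controlling_process(Sock, Conn),
    Conn ! {go, Sock},
    accept_loop(LSock, Handler).
```

Killing one acceptor only costs a restart; the other acceptors keep accepting in the meantime, and the replacement picks up the same listen socket.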

An approach in which a process accepts and spawns a new acceptor is more fragile. The chain breaks/ends if the current acceptor crashes at any point before it can spawn a new acceptor.
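For contrast, the accept+spawn chain in its simplest form (chain_acceptor is a hypothetical module name). Only one process is ever waiting in accept, and the crash window is the gap between accept/1 returning and the successor being spawned.

```erlang
-module(chain_acceptor).
-export([start/1, acceptor/1]).

%% Naive "accept, then spawn my successor" chain: exactly one process is
%% waiting in gen_tcp:accept/1 at any time.
start(Port) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                        {reuseaddr, true}]),
    spawn(?MODULE, acceptor, [LSock]),
    {ok, LSock}.

acceptor(LSock) ->
    {ok, Sock} = gen_tcp:accept(LSock),
    %% Fragile point: if this process crashes before the next line runs,
    %% nobody is left accepting and the listener silently stops working.
    spawn(?MODULE, acceptor, [LSock]),
    %% This process already owns Sock, so it simply becomes its handler.
    echo(Sock).

echo(Sock) ->
    case gen_tcp:recv(Sock, 0) of
        {ok, Data} -> _ = gen_tcp:send(Sock, Data), echo(Sock);
        {error, _} -> gen_tcp:close(Sock)
    end.
```

Nothing supervises the chain, so there is no one to notice that it has ended.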

Abdelghani · December 28, 2021, 10:35am

Thank you for that @juhlig. I haven’t read all of Cowboy’s code yet, but I’m working on it. Yes, there are acceptors and conns_sups, and each acceptor has its own conns_sup, but I can’t understand exactly how it works until I read the source. There is another thing I noticed: why use plain Erlang processes (spawn, receive, Pid ! ...) instead of the OTP-recommended implementation (gen_server)?

AstonJ · December 28, 2021, 5:11pm

Without reading the code I would guess in those cases a simple spawn was sufficient.

From looking at some of your threads here, it sounds like you would benefit from reading a couple of Erlang books. Elixir in Action (and I am sure many Erlang books cover similar ground) discusses some of this. Here’s what I had written in my review:

Saša also covers much more than just the fundamentals of the language; after the basics, he covers processes and OTP in quite some detail (and, from what I’ve read so far, goes into more detail here than PE). You actually build your own server process before he introduces you to GenServers (which I felt was an excellent way to demystify them), and you’ll definitely leave feeling as though you have a fantastic insight into Elixir and Erlang!

I haven’t read it yet, but I imagine Erlang and OTP in Action covers similar ground, if not more. Books are incredibly useful for getting a good understanding of a language and what’s possible with it. You can even get 35% off using our discount code

Abdelghani · December 28, 2021, 6:06pm

Thank you @AstonJ. I have read some Erlang books and skimmed others, but as you may also have noticed, they all share the same strategy for explaining the language, and none of the ones I looked at goes into internal details. For example, they recommend using a pool of processes but never explain why, or where the parallel and sequential parts of accepting connections are, as was answered here. As another example, no book describes the internals of ERTS and BEAM in detail; I found one on GitHub, but many important chapters are missing. So I think there is little more to gain from reading books at this stage.
OK, returning to the question: since you mentioned book recommendations, this is what I found in “Erlang Programming” by Francesco Cesarini & Simon Thompson:
“One of the downsides of OTP is the layering that the various behavior modules require. This will affect performance. In the attempt to save a few microseconds from their calls, developers have been known to use the Pid ! Msg construct instead of a gen_server cast, handling their messages in the handle_info/2 callback.
Don’t do this! You will make your code impossible to support and maintain, as well as losing many of the advantages of using OTP in the first place. If you are obsessed with saving microseconds, try to hold on and optimize only when you know your program is not fast enough.”
And?
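A small illustration of the two paths the quote contrasts (counter_server is a made-up example). As far as I know, gen_server:cast/2 itself just sends a tagged {'$gen_cast', Msg} message to the server, so the saving from a bare Pid ! Msg is small, while the cost is that the message bypasses the module’s API and lands in handle_info/2 instead.

```erlang
-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, bump_cast/1, value/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() -> gen_server:start_link(?MODULE, 0, []).

%% The OTP way: goes through gen_server:cast/2 and lands in handle_cast/2.
bump_cast(Pid) -> gen_server:cast(Pid, bump).

value(Pid) -> gen_server:call(Pid, value).

init(Count) -> {ok, Count}.

handle_call(value, _From, Count) -> {reply, Count, Count}.

handle_cast(bump, Count) -> {noreply, Count + 1}.

%% The shortcut the book warns against: a bare Pid ! bump skips the API
%% function and only shows up here, in handle_info/2.
handle_info(bump, Count) -> {noreply, Count + 1};
handle_info(_Other, Count) -> {noreply, Count}.
```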

starbelly · December 28, 2021, 6:07pm

I’d like to piggyback on this by saying that Joe hated the words “parallel” and “simultaneous”.

Abdelghani · December 28, 2021, 6:08pm

@starbelly you’re off topic, amigo.

starbelly · December 28, 2021, 6:14pm

I disagree. The word “parallel” keeps coming up, and I provided a fun fact about what one of the creators of the language thought of that word, especially in the context of Erlang/OTP but also in general.

juhlig · December 28, 2021, 10:22pm

You’re mixing things up here. If we’re talking about accepting connections, we’re talking about Ranch, not Cowboy. While Cowboy is the reason Ranch exists, and the acceptor logic was in fact built into Cowboy in the beginning, it was later separated out into its own project, since Ranch can do more than just accept connections for Cowboy.
For example, gen_smtp uses it to accept connections for SMTP, and RabbitMQ uses it to accept connections for AMQP, its REST API, and its web interface (the latter two then work via Cowboy).

Hm, now that you’re asking, I notice that I never asked myself that question in all the time I was doing stuff for Ranch. Btw, ranch_conns_sup acts as a supervisor, not a gen_server. Then again, a supervisor is a gen_server, so you’re not entirely off.

But to answer your question (as far as I know), ranch_conns_sup needs to do some things that can’t be done with a standard supervisor, or that would require a fairly complex (and more fragile) setup with extra processes for management, bookkeeping, etc.
On the other hand, the standard supervisor contains a lot that ranch_conns_sup simply doesn’t need, so that part could be left out, resulting in small performance gains. Those gains were not the reason for the custom implementation, though; they just came as a free bonus.
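One example of the kind of thing a standard supervisor doesn’t do out of the box is admission control: refusing to start a child until a slot frees up, which blocks the caller. I believe ranch_conns_sup does something along these lines for max_connections, but the following is only a hypothetical, much-simplified sketch (mini_conns_sup is invented), not Ranch’s code.

```erlang
-module(mini_conns_sup).
-export([start_link/1, start_conn/2, init/2]).

%% Hypothetical custom "connections supervisor": it never runs more than
%% MaxConns children, and extra start requests stay in its mailbox, so the
%% callers (acceptors) block until a child exits. No restarts, no shutdown
%% protocol; this only shows the admission-control idea.
start_link(MaxConns) ->
    proc_lib:start_link(?MODULE, init, [self(), MaxConns]).

%% Called by an acceptor; returns {ok, Pid} once a slot is free.
start_conn(Sup, Fun) ->
    Ref = make_ref(),
    Sup ! {start, self(), Ref, Fun},
    receive {Ref, Reply} -> Reply end.

init(Parent, MaxConns) ->
    process_flag(trap_exit, true),
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, MaxConns, 0).

loop(Parent, Max, Count) ->
    receive
        {'EXIT', Parent, Reason} ->
            exit(Reason);
        {'EXIT', _Child, _Reason} ->
            %% A connection ended: one slot is free again.
            loop(Parent, Max, Count - 1);
        {start, From, Ref, Fun} when Count < Max ->
            %% Selective receive: while at the limit this clause cannot
            %% match, so start requests simply wait in the mailbox.
            Pid = spawn_link(Fun),
            From ! {Ref, {ok, Pid}},
            loop(Parent, Max, Count + 1)
    end.
```

The selective receive with a guard is exactly the kind of loop that is awkward to express with a stock supervisor.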

juhlig · December 28, 2021, 10:36pm

Funny that you should mention that book, because in one chapter they build a TCP server (I forget what for) using just the aforementioned naive accept+spawn approach.

The socket chapter of Learn You Some Erlang highlights the shortcomings of this approach in terms of performance, and proposes a different (but really very similar) approach with better performance.

Abdelghani · December 29, 2021, 2:35am

@juhlig you’ve written a lot, but you didn’t answer the question. Ranch is a network-layer server and Cowboy is an application-layer server, so any application server can run on top of Ranch. My question was why use the plain Erlang implementation instead of the OTP implementation, when all the recommendations from Joe and others say we should always use the generic server. (When I said gen_server in my previous post, I meant using a gen_server instead of the process created by proc_lib:start_link, not ranch_conns_sup, so please read my post carefully.) I also posted a note from a well-known book previously, and the example from the book you mentioned (I read it a long time ago; you should read it before talking about it) uses a gen_server acceptor, so why doesn’t Cowboy?
I think it’s to gain some performance by removing the extra gen_server overhead, as you said. If that’s the case, why don’t we all just drop that overhead and the OTP implementation along with it? In other words, when is it better to use proc_lib and when is it better to use gen_server? This question is not easy to answer.
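On the proc_lib vs gen_server question: a proc_lib process can still be a well-behaved OTP process if it starts synchronously and answers system messages. A minimal sketch of such a “special process” (special_proc and its ping protocol are hypothetical), using proc_lib:start_link/3, proc_lib:init_ack/2 and sys:handle_system_msg/6; I don’t claim this is how Ranch’s processes are written.

```erlang
-module(special_proc).
-export([start_link/0, init/1, ping/1]).
-export([system_continue/3, system_terminate/4, system_code_change/4]).

%% Synchronous start, like gen_server:start_link/3: returns only after
%% init/1 has called proc_lib:init_ack/2.
start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    proc_lib:init_ack(Parent, {ok, self()}),
    Debug = sys:debug_options([]),
    loop(Parent, Debug, 0).

%% Hand-rolled request/reply, standing in for gen_server:call/2.
ping(Pid) ->
    Ref = make_ref(),
    Pid ! {ping, self(), Ref},
    receive {pong, Ref, N} -> N after 1000 -> timeout end.

loop(Parent, Debug, N) ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref, N + 1},
            loop(Parent, Debug, N + 1);
        {system, From, Request} ->
            %% Makes sys:get_state/1, sys:suspend/1 etc. work on this process.
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, N)
    end.

%% Callbacks required by sys:handle_system_msg/6.
system_continue(Parent, Debug, N) -> loop(Parent, Debug, N).
system_terminate(Reason, _Parent, _Debug, _N) -> exit(Reason).
system_code_change(N, _Module, _OldVsn, _Extra) -> {ok, N}.
```

What gen_server adds on top is essentially this plumbing plus call/cast conventions, timeouts, and error reporting. Writing it by hand mainly pays off when you need a receive loop gen_server can’t express, for example a selective receive.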

starbelly · December 29, 2021, 2:44am

When did Joe say this? Who else has said this? You should always prefer the behaviors provided by OTP, such as gen_server, but there is no hard and fast rule. There will always be cases where the general solution is sub-optimal for your needs. That said, IIRC from conversations with others, performance was a reason for such a design decision at one point, but AFAIK that is no longer true (i.e., you don’t gain much performance from avoiding gen_server these days).

“Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!” – Joe Armstrong

AstonJ · December 29, 2021, 2:51am

Ah, that’s awesome, Jan! I really enjoyed the Elixir in Action equivalent, so I hope to get around to Erlang and OTP in Action as well one day.

This!!^^

domi · December 29, 2021, 3:55am

Loïc, the author of Cowboy / Ranch, gave a talk about this: