Erlang Networking Tasks equivalent at OS Level

In Joe’s Erlang book, he says it is convenient to create a pool of processes waiting to accept incoming connections for better performance when the server hardware can run tasks in parallel. But since networking actually happens at the OS level, what exactly is happening behind the scenes? In other words, what does the driver inet_drv.c do in response to the networking tasks of Erlang processes?
Let’s take the following example:

sup_module.erl 
----------------------------------------------------------------------------

start() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

init([]) ->
    {ok, ListenSocket} = gen_tcp:listen(....),
    spawn(fun() -> start_acceptors() end),
    {ok, {{simple_one_for_one, 1, 1},
          [{child, {child_module, start, [ListenSocket]}, ....}]}}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

start_acceptors() ->
    [supervisor:start_child(?MODULE, []) || _ <- lists:seq(1, 20)].

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

child_module.erl
----------------------------------------------------------------------------------

start(ListenSocket) ->
    gen_server:start_link(?MODULE, [ListenSocket], []).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

init([ListenSocket]) ->
    gen_server:cast(self(), accept),
    {ok, ListenSocket}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

handle_cast(accept, ListenSocket) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    supervisor:start_child(sup_module, []),
    {noreply, Socket}.

The code is simple: it creates 20 parallel Erlang processes that wait for incoming connections. When a process accepts a connection, it starts another identical process to keep the number of free acceptors at 20.
The question is: what REALLY happens behind the scenes? When connections come in, they are handled at the OS level first (managed by inet_drv.c) and then passed up to the ERTS level. So what happens at the OS level? Is there 1 OS thread accepting connections? 20 threads? N threads? 1 thread that accepts and hands connections off to a pool of threads? If we don’t have 20 OS threads ready to accept connections, then to my knowledge there is no advantage in creating 20 Erlang acceptors, because accepting will not be parallel and we could just create one process that accepts and spawns its replacement.
I know that a single Erlang node can handle more than 2 million connections on limited hardware, so what is the OS-level equivalent of 2 million Erlang processes?
I see that many Erlangers in this group are contributors to inet_drv.c, so if possible please explain in boring detail what this driver does in response to networking tasks. I tried to read its source but it is very complicated, so I would be thankful for a clear answer.

4 Likes

I guess one thread is enough at the OS level, since epoll or kqueue should be used.

2 Likes

One OS thread to handle 2 million TCP connections?

2 Likes

In a nutshell here is what happens:

  1. <0.X.0> does gen_tcp:listen (SEQ)
     • A new FD is created using the listen C function.
  2. <0.X.0> does gen_tcp:accept (PARALLEL:ish)
     • Erlang tries to do accept
     • if EAGAIN, place the listen FD in the pollset using epoll_ctl(EPOLL_CTL_ADD) and place <0.X.0> in a queue
     • else return the new connection
  3. <0.X+N.0> does gen_tcp:accept (PARALLEL:ish)
     • If there is already a queue for the listen FD, then place <0.X+N.0> in the queue
     • else repeat the steps above.
  4. epoll_wait returns, indicating that the listen FD has a new connection available (SEQ)
  5. Do accept in a loop until either EAGAIN or the queue is empty (SEQ)
  6. Handle data on the connection (PARALLEL)

The sections marked as SEQ are done by a single thread, while the PARALLEL section can be done by any scheduler which usually means one thread per logical CPU core. I’ve marked the things that are parallel but still need some locking as PARALLEL:ish.

If you want more parallelism you can use the Linux socket feature called reuseaddr to create multiple listen FDs for the same IP:PORT pair, though for most applications the default setup is good enough.

*I’ve skipped over a bunch of details in the above explanation. Hopefully, it will make sense without them.
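
For illustration, here is a minimal sketch of the acceptor pattern described in steps 2–5, not the original poster’s code: the port number, socket options, and the echo handler are made up for the example. Several processes block in gen_tcp:accept/1 on the same listen socket, but at the OS level there is still only one listen FD, and acceptors that hit EAGAIN wait in the queue until epoll_wait reports the FD as readable.

-module(acceptor_sketch).
-export([start/0]).

start() ->
    {ok, ListenSocket} = gen_tcp:listen(8080, [binary, {active, false}]),
    %% 20 acceptors all block in gen_tcp:accept/1 on the same listen FD
    [spawn_link(fun() -> acceptor(ListenSocket) end) || _ <- lists:seq(1, 20)],
    {ok, ListenSocket}.

acceptor(ListenSocket) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),      %% queued until the listen FD becomes readable
    spawn_link(fun() -> acceptor(ListenSocket) end),  %% keep the pool of waiting acceptors full
    handle(Socket).

handle(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data}      -> gen_tcp:send(Socket, Data), handle(Socket);
        {error, closed} -> ok
    end.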

7 Likes

I’d say that the example is not a good example of “handling 2 million connections”, which most probably is about having 2 million file descriptors whose reads and writes the VM handles through the OS poll/select API on the platform’s limited number of hardware cores.

This example has got one file descriptor on which to handle a number of incoming connections. The VM (which inet_drv.c utilizes) handles that one file descriptor, along with all others, in its poll/select implementation, and does not handle the same file descriptor from more than one OS thread.

4 Likes

@garazdawi I am trying to read the complicated code line by line, from gen_server.erl down to inet_drv.c, so I am just at the beginning. I can’t find the listen system call in the driver; here’s the code:

case INET_REQ_LISTEN: { /* argument backlog */

	int backlog;
	DEBUGF(("tcp_inet_ctl(%p): LISTEN\r\n",
                desc->inet.port)); 
	if (desc->inet.state == INET_STATE_CLOSED)
	    return ctl_xerror(EXBADPORT, rbuf, rsize);
	if (!IS_OPEN(INETP(desc)))
	    return ctl_xerror(EXBADPORT, rbuf, rsize);
	if (len != 2)
	    return ctl_error(EINVAL, rbuf, rsize);
	backlog = get_int16(buf);
	if (IS_SOCKET_ERROR(sock_listen(desc->inet.s, backlog)))
	    return ctl_error(sock_errno(), rbuf, rsize);
	desc->inet.state = INET_STATE_LISTENING;
	return ctl_reply(INET_REP_OK, NULL, 0, rbuf, rsize);
    }
2 Likes

This is a macro resolving to listen.

3 Likes

Oh yes, I missed that, thank you so much. I will return when I finish reading all the code.

3 Likes

I think you mean reuseport (or rather, SO_REUSEPORT), not reuseaddr? On Linux, this can be activated by setting a raw option in the listen call: {raw, 1, 15, <<1:32/native>>}.
There is no nice option to activate it in Erlang AFAIK.
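
For completeness, a minimal sketch of that trick (Linux only; the function name, port, and option list are made up for the example): SO_REUSEPORT is option 15 at level SOL_SOCKET (1), so the raw option sets it on each socket before gen_tcp:listen/2 binds, letting several listen sockets share one port while the kernel distributes incoming connections between them.

%% Hypothetical helper: open N listen sockets on the same port (Linux only).
start_listeners(Port, N) ->
    ReusePort = {raw, 1, 15, <<1:32/native>>},  %% SO_REUSEPORT via setsockopt, applied before bind
    [begin
         {ok, L} = gen_tcp:listen(Port, [binary, {reuseaddr, true}, ReusePort]),
         L
     end || _ <- lists:seq(1, N)].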

4 Likes

Yes indeed. Reuseaddr would not help. Sorry for the mix up.

2 Likes

Hi everyone, the code of inet_drv.c is very complicated even with sufficient knowledge of C, but I have made some progress. An OS thread is the manager of the driver instance and does all the work: creating a socket, binding it, setting it to listen, and finally accepting connections; when it accepts a connection it creates a new socket. But I can’t find any kqueue or epoll call anywhere in the code to manage all the created sockets, so how can that happen? How can the OS use these functions without a call to them?

Second, @garazdawi, I didn’t understand what exactly you mean by the algorithm, since the FD is created with socket() and not with listen(). Another thing: the scheduler thread no longer manages I/O events to and from ERTS (since Erlang/OTP 21), since there is a special thread for that.

10 posts were split to a new thread: Erlang io-uring support

1 Like