Sending user space SCTP ABORT with gen_sctp

cristconst · December 4, 2023, 4:35pm

Hi,

Let's say an Erlang application uses gen_sctp to communicate over SCTP associations.

Is there any other way of generating a user space SCTP ABORT than explicitly calling gen_sctp:abort()?
How?

Is the source code for gen_sctp also available?

Thank you,
Cristian

starbelly · December 5, 2023, 6:27pm

Hi, I don’t know that much about SCTP, but here are three points and a question…

gen_sctp:abort/2 simply calls Mod:sendmsg/3 (where Mod is the callback module, inet_sctp or inet6_sctp) with an #sctp_sndrcvinfo{} whose flags are [abort].
Given the above and the docs, you can also do this yourself with gen_sctp:send/3, though the docs loudly note that needing that level of control is, and probably should be, rare.
You can of course view all of the source related to gen_sctp, inet_sctp*, socket, etc. on GitHub. Perhaps a link to gen_sctp on GitHub can help get you started.

Question: What is the problem with calling gen_sctp:abort/2?

cristconst · December 5, 2023, 11:19pm

Hi,

There is no problem sending SCTP user space ABORTs with gen_sctp:abort().
The problem is the other way around:

An Erlang application which uses the following OTP libs:

diameter
diameter_sctp
gen_sctp

and which proxies Diameter messages (yes, over the SCTP transport layer) works fine under, say, “normal conditions”, but sends an SCTP user space ABORT more or less out of the blue when the traffic suddenly spikes. I can see the application's user space SCTP ABORT chunks in the pcap captured on the machine that runs it.

The application does not use gen_sctp:abort().
diameter lib does not seem to use gen_sctp:abort().
diameter_sctp transport lib (built on top of gen_sctp) does not seem to explicitly use gen_sctp:abort().

My question is more or less: where does this user space SCTP ABORT chunk come from?

I explained all this in a previous message which, for one reason or another, was not published on “Questions/Help”.

Note:

“SCTP user space ABORT” vs “SCTP kernel space ABORT”

The SCTP socket API, RFC 6458 (Sockets API Extensions for the Stream Control Transmission Protocol (SCTP)), mandates that ABORT chunks sent from user space contain a specific error cause (12):

     SCTP_ABORT:  Setting this flag causes the specified association
        to abort by sending an ABORT message to the peer.  The ABORT
        chunk will contain an error cause of 'User Initiated Abort'
        with cause code 12.  The cause-specific information of this
        error cause is provided in msg_iov.
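To make the quoted format concrete, here is a minimal Python sketch that encodes only the ABORT chunk (no SCTP common header, verification tag or checksum) carrying a single cause 12, following the chunk and error-cause layout of RFC 4960/RFC 9260. The function name `build_user_abort_chunk` and its parameters are hypothetical, for illustration only; the `reason` bytes stand in for the cause-specific information that RFC 6458 says comes from msg_iov.

```python
import struct

ABORT_CHUNK_TYPE = 6        # ABORT chunk type (RFC 4960 / RFC 9260, section 3.3.7)
USER_INITIATED_ABORT = 12   # "User-Initiated Abort" error cause quoted from RFC 6458
T_BIT = 0x01                # chunk flag: verification tag is reflected

def build_user_abort_chunk(reason: bytes = b"", reflected_tag: bool = False) -> bytes:
    """Encode one ABORT chunk holding a single User-Initiated Abort cause.

    Hypothetical helper: it builds the chunk only, not a whole SCTP packet.
    """
    # Error cause: code (2 bytes) + length (2 bytes, header included) + reason.
    cause_len = 4 + len(reason)
    cause = struct.pack("!HH", USER_INITIATED_ABORT, cause_len) + reason
    # The chunk length counts the header and the (last) cause, but not trailing padding.
    chunk_len = 4 + cause_len
    flags = T_BIT if reflected_tag else 0
    chunk = struct.pack("!BBH", ABORT_CHUNK_TYPE, flags, chunk_len) + cause
    # On the wire, chunks are padded with zero bytes to a 4-byte boundary.
    padding = (-len(chunk)) % 4
    return chunk + b"\x00" * padding
```

For example, `build_user_abort_chunk(b"bye")` yields the 12 bytes `06 00 00 0b 00 0c 00 07 62 79 65 00`: chunk type 6, chunk length 11, cause 12 with cause length 7, the reason, and one byte of padding.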


One can clearly distinguish the user space SCTP ABORT chunks from the kernel space SCTP ABORT chunks in pcaps.
After the application (ungracefully) closes the SCTP association with an ABORT chunk, the local kernel answers all other chunks sent by the remote end (those that were already on the wire or in buffers before the peer received the local user space ABORT) with kernel-generated ABORT chunks.
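To script the pcap comparison instead of eyeballing it in Wireshark, a sketch like the following decodes a single ABORT chunk (again without the SCTP common header) and reports its cause codes and T bit. `parse_abort_chunk` is a hypothetical helper, not part of any library. As an assumption worth checking against your own captures: cause 12 marks a User-Initiated Abort, while the ABORTs a stack sends in reply to out-of-the-blue packets (RFC 4960/9260, section 8.4) typically have the T bit set, because the verification tag is reflected from the received packet.

```python
import struct

ABORT_CHUNK_TYPE = 6
USER_INITIATED_ABORT = 12
T_BIT = 0x01

def parse_abort_chunk(chunk: bytes) -> dict:
    """Decode one ABORT chunk (hypothetical helper; no SCTP common header)."""
    if len(chunk) < 4:
        raise ValueError("truncated chunk header")
    ctype, flags, length = struct.unpack("!BBH", chunk[:4])
    if ctype != ABORT_CHUNK_TYPE:
        raise ValueError(f"not an ABORT chunk (type {ctype})")
    if length < 4 or length > len(chunk):
        raise ValueError("bad chunk length")
    causes = []
    offset = 4
    # Walk the error causes; each one is padded to a 4-byte boundary.
    while offset + 4 <= length:
        code, cause_len = struct.unpack("!HH", chunk[offset:offset + 4])
        if cause_len < 4 or offset + cause_len > length:
            raise ValueError("bad cause length")
        causes.append((code, chunk[offset + 4:offset + cause_len]))
        offset += (cause_len + 3) & ~3
    return {
        "t_bit": bool(flags & T_BIT),
        "cause_codes": [code for code, _ in causes],
        "user_initiated": any(code == USER_INITIATED_ABORT for code, _ in causes),
        "user_reason": next(
            (info for code, info in causes if code == USER_INITIATED_ABORT), None
        ),
    }
```

A chunk with cause 12 comes back with `user_initiated` set to `True`; a bare `06 01 00 04` (no causes, T bit set) comes back as not user-initiated.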

Thank you very much indeed,
Cristian

cristconst · December 26, 2023, 6:03pm

Hi,

It looks like this is related to another issue described here:

What happens is:

there is one controlling (Erlang) process for the SCTP socket;
the socket was gen_sctp:peeloff(-ed) from a listening one and “holds” one valid association;
during a gen_sctp:send operation, the socket buffer is full (either the network or the remote end is congested) and the send gets an EAGAIN error;
the application logic makes the controlling process exit.

I do not know exactly how system resources (file descriptors, sockets and so on) are garbage collected when an Erlang process exits, but it looks like, depending on the socket state (?), either:
a. a proper SCTP SHUTDOWN is performed on the association (close), or
b. an SCTP ABORT is performed on the association (abort).

When I have some more spare time I will look at:
a. how resources are collected when an Erlang process exits (since I am new to Erlang, I could use some hints here);
b. why there is sometimes an association ABORT and other times an association SHUTDOWN.

Thanks a lot,
Cristian

starbelly · December 30, 2023, 5:54pm

I think the behaviour you’re interested in is going to be determined by the implementation of SCTP on the platform you’re using.

As an example, linux/net/sctp/socket.c on main shows that calling close can result in an abort. So I think the answer to why a SHUTDOWN is sent at some times and an ABORT at others lies there.
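As far as I can tell from reading sctp_close() in linux/net/sctp/socket.c (check it against your kernel version), close() sends an ABORT built by sctp_make_abort_user(), i.e. carrying the User-Initiated Abort cause 12, when there is data the application never read (including partially delivered or reassembling messages) or when SO_LINGER is enabled with a zero timeout; otherwise it starts a graceful SHUTDOWN. The following is a simplified Python model of that branch only; `SocketState` and `close_action` are hypothetical names, not kernel or OTP APIs.

```python
from dataclasses import dataclass

@dataclass
class SocketState:
    """Hypothetical snapshot of an SCTP socket at the moment it is closed."""
    unread_data: bool            # messages still queued for the application
    linger_enabled: bool = False # SO_LINGER on?
    linger_seconds: int = 0      # SO_LINGER timeout

def close_action(state: SocketState) -> str:
    """Model which primitive a Linux-like close() issues for an association."""
    if state.unread_data or (state.linger_enabled and state.linger_seconds == 0):
        # Kernel-built ABORT with cause 12, indistinguishable on the wire
        # from one the application requested explicitly.
        return "ABORT"
    return "SHUTDOWN"
```

If that reading is right, a controlling process that exits while messages are still queued for it (plausible during a traffic spike) could make the kernel emit a cause-12 ABORT even though nobody called gen_sctp:abort(), and the same exit with empty receive queues would yield a SHUTDOWN instead.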

However, as you hinted, this may also be related to exactly what happens when the socket NIF dtor (destructor) is called. The destructor is called at some point (though non-deterministically, iirc) after the process the resource was tied to ceases to be (i.e., exits). The socket dtor delegates to the I/O backend (which is platform-dependent, I believe); the backend closes the socket and performs some other env-related cleanup, and the main dtor function then also destroys the mutexes associated with the resource. Assuming everything goes well, I would expect only a close to happen.

I hope some of this information helps; as previously stated, I do not know much about SCTP.