Supercarrier general protection fault with a memlocked setup

starbelly · February 28, 2023, 12:39am

I’m not sure if I hit a bug or an oddity with the system I’m working on.

Over the weekend I set up our dev cluster to use memlock (+Mlpm all) with sys allocators disabled (+Musac false). I originally started off in Linux overcommit mode 2, but that was brittle enough that I had to bump the overcommit ratio, and at that point I figured I might as well go back to mode 1.

I thought the brittleness I saw was entirely due to being in mode 2, but I was wrong. The supercarrier in question is 55GB. That leaves 10GB for other processes on the system and for overhead from NIFs and the libraries they use (i.e., allocations that do not go through the supercarrier).
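For context on why mode 2 needed the ratio bumped, here is a rough sketch of the commit-limit arithmetic. It assumes 65GB of total RAM (55GB supercarrier plus the 10GB left over), no swap, no huge pages, and the kernel default overcommit ratio of 50. None of those figures are measured from the cluster, and the function names are only illustrative.

```python
# Back-of-the-envelope check of the Linux commit limit in strict overcommit mode
# (vm.overcommit_memory = 2). All inputs below are assumptions, not measurements:
# 65 GB total RAM, no swap, no huge pages, default overcommit_ratio of 50.

GB = 1024 ** 3


def commit_limit(total_ram, swap, ratio_percent, hugetlb=0):
    """Approximate CommitLimit: (RAM - huge pages) * ratio / 100 + swap."""
    return (total_ram - hugetlb) * ratio_percent // 100 + swap


def min_ratio(total_ram, swap, required, hugetlb=0):
    """Smallest integer overcommit_ratio whose commit limit covers `required`."""
    usable = total_ram - hugetlb
    needed = max(required - swap, 0)
    return -(-needed * 100 // usable)  # ceiling division


total_ram = 65 * GB      # assumed: 55 GB supercarrier + 10 GB left over
supercarrier = 55 * GB

default_limit = commit_limit(total_ram, swap=0, ratio_percent=50)
print(f"commit limit at ratio 50: {default_limit / GB:.1f} GB")
print(f"supercarrier fits: {default_limit >= supercarrier}")
print(f"minimum ratio for the supercarrier alone: {min_ratio(total_ram, 0, supercarrier)}")
```

Under those assumptions the default ratio gives a limit well below the supercarrier size, and the ratio would need to be at least 85 before the supercarrier alone fits, with nothing left for anything else on the box.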

I see "cannot allocate some number of bytes of type heap_frag" errors, and the Erlang crash dump for one instance showed that only 44GB of the 55GB supercarrier was in use at the time. There’s also a general protection fault message (on a normal scheduler) from the Linux kernel in the logs.

I’m happy to provide crash dumps, core dumps, etc. Maybe an issue on Erlang/OTP is worth opening for this, but I defaulted to posting here first as I might simply be “holding it wrong”.

It’s possible there’s a badly behaved NIF, but if that were the case, I would expect the same thing to happen when memlock is off and sys allocators are on.