Supercarrier questions - size not included in total memory, and observing memory climb while experimenting with +MMscrpm true and +Musac false

I have a few questions about supercarrier:

  1. The supercarrier size is not included in total memory (available). As is tradition, there is usually a good reason why something is the way it is in OTP, but I would be remiss not to understand the why, especially if I’m thinking about opening an issue or submitting a PR.

  2. I did a quick experiment using +MMscrpm true and +Musac false and observed memory climb way beyond the specified supercarrier size. I have not dug into this yet, but presumably there’s an issue with a NIF (or, more likely, a library that a NIF makes use of) that is the culprit here; however, this does not happen when the sys allocators are enabled. It might also be that Linux overcommit is the problem (the current mode is 1 on the test machine). Before I do a deep dive, I figured I’d just ask and see if others have any off-the-cuff thoughts :slight_smile:


The supercarrier cannot be included in total memory, because it is not memory that has been allocated: it is reserved, but not allocated. This behaviour cannot be changed, because doing so would break compatibility.

I don’t remember if there is a way to fetch the super-carrier size (IIRC there is not, but one could be added to system_info/1 with a PR).

Memory can climb beyond the super-carrier size due to system allocations (such as thread stacks, stack guard pages, etc.). I recommend listing the mapped memory with cat /proc/<pid>/maps and going from there.
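If you want to do that from inside the node itself, a minimal sketch (Linux-specific, assuming the proc file is readable) could look like this:

    %% Sketch: dump this VM's memory mappings from the Erlang shell.
    %% Linux-specific; os:getpid/0 returns the OS pid as a string.
    {ok, Maps} = file:read_file("/proc/" ++ os:getpid() ++ "/maps"),
    io:put_chars(Maps).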


Is it MemTotal and MemAvailable in /proc/meminfo you are referring to here?

I’m not sure that I’m answering your question here, but I’ll give it a try:

+MMscrpm true (which is also the default) will map the whole supercarrier with read and write permissions, so that pages in it can be accessed without anything further having to be done on access. This mapping is also left as is for as long as the VM lives. This reserves physical memory for the mapping from the underlying system. How this memory is reserved depends on the underlying system; in the case of Linux there are three different modes, which can be explained roughly like this:

  • Don’t overcommit (2). There must be available backing store (RAM or swap) where the mapped memory can be backed in order to allow the mapping.
  • Overcommit (1). Allow any mapping regardless of whether there is available backing store or not.
  • Heuristic overcommit (0). Some heuristic logic is used to determine if it is reasonable that there will be available backing store.

Depending on how you’ve configured Linux, “reserving physical memory” means vastly different things. When using overcommit (1), there is not much of a reservation being done at all…
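For reference, you can check which mode a machine is in from the Erlang shell with something like this (a sketch; the sysctl path is Linux-specific):

    %% Read the current Linux overcommit mode (0, 1 or 2).
    {ok, Mode} = file:read_file("/proc/sys/vm/overcommit_memory"),
    io:format("vm.overcommit_memory = ~s", [Mode]).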

As pages in the supercarrier are touched, actual mappings from virtual memory to physical memory are set up, and the resident set size will increase. Available memory will decrease as the resident set size increases.

If you enable locking of physical memory (+Mlpm all), all virtual memory pages of the supercarrier will be mapped to physical memory pages when the supercarrier is created, and your resident set size will jump up to at least the size of the supercarrier. Available memory will instantaneously decrease by the same amount.
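One way to watch this from inside the node is to read the VmRSS line of the process status file (again a sketch, Linux-specific):

    %% Sketch: print this VM's resident set size (Linux-specific).
    {ok, Status} = file:read_file("/proc/" ++ os:getpid() ++ "/status"),
    [io:format("~s~n", [L])
     || L <- binary:split(Status, <<"\n">>, [global]),
        binary:match(L, <<"VmRSS">>) =/= nomatch].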

On 64-bit systems literal_alloc also has its own super carrier. This one defaults to 1GB if I remember correctly. NIFs and drivers may also allocate memory by means other than the ERTS-specific allocators.

This is strange. It seems to indicate that the memory allocations “running wild” are handled by the ERTS-specific allocators.


The fact that it doesn’t happen when sys_alloc is enabled got me thinking that these allocations were done via the ERTS-specific allocators. However, by default “super carrier only” is enabled (+MMsco true), which together with the disabling of sys_alloc should make all ERTS internal allocators create all of their carriers in the supercarrier. This, together with your saying that “memory climb[s] way beyond the supercarrier size”, points to the opposite: the allocations “running wild” probably don’t use the ERTS-specific allocators, but instead allocate memory by some other means.


MemTotal. I think I saw some bad instrumentation on my end, so this can be ignored for now. That said, it would be nice to be able to get all memory allocated by the VM. There might already be a simple way to do this; if not, I’m happy to send up a patch.

You are/did; I was seeking confirmation on the behaviour, see below.

Indeed, and I know (have known) I need to move to mode 2, but I have been staggering that move, as I’ve had trouble going to mode 2 even when there’s ample memory available. Now, per this recent experiment, something tells me the problem is not mode 2 at all :slight_smile:

Yup, and not unlike moving to overcommit mode 2, this is where I want to get to, but as mentioned, I’ve had problems, which once again I think have nothing to do with mode 2 or memory locking.

This makes sense, and explains to me at least 1GB of the memory beyond the super carrier size.

Exactly, and this is the confirmation I sought before going deep into what’s going on. I’ll experiment some more and follow up with results. I should have shared some numbers to begin with: in my dev experiment the supercarrier would be set to 50GB; memory would climb to that, then go to 54GB and hang around there (which makes sense per what you said about the literal_alloc area and probably a few other things), and then I would see it climb to 64GB, in an unhealthy way, which once again makes me think something nifarious is going on :wink:

As stated, more experiments to follow, and I will reply with findings here. Cheers!


Ok, I see what you mean, although I’m using different terminology.

To me, “allocated by the VM/runtime system/emulator” is what erlang:memory(total) returns; that is, the amount of memory that the VM has requested from the underlying memory allocators. The allocators supplying this memory may be the ERTS internal memory allocators, but they may also be another memory allocator implementation. For example, by passing +Meamin on the command line you’ll get the default malloc() implementation of the OS you are using.
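For example, from the shell:

    %% Memory the VM has requested from its allocators, in bytes.
    Total = erlang:memory(total),
    %% The full breakdown (processes, system, binary, ets, ...).
    io:format("total: ~p~n~p~n", [Total, erlang:memory()]).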

I guess that what you are after is the amount of physical memory reserved by the memory allocator implementation. Is this correct? To me this is not memory allocated by the VM, but something internal to the memory allocator implementation. It is, however, still an interesting figure. Today it is not easily available, for the simple reason that it is more work than one might think to implement it such that you get the correct value for all the different configurations, but it is of course possible to implement.

A good estimate for most cases is the sum of the sizes of all carriers used by all alloc_util allocators. This, however, ignores carriers in the mseg_alloc caches, and it will not be a good estimate at all when “reserve physical memory” has been enabled on a super-carrier.
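As a rough illustration of that estimate (a sketch only, not a robust implementation; tuple layouts differ between OTP releases and configurations, so it matches defensively and skips disabled allocators):

    %% Sum current mbcs + sbcs carrier sizes over all alloc_util
    %% allocator instances. Ignores mseg_alloc caches and the
    %% mbcs_pool, per the caveats above.
    CSize = fun(Props) ->
                case lists:keyfind(carriers_size, 1, Props) of
                    T when is_tuple(T) -> element(2, T);
                    false -> 0
                end
            end,
    PerAlloc = fun(A) ->
                   case erlang:system_info({allocator_sizes, A}) of
                       false -> 0;   %% allocator disabled
                       Is when is_list(Is) ->
                           lists:sum([CSize(Props)
                                      || {instance, _, Info} <- Is,
                                         {Kind, Props} <- Info,
                                         Kind =:= mbcs orelse Kind =:= sbcs])
                   end
               end,
    lists:sum([PerAlloc(A) || A <- erlang:system_info(alloc_util_allocators)]).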

A PR implementing this is welcome. Some requirements, though:

  • It should work for all configurations of the ERTS internal allocators, or at least all configurations where you don’t disable any of them.
  • Mappings of memory which reserve physical memory should be included, but not mappings which only reserve virtual memory.
  • It should not modify erlang:memory(total), since that would alter what it means. I actually don’t think it belongs in erlang:memory() at all, since it is an allocator-internal value and all the other values reflect what the VM has requested from the allocator. Perhaps something like erlang:system_info(allocator_segments_size).

Correct, and thanks for clearing up the terminology when it comes to the VM.

Awesome, this will be an adventure :grin:


Right, so I believe it was a combination of being in mode 1 and riding on the edge in the environment I was testing in (i.e., not having enough memory left for NIFs and so forth). No problems running in mode 2 with memory locking on and the sys allocators disabled. Thanks!
