Looking for help with BEAM and CPU requests/limits in containers inside Kubernetes

victorolinasc · November 16, 2022, 10:15pm

I’ve been struggling to understand, set up, and monitor proper CPU requests/limits for the BEAM running in containers inside Kubernetes.

I know that since OTP 23 the BEAM is aware of cgroup resources, and that is awesome. Unfortunately, that works ONLY for CPU limits and not for CPU requests.

It seems that the new best practice for Kubernetes is to set requests but NOT limits for CPU.

For the love of god, stop using CPU limits on Kubernetes (updated) | Robusta

Well, it just so happens that when we have a node with 8 CPUs and give the BEAM container only 3 CPUs through a limit, the BEAM respects that and sizes its schedulers accordingly. That is not true when only requests are set.

My colleague devised a simple hack in an Elixir startup script with the following:

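# Total cores visible to the container (8 in the example).
NPROC=$(nproc)
case $RELEASE_COMMAND in
  start*|daemon*)
    # +S Schedulers:SchedulersOnline, e.g. 8:3 with a 3-CPU request.
    ELIXIR_ERL_OPTIONS="+S ${NPROC:-1}:${CPU_REQUEST:-1}"
    export ELIXIR_ERL_OPTIONS
    ;;
  *)
    # Other release commands (rpc, remote, eval, ...) keep the defaults.
    ;;
esac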

In the example this starts the BEAM with 8:3 (8 schedulers, 3 of them online), which is what we expect. As a result we see less CPU throttling.
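One caveat with the script above: CPU_REQUEST is interpolated as-is, so an empty, fractional, or too-large value ends up in the +S flag, and I believe the emulator refuses to start when the online count exceeds the total. A minimal sketch of a guard, assuming CPU_REQUEST is meant to hold a whole number of cores (the helper name clamp_schedulers is hypothetical, not part of any release tooling):

```shell
#!/bin/sh
# Hypothetical helper: clamp a requested core count to the range 1..total.
# $1 = requested cores (e.g. "$CPU_REQUEST"), $2 = total cores (e.g. "$(nproc)").
clamp_schedulers() {
  request=$1
  total=$2
  # Fall back to 1 when the request is empty or not a plain non-negative integer.
  case $request in
    ''|*[!0-9]*) request=1 ;;
  esac
  if [ "$request" -lt 1 ]; then
    request=1
  fi
  if [ "$request" -gt "$total" ]; then
    request=$total
  fi
  echo "$request"
}

NPROC=$(nproc)
ELIXIR_ERL_OPTIONS="+S ${NPROC}:$(clamp_schedulers "${CPU_REQUEST:-}" "$NPROC")"
export ELIXIR_ERL_OPTIONS
```

In the original script the last three lines would sit inside the start*|daemon* branch. Treating a fractional value such as 2.5 as 1 is a deliberately conservative choice for this sketch; rounding up would be an equally defensible one.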

Is there any other way to tune this up? Does it make sense the way it is? Should we create an issue about this?

Thanks in advance

robsonpeixoto · November 17, 2022, 3:11pm

The CPU_REQUEST environment variable is not defined by default, but it is easy to define:

- name: CPU_REQUEST
  valueFrom:
    resourceFieldRef:
      containerName: my-container-name
      resource: requests.cpu

rlipscombe · November 23, 2022, 11:18am

The blog post you linked to (good read; thanks) explicitly says that you don’t want to limit CPU usage (because you’re leaving potential compute on the floor). Instead, by using requests, you’re allowing your processes to be a little bit bursty.

So if you don’t want Kubernetes to limit CPU usage (the premise in the blog post), why tell the BEAM to do it instead?

victorolinasc · November 23, 2022, 1:23pm

@rlipscombe this is true when we think about one node per pod/BEAM. If there is more than one pod/BEAM on the same node, Kubernetes will NOT throttle CPU per limits, but there will be CPU contention at the node level. If the node has 8 CPUs and I have no requests set, we will probably have more than one pod on it, and then both BEAMs will start with 8:8 schedulers by default. That way, the BEAMs end up competing for the same cores, which I believe is not ideal.

The idea here is to have the BEAM respect CPU requests (and not only limits) so that we don’t see CPU throttling AND we avoid contention between BEAMs at the node level.

Hopefully this makes this a bit more understandable but I am not proficient enough to be sure this is how it works hehehe
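On the question of making the BEAM follow requests without an extra environment variable: as far as I know, Kubernetes translates a CPU request into the cgroup v1 cpu.shares value at roughly 1024 shares per core, so the startup script could derive a core count from that file instead. The following is only a sketch under those assumptions. The helper name shares_to_cores is hypothetical, the path is the usual cgroup v1 location and may differ by runtime, and cgroup v2 nodes expose cpu.weight on a different scale, which this does not handle:

```shell
#!/bin/sh
# Hypothetical helper: convert a cgroup v1 cpu.shares value into whole cores,
# rounding up and never returning less than 1.
# Assumption: about 1024 shares correspond to one requested core.
shares_to_cores() {
  cores=$(( ($1 + 1023) / 1024 ))
  if [ "$cores" -lt 1 ]; then
    cores=1
  fi
  echo "$cores"
}

# Assumed cgroup v1 path; on cgroup v2 this file does not exist and
# CPU_REQUEST is simply left as it was.
SHARES_FILE=/sys/fs/cgroup/cpu/cpu.shares
if [ -r "$SHARES_FILE" ]; then
  CPU_REQUEST=$(shares_to_cores "$(cat "$SHARES_FILE")")
fi
```

A pod with no CPU request gets a very small share value, so this resolves to 1 core in that case. The result should still be capped at the node's core count before it is passed to +S.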