Find Out What's Running on Dirty Schedulers

We’re seeing some weird behaviour: when CPU and memory usage get close to 50%, the entire VM locks up and grinds to a halt. No processes can get started; everything just sort of…hangs.

Looking at msacc, I see stuff like

Average thread real-time    :   2023907 us
Accumulated system run-time : 122979882 us
Average scheduler run-time  :   1181889 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.01%    0.01%   99.98%
       aux( 1)   95.46%    0.02%    0.00%    0.00%    0.02%    0.00%    4.50%
dirty_cpu_( 1)    0.00%    0.00%    0.02%   99.54%    0.00%    0.00%    0.44%
dirty_cpu_( 2)    0.00%    0.00%    0.02%   97.90%    0.00%    0.00%    2.08%
dirty_cpu_( 3)    0.00%    0.00%    0.02%   99.08%    0.00%    0.00%    0.90%
dirty_cpu_( 4)    0.00%    0.00%    0.02%   99.80%    0.00%    0.00%    0.18%
dirty_cpu_( 5)    0.00%    0.00%    0.02%   97.32%    0.00%    0.00%    2.66%
dirty_cpu_( 6)    0.00%    0.00%    0.02%   97.90%    0.00%    0.00%    2.08%
dirty_cpu_( 7)    0.00%    0.00%    0.02%   98.89%    0.00%    0.00%    1.09%
dirty_cpu_( 8)    0.00%    0.00%    0.02%   99.03%    0.00%    0.00%    0.94%
dirty_cpu_( 9)    0.00%    0.00%    0.02%   97.67%    0.00%    0.00%    2.31%
dirty_cpu_(10)    0.00%    0.00%    0.02%   96.13%    0.00%    0.00%    3.85%
dirty_cpu_(11)    0.00%    0.00%    0.02%   98.25%    0.00%    0.00%    1.72%
dirty_cpu_(12)    0.00%    0.00%    0.01%   99.84%    0.00%    0.00%    0.14%
dirty_cpu_(13)    0.00%    0.00%    0.02%   98.90%    0.00%    0.00%    1.08%
dirty_cpu_(14)    0.00%    0.00%    0.02%   97.25%    0.00%    0.00%    2.73%
dirty_cpu_(15)    0.00%    0.00%    0.02%   97.06%    0.00%    0.00%    2.92%
dirty_cpu_(16)    0.00%    0.00%    0.02%   99.75%    0.00%    0.00%    0.23%
dirty_cpu_(17)    0.00%    0.00%    0.02%   98.95%    0.00%    0.00%    1.04%
dirty_cpu_(18)    0.00%    0.00%    0.02%   98.81%    0.00%    0.00%    1.17%
dirty_cpu_(19)    0.00%    0.00%    0.02%   98.35%    0.00%    0.00%    1.63%
dirty_cpu_(20)    0.00%    0.00%    0.02%   97.21%    0.00%    0.00%    2.76%
dirty_cpu_(21)    0.00%    0.00%    0.02%   98.94%    0.00%    0.00%    1.04%
dirty_cpu_(22)    0.00%    0.00%    0.02%   97.40%    0.00%    0.00%    2.58%
dirty_cpu_(23)    0.00%    0.00%    0.01%   97.60%    0.00%    0.00%    2.38%
dirty_cpu_(24)    0.00%    0.00%    0.02%   98.98%    0.00%    0.00%    1.00%
dirty_cpu_(25)    0.00%    0.00%    0.02%   99.22%    0.00%    0.00%    0.76%
dirty_cpu_(26)    0.00%    0.00%    0.02%   99.65%    0.00%    0.00%    0.33%
dirty_cpu_(27)    0.00%    0.00%    0.02%   99.50%    0.00%    0.00%    0.48%
dirty_cpu_(28)    0.00%    0.00%    0.02%   98.95%    0.00%    0.00%    1.03%
dirty_cpu_(29)    0.00%    0.00%    0.02%   99.35%    0.00%    0.00%    0.63%
dirty_cpu_(30)    0.00%    0.00%    0.01%   98.52%    0.00%    0.00%    1.46%
dirty_cpu_(31)    0.00%    0.00%    0.02%   98.69%    0.00%    0.00%    1.29%
dirty_cpu_(32)    0.00%    0.00%    0.02%   99.04%    0.00%    0.00%    0.94%
dirty_cpu_(33)    0.00%    0.00%    0.01%   99.42%    0.00%    0.00%    0.56%
dirty_cpu_(34)    0.00%    0.00%    0.10%   99.78%    0.00%    0.00%    0.12%
dirty_cpu_(35)    0.00%    0.00%    0.02%   99.58%    0.00%    0.00%    0.39%
dirty_cpu_(36)    0.00%    0.00%    0.02%   99.16%    0.00%    0.00%    0.82%
dirty_cpu_(37)    0.00%    0.00%    0.02%   99.96%    0.00%    0.00%    0.02%
dirty_cpu_(38)    0.00%    0.00%    0.02%   95.51%    0.00%    0.00%    4.47%
dirty_io_s( 1)    0.00%    0.00%    0.03%    0.00%    0.00%    0.00%   99.97%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
      poll( 0)    0.00%    2.06%    0.00%    0.00%    0.00%    0.00%   97.94%
 scheduler( 1)    2.50%    0.18%   64.17%    3.96%    1.35%    0.91%   26.93%
 scheduler( 2)    3.11%    0.20%   60.10%    4.77%    1.15%    1.07%   29.61%
 scheduler( 3)    3.00%    0.15%   57.79%    5.28%    1.33%    1.02%   31.42%
 scheduler( 4)    2.27%    0.17%   55.32%    3.10%    1.34%    0.97%   36.84%
 scheduler( 5)    1.68%    0.14%   60.57%    3.27%    1.00%    0.61%   32.72%
 scheduler( 6)    3.82%    0.16%   62.58%    3.67%    1.20%    0.94%   27.63%
 scheduler( 7)    2.69%    0.15%   52.60%    3.22%    1.15%    0.63%   39.55%
 scheduler( 8)    3.26%    0.17%   53.29%    3.69%    1.03%    0.89%   37.68%
 scheduler( 9)    2.58%    0.16%   56.15%    2.95%    1.16%    0.70%   36.30%
 scheduler(10)    2.47%    0.16%   51.61%    4.94%    1.11%    0.74%   38.96%
 scheduler(11)    3.04%    0.17%   56.17%    4.63%    1.37%    1.00%   33.63%
 scheduler(12)    3.02%    0.14%   56.15%    3.35%    1.05%    0.76%   35.53%
 scheduler(13)    3.01%    0.15%   53.35%    6.15%    0.99%    0.85%   35.49%
 scheduler(14)    2.34%    0.14%   55.08%    3.92%    0.97%    0.76%   36.79%
 scheduler(15)    3.31%    0.16%   59.34%    4.52%    1.11%    0.69%   30.87%
 scheduler(16)    3.30%    0.17%   52.17%    6.91%    1.20%    0.86%   35.38%
 scheduler(17)    3.25%    0.13%   52.67%    2.67%    1.06%    0.56%   39.67%
 scheduler(18)    1.85%    0.14%   48.81%    2.29%    1.02%    0.61%   45.28%
 scheduler(19)    3.30%    0.16%   57.76%    3.47%    1.36%    0.99%   32.95%
 scheduler(20)    2.13%    0.17%   51.69%    3.73%    1.03%    1.85%   39.41%
 scheduler(21)    2.61%    0.19%   53.44%    3.25%    1.14%    1.25%   38.12%
 scheduler(22)    3.33%    0.16%   54.18%    4.06%    1.22%    0.87%   36.18%
 scheduler(23)    2.94%    0.13%   52.54%    3.76%    1.02%    0.97%   38.63%
 scheduler(24)    2.27%    0.13%   60.31%    3.38%    1.14%    0.88%   31.88%
 scheduler(25)    2.13%    0.14%   58.30%    4.44%    1.01%    0.72%   33.26%
 scheduler(26)    4.31%    0.14%   55.71%    2.91%    1.09%    0.77%   35.08%
 scheduler(27)    2.36%    0.22%   50.89%    3.36%    1.36%    1.16%   40.66%
 scheduler(28)    2.10%    0.16%   59.33%    3.63%    1.13%    0.74%   32.92%
 scheduler(29)    2.79%    0.16%   54.97%    3.87%    0.99%    0.84%   36.37%
 scheduler(30)    3.44%    0.14%   53.27%    4.43%    1.16%    2.45%   35.12%
 scheduler(31)    2.07%    0.11%   50.37%    3.36%    0.85%    0.58%   42.64%
 scheduler(32)    2.63%    0.14%   42.84%    2.42%    0.97%    0.68%   50.31%
 scheduler(33)    3.13%    0.10%   39.32%    2.02%    0.67%    0.48%   54.29%
 scheduler(34)    2.25%    0.11%   33.59%    1.75%    0.79%    0.47%   61.04%
 scheduler(35)    2.27%    0.08%   21.66%    1.42%    0.65%    0.39%   73.54%
 scheduler(36)    2.84%    0.08%   17.69%    1.12%    0.42%    0.30%   77.56%
 scheduler(37)    1.86%    0.06%   19.05%    1.29%    0.43%    0.21%   77.10%
 scheduler(38)    1.92%    0.09%   22.21%    1.21%    0.48%    0.32%   73.77%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.01%    0.01%   99.98%
           aux   95.46%    0.02%    0.00%    0.00%    0.02%    0.00%    4.50%
dirty_cpu_sche    0.00%    0.00%    0.02%   98.60%    0.00%    0.00%    1.37%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
          poll    0.00%    2.06%    0.00%    0.00%    0.00%    0.00%   97.94%
     scheduler    2.72%    0.15%   50.44%    3.48%    1.04%    0.83%   41.35%
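
(For reference, the stats above can be gathered with something like the following; the 1000 ms window is arbitrary.)

msacc:start(1000),   %% collect microstate accounting for about one second
msacc:print().       %% print the per-thread table shown above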

I checked How to debug/profile scheduler utilization, which sounded similar, but I’ve determined this is a slightly different problem. The odd thing is, I can’t seem to reproduce it with artificial load, yet it’s trivially reproducible when our real load scales up.

msacc seems to suggest most of the dirty CPU scheduler time is spent doing GC. I’ve checked etop (nothing suspicious) and done some tracing with redbug and observer, but haven’t found a culprit.

I was wondering if anyone has seen similar problems, or if there’s a way to see what’s happening on a specific scheduler. I checked Erlang --, but that only gives me overall scheduler utilization.


Yes, it looks like there is a lot of GC happening on every single dirty CPU scheduler. At first I thought of suggesting perf -g (see this guide), running the BEAM with +JPperf true.

What I would start looking at is the heap size of the processes running in your system. Dirty CPU schedulers are only used to garbage-collect large (>1 MB) heaps. What makes those heaps so large?
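
One quick way to eyeball this from a shell is something like the sketch below (total_heap_size is reported in words, so multiply by the word size for bytes):

%% Top 10 processes by total heap size (a sketch).
TopHeaps = lists:sublist(
             lists:reverse(lists:keysort(2,
               [{Pid, Words}
                || Pid <- erlang:processes(),
                   {total_heap_size, Words} <- [erlang:process_info(Pid, total_heap_size)]])),
             10).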

Huh, I’ll give it a shot; I never knew there was perf support.

The bulk of our workload is JSON deserialization and loading records from a DB. These records can sometimes grow quite large, and I know that we use a NIF to do some JSON stuff, so I wonder if this is what’s causing it.

Which NIF are you using for that? Is it running on a dirty CPU scheduler? (Looks like it isn’t, otherwise msacc would have reported that; it only appears to be GC.)

@max-au we’re using jiffy for decoding JSON, which I think is partially why this is happening

I’ve been able to reproduce this with this minimal example:

C NIF

#include <erl_nif.h>
#include <math.h>
#include <stdlib.h>

#define CPU_ITER 10000
#define LIST_SIZE 100000

/* Build a LIST_SIZE-element list on the calling process's heap. */
static ERL_NIF_TERM make_large_list(ErlNifEnv* env) {
    ERL_NIF_TERM list = enif_make_list(env, 0);

    for(int i = 0; i < LIST_SIZE; i++) {
        list = enif_make_list_cell(env, enif_make_int(env, i), list);
    }

    return list;
}

/* Dirty NIF body: burn some CPU, then return a large term to the caller. */
static ERL_NIF_TERM burn_cpu_dirty_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
    volatile double d = 0.0;

    for(long i = 0; i < CPU_ITER; i++) {
        d += sqrt(i);
    }

    return enif_make_tuple2(env, enif_make_double(env, d), make_large_list(env));
}

/* Entry point called from Erlang: reschedule the work onto a dirty CPU scheduler. */
static ERL_NIF_TERM burn_cpu(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
    return enif_schedule_nif(env, "burn_cpu_dirty", ERL_NIF_DIRTY_JOB_CPU_BOUND, burn_cpu_dirty_nif, argc, argv);
}

static ErlNifFunc nif_funcs[] = {
    /* Replaces the local Erlang stub burn_cpu_nif/0; the Erlang wrapper
       burn_cpu/0 consumes the result and forces a GC. Using a distinct name
       keeps the NIF from overwriting the wrapper when the library loads. */
    {"burn_cpu_nif", 0, burn_cpu}
};

ERL_NIF_INIT(cpu_burner, nif_funcs, NULL, NULL, NULL, NULL)

and Erlang module

-module(cpu_burner).
-on_load(load_nif/0).
-export([burn_cpu/0, burn_gc/0]).

load_nif() ->
    erlang:load_nif("./cpu_burner", 0).

%% NIF stub; replaced by the C implementation when the library loads.
burn_cpu_nif() ->
    erlang:nif_error(nif_library_not_loaded).

burn_gc() ->
    _ = lists:seq(1, 1000000), % create a large transient list
    ok.

burn_cpu() ->
    {_, LargeList} = burn_cpu_nif(),
    _ = length(LargeList),
    erlang:garbage_collect(),
    ok.
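
For completeness, the module was driven with something along these lines (a hypothetical driver, not from the original post):

%% Spawn one looping caller per dirty CPU scheduler to keep them all busy.
[spawn(fun Loop() -> cpu_burner:burn_cpu(), Loop() end)
 || _ <- lists:seq(1, erlang:system_info(dirty_cpu_schedulers))].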

Running burn_cpu in a loop causes the GC to start taking resources away from the dirty schedulers:

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
       aux( 1)    2.78%    0.16%    0.00%    0.00%    0.00%    0.00%   97.07%
dirty_cpu_( 1)    0.00%    0.00%   55.59%   13.03%    0.00%    0.00%   31.38%
dirty_cpu_( 2)    0.00%    0.00%   55.57%   13.05%    0.00%    0.00%   31.38%
dirty_cpu_( 3)    0.00%    0.00%   55.58%   13.04%    0.00%    0.00%   31.38%
dirty_cpu_( 4)    0.00%    0.00%   55.59%   13.02%    0.00%    0.00%   31.38%
dirty_cpu_( 5)    0.00%    0.00%   55.60%   13.02%    0.00%    0.00%   31.38%
dirty_cpu_( 6)    0.00%    0.00%   55.58%   13.04%    0.00%    0.00%   31.38%
dirty_cpu_( 7)    0.00%    0.00%   55.58%   13.04%    0.00%    0.00%   31.38%
dirty_cpu_( 8)    0.00%    0.00%   55.55%   13.07%    0.00%    0.00%   31.38%
dirty_cpu_( 9)    0.00%    0.00%   55.56%   13.06%    0.00%    0.00%   31.38%
dirty_cpu_(10)    0.00%    0.00%   55.60%   13.02%    0.00%    0.00%   31.38%
dirty_cpu_(11)    0.00%    0.00%   55.62%   13.00%    0.00%    0.00%   31.38%
dirty_cpu_(12)    0.00%    0.00%   55.57%   13.05%    0.00%    0.00%   31.38%
dirty_cpu_(13)    0.00%    0.00%   55.59%   13.03%    0.00%    0.00%   31.38%
dirty_cpu_(14)    0.00%    0.00%   55.58%   13.04%    0.00%    0.00%   31.38%
dirty_cpu_(15)    0.00%    0.00%   55.58%   13.04%    0.00%    0.00%   31.38%
dirty_cpu_(16)    0.00%    0.00%   55.57%   13.05%    0.00%    0.00%   31.38%
dirty_cpu_(17)    0.00%    0.00%   55.60%   13.02%    0.00%    0.00%   31.38%
dirty_cpu_(18)    0.00%    0.00%   55.60%   13.02%    0.00%    0.00%   31.38%
dirty_cpu_(19)    0.00%    0.00%   55.61%   13.01%    0.00%    0.00%   31.38%

which I guess points at the root cause of this problem: I didn’t realize that the gc column in this output directly takes time away from processing requests. What’s interesting is that, according to top, CPU load actually drops once the GC ramps up, and I’m not totally clear on what’s happening there.

I guess the behavior of garbage collection is that we have to alternate between running code on the dirty and regular schedulers to actually trigger the GC?


Since your NIF reschedules itself to be executed on a dirty CPU scheduler, it indeed has to split the time between actual “processing” and “large heap GC”.

One way to avoid that would be to avoid creating those large heaps (>1 MB) in the first place. In the code you pasted you deliberately create heaps over 1 MB (and therefore GC happens on the dirty scheduler). In actual production code this is rarely deliberate, but often accidental.

For OTP 26 I implemented memory tracing, and OTP 27 also includes a process heap profiler that may help you understand whether the memory allocations in your processes are optimal. Fun fact: you don’t need to wait for the OTP 27 release, as hprof is compatible with OTP 26. You can just copy the module into your project and use it.


Yep, you’re right that we don’t deliberately create heaps >1 MB in prod. I think something in the JSON NIF from the library we use is creating these big heaps.

We have a potential upgrade to OTP 26 in testing; I’ll see how far we get on that. Thanks for the help on this, I'm learning a lot about the internals here.

I have used the instrument module (Erlang -- instrument) in the past to find a NIF that was allocating too much memory and making the system either crash or become unresponsive. I haven’t seen it mentioned yet, and maybe it can help.


I may recommend something that runs counter to the usual advice (for a reason!) and ask you to add an erlang:garbage_collect() call after the NIF call. My hypothesis is that when the NIF creates a new term, it quickly gets promoted to the old heap, and many of these accumulate before the garbage collection heuristics kick in.
While it’s not necessarily the right approach, it may at least explain what’s happening.
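Concretely, that suggestion looks something like this (a sketch; decode_and_collect is a hypothetical name, and jiffy:decode/1 stands in for whichever NIF call produces the large term):

%% Force a GC immediately after the NIF returns, before the large result
%% has a chance to age into the old heap.
decode_and_collect(Bin) ->
    Decoded = jiffy:decode(Bin),
    erlang:garbage_collect(),
    Decoded.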

You are getting great advice here, but a question: what has led you to believe this is because of a NIF? That’s a great place to start if you have no other clues, but you do have one, which is a lot of GC work happening. You may have some processes getting frequent messages that are relatively big and complex. I also wonder whether you have adjusted GC settings globally or on some processes in your system.

You could set up a system monitor to get information when this happens. See Erlang -- erlang. Do be somewhat careful on a production system.
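
A minimal sketch of such a monitor, with illustrative thresholds (large_heap is in words, so 131072 words is roughly 1 MB on a 64-bit VM):

%% Receive a message whenever any process has a GC longer than 100 ms
%% or grows a heap past ~1 MB.
erlang:system_monitor(self(), [{long_gc, 100}, {large_heap, 131072}]),
receive Mon -> io:format("~p~n", [Mon]) after 60000 -> timeout end.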

I hope that helps :slight_smile:

Ah, I think you’re right @starbelly. I tried out hprof to capture whatever might be causing a ton of GC usage, and it seems like cowboy_stream_h:request_process/3 is the culprit:

****** Process <18318.70487299.0>    -- 0.00 % of total allocations ***
MODULE          FUN/ARITY          CALLS   WORDS  PER CALL  [     %]
cowboy_stream_h request_process/3      1  661290    661290  [100.00]
                                          661290            [ 100.0]

****** Process <18318.70709003.0>    -- 0.00 % of total allocations ***
MODULE          FUN/ARITY          CALLS   WORDS  PER CALL  [     %]
cowboy_stream_h request_process/3      1  754267    754267  [100.00]
                                          754267            [ 100.0]

****** Process <18318.70174709.0>    -- 0.00 % of total allocations ***
MODULE          FUN/ARITY          CALLS   WORDS  PER CALL  [     %]
cowboy_stream_h request_process/3      1  458827    458827  [100.00]
                                          458827            [ 100.0]

****** Process <18318.70710732.0>    -- 0.00 % of total allocations ***
MODULE          FUN/ARITY          CALLS  WORDS  PER CALL  [     %]
cowboy_stream_h request_process/3      1  12939     12939  [100.00]
                                          12939            [ 100.0]

****** Process <18318.70995551.0>    -- 0.00 % of total allocations ***
MODULE          FUN/ARITY          CALLS  WORDS  PER CALL  [     %]
cowboy_stream_h request_process/3      1  81587     81587  [100.00]
                                          81587            [ 100.0]

****** Process <18318.70693398.0>    -- 0.00 % of total allocations ***
MODULE          FUN/ARITY          CALLS   WORDS  PER CALL  [     %]
cowboy_stream_h request_process/3      1  452910    452910  [100.00]
                                          452910            [ 100.0]

We routinely have big messages and process tons of JSON, so I originally suspected jiffy. But looking at this more, I think our requests are just big in general.

From here, though, I’m not really sure what the fix is, except to increase the size of the boxes themselves. I’m still seeing the weird behaviour where, when the process reaches around 50% CPU usage, the entire application locks up. I wonder if this is because the ratio of dirty schedulers to regular ones is 40:40?


Makes sense. Do you have a way to verify this in isolation? That is, just take the payload and drop it on the floor. It may also be that it takes both for this headache to ensue.

Edit:

I don’t know if I would make that leap yet :slight_smile: What you describe is indeed abnormal, and there are more questions than potential answers :slight_smile:

I think it would help to know whether this is bare metal, VMs, containers on VMs, containers on bare metal, etc.

Another question: when you say “locks up”, do you mean it locks up and never resumes, or that it locks up and then resumes?

That said, if I were you, the first place I would look is the kernel logs or some equivalent; it sounds like something nasty may be happening underneath you, and it could be as simple as a resource limit you’re hitting.

Then finally, if you are pretty certain it’s the requests causing the issue, there are some questions as to exactly how you’re using the web server, whether this is http/1 or http/2, etc.

Ah yeah, I should’ve been clearer. When I say “locks up”, I mean I see my application going to 95%+ on the dirty schedulers doing GC, which prevents further requests from being handled because nothing else seems to happen in the system. Using recon_trace and filtering on heaps >1 MB shows me just regular cowboy handle loops.

Originally I thought this was a system problem, but that doesn’t seem to be it. I notice that when this happens, CPU usage as reported by the OS stays around 50% until my application is killed.

I’m running on Kubernetes, inside regular Docker containers on virtualized boxes. My suspicion as to what’s happening is:

  • dirty schedulers only ever account for ~50% of CPU because they can only ever equal the number of regular schedulers (see the quick check after this list)
  • GC needs to happen because I often have requests in the cowboy loop that allocate and build terms larger than 1 MB
  • that GC gets scheduled onto the dirty schedulers
  • GC locks up the system when all the dirty schedulers are occupied
  • resulting in dropped requests
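
For what it’s worth, a quick way to check that ratio from a shell (just a sketch; the pools are configured with the +S and +SDcpu emulator flags):

%% {normal schedulers online, dirty CPU schedulers online}
{erlang:system_info(schedulers_online),
 erlang:system_info(dirty_cpu_schedulers_online)}.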

Yeah, that makes a bit of sense coupled with hitting resource limits; the latter part is key. This should be reproducible in isolation outside of production as well, I do believe (though sometimes that’s harder than originally anticipated).

I can follow up on this later unless someone beats me to it, though without a better understanding of what exactly you’re doing in your loop, things are a bit fuzzy. There are quite a few different tunables to experiment with. Still, where I would start is gathering system information; as an example, are you close to 100% system utilization when this happens?

In the end I get your problem, which is: do I scale (vertically or horizontally) at this point, or is there something I can do to make this pain go away or lessen it a good deal? Systems-level introspection aside, I’d start looking at the code you own (not cowboy) and asking what could be going wrong. Does this happen only when I decode JSON, or does it also happen when I take in a binary and then do nothing with it, etc.? FWIW, I’d expect the decoded JSON structure to be the issue here in the end, but that’s just a hunch. Large transient maps that are complex (i.e., maps of lists and maps) create a lot of garbage.

It may help to force an erlang:garbage_collect/1 call after completing each request (if a single request makes less than 1 MB of heap, it won’t switch to a dirty CPU scheduler). You can also try to reduce the JSON sizes, or keep the data in its original (binary?) form without fully deserializing it (work on the serialized representation).


I’d be curious whether these are long-lived processes or short-lived processes that just keep getting spun up on every request.

The assumption would be that it’s a short-lived process. If I understand correctly, this is probably happening in the handoff from the middleware to @peixian’s code. The payload should still be a binary at that point; if so, it may be worth spinning up a proc after the handoff to do the decoding and whatever else is done with it. Since it will be a refc binary, the copy to a new proc would be cheap. You might even be able to specify a bigger initial heap to avoid GC (see the sketch at the end of this post). The rationale is that you’re going to need N bytes of memory anyway to deal with your average-size payload, so you might as well specify that up front and avoid the major GC. The proc is then torn down and should still avoid a GC, I believe, but do please keep me honest here :slight_smile:

If it is short-lived, GC is going to get called before the request completes, perhaps because not enough heap was allocated to the proc up front. That is, a refc binary is passed in, the binary is decoded, and now the stack and heap meet.

OTOH, if this is happening inside a long-lived process, well, the above still applies, I do believe :slight_smile:
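
A rough sketch of that handoff (hypothetical names; min_heap_size is in words, not bytes, and would need tuning to your average payload):

%% Decode the refc binary in a short-lived child with a large initial heap,
%% so the decode itself doesn't have to grow the heap and trigger a major GC.
decode_in_child(Bin) ->
    Parent = self(),
    {_Pid, Ref} =
        spawn_opt(fun() -> Parent ! {decoded, jiffy:decode(Bin)} end,
                  [monitor, {min_heap_size, 256 * 1024}]),  % 256k words ~= 2 MB on 64-bit
    receive
        {decoded, Term}                   -> erlang:demonitor(Ref, [flush]), {ok, Term};
        {'DOWN', Ref, process, _, Reason} -> {error, Reason}
    end.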


@starbelly updating this: you were totally right :slight_smile:. We had a Task.async that was spinning up a bunch of short-lived processes that would get GC’d, which drove up GC pressure. Thanks for the help on this @max-au as well; y’all were very helpful in tracking down this bug.
