Memory leak in Docker container when using open_port for image conversion

Hi all,

I’m experiencing what looks like a memory leak (not 100% sure) in my Erlang application running inside a Docker container, and I’m hoping someone might have insights or similar experiences to share.

Setup:

  • Erlang release running in Docker
  • Application performs image format conversion (any format → WebP)
  • Using open_port/2 to call external image conversion programs

Problem:
The Docker container’s memory usage continuously grows after each image conversion operation, suggesting a memory leak. The memory is not being reclaimed between conversions.

Code pattern (simplified):

convert_image(InputPath, OutputPath) ->
    Cmd = lists:flatten(io_lib:format("convert ~s ~s", [InputPath, OutputPath])),
    Port = open_port({spawn, Cmd}, [exit_status, {line, 1024}]),
    wait_for_port(Port).

wait_for_port(Port) ->
    receive
        {Port, {exit_status, 0}} ->
            port_close(Port),
            ok;
        {Port, {exit_status, Status}} ->
            port_close(Port),
            {error, Status}
    end.

Questions:

  1. Are there known issues with open_port memory management in containerized environments?
  2. Should I be doing additional cleanup beyond port_close/1?
  3. Could this be related to how Docker handles process cleanup vs. the Erlang VM?
  4. Has anyone experienced similar issues with external command execution in Docker?

Any insights, debugging suggestions, or similar experiences would be greatly appreciated.

Thanks.

System details:

  • Erlang/OTP version: 28 [erts-16.0]
  • Docker version: 28.1.1, build 4eba377
  • Host OS: Ubuntu 22.04 LTS
1 Like

Hi,

Thanks for sharing the setup and details - this behavior does look like a memory leak, but from your code it’s more likely a case of unhandled port messages accumulating in the Erlang process mailbox, which can cause memory to grow steadily over time.

Let me explain:

In your current convert_image/2 pattern, you only match the final {Port, {exit_status, _}} message. But the external command (like convert) can and often does emit standard output, standard error, or other intermediate messages - for example:

{Port, {data, Data}}

These are not being handled or flushed from the mailbox, so they accumulate. Even if you don’t care about them, you still need to explicitly receive and discard them - otherwise they sit in the process’s mailbox, causing memory to grow unbounded.

To fix it, modify your wait_for_port/1 function to handle all port messages, including data and exit_status messages.

Here’s a better version:

wait_for_port(Port) ->
    receive
        {Port, {data, _Line}} ->
            %% Discard or log if needed
            wait_for_port(Port);
        {Port, {exit_status, 0}} ->
            port_close(Port),
            ok;
        {Port, {exit_status, Status}} ->
            port_close(Port),
            {error, Status};
        {'EXIT', Port, Reason} ->
            {error, Reason};
        Any ->
            {error, {not_handled, Any}}
    after 5000 ->
        port_close(Port),
        timeout
    end.

This way you drain the mailbox of all messages from the port before closing it - preventing memory buildup in your Erlang process.
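
If you want to confirm that this is what’s happening before changing any code, a quick check is the mailbox length of the process that runs the conversions. A rough sketch (my_converter is just a placeholder for however you locate that process in your app):

%% In a remote shell on the running node; my_converter is a hypothetical
%% registered name - use whatever process actually calls convert_image/2.
1> Pid = whereis(my_converter).
2> erlang:process_info(Pid, message_queue_len).
%% If {message_queue_len, N} keeps climbing after each conversion,
%% unhandled port messages are piling up in that mailbox.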

2 Likes

@vkatsuba many thanks for the detailed analysis about unhandled port messages.

The pseudo code above was for illustration purposes. In reality we use sh, which handles all of these cases (except {'EXIT', Port, Reason}), including proper message draining and port cleanup.

However, your point is still relevant - could the containerized environment cause different timing or more verbose output that leads to missed cleanup? The fact that it works fine on bare metal but shows memory growth in Docker suggests there might be a Docker-specific interaction we’re missing.

I’ll investigate further. Thanks for pointing us in the right direction!

1 Like

Thanks for the clarification! If you’re already draining all port messages properly (stdout, stderr, exit_status) and closing the port reliably, then you’re right - the difference in behavior between Docker and bare metal might stem from:

Zombie processes inside the container
If sh spawns subprocesses (e.g., calling convert), and those aren’t properly waited on by the shell, they might become zombies. Even though port_close/1 is called, the process might linger in the container and keep holding memory. You might want to inspect with:

docker exec <container> ps aux | grep defunct
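
If defunct processes do turn up, one common mitigation - assuming you control how the container is started, which I can’t tell from your setup - is to run it with an init process as PID 1 so orphaned children get reaped:

# <your-usual-flags> and <image> are placeholders; --init makes Docker run
# a minimal init as PID 1 that reaps zombie child processes.
docker run --init <your-usual-flags> <image>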

Missing exec in the shell command
If you’re spawning via something like open_port({spawn, "/bin/sh -c ..."}), but don’t prefix the actual command with exec, the shell might stay alive after the actual conversion tool exits. Try:

CommandStr = io_lib:format("exec convert ~s ~s", [InputPath, OutputPath]),
ShellCmd = lists:flatten(io_lib:format("/bin/sh -c '~s'", [CommandStr])),
Port = open_port({spawn, ShellCmd}, [exit_status, {line, 1024}]).

Docker memory reporting quirks:
The container might appear to grow in memory because the BEAM doesn’t return freed memory to the OS immediately. erlang:memory/0 might show flat usage even when docker stats shows growth. Use recon_alloc from recon:

1> erlang:memory().
2> recon_alloc:memory(allocated_types).

Or top/htop inside the container.

Also worth noting - in practice, each situation can differ subtly. Even when the code pattern looks correct, subtle differences in environment (Docker cgroup limits, filesystem latency, shell behavior, memory allocators, etc.) can cause surprising side effects. In many real-world cases, narrow leaks or zombie processes are hard to trace without access to the specific project, logs, or ability to reproduce the issue under controlled conditions.

2 Likes

@vkatsuba Thanks for the debugging framework!

Zombie processes: No lingering processes found:

$ ps aux | grep defunct | grep -v grep

Missing exec: No /bin/sh -c involved - we use direct spawn.

Memory reporting: Erlang memory shows minimal growth (~122KB total), but Docker stats shows significant container memory growth (+15MB). This suggests the issue might be at the OS/container level rather than within the BEAM.

Before conversion:

1> erlang:memory().
[{total,18326656},
 {processes,1956224},
 {processes_used,1953480},
 {system,16370432},
 {atom,457647},
 {atom_used,457647},
 {binary,98488},
 {code,9799791},
 {ets,829720}]
2> recon_alloc:memory(allocated_types).
[{binary_alloc,720896},
 {driver_alloc,196608},
 {eheap_alloc,3334144},
 {ets_alloc,1179648},
 {fix_alloc,720896},
 {ll_alloc,6291456},
 {sl_alloc,196608},
 {std_alloc,1769472},
 {temp_alloc,393216}]

After conversion:

1> erlang:memory().
[{total,18448840},
 {processes,1984656},
 {processes_used,1982936},
 {system,16464184},
 {atom,459404},
 {atom_used,459404},
 {binary,135952},
 {code,9840759},
 {ets,830712}]
2> recon_alloc:memory(allocated_types).
[{binary_alloc,1245184},
 {driver_alloc,196608},
 {eheap_alloc,3334144},
 {ets_alloc,1179648},
 {fix_alloc,720896},
 {ll_alloc,6291456},
 {sl_alloc,196608},
 {std_alloc,1769472},
 {temp_alloc,393216}]

The disconnect between stable Erlang memory and growing Docker memory suggests either:

  1. The external image conversion tool is leaving artifacts in the container filesystem
  2. Docker’s memory accounting is including cached/buffered data that’s not being released

Will investigate Docker-specific memory behavior next.
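
To narrow that down, I plan to compare the container’s cgroup memory accounting against what the processes actually hold - the page-cache share is reported in the cgroup files. A rough check (paths assume cgroup v2; on cgroup v1 the files live under /sys/fs/cgroup/memory/ instead):

# Inside the container: total memory charged to the cgroup
cat /sys/fs/cgroup/memory.current
# Page-cache portion of that figure (file / active_file / inactive_file)
grep -E '^(file|active_file|inactive_file) ' /sys/fs/cgroup/memory.stat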

1 Like

Hi all,

Just wanted to follow up on my earlier post about memory growth in Docker when using open_port to call external image conversion tools.

Root Cause Found:
The issue wasn’t a memory leak in Erlang or the external processes - it was Linux filesystem cache being held within Docker container memory limits. Even though this cache should be reclaimable, Docker’s memory accounting was preventing normal kernel cache management from working properly.

Solution:

# From the Docker host:
docker exec <container> echo 3 > /proc/sys/vm/drop_caches

# or, equivalently (the redirection above is processed by the host shell,
# so both commands write to the host's drop_caches):
echo 3 > /proc/sys/vm/drop_caches

This command completely frees all accumulated cache memory. We can now process hundreds of images with minimal memory growth, and periodically drop caches when needed.

Key Insight:
What appeared to be a memory leak was actually normal filesystem caching that couldn’t be automatically reclaimed due to container memory limits. Running the command from the Docker host (as root) provides the necessary privileges to trigger cache cleanup.

Implementation:
I’m currently thinking of using this approach (more of a hack than a solution) to automatically drop caches when memory usage gets high:

#!/bin/bash
while true; do
    # Current usage in MiB (assumes docker stats reports MiB); keep only the
    # integer part so the -gt comparison below works
    MEMORY=$(docker stats <container> --no-stream --format "{{.MemUsage}}" | cut -d'/' -f1 | sed 's/MiB//' | cut -d'.' -f1)
    [ "$MEMORY" -gt 1024 ] && docker exec <container> echo 3 > /proc/sys/vm/drop_caches
    sleep XXXXX
done

Question for the community:

Does anyone know a more elegant way to trigger cache drops, or is calling docker exec from the host the most reliable approach? I’m curious if there are other techniques people have used for managing container memory pressure.

Thanks again to @vkatsuba for the excellent debugging guidance that led me down the right path with systematic elimination of potential causes!

3 Likes