Memory management of processes - if I forget about a process will it get cleaned up or will it leak unless manually killed?

bobjoe12131 · December 22, 2024, 7:27pm

Hi. This is my first post on here.
I am using Gleam, but I could also use help from Erlang programmers, since this is an Erlang feature.

I have only used processes through gleam/otp, and only for lazy variables and a small amount of 2-way “object” communication. But I was wondering about processes and garbage collection. If I forget about a process, will it get cleaned up, or will it leak unless I manually kill the process?

I am attempting to make a library that “compiles” a bunch of modules of code and then runs a function in one of the modules. All of the functions would need a reference to the module to get access to themselves, their siblings, imports, etc., which means I need processes. If I were just using a Dict (map) for the module, the functions would only be able to access an outdated copy of the module, containing only the functions that were compiled before the function currently being compiled. If a user compiles a bunch of modules for a one-time purpose and then forgets about the processes (either because they didn’t call the drop function I would give them, or because I didn’t give them one), will all that memory leak?

vkatsuba · December 22, 2024, 8:45pm

Welcome to the forum! Great question, and it’s awesome that you’re diving into processes and garbage collection.

In Erlang (and by extension, Gleam/OTP), processes are lightweight and managed by the BEAM virtual machine. Here’s how garbage collection and memory management work for processes:

Process Lifecycle and Garbage Collection

If you “forget” about a process but it’s still running, it won’t be automatically cleaned up. It will continue to exist, consuming memory and possibly other resources, until it either finishes its work and terminates, or you explicitly kill it.
However, if a process terminates (either normally or because of an error), all the memory it used is automatically reclaimed by the VM. There’s no “leaking” in the sense that terminated processes don’t leave behind residual memory.
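As a rough illustration of both points, here is a minimal Erlang sketch. The module name `forgotten_demo` and the `stop` message are made up for the example. It spawns a process that just waits, shows it is still alive although nobody is using it, and then shows that it only goes away once it is told to stop.

```erlang
-module(forgotten_demo).
-export([run/0]).

%% Spawns an idle process, checks that it stays alive, then stops it.
%% Returns {AliveWhileIdle, ExitReason}.
run() ->
    %% Pid is kept here only so the demo can observe the process;
    %% dropping it would not make the process go away.
    Pid = spawn(fun loop/0),
    AliveWhileIdle = is_process_alive(Pid),
    %% The process ends only when its function returns (or it is killed).
    Ref = monitor(process, Pid),
    Pid ! stop,
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            %% At this point the VM has reclaimed the process's memory.
            {AliveWhileIdle, Reason}
    after 1000 ->
        {AliveWhileIdle, timeout}
    end.

%% Blocks until a stop message arrives, then returns, ending the process.
loop() ->
    receive
        stop -> ok
    end.
```

Calling `forgotten_demo:run()` should show the process alive while idle and exiting with reason `normal` after the stop message.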

Orphan Processes

If a process is not linked or monitored by another process and you lose all references to it, it becomes an “orphan process.” While it will continue to run, it’s essentially detached from your application’s control flow, which could be problematic depending on what it’s doing.
You’ll want to ensure you have a way to track or explicitly terminate processes if they aren’t meant to run indefinitely.

Your Use Case

In your library, where each module corresponds to a process, you can implement a mechanism to manage the lifecycle of those processes. For example:
- Provide a “drop” function, as you mentioned, that terminates processes when they’re no longer needed.
- Consider using a supervisor process to manage the module processes. A supervisor terminates its children when it shuts down itself, so stopping one supervisor cleans up every module process under it even if your library user forgets to stop them individually.
Alternatively, if these module processes are truly ephemeral and you don’t need them after the function completes, you could design them to self-terminate when their work is done.
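A lighter-weight variation on the same idea, not a full supervisor, is to have each module process monitor the process that created it and stop when that owner exits. The sketch below is plain Erlang under assumed names (`owned_module`, and the `insert`, `get`, and `stop` messages are all hypothetical), so treat it as an outline of the pattern rather than a finished design.

```erlang
-module(owned_module).
-export([start/1, demo/0]).

%% Starts a hypothetical module process whose lifetime is tied to Owner.
start(Owner) ->
    spawn(fun() ->
        %% If Owner is already dead, the 'DOWN' message arrives immediately.
        Ref = monitor(process, Owner),
        loop(Ref, #{})
    end).

loop(Ref, Items) ->
    receive
        {insert, Name, Value} ->
            loop(Ref, Items#{Name => Value});
        {get, Name, From} ->
            From ! {item, Name, maps:find(Name, Items)},
            loop(Ref, Items);
        stop ->
            %% Explicit "drop" requested by the user.
            ok;
        {'DOWN', Ref, process, _Owner, _Reason} ->
            %% The owner went away without calling stop: clean up anyway.
            ok
    end.

%% An owner creates a module process and exits without stopping it.
demo() ->
    Parent = self(),
    spawn(fun() ->
        M = start(self()),
        Parent ! {module_pid, M}
    end),
    receive
        {module_pid, Mod} ->
            MRef = monitor(process, Mod),
            receive
                {'DOWN', MRef, process, Mod, _} -> stopped
            after 1000 ->
                still_alive
            end
    after 1000 ->
        no_module
    end.
```

The trade-off is that this only helps when the owner process actually exits. A long-lived owner that simply stops using the module still needs the drop function or a timeout.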

Preventing Memory Leaks

Memory leaks are unlikely in the traditional sense, but if you spawn a lot of processes without cleaning them up, you could run into resource exhaustion (e.g., too many processes, high memory usage).
To avoid this, think about adding a safety mechanism, like automatically terminating module processes after a timeout or providing clear documentation for library users about cleaning up.

If you have more questions or want to brainstorm solutions for your specific library design, feel free to ask! This is a fascinating use case for leveraging the power of Erlang/OTP processes.

bobjoe12131 · December 22, 2024, 10:12pm

Thank you for the reply. I guess I was using the wrong term; by leak I mean orphaned processes.
I might reply again when I get to implementing the “compiling” part of the library.

/// A collection of modules.
pub type ModuleBatchMsg {
  /// Add a module.
  AddModule(Module, path: String)
  GetModule(path: String)
  /// Only a module can use this. Deletes the reference to this module from
  ForgetModule(Module)
  /// Stop all modules and remove them from this batch.
  ClearBatch
  /// Stop this batch and the modules added to it.
  StopBatch
}

/// A collection of functions or constants.
pub type ModuleMsg {
  GetParent
  /// Set parent only if it doesn't have one.
  SetParent(ModuleBatch)
  // I have a decodable function type, not a process. Internally this will put the module process reference into the function.
  /// Add or update item.
  InsertItem(Dynamic, name: String)
  GetItem(name: String)
  /// Tells the parent to forget about it, if the call isn't from the parent. Then it stops.
  StopModule(from_parent: Bool)
}

A function can access anything by getting the parent of its module.
(Imagine the messages carry the reply subjects they need.)

I am not sure if I should make the compilation lazy, so that modules are just the parsed keywords until something requests them.

So if I got it right, the best way to update a process is to send it messages that change its state, rather than replacing it, which would orphan the old one. And don’t needlessly drop processes if the user might still be using them.

If you want more info, let me know. I am still not totally sure if I will be able to complete this.

rvirding · December 23, 2024, 4:03pm

Processes are not cleaned up automagically by the system. If you just leave them there they will hang around until someone, maybe themselves, kills them. They are not “gced” like data when nothing references them.

The only time the system/BEAM kills them is when the code they are running gets updated twice: the BEAM keeps at most two versions of a module, so when the old version is removed, any process still running that version is killed.

vances · December 24, 2024, 1:28am

One easy way to handle terminating idle processes is to use the gen_server or gen_statem behaviours and set a timeout in the return value of your callbacks. The value is in milliseconds, so, for example, 86400000 will cause a timeout callback after a day without any messages. Keep resetting it and you get a simple idle timeout.
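For gen_server, a minimal sketch of that idle timeout could look like the following. The module name `idle_server` and the `store`/`lookup` functions are invented for the example; the timeout mechanism itself is the standard one, where every callback return carries the timeout and `handle_info(timeout, State)` fires if no message arrives in that time.

```erlang
-module(idle_server).
-behaviour(gen_server).

-export([start/1, store/3, lookup/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% IdleMs: how long (in milliseconds) the server may sit without messages.
start(IdleMs) ->
    gen_server:start(?MODULE, IdleMs, []).

store(Pid, Key, Value) ->
    gen_server:cast(Pid, {store, Key, Value}).

lookup(Pid, Key) ->
    gen_server:call(Pid, {lookup, Key}).

init(IdleMs) ->
    %% The third element arms the idle timeout.
    {ok, #{idle => IdleMs, items => #{}}, IdleMs}.

handle_call({lookup, Key}, _From, State = #{idle := IdleMs, items := Items}) ->
    %% Returning the timeout again resets the idle clock.
    {reply, maps:find(Key, Items), State, IdleMs}.

handle_cast({store, Key, Value}, State = #{idle := IdleMs, items := Items}) ->
    {noreply, State#{items := Items#{Key => Value}}, IdleMs}.

handle_info(timeout, State) ->
    %% No message arrived within IdleMs: stop, which frees the process.
    {stop, normal, State};
handle_info(_Other, State = #{idle := IdleMs}) ->
    {noreply, State, IdleMs}.
```

Started with `idle_server:start(86400000)`, the server would stop itself after a full day of inactivity. Any callback that returns without a timeout disarms it, so each clause has to pass it along.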