erlang_python - Async Tasks, Channel API, Dual Pools

erlang_python 2.0 and 2.1 are now available on hex.pm with significant new
capabilities for running Python workloads on the BEAM.

Distributed by Default

Because erlang_python runs on the BEAM, you inherit Erlang’s distribution.
Run Python on any connected node - no extra setup needed:

%% Execute on remote node
rpc:call('worker@host', py, call, [numpy, dot, [MatrixA, MatrixB]]).

%% Parallel map across connected nodes (spreads over [node() | nodes()])
Results = rpc:parallel_eval([{py, call, [numpy, sqrt, [Chunk]]}
                             || Chunk <- Chunks]).

Docker Compose setup included for testing multi-node clusters.
See: Distributed Python Execution — erlang_python v2.1.0

Async Task API

uvloop-inspired task submission from Erlang:

%% Blocking run
{ok, Result} = py_event_loop:run(Module, Func, Args).

%% Non-blocking with reference
{ok, Ref} = py_event_loop:create_task(Module, Func, Args),
%% ... do other work ...
{ok, Result} = py_event_loop:await(Ref).

%% Fire-and-forget
ok = py_event_loop:spawn_task(Module, Func, Args).

Thread-safe submission via enif_send - works from dirty schedulers.
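
For readers coming from asyncio, the three submission modes correspond to
familiar plain-asyncio patterns. A stdlib-only comparison (no erlang_python
involved; work/1 is a stand-in task body):

```python
import asyncio

async def work(x):
    # stand-in for a Python task body
    return x * 2

async def main():
    # Blocking run: await directly (like py_event_loop:run/3)
    r1 = await work(1)

    # Non-blocking with a handle (like create_task + await(Ref))
    task = asyncio.create_task(work(2))
    # ... other work could run here ...
    r2 = await task

    # Fire-and-forget (like spawn_task/3): schedule and drop the handle
    asyncio.ensure_future(work(3))

    return r1, r2

print(asyncio.run(main()))  # (2, 4)
```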

Channel API

Bidirectional message passing, 8x faster than Reactor for small messages:

%% Erlang side
{ok, Ch} = py_channel:new(),
py_channel:send(Ch, {task, Data}).

%% Python side
for msg in channel:
    result = process(msg)
    erlang.channel.reply(caller_pid, result)

Supports backpressure via max_size option. Zero-copy IOQueue buffering.
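
The max_size backpressure follows the usual bounded-queue contract: once the
channel is full, producers block or fail fast until a consumer drains it. The
same contract sketched with the Python stdlib, using queue.Queue as a
stand-in for the channel:

```python
import queue

ch = queue.Queue(maxsize=2)  # analogous to the max_size option
ch.put({"task": 1})
ch.put({"task": 2})

try:
    ch.put_nowait({"task": 3})  # channel full: backpressure kicks in
except queue.Full:
    print("backpressure: channel full")

print(ch.get()["task"])  # consumer drains one item -> 1
```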

Dual Pool Support

Separate pools for CPU-bound and I/O-bound operations:

%% CPU-bound (default pool, sized to schedulers)
py:call(numpy, dot, [A, B]).

%% I/O-bound (io pool, larger for concurrency)
py:call(io, requests, get, [Url]).

%% Or register modules for automatic routing
py:register_pool(io, requests),
py:call(requests, get, [Url]).  %% Routes to io pool automatically
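
Conceptually, register_pool maintains a module-to-pool routing table that is
consulted on every call, falling back to the CPU pool. A toy sketch of that
lookup (names hypothetical, not the library's internals):

```python
# Hypothetical model of module -> pool routing with a CPU-pool default.
routes = {}

def register_pool(pool, module):
    routes[module] = pool

def pool_for(module):
    return routes.get(module, "cpu")  # default pool, sized to schedulers

register_pool("io", "requests")
print(pool_for("requests"))  # io
print(pool_for("numpy"))     # cpu
```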

Subinterpreter API (Python 3.12+)

True parallelism via OWN_GIL subinterpreters - each has its own GIL:

%% Create isolated subinterpreters
{ok, Interp1} = py:subinterp_create(),
{ok, Interp2} = py:subinterp_create(),

%% Run in parallel - no GIL contention
spawn(fun() -> py:subinterp_call(Interp1, numpy, dot, [A, B]) end),
spawn(fun() -> py:subinterp_call(Interp2, numpy, dot, [C, D]) end).

%% Or use the pool
{ok, Pool} = py:subinterp_pool_start(4),  %% pool of 4 subinterpreters
{ok, Result} = py:subinterp_call(Pool, module, func, [Args]).
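
A pool of N subinterpreters amounts to dispatching each call to one of N
interpreter handles. A toy round-robin model of that dispatch (illustrative
only, not the library's actual scheduler, which may pick idle interpreters
instead):

```python
from itertools import cycle

class ToyInterpPool:
    """Round-robin dispatch over N interpreter handles (here, plain ints)."""
    def __init__(self, n):
        self._next = cycle(range(n))

    def call(self, func, *args):
        interp = next(self._next)   # pick the next interpreter handle
        return interp, func(*args)  # toy: run locally instead of in it

pool = ToyInterpPool(4)
print([pool.call(abs, -x)[0] for x in range(6)])  # [0, 1, 2, 3, 0, 1]
```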

Event Loop Performance

  • Growable pending queue (256 to 16384)
  • Callable cache (64 slots) avoids PyImport/GetAttr per task
  • Task wakeup coalescing
  • Sub-millisecond latency vs 10ms asyncio polling

Other Notable Features

  • Virtual environment management: py:ensure_venv/2,3
  • Python logging → Erlang logger integration
  • Distributed tracing with spans
  • Reactor API for custom protocols
  • Audit hook sandbox (blocks fork/exec in embedded context)
  • Full ETF encoding for PIDs and References
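
ETF (the External Term Format) frames every term as a version byte (131)
followed by tag-prefixed data; PIDs and references use richer tags with
node, id, and creation fields, while the simplest case, term_to_binary(42),
is just <<131,97,42>> with 97 = SMALL_INTEGER_EXT. A minimal decoder for
that one tag:

```python
# Decode the simplest ETF term: SMALL_INTEGER_EXT (tag 97, one unsigned byte).
# term_to_binary(42) in Erlang produces exactly these three bytes.
ETF_VERSION = 131
SMALL_INTEGER_EXT = 97

def decode_small_int(data: bytes) -> int:
    assert data[0] == ETF_VERSION, "not an ETF term"
    assert data[1] == SMALL_INTEGER_EXT, "not a small integer"
    return data[2]

print(decode_small_int(bytes([131, 97, 42])))  # 42
```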

Links

Apache 2.0 licensed. Feedback welcome.
