Pooler - the most advanced Erlang worker pool library

seriyps · April 26, 2023, 4:13pm

Pooler is an advanced worker pool library. It can be used to maintain a pool of, e.g., database connections, OS subprocesses, homogeneous network connections and so on.

Main features are:

Pool size can be fixed or dynamic (init_size, max_size)
A dynamic-sized pool shuts down extra workers if they are unused for a configurable timeout (this keeps resource usage low while still being able to handle peak load)
Pools can be combined into groups for load-balancing purposes (uses pg or pg2 under the hood)
With take_member(Pool) the client immediately gets error_no_members when the pool is empty. With take_member(Pool, Timeout) the client blocks for up to the (soft) Timeout, waiting for a worker to become available (returned by another client or newly started)
“Broken” workers can be returned with the fail flag, which makes the pool replace the worker with a new one
New workers are started asynchronously, outside of the pool’s main gen_server (a slow worker start won’t block the whole pool)
It can survive a resource outage: it keeps trying to restart the worker while returning error_no_members to the clients (some other pools tend to bring the whole BEAM VM down by exceeding the supervisor restart intensity). This makes it possible to use pooler to supervise database connections without any wrappers and without being afraid of the database becoming unavailable
Various metrics can be exported
It is built on OTP principles (all processes are supervised)
It is covered by unit-tests, property-based tests, microbenchmarks, dialyzer
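To make the sizing options above a bit more concrete, here is a minimal configuration sketch. It assumes the map-based config of pooler 1.6.x; the pool name my_pool and the worker module my_worker are hypothetical, and the time values are only examples.

```erlang
%% Hypothetical worker: my_worker:start_link/0 must return {ok, Pid}.
Config = #{
  name => my_pool,              % hypothetical pool name
  init_count => 2,              % workers started when the pool is created
  max_count => 10,              % the pool may grow up to this size under load
  cull_interval => {1, min},    % how often idle extra workers are checked
  max_age => {30, sec},         % idle time after which an extra worker is stopped
  start_mfa => {my_worker, start_link, []}
},
pooler:new_pool(Config),

%% Non-blocking: returns error_no_members right away if nothing is free.
Immediate = pooler:take_member(my_pool),

%% Blocking: waits up to 5000 ms for a worker to be returned or started.
Waited = pooler:take_member(my_pool, 5000).
```

Setting init_count equal to max_count would give a fixed-size pool.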

We use it in production in a very critical system to maintain the pool of epgsql PostgreSQL connections.

This library is actually over 10 years old, but it had not been maintained by the original author since 2017. Recently it was transferred to the epgsql project, modernized and optimized. The new release, 1.6.0, was published today:

eproxus · April 26, 2023, 5:45pm

I’m interested in this part. How flexible/configurable is it? Would it be appropriate to use it for supervising HTTP connections to a remote host that has potentially longer downtimes?

seriyps · April 26, 2023, 8:38pm

Not really flexible. I think it does not really keep restarting workers in the background. When a worker dies, it is just removed from the pool. So if the resource becomes unavailable, the pool becomes empty. As soon as a client tries to take a member from the pool, pooler will try to spawn a new worker asynchronously, and if that fails, it will return error_no_members to the client.

pooler can be used to supervise HTTP client connections, but the problem is that all the workers in a pool are equivalent. So it will work if you only need to connect to one host. But if you need to connect to many different hosts, you’d need to create a separate pool for each host.
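A rough sketch of the "one pool per host" idea, assuming a hypothetical connection worker my_http_conn whose start_link(Host, Port) returns {ok, Pid}; the pool names, hosts and sizes are made up for illustration.

```erlang
%% Hypothetical worker: my_http_conn:start_link(Host, Port) -> {ok, Pid}.
%% Each host gets its own pool, because all workers in one pool are equivalent.
Hosts = [
  {api_example_pool,  "api.example.com",  443},
  {auth_example_pool, "auth.example.com", 443}
],
lists:foreach(
  fun({PoolName, Host, Port}) ->
      pooler:new_pool(#{
        name => PoolName,
        init_count => 2,
        max_count => 10,
        start_mfa => {my_http_conn, start_link, [Host, Port]}
      })
  end,
  Hosts).
```

A client then picks the pool by host, for example pooler:take_member(api_example_pool, 1000).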

I was using pooler to maintain a pool of gun HTTP client connections here: pe4kin/pe4kin_http.erl at 1b44b17405fc281b8750cea27fc57a2364ec26df · seriyps/pe4kin · GitHub

leeyis · April 27, 2023, 7:04am

How different is it from devinus/poolboy? Can you give the relevant advanced configuration for MySQL?

seriyps · April 27, 2023, 8:26am

Basically all the highlighted features are more or less how it differs from poolboy (however, poolboy may have improved since the last time I used it):

Poolboy starts the workers synchronously from the pool’s gen_server. This means that if your worker start-up time is high (e.g., network connections; usually worker start-up time is high, otherwise why would you need a pool for it?), your pool will be completely unresponsive while the worker is starting. Pooler starts workers asynchronously from a separate “worker starter” process.
Poolboy has max_overflow to temporarily increase the size of the pool at peak loads. But when poolboy’s pool is at overflow and a worker is returned to the pool, it is immediately killed. In pooler we keep the pool oversized for some time (cull_interval) in order to reuse the extra workers if the spike of requests continues
(this may have changed) Poolboy tends to crash hard when the resource becomes unavailable (e.g., the database goes down) because it reaches the max restart intensity. Pooler just keeps returning error_no_members, as if the pool were empty, until the resource recovers.
Poolboy uses hard gen_server:call timeouts to implement “get worker with timeout”, while pooler uses an internal queue and a “deadline”. Due to this difference there is a small risk of race conditions in poolboy; some discussion is here: Possible race-condition/worker leaks in wait_valid · Issue #21 · devinus/poolboy · GitHub
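To illustrate the "queue plus deadline" idea from the last point, here is a small self-contained sketch. It is not pooler's actual implementation; the module and function names are hypothetical. Waiting clients are stored with an absolute deadline, and when a worker becomes free the pool skips every client whose deadline has already passed, so a worker is never handed to a client that has stopped waiting.

```erlang
-module(deadline_queue_sketch).
-export([enqueue/3, next_waiter/2]).

%% Queue entries are {Client, Deadline}; Deadline is an absolute
%% monotonic timestamp in milliseconds.
enqueue(Client, TimeoutMs, Queue) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    queue:in({Client, Deadline}, Queue).

%% Called when a worker becomes free. Returns {Client, RestQueue} for the
%% first client that is still within its deadline, or {none, EmptyQueue}.
next_waiter(Now, Queue) ->
    case queue:out(Queue) of
        {empty, Q} ->
            {none, Q};
        {{value, {Client, Deadline}}, Q} when Deadline > Now ->
            {Client, Q};
        {{value, _Expired}, Q} ->
            %% This client's deadline has passed: drop it and keep looking.
            next_waiter(Now, Q)
    end.
```

With a gen_server:call timeout, by contrast, the caller can give up while the server is still about to reply with a worker, which is roughly the kind of race discussed in the linked issue.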

I don’t have a mysql example, but to start a pool of PostgreSQL connections with initial size of 10 which can spike to 15:

%% Connection options passed to epgsql:connect/1
PgConfig = #{
  host => "localhost",
  username => "test",
  database => "test",
  password => get_password_fun()
},
%% Pool definition: 10 connections up front, at most 15 under load
Config = #{
  name => my_postgres,
  init_count => 10,
  max_count => 15,
  start_mfa => {epgsql, connect, [PgConfig]}
},
pooler:new_pool(Config).

%% Wait up to 1000 ms for a free connection
case pooler:take_member(my_postgres, 1000) of
    Conn when is_pid(Conn) ->
        epgsql:squery(Conn, "SELECT 1"),
        pooler:return_member(my_postgres, Conn);
    error_no_members ->
        error(empty_pool)
end.
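If the query can crash, it may be worth returning the connection with the fail flag so that the pool replaces it. A minimal sketch of such a wrapper follows; the module pool_helper_sketch and the function with_member/3 are hypothetical helpers, not part of pooler.

```erlang
-module(pool_helper_sketch).
-export([with_member/3]).

%% Takes a member from Pool (waiting up to Timeout), runs Fun on it and
%% always hands the member back: with ok on success, with fail on a crash.
with_member(Pool, Timeout, Fun) ->
    case pooler:take_member(Pool, Timeout) of
        error_no_members ->
            {error, empty_pool};
        Member when is_pid(Member) ->
            try Fun(Member) of
                Result ->
                    pooler:return_member(Pool, Member, ok),
                    {ok, Result}
            catch
                Class:Reason:Stacktrace ->
                    %% Tell the pool this worker is broken, then re-raise.
                    pooler:return_member(Pool, Member, fail),
                    erlang:raise(Class, Reason, Stacktrace)
            end
    end.
```

Usage would look like pool_helper_sketch:with_member(my_postgres, 1000, fun(Conn) -> epgsql:squery(Conn, "SELECT 1") end).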

eproxus · April 27, 2023, 12:35pm

Well, that kind of makes sense. I never understood the reasoning behind using one HTTP connection pool for many different hosts.

I would say the purpose of a pool is to maintain multiple hot connections to a resource. Having a pool with mixed destinations never made sense to me… Only perhaps for the purpose of maintaining a max number of total outgoing connections from a system, but that’s also achievable with counters or tokens or similar.

I do wish however there was a pool library that could take connection state into account, with (customisable) behaviours for things like backoff and retries etc.

seriyps · April 27, 2023, 1:21pm

I guess those generic pools exist to support standard HTTP Keep-Alive behaviour. So, for a generic HTTP client it makes sense to have a pool that groups connections by host/port.

But for systems which just talk to a limited set of API services, pooler should work fine, yep.

tsloughter · May 1, 2023, 11:56am

Oh neat. I was interested in sharing a pooling library between pgo and epgsql. Right now pgo’s pool is built in but I was planning to break it out to a separate library like I did with pg_types. Did you look at pgo’s pool at all, or its inspiration Elixir’s db_connection?

seriyps · May 2, 2023, 7:12am

No, I did not look at pgo’s pool. As I mentioned, pooler was originally created in ~2012, but was abandoned after 2017. I just took it over and modernized it a bit.

As far as I understand, pgo does not wrap the postgres connection in a process; the client communicates with the socket directly. So, does it mean pgo’s pool supervises not Erlang processes, but some structures containing gen_tcp:connection() | ssl:connection()? I'm asking because pooler only knows how to supervise Erlang processes.

juhlig · May 2, 2023, 10:36am

Aside from what @seriyps already said, it looks like poolboy has been pretty much abandoned. I raised issues and submitted a bugfix and a feature pull request, like 4 years and 1 job ago, which went completely unanswered.

That bugfix was about the rather inconvenient fact that poolboy would crash the entire pool (ie, all present workers) if only one extra (“overflow”, in poolboy terms) worker failed to start, something that can always happen with, say, database connections. I ended up using my own fork for a work-related project for a while.

Other issues, like the synchronous worker start and related ones, were harder or even impossible to address due to the overall architecture. In the end, I ended up with my own pool implementation, hnc (ag-en-cy not actively maintained currently), done with the help of @Maria-12648430, partly due to the fact that pooler showed some signs of abandonment and aging at the time. Glad to see it revived.

leeyis · May 6, 2023, 3:29am

Thank you so much for sharing

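I migrated my open source project (GitHub - imboy-pub/imboy: an open-source instant messaging solution with an Erlang backend and a Flutter frontend; the backend service is built on cowboy, a high-performance web framework based on Erlang/OTP, and in a load test on an "8-core, 16 GB host (1 million PPS)" it kept 1 million+ TCP connections stably online for more than 90 minutes) from MySQL to PostgreSQL 15 and used the epgsql pooler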

tsloughter · May 10, 2023, 12:10am

Ah, correct, pgo supervises connections, not processes.

Kozaky · February 12, 2024, 1:11pm

Actually, I just began to use pooler in a small demo project with cowboy and I am experiencing some unexpected behavior that I reported at Unexpected "error_no_members" when load testing cowboy server · Issue #99 · epgsql/pooler · GitHub

It seems that pooler doesn’t take the timeout value into account when the load is high. Did any of you experience something similar?