Pgmp - PostgreSQL client with logical replication to ETS

Thanks, that looks very nice! Something that is useful is to report the max size of the pool as well, so that the value (even though it never changes) can be used to make dynamic graphs in e.g. Grafana (with a threshold or max value set to the max size). It would be enough to report it at startup (unless it can dynamically be reconfigured?).

1 Like

Agreed! While I was looking through my reply, I was looking at other stuff that should be measured. It does need another pass through (and another!). Happy to collaborate on that :slight_smile:

I use a library called envy which is loosely modelled on the env part of gproc (but doesnā€™t use gproc anymore). So the max is from pgmp_config:pool(max), the value can come from an environment variable or an application variable, or defaults to 5 (in that order).

Envy prefixes the environment variable with the application name:

PGMP_POOL_MAX=10

or:

{pgmp, [{pool_max, 10]}.

or set on a command line shell with application:set_env(pgmp, pool_max, 10).

The value is picked up on init/1 in pgmp_connection:

init([]) ->
    {ok,
     drained,
     #{requests => gen_statem:reqids_new(),
       max => pgmp_config:pool(max),
       ...},
     nei(census)}.

Mostly so that it can be used in the guard clauses later, but it could be attached to a timer to refresh the value, or a function to give it a kick.

2 Likes

Can pgmp be used to connect to several Postgres databases simultaneously? Canā€™t find any documentation regarding connections / configā€¦

1 Like

Unfortunately, single DB at present.

It would be reasonably simple to replicate the current supervisor hierarchy per database, rescoping pg per database so that connections are separated, with a new parameter that can take multiple postgresql:/// URIs.

2 Likes

Hi, as a maintainer of epgsql, Iā€™m curious what features missing in epgsql made you decide to implement a completely new PostgreSQL client?

I see that epgsql has support for text and binary protocols, SCRAM auth, asynchronous requests and pipelining, logical replication, COPY FROM STDIN. Pool and statement cache can be relatively easily added externally. Replication to ETS could be also implemented using epgsql, just not as part of epgsql itself, but as a separate application that will use epgsqlā€™s replication capabilities.

2 Likes

I tried to contribute property-based tests to pg_types, but the PR havenā€™t been merged yet Add proper tests for some types by seriyps Ā· Pull Request #5 Ā· tsloughter/pg_types Ā· GitHub

I think what made me not to use pg_types in epgsql is that for some codecs I had to read some values from the connection state, and ā€œconnection stateā€ has no stable API (eg, here epgsql/epgsql_codec_datetime.erl at devel Ā· epgsql/epgsql Ā· GitHub). And another problem was that epgsql uses type names as keys, while pg_types uses ā€œsend function nameā€ or smth like that.

1 Like

Shit, sorry about that, I forgot about this PR! I see there was some issue at the end on that thread but Iā€™ll have to refresh my memory and hopefully get those in.

I think I moved to returning a map of data (though it still does write to the persistent terms for now ā€“ Iā€™d like to remove it).

How to deal with the connection state issue I donā€™t know :frowning:

1 Like

Our primary goal was building something for logical replication, that we could internally support. I think there were issues with epgsql going straight to START_REPLICATION, whereas we needed to coordinate CREATE_REPLICATION_SLOT, then fetching the snapshot(s) before START_REPLICATION (with variants between PostgreSQL versions). I donā€™t recall epgsql implementing the logical replication protocol at the time, but I may have missed it (apologies if I have).

1 Like

I believe epgsql had support for replication since 2016 https://github.com/epgsql/epgsql/pull/103

We use this replication feature in production at Klarna and I believe at least one more company uses it. However I myself never digged deep into it - it was contributed from outside.

I might be wrong, but I think we have 2 separate APIs to connect to PostgreSQL and to initiate the replication:

  • epgsql:connect(#{replication => "database"}) to connect in replication mode
  • epgsql:start_replication/7 to start the streaming

and any number of epgsql:squery can be executed between them (including CREATE_REPLICATION_SLOT). But I donā€™t remember if we have the actual snapshot fetching implemented in epgsql, at least I donā€™t think there are tests for that.

This was probably implemented in a separate application here GitHub - SIfoxDevTeam/epgl: Erlang PostgreSQL Logical streaming replication library

Thanks for your answers!

2 Likes