Khepri - a tree-like replicated on-disk database library for Erlang and Elixir (introduction & feedback)

lpil · May 3, 2022, 3:26pm

I’ve always opted to use the Erlang module directly. I find the Elixir wrappers never really do anything beyond changing the module name, so I never found any value in them.

They are popular though, so I could imagine others liking them.

dumbbell · May 3, 2022, 3:39pm

I see.

If I understand correctly, it should be possible to satisfy both approaches with what @dch said:

One can use the Erlang module directly:

# API from src/khepri.erl.
{:ok, _} = :khepri.start()
{:ok, _} = :khepri.put(path, value)

Or use the Elixir-y API still provided by the same library/Hex package:

# API from src/Elixir.Khepri.erl.
{:ok, _} = Khepri.start()
{:ok, _} = Khepri.put(path, value)

As you said, that’s basically giving an Elixir-y name, not much else. The maintenance burden should be low though.

Also, I suppose it is possible to provide the “bang functions” in khepri.erl for easy pipelining, even though they will look weird in the Erlang code:

%% In src/khepri.erl.
'start!'() ->
    case start() of
        {ok, StoreId} -> StoreId;
        Error -> throw(Error)
    end.

%% Takes the store ID first so the result of 'start!'/0 can be piped in.
'put!'(StoreId, Path, Value) ->
    case put(StoreId, Path, Value) of
        {ok, Result} -> Result;
        Error -> throw(Error)
    end.

# In Elixir code.
:khepri.start!()
|> :khepri.put!(path, value)

Khepri.start!()
|> Khepri.put!(path, value)

dumbbell · May 3, 2022, 4:25pm

I opened the following pull request with a far-from-finished proposal for an Elixir-specific API:

This is to open the discussion and tailor something which feels right and comfortable for Elixir developers.

I will experiment with an example usage of it tomorrow.

the-mikedavis · May 3, 2022, 4:28pm

I think it might be possible to write and publish Elixir files in an Erlang dependency. So I think you could include a lib/khepri.ex that would be compiled when used from Elixir but ignored when used from Erlang. That would allow you to have an Elixir API (maybe with macros) for an otherwise Erlang library with no cost to the Erlang API. I’ll play around with it and see if it works.

dumbbell · May 3, 2022, 4:30pm

If this works, that would be nice as well. Thank you for looking into it.

LostKobrakai · May 3, 2022, 4:35pm

If all you do is a 1:1 mapping, I’d think it’s not worth having a separate module. The Elixir core team generally discourages people from wrapping Erlang dependencies just for the sake of it. I personally think it only really makes sense if it provides some additional Elixir-specific functionality (e.g. macros) or maybe reorders parameters for better use with pipelines, but the latter alone is still a weak reason to me.

the-mikedavis · May 3, 2022, 4:39pm

Oh yes, I see that the docs don’t come up when fetching docs from Elixir. Another thing that might help Elixirists is to use rebar3_ex_doc (it’s an escript, so you don’t need an Elixir toolchain to make the docs), which makes ex_doc-style docs. The telemetry docs are a great example of this. Then you’d use Markdown instead of HTML for guides, as they’re known in ex_doc. I could make a PR for this if you’re interested, but rebar3_ex_doc does require OTP 24, which might not be ideal.

For the colloquial specs, I mean that if you’re using an Elixir library that lets you access or put some data, it’s very common to have fetch/2 that gives {:ok, term()} | :error, along with get/2 that gives term() | nil, and then a get/3 that lets you pass a default as the third argument. And, as you say, there are the bang variants like fetch!/2, which raise an exception when the key isn’t present. Map and Keyword from the Elixir standard library are good examples of this. In Erlang there isn’t necessarily a common API like this (for example, fetch/2 is find/2 in maps, but there isn’t an equivalent find/2 in proplists).
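To make that convention concrete, here is a small sketch using only Map from the Elixir standard library. The module name LookupSketch and the sample data are made up for illustration; nothing here is Khepri API.

```elixir
defmodule LookupSketch do
  # Illustrative data only; any map would do.
  def settings, do: %{"host" => "localhost"}

  def demo do
    s = settings()

    %{
      # fetch/2 returns a tagged tuple when the key exists...
      fetch_hit: Map.fetch(s, "host"),
      # ...and the bare atom :error when it does not.
      fetch_miss: Map.fetch(s, "port"),
      # get/2 falls back to nil for a missing key.
      get_miss: Map.get(s, "port"),
      # get/3 falls back to the caller-supplied default.
      get_default: Map.get(s, "port", 5672),
      # fetch!/2 returns the bare value, or raises KeyError if missing.
      fetch_bang: Map.fetch!(s, "host")
    }
  end
end
```

Calling Map.fetch!(LookupSketch.settings(), "port") would raise a KeyError, which is the behaviour the bang suffix signals.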

For sigils, I think it’s technically possible to write Elixir macros in Erlang, but it’s probably very discouraged as it relies on an implementation detail of Elixir. To provide a macro interface I think you’d need an Elixir module.
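As a rough idea of what such an Elixir module could look like, here is a minimal sketch of a custom sigil. It is written as a plain function rather than a macro to keep it short, and the names PathSigilSketch, PathSigilDemo and the ~k sigil are hypothetical; this is not how Khepri implements anything.

```elixir
defmodule PathSigilSketch do
  # Hypothetical sigil: turns a Unix-like path string into a list of
  # binary components. Modifiers are ignored in this sketch.
  def sigil_k(string, _modifiers) do
    String.split(string, "/", trim: true)
  end
end

defmodule PathSigilDemo do
  # Importing the module makes the ~k sigil available here.
  import PathSigilSketch

  def oak_path, do: ~k"/stock/wood/oak"
end
```

PathSigilDemo.oak_path() then gives the list of path components, so callers don’t have to split the string themselves.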

domi · May 3, 2022, 11:03pm

For the module name, Elixir users can do alias :khepri, as: Khepri and get the same result.
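A quick sketch of that aliasing trick follows. It uses :lists from the Erlang standard library as a stand-in for :khepri so it runs without any dependency; the module name AliasSketch is made up.

```elixir
defmodule AliasSketch do
  # :lists stands in for :khepri here; the alias works the same way
  # for any Erlang module.
  alias :lists, as: Lists

  # Lists.reverse/1 is really :lists.reverse/1 under an Elixir-style name.
  def reversed, do: Lists.reverse([1, 2, 3])
end
```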

But I would argue it’s idiomatic to just call Erlang libraries as is. It’s often done for telemetry for instance.

dumbbell · June 9, 2022, 10:25am

The RabbitMQ team is pleased to announce the release of Khepri 0.4.0.

In this release, the focus was put on the clustering code and how Khepri uses the underlying Ra library. At the same time, that part of the public API should be easier to understand and use, like the rest of the public API which was improved in the 0.3.0 release.

For instance, it is now possible to start a Khepri store in a specific data directory without having to configure Ra beforehand:

{ok, StoreId} = khepri:start("/var/lib/khepri").

There are several more nice changes in this clustering code. The release notes try to go over all of them with more examples. Note that this is a breaking change in the start/clustering API.

This release also introduces changes targeting Elixir developers:

Support for Erlang binaries/Elixir strings in places where Erlang strings were accepted.
New “bang functions” for common operations in a Khepri store.
Sigils to easily parse Khepri Unix-like paths at runtime.
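For readers less familiar with the first point: an Erlang string is a list of characters (a charlist in Elixir), while an Elixir string is a binary. The sketch below shows one way a library could accept both; DirArgSketch and to_erlang_string/1 are hypothetical names, not Khepri code.

```elixir
defmodule DirArgSketch do
  # Hypothetical helper: normalise a directory argument given either as
  # an Elixir string (binary) or as an Erlang string (charlist).
  def to_erlang_string(dir) when is_binary(dir), do: String.to_charlist(dir)
  def to_erlang_string(dir) when is_list(dir), do: dir
end
```

With a helper like this, both "/var/lib/khepri" and ~c"/var/lib/khepri" end up as the same charlist internally.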

Another highlight: EDoc EEP-48 doc chunks are produced in addition to the HTML pages. You should get the specs and documentation right from your compatible IDE or text editor!

And last but not least, we added a logo \o/

The release notes go into greater detail about all the changes in this release. I hope you will find them helpful!

If you try Khepri 0.4.0, please share your feedback, positive or negative! Thank you

lpil · June 9, 2022, 3:19pm

dumbbell:

In this release, the focus was put on the clustering code and how Khepri uses the underlying Ra library. At the same time, that part of the public API should be easier to understand and use, like the rest of the public API which was improved in the 0.3.0 release.

For instance, it is now possible to start a Khepri store in a specific data directory without having to configure Ra beforehand:
{ok, StoreId} = khepri:start("/var/lib/khepri").

This is such a nice change! Really excited about how this project is progressing.

dumbbell · June 9, 2022, 3:20pm

Thank you!

lpil · June 9, 2022, 4:15pm

With Khepri, is there a way to back up the database while it is running? I could imagine wanting to use it in a single-node situation but still wanting to be able to restore from some external storage (such as AWS S3) in the event that the single node is lost.

dumbbell · June 9, 2022, 4:26pm

That will be part of the next release. Only in my head for now, nothing in GitHub, but the idea is to allow the caller to provide an import/export module (or perhaps a set of anonymous functions, I don’t know). Callbacks will be called as part of the same walk-through internal functions used to implement “put” and “get” operations basically.

This will also ensure that the exported state is atomic, i.e. there is no change to the store’s content right in the middle of the export. The export can happen in a separate process to not block the normal operation of the Ra state machine.

The export modules I have in mind are at least JSON and probably an Erlang term (like something you could later read with file:consult/1).
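To illustrate what a file readable with file:consult/1 looks like, here is a small sketch, written in Elixir, that writes terms in Erlang syntax and reads them back. ConsultSketch, dump/2 and load/1 are hypothetical names; this only shows the file format and says nothing about how the Khepri export will actually work.

```elixir
defmodule ConsultSketch do
  # Writes each term in Erlang syntax followed by a period and a newline,
  # which is the format :file.consult/1 expects.
  def dump(path, terms) do
    data = Enum.map(terms, fn term -> :io_lib.format(~c"~p.~n", [term]) end)
    File.write!(path, data)
  end

  # Reads all terms back from the file, in order.
  def load(path) do
    {:ok, terms} = :file.consult(path)
    terms
  end
end
```

A list of terms written with dump/2 should come back unchanged from load/1, which makes this format convenient for simple backups of plain data.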

If the export function takes the same arguments as “get” and other read functions, it will be possible to export everything, just a subset of the tree, or even only Khepri nodes verified against conditions.

lpil · June 9, 2022, 4:28pm

Brilliant! Thank you very much

dumbbell · June 9, 2022, 4:28pm

Another feature I want to add to Khepri is the ability to migrate data from Mnesia to Khepri at runtime (and perhaps the opposite too). The code is ready in a RabbitMQ branch, but I need to move it to Khepri and make it less specific probably.

I will work on this step before the import/export feature.

LeonardB · June 9, 2022, 5:29pm

Maybe “borrow” the backup/restore process/logic/code from mnesia itself?

Sounds like you’re heading down the same path anyway with a checkpoint + callback module.

dumbbell · June 9, 2022, 5:34pm

That’s a good idea, I will look at it! Thank you

leeyis · June 27, 2022, 4:27am

What does “tree-like replicated on-disk database” mean?
Is there any relevant documentation?

dumbbell · June 27, 2022, 7:50am

“Tree-like” means that the data in the database is organized in a tree, instead of a flat list. Therefore a key can have a value, but it also can have child/sub keys.
“Replicated” means the data is safely copied to other hosts once you have clustered several of them. Khepri uses an implementation of the Raft protocol for that.
“On-disk” means the data is written to disk, as opposed to kept in memory only.
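As a toy illustration of the “tree-like” part only, here is an in-memory sketch where each node can hold a value and also have children. The TreeSketch module and its functions are made up for this example; it does not reflect Khepri’s API or internals, and it has no replication or persistence.

```elixir
defmodule TreeSketch do
  # Each node has an optional value and a map of named child nodes.
  def new, do: %{value: nil, children: %{}}

  # Store a value at the given path, creating intermediate nodes as needed.
  def put(tree_node, [], value), do: %{tree_node | value: value}

  def put(tree_node, [key | rest], value) do
    child = Map.get(tree_node.children, key, new())
    %{tree_node | children: Map.put(tree_node.children, key, put(child, rest, value))}
  end

  # Look up the value at a path; :error if the node is missing or has no value.
  def fetch(%{value: nil}, []), do: :error
  def fetch(%{value: value}, []), do: {:ok, value}

  def fetch(tree_node, [key | rest]) do
    case Map.fetch(tree_node.children, key) do
      {:ok, child} -> fetch(child, rest)
      :error -> :error
    end
  end
end
```

Here a node at ["stock", "wood"] can carry its own value while also having an "oak" child underneath it, which is what a flat key/value list cannot express directly.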

The last two properties are here to increase the safety and availability of the data. They are both provided by another Erlang library called Ra.

The documentation of the library is available at the following link:
https://rabbitmq.github.io/khepri/

You can find a quick overview in the README as well.

leeyis · June 27, 2022, 8:08am

Thank you very much for your reply. Your introduction is very clear, and I seem to understand it too.

I will be keeping an eye on Khepri and will consider using it in my open source project when the time is right (I currently use Mnesia to store users’ online status).

By the way, does Khepri support storing terabytes of data?