Khepri - a tree-like replicated on-disk database library for Erlang and Elixir (introduction & feedbacks)

dumbbell · June 27, 2022, 8:23am

Not really at this time. Currently, the entire database is represented as a single Erlang term held entirely in memory.

From an implementation point of view, Khepri is a Raft state machine and is simply a callback module used by the Ra library. It uses and relies on the default mechanism provided by Ra to write anything to disk and load the data back into memory on start.

In the future, it may evolve and use its dedicated implementation to manage data in memory and on disk. As of now, it is not a priority.

dumbbell · November 14, 2022, 4:59pm

On behalf of the RabbitMQ team, I’m proud to release Khepri 0.6.0! It follows Khepri 0.5.0 which I forgot to talk about here. Therefore I will cover both in this post.

In Khepri 0.5.0, we focused on refining the public API again. It was already the topic of version 0.3.0 but I was still not convinced by the result.

This time, the khepri module exposes functions which should be straightforward and boring to use. Things like getting a value from the store should be mlore obvious now. Return values of khepri:put() and khepri:delete() are very simple too now (they just indicate if the operation succeeded or not).

Here is an example:

ok = khepri:put(StoreId, "/path/to/tree/node", Value),
{ok, Value} = khepri:get(StoreId, "/path/to/tree/node").

the khepri_tx module exposes the same API as khepri, but for transaction functions. If an API is missing in khepri_tx compared to khepri, it is either because it doesn’t make sense in the context of a transaction, or it is a bug.

To get more details for returned tree nodes, there are now “advanced” modules: khepri_adv and khepri_tx_adv. They are the advanced counterpart of khepri and khepri_tx respectively. They only exports functions which have an advanced use case, not all of them.

Error handling was improved as well:

Functions which have room to return an error in an {error, Reason} tuple return errors for situations where the caller is expected to handle failures. For instance, a tree node doesn’t exist or there is a timeout with the underlying Ra cluster.
When an error happens because of a misuse of the library, an exception is thrown using erlang:error().
All error and exception reasons have the same form: {khepri | khepri_ex, Name, Props}. The khepri or khepri_ex atoms help to distinguish errors and exceptions from Khepri from other sources. Name is the actual reason and Props help qualify Name further.

There is still room for improvement in this error handling, like documenting possible errors (including in function specs) or providing a function to format errors for human beings.

The release notes of Khepri 0.5.0 give a before/after example which should help understand this breaking change in the public API.

In Khepri 0.6.0, the focus was put on new features:

The ability to import and export a store (or a part of it). I implemented @LeonardB’s idea of using Mnesia’s backup and restore backend API. The library itself comes with a single backend module which uses a plaintext file with formatted opaque Erlang terms in it, but we plan to add more as separate Erlang applications to not add too many dependencies to Khepri itself.
A new mechanism called “projections” to cache data for fast queries with a low latency. This relies on ETS tables: queries are indeed fast, at the cost of an increased memory footprint and eventually consistent responses.
New maps:fold/3-like functions. We have khepri:fold(), khepri:foreach(), khepri:map() and khepri:filter(). The same functions exist in khepri_tx for transactions. We should probably add khepri:filtermap(), perhaps more, in future releases.
The ability to use stored procedures as transaction functions. This is an important addition because it allows to “pay” the price of function extraction once. This makes transactions faster and consume less memory.

Again, the release notes go in greater details about these new features and even more changes I didn’t repeat in this post.

We have a few more improvements in mind — how we handle errors as mentioned above, how frequently we perform snapshots, an alternative to global locks, etc. — but we are close to a beta for this library. I mean, the API should see less breaking changes in the near future.

Did some of you already try to use Khepri? I would be super happy to know what you think

lpil · November 15, 2022, 3:03pm

Wonderful!

I’m especially interested in the backup and restore API but I couldn’t find the documentation for it. Where can I read more? I’d like to use Khepri on a single node and periodically flush changes to an object store like AWS S3.

dumbbell · November 15, 2022, 3:10pm

Indeed, the documentation’s menu lacks an obvious pointer to the import/export doc.

This feature is mostly documented in the khepri_import_export module.

The khepri_export_erlang backend shipped with Khepri can be used as an example.

lpil · November 15, 2022, 3:36pm

Fab, thank you. Is the idea that I would have my application call ok = khepri:export(StoreId, my_export_behaviour_module, ...). when it suits me? Does performing an export block writes and reads?

Another question: is there an approach to creating secondary indexes with Khepri? So we can query for objects by more than one property without having to perform the equivilent of a full table scan for the non-primary id/path. I suppose I could in a transaction insert the path to the actual location in the secondary location?

dumbbell · November 16, 2022, 9:06am

Yes

In Khepri 0.6.0, yes, because the export callback module is called in the context of the state machine process. We have plans to spawn another process to run the export, but no ETA so far.

You could do what you suggest indeed. We briefly discussed the idea of “symlinks” but no decisions were made yet. Khepri is more like a key/value store where the value is opaque to the library, rather than a database with records having many fields.

lpil · November 16, 2022, 9:10am

Thank you. Perhaps this performance consideration could be noted in the documentation?

Fab, thank you. Sounds like it would be easy for the user to implement this on top of the library.

dumbbell · November 16, 2022, 9:21am

True! I just filed the following two issues to improve the docs w.r.t. import/export:

Make it easy to find the import/export guide in the documentation · Issue #172 · rabbitmq/khepri · GitHub
Expensive queries and exports run in the context of the state machine process · Issue #171 · rabbitmq/khepri · GitHub

dch · November 16, 2022, 11:38am

Do you have any indications of rough size/shape of data & rate of change, that you expect khepri to be used for?

dumbbell · November 16, 2022, 1:25pm

Given the entire store is held in memory and on disk, the current internal implementation of the state machine is clearly not designed for a huge amount of keys and/or large values.

Then there are the constraints of the Raft algorithm. The state machine of one store is a single Erlang process, one per Erlang node if you run a cluster. It is possible to run several stores on the same Erlang node. But this means that, for a given store, queries, writes and deletes are applied sequentially. To sum up, a Khepri store is currently a single Erlang process handling queries/writes/deletes in order, one at a time. The benefit is there are no conflicts or split brain to resolve. The downside is Khepri doesn’t take advantage of concurrency that much.

That said, the most important bottleneck, rate-wise, is the fact that our Raft implementation’s log subsystem uses fsync(2) to provide the guaranties described in Raft. For instance, a call to khepri:put(StoreId, Key, Value) is synchronous by default, meaning that it writes a Raft command to the log, fsync(2) is called and the state machine applies the command. So a single process doing many khepri:put/3 will be slow. But if you have many processes doing the same call in parallel, the rate will be higher because of less frequent fsync(2)s, thanks to some batching in our Raft library.

We have a small benchmark tool running for each commits to the main branch from a GitHub Action worker. The results are published in a GitHub page. The tool runs queries, then writes, then deletes and measures the call rates, the memory footprint and the garbage collections. It does that for a single Erlang node then a cluster of three nodes. Note that the three nodes are still running on the same GitHub Action worker sporting 2 cores.

Does that answer your question?

dch · November 16, 2022, 6:54pm

Thanks yes! The benchmarks are already fantastic throughput but I assume this is concurrent erlang nodes on the same VM (ie local host networking)?

dumbbell · November 17, 2022, 9:06am

Yes, exactly.

zabrane · December 18, 2022, 12:52am

Does khepri support bulk put to speed up store operations?

Instead of:

khepri:put(Path1, Data1),
khepri:put(Path2, Data2),
khepri:put(Path3, Data3),

i’d like to do:

khepri:puts([{Path1, Data1)}, {Path2, Data2)}, {Path3, Data3)}]).

dumbbell · December 18, 2022, 11:52am

Not yet, but I’m currently working on this. It’s at an early design stage. From an API point of view, we are discussing either something like you described, or something closer to the transaction API. For instance:

khepri:batch_write(fun() ->
    lists:foreach(fun(Object) -> khepri:put(...) end, Objects)
end).

dumbbell · April 17, 2023, 10:15am

Khepri 0.7.0 is out!

The focus in this release was put on two things:

Improvements and fixes to projections. Projections are a mechanism introduced in Khepri 0.6.0 to allow fast queries with a low latency. They suffered several bugs; for example, the deletion of an entire subtree may not have been reflected in the projections’ ETS table. They saw several speed improvements as well.
Refactoring to make it easier to maintain the library. The large khepri_machine module was split into several modules. Also, the khepri_fun module was extracted and we created a new library out of it; I talked about it in the “Horus - extract an anonymous function as a standalone module” discussion.

With Khepri 0.7.0, we believe the API and the code to be a lot more stable. We still have several ideas we would like to implement and, as part of the integration into RabbitMQ, we might discover that some aspects of the API are not good enough and need further brekaing changes. Yet, we think the library is in good shape and we “promoted” it from alpha to beta quality.

I’m not sure yet what will be the main topic of the next release at this point. We still want to work on the frequency of the underlying Raft snapshots: the ideal would be something that adjusts itself automatically, but we are not sure how to achieve that yet.

As always, please thumbs up if you like and hit that “Subscribe” button!

zabrane · June 27, 2023, 9:29am

Hi @dumbbell

I’m playing with kephri on a single node and I’d like to make my app run on two nodes.
Is there any examples explaining how to setup a multi-nodes kephri cluster?

Thanks

dumbbell · June 27, 2023, 11:34am

Hi @zabrane!

You can find a basic example in the documentation of the khepri_cluster module.

Here is a copy-paste of that example using the default store:

%% Start the local Khepri store.
{ok, StoreId} = khepri:start().
 
%% Join a remote cluster.
ok = khepri_cluster:join(RemoteNode).

Note that the store (either the default one or the one specified as an argument) must run on both nodes before they can be clustered.

zabrane · June 27, 2023, 1:09pm

@dumbbell Thank you

zabrane · August 10, 2023, 11:47am

Hi @dumbbell

Can Khepri experts help me with this issue please?

github.com/rabbitmq/khepri

{horus_ex,module_not_found,#{module => erlang}}

opened 11:44AM - 10 Aug 23 UTC

Zabrane

bug

### What does not work? Switching from old `khepri` to the latest stable one (`…0.7.0`), we're getting this error in the release's logs: ```bash 2023-08-10T11:14:53.997963+00:00 error: Ranch listener http, connection process <0.1643.0>, stream 1 had its request process <0.1645.0> exit with reason {horus_ex,module_not_found,#{module => erlang}} and stacktrace [{horus,get_object_code_from_code_server,1,[{file,"/home/zab/dev/ics/_build/default/lib/horus/src/horus.erl"},{line,1984}]},{horus,disassemble_module1,2,[{file,"/home/zab/dev/ics/_build/default/lib/horus/src/horus.erl"},{line,1927}]},{horus,disassemble_module,2,[{file,"/home/zab/dev/ics/_build/default/lib/horus/src/horus.erl"},{line,1879}]},{horus,lookfunction,4,[{file,"/home/zab/dev/ics/_build/default/lib/horus/src/horus.erl"},{line,1778}]},{horus,pass1_process_function,4,[{file,"/home/zab/dev/ics/_build/default/lib/horus/src/horus.erl"},{line,1155}]},{maps,fold_1,4,[{file,"maps.erl"},{line,416}]},{horus,extract_module_info_functions,1,[{file,"/home/zab/dev/ics/_build/default/lib/horus/src/horus.erl"},{line,745}]},{horus,to_standalone_fun3,2,[{file,"/home/zab/dev/ics/_build/default/lib/horus/src/horus.erl"},{line,497}]}] [...] ``` ### Expected behavior Our service was perfectly working with old `khepri`. Moreover, this error doesn't happen locally (DEV mode). It only triggers when deploying our service as a release. ### How to reproduce _No response_

Best

zabrane · August 10, 2023, 1:45pm

@dumbbell thanks a lot for your help and support fixing this issue in no time. Long life to khepri.