We at Happening, the technology stack behind Superbet, are pleased to announce the initial public release of Kafine, our new Kafka client for Erlang.
While Kafine is all-new, and still a bit rough around the edges, it’s based on years of experience using Kafka in production with our existing internal Kafka libraries.
The secret sauce in Kafine is the automatically generated message codecs in the kafcod project, which aim to achieve better performance and compatibility than existing Kafka client libraries in the BEAM ecosystem.
We’re in the process of publishing packages to hex.pm. Until then, you can take a look at the repositories on GitHub:
Note that this release is currently considered unstable. We’re using it in production, but only for a small number of non-critical workloads. We’re working towards being production ready, and we’d appreciate feedback from the community.
We’ve pushed the 0.6.0 release of kafine. Change notes are also at that link, but for completeness:
- First-pass implementation of process-per-partition. The consumer callback is invoked from an isolated process (one per partition). Right now, this doesn’t do much. In future, it will be used to improve concurrency and reliability.
- Fix: docker/get-host-ip.sh doesn’t work on a brand-new Ubuntu installation (discovered while attempting a demo at CodeBEAM Lite London).
- Improvements to generated documentation, including the main README.
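To illustrate the process-per-partition idea from the change notes, here is a minimal sketch in plain Erlang. It is not kafine's implementation: the module name, `dispatch/4`, and the callback shape are all hypothetical. It only shows the general pattern of routing each message to a dedicated process per `{Topic, Partition}` key, so that a slow or crashing callback for one partition does not block the others.

```erlang
-module(partition_workers_sketch).
-export([dispatch/4]).

%% Hypothetical helper, not part of kafine.
%% Workers maps {Topic, Partition} to the pid that runs the callback
%% for that partition. Returns the (possibly extended) map.
dispatch(Key, Message, Callback, Workers) ->
    Pid = case Workers of
              #{Key := Existing} ->
                  Existing;
              #{} ->
                  %% First message for this partition: start its worker.
                  spawn(fun() -> loop(Key, Callback) end)
          end,
    Pid ! {message, Message},
    Workers#{Key => Pid}.

%% Each worker invokes the callback in its own process, one message at a time.
loop(Key, Callback) ->
    receive
        {message, Message} ->
            Callback(Key, Message),
            loop(Key, Callback)
    end.
```

The sketch leaves out everything a real client needs (monitoring the workers, restarting them, back-pressure); it is only meant to show where the isolation comes from.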
It’s been about six months since the last release, so there are a bunch of breaking changes in this one.
The changes have mostly been about robustness and scalability. For the complete changelog, see the most recent commits in the above repositories (I forgot to include the changelog in the tags).
Note that this release is still considered unstable, and we don’t recommend using it in production just yet. You’re welcome to experiment with it, and we’d appreciate feedback from the community.
Does this library handle disconnects and other network/message-passing issues gracefully? I’ve been using brod in an Elixir cluster on K8s and it absolutely sucks at doing anything reliably: I’m constantly seeing warnings and errors with a simple consumer over TLS.
The errors can happen when pods are rolling or the broker is rebalancing.
It’s certainly designed to, yes. I can’t say for certain that there aren’t still some bugs in there, though – we’re not putting enough load through it to really tell yet.
We do have some unit/integration tests that exercise various network/rebalancing scenarios.
You’ll still see errors when the connections are dropped – that’s just Erlang, but it ought to recover gracefully. If you don’t want to see the errors, there’s an optional logger filter in there for that.
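For readers who have not written a logger filter before, here is a minimal sketch using OTP's standard `logger` API. It is not kafine's bundled filter. The module name and the list of exit reasons are assumptions for illustration, and which reasons a dropped connection actually produces depends on the library. The sketch drops `proc_lib` crash reports whose exit reason is in a configured list and leaves every other event alone.

```erlang
-module(quiet_disconnects).
-export([install/0, filter/2]).

%% Hypothetical module, not part of kafine.
%% Installs a primary filter that silences crash reports for "expected"
%% disconnect reasons. The reasons listed here are assumptions.
install() ->
    Reasons = [closed, econnreset, timeout],
    logger:add_primary_filter(?MODULE, {fun ?MODULE:filter/2, Reasons}).

%% proc_lib crash reports carry a proplist with an error_info entry of the
%% form {Class, Reason, Stacktrace}. Return stop to drop the event, or
%% ignore to let other filters decide.
filter(#{msg := {report, #{label := {proc_lib, crash},
                           report := [Info | _]}}}, Reasons)
  when is_list(Info) ->
    case lists:keyfind(error_info, 1, Info) of
        {error_info, {_Class, Reason, _Stack}} ->
            case lists:member(Reason, Reasons) of
                true -> stop;
                false -> ignore
            end;
        _ ->
            ignore
    end;
filter(_Event, _Reasons) ->
    ignore.
```

Calling `quiet_disconnects:install()` once at application start would be enough. Supervisor reports and other log events pass through untouched, so a filter like this hides less than lowering the log level would.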
On “over TLS”: kafine supports TLS, but it’s not obvious (and it’s not in the documentation). You have to pass transport and transport_options in the ConnectionOptions; see kafine_connection_tls_tests.erl for an example.
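As a rough idea of what such options might look like, here is a sketch. Only the two key names, `transport` and `transport_options`, come from the post above. The value `ssl`, the use of a list for the options, and the hostname are assumptions, so check kafine_connection_tls_tests.erl for the real shape. The options inside the list are standard OTP `ssl` client options.

```erlang
%% Sketch only: the values are assumptions, not kafine's documented API.
ConnectionOptions = #{
    %% Assumed: selects OTP's ssl module instead of plain TCP.
    transport => ssl,
    %% Assumed: passed through to ssl:connect/3,4 as client options.
    transport_options => [
        {verify, verify_peer},
        %% OS-provided CA certificates (requires OTP 25 or later).
        {cacerts, public_key:cacerts_get()},
        %% Hypothetical broker hostname, used for SNI and hostname checks.
        {server_name_indication, "broker.example.com"}
    ]
}.
```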
We don’t have a clean way to support SASL authentication yet. There’s a doc note explaining how to experiment with it, but it’s not plumbed all of the way through to the externally-facing API.
Note also that the public API is still unstable – we’re planning to make changes in how you pass the various options maps.
Also note that the producer is still kinda rough around the edges. We’re working on that right now.
Do you mean there are actual bugs, as in the brod client permanently stops working and doesn’t recover during/after broker or consumer group rebalancing, or are you just unhappy about spam in the logs? If it’s the latter, then you can set up logger filters. I think being meticulous about reporting transient and network errors was a deliberate choice in brod, for better or for worse.
Yes, as an end user I do not care about anything outside my control, such as a Kafka broker rebalancing or a pod rolling. In a fault-tolerant system where network failures are routinely recovered from without issues, and with Kafka specifically, I believe those should have been info messages.
There are also a bunch of open issues about these errors where no recourse is provided. Personally, from skimming the code, I just believe it’s poor design.
Also, if I had the power to choose I would never have chosen Kafka to begin with.
I don’t necessarily believe they should. In kafine, we made a deliberate decision to lean into Erlang’s “let it crash” philosophy.
So: if we lose the connection to the broker, the kafine_connection process crashes, and something else deals with that (in OTP, this would usually be a supervisor; in kafine, it’s the owner, but the principle’s the same).
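A minimal sketch of that pattern in plain Erlang, for anyone less familiar with it. This is not kafine's code: the module, `run/2`, the retry count, and the fixed delay are all hypothetical. `ConnectFun` stands in for whatever runs the connection and exits when the socket is lost. The owner monitors it and decides what to do when it goes down.

```erlang
-module(owner_sketch).
-export([run/2]).

%% Hypothetical owner loop, not part of kafine.
%% Runs ConnectFun in a monitored process; if it exits abnormally, waits
%% briefly and starts a fresh one, up to Retries attempts in total.
run(_ConnectFun, 0) ->
    {error, gave_up};
run(ConnectFun, Retries) when Retries > 0 ->
    {Pid, Ref} = spawn_monitor(ConnectFun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            %% Clean shutdown: nothing to recover from.
            ok;
        {'DOWN', Ref, process, Pid, _Reason} ->
            %% The connection process crashed. It carried no state worth
            %% saving, so back off and start a new one.
            timer:sleep(100),
            run(ConnectFun, Retries - 1)
    end.
```

The connection process itself contains no recovery code at all. The crash may still be logged, which is where the logger filter discussion comes in.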
That’s fair. FWIW, I’m not interested in processes dying quietly; I want the failures handled properly. As an end user, I see no reason I should ever have to worry about a connection coming or going. The library should handle that without any noise, unless it hits some kind of undefined behavior and is therefore a bug.
I’m aware that this is not easy, as Kafka is inherently noisy and unreliable: you have to keep adjusting configurations in the hope that eventually everything will work, which is a clear sign of a flawed design.