DuckDB NIF - An erlang library for accessing DuckDB databases

The last couple of weeks I have been working on a NIF interface for DuckDB. The library has some rough edges, no documentation, but it is starring to work pretty nicely.

DuckDB is an in-process SQL OLAP database management system (https://duckdb.org). It looked like a nice addition to have, and I was interested to see how their c-api with the appender interface would work out for adding bulk stats data. I compared it to ets and the normal prepared statement interface they have and was pleasantly surprised.

Inserting 100_000 records with 10 integers in ets and retrieve them (with tab2list)

> educkdb_bench:ets().
100_000 inserts took: 37.13 milliseconds
Select of 100000 rows took: 12.32 milliseconds

Inserting 100_000 records with 10 integers in an in memory duckdb database and retrieve them (with new extract api)

2> educkdb_bench:appender().
100_000 inserts took: 43.119 milliseconds
Select of 98 chunks and 100000 rows took: 14.553 milliseconds

Compared to in-memory duckdb with a normal prepared statement interface.

3> educkdb_bench:main().    
100_000 inserts took: 3870.924 milliseconds
Select of 98 chunks and 100000 rows took: 53.696 milliseconds

At the moment all simple data-types, like various integers, timestamps, varchars and blobs can be extracted. At the moment I’m working on extracting complex data types like lists, structs and maps.

9 Likes

Oh duh… the GitHub repository is here: GitHub - mmzeeman/educkdb: DuckDB NIF for Erlang. There is also a hex package: educkdb | Hex

3 Likes

I have just recently been looking at DuckDB for some use cases and was very happy to find your project. At least one or two of those hex downloads are me :slight_smile:. I’m primarily interested in using it to interact with parquet files, however I didn’t have any luck loading one using educkdb. Maybe the extension isn’t present? I also tried :educkdb.query(conn, "INSTALL parquet"), but there appears to be an issue with duckdb’s hosting of extensions. Have you seen similar or do parquet files work out of the box? Any luck with the https extension?

Anyways, glad to see all the progress you’ve made!

2 Likes

It looks like the extensions are part of the big amalgamation C++ file. I will try to enable them. :+1:

I was planning to use duckdb for some simple web stats for zotonic. DuckDB is an interesting project.

3 Likes

Thanks, I thought about opening an issue, but I was so amazed that someone was working on something I needed that I didn’t want to complain!

2 Likes

Don’t worry. It is nice to have somebody interested in it. :slight_smile:

What would be good extensions to have enabled by default? I was thinking about FTS, Parquet, and JSON. Maybe the httpsfs extension as well? I think I will make it configurable so users can enable and disable the extensions they want by setting an environment variable. Hopefully it will still build with gh actions when those extensions are enabled by default. Compiling it takes quite a bit of main memory.

2 Likes

My use case is importing / exporting data on a Nerves device and importing / exporting data on a cloud server to S3, so the parquet and httpfs are of interest to me.

Having the extensions configurable via an environment variable makes a lot of sense. Don’t know how much work that is, though.

2 Likes

That is already in the rebar.config file I think. What is missing are defaults which make more sense. Fts, json, parquet and httpfs are probably sensible defaults.

3 Likes

I’ve asked the duckdb people how to build the extensions so they are included with the amalgamation build. The compile flags are pointing to some hpp files, but it was not clear how to generate the hpp files.

2 Likes

There is currently no support for inlining the extensions into the amalgamation build.

There are some scripts which they use to collect the sources needed for their R and Python clients though. This includes the extensions. So it takes a little bit more time than I expected. To be continued…

3 Likes