I want to introduce you to my ddskerl library, a library that implements the DDSketch algorithm for calculating quantiles in streaming. You can read all the details about it in its related docs at Overview — ddskerl v0.1.1.
Why did I come up with this one? I noticed that the open-source ecosystem was missing an implementation of streaming quantile summaries. Histograms are well supported in prometheus and in telemetry_metrics_prometheus_core (both on Hex), and with a lot of care summaries can be derived from them, but even then no statistical meaning can safely be attached to the result.
There are libraries that provide summaries, like statistex or bear (both on Hex), but they calculate exact summaries, which requires O(N) memory and on the order of 1.5N comparisons for N samples. That can be prohibitive in streaming settings, where the number of events can grow without bound.
So I read some papers, mainly "An Experimental Analysis of Quantile Sketches over Data Streams" for a comparison of the different options and, of course, "DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees" by Masson et al.
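To make the idea concrete, here is a minimal sketch of the core DDSketch mechanism as described in the paper, written in Python for illustration. It is not ddskerl's API: the class name `TinyDDSketch` and its methods are hypothetical, it only accepts strictly positive values, and it never collapses buckets, so take it as a toy under those assumptions rather than a reference implementation.

```python
import math
from collections import Counter


class TinyDDSketch:
    """Toy DDSketch for strictly positive values (hypothetical, illustrative only)."""

    def __init__(self, relative_error=0.01):
        if not 0 < relative_error < 1:
            raise ValueError("relative_error must be in (0, 1)")
        # Bucket i covers the interval (gamma^(i-1), gamma^i].
        self.gamma = (1 + relative_error) / (1 - relative_error)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values seen
        self.count = 0

    def insert(self, value):
        if value <= 0:
            raise ValueError("this toy version only handles positive values")
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def merge(self, other):
        # Merging is just adding bucket counts; both sketches must share gamma.
        if other.gamma != self.gamma:
            raise ValueError("sketches must use the same relative error")
        self.buckets.update(other.buckets)
        self.count += other.count

    def quantile(self, q):
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value of the bucket; it is within the
                # configured relative error of every value in that bucket.
                return 2 * self.gamma ** index / (self.gamma + 1)


sketch = TinyDDSketch(relative_error=0.01)
for v in range(1, 1001):
    sketch.insert(v)
median_estimate = sketch.quantile(0.5)
```

Memory here depends on the number of distinct buckets rather than on the number of events, and `median_estimate` should land within about 1% of the value an exact computation would pick for the same rank.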
The library exposes four implementations with different advantages and disadvantages; depending on the use case, one or another might be a better fit. Everything is explained in the docs, so please have a look and tell me if anything is unclear.