论文标题

uddsketch:准确跟踪数据流中的分位数

UDDSketch: Accurate Tracking of Quantiles in Data Streams

论文作者

Epicoco, Italo, Melle, Catiuscia, Cafaro, Massimo, Pulimeno, Marco, Morleo, Giuseppe

论文摘要

我们提出UDDSKETCH(统一DDSKetch),这是一个新颖的草图,用于快速准确地跟踪数据流中的分位数。该草图受到最近引入的DDSKetch的重大启发,并且是基于一种新颖的桶形崩溃过程,该过程允许克服相应的DDSKetch过程的固有限制。实际上,DDSKetch桶倒塌程序不允许根据不遵循亚指数分布的数据的准确性来推导形式保证的数据。相反,UDDSKetch的设计是为了在整个分位数和输入中的任意分布中提供准确性保证。此外,我们的算法充分利用预算的内存,以确保在整个分位数中的最佳准确性。合成数据集的广泛实验结果证实了我们方法的有效性。

We present UDDSketch (Uniform DDSketch), a novel sketch for fast and accurate tracking of quantiles in data streams. This sketch is heavily inspired by the recently introduced DDSketch, and is based on a novel bucket collapsing procedure that allows overcoming the intrinsic limits of the corresponding DDSketch procedures. Indeed, the DDSketch bucket collapsing procedure does not allow the derivation of formal guarantees on the accuracy of quantile estimation for data which does not follow a sub-exponential distribution. On the contrary, UDDSketch is designed so that accuracy guarantees can be given over the full range of quantiles and for arbitrary distribution in input. Moreover, our algorithm fully exploits the budgeted memory adaptively in order to guarantee the best possible accuracy over the full range of quantiles. Extensive experimental results on synthetic datasets confirm the validity of our approach.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源