Paper Title

COAX: Correlation-Aware Indexing on Multidimensional Data with Soft Functional Dependencies

Paper Authors

Ali Hadian, Behzad Ghaffari, Taiyi Wang, Thomas Heinis

Paper Abstract

Recent work has proposed learned index structures, which learn the distribution of the underlying dataset to improve performance. The initial work on learned indexes showed that by learning the cumulative distribution function of the data, index structures such as the B-Tree can improve their performance by an order of magnitude while having a smaller memory footprint. In this paper, we present COAX, a learned index for multidimensional data that, instead of learning the distribution of keys, learns the correlations between attributes of the dataset. Our approach is driven by the observation that in many datasets, the values of two (or more) attributes are correlated. COAX exploits these correlations to reduce the dimensionality of the dataset. More precisely, we learn how to infer one (or more) attribute $C_d$ from the remaining attributes and hence no longer need to index $C_d$. This reduces the dimensionality and thus makes the index smaller and more efficient. We theoretically investigate the effectiveness of the proposed technique based on the predictability of the functionally dependent (FD) attributes. We further show experimentally that by predicting correlated attributes in the data, we can improve query execution time and reduce the memory overhead of the index. In our experiments, we reduce execution time by 25% while reducing the memory footprint of the index by four orders of magnitude.
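
To make the idea concrete, here is a minimal Python sketch of the dimensionality-reduction trick the abstract describes. It is not the paper's implementation: the linear model, the error bound `eps`, and the function `query_b_range` are all illustrative assumptions. A soft functional dependency $b \approx f(a)$ is learned from the data, only attribute $a$ is indexed, and a range predicate on $b$ is rewritten into a widened range on $a$ followed by a filtering step.

```python
import bisect
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-attribute dataset: b is softly functionally dependent on a.
a = np.sort(rng.uniform(0.0, 1000.0, 10_000))
b = 2.0 * a + rng.normal(0.0, 1.0, a.size)  # soft FD: b ~= 2a + noise
rows = list(zip(a.tolist(), b.tolist()))

# Learn the dependency f: a -> b (here a least-squares line) and record the
# largest residual seen in the data; this bound is what makes the predicate
# rewriting below lossless.
slope, intercept = np.polyfit(a, b, 1)
eps = float(np.max(np.abs(b - (slope * a + intercept))))

# Only attribute a is indexed; a sorted array plus binary search stands in
# for whatever one-dimensional index the system actually uses.
keys = a.tolist()

def query_b_range(lo, hi):
    """Answer `lo <= b <= hi` without any index on b (assumes slope > 0)."""
    # Invert the learned model and widen the interval by the error bound.
    a_lo = (lo - eps - intercept) / slope
    a_hi = (hi + eps - intercept) / slope
    i = bisect.bisect_left(keys, a_lo)
    j = bisect.bisect_right(keys, a_hi)
    # Final check discards the false positives the widened scan admitted.
    return [(x, y) for x, y in rows[i:j] if lo <= y <= hi]

print(len(query_b_range(100.0, 110.0)))  # rows whose b falls in [100, 110]
```

Widening the rewritten range by the maximum training residual keeps the rewriting lossless: every row whose $b$ value falls in the queried interval is guaranteed to lie inside the scanned segment of $a$, and the final filter removes the extra rows the widened scan admits.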
