论文标题

从不可分配的距离甲壳中恢复

Recovery from Non-Decomposable Distance Oracles

论文作者

Hu, Zhuangfei, Li, Xinda, Woodruff, David P., Zhang, Hongyang, Zhang, Shufan

论文摘要

一条工作已经研究了从距离查询中恢复输入的问题。在这种情况下,有一个未知的序列$ s \ in \ {0,1 \}^{\ leq n} $,并且一个选择一组查询$ y \ in \ in \ {0,1 \}^{\ Mathcal {\ Mathcal {o} {o}(o}(o)}(n)} $,并接收$ d(y)$ d(s,y)$ d。目标是提出尽可能少的查询以恢复$ S $。尽管该问题对于可分解距离进行了充分研究,即表格$ d(s,y)= \ sum_ {i = 1}^n f(s_i,y_i)$的某些功能$ f $,其中包括hamming距离的重要案例,包括$ \ ell_p $ - norms和$ m $ - $ $ - $ - $ - $ -METOBLE的重要案例,我们的知识是不合时宜的,我们的最佳范围是我们的最佳问题。距离有重要的特殊情况,例如编辑距离,动态时间翘曲(DTW),特雷切特距离,地球移动者的距离等。我们启动研究并为此类距离开发一般框架。有趣的是,对于某些距离(例如DTW或Frechet),序列$ s $的精确恢复是不可能的,因此我们通过允许$ y $中的字符从稍大的字母中绘制来显示,然后这是可能的。在许多情况下,我们获得了最佳或近乎最佳的查询复杂性。我们还研究了许多不同距离功能的适应性作用。理解非适应性的一种动机是可以固定查询序列,并且查询输入的距离提供了输入的非线性嵌入,可以在涉及的下游应用程序中使用,例如自然语言处理的神经网络。

A line of work has looked at the problem of recovering an input from distance queries. In this setting, there is an unknown sequence $s \in \{0,1\}^{\leq n}$, and one chooses a set of queries $y \in \{0,1\}^{\mathcal{O}(n)}$ and receives $d(s,y)$ for a distance function $d$. The goal is to make as few queries as possible to recover $s$. Although this problem is well-studied for decomposable distances, i.e., distances of the form $d(s,y) = \sum_{i=1}^n f(s_i, y_i)$ for some function $f$, which includes the important cases of Hamming distance, $\ell_p$-norms, and $M$-estimators, to the best of our knowledge this problem has not been studied for non-decomposable distances, for which there are important special cases such as edit distance, dynamic time warping (DTW), Frechet distance, earth mover's distance, and so on. We initiate the study and develop a general framework for such distances. Interestingly, for some distances such as DTW or Frechet, exact recovery of the sequence $s$ is provably impossible, and so we show by allowing the characters in $y$ to be drawn from a slightly larger alphabet this then becomes possible. In a number of cases we obtain optimal or near-optimal query complexity. We also study the role of adaptivity for a number of different distance functions. One motivation for understanding non-adaptivity is that the query sequence can be fixed and the distances of the input to the queries provide a non-linear embedding of the input, which can be used in downstream applications involving, e.g., neural networks for natural language processing.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源