Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations

Paper title


Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations

Authors

Aurelien Bibaut, Nathan Kallus, Michael Lindon

Abstract


Sequential tests and their implied confidence sequences, which are valid at arbitrary stopping times, promise flexible statistical inference and on-the-fly decision making. However, strong guarantees are limited to parametric sequential tests that under-cover in practice or concentration-bound-based sequences that over-cover and have suboptimal rejection times. In this work, we consider classic delayed-start normal-mixture sequential probability ratio tests, and we provide the first asymptotic type-I-error and expected-rejection-time guarantees under general non-parametric data generating processes, where the asymptotics are indexed by the test's burn-in time. The type-I-error results primarily leverage a martingale strong invariance principle and establish that these tests (and their implied confidence sequences) have type-I error rates asymptotically equivalent to the desired (possibly varying) $α$-level. The expected-rejection-time results primarily leverage an identity inspired by Itô's lemma and imply that, in certain asymptotic regimes, the expected rejection time is asymptotically equivalent to the minimum possible among $α$-level tests. We show how to apply our results to sequential inference on parameters defined by estimating equations, such as average treatment effects. Together, our results establish these (ostensibly parametric) tests as general-purpose, non-parametric, and near-optimal. We illustrate this via numerical simulations and a real-data application to A/B testing at Netflix.
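The abstract describes the test only in words. Below is a minimal Python sketch of a classic normal-mixture sequential probability ratio test for H0: mean = theta0, together with the confidence sequence obtained by inverting it. The sketch rests on several assumptions that are not taken from the paper. The names `mixture_log_lr`, `cs_radius`, and `delayed_start_msprt` are hypothetical. The mixing distribution N(0, `tau2`) on the mean shift and the running plug-in sample variance are illustrative choices. "Delayed start" is read simply as "never reject before a burn-in time `burn_in`". The paper's exact construction may differ. When the data are Gaussian and the variance is known, Ville's inequality gives exact α-level control for the un-delayed test. With a plug-in variance and non-parametric data, at most an asymptotic guarantee of the kind stated in the abstract could apply.

```python
import math
import random


def mixture_log_lr(s_n: float, n: int, sigma2: float, tau2: float) -> float:
    """Log normal-mixture likelihood ratio for H0: mean == theta0.

    s_n    -- centred partial sum, sum_i (x_i - theta0)
    sigma2 -- per-observation variance (known, or a plug-in estimate)
    tau2   -- variance of the assumed N(0, tau2) mixing prior on the mean shift
    """
    v = sigma2 + n * tau2
    return 0.5 * math.log(sigma2 / v) + tau2 * s_n ** 2 / (2.0 * sigma2 * v)


def cs_radius(n: int, sigma2: float, tau2: float, alpha: float) -> float:
    """Half-width of the implied confidence sequence around the running mean.

    Obtained by solving mixture_log_lr(n * (mean - theta0), n, ...) < log(1/alpha)
    for theta0.
    """
    v = sigma2 + n * tau2
    bound = 2.0 * sigma2 * v / tau2 * (math.log(1.0 / alpha) + 0.5 * math.log(v / sigma2))
    return math.sqrt(bound) / n


def delayed_start_msprt(xs, theta0=0.0, alpha=0.05, tau2=1.0, burn_in=100):
    """Return the first n >= burn_in at which H0 is rejected, else None.

    Hypothetical helper: the delayed start is modelled as "no rejection during
    burn-in", and the variance is a running plug-in estimate (an assumption).
    """
    if burn_in < 2:
        raise ValueError("burn_in must be >= 2 to estimate a variance")
    threshold = math.log(1.0 / alpha)  # reject when the mixture LR reaches 1/alpha
    s = 0.0     # running sum of (x_i - theta0)
    mean = 0.0  # Welford running mean of x
    m2 = 0.0    # Welford running sum of squared deviations from the mean
    for n, x in enumerate(xs, start=1):
        s += x - theta0
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < burn_in:
            continue  # delayed start: never reject during burn-in
        sigma2 = m2 / (n - 1)  # plug-in sample variance (assumption)
        if sigma2 > 0 and mixture_log_lr(s, n, sigma2, tau2) >= threshold:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    null_data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    alt_data = [rng.gauss(0.2, 1.0) for _ in range(5000)]
    print("null:", delayed_start_msprt(null_data, burn_in=100))
    print("alt: ", delayed_start_msprt(alt_data, burn_in=100))
```

The code works on the log scale so that the exponential in the mixture likelihood ratio cannot overflow for large n. The mixing variance `tau2` controls which effect sizes the test detects fastest, so it is a tuning choice rather than a fixed constant. The radius returned by `cs_radius` is exactly the boundary at which `mixture_log_lr` crosses log(1/α). Every theta0 outside the running mean ± that radius is therefore rejected at time n.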


