Estimation of circular statistics in the presence of measurement bias

Title

Estimation of circular statistics in the presence of measurement bias

Authors

Alsammani, Abdallah; Stacey, William C.; Gliske, Stephen V.

Abstract

Background and objective. Circular statistics and Rayleigh tests are important tools for analyzing the occurrence of cyclic events. However, current methods fail in the presence of measurement bias, such as incomplete or otherwise non-uniform sampling. Consider, for example, studying 24-hour cyclicity when data are not recorded uniformly over the full 24-hour cycle. The objective of this paper is to present a method to estimate circular statistics and their statistical significance even in this circumstance.

Methods. We present our objective as a special case of a more general problem: estimating probability distributions in the context of imperfect measurements, a well-studied problem in high energy physics. Our solution combines 1) existing approaches that estimate the measurement process via numeric simulation and 2) innovative use of linear parametrizations of the underlying distributions. We compute the estimation error for several toy examples as well as a real-world example: analyzing the 24-hour cyclicity of an electrographic biomarker of epileptic tissue, controlled for state of vigilance.

Results. Our method shows low estimation error. In the real-world example, the corrected moments had a root mean square residual of less than 0.007. We additionally found that, even with unfolding, Rayleigh test statistics still often underestimate the p-values (and thus overestimate statistical significance) in the presence of non-uniform sampling. Numerical estimation of statistical significance, as described herein, is thus preferable.

Conclusions. The presented methods provide a robust solution for addressing incomplete or otherwise non-uniform sampling. The general method presented is also applicable to a wider set of analyses involving estimation of the true probability distribution adjusted for imperfect measurement processes.
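The effect described in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' unfolding method: it assumes a hypothetical recording coverage function `coverage`, uses plain inverse-coverage reweighting as a stand-in correction, and estimates significance by simulating the null hypothesis under the same coverage. All function names are illustrative. Phases are in radians, so an event at hour h of a 24-hour cycle would map to 2πh/24.

```python
import numpy as np

def resultant_length(theta, weights=None):
    """Magnitude of the (optionally weighted) first circular moment."""
    theta = np.asarray(theta, dtype=float)
    w = np.ones_like(theta) if weights is None else np.asarray(weights, dtype=float)
    return float(np.abs(np.sum(w * np.exp(1j * theta)) / np.sum(w)))

def rayleigh_pvalue(theta):
    """First-order Rayleigh approximation p ~ exp(-n R^2).
    Only valid when sampling over the cycle is uniform."""
    n = len(theta)
    return float(np.exp(-n * resultant_length(theta) ** 2))

def sample_null_events(n, coverage, rng):
    """Draw n phases for events that truly occur uniformly around the cycle
    but are only recorded with probability coverage(theta) (rejection sampling)."""
    kept = []
    while len(kept) < n:
        candidates = rng.uniform(0.0, 2.0 * np.pi, size=n)
        accept = rng.uniform(size=n) < coverage(candidates)
        kept.extend(candidates[accept])
    return np.array(kept[:n])

def monte_carlo_pvalue(theta, coverage, rng, n_sim=1000):
    """Numerical p-value: compare the observed resultant length with its
    distribution under the null, simulated with the same coverage."""
    observed = resultant_length(theta)
    null = np.array([
        resultant_length(sample_null_events(len(theta), coverage, rng))
        for _ in range(n_sim)
    ])
    return float((1 + np.sum(null >= observed)) / (1 + n_sim))

# Hypothetical coverage (an assumption for this toy): recording is most
# likely near phase 0 and least likely near phase pi; values lie in [0.2, 1].
def coverage(theta):
    return 0.6 + 0.4 * np.cos(theta)

rng = np.random.default_rng(0)
theta_obs = sample_null_events(500, coverage, rng)   # no true cyclicity

r_raw = resultant_length(theta_obs)                              # biased by coverage
r_weighted = resultant_length(theta_obs, 1.0 / coverage(theta_obs))  # reweighted
p_rayleigh = rayleigh_pvalue(theta_obs)                          # ignores coverage
p_numeric = monte_carlo_pvalue(theta_obs, coverage, rng)         # accounts for it

print(f"raw R = {r_raw:.3f}, reweighted R = {r_weighted:.3f}")
print(f"Rayleigh p = {p_rayleigh:.2e}, Monte Carlo p = {p_numeric:.3f}")
```

In this toy, the events have no true cyclicity, yet the uncorrected resultant length is far from zero and the Rayleigh approximation reports a very small p-value. Reweighting pulls the moment estimate back toward zero, and the simulated null gives a p-value that reflects the non-uniform sampling.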
