Title
Distributionally robust and generalizable inference
Authors
Abstract
We discuss recently developed methods that quantify the stability and generalizability of statistical findings under distributional changes. In many practical problems, the data are not drawn i.i.d. from the target population. For example, unobserved sampling bias, batch effects, or unknown associations might inflate the variance compared to i.i.d. sampling. For reliable statistical inference, it is thus necessary to account for these types of variation. We discuss and review two methods that allow quantifying distributional stability based on a single dataset. The first method computes the sensitivity of a parameter under worst-case distributional perturbations to understand which types of shift pose a threat to external validity. The second method treats distributional shifts as random, which allows assessing average (instead of worst-case) robustness. Based on a stability analysis of multiple estimators on a single dataset, it integrates both sampling and distributional uncertainty into a single confidence interval.
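To make the idea of a worst-case distributional perturbation more concrete, here is a minimal sketch. It assumes that the parameter is a mean and that perturbations are reweightings of the observed sample inside a Kullback-Leibler ball of radius `rho`. The abstract does not specify the divergence, and the helper names (`worst_case_mean`, `tilted_weights`, `kl_to_uniform`) are hypothetical. The sketch is a standard KL-ball calculation, not the paper's procedure. Under this assumption, the worst-case reweighting is an exponential tilt toward small values of `x`. The code therefore searches for the tilt strength at which the KL budget is exactly used up.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import logsumexp


def tilted_weights(x, t):
    """Weights w_i proportional to exp(-t * x_i), normalized to sum to 1."""
    logw = -t * x
    return np.exp(logw - logsumexp(logw))


def kl_to_uniform(w):
    """KL(w || uniform) = sum_i w_i * log(n * w_i); zero weights contribute 0."""
    nz = w > 0
    return float(np.sum(w[nz] * np.log(len(w) * w[nz])))


def worst_case_mean(x, rho):
    """Hypothetical helper (assumed KL-ball model): the smallest mean of x over
    all reweightings Q of the empirical distribution with KL(Q || empirical) <= rho."""
    x = np.asarray(x, dtype=float)
    if rho <= 0:
        return float(x.mean())
    # If the budget allows putting all mass on the minimum, that is the worst case.
    n_min = np.count_nonzero(x == x.min())
    if rho >= np.log(len(x) / n_min):
        return float(x.min())

    # The KL divergence of the tilt increases monotonically in t, so solve KL(t) = rho.
    def gap(t):
        return kl_to_uniform(tilted_weights(x, t)) - rho

    hi = 1.0
    for _ in range(100):  # expand the bracket until the KL budget is exceeded
        if gap(hi) > 0:
            break
        hi *= 2.0
    else:
        return float(x.min())
    t_star = brentq(gap, 0.0, hi)
    return float(np.dot(tilted_weights(x, t_star), x))


# Usage: how far can the mean of a sample drop under small KL shifts?
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=1.0, size=5000)
for rho in (0.0, 0.05, 0.2):
    print(f"rho={rho:.2f}  worst-case mean={worst_case_mean(sample, rho):.3f}")
```

Tracing the worst-case mean as a function of `rho` shows how quickly an estimate degrades as the permitted shift grows. For Gaussian data with standard deviation σ, the drop is roughly σ√(2ρ).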