Paper title
A Case Study in Model Failure? COVID-19 Daily Deaths and ICU Bed Utilisation Predictions in New York State
Paper authors
Paper abstract
Forecasting models have been influential in shaping decision-making in the COVID-19 pandemic. However, there is concern that their predictions may have been misleading. Here, we dissect the predictions made by four models for the daily COVID-19 death counts between March 25 and June 5 in New York state, as well as the predictions of ICU bed utilisation made by the influential IHME model. We evaluated the accuracy of the point estimates and the accuracy of the uncertainty estimates of the model predictions. First, we compared the "ground truth" data sources on daily deaths against which these models were trained. Three different data sources were used by these models, and these had substantial differences in recorded daily death counts. Two additional data sources that we examined also provided different death counts per day. In terms of prediction accuracy, all models fared very poorly. Only 10.2% of the predictions fell within 10% of their training ground truth, irrespective of distance into the future. In terms of uncertainty assessment, only one model matched the nominal 95% coverage relatively well, but that model did not start predictions until April 16 and thus had no impact on early, major decisions. For ICU bed utilisation, the IHME model was highly inaccurate; the point estimates only started to match ground truth after the pandemic wave had started to wane. We conclude that trustworthy models require trustworthy input data to be trained upon. Moreover, models need to be subjected to prespecified real-time performance tests before their results are provided to policy makers and public health officials.
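The two evaluation criteria named in the abstract (the share of point predictions within 10% of ground truth, and the empirical coverage of nominal 95% intervals) can be illustrated with a minimal sketch. The abstract does not give the authors' exact definitions, so the code below assumes a relative-error tolerance measured against the observed count and treats coverage as the fraction of observed values lying inside the predicted interval. The function names and all numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def fraction_within_tolerance(
    predicted: Sequence[float],
    observed: Sequence[float],
    tolerance: float = 0.10,
) -> float:
    """Share of point predictions whose relative error is at most `tolerance`.

    Assumption: relative error is |predicted - observed| / |observed|.
    Days with an observed count of zero are skipped because the
    relative error is undefined there.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    hits = 0
    counted = 0
    for p, o in zip(predicted, observed):
        if o == 0:
            continue
        counted += 1
        if abs(p - o) <= tolerance * abs(o):
            hits += 1
    return hits / counted if counted else math.nan


def interval_coverage(
    lower: Sequence[float],
    upper: Sequence[float],
    observed: Sequence[float],
) -> float:
    """Fraction of observed values that fall inside [lower, upper]."""
    if not (len(lower) == len(upper) == len(observed)):
        raise ValueError("lower, upper and observed must have the same length")
    if not observed:
        return math.nan
    hits = sum(1 for lo, hi, o in zip(lower, upper, observed) if lo <= o <= hi)
    return hits / len(observed)


# Made-up daily death counts and forecasts, for illustration only.
observed_deaths = [100, 200, 400, 500]
point_forecasts = [105, 260, 390, 700]
interval_lower = [80, 150, 300, 550]
interval_upper = [120, 250, 450, 900]

accuracy = fraction_within_tolerance(point_forecasts, observed_deaths)
coverage = interval_coverage(interval_lower, interval_upper, observed_deaths)
print(f"within 10% of ground truth: {accuracy:.1%}")
print(f"empirical interval coverage: {coverage:.1%}")
```

Under these assumptions, a well-calibrated model would show empirical coverage close to 95% when its intervals are nominal 95% intervals. Because the abstract reports that the ground truth sources themselves disagree, both numbers depend on which source is passed as the observed series.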