Paper Title
Towards More Efficient Data Valuation in Healthcare Federated Learning using Ensembling
Authors
Abstract
Federated Learning (FL), wherein multiple institutions collaboratively train a machine learning model without sharing data, is becoming popular. Participating institutions might not contribute equally: some contribute more data, some better-quality data, and some more diverse data. To fairly rank the contributions of different institutions, the Shapley value (SV) has emerged as the method of choice. Exact SV computation is prohibitively expensive, especially when there are hundreds of contributors, so existing SV computation techniques rely on approximations. In healthcare, however, where the number of contributing institutions is unlikely to be of a colossal scale, computing exact SVs is still exorbitantly expensive, but not impossible. For such settings, we propose an efficient SV computation technique called SaFE (Shapley Value for Federated Learning using Ensembling). We empirically show that SaFE computes values that are close to exact SVs, and that it performs better than current SV approximations. This is particularly relevant in medical imaging settings, where heterogeneity across institutions is rampant and fast, accurate data valuation is required to determine the contribution of each participant in multi-institutional collaborative learning.
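To make the cost of exact SV computation concrete, the following is a minimal sketch of the textbook Shapley formula applied to a toy problem. It is not the SaFE method described in the abstract. The institution names, dataset sizes, and the `toy_utility` function are hypothetical stand-ins; in a real FL setting each utility call would mean training and evaluating a model on a coalition of institutions, and the number of distinct coalitions grows as 2^n for n institutions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating every coalition.

    `utility` maps a frozenset of players to a real-valued score.
    """
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values


# Hypothetical dataset sizes for three institutions (illustration only).
sizes = {"A": 100, "B": 300, "C": 600}


def toy_utility(coalition):
    # Assumed stand-in for "accuracy of a model trained on this coalition":
    # increasing in total data size, with diminishing returns.
    total = sum(sizes[p] for p in coalition)
    return 1.0 - 1.0 / (1.0 + total / 100.0)


sv = exact_shapley(list(sizes), toy_utility)
for name, value in sv.items():
    print(f"{name}: {value:.4f}")
```

In this sketch the values sum to the utility of the full coalition minus that of the empty one (the efficiency property), and institutions with more data receive larger values. With only three institutions there are 8 coalitions to evaluate; with 20 there would already be more than a million, which is why exact computation quickly becomes impractical when every evaluation involves model training.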