论文标题

分割模型可以通过完全合成生成的数据训练吗?

Can segmentation models be trained with fully synthetically generated data?

论文作者

Fernandez, Virginia, Pinaya, Walter Hugo Lopez, Borges, Pedro, Tudosiu, Petru-Daniel, Graham, Mark S, Vercauteren, Tom, Cardoso, M Jorge

论文摘要

为了实现良好的性能和概括性,医疗图像分割模型应在具有足够可变性的大量数据集上进行培训。由于道德和治理限制以及与标签数据相关的成本,科学发展经常被扼杀,并经过对有限数据的培训和测试。数据增强通常用于人为地增加数据分布的可变性并提高模型的通用性。最近的工作探索了图像合成的深层生成模型,因为这种方法将使能够有效地生成无限量的各种数据,从而解决了通用性和数据访问问题。但是,许多提出的解决方案限制了用户对生成内容的控制。在这项工作中,我们提出了Brainspade,该模型将基于合成扩散的标签发生器与语义图像发生器结合在一起。我们的模型可以在有或没有感兴趣的病理的情况下产生完全合成的大脑标签,然后产生与任意引导样式的相应MRI图像。实验表明,Brainspade合成数据可用于训练分割模型,其性能与在真实数据中训练的模型相当。

In order to achieve good performance and generalisability, medical image segmentation models should be trained on sizeable datasets with sufficient variability. Due to ethics and governance restrictions, and the costs associated with labelling data, scientific development is often stifled, with models trained and tested on limited data. Data augmentation is often used to artificially increase the variability in the data distribution and improve model generalisability. Recent works have explored deep generative models for image synthesis, as such an approach would enable the generation of an effectively infinite amount of varied data, addressing the generalisability and data access problems. However, many proposed solutions limit the user's control over what is generated. In this work, we propose brainSPADE, a model which combines a synthetic diffusion-based label generator with a semantic image generator. Our model can produce fully synthetic brain labels on-demand, with or without pathology of interest, and then generate a corresponding MRI image of an arbitrary guided style. Experiments show that brainSPADE synthetic data can be used to train segmentation models with performance comparable to that of models trained on real data.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源