Title

P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

Authors

Shun Takagi, Tsubasa Takahashi, Yang Cao, Masatoshi Yoshikawa

Abstract

How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach to this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee. However, existing methods cannot adequately handle high-dimensional data. In particular, when the input dataset contains a large number of features, existing techniques must inject a prohibitive amount of noise to satisfy differential privacy, which renders the outsourced data analysis meaningless. To address this issue, this paper proposes the privacy-preserving phased generative model (P3GM), a differentially private generative model for releasing such sensitive data. P3GM employs a two-phase learning process to make it robust against noise and to increase learning efficiency (e.g., easier convergence). We give theoretical analyses of the learning complexity and privacy loss of P3GM. We further evaluate the proposed method experimentally and demonstrate that P3GM significantly outperforms existing solutions. Compared with state-of-the-art methods, our generated samples contain less noise and are closer to the original data in terms of data diversity. In addition, in several data mining tasks using synthesized data, our model outperforms its competitors in terms of accuracy.
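Why does a large number of features force "a prohibitive amount of noise"? The sketch below illustrates one common reason, under stated assumptions: it uses a generic DP-SGD-style Gaussian mechanism (clip each example's gradient to L2 norm C, then add N(0, (sigma*C)^2) noise to every coordinate of the summed gradient). This is not P3GM's algorithm, and the function names (`clip`, `noisy_sum`, `noise_to_signal`) and the identical synthetic gradients are hypothetical. Because the per-coordinate noise scale does not shrink with dimension d while the clipped signal norm stays bounded, the total noise norm grows roughly as sigma*C*sqrt(d), so the noise-to-signal ratio is about sigma*sqrt(d)/batch.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_sum(per_example_grads, max_norm, sigma, rng):
    """Gaussian mechanism on a sum of clipped gradients (generic DP-SGD style).

    Clipping bounds each example's contribution to L2 norm max_norm, so the sum
    has L2 sensitivity max_norm; Gaussian noise with standard deviation
    sigma * max_norm is added independently to every coordinate.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        for i, v in enumerate(clip(g, max_norm)):
            total[i] += v
    return [t + rng.gauss(0.0, sigma * max_norm) for t in total]


def noise_to_signal(dim, batch=64, max_norm=1.0, sigma=1.0, seed=0):
    """Ratio of the injected noise norm to the norm of the clean clipped sum."""
    rng = random.Random(seed)
    # Hypothetical data: every example shares the same unit-norm gradient,
    # so clipping leaves it unchanged and the clean sum has norm `batch`.
    unit = [1.0 / math.sqrt(dim)] * dim
    grads = [list(unit) for _ in range(batch)]
    clean = [batch * u for u in unit]
    noisy = noisy_sum(grads, max_norm, sigma, rng)
    noise_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(noisy, clean)))
    signal_norm = math.sqrt(sum(c * c for c in clean))
    return noise_norm / signal_norm


if __name__ == "__main__":
    for d in (10, 100, 10_000):
        print(d, round(noise_to_signal(d), 3))
```

With the same privacy parameters, the ratio for 10,000 features is roughly 30 times larger than for 10, which is the kind of degradation the abstract attributes to high-dimensional inputs.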