Paper Title
Score-Based Generative Models Detect Manifolds
Paper Authors
Paper Abstract
Score-based generative models (SGMs) need to approximate the scores $\nabla \log p_t$ of the intermediate distributions as well as the final distribution $p_T$ of the forward process. The theoretical underpinnings of the effects of these approximations are still lacking. We find precise conditions under which SGMs are able to produce samples from an underlying (low-dimensional) data manifold $\mathcal{M}$. This assures us that SGMs are able to generate the "right kind of samples". For example, taking $\mathcal{M}$ to be the subset of images of faces, we find conditions under which the SGM robustly produces an image of a face, even though the relative frequencies of these images might not accurately represent the true data generating distribution. Moreover, this analysis is a first step towards understanding the generalization properties of SGMs: Taking $\mathcal{M}$ to be the set of all training samples, our results provide a precise description of when the SGM memorizes its training data.
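The sampling mechanism the abstract refers to can be illustrated on a toy problem. The sketch below (not from the paper) takes the "manifold" $\mathcal{M}$ to be the two points $\{-1, +1\}$, uses the *exact* score $\nabla \log p_t$ of the noised mixture (so no approximation error), and runs annealed Langevin dynamics from a large noise scale down to a small one. The noise schedule, step sizes, and iteration counts are illustrative choices, not values from the paper; with an accurate score, the samples end up concentrated on $\mathcal{M}$, which is the behavior the paper characterizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data manifold": the two points {-1, +1}, a stand-in for a
# low-dimensional subset M of a high-dimensional ambient space.
mus = np.array([-1.0, 1.0])

def score(x, sigma):
    # Exact score of p_t = 0.5*N(-1, sigma^2) + 0.5*N(+1, sigma^2):
    # a posterior-weighted sum of per-component Gaussian scores.
    d = mus[None, :] - x[:, None]                    # (n, 2)
    logw = -0.5 * (d / sigma) ** 2
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # posterior weights
    return (w * d).sum(axis=1) / sigma ** 2

# Annealed Langevin dynamics: noise scale decreases geometrically
# (hyperparameters here are illustrative, not from the paper).
sigmas = np.geomspace(3.0, 0.01, 50)
x = rng.normal(0.0, sigmas[0], size=2000)            # init ~ p_T
for sigma in sigmas:
    step = 0.1 * sigma ** 2
    for _ in range(20):
        x = x + step * score(x, sigma) \
              + np.sqrt(2.0 * step) * rng.normal(size=x.shape)

# With the exact score, samples concentrate near M = {-1, +1},
# even if the mixture weights recovered need not match the data
# distribution exactly -- the distinction the abstract draws.
```

Note how this separates the two claims in the abstract: landing *on* $\mathcal{M}$ (samples near $\pm 1$) is robust to the annealing schedule, while the *relative frequency* of the two modes is more sensitive to it.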