Paper Title

Diffusion Visual Counterfactual Explanations

Paper Authors

Maximilian Augustin, Valentyn Boreiko, Francesco Croce, Matthias Hein

Paper Abstract

Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier. They are 'small' but 'realistic' semantic changes of the image changing the classifier decision. Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts, or are limited to image classification problems with few classes. In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers via a diffusion process. Two modifications to the diffusion process are key for our DVCEs: first, an adaptive parameterization, whose hyperparameters generalize across images and models, together with distance regularization and late start of the diffusion process, allow us to generate images with minimal semantic changes to the original ones but different classification. Second, our cone regularization via an adversarially robust model ensures that the diffusion process does not converge to trivial non-semantic changes, but instead produces realistic images of the target class which achieve high confidence by the classifier.
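The abstract describes DVCE generation at a high level: classifier-guided diffusion sampling with a late start from a noised copy of the original image, distance regularization toward that image, and a cone projection of the classifier gradient onto the gradient direction of an adversarially robust model. The sketch below is a rough, hypothetical illustration of what such a guidance step could look like; it is not the authors' released implementation, and the function names (`cone_project`, `guidance_grad`), the L1 distance term, the batch-of-one assumption, and the default angle and weight values are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F


def cone_project(g_cls: torch.Tensor, g_rob: torch.Tensor, angle_deg: float = 30.0) -> torch.Tensor:
    """Keep the classifier gradient if it lies within a cone of half-angle
    `angle_deg` around the robust model's gradient; otherwise project it
    onto the cone boundary while preserving its norm. (Illustrative only.)"""
    g_c, g_r = g_cls.flatten(), g_rob.flatten()
    max_angle = torch.deg2rad(torch.tensor(angle_deg, device=g_cls.device))
    cos = F.cosine_similarity(g_c, g_r, dim=0).clamp(-1.0, 1.0)
    if torch.arccos(cos) <= max_angle:
        return g_cls
    g_r_unit = g_r / (g_r.norm() + 1e-12)
    orth = g_c - (g_c @ g_r_unit) * g_r_unit   # component orthogonal to the robust gradient
    orth_unit = orth / (orth.norm() + 1e-12)
    boundary = torch.cos(max_angle) * g_r_unit + torch.sin(max_angle) * orth_unit
    return (g_c.norm() * boundary).view_as(g_cls)


def guidance_grad(x_t, x_orig, target, classifier, robust_classifier, dist_weight=0.1):
    """Guidance gradient for one reverse-diffusion step: cone-regularized class
    guidance plus an (assumed) L1 distance penalty to the original image.
    Assumes a batch of one image."""
    # Gradient of the target log-probability under the classifier being explained.
    x = x_t.detach().requires_grad_(True)
    g_cls = torch.autograd.grad(F.log_softmax(classifier(x), dim=-1)[0, target], x)[0]
    # Gradient under the adversarially robust model, which defines the cone axis.
    x = x_t.detach().requires_grad_(True)
    g_rob = torch.autograd.grad(F.log_softmax(robust_classifier(x), dim=-1)[0, target], x)[0]
    g = cone_project(g_cls, g_rob)
    # Distance regularization keeps the counterfactual close to the original image.
    return g - dist_weight * torch.sign(x_t - x_orig)
```

In a full sampler, a gradient of this kind would be added to the mean of each reverse-diffusion step, and sampling would start from a partially noised version of the original image rather than from pure noise, corresponding to the "late start" mentioned in the abstract.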
