Paper Title
Gradient Backpropagation based Feature Attribution to Enable Explainable-AI on the Edge
Paper Authors
Paper Abstract
There has been a recent surge in the field of Explainable AI (XAI), which tackles the problem of providing insights into the behavior of black-box machine learning models. Within this field, feature attribution encompasses methods that assign relevance scores to input features and visualize them as a heatmap. Designing flexible accelerators for multiple such algorithms is challenging, since the hardware mapping of these algorithms has not been studied yet. In this work, we first analyze the dataflow of gradient backpropagation-based feature attribution algorithms to determine the resource overhead required over inference. The gradient computation is optimized to minimize the memory overhead. Second, we develop a High-Level Synthesis (HLS) based configurable FPGA design that is targeted for edge devices and supports three feature attribution algorithms. Tile-based computation is employed to maximally use on-chip resources while adhering to the resource constraints. Representative CNNs are trained on the CIFAR-10 dataset and implemented on multiple Xilinx FPGAs using 16-bit fixed-point precision, demonstrating the flexibility of our library. Finally, through efficient reuse of allocated hardware resources, our design methodology demonstrates a pathway to repurpose inference accelerators to support feature attribution with minimal overhead, thereby enabling real-time XAI on the edge.
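To make the class of algorithms concrete, the following is a minimal sketch of vanilla-gradient feature attribution in PyTorch: a forward pass, a gradient backpropagation to the input, and a channel-reduced heatmap. The toy CNN, input shapes, and reduction are illustrative assumptions for CIFAR-10-sized inputs, not the paper's HLS design or its three specific algorithms.

```python
import torch
import torch.nn as nn

# Stand-in CNN (assumption): any classifier trained on CIFAR-10 would do here.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)
model.eval()

# CIFAR-10-sized input; requires_grad enables backpropagation to the pixels.
x = torch.rand(1, 3, 32, 32, requires_grad=True)

logits = model(x)                    # forward pass (inference)
score = logits[0, logits.argmax()]   # score of the predicted class
score.backward()                     # gradient backpropagation to the input

# Relevance heatmap: max absolute input gradient across color channels.
heatmap = x.grad.detach().abs().max(dim=1)[0]   # shape (1, 32, 32)
```

In this formulation the only overhead beyond inference is the backward pass and the activations it needs, which is the resource overhead the dataflow analysis in the abstract refers to.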