Paper Title

Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering

Paper Authors

Sur, Chiranjib

Paper Abstract

Attention mechanisms have gained huge popularity due to their effectiveness in achieving high accuracy across different domains. But attention is opportunistic and is not justified by the content or its usability: transformer-like structures create all possible attentions. We define segregating strategies that prioritize the content relevant to an application in order to enhance performance. We propose two strategies, the Self-Segregating Transformer (SST) and the Coordinated-Segregating Transformer (CST), and use them to solve the visual question answering application. The self-segregation strategy for attention contributes to better understanding, filters the information most helpful for answering the question, and creates diversity of visual reasoning for attention. This approach can easily be applied to many other applications that involve repetition and multiple frames of features, and would greatly reduce the commonality of the attentions. Visual Question Answering (VQA) requires understanding and coordination of both images and textual interpretations. Experiments demonstrate that segregation strategies for cascaded multi-head transformer attention outperform many previous works and achieve considerable improvement on the VQA-v2 dataset benchmark.
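The abstract does not give the exact SST/CST formulation, so the following is only a minimal, hypothetical PyTorch sketch of the general idea it describes: a multi-head attention step whose weight matrix is pruned to the highest-scoring entries per query, so that only prioritized content contributes to the output. The class name SelfSegregatingAttention, the top-k gating rule, and the keep_ratio parameter are illustrative assumptions, not the paper's method.

# Illustrative sketch only; the gating rule and keep_ratio are assumptions,
# not the paper's SST/CST equations.
import torch
import torch.nn as nn

class SelfSegregatingAttention(nn.Module):
    """Multi-head attention whose weights are segregated so that only
    the most useful (highest-scoring) content contributes to the output."""

    def __init__(self, dim: int, num_heads: int = 8, keep_ratio: float = 0.5):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.keep_ratio = keep_ratio  # fraction of attention entries kept per query

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # (batch, tokens, dim) -> (batch, heads, tokens, head_dim)
            return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = scores.softmax(dim=-1)

        # Segregation step: keep only the top-k attention weights per query,
        # zero out the rest, and renormalize, so attention is concentrated
        # on prioritized content instead of spread over everything.
        k_keep = max(1, int(self.keep_ratio * n))
        threshold = attn.topk(k_keep, dim=-1).values[..., -1:]  # per-row cutoff
        attn = torch.where(attn >= threshold, attn, torch.zeros_like(attn))
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)

        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

# Usage: e.g. 36 visual-region features per image, as common in VQA pipelines.
feats = torch.randn(2, 36, 512)
layer = SelfSegregatingAttention(512)
print(layer(feats).shape)  # torch.Size([2, 36, 512])

Renormalizing after pruning keeps the surviving weights a valid distribution; a learned gate could replace the fixed top-k cutoff, and a coordinated variant would condition the pruning of the visual attention on the question features rather than on the visual features alone.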
