LAVIS: A Library for Language-Vision Intelligence

Paper title

LAVIS: A Library for Language-Vision Intelligence

Authors

Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi

Abstract

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications. LAVIS aims to serve as a one-stop comprehensive library that makes recent advancements in the language-vision field accessible to researchers and practitioners and fosters future research and development. It features a unified interface for easy access to state-of-the-art image-language and video-language models and common datasets. LAVIS supports training, evaluation, and benchmarking on a rich variety of tasks, including multimodal classification, retrieval, captioning, visual question answering, dialogue, and pre-training. At the same time, the library is highly extensible and configurable, facilitating future development and customization. In this technical report, we describe the design principles, key components, and functionalities of the library, and present benchmarking results across common language-vision tasks. The library is available at: https://github.com/salesforce/LAVIS.
