Multi-Lingual DALL-E Storytime

Paper Title

Multi-Lingual DALL-E Storytime

Authors

Noga Mudrik, Adam S. Charles

Abstract

While recent advances in artificial intelligence (AI) language models demonstrate cutting-edge performance on English texts, equivalent models either do not exist for other languages or do not reach the same performance level. This undesired effect of AI progress widens the gap in access to new technology between populations across the world. The unintended bias mainly disadvantages individuals whose English skills are less developed, e.g., children of non-English speakers. Following significant advances in AI research in recent years, OpenAI has presented DALL-E: a powerful tool for creating images from English text prompts. While DALL-E is a promising tool for many applications, its performance drops when the input is given in another language, which limits its audience and deepens the gap between populations. An additional limitation of the current DALL-E model is that it only creates a few images in response to a given input prompt, rather than a series of consecutive, coherent frames that tell a story or describe a process that changes over time. Here, we present an easy-to-use automatic DALL-E storytelling framework that leverages the existing DALL-E model to enable fast and coherent visualizations of non-English songs and stories, pushing the limit of the one-step-at-a-time option DALL-E currently offers. We show that our framework effectively visualizes stories from non-English texts and portrays the changes in the plot over time. It is also able to create a narrative and maintain interpretable changes in the description across frames. Additionally, our framework lets users specify constraints on the story elements, such as a specific location or context, and maintain a consistent style throughout the visualization.
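The abstract does not give implementation details, so the following is only a minimal sketch of how such a pipeline might prepare its inputs, not the authors' method. It assumes each line of a non-English story is first translated to English and then extended with a shared context and style so that every frame's prompt carries the same constraints. The names `StoryConfig`, `build_frame_prompts`, and the `translate` callback are hypothetical; the translation step and the image-generation call are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StoryConfig:
    """Hypothetical container for constraints shared by all frames."""
    style: str = ""    # e.g. "watercolor illustration" (assumed example)
    context: str = ""  # e.g. a fixed location or setting (assumed example)


def build_frame_prompts(
    lines: List[str],
    translate: Callable[[str], str],
    config: StoryConfig,
) -> List[str]:
    """Return one English prompt per non-empty story line.

    `translate` is supplied by the caller (any translation service);
    this sketch does not assume a particular one.
    """
    prompts = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines between verses or paragraphs
        parts = [translate(text)]
        # Append the same context and style to every frame so the
        # sequence stays visually consistent.
        if config.context:
            parts.append(config.context)
        if config.style:
            parts.append(config.style)
        prompts.append(", ".join(parts))
    return prompts


if __name__ == "__main__":
    # Toy dictionary standing in for a real translation step.
    toy = {"Un gato duerme.": "A cat sleeps.", "El gato corre.": "The cat runs."}
    cfg = StoryConfig(style="watercolor illustration", context="in a garden")
    story = ["Un gato duerme.", "", "El gato corre."]
    for prompt in build_frame_prompts(story, lambda s: toy.get(s, s), cfg):
        print(prompt)
```

Each resulting prompt would then be sent, in order, to an image-generation model to obtain one frame per story line. Keeping the constraints in a single configuration object is one simple way to apply the same location and style across the whole sequence.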
