Low Bandwidth Video-Chat Compression using Deep Generative Models

Paper Title

Low Bandwidth Video-Chat Compression using Deep Generative Models

Authors

Oquab, Maxime; Stock, Pierre; Gafni, Oran; Haziza, Daniel; Xu, Tao; Zhang, Peizhao; Celebi, Onur; Hasson, Yana; Labatut, Patrick; Bose-Kolanu, Bobo; Peyronel, Thibault; Couprie, Camille

Abstract

To unlock video chat for hundreds of millions of people hindered by poor connectivity or unaffordable data costs, we propose to authentically reconstruct faces on the receiver's device using facial landmarks extracted at the sender's side and transmitted over the network. In this context, we discuss and evaluate the benefits and disadvantages of several deep adversarial approaches. In particular, we explore quality and bandwidth trade-offs for approaches based on static landmarks, dynamic landmarks or segmentation maps. We design a mobile-compatible architecture based on the first order animation model of Siarohin et al. In addition, we leverage SPADE blocks to refine results in important areas such as the eyes and lips. We compress the networks down to about 3MB, allowing models to run in real time on an iPhone 8 (CPU). This approach enables video calling at a few kbits per second, an order of magnitude lower than currently available alternatives.
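Transmitting landmarks instead of pixels keeps the bitrate low. The sketch below illustrates this by quantizing normalized landmark coordinates and computing the resulting uncompressed payload rate. It rests on several hypothetical choices made for illustration, not taken from the paper: the landmark counts (10 and 68), 8-bit quantization per coordinate, and a 25 fps frame rate. The figures also ignore any entropy coding or temporal compression a real system might add.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quantize(points: List[Point], bits: int = 8) -> List[Tuple[int, int]]:
    """Quantize normalized (x, y) coordinates in [0, 1] to `bits`-bit integers."""
    levels = (1 << bits) - 1

    def q(v: float) -> int:
        # Clip to the valid range, then round to the nearest quantization level.
        return round(min(max(v, 0.0), 1.0) * levels)

    return [(q(x), q(y)) for x, y in points]


def dequantize(codes: List[Tuple[int, int]], bits: int = 8) -> List[Point]:
    """Map integer codes back to normalized coordinates on the receiver side."""
    levels = (1 << bits) - 1
    return [(cx / levels, cy / levels) for cx, cy in codes]


def raw_bitrate_kbps(num_points: int, bits: int, fps: float, dims: int = 2) -> float:
    """Uncompressed payload rate: points * dims * bits per frame, times frames per second."""
    return num_points * dims * bits * fps / 1000.0


if __name__ == "__main__":
    # Hypothetical settings for illustration only, not the paper's configuration.
    for n in (10, 68):
        print(n, "points:", raw_bitrate_kbps(n, bits=8, fps=25), "kbit/s")
```

Under these assumptions, 10 points need 4 kbit/s and 68 points need 27.2 kbit/s before any further compression. The number of transmitted points and the per-coordinate precision therefore dominate the bandwidth budget.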
