Delving into the Frequency: Temporally Consistent Human Motion Transfer in the Fourier Space

Paper Title

Delving into the Frequency: Temporally Consistent Human Motion Transfer in the Fourier Space

Authors

Guang Yang, Wu Liu, Xinchen Liu, Xiaoyan Gu, Juan Cao, Jintao Li

Abstract

Human motion transfer refers to synthesizing photo-realistic and temporally coherent videos in which one person imitates the motion of another. However, current synthetic videos suffer from temporal inconsistency across sequential frames, which significantly degrades video quality and remains far from solved by existing pixel-domain methods. Recently, some works on DeepFake detection have tried to distinguish natural from synthetic images in the frequency domain, exploiting the frequency insufficiency of image synthesis methods. Nonetheless, no work has studied the temporal inconsistency of synthetic videos in terms of the frequency-domain gap between natural and synthetic videos. In this paper, we propose to delve into the frequency space for temporally consistent human motion transfer. First, we present the first comprehensive frequency-domain analysis of natural and synthetic videos, revealing the frequency gap in both the spatial dimension of individual frames and the temporal dimension of the video. To close this gap, we propose a novel Frequency-based human MOtion TRansfer framework, named FreMOTR, which effectively mitigates the spatial artifacts and the temporal inconsistency of synthesized videos. FreMOTR explores two novel frequency-based regularization modules: 1) Frequency-domain Appearance Regularization (FAR), which improves the appearance of the person in individual frames, and 2) Temporal Frequency Regularization (TFR), which enforces temporal consistency between adjacent frames. Finally, comprehensive experiments demonstrate that FreMOTR not only yields superior performance on temporal consistency metrics but also improves the frame-level visual quality of synthetic videos. In particular, the temporal consistency metrics improve by nearly 30% over the state-of-the-art model.
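To make the idea of a spatial and a temporal frequency gap concrete, the sketch below compares a natural and a synthetic clip in the Fourier domain using NumPy. It is only an illustration under stated assumptions: the function names, the log-amplitude comparison, and the use of adjacent-frame spectrum differences are hypothetical choices made here, not the FAR or TFR modules defined in the paper.

```python
import numpy as np


def log_spectrum(video: np.ndarray) -> np.ndarray:
    """Per-frame log-amplitude spectrum of a grayscale video shaped (T, H, W)."""
    # 2-D FFT over the spatial axes of every frame; log1p compresses the range.
    return np.log1p(np.abs(np.fft.fft2(video, axes=(-2, -1))))


def spatial_frequency_gap(natural: np.ndarray, synthetic: np.ndarray) -> float:
    """Mean absolute difference between per-frame spectra (assumed metric)."""
    return float(np.mean(np.abs(log_spectrum(natural) - log_spectrum(synthetic))))


def temporal_frequency_gap(natural: np.ndarray, synthetic: np.ndarray) -> float:
    """Mean absolute difference between adjacent-frame spectrum changes (assumed metric)."""
    # np.diff along axis 0 gives the spectrum change from frame t to frame t + 1.
    natural_change = np.diff(log_spectrum(natural), axis=0)
    synthetic_change = np.diff(log_spectrum(synthetic), axis=0)
    return float(np.mean(np.abs(natural_change - synthetic_change)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    natural = rng.random((4, 8, 8))                      # stand-in for a real clip
    synthetic = natural + 0.1 * rng.random((4, 8, 8))    # stand-in with per-frame noise
    print(spatial_frequency_gap(natural, synthetic))
    print(temporal_frequency_gap(natural, synthetic))
```

Both functions return zero for identical clips and grow as the synthetic clip's spectra, or their frame-to-frame changes, drift away from the natural clip's. A training-time regularizer would need a differentiable version of such a measure.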
