Journal of University of Science and Technology of China ›› 2021, Vol. 51 ›› Issue (1): 1-11. DOI: 10.52396/JUST-2020-0022

• Research Article •

Video inpainting network based on mesh flow

刘森, 张直政, 俞涛, 陈志波*   

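  1. CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, Anhui 230027
  • Online: 2021-01-31 Published: 2021-05-27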

MOVIE: Mesh oriented video inpainting network

LIU Sen, ZHANG Zhizheng, YU Tao, CHEN Zhibo*   

  1. CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China
  • Online: 2021-01-31 Published: 2021-05-27
  • Contact: *E-mail: chenzhibo@ustc.edu.cn
  • About author: Sen Liu received the B.S. degree in computer science from the Beijing University of Posts and Telecommunications, Beijing, China, in 2013. He is currently working towards the PhD degree at the School of Information Science and Technology, University of Science and Technology of China. His research interests include artificial intelligence, deep learning, video coding, computer vision, pattern recognition, and reinforcement learning.
    Zhibo Chen (M'01-SM'11) received the B.S. and PhD degrees from the Department of Electrical Engineering, Tsinghua University, in 1998 and 2003, respectively. He is now a professor at the University of Science and Technology of China. His research interests include image and video compression, visual quality of experience assessment, immersive media computing, and intelligent media computing. He has more than 100 publications and more than 50 granted EU and US patent applications. He is an IEEE Senior Member and Secretary (Chair-Elect) of the IEEE Visual Signal Processing and Communications Committee. He was TPC chair of IEEE PCS 2019 and an organization committee member of ICIP 2017 and ICME 2013, and has served as Track chair in IEEE ISCAS and Area chair in IEEE VCIP.
    Zhizheng Zhang (S'19) received the B.S. degree in electronic information engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2016. He is currently pursuing the PhD degree at the University of Science and Technology of China, Hefei, China. His current research interests include reinforcement learning, few-shot learning, and intelligent media computing.
    Tao Yu is currently pursuing the PhD degree with the Department of Electronic Engineering and Information Science, University of Science and Technology of China. He received the B.S. degree in Electronics and Information Engineering from Anhui University in 2018. His research interests include computer vision, image processing, and reinforcement learning.
  • Supported by:
    National Natural Science Foundation of China (61571413, 61632001).

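Abstract: Video inpainting aims to fill holes using the spatio-temporal context across video frames. Existing methods often fail to keep the inpainted results spatio-temporally consistent because they cannot model motion trajectories accurately. To address this, we introduce flexible shape-adaptive meshes as the basic processing unit and use mesh flow as the motion representation, and we propose a mesh-flow-based video inpainting network that inpaints a video by first estimating mesh flows and then filling the hole regions. Specifically, we first design a mesh flow estimation module that estimates the mesh flow of the visible content and a mesh flow completion module that completes the mesh flow in the hole regions; decoupling estimation from completion in this way makes training and optimization easier. We further design a hybrid loss function that jointly optimizes the mesh flow estimates over the visible regions, the inpainted regions, and the entire frame. Finally, we design a polishing network to correct the distortion introduced by the mesh flow transformation. Extensive experiments show that the proposed method not only produces better inpainting results than existing methods in both subjective evaluation and objective metrics, but also achieves a four-fold speed-up over the fastest existing method.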

Key words: video inpainting, mesh flow, deep neural networks

Abstract: Video inpainting aims to fill holes across frames using limited spatio-temporal context. Existing schemes still struggle to achieve precise spatio-temporal coherence, especially in hole areas, because they model motion trajectories inaccurately. In this paper, we introduce a flexible shape-adaptive mesh as the basic processing unit and mesh flow as the motion representation, which can describe complex motions in hole areas more precisely and efficiently. We propose a Mesh Oriented Video Inpainting nEtwork, dubbed MOVIE, which first estimates mesh flows and then completes the hole regions in the video. Specifically, we design a mesh flow estimation module and a mesh flow completion module that estimate the mesh flow for visible contents and for holes sequentially, decoupling the two tasks for easier optimization. A hybrid loss function is further introduced to optimize the flow estimation for the visible regions, the entire frames, and the inpainted regions, respectively. We then design a polishing network to correct the distortion that the mesh flow transformation introduces into the inpainted results. Extensive experiments show that MOVIE not only achieves an over four-fold speed-up in completing the missing areas, but also yields better inpainting quality in both quantitative and perceptual metrics.

Key words: mesh flow, deep neural networks, video inpainting
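
The abstract says the hybrid loss covers three regions: the visible regions, the inpainted (hole) regions, and the entire frame. This page does not give the actual loss terms, so the following is only a minimal sketch of how such a three-term objective could be combined. It assumes an L1 error per term, flattened per-vertex flow values, and equal weights; the names `masked_l1`, `hybrid_flow_loss`, `w_visible`, `w_hole`, and `w_full` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def masked_l1(pred: Sequence[float], target: Sequence[float],
              mask: Sequence[int]) -> float:
    """Mean absolute error over positions where mask == 1 (0.0 for an empty mask)."""
    total = sum(abs(p - t) * m for p, t, m in zip(pred, target, mask))
    count = sum(mask)
    return total / count if count else 0.0


def hybrid_flow_loss(pred: Sequence[float], target: Sequence[float],
                     hole_mask: Sequence[int],
                     w_visible: float = 1.0, w_hole: float = 1.0,
                     w_full: float = 1.0) -> float:
    """Weighted sum of L1 terms over visible, hole, and whole-frame positions.

    The L1 form and the weights are assumptions made for illustration only.
    """
    visible_mask = [1 - m for m in hole_mask]   # 1 where content is visible
    full_mask = [1] * len(hole_mask)            # every position counts
    return (w_visible * masked_l1(pred, target, visible_mask)
            + w_hole * masked_l1(pred, target, hole_mask)
            + w_full * masked_l1(pred, target, full_mask))


# Toy example: four flattened flow values, the last two lie inside the hole.
pred_flow = [1.0, 2.0, 3.0, 4.0]
true_flow = [1.0, 2.0, 5.0, 8.0]
hole = [0, 0, 1, 1]
print(hybrid_flow_loss(pred_flow, true_flow, hole))  # 0.0 + 3.0 + 1.5 = 4.5
```

Keeping the three terms separate lets each region be weighted independently, for example to emphasize the hole region, where no ground-truth content is visible at inference time.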
