View Extrapolation of Human Body from a Single Image
We study how to synthesize novel views of the human body from a single image. Though recent deep learning based methods work well for rigid objects, they often fail on objects with large articulation, such as human bodies. The core step of existing methods is to fit a mapping from observed views to novel views with CNNs; however, the rich articulation modes of the human body make it rather challenging for CNNs to memorize and interpolate the data well. To address the problem, we propose a novel deep learning based pipeline that explicitly estimates and leverages the geometry of the underlying human body. Our new pipeline is a composition of a shape estimation network and an image generation network, and at the interface a perspective transformation is applied to generate a forward flow for pixel value transportation. Our design factors out the space of data variation and makes learning at each step much easier. Empirically, we show that performance on pose-varying objects can be improved dramatically. Our method can also be applied to real data captured by 3D sensors, and the flow generated by our method can be used to produce high-quality results at higher resolutions.
[download] - CVPR 2018 Paper.
Method
*(Figure: overview of the proposed two-stage pipeline.)*
Our approach, built upon the flow prediction idea, combines deep learning with traditional geometry-based methods. Instead of resorting to a purely learning-based scheme, we consider how to explicitly leverage geometric constraints to reduce the problem space. Intuitively, the first sub-network estimates a rough geometry of the human body, and the second sub-network corrects errors caused by inaccuracies in that estimate and infers regions that are invisible due to occlusion. In this way, shape estimation, flow estimation, and invisible-region synthesis are disentangled.
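To make the interface between the two sub-networks concrete, below is a minimal numpy sketch (not the authors' code) of the perspective transformation step: a depth map from the shape-estimation network, together with a relative camera pose, induces a forward flow that transports pixel values from the input view to the novel view. The intrinsics, the rotation centre, and the naive rounding-based splat are illustrative assumptions; in the actual pipeline the generation network would further refine the warped image and synthesize the occluded regions.

```python
import numpy as np

def forward_flow_from_depth(depth, K, R, t):
    """Forward flow induced by viewing the scene from a new camera pose.

    depth : (H, W) depth map (here, a stand-in for the shape network output)
    K     : (3, 3) camera intrinsics
    R, t  : relative rotation (3, 3) and translation (3,) to the novel view
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW

    # Back-project pixels to 3D camera coordinates, move them to the novel
    # view, and re-project; this is the perspective transformation.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    proj = K @ (R @ pts + t.reshape(3, 1))
    uv_new = proj[:2] / np.clip(proj[2:], 1e-6, None)

    return (uv_new - pix[:2]).T.reshape(H, W, 2)  # (du, dv) per source pixel

def splat(image, flow):
    """Naive forward warping: transport each source pixel along its flow."""
    H, W = image.shape[:2]
    out = np.zeros_like(image)
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    un = np.round(u + flow[..., 0]).astype(int)
    vn = np.round(v + flow[..., 1]).astype(int)
    ok = (un >= 0) & (un < W) & (vn >= 0) & (vn < H)
    out[vn[ok], un[ok]] = image[v[ok], u[ok]]
    return out

# Toy usage: rotate the view by 20 degrees around the body, mirroring the
# view change shown in the results below.  All inputs are synthetic.
H = W = 64
K = np.array([[60.0, 0, W / 2], [0, 60.0, H / 2], [0, 0, 1]])
a = np.deg2rad(20.0)
R = np.array([[np.cos(a), 0, np.sin(a)],
              [0, 1, 0],
              [-np.sin(a), 0, np.cos(a)]])
c = np.array([0.0, 0.0, 2.0])        # assumed rotation centre (the body)
t = c - R @ c                        # orbit the camera around that centre
depth = np.full((H, W), 2.0)         # stand-in for the predicted depth
image = np.random.rand(H, W, 3)      # stand-in for the input view
novel = splat(image, forward_flow_from_depth(depth, K, R, t))
```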
Results
Our intermediate and final results:
*(Figure: visualization of the predicted image sequences and the full-view results.)*

*(Figure: two example groups, each showing the input with its 20° and 40° results, followed by the full views.)*
Dataset
Pose-Varying Human Model Dataset (PVHM)
The dataset consists of 10,200 pose-varying human mesh models together with rendering tools. It covers 22 different appearances, and the model for each appearance is deformed into between 200 and 1,200 different poses. The models are triangle meshes with texture maps, stored as OBJ files. The appearances and their corresponding index ranges are shown below (models 10000-10199 are used only in the testing phase).
*(Table: appearances and their corresponding model index ranges.)*
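In code, the split implied by these index ranges might look like the following sketch; the `%05d.obj` naming under a `PVHM/models` directory is a hypothetical layout, not necessarily the dataset's actual one.

```python
# Hypothetical loader split: models 0-9999 for training and models
# 10000-10199 held out for testing, per the index ranges above.
# The directory and file naming scheme are assumptions.
TRAIN_IDS = range(0, 10000)
TEST_IDS = range(10000, 10200)

def model_path(idx, root="PVHM/models"):
    return f"{root}/{idx:05d}.obj"

train_files = [model_path(i) for i in TRAIN_IDS]
test_files = [model_path(i) for i in TEST_IDS]
```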
Some rendered model samples are shown below:
*(Figure: rendered model samples.)*
We provide tools to render the models into the color images, optical flow (backward flow) images, and depth maps used in our paper. The tools are Matlab scripts, tested on Windows 10; Blender (version > 2.79) is required for the color image rendering part. The dataset and tools can be downloaded from Google Drive or Baidu Disk (see below). This dataset is made available for non-commercial use only.
[download] from Google Drive | [download] from Baidu Disk (25.7 GB)
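As an illustration of how the rendered backward flow is typically consumed, the sketch below reconstructs a novel view by sampling the source image at the locations the flow points to; the (H, W, 2) float32 array layout is an assumption about the rendered flow files, and `cv2.remap` performs the bilinear lookup.

```python
import cv2
import numpy as np

def backward_warp(source, flow_bw):
    """Sample `source` at (x, y) + flow to rebuild the target view."""
    H, W = flow_bw.shape[:2]
    x, y = np.meshgrid(np.arange(W, dtype=np.float32),
                       np.arange(H, dtype=np.float32))
    map_x = x + flow_bw[..., 0]   # where each target pixel reads from
    map_y = y + flow_bw[..., 1]
    return cv2.remap(source, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Usage with stand-in arrays; real inputs come from the rendering tools.
source = np.zeros((256, 256, 3), np.uint8)
flow_bw = np.zeros((256, 256, 2), np.float32)
target = backward_warp(source, flow_bw)
```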
Citation
@InProceedings{Zhu_2018_CVPR,
author = {Zhu, Hao and Su, Hao and Wang, Peng and Cao, Xun and Yang, Ruigang},
title = {View Extrapolation of Human Body from a Single Image},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}