  • on: Dec. 3, 2018
  • in: CVPR
  • ✨Oral

DeepVoxels: Learning Persistent 3D Feature Embeddings

  • Vincent Sitzmann
  • Justus Thies
  • Felix Heide
  • Matthias Nießner
  • Gordon Wetzstein
  • Michael Zollhöfer
@inproceedings{sitzmann2019deepvoxels,
    author = {Sitzmann, Vincent
              and Thies, Justus
              and Heide, Felix
              and Nie{\ss}ner, Matthias
              and Wetzstein, Gordon
              and Zollh{\"o}fer, Michael},
    title = {DeepVoxels: Learning Persistent 3D Feature Embeddings},
    booktitle = {Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
    year={2019}
}

Deep generative models today allow us to perform highly realistic image synthesis. While each generated image is of high quality, a major challenge is generating a series of coherent views of the same scene. This requires the network's latent representation to fundamentally understand the 3D layout of the scene; e.g., how would the same chair look from a different viewpoint?

Unfortunately, this is challenging for existing models built from stacks of 2D convolution kernels. Instead of parameterizing 3D transformations, they learn to explain the training data in a higher-dimensional feature space, which generalizes poorly to novel views at test time, as in the output of Pix2Pix trained on images of the cube above.
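
To make the contrast concrete, here is a minimal PyTorch sketch of the "persistent 3D feature embedding" idea named in the title: a single learnable voxel grid of features, shared across all views of a scene, that is rigidly transformed into each camera's frame before a 2D network decodes it into an image. The class name, layer sizes, and the simple mean-over-depth projection are illustrative assumptions, not the authors' architecture, which combines the feature volume with differentiable projection and occlusion reasoning.

```python
# Minimal sketch (assumptions noted inline), not the DeepVoxels implementation:
# a persistent, learnable 3D feature volume that is view-transformed and then
# rendered by a small 2D decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentVoxelFeatures(nn.Module):
    def __init__(self, channels=16, grid_size=32):
        super().__init__()
        # One persistent feature volume for the whole scene, shared by all views.
        self.voxels = nn.Parameter(
            0.01 * torch.randn(1, channels, grid_size, grid_size, grid_size)
        )
        # Toy 2D decoder mapping projected features to an RGB image (assumed sizes).
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, pose):
        # pose: (1, 3, 4) rigid transform taking the voxel grid into the camera frame.
        grid = F.affine_grid(pose, self.voxels.shape, align_corners=False)
        view_voxels = F.grid_sample(self.voxels, grid, align_corners=False)
        # Collapse the depth axis; a crude stand-in for occlusion-aware projection.
        projected = view_voxels.mean(dim=2)
        return self.decoder(projected)

model = PersistentVoxelFeatures()
identity_pose = torch.tensor([[[1., 0., 0., 0.],
                               [0., 1., 0., 0.],
                               [0., 0., 1., 0.]]])
image = model(identity_pose)  # (1, 3, 32, 32) rendered view for this pose
```

The point of the sketch is that the 3D transformation is applied explicitly to a persistent scene representation, so a new viewpoint only changes the pose input; a purely 2D model such as Pix2Pix has no such structure and must instead memorize view-dependent appearance.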