  • June 20, 2023
  • NeurIPS 2023
  • ✨ Spotlight

Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision

  • Ayush Tewari *
  • Tianwei Yin *
  • George Cazenavette
  • Semon Rezchikov
  • Joshua B. Tenenbaum
  • Frédo Durand
  • William T. Freeman
  • Vincent Sitzmann
* shared first author
@inproceedings{tewari2023diffusionwithforward,
    title     = {Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision},
    author    = {Tewari, Ayush and
                 Yin, Tianwei and
                 Cazenavette, George and
                 Rezchikov, Semon and
                 Tenenbaum, Joshua B. and
                 Durand, Frédo and
                 Freeman, William T. and
                 Sitzmann, Vincent},
    year      = {2023},
    booktitle = {NeurIPS},
}

Denoising diffusion models have emerged as a powerful class of generative models capable of capturing the distributions of complex, real-world signals. However, current approaches can only model distributions for which training samples are directly accessible, which is not the case in many real-world tasks. In inverse graphics, for instance, we seek to sample from a distribution over 3D scenes consistent with an image but do not have access to ground-truth 3D scenes, only 2D images.

We present a new class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never observed directly, but instead are only measured through a known differentiable forward model that generates partial observations of the unknown signal. To accomplish this, we directly integrate the forward model into the denoising process. At test time, our approach enables us to sample from the distribution over underlying signals consistent with some partial observation.
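The mechanism can be illustrated with a minimal training-step sketch: noise the partial observation with the usual DDPM forward process, let a network estimate the underlying signal from the noisy observation, push that estimate through the known differentiable forward model, and regress the clean observation. Everything below is a hypothetical illustration in PyTorch; SignalEstimator, ForwardModel, and training_step are placeholder names and shapes of our own invention, not the paper's actual architecture or loss.

import torch
import torch.nn as nn

class SignalEstimator(nn.Module):
    # Hypothetical denoiser: maps a noisy partial observation and a
    # timestep to an estimate of the underlying signal (e.g. a latent
    # 3D scene code that is never directly supervised).
    def __init__(self, obs_dim=64, signal_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, signal_dim))

    def forward(self, noisy_obs, t):
        return self.net(torch.cat([noisy_obs, t[:, None].float()], dim=-1))

class ForwardModel(nn.Module):
    # Stand-in for the known differentiable forward model mapping a
    # signal to a partial observation (e.g. a differentiable renderer
    # mapping a 3D scene to a 2D image).
    def __init__(self, signal_dim=128, obs_dim=64):
        super().__init__()
        self.proj = nn.Linear(signal_dim, obs_dim)

    def forward(self, signal):
        return self.proj(signal)

def training_step(estimator, fwd, obs, alphas_cumprod):
    # One denoising step supervised only in observation space.
    batch = obs.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (batch,))
    a = alphas_cumprod[t][:, None]
    noise = torch.randn_like(obs)
    noisy_obs = a.sqrt() * obs + (1 - a).sqrt() * noise  # DDPM forward process
    signal_hat = estimator(noisy_obs, t)   # estimate the unobserved signal
    obs_hat = fwd(signal_hat)              # re-render via the forward model
    return ((obs_hat - obs) ** 2).mean()   # loss lives in observation space

Note that the gradient reaches the signal estimator only through the forward model, which is what lets a distribution over signals be learned without ever observing a signal directly.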

We demonstrate the efficacy of our approach on three challenging computer vision tasks. In inverse graphics, for instance, our model directly samples from the distribution of 3D scenes consistent with a single 2D input image.