Scene Representation Group
Our goal is to build AI systems that autonomously learn to understand and interact with the physical world. We achieve this by creating agents that construct internal world models, enabling them to simulate future events and predict the consequences of their actions.
As humans, we constantly reconstruct a mental representation of our surroundings from sensory input—capturing geometry, materials, mechanics, and dynamical processes. This allows us to navigate, plan, and act effectively. Humans acquire this skill with minimal supervision, learning primarily through self-play and observation.
We aim to endow machines with these same computational capabilities. How can agents learn intuitive physics—how objects move and interact—or the fact that our world is 3D, solely through interaction? How can they acquire new concepts from a single demonstration? What intrinsic motivation drives exploration? To answer these questions, our research develops methods across representation learning, generative modeling, and planning.
Recent Publications
Generative View Stitching
True Self-Supervised Novel View Synthesis is Transferable
Selective Underfitting in Diffusion Models
Locality in Image Diffusion Models Emerges from Data Statistics
Meschers: Geometry Processing of Impossible Objects
Controlling diverse robots by inferring Jacobian fields with deep networks
History-Guided Video Diffusion
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion