Course Contents
From a single picture, humans reconstruct a mental representation of the underlying 3D scene that is incredibly rich in information such as shape, appearance, physical properties, purpose, and how things would feel, smell, or sound. These mental representations allow us to understand, navigate, and interact with our environment in our everyday lives. We learn to do this with little supervision, mainly by interacting with and observing the world around us.
Emerging neural scene representations aim to build models that replicate this behavior: trained in a self-supervised manner, these models reconstruct rich representations of 3D scenes that can then be used for downstream tasks in computer vision, robotics, and graphics.
This course covers fundamental and advanced techniques in this field at the intersection of computer vision, computer graphics, and geometric deep learning. It will lay the foundations of how cameras see the world, how we can represent 3D scenes for artificial intelligence, how we can learn to reconstruct these representations from only a single image, how we can guarantee certain kinds of generalization, and how we can train these models in a self-supervised way.
What you will learn
- Computer vision & computer graphics fundamentals (pinhole camera model, camera pose, projective geometry, light fields, multi-view geometry).
- Volumetric scene representations for deep learning: Neural fields & voxel grids.
- Differentiable rendering in 3D representations and light fields.
- Inference algorithms for deep-learning-based 3D reconstruction: convolutional neural networks, auto-decoding.
- Basics of geometric deep learning: Representation theory, groups, group actions, equivariance, equivariant neural network architectures.
- Self-supervised learning of scene representations via 3D-aware auto-encoding.
- Applications of neural scene representations in graphics, robotics, vision, and scientific discovery.
For details see the Syllabus.
Prerequisites
No background specific to computer vision or graphics is required. However, we generally expect you to:
- have taken a machine learning class with a focus on deep learning
- be comfortable with picking up new mathematics as needed ("mathematical maturity")
We expect you to have a solid knowledge of these specific topics:
Grading Policy
Grading will be split between four module-specific problem sets and a final project:
70% | Homework Assignments: 4 programming assignments × 17.5% each |
30% | Final Project: Proposal (5%) + Mid-term Report (5%) + Final Report & Video/Presentation (20%) |
We encourage you to discuss the ideas in your problem sets with other students and AI tools, but we expect you to code up solutions individually. For paper presentations and final projects, students may form teams of two to three. You may use up to 5 late days in total across the problem sets (for exceptional situations, contact the course staff).
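As a quick sanity check on the grading split above, the following minimal Python sketch computes a weighted course grade from per-component scores. The weights are copied from the grading table; the component names, the `final_grade` helper, and the example scores are hypothetical and only serve as illustration. How late days or rounding affect the final grade is not specified here and is not modeled.

```python
# Component weights in percent, copied from the grading table above.
# The key names are illustrative assumptions, not official identifiers.
WEIGHTS = {
    "assignment_1": 17.5,
    "assignment_2": 17.5,
    "assignment_3": 17.5,
    "assignment_4": 17.5,
    "project_proposal": 5.0,
    "project_midterm_report": 5.0,
    "project_final_report_and_presentation": 20.0,
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade in percent.

    `scores` maps each component name to a fraction in [0, 1].
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# The weights should cover the whole grade: 4 x 17.5 + 5 + 5 + 20 = 100.
assert sum(WEIGHTS.values()) == 100.0

if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {name: 0.9 for name in WEIGHTS}
    print(f"{final_grade(example):.1f}%")
```

Under these assumptions, perfect homework scores with no project submissions would yield 70%, which matches the homework share in the table.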
Schedule
6.S980 will be held as lectures in room 4-270:
Tuesday |  –  |
Thursday |  –  |
Office Hours
Most questions can be answered asynchronously on our Piazza discussion forum. For anything else, we hold office hours:
Monday |  –  Prof. Sitzmann, 32-340 (this office hour is not for homework questions) |
Thursday |  –  David Charatan, Embodied Intelligence Common Area (outside 32-385); ask homework questions here. Note that alternate locations may be announced via Piazza. |
Course Level
6.S980 is aimed at graduate students and advanced undergraduate students. It's a second-time offering/pilot course and thus scoped as a graduate-level seminar. This class will not count for qualification exams.
Syllabus
Topic | Deliverable |
---|---|
Module 0 | |
Introduction | |
Module 1: Fundamentals of Image Formation | |
Image Formation | Assignment 1 Released |
Multi-View Geometry | |
Module 2: 3D Scene Representations & Neural Rendering | |
Lecture Canceled Due to Travel | |
Scene Representations I: 2.5D and Monocular Depth Prediction | |
Scene Representations II: Surface Representations and Discrete Field Representations | Assignment 1 Due |
Scene Representations III: Neural Fields and Hybrid Discrete-Neural Field Representations | Assignment 2 Released |
Light Transport | |
Differentiable Rendering | |
Student Holiday | |
Differentiable Rendering II | |
Module 3: Representation Learning, Latent Variable Models, and Auto-encoding | |
Prior-Based Reconstruction | Assignment 2 Due |
Prior-Based Reconstruction II | Assignment 3 Released |
Prior-Based Reconstruction III | |
How to Do Research? | |
Removing Camera Poses | |
Unconditional and Text-Conditional Generative Models | Assignment 3 Due |
Conditional Probabilistic Models | Assignment 4 Released |
Module 4: Motion and Objectness | |
Dynamic Scene Representations | Project Proposal Due |
Dynamic Scene Representations II | |
Module 5: Geometric Deep Learning | |
Representation Theory & Symmetries | |
How to Give Talks | Assignment 4 Due |
Thanksgiving (holiday) | |
Mid-Term Project Updates | Project Update Due |
Module 6: Applications | |
Robotics | |
Scientific Discovery (Cryo-EM), guest lecture | |
Module 7: Final Project Presentations | |
No Class (Go Work on Projects!) | |
Virtual Final Project Presentations | Project Presentation Due |
Final Project Reports Due | Project Reports Due |