Machine Learning for Inverse Graphics – Scene Representation Group

Course Contents

From a single picture, humans reconstruct a mental representation of the underlying 3D scene that is incredibly rich in information such as shape, appearance, physical properties, purpose, how things would feel, smell, sound, etc. These mental representations allow us to understand, navigate, and interact with our environment in our everyday lives. We learn this with little supervision, mainly by interacting with and observing the world around us.

Research on neural scene representations aims to build models that replicate this behavior: trained in a self-supervised manner, these models should reconstruct rich representations of 3D scenes that can then be used in downstream tasks in computer vision, robotics, and graphics.

This course covers fundamental and advanced techniques in this field at the intersection of computer vision, computer graphics, and geometric deep learning. It will lay the foundations of how cameras see the world, how we can represent 3D scenes for artificial intelligence, how we can learn to reconstruct these representations from only a single image, how we can guarantee certain kinds of generalization, and how we can train these models in a self-supervised way.

What you will learn

Computer vision & computer graphics fundamentals (pinhole camera model, camera pose, projective geometry, light fields, multi-view geometry).
Volumetric scene representations for deep learning: Neural fields & voxel grids.
Differentiable rendering in 3D representations and light fields.
Inference algorithms for deep-learning-based 3D reconstruction: convolutional neural networks, auto-decoding.
Basics of geometric deep learning: Representation theory, groups, group actions, equivariance, equivariant neural network architectures.
Self-supervised learning of scene representations via 3D-aware auto-encoding.
Applications of neural scene representations in graphics, robotics, vision, and scientific discovery.

For details see the Syllabus.

Prerequisites

No background specific to computer vision or graphics is required. We do, however, expect you to:

have taken a machine learning class with a focus on deep learning
be comfortable with picking up new mathematics as needed ("mathematical maturity")

We expect you to have a solid knowledge of these specific topics:

linear algebra,
multivariate calculus,
probability theory, and
programming with vectors and matrices (such as in NumPy, PyTorch, or JAX)
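As a rough illustration of what programming with vectors and matrices looks like in this area, the sketch below projects 3D points into an image with a pinhole camera, one of the fundamentals listed under "What you will learn". It is not taken from the course materials: the function name `project_points`, the intrinsics matrix `K`, and the convention that the camera looks along +z with a world-to-camera transform are all assumptions made for this example.

```python
import numpy as np


def project_points(points_world, K, world_to_cam):
    """Project (N, 3) world-space points to (N, 2) pixel coordinates.

    Assumed convention (not from the course): the camera looks along +z and
    world_to_cam is a 4x4 rigid-body transform from world to camera space.
    """
    n = points_world.shape[0]
    # Lift to homogeneous coordinates: (N, 4).
    points_h = np.concatenate([points_world, np.ones((n, 1))], axis=1)
    # Move the points into the camera frame and drop the homogeneous 1: (N, 3).
    points_cam = (world_to_cam @ points_h.T).T[:, :3]
    # Apply the intrinsics, then divide by depth (perspective divide).
    pixels_h = (K @ points_cam.T).T
    return pixels_h[:, :2] / pixels_h[:, 2:3]


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
world_to_cam = np.eye(4)  # camera placed at the world origin
points = np.array([[0.0, 0.0, 2.0],
                   [1.0, -0.5, 4.0]])
pixels = project_points(points, K, world_to_cam)
print(pixels)
```

A point on the optical axis lands on the principal point, and the whole computation is batched over all points without a Python loop, which is the style of code the listed libraries encourage.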

Grading Policy

Grading will be split between four module-specific problem sets and a final project:

70%	Homework Assignments	4 programming assignments × 17.5% each
30%	Final Project	Proposal (5%) + Mid-term Report (5%) + Final Report & Video/Presentation (20%)
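The weights above add up to 100%. As a minimal sketch of the arithmetic, the snippet below checks the total and combines per-component scores into one weighted percentage. The linear combination, the function name `course_score`, and the example scores are assumptions for illustration; the page does not state how scores are aggregated or mapped to letter grades.

```python
# Weights in percent, copied from the grading breakdown.
HOMEWORK_WEIGHT_EACH = 17.5
NUM_HOMEWORKS = 4
PROJECT_WEIGHTS = {
    "proposal": 5.0,
    "midterm_report": 5.0,
    "final_report_and_presentation": 20.0,
}


def total_weight():
    """Sum of all component weights in percent."""
    return HOMEWORK_WEIGHT_EACH * NUM_HOMEWORKS + sum(PROJECT_WEIGHTS.values())


def course_score(homework_scores, project_scores):
    """Weighted score in percent; every input score is a fraction in [0, 1].

    Assumes a plain weighted sum, which is a guess rather than stated policy.
    """
    if len(homework_scores) != NUM_HOMEWORKS:
        raise ValueError(f"expected {NUM_HOMEWORKS} homework scores")
    homework_part = sum(HOMEWORK_WEIGHT_EACH * s for s in homework_scores)
    project_part = sum(
        PROJECT_WEIGHTS[name] * project_scores[name] for name in PROJECT_WEIGHTS
    )
    return homework_part + project_part


# Hypothetical scores, purely for illustration.
example = course_score(
    [1.0, 0.9, 0.8, 1.0],
    {"proposal": 1.0, "midterm_report": 0.8, "final_report_and_presentation": 0.9},
)
print(total_weight(), example)
```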

We encourage you to discuss the ideas in your problem sets with other students and AI tools, but we expect you to code up solutions individually. For paper presentations and final projects, students may form teams of two to three. You may use up to 5 late days in total for the problem sets; for exceptional situations, contact the course staff.

Schedule

6.S980 will be held as 1.5-hour lectures in room 4-270:

Tuesday	2:30 – 4:00pm
Thursday	2:30 – 4:00pm

Office Hours

Most questions can be answered asynchronously on our Piazza discussion forum. For anything else, we hold office hours:

Monday	4:00 – 5:00pm	Prof. Sitzmann	32-340 (this office hour is not for homework questions)
Thursday	4:00 – 5:00pm	David Charatan	Embodied Intelligence Common Area (outside 32-385); ask homework questions here

Note that alternate locations may be announced via Piazza.

Course Level

6.S980 is aimed at graduate students and advanced undergraduate students. It's a second-time offering/pilot course and thus scoped as a graduate-level seminar. This class will not count for qualification exams.

Syllabus

Module 0
Introduction	Thu, Sept. 7th	Learning goals; How to think about the environment we're in?; Computer Vision as Inverse Graphics; Different ways of defining 3D	Recording, Slides
Module 1: Fundamentals of Image Formation
Image Formation	Tue, Sept. 12th	Pinhole camera model; Rigid-body transforms and camera poses; Projective image formation; Camera conventions	Recording, Slides; Assignment 1 Released
Multi-View Geometry	Thu, Sept. 14th	How 3D is encoded in multi-view images; Epipolar Geometry; Bundle Adjustment	Recording, Slides
Module 2: 3D Scene Representations & Neural Rendering
Lecture Canceled Due to Travel	Tue, Sept. 19th
Scene Representations I: 2.5D and Monocular Depth Prediction	Thu, Sept. 21st	Surface Representations: Point Clouds, Depth Maps, Meshes; Monocular Depth Prediction; Stereo Depth Prediction; Self-Supervised Monocular Depth & Ego-Motion Prediction	Recording, Slides
Scene Representations II: Surface Representations and Discrete Field Representations	Tue, Sept. 26th	Voxel grids; Continuous Representations: Neural Fields; Hybrid Discrete-Continuous Representations; How to parameterize geometry; Pros and cons of different representations: run time, memory usage, etc.	Recording, Slides; Assignment 1 Due
Scene Representations III: Neural Fields and Hybrid Discrete-Neural Field Representations	Thu, Sept. 28th	The Rendering Equation; Radiance; Materials; Degrees of realism in computational light transport	Recording, Slides; Assignment 2 Released
Light Transport	Tue, Oct. 3rd	The Rendering Equation; Radiance; Materials; Degrees of realism in computational light transport	Recording, Slides
Differentiable Rendering	Thu, Oct. 5th	Sphere Tracing; Volume Rendering; Light Field Rendering; Inverse Graphics via Differentiable Rendering	Recording, Slides
Student Holiday	Tue, Oct. 10th
Differentiable Rendering II	Thu, Oct. 12th	Gaussian Splatting	Recording, Slides
Module 3: Representation Learning, Latent Variable Models, and Auto-encoding
Prior-Based Reconstruction	Tue, Oct. 17th	Neural Networks as models for prior-based inference; Auto-encoding for representation learning; Auto-Decoding	Recording, Slides; Assignment 2 Due
Prior-Based Reconstruction II	Thu, Oct. 19th	Global and local conditioning; Directly inferring 3D scenes from images; Epipolar line inference with volume rendering	Recording, Slides; Assignment 3 Released
Prior-Based Reconstruction III	Tue, Oct. 24th	Light Field Representations; Epipolar line inference with light field rendering; Attention-based inference and conditioning; Inference via gradient-based meta-learning; Contrastive learning, DiNO & Co	Recording, Slides
How to do Research?	Thu, Oct. 26th	Advice on picking research topics; The Heilmeier Catechism; Expectations for your project proposal; The role of publishing; Why research is fun!	Recording
Removing Camera Poses	Tue, Oct. 31st	Learning to infer camera poses and 3D scenes; RUST, FlowCam & Co	Recording, Slides
Unconditional and Text-Conditional Generative Models	Thu, Nov. 2nd	Generative models of 3D scenes; 3D GANs; 3D Diffusion Models	Recording, Slides; Assignment 3 Due
Conditional Probabilistic Models	Tue, Nov. 7th	Sampling from the distribution of 3D scenes conditioned on images	Recording, Slides; Assignment 4 Released
Module 4: Motion and Objectness
Dynamic Scene Representations	Thu, Nov. 9th	Optical Flow; Scene Flow; Algorithms for estimating optical flow; Algorithms for estimating scene flow	Recording, Slides; Project Proposal Due
Dynamic Scene Representations II	Tue, Nov. 14th	Modeling motion as part of a scene representation	Recording, Slides
Module 5: Geometric Deep Learning
Representation Theory & Symmetries	Thu, Nov. 16th	The problem of generalization; High-level intro to Representation Theory: groups, representations, group actions, equivariance, invariance; Important symmetry groups: rotation, translation, scale	Recording, Slides
How to give talks	Tue, Nov. 21st		Recording, Slides; Assignment 4 Due
Thanksgiving (holiday)	Thu, Nov. 23rd
Mid-Term Project Updates	Tue, Nov. 28th	2-minute presentations per team: answers to Heilmeier questions; results of first simple experiment; definition and expected outcome of final experiment	Project Update Due
Module 6: Applications
Robotics	Thu, Nov. 30th	Guest Lecture by Ge Yang, Postdoc @ MIT
Scientific discovery (Cryo-EM)	Tue, Dec. 5th	Guest Lecture by Prof. Ellen Zhong
Module 7: Final Project Presentations
No Class (Go Work on Projects!)	Thu, Dec. 7th
Virtual Final Project Presentations	Tue, Dec. 12th	Students present their final projects on Zoom / deadline to upload presentation videos	Project Presentation Due
Final Project Reports Due	Wed, Dec. 13th		Project Reports Due
