Machine Learning for Inverse Graphics – Scene Representation Group


Course Contents

From a single picture, humans reconstruct a mental representation of the underlying 3D scene that is incredibly rich in information such as shape, appearance, physical properties, purpose, how things would feel, smell, sound, etc. These mental representations allow us to understand, navigate, and interact with our environment in our everyday lives. We learn this with little supervision, mainly by interacting with and observing the world around us.

Emerging neural scene representations aim to replicate this behavior: trained in a self-supervised manner, these models reconstruct rich representations of 3D scenes that can then be used in downstream tasks in computer vision, robotics, and graphics.

This course covers fundamental and advanced techniques in this field at the intersection of computer vision, computer graphics, and geometric deep learning. It will lay the foundations of how cameras see the world, how we can represent 3D scenes for artificial intelligence, how we can learn to reconstruct these representations from only a single image, how we can guarantee certain kinds of generalization, and how we can train these models in a self-supervised way.

What you will learn

Computer vision & computer graphics fundamentals (pinhole camera model, camera pose, projective geometry, light fields, multi-view geometry).
Volumetric scene representations for deep learning: Neural fields & voxel grids.
Differentiable rendering in 3D representations and light fields.
Inference algorithms for deep-learning-based 3D reconstruction: convolutional neural networks, auto-decoding.
Basics of geometric deep learning: Representation theory, groups, group actions, equivariance, equivariant neural network architectures.
Self-supervised learning of scene representations via 3D-aware auto-encoding.
Applications of neural scene representations in graphics, robotics, vision, and scientific discovery.

For details see the Syllabus.

Prerequisites

No background specific to computer vision or graphics is required. We will, however, generally expect you to:

have taken a machine learning class with a focus on deep learning
be comfortable with picking up new mathematics as needed ("mathematical maturity")

We expect you to have a solid knowledge of these specific topics:

linear algebra,
multivariate calculus,
probability theory, and
programming with vectors and matrices (such as in NumPy, PyTorch, or JAX)
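
As a rough gauge of what "programming with vectors and matrices" can look like in practice, here is a minimal NumPy sketch that projects 3D points through a pinhole camera, one of the fundamentals listed above. It is not taken from the course materials: the function name `project_points`, the single focal length, and the example numbers are all assumptions made for illustration.

```python
import numpy as np


def project_points(points_cam, focal, principal_point):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates.

    Hypothetical helper: assumes an ideal pinhole camera with square pixels,
    no lens distortion, and all points in front of the camera (z > 0).
    """
    cx, cy = principal_point
    # Intrinsic matrix K maps camera-frame coordinates to homogeneous pixels.
    K = np.array([[focal, 0.0, cx],
                  [0.0, focal, cy],
                  [0.0, 0.0, 1.0]])
    homogeneous = points_cam @ K.T  # shape (N, 3), one row per point
    # Perspective divide: dividing by depth turns homogeneous into 2D pixels.
    return homogeneous[:, :2] / homogeneous[:, 2:3]


# Two example points, given in the camera coordinate frame.
points = np.array([[0.0, 0.0, 2.0],
                   [1.0, -0.5, 4.0]])
pixels = project_points(points, focal=100.0, principal_point=(64.0, 48.0))
print(pixels)
```

The first point lies on the optical axis, so it lands exactly on the principal point; the whole projection is expressed as one matrix product and one broadcasted division rather than a loop over points.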

Grading Policy

Grading will be split between three module-specific problem sets, student paper presentations, and a final project:

60%	Homework Assignments	3 Jupyter Notebook assignments × 20% each
10%	Paper Discussion	20-minute presentation + 10-minute Q&A
30%	Final Project	Proposal (5%) + Mid-term Report (5%) + Final Report & Video (20%)

We encourage you to discuss the ideas in your problem sets with other students, but we expect you to code up solutions individually. For paper presentations and final projects, students may form teams of two to three. Students may use up to five late days to help accommodate exceptional situations.

Schedule

6.S980 will be held as 1.5-hour lectures in room 32-124:

Tuesday	2:30 – 4:00pm
Thursday	2:30 – 4:00pm


Office Hours

Most questions can be answered asynchronously on our Piazza discussion forum. For anything else, we hold office hours:

Tuesday	5:00 – 6:00pm Prafull Sharma via Zoom
Friday	2:00 – 3:00pm Prof. Sitzmann 32-340

If you expect office hours to be crowded, such as right before deadlines, we recommend you sign up for a specific time slot.

Course Level

6.S980 is aimed at graduate students and advanced undergraduate students. It is a first-time offering (pilot course) and is thus scoped as a graduate-level seminar. This class will not count for qualification exams.

Even though this course discusses advanced, research-level topics, when designing this course we aim to be respectful of your time. For instance, assignments are provided as Jupyter Notebooks ready to run on Google Colab, no setup needed. We will ask you to write code only for the juicy parts that make you think, not the boilerplate that makes you sigh. ☺︎

Feedback

We want to hear from you on how to improve this class and your learning experience. Your frank and constructive feedback is much appreciated!

Most feedback will have to go into the next iteration of this class, but we aim to react quickly so that you may still benefit from potential adjustments yourself.

You can always approach teaching staff in person after class or during office hours, or write to us on Piazza. If you prefer to stay anonymous, use this form:

General Feedback

Syllabus

Module 0
Introduction	Thu, Sept. 8th	Learning goals; How to think about the environment we're in?; Computer Vision as Inverse Graphics; Different ways of defining 3D	Recording; Slides
Module 1: Fundamentals of Image Formation
Image Formation	Tue, Sept. 13th	Pinhole camera model; Rigid-body transforms and camera poses; Projective image formation; Camera conventions	Recording; Slides; Assignment 1 Released
Multi-View Geometry	Thu, Sept. 15th	How 3D is encoded in multi-view images; Epipolar Geometry; Bundle Adjustment	Recording; Slides
Module 2: 3D Scene Representations & Neural Rendering
Scene Representations	Tue, Sept. 20th	Surface Representations: Point Clouds, Depth Maps, Meshes; Voxel grids; Continuous Representations: Neural Fields; Hybrid Discrete-Continuous Representations; How to parameterize geometry; Pros and cons of different representations: run time, memory usage, etc.	Recording; Slides
Light Transport	Thu, Sept. 22nd	The Rendering Equation; Radiance; Materials; Degrees of realism in computational light transport	Recording; Slides; Assignment 1 Due
Differentiable Rendering	Tue, Sept. 27th	Sphere Tracing; Volume Rendering; Light Field Rendering; Inverse Graphics via Differentiable Rendering	Recording; Slides; Assignment 2 Released
Module 3: Representation Learning, Latent Variable Models, and Auto-encoding
Prior-Based Reconstruction	Thu, Sept. 29th	Neural Networks as models for prior-based inference; Auto-encoding for representation Learning; Scene Representation Learning; Auto-Decoding; Prior-based reconstruction of 3D scenes; Global and local conditioning	Recording; Slides
Advanced Inference Topics	Tue, Oct. 4th	Light Field Representations; Attention-based inference and conditioning; Inference via gradient-based meta-learning; Contrastive learning, DiNO & Co	Recording; Slides
Multi-view Geometry and Differentiable Rendering	Thu, Oct. 6th (paper session)	Student Paper Session 1	Slides; Assignment 2 Due
Student Holiday	Tue, Oct. 11th (holiday)
Topics in Advanced Inference	Thu, Oct. 13th (paper session)	Student Paper Session 2	Slides
Unconditional Generative Models	Tue, Oct. 18th	Generative models of 3D scenes; 3D GANs; 3D Diffusion Models	Recording; Slides; Project Proposal Due
Unconditional Generative Models	Thu, Oct. 20th (paper session)	Student Paper Session 3	Assignment 3 Released
Module 4: Geometric Deep Learning
Representation Theory & Symmetries	Tue, Oct. 25th	The problem of generalization; High-level intro to Representation Theory: Groups, Representations, Group actions, Equivariance, Invariance; Important symmetry groups: Rotation, Translation, Scale	Recording; Slides
Dynamic Scene Representations	Tue, Nov. 8th	Optical Flow; Scene Flow; Algorithms for estimating optical flow; Algorithms for estimating scene flow; Modeling motion as part of a scene representation, canonical spaces	Recording; Slides
Guest Lecture: Ben Mildenhall	Tue, Nov. 1st (guest lecture)		Recording
Mid-Term Project Updates	Thu, Nov. 3rd (project)	3-minute presentations per team: Heilmeier Questions, Results of 3 simple experiments, Definition of 3 final experiments	Project Update Due
Contrastive Learning for Scene Representation	Tue, Nov. 8th		Slides
Geometric Deep Learning	Thu, Nov. 10th (paper session)	Student Paper Session 4
Module 5: Motion and Objectness
Guest Lecture: Prof. Andrea Tagliasacchi	Tue, Nov. 15th (guest lecture)		Assignment 3 Due
Dynamic Scene Representations	Thu, Nov. 17th (paper session)	Student Paper Session 5
TBD	Tue, Nov. 22nd
Thanksgiving	Thu, Nov. 24th (holiday)
Module 6: Applications
Robotics	Tue, Nov. 29th (paper session)	Student Paper Session 6
Vision	Thu, Dec. 1st (guest lecture)	Neural Scene Representation Applications
Scientific discovery (Cryo-EM)	Tue, Dec. 6th (guest lecture)	Guest Lecture by Prof. Ellen Zhong
Module 7: Final Project Presentations
Final Project Presentations 1/2	Thu, Dec. 8th (project)	Students present their final project	Project Presentation Due
Final Project Presentations 2/2	Tue, Dec. 13th (project)	Students present their final project	Project Presentation Due
	Thu, Dec. 15th	No final exam	Final Report Due
