
Course Contents

From a single picture, humans reconstruct a mental representation of the underlying 3D scene that is incredibly rich in information: shape, appearance, physical properties, purpose, how things would feel, smell, sound, and so on. These mental representations allow us to understand, navigate, and interact with our environment in our everyday lives. We learn this with little supervision, mainly by interacting with and observing the world around us.

Emerging neural scene representations aim to build models that replicate this behavior: trained in a self-supervised manner, they reconstruct rich representations of 3D scenes that can then be used in downstream tasks across computer vision, robotics, and graphics.

This course covers fundamental and advanced techniques in this field at the intersection of computer vision, computer graphics, and geometric deep learning. It will lay the foundations of how cameras see the world, how we can represent 3D scenes for artificial intelligence, how we can learn to reconstruct these representations from only a single image, how we can guarantee certain kinds of generalization, and how we can train these models in a self-supervised way.

What you will learn

  • Computer vision & computer graphics fundamentals (pinhole camera model, camera pose, projective geometry, light fields, multi-view geometry).
  • Volumetric scene representations for deep learning: Neural fields & voxel grids.
  • Differentiable rendering in 3D representations and light fields.
  • Inference algorithms for deep-learning-based 3D reconstruction: convolutional neural networks, auto-decoding.
  • Basics of geometric deep learning: Representation theory, groups, group actions, equivariance, equivariant neural network architectures.
  • Self-supervised learning of scene representations via 3D-aware auto-encoding.
  • Applications of neural scene representations in graphics, robotics, vision, and scientific discovery.

For details see the Syllabus.

Prerequisites

No background specific to computer vision or graphics is required. We will, however, generally expect you to:

  • have taken a machine learning class with a focus on deep learning
  • be comfortable with picking up new mathematics as needed ("mathematical maturity")

We expect you to have a solid knowledge of these specific topics:

  • linear algebra,
  • multivariate calculus,
  • probability theory, and
  • programming with vectors and matrices (e.g., in NumPy, PyTorch, or JAX; see the small example below)
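
As a rough gauge of the expected comfort level with array programming, here is a small, self-contained NumPy example (illustrative only, not course material): rotating a batch of 3D points and checking that lengths are preserved.

    import numpy as np

    # A batch of 100 random 3D points, one per row.
    points = np.random.rand(100, 3)

    # Rotation by 90 degrees about the z-axis.
    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])

    # Rotate all points at once: (100, 3) @ (3, 3) -> (100, 3).
    rotated = points @ R.T

    # Rotations preserve lengths.
    assert np.allclose(np.linalg.norm(points, axis=1),
                       np.linalg.norm(rotated, axis=1))

If reading this feels routine, you are well prepared on the programming side.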

Grading Policy

Grading will be split across three module-specific problem sets, student paper presentations, and a final project:

  • 60% Homework Assignments: 3 Jupyter Notebook assignments × 20% each
  • 10% Paper Discussion: a 20-minute presentation plus 10 minutes of Q&A; sign up for a specific paper session and time slot
  • 30% Final Project: Proposal (5%) + Mid-term Report (5%) + Final Report & Video (20%)

We encourage you to discuss the ideas in your problem sets with other students, but we expect you to code up solutions individually. For paper presentations and final projects, students may form teams of two to three. Students may use up to five late days to help accommodate exceptional situations.

Schedule

6.S980 will be held as lectures in room 32-124:

Tuesday  – 
Thursday  – 

Office Hours

Most questions can be answered asynchronously on our Piazza discussion forum. For anything else, we hold office hours:

Tuesday  –  Prafull Sharma via Zoom
Friday  –  Prof. Sitzmann 32-340

If you expect office hours to be crowded, such as right before deadlines, we recommend you sign up for a specific time slot.

Course Level

6.S980 is aimed at graduate students and advanced undergraduate students. It is a first-time offering (pilot course) and thus scoped as a graduate-level seminar. This class will not count toward qualifying exams.

Even though this course covers advanced, research-level topics, we have designed it to be respectful of your time. For instance, assignments are provided as Jupyter Notebooks ready to run on Google Colab, with no setup needed. We will ask you to write code only for the juicy parts that make you think, not the boilerplate that makes you sigh. ☺︎

Feedback

We want to hear from you on how to improve this class and your learning experience. Your frank and constructive feedback is much appreciated!

Most feedback will have to go into the next iteration of this class, but we aim to react quickly so that you may still benefit from potential adjustments yourself.

You can always approach teaching staff in-person after class, during office hours, or write us on Piazza. If you prefer to stay anonymous, use this form:

Syllabus

Module 0

Introduction

  • Learning goals
  • How to think about the environment we're in?
  • Computer Vision as Inverse Graphics
  • Different ways of defining 3D

Module 1: Fundamentals of Image Formation

Image Formation

  • Pinhole camera model (see the code sketch below)
  • Rigid-body transforms and camera poses
  • Projective image formation
  • Camera conventions
Assignment 1 Released
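
To give a flavor of this lecture, here is a minimal sketch of pinhole projection (illustrative only, not the assignment code; the OpenCV-style convention with the camera looking down +z is an assumption): world points are moved into the camera frame by a rigid-body transform (R, t) and then projected with the intrinsic matrix K.

    import numpy as np

    def project_points(points_world, K, R, t):
        """Project Nx3 world points to Nx2 pixel coordinates (pinhole model)."""
        # World -> camera frame: X_cam = R @ X_world + t (applied row-wise).
        points_cam = points_world @ R.T + t
        # Perspective division, then the intrinsics.
        x = points_cam[:, 0] / points_cam[:, 2]
        y = points_cam[:, 1] / points_cam[:, 2]
        pixels_h = (K @ np.stack([x, y, np.ones_like(x)], axis=0)).T
        return pixels_h[:, :2]

    # Example intrinsics: focal length 500 px, principal point (320, 240).
    K = np.array([[500.0,   0.0, 320.0],
                  [  0.0, 500.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    R, t = np.eye(3), np.zeros(3)  # camera at the world origin
    print(project_points(np.array([[0.0, 0.0, 2.0]]), K, R, t))  # [[320. 240.]]

A point on the optical axis lands on the principal point, independent of its depth.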

Multi-View Geometry

  • How 3D is encoded in multi-view images
  • Epipolar Geometry (see the numerical example below)
  • Bundle Adjustment
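
As a small preview of epipolar geometry (an illustrative numerical check with a made-up relative pose, not course code): for calibrated cameras, corresponding normalized image coordinates x1 and x2 satisfy x2ᵀ E x1 = 0, where E = [t]× R is the essential matrix.

    import numpy as np

    def skew(t):
        """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
        return np.array([[0.0,  -t[2],  t[1]],
                         [t[2],   0.0, -t[0]],
                         [-t[1], t[0],   0.0]])

    # Made-up relative pose of camera 2 w.r.t. camera 1.
    angle = 0.1
    R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
    t = np.array([1.0, 0.2, 0.0])
    E = skew(t) @ R  # essential matrix

    # A 3D point in camera-1 coordinates and its image in both cameras.
    X1 = np.array([0.3, -0.5, 4.0])
    X2 = R @ X1 + t          # the same point in camera-2 coordinates
    x1 = X1 / X1[2]          # normalized (calibrated) image coordinates
    x2 = X2 / X2[2]

    print(x2 @ E @ x1)       # ~0: the epipolar constraint holds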

Module 2: 3D Scene Representations & Neural Rendering

Scene Representations

  • Surface Representations: Point Clouds, Depth Maps, Meshes
  • Voxel grids
  • Continuous Representations: Neural Fields (sketched in code below)
  • Hybrid Discrete-Continuous Representations
  • How to parameterize geometry
  • Pros and cons of different representations: runtime, memory usage, etc.
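
For a first taste of neural fields, here is a minimal PyTorch sketch (illustrative only; the layer sizes and number of frequencies are arbitrary): an MLP with positional encoding that maps 3D coordinates to a scalar value such as a signed distance or a density.

    import torch
    import torch.nn as nn

    class NeuralField(nn.Module):
        """Tiny coordinate MLP: 3D point -> scalar value (e.g. SDF or density)."""

        def __init__(self, num_freqs=6, hidden=128):
            super().__init__()
            self.num_freqs = num_freqs
            in_dim = 3 + 3 * 2 * num_freqs  # raw coordinates + sin/cos features
            self.mlp = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def positional_encoding(self, x):
            feats = [x]
            for i in range(self.num_freqs):
                feats.append(torch.sin((2.0 ** i) * torch.pi * x))
                feats.append(torch.cos((2.0 ** i) * torch.pi * x))
            return torch.cat(feats, dim=-1)

        def forward(self, x):  # x: (..., 3)
            return self.mlp(self.positional_encoding(x))  # (..., 1)

    field = NeuralField()
    print(field(torch.rand(1024, 3)).shape)  # torch.Size([1024, 1])

In contrast to a voxel grid, the memory footprint of such a field is independent of spatial resolution; the trade-off is that every query requires a network evaluation.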

Light Transport

  • The Rendering Equation (written out below)
  • Radiance
  • Materials
  • Degrees of realism in computational light transport
Assignment 1 Due
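
For reference, the rendering equation covered in this lecture can be written (standard form, in LaTeX notation) as:

    L_o(\mathbf{x}, \omega_o) = L_e(\mathbf{x}, \omega_o)
        + \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, (\omega_i \cdot \mathbf{n})\, \mathrm{d}\omega_i

Here L_o is the outgoing radiance at surface point x in direction ω_o, L_e the emitted radiance, f_r the BRDF (the material), L_i the incoming radiance, and (ω_i · n) the cosine foreshortening term; the integral runs over the hemisphere Ω around the surface normal n. Different degrees of realism correspond to how faithfully this integral is approximated.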

Differentiable Rendering

  • Sphere Tracing
  • Volume Rendering (see the sketch below)
  • Light Field Rendering
  • Inverse Graphics via Differentiable Rendering
Assignment 2 Released
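
As a preview of differentiable volume rendering, here is a minimal sketch of the standard quadrature used in NeRF-style methods (illustrative only, not the assignment code): densities and colors sampled along each ray are composited into a pixel color, and the whole operation is differentiable.

    import torch

    def volume_render(densities, colors, deltas):
        """Composite samples along rays into pixel colors.

        densities: (num_rays, num_samples)     non-negative sigma per sample
        colors:    (num_rays, num_samples, 3)  RGB per sample
        deltas:    (num_rays, num_samples)     spacing between adjacent samples
        """
        alphas = 1.0 - torch.exp(-densities * deltas)  # opacity per sample
        # Transmittance: probability that the ray reaches sample i unoccluded.
        trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
        trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
        weights = alphas * trans
        return (weights[..., None] * colors).sum(dim=-2)  # (num_rays, 3)

    densities = torch.rand(8, 64, requires_grad=True)
    colors = torch.rand(8, 64, 3, requires_grad=True)
    deltas = torch.full((8, 64), 0.05)
    pixels = volume_render(densities, colors, deltas)
    pixels.sum().backward()  # gradients flow back to densities and colors
    print(pixels.shape)      # torch.Size([8, 3])

Because every step is a differentiable tensor operation, image-space losses can be backpropagated into whatever scene representation produced the densities and colors, which is the core idea behind inverse graphics via differentiable rendering.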

Module 3: Representation Learning, Latent Variable Models, and Auto-encoding

Prior-Based Reconstruction

  • Neural Networks as models for prior-based inference
  • Auto-encoding for representation learning
  • Scene Representation Learning
  • Auto-Decoding (sketched below)
  • Prior-based reconstruction of 3D scenes
  • Global and local conditioning
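
To preview the auto-decoding idea (a minimal sketch in the spirit of DeepSDF-style auto-decoders; all names and sizes are made up): instead of an encoder, each training scene owns a latent code, and the codes are optimized jointly with a shared conditional decoder.

    import torch
    import torch.nn as nn

    num_scenes, latent_dim = 100, 64

    # One learnable latent code per training scene -- no encoder.
    latent_codes = nn.Embedding(num_scenes, latent_dim)

    # Shared decoder: (latent code, 3D coordinate) -> scalar (e.g. an SDF value).
    decoder = nn.Sequential(nn.Linear(latent_dim + 3, 256), nn.ReLU(),
                            nn.Linear(256, 1))

    # Latent codes and decoder weights are optimized jointly.
    optimizer = torch.optim.Adam(
        list(decoder.parameters()) + list(latent_codes.parameters()), lr=1e-4)

    scene_ids = torch.randint(0, num_scenes, (16,))   # a batch of scenes
    coords = torch.rand(16, 3)                        # one query point per scene
    z = latent_codes(scene_ids)                       # (16, 64)
    pred = decoder(torch.cat([z, coords], dim=-1))    # (16, 1)
    target = torch.zeros(16, 1)                       # placeholder supervision
    loss = ((pred - target) ** 2).mean()
    loss.backward()
    optimizer.step()

This is an example of global conditioning: a single latent vector describes the whole scene. At test time, reconstructing a new scene amounts to optimizing a fresh latent code while keeping the decoder fixed.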

Advanced Inference Topics

  • Light Field Representations
  • Attention-based inference and conditioning
  • Inference via gradient-based meta-learning
  • Contrastive learning, DINO & co.

Multi-view Geometry and Differentiable Rendering

paper session
  • Student Paper Session 1
Assignment 2 Due

Student Holiday

holiday

Topics in Advanced Inference

paper session
  • Student Paper Session 2

Unconditional Generative Models

  • Generative models of 3D scenes
  • 3D GANs
  • 3D Diffusion Models
Project Proposal Due

Unconditional Generative Models

paper session
  • Student Paper Session 3
Assignment 3 Released

Module 4: Geometric Deep Learning

Representation Theory & Symmetries

  • The problem of generalization
  • High-level intro to Representation Theory:
    • Groups
    • Representations
    • Group actions
    • Equivariance (see the numerical check below)
    • Invariance
  • Important symmetry groups:
    • Rotation
    • Translation
    • Scale
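
To make equivariance concrete, here is a small numerical check (illustrative only): a convolution with circular padding commutes with circular shifts of its input, i.e. f(g · x) = g · f(x) for the translation group.

    import torch
    import torch.nn as nn

    # A 1D convolution with circular padding is equivariant to circular shifts.
    conv = nn.Conv1d(1, 1, kernel_size=3, padding=1,
                     padding_mode='circular', bias=False)

    x = torch.randn(1, 1, 32)                         # a random 1D signal
    shift = 5
    x_shifted = torch.roll(x, shifts=shift, dims=-1)  # the group action g . x

    f_then_g = torch.roll(conv(x), shifts=shift, dims=-1)  # g . f(x)
    g_then_f = conv(x_shifted)                             # f(g . x)

    print(torch.allclose(f_then_g, g_then_f, atol=1e-6))   # True

Invariance is the special case where the group action on the output is trivial; for example, global pooling after the convolution makes the output unchanged under shifts.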

Dynamic Scene Representations

  • Optical Flow (see the warping sketch below)
  • Scene Flow
  • Algorithms for estimating optical flow
  • Algorithms for estimating scene flow
  • Modeling motion as part of a scene representation, canonical spaces
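
As a small preview of how a dense optical flow field is typically used (an illustrative sketch, not course code): backward-warp the second frame toward the first by sampling it at flow-displaced pixel coordinates.

    import torch
    import torch.nn.functional as F

    def backward_warp(frame2, flow):
        """Warp frame2 toward frame1 using flow from frame1 to frame2.

        frame2: (B, C, H, W) image
        flow:   (B, 2, H, W) flow in pixels; channel 0 is dx, channel 1 is dy
        """
        B, _, H, W = frame2.shape
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
        base = torch.stack([xs, ys], dim=0).float()  # (2, H, W) pixel grid
        coords = base[None] + flow                   # where to sample frame2
        # Normalize to [-1, 1] for grid_sample (x first, then y).
        gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
        grid = torch.stack([gx, gy], dim=-1)         # (B, H, W, 2)
        return F.grid_sample(frame2, grid, align_corners=True)

    frame2 = torch.rand(1, 3, 64, 64)
    flow = torch.zeros(1, 2, 64, 64)   # zero flow: warping is the identity
    print(torch.allclose(backward_warp(frame2, flow), frame2, atol=1e-5))  # True

The photometric difference between the first frame and the warped second frame is a common self-supervised training signal for flow estimation.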

Guest Lecture: Ben Mildenhall

guest lecture

Mid-Term Project Updates

project
  • 3-minute presentations per team:
    • Heilmeier Questions
    • Results of 3 simple experiments
    • Definition of 3 final experiments
Project Update Due

Contrastive Learning for Scene Representation

Geometric Deep Learning

paper session
  • Student Paper Session 4

Module 5: Motion and Objectness

Guest Lecture: Prof. Andrea Tagliasacchi

guest lecture
Assignment 3 Due

Dynamic Scene Representations

paper session
  • Student Paper Session 5

TBD

Thanksgiving

holiday

Module 6: Applications

Robotics

paper session
  • Student Paper Session 6

Vision

guest lecture
  • Neural Scene Representation Applications

Scientific discovery (Cryo-EM)

guest lecture
  • Guest Lecture by Prof. Ellen Zhong

Module 7: Final Project Presentations

Final Project Presentations 1/2

project
  • Students present their final project
Project Presentation Due

Final Project Presentations 2/2

project
  • Students present their final project
Project Presentation Due

  • No final exam
Final Report Due