Massachusetts Institute of Technology

Course Contents

From a single picture, humans reconstruct a mental representation of the underlying 3D scene that is incredibly rich in information such as shape, appearance, physical properties, purpose, how things would feel, smell, sound, etc. These mental representations allow us to understand, navigate, and interact with our environment in our everyday lives. We learn to do this with little supervision, mainly by interacting with and observing the world around us.

Emerging neural scene representations aim to build models that replicate this behavior: trained in a self-supervised manner, these models aim to reconstruct rich representations of 3D scenes that can then be used in downstream tasks in computer vision, robotics, and graphics.

This course covers fundamental and advanced techniques in this field at the intersection of computer vision, computer graphics, and geometric deep learning. It will lay the foundations of how cameras see the world, how we can represent 3D scenes for artificial intelligence, how we can learn to reconstruct these representations from only a single image, how we can guarantee certain kinds of generalization, and how we can train these models in a self-supervised way.

What you will learn

  • Computer vision & computer graphics fundamentals (pinhole camera model, camera pose, projective geometry, light fields, multi-view geometry).
  • Volumetric scene representations for deep learning: Neural fields & voxel grids.
  • Differentiable rendering in 3D representations and light fields.
  • Inference algorithms for deep-learning-based 3D reconstruction: convolutional neural networks, auto-decoding.
  • Basics of geometric deep learning: Representation theory, groups, group actions, equivariance, equivariant neural network architectures.
  • Self-supervised learning of scene representations via 3D-aware auto-encoding.
  • Applications of neural scene representations in graphics, robotics, vision, and scientific discovery.

For details see the Syllabus.

Prerequisites

No computer vision or graphics specific background is required. We will however generally expect you to:

  • have taken a machine learning class with a focus on deep learning
  • be comfortable with picking up new mathematics as needed ("mathematical maturity")

We expect you to have a solid knowledge of these specific topics:

  • linear algebra,
  • multivariate calculus,
  • probability theory, and
  • programming with vectors and matrices (such as in NumPy, PyTorch, or JAX)

Grading Policy

Grading will be split between four module-specific problem sets and a final project:

70% Homework Assignments
4 programming assignments × 17.5% each
30% Final Project
Proposal (5%) + Mid-term Report (5%) + Final Report & Video/Presentation (20%)
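
As a sanity check on the weights above, here is a minimal sketch of how a final score could be combined. The function name, the dictionary keys, and the 0-100 score scale are assumptions made for illustration; the sketch also assumes plain weighted averaging with no late-day adjustments, which may differ from how the course staff actually computes grades.

```python
# Weights taken from the grading policy above, in percent of the final grade.
HOMEWORK_WEIGHTS = [17.5, 17.5, 17.5, 17.5]  # four programming assignments
PROJECT_WEIGHTS = {"proposal": 5.0, "midterm_report": 5.0, "final_report": 20.0}


def final_score(homework_scores, project_scores):
    """Combine per-item scores (each on a 0-100 scale) into a 0-100 total.

    Hypothetical helper: assumes plain weighted averaging with no
    late-day penalties or curving.
    """
    if len(homework_scores) != len(HOMEWORK_WEIGHTS):
        raise ValueError("expected one score per programming assignment")
    total = sum(w * s for w, s in zip(HOMEWORK_WEIGHTS, homework_scores))
    total += sum(PROJECT_WEIGHTS[name] * project_scores[name] for name in PROJECT_WEIGHTS)
    return total / 100.0
```

The weights sum to 100, so a student with full marks on every item ends up with a total of 100.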

We encourage you to discuss the ideas in your problem sets with other students and AI tools, but we expect you to code up solutions individually. For paper presentations and final projects, students may form teams of two to three. You may use up to 5 late days total for the problem sets (for exceptional situations, contact the course staff).

Schedule

6.S980 will be held as lectures in room 4-270:

Tuesday  – 
Thursday  – 


Office Hours

Most questions can be answered asynchronously on our Piazza discussion forum. For anything else, we hold office hours:

Monday  –  Prof. Sitzmann

32-340

(office hour not for homework questions)

Thursday  –  David Charatan

Embodied Intelligence Common Area (outside 32-385)

Note that alternate locations may be announced via Piazza.

(ask homework questions here)

Course Level

6.S980 is aimed at graduate students and advanced undergraduate students. It's a second-time offering/pilot course and thus scoped as a graduate-level seminar. This class will not count for qualification exams.

Syllabus

Module 0

Introduction

  • Learning goals
  • How to think about the environment we're in?
  • Computer Vision as Inverse Graphics
  • Different ways of defining 3D

Module 1: Fundamentals of Image Formation

Image Formation

  • Pinhole camera model
  • Rigid-body transforms and camera poses
  • Projective image formation
  • Camera conventions
Assignment 1 Released
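
To make the image formation topics above concrete, here is a minimal sketch of projecting a 3D point into pixel coordinates with a pinhole camera. It assumes an OpenCV-style camera convention (camera looks down +z, x points right, y points down) and a world-to-camera pose given as a rotation and a translation. These conventions and all function and parameter names are illustrative assumptions, not necessarily the ones used in the lectures or assignments.

```python
def world_to_camera(point_world, rotation, translation):
    """Apply a rigid-body transform x_cam = R @ x_world + t (R is a 3x3 nested list)."""
    return [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def project_pinhole(point_cam, fx, fy, cx, cy):
    """Perspective projection of a camera-space point to pixel coordinates.

    fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera (z <= 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v
```

For example, with an identity rotation, a translation of (0, 0, 1), and intrinsics fx = fy = 100, cx = 64, cy = 48, the world point (0.5, -0.25, 1.0) sits at depth 2 in camera space and lands at pixel (89.0, 35.5).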

Multi-View Geometry

  • How 3D is encoded in multi-view images
  • Epipolar Geometry
  • Bundle Adjustment

Module 2: 3D Scene Representations & Neural Rendering

Lecture Canceled Due to Travel


Scene Representations I: 2.5D and Monocular Depth Prediction

  • Surface Representations: Point Clouds, Depth Maps, Meshes
  • Monocular Depth Prediction
  • Stereo Depth Prediction
  • Self-Supervised Monocular Depth & Ego-Motion Prediction

Scene Representations II: Surface Representations and Discrete Field Representations

  • Voxel grids
  • Continuous Representations: Neural Fields
  • Hybrid Discrete-Continuous Representations
  • How to parameterize geometry
  • Pros and cons of different representations: run time, memory usage, etc.
Assignment 1 Due

Scene Representations III: Neural Fields and Hybrid Discrete-Neural Field Representations

Assignment 2 Released

Light Transport

  • The Rendering Equation
  • Radiance
  • Materials
  • Degrees of realism in computational light transport

Differentiable Rendering

  • Sphere Tracing
  • Volume Rendering
  • Light Field Rendering
  • Inverse Graphics via Differentiable Rendering
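
As an illustration of the volume rendering topic above, here is a minimal sketch of a commonly used discretization: samples along a ray are alpha-composited front to back, with each sample's opacity derived from its density and the spacing between samples. Function and parameter names are assumptions for illustration, and colors are scalars to keep the sketch short; this is not necessarily the formulation presented in the lecture.

```python
import math


def render_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, front to back.

    densities: non-negative volume density at each sample
    colors:    scalar color (e.g. grayscale) at each sample
    deltas:    distance covered by each sample along the ray
    Returns (rendered_color, per_sample_weights).
    """
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    rendered = 0.0
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha  # contribution of this sample
        rendered += weight * color
        weights.append(weight)
        transmittance *= 1.0 - alpha
    return rendered, weights
```

Every step is built from smooth operations (exponentials, products, sums), so the same computation written in an automatic differentiation framework lets gradients of an image loss flow back to the densities and colors.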

Student Holiday


Differentiable Rendering II

  • Gaussian Splatting

Module 3: Representation Learning, Latent Variable Models, and Auto-encoding

Prior-Based Reconstruction

  • Neural Networks as models for prior-based inference
  • Auto-encoding for representation learning
  • Auto-Decoding
Assignment 2 Due

Prior-Based Reconstruction II

  • Global and local conditioning
  • Directly Inferring 3D scenes from images
  • Epipolar Line inference with volume rendering
Assignment 3 Released

Prior-Based Reconstruction III

  • Light Field Representations
  • Epipolar Line inference with light field rendering
  • Attention-based inference and conditioning
  • Inference via gradient-based meta-learning
  • Contrastive learning, DINO & Co.

How to do Research?

  • Advice on picking research topics
  • The Heilmeier Catechism
  • Expectations for your project proposal
  • The role of publishing
  • Why research is fun!

Removing Camera Poses

  • Learning to infer camera poses and 3D scenes
  • RUST, FlowCam & Co.

Unconditional and Text-Conditional Generative Models

  • Generative models of 3D scenes
  • 3D GANs
  • 3D Diffusion Models
Assignment 3 Due

Conditional Probabilistic Models

  • Sampling from the distribution of 3D scenes conditioned on images
Assignment 4 Released

Module 4: Motion and Objectness

Dynamic Scene Representations

  • Optical Flow
  • Scene Flow
  • Algorithms for estimating optical flow
  • Algorithms for estimating scene flow
Project Proposal Due

Dynamic Scene Representations II

  • Modeling motion as part of a scene representation

Module 5: Geometric Deep Learning

Representation Theory & Symmetries

  • The problem of generalization
  • High-level intro to Representation Theory:
    • Groups
    • Representations
    • Group actions
    • Equivariance
    • Invariance
  • Important symmetry groups:
    • Rotation
    • Translation
    • Scale
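
The difference between equivariance and invariance listed above can be checked numerically on a toy example. The sketch below uses 2D rotations acting on a point set: the centroid rotates along with the input (equivariance), while the mean distance from the origin does not change (invariance). All function names are hypothetical and chosen for illustration only.

```python
import math


def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians (a group action of SO(2))."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def centroid(points):
    """Mean of the points; equivariant under rotation about the origin."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mean_radius(points):
    """Mean distance of the points from the origin; invariant under rotation."""
    return sum(math.hypot(x, y) for x, y in points) / len(points)


points = [(1.0, 0.0), (0.0, 2.0), (-3.0, 1.0)]
angle = 0.7

# Equivariance: transforming the input transforms the output the same way.
lhs = centroid(rotate(points, angle))
rhs = rotate([centroid(points)], angle)[0]
assert math.isclose(lhs[0], rhs[0], abs_tol=1e-9)
assert math.isclose(lhs[1], rhs[1], abs_tol=1e-9)

# Invariance: transforming the input leaves the output unchanged.
assert math.isclose(mean_radius(rotate(points, angle)), mean_radius(points), abs_tol=1e-9)
```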

How to give talks

Assignment 4 Due

Thanksgiving


Mid-Term Project Updates

  • 2-minute presentations per team:
    • Answers to Heilmeier Questions
    • Results of first simple experiment
    • Definition and expected outcome of final experiment
Project Update Due

Module 6: Applications

Robotics

  • Guest Lecture by Ge Yang, Postdoc @ MIT

Scientific discovery (Cryo-EM)

  • Guest Lecture by Prof. Ellen Zhong

Module 7: Final Project Presentations

No Class (Go Work on Projects!)


Virtual Final Project Presentations

  • Students present their final projects on Zoom / deadline to upload presentation videos
Project Presentation Due

Final Project Reports Due

Project Reports Due