Massachusetts Institute of Technology

Course Contents

This course dives into advanced concepts in computer vision. A first focus is geometry in computer vision, including image formation, representation theory for vision, classic multi-view geometry, multi-view geometry in the age of deep learning, differentiable rendering, neural scene representations, correspondence estimation, optical flow computation, and point tracking.

Next, we explore generative modeling and representation learning including image and video generation, guidance in diffusion models, conditional probabilistic models, as well as representation learning in the form of contrastive and masking-based methods.

Finally, we will explore the intersection of robotics and computer vision with "vision for embodied agents", investigating the role of vision for decision-making, planning and control.

Prerequisites

The formal prerequisites of this course are: 6.7960 Deep Learning, (6.1200 or 6.3700), and (18.06 or 18.C06).

This class is an advanced graduate-level class. You must have working knowledge of the following topics, i.e., be able to work with them in numpy / scipy / pytorch. We will not review these basics, and the TAs will not be able to help you with them.

Deep Learning: Proficiency in Python, Numpy, and PyTorch, vectorized programming, and training deep neural networks. Convolutional neural networks, transformers, MLPs, backpropagation.

Linear Algebra: Vector spaces, matrix-matrix and matrix-vector products, change of basis, inner products and norms, eigenvalues and eigenvectors, the singular value decomposition, the Fourier transform, and convolution.
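As a rough self-check (our suggestion, not part of the course material), verifying identities like the following in numpy should feel routine before taking the class:

```python
import numpy as np

# Rough self-check (illustrative only): if these identities feel routine,
# the linear algebra prerequisite is likely met.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
S = A + A.T  # symmetric matrix

# Eigendecomposition of a symmetric matrix: S V = V diag(w)
w, V = np.linalg.eigh(S)
assert np.allclose(S @ V, V * w)

# Singular value decomposition: A = U diag(s) Vt
U, s, Vt = np.linalg.svd(A)
assert np.allclose(U * s @ Vt, A)

# Convolution theorem: circular convolution = pointwise product in frequency
x, h = rng.random(8), rng.random(8)
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))
direct = np.array([sum(x[k] * h[(n - k) % 8] for k in range(8))
                   for n in range(8)])
assert np.allclose(via_fft, direct)
```

If any of these steps require looking up what the decomposition means (rather than just the function name), it is worth reviewing before the first problem set.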

Schedule

6.8300 will be held as lectures in room 26-100:

Tuesday  – 
Thursday  – 

Collaboration Policy

Problem sets should be written up individually and should reflect your own individual work. However, you may discuss with your peers, TAs, and instructors.

You should not copy or share complete solutions or ask others if your answer is correct, whether in person or via Piazza or Canvas.

If you work on the problem set with anyone other than TAs and instructors, list their names at the top of the problem set.

Office Hours

Monday  –  Ben 32-G575
 –  Ariba Zoom
 –  Tianyuan 45-205
Tuesday  –  Ate Zoom
Wednesday  –  Vincent 45-741B
 –  Vivek 32-D451
 –  Jane 32-D451
Thursday  –  Christian 32-370
Friday  –  Chenyu 32-262
 –  Isabella 32-D451
 –  Adriano Zoom

AI Assistants Policy

Our policy for using ChatGPT and other AI assistants is identical to our policy for using human assistants.

Just like you can come to office hours and ask a human questions (about the lecture material, clarifications of problem set questions, tips for getting started, etc.), you are very welcome to do the same with AI assistants.

But: just like you are not allowed to ask an expert friend to do your homework for you, you also should not ask an expert AI.

If it is ever unclear, just imagine the AI as a human and apply the same norm as you would with a human.

If you work with any AI on a problem set, briefly describe which AI and how you used it at the top of the problem set (a few sentences is enough).

Grading Policy

Grading will be split between five module-specific problem sets and a final project:

65% Problem Sets

5 problem sets

Note our separate policies on Collaboration, AI Assistants, and Late Submissions.

35% Final Project
Proposal (10%) + Blog Post (90%)

The final project will be a research project on perception of your choice:

  • You will run experiments and do analysis to explore your research question.
  • You will write up your research in the format of a blog post. Your post will include an explanation of background material, new investigations, and results you found.
  • You are encouraged to include plots, animations, and interactive graphics to make your findings clear. Here are some examples of well-presented research.
The final project will be graded for clarity and insight as well as novelty and depth of the experiments and analysis. Detailed guidance will be given later in the semester.

Late Submissions Policy

Homeworks will not be accepted more than 7 days after the deadline.

The grade on a homework received n days after the deadline (n<=7) will be multiplied by (1-n/14). We will round up to units of full days; submitting 1 hour late counts as using 1 late day.

Ten penalty days will be automatically waived for each student.

For example, suppose a student perfectly solves the first three homeworks but submits the first eight days late, the second six days late, and the third five days late. The student then scores zero on the first homework (more than seven days late), 100% on the second homework (using six late days), and 100% * (1-1/14) = 92.9% on the third homework (using the last four remaining late days, with the fifth late day penalized).
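The policy can be sketched in a few lines of Python. This is a sketch under our reading of the rules (waived days are spent before any penalty applies, and a submission more than seven days late is rejected without consuming late days); `late_multiplier` is a hypothetical helper, not course infrastructure:

```python
import math

def late_multiplier(days_late, waived_left):
    """Return (grade multiplier, waived days remaining) under the late
    policy: round up to full days, reject after 7 days, spend waived
    days first, then penalize 1/14 per remaining late day.
    (A sketch of one reading of the policy, not official tooling.)"""
    n = math.ceil(days_late)
    if n > 7:
        return 0.0, waived_left          # not accepted; no late days spent
    spent = min(n, waived_left)
    penalized = n - spent
    return 1.0 - penalized / 14.0, waived_left - spent

# Reproducing the worked example: 10 waived days; homeworks 8, 6, 5 days late.
waived = 10
m1, waived = late_multiplier(8, waived)  # 0.0 (rejected)
m2, waived = late_multiplier(6, waived)  # 1.0, four waived days remain
m3, waived = late_multiplier(5, waived)  # 13/14, one day penalized
```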

The slack days are meant to cover the normal circumstances of life: being behind on work, forgetting the deadline, having a conference to attend, etc. We will not grant further extensions for these routine issues. For extension requests due to serious medical issues or major life events, please contact S3 (for undergraduates) or GradSupport (for graduate students), and we will work with them to find a good solution.

We will not be able to support course incompletes.

FAQ

Q Can I take this course if I have not taken 6.7960 Deep Learning or a comparable class?
A I advise against it. There will be homework assignments where you will be asked to re-implement deep learning papers by yourself. If you don't have working knowledge in Deep Learning using Pytorch, you are unlikely to perform well on these assignments. We will generally not discuss topics that were discussed in the Deep Learning class, i.e., we will not be reiterating Transformers, CNNs, how to train these models, etc, but will assume that you are already familiar with them.
Q Is this class a CI-M class?
A No, this is a graduate class.
Q Is 6.8301 (the undergraduate version) taught this semester?
A The undergraduate version is taught this semester as well. For logistical reasons, it had to be renamed to 6.S058, and is taught by Profs. Bill Freeman and Phillip Isola. 6.S058 is a CI-M class, and does not have a prerequisite on 6.7960 Deep Learning.
Q Is attendance required? Will lectures be recorded?
A Attendance is at your discretion. Yes, lectures will be recorded and uploaded.

Syllabus

Module 0: Introduction to Computer Vision

Introduction to Vision

  • Administrivia & Logistics
  • Historical perspective on vision: problems identified so far
  • What is vision?
  • Outlook

Module 1: Geometry, 3D and 4D

What is an Image: Pinhole Cameras & Projective Geometry

  • Image as a 2D signal
  • Image as measurements of a 3D light field
  • Pinhole camera and perspective projection
  • Camera motion and poses

Linear Image Processing & Transformations

  • Images as functions: continuous vs discrete
  • Function spaces and Fourier transform overview
  • Image filtering: gradients, Laplacians, convolutions
  • Multi-scale processing: Laplacian and multi-scale pyramids

Representation Theory in Vision

  • Groups
  • Group Representations
  • Steerable Bases
  • Invariant Operators
  • Finding Steerable Bases via the Eigendecomposition of Invariant Operators

No Class (Monday Schedule)

holiday
  • pset 1 due

Geometric Deep Learning and Vision

  • Equivariance and invariance
  • Regular Group Convolutions
  • Steerable Group Convolutions
  • Challenges of applying geometric techniques to vision tasks

Optical Flow

  • What is optical flow?
  • Color Constancy Assumption
  • Infinitesimal Optical Flow
  • Multi-Scale Cost and Correlation Volumes
  • Learning-based optical flow
  • RAFT

Point Tracking, Scene Flow and Feature Matching

  • Point Tracking
  • Scene Flow
  • Connection of Scene Flow and Pixel Motion, FlowMap
  • Sparse Correspondence and Invariant Descriptors
  • SIFT
  • pset 2 due

Multi-View Geometry

  • Triangulation in Light Fields: Infinitesimal perspective
  • Finite Triangulation
  • Epipolar Geometry
  • Eight-point algorithm and bundle adjustment
  • Learning-Based Approaches: Dust3r & Mast3r

Differentiable Rendering: Data Structures and Signal Parameterizations

  • Surface-Based Representations
  • (Volumetric) Field Representations
  • Grid-based and adaptive data structures
  • Neural Fields
  • Hybrid Neural / discrete fields

Guest Lecture by Eric Brachmann: Deep Learning for 3D Reconstruction

guest lecture
  • Guest Lecture on Recent Techniques of Deep Learning for 3D Reconstruction
  • pset 3 due

Differentiable Rendering: Novel View Synthesis

  • Sphere tracing and volume rendering
  • Differentiable rendering techniques

Differentiable Rendering: Novel View Synthesis 2

  • Gaussian splatting
  • Advanced differentiable rendering methods

Differentiable Rendering: Prior-Based 3D Reconstruction and Novel View Synthesis

  • Global inference techniques
  • Light field inference and generative models
  • pset 4 due

Student Holiday: Spring Break

holiday

Student Holiday: Spring Break

holiday

Module 2: Unsupervised Representation Learning and Generative Modeling

Introduction to Representation Learning and Generative Modeling

  • What makes a good representation? How do we know that we found one?
  • Generative modeling: density estimation, uncertainty modeling
  • Representation learning: task-relevant encoding
  • Surrogate tasks: compression, denoising, imputation

Peter Holderrieth: Diffusion Models 1

  • Mathematical Foundations of Diffusion Models
  • ODE and SDE perspective of Diffusion
  • Score Matching and Flow

Peter Holderrieth: Diffusion Models 2

  • Classifier-Free Guidance
  • Case study: SOTA models in image and video generation
  • SOTA architectures

Diffusion Models 3

  • A spectral perspective on image and video diffusion
  • Why do Diffusion Models generalize?

Sequence Generative Models

  • Auto-regressive and full-sequence models
  • Compounding errors and stability
  • Diffusion Forcing
  • History Guidance
  • pset 5 due
  • Project proposal due

3D Generative Modeling via Differentiable Rendering

  • Neural scene representation and rendering
  • Domain gap challenges in vision

Non-Generative Representation Learning

  • Alternative representation learning techniques
  • Applications in computer vision

Open Problems in Representation Learning

  • What are objects and how to learn them?
  • Discovering geometry in representations

Guest Lecture: Self-Supervised Learning for Vision

guest lecture
  • Guest Lecture

Module 3: Vision for Embodied Agents

Introduction to Robotic Perception

  • Definition and challenges of embodied agents
  • Intersection with vision

Sequence Generative Modeling for Decision-Making

  • Diffusion-based planning and policy models

Vision for Inverse Kinematics and State Estimation

  • Inverse kinematics and state estimation models
  • Applications in robotics

TBD

  • TBD
  • Final project due