Advances in Computer Vision – Scene Representation Group

Course Contents

This course dives into advanced concepts in computer vision. A first focus is geometry in computer vision, including image formation, a closer look at the fourier transform and its relationship to geometric deep learning, classic multi-view geometry, multi-view geometry in the age of deep learning, differentiable rendering, neural scene representations, correspondence estimation, optical flow computation, and point tracking.

Next, we explore generative modeling and representation learning including image and video generation, guidance in diffusion models, conditional probabilistic models, as well as representation learning in the form of contrastive and masking-based methods.

Finally, we will explore the intersection of robotics and computer vision with imitation learning and world models.

Prerequisites

The formal prereqs of this course are: 6.7960 Deep Learning, (6.1200 or 6.3700), (18.06 or 18.C06).

This class is an advanced graduate-level class. You have to have working knowledge of the following topics, i.e., be able to work with them in numpy / scipy / pytorch. There will be no explainer on this and TAs will not be able to help you with these basics.

Deep Learning: Proficiency in Python, Numpy, and PyTorch, vectorized programming, and training deep neural networks. Convolutional neural networks, transformers, MLPs, backpropagation.

Linear Algebra: Vector spaces, matrix-matrix products, matrix-vector products, change-of-basis, inner products and norms, Eigenvalues, Eigenvectors, Singular Value Decomposition, Fourier Transform, Convolution.

Schedule

6.8300 will be held as 1.5 hour long lectures in room 26-100:

Tuesday	1:00 – 2:30 pm
Thursday	1:00 – 2:30 pm

Collaboration Policy

Problem sets should be written up individually and should reflect your own individual work. You cannot copy code from another student. However, you may discuss with your peers, TAs, and instructors.

You should not copy or share complete solutions or ask others if your answer is correct, whether in person or via Piazza or Canvas.

If you work on the problem set with anyone other than TAs and instructors, list their names at the top of the problem set.

Office Hours

Monday	2:00 – 3:00 pm	36-156
Tuesday	5:00 – 6:00 pm	Zoom
Wednesday	11:00 – 12:00 pm	36-112
Thursday	5:00 – 6:00 pm	Zoom
Friday	3:00 – 4:00 pm	36-153

Office Hours are subject to change based on staff availability. Please check Piazza for Office Hour schedule updates.

Late Submissions Policy

There are no late days for problem sets, the submissions close with the deadline and late submissions will not be graded.

Grading Policy

Grading will be split between five module-specific problem sets and a final project:

10%

Problem Sets

5 problem sets

note our separate policies on Collaboration, AI Assistants, and Late Submissions.

45%

In-class Midterm
Pen-and-paper, closed-book in-class quiz. Will deal with content of homework assignments and lectures.

45%

Final Project
Blog Post (80%) + Recorded two-minute Talk(20%)

The final project will be a research project on perception of your choice:

You will run experiments and do analysis to explore your research question.
You will write up your research in the format of a blog post. Your post will include an explanation of background material, new investigations, and results you found.
You are encouraged to include plots, animations, and interactive graphics to make your findings clear. Here are some examples of well-presented research.

The final project will be graded for clarity and insight as well as novelty and depth of the experiments and analysis. Detailed guidance will be given later in the semester.

FAQ

Q	Can I take this course if I have not taken 6.7960 Deep Learning or a comparable class?
A	We advise against it. There will be homework assignments where you will be asked to re-implement deep learning papers by yourself. If you don't have working knowledge in Deep Learning using Pytorch, you are unlikely to perform well on these assignments. We will generally not discuss topics that were discussed in the Deep Learning class, i.e., we will not be reiterating Transformers, CNNs, how to train these models, etc, but will assume that you are already familiar with them.
Q	Is this class a CI-M class?
A	No, this is a graduate class.
Q	Is 6.8301 (the undergraduate version) taught this semester?
A	The undergraduate version is taught this semester as well. For logistical reasons, it had to be renamed to 6.S058 / 6.4300, and is taught by Profs. Bill Freeman and Phillip Isola. 6.S058 is a CI-M class, and does not have a prerequisite on 6.7960 Deep Learning.
Q	Is attendance required? Will lectures be recorded?
A	Attendance is at your discretion. Yes, lectures will be recorded and uploaded.
Q	Is this a TQE course?
A	Yes.

Contact Instructors

If you have a question regarding extensions, accommodations, the midterm quiz, or other logistics, please email 6.830-instructors-sp26@mit.edu. Please note that extensions will ONLY be offered for extenuating circumstances with S³ support, so please reach out to S³ first and cc your S³ dean when you reach out to us.

AI Assistants Policy

Different from last year, this year, we welcome you to finish the problem set with the help of AI assistants. The homeworks are designed to give you a deeper, practical understanding of the course material, but are not the primary means of assessment any more - the midterm quiz, which will deal with content of homework assignments and lectures, will be the primary means of assessment. We used this additional degree of freedom to make the homework assignments more educational and interactive, and it will be easier to judge at the time of submission if you got everything right.

Syllabus

Module 0: Introduction to Computer Vision
Introduction to Vision Tue, Feb. 3^rd	Administrativa & Logistics Historical perspective on vision: problems identified so far What is vision? Outlook	Recording Slides
Module 1: Module 1: Geometry, 3D and 4D
What is an Image: Pinhole Cameras & Projective Geometry Thu, Feb. 5^th	Image as a 2D signal Image as measurements of a 3D light field Pinhole camera and perspective projection Camera motion and poses	Recording Slides pset 1 out
Linear Image Processing & Transformations Tue, Feb. 10^th	Images as functions: continuous vs discrete Function spaces and Fourier transform overview Image filtering: gradients, Laplacians, convolutions Multi-scale processing: Laplacian and multi-scale pyramids	Recording Slides
Representation Theory in Vision Thu, Feb. 12^th	Groups Group Representations Steerable Bases Invariant Operators Finding Steerable Bases via the Eigendecomposition of Invariant Operators	Recording Slides
No Class (Monday Schedule) Tue, Feb. 17^th holiday		pset 1 due pset 2 out
Geometric Deep Learning and Vision Thu, Feb. 19^th	Equivariance and invariance Regular Group Convolutions Steerable Group Convolutions Challenges of applying geometric techniques to vision tasks
Optical Flow Tue, Feb. 24^th	What is optical flow? Color Constancy Assumption Infinitesimal Optical Flow Multi-Scale Cost and Correlation Volumes Learning-based optical flow RAFT
Point Tracking, Scene Flow and Feature Matching Thu, Feb. 26^th	Point Tracking Scene Flow Connection of Scene Flow and Pixel Motion, FlowMap Sparse Correspondence and Invariant Descriptors SIFT	pset 2 due pset 3 out
Multi-View Geometry Tue, March 3^rd	Triangulation in Light Fields: Infinetismal perspective Finite Triangulation Epipolar Geometry Eight-point algorithm and bundle adjustment Learning-Based Approaches: Dust3r & Mast3r
Differentiable Rendering: Data Structures and Signal Parameterizations Thu, March 5^th	Surface-Based Representations (Volumetric) Field Representations Grid-based and adaptive data structures Neural Fields Hybrid Neural / discrete fields
TBD Tue, March 10^th		pset 3 due pset 4 out
Differentiable Rendering: Novel View Synthesis Thu, March 12^th	Sphere tracing and volume rendering Differentiable rendering techniques
Differentiable Rendering: Novel View Synthesis 2 Tue, March 17^th	Gaussian splatting Advanced differentiable rendering methods	Final Project Guidelines Released
Differentiable Rendering: Prior-Based 3D Reconstruction and Novel View Synthesis Thu, March 19^th	Global inference techniques Light field inference and generative models	pset 4 due
Student Holiday: Spring Break Tue, March 24^th holiday
Student Holiday: Spring Break Thu, March 26^th holiday
Module 2: Module 2: Unsupervised Representation Learning and Generative Modeling
Introduction to Representation Learning and Generative Modeling Tue, March 31^st	What makes a good representation? How do we know that we found one? Generative modeling: density estimation, uncertainty modeling Representation learning: task-relevant encoding Surrogate tasks: compression, denoising, imputation	pset 5 out
Diffusion Models 1 Thu, April 2^nd	Mathematical Foundations of Diffusion Models ODE and SDE perspective of Diffusion Score Matching and Flow
Diffusion Models 2 Tue, April 7^th	Classifier-Free Guidance Case study: SOTA models in image and video generation SOTA architectures
In-Class Midterm Quiz Thu, April 9^th quiz	In-class pen-and-paper, closed-book quiz.
Diffusion Models 3 Tue, April 14^th	A spectral perspective on image and video diffusion Why do Diffusion Models generalize?	pset 5 due Project proposal due
Sequence Generative Models Thu, April 16^th	Auto-regressive and full-sequence models Compounding errors and stability Diffusion Forcing History Guidance
Sequence Generative Models II Tue, April 21^st	Another perspective on Sequence generation History Guidance
Non-Generative Representation Learning Thu, April 23^rd	Alternative representation learning techniques Applications in computer vision
TBD Tue, April 28^th
Module 3: Module 3: Vision for Embodied Agents
Introduction to Robotic Perception Thu, April 30^th	Definition and challenges of embodied agents Intersection with vision Controlling Robots from Vision
TBD Tue, May 5^th
Learning Skills from Demonstrations Thu, May 7^th	Behavior Cloning and Imitation Learning from Vision
TBD Tue, May 12^th		Final project due

Course Contents

Prerequisites

Schedule

Collaboration Policy

Office Hours

Late Submissions Policy

Grading Policy

FAQ

Contact Instructors

AI Assistants Policy

Syllabus

Module 0: Introduction to Computer Vision

Introduction to Vision

Module 1: Module 1: Geometry, 3D and 4D

What is an Image: Pinhole Cameras & Projective Geometry

Linear Image Processing & Transformations

Representation Theory in Vision

No Class (Monday Schedule)

Geometric Deep Learning and Vision

Optical Flow

Point Tracking, Scene Flow and Feature Matching

Multi-View Geometry

Differentiable Rendering: Data Structures and Signal Parameterizations

TBD

Differentiable Rendering: Novel View Synthesis

Differentiable Rendering: Novel View Synthesis 2

Differentiable Rendering: Prior-Based 3D Reconstruction and Novel View Synthesis

Student Holiday: Spring Break

Student Holiday: Spring Break

Module 2: Module 2: Unsupervised Representation Learning and Generative Modeling

Introduction to Representation Learning and Generative Modeling

Diffusion Models 1

Diffusion Models 2

In-Class Midterm Quiz

Diffusion Models 3

Sequence Generative Models

Sequence Generative Models II

Non-Generative Representation Learning

TBD

Module 3: Module 3: Vision for Embodied Agents

Introduction to Robotic Perception

TBD

Learning Skills from Demonstrations

TBD

Related Courses and Credits

Image Attribution