Course Contents
This course dives into advanced concepts in computer vision. A first focus is geometry in computer vision, including image formation, a closer look at the Fourier transform and its relationship to geometric deep learning, classic multi-view geometry, multi-view geometry in the age of deep learning, differentiable rendering, neural scene representations, correspondence estimation, optical flow computation, and point tracking.
Next, we explore generative modeling and representation learning, including image and video generation, guidance in diffusion models, and conditional probabilistic models, as well as representation learning in the form of contrastive and masking-based methods.
Finally, we will explore the intersection of robotics and computer vision with imitation learning and world models.
Prerequisites
The formal prerequisites of this course are: 6.7960 Deep Learning, (6.1200 or 6.3700), (18.06 or 18.C06).
This class is an advanced graduate-level class. You need working knowledge of the following topics, i.e., you must be able to work with them in NumPy / SciPy / PyTorch. There will be no refresher on these topics, and TAs will not be able to help you with these basics.
Deep Learning: Proficiency in Python, NumPy, and PyTorch, vectorized programming, and training deep neural networks. Convolutional neural networks, transformers, MLPs, backpropagation.
Linear Algebra: Vector spaces, matrix-matrix products, matrix-vector products, change of basis, inner products and norms, eigenvalues, eigenvectors, singular value decomposition, Fourier transform, convolution.
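As a rough self-check of the fluency described above, the following sketch exercises three of the listed topics in NumPy: the singular value decomposition, eigenvalues of a symmetric matrix, and the relationship between the Fourier transform and convolution. It is an illustration only, not course material; the matrix sizes, random seed, and variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

# Singular value decomposition: A = U diag(s) V^T.
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
svd_ok = np.allclose(A, U @ np.diag(s) @ Vt)

# Eigenvalues of the symmetric matrix A^T A are the squared singular values of A.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)  # eigh returns ascending eigenvalues
eig_ok = np.allclose(eigvals[::-1], s**2)   # s is sorted in descending order

# Convolution theorem: circular convolution in the signal domain equals
# elementwise multiplication in the Fourier domain.
n = 8
x = rng.standard_normal(n)
h = rng.standard_normal(n)
circular = np.array(
    [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]
)
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real
conv_ok = np.allclose(circular, via_fft)

print(svd_ok, eig_ok, conv_ok)
```

If each of these checks reads as routine, the linear algebra prerequisites are probably in place; if not, it may be worth reviewing them before the first problem set.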
Schedule
6.8300 will be held as lectures in room 26-100:
| Tuesday |  –  |
| Thursday |  –  |
Collaboration Policy
Problem sets should be written up individually and should reflect your own individual work. You cannot copy code from another student. However, you may discuss the problems with your peers, TAs, and instructors.
You should not copy or share complete solutions or ask others if your answer is correct, whether in person or via Piazza or Canvas.
If you work on the problem set with anyone other than TAs and instructors, list their names at the top of the problem set.
Office Hours
Late Submissions Policy
There are no late days for problem sets. Submissions close at the deadline, and late submissions will not be graded.
Grading Policy
Grading will be split among five module-specific problem sets, an in-class midterm, and a final project:
| 10% | Problem Sets: 5 problem sets. Note our separate policies on Collaboration, AI Assistants, and Late Submissions. |
| 45% | In-class Midterm: Pen-and-paper, closed-book in-class quiz. Covers the content of homework assignments and lectures. |
| 45% | Final Project: Blog Post (80%) + Recorded Two-Minute Talk (20%) |
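To make the weighting concrete, here is a minimal sketch of how the percentages above could combine into one overall score. It assumes every component is scored on a 0 to 100 scale and that components are combined linearly; the function names and the example scores are hypothetical, and the course does not specify how scores map to letter grades.

```python
# Weights taken from the grading table; everything else is an assumption.
WEIGHTS = {"problem_sets": 0.10, "midterm": 0.45, "final_project": 0.45}
PROJECT_SPLIT = {"blog_post": 0.80, "talk": 0.20}


def final_project_score(blog_post: float, talk: float) -> float:
    """Combine the blog post and talk scores (each assumed to be 0-100)."""
    return PROJECT_SPLIT["blog_post"] * blog_post + PROJECT_SPLIT["talk"] * talk


def course_score(problem_sets: float, midterm: float,
                 blog_post: float, talk: float) -> float:
    """Weighted overall score, assuming a simple linear combination."""
    project = final_project_score(blog_post, talk)
    return (WEIGHTS["problem_sets"] * problem_sets
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["final_project"] * project)


# Hypothetical scores, for illustration only.
example = course_score(problem_sets=100, midterm=80, blog_post=90, talk=70)
print(round(example, 1))
```

With these hypothetical scores, the final project contributes 0.45 × 86 and the overall score comes to about 84.7, which shows how much more the midterm and project matter than the problem sets.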
The final project will be a research project on perception of your choice:
- You will run experiments and do analysis to explore your research question.
- You will write up your research in the format of a blog post. Your post will include an explanation of background material, new investigations, and results you found.
- You are encouraged to include plots, animations, and interactive graphics to make your findings clear. Here are some examples of well-presented research.
AI Assistants Policy
Unlike last year, we welcome you to complete the problem sets with the help of AI assistants. The homework assignments are designed to give you a deeper, practical understanding of the course material, but they are no longer the primary means of assessment. That role now belongs to the midterm quiz, which will cover the content of homework assignments and lectures. We used this additional degree of freedom to make the homework assignments more educational and interactive, and it will be easier to judge at the time of submission whether you got everything right.
FAQ
| Q | Can I take this course if I have not taken 6.7960 Deep Learning or a comparable class? |
| A | We advise against it. There will be homework assignments where you will be asked to re-implement deep learning papers by yourself. If you don't have working knowledge of deep learning in PyTorch, you are unlikely to perform well on these assignments. We will generally not cover topics that were discussed in the Deep Learning class, i.e., we will not revisit Transformers, CNNs, how to train these models, etc., but will assume that you are already familiar with them. |
| Q | Is this class a CI-M class? |
| A | No, this is a graduate class. |
| Q | Is 6.8301 (the undergraduate version) taught this semester? |
| A | The undergraduate version is taught this semester as well. For logistical reasons, it had to be renamed to 6.S058 / 6.4300, and is taught by Profs. Bill Freeman and Phillip Isola. 6.S058 is a CI-M class, and does not have a prerequisite on 6.7960 Deep Learning. |
| Q | Is attendance required? Will lectures be recorded? |
| A | Attendance is at your discretion. Yes, lectures will be recorded and uploaded. |
Syllabus
Module 0: Introduction to Computer Vision
- Introduction to Vision
Module 1: Geometry, 3D and 4D
- What is an Image: Pinhole Cameras & Projective Geometry
- Linear Image Processing & Transformations
- Representation Theory in Vision
- No Class (Monday Schedule; holiday)
- Geometric Deep Learning and Vision
- Optical Flow
- Point Tracking, Scene Flow and Feature Matching
- Multi-View Geometry
- Differentiable Rendering: Data Structures and Signal Parameterizations
- TBD
- Differentiable Rendering: Novel View Synthesis
- Differentiable Rendering: Novel View Synthesis 2
- Differentiable Rendering: Prior-Based 3D Reconstruction and Novel View Synthesis
- Student Holiday: Spring Break
- Student Holiday: Spring Break
Module 2: Unsupervised Representation Learning and Generative Modeling
- Introduction to Representation Learning and Generative Modeling
- Diffusion Models 1
- Diffusion Models 2
- In-Class Midterm Quiz
- Diffusion Models 3
- Sequence Generative Models
- Sequence Generative Models II
- Non-Generative Representation Learning
- TBD
Module 3: Vision for Embodied Agents
- Introduction to Robotic Perception
- TBD
- Learning Skills from Demonstrations
- TBD