
Course Contents

This course dives into advanced concepts in computer vision. The first focus is geometry in computer vision, including image formation, representation theory for vision, classic multi-view geometry, multi-view geometry in the age of deep learning, differentiable rendering, neural scene representations, correspondence estimation, optical flow computation, and point tracking.

Next, we explore generative modeling and representation learning: image and video generation, guidance in diffusion models, conditional probabilistic models, and representation learning with contrastive and masking-based methods.

Finally, we explore the intersection of robotics and computer vision through "vision for embodied agents", investigating the role of vision in decision-making, planning, and control.

Schedule

Tuesday  – 
Thursday  – 

Syllabus

Module 0: Introduction to Computer Vision

Introduction to Vision

  • Historical perspective on vision: problems identified so far
  • Impact of deep learning: dataset-driven solutions
  • Unsolved challenges: out-of-distribution (OOD) generalization, learning from limited data, world models

Module 1: Geometry, 3D and 4D

What is an Image: Pinhole Cameras & Projective Geometry

  • Image as a 2D signal
  • Image as measurements of a 3D light field
  • Pinhole camera and perspective projection
  • Camera motion and poses
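
For concreteness, here is a minimal sketch of the pinhole projection covered in this lecture: world points are mapped into the camera frame and then through the intrinsics, followed by the perspective divide. The intrinsics K and pose (R, t) below are illustrative placeholder values, not tied to any specific camera.

```python
import numpy as np

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Illustrative world-to-camera pose: identity rotation, 2 m translation along z.
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

def project(points_world):
    """Pinhole perspective projection of Nx3 world points to Nx2 pixel coordinates."""
    p_cam = points_world @ R.T + t        # rigid transform into the camera frame
    p_img = p_cam @ K.T                   # apply the intrinsic calibration
    return p_img[:, :2] / p_img[:, 2:3]   # perspective divide

pts = np.array([[0.0, 0.0, 0.0], [0.1, -0.2, 1.0]])
print(project(pts))  # the world origin lands on the principal point (320, 240)
```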

Linear Image Processing & Transformations

  • Images as functions: continuous vs discrete
  • Function spaces and Fourier transform overview
  • Image filtering: gradients, Laplacians, convolutions
  • Multi-scale processing: Gaussian and Laplacian pyramids
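
As a small illustration of the filtering and multi-scale topics in this lecture, the sketch below builds a Laplacian pyramid from generic blur, downsample, and upsample steps; the filter width and level count are arbitrary choices, not the course's reference implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(image, levels=3, sigma=1.0):
    """Decompose an image into band-pass residuals plus a low-pass top level."""
    pyramid, current = [], image.astype(float)
    for _ in range(levels):
        blurred = gaussian_filter(current, sigma)        # low-pass (Gaussian) filter
        down = blurred[::2, ::2]                         # subsample by a factor of 2
        up = zoom(down, 2, order=1)[:current.shape[0], :current.shape[1]]
        pyramid.append(current - up)                     # band-pass residual (Laplacian level)
        current = down
    pyramid.append(current)                              # coarsest low-pass level
    return pyramid

levels = laplacian_pyramid(np.random.rand(64, 64))
print([lvl.shape for lvl in levels])  # (64, 64), (32, 32), (16, 16), (8, 8)
```

The original image can be recovered by upsampling each level and summing the residuals back in, which is what makes the pyramid useful for multi-scale processing.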

Representation Theory in Vision

  • Representations of groups and spaces
  • Lie groups and exponential maps
  • Equivariance and invariance
  • Shift-equivariance in CNNs and Fourier transform connections
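
The shift-equivariance property mentioned above can be verified numerically: for circular convolution, shifting the input and then convolving matches convolving and then shifting. A minimal check with an arbitrary random signal and kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
kernel = rng.standard_normal(5)

def circular_conv(x, k):
    """Circular convolution computed via the Fourier transform (convolution theorem)."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, n=len(x))))

shift = 7
a = np.roll(circular_conv(signal, kernel), shift)   # convolve, then shift
b = circular_conv(np.roll(signal, shift), kernel)   # shift, then convolve
print(np.allclose(a, b))  # True: convolution commutes with shifts
```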

No Class (Monday Schedule)

holiday

Geometric Deep Learning (or the lack thereof) for Vision

  • Overview of geometric deep learning principles
  • Challenges of applying geometric techniques to vision tasks
  • Potential research directions

Correspondence, Optical Flow, and Scene Flow

  • Single images vs dynamic measurements
  • Sparse Correspondence and Invariant Descriptors
  • SIFT and SuperGlue
  • Scene Flow

Correspondence, Optical Flow, and Scene Flow 2

  • Dense Correspondence
  • Optical flow equation
  • RAFT and point tracking methods
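
For reference, the optical flow equation listed above follows from brightness constancy, and Lucas-Kanade is the classical least-squares solution over a local window (background for the learned RAFT-style methods, not a description of them):

```latex
% Brightness constancy and its first-order linearization (the optical flow equation):
I(x+u,\, y+v,\, t+1) \approx I(x, y, t)
\;\Longrightarrow\;
I_x u + I_y v + I_t = 0
% One equation per pixel, two unknowns (u, v): the aperture problem.
% Lucas--Kanade assumes constant flow over a window W and solves the normal equations
\begin{pmatrix} \sum_W I_x^2 & \sum_W I_x I_y \\ \sum_W I_x I_y & \sum_W I_y^2 \end{pmatrix}
\begin{pmatrix} u \\ v \end{pmatrix}
= -\begin{pmatrix} \sum_W I_x I_t \\ \sum_W I_y I_t \end{pmatrix}
```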

Multi-View Geometry 1

  • Triangulation and epipolar geometry
  • Eight-point algorithm and bundle adjustment
  • Depth prediction and self-supervised approaches
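
One building block of this pipeline is linear (DLT) triangulation: given two camera matrices and a matched point pair, the 3D point is the null vector of a small linear system. The cameras and point below are toy values for illustration.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: matched pixel coordinates (x, y).
    Each view contributes two linear constraints; the point is the null vector of A.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Toy setup: identity intrinsics, second camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0)
x2 = P2 @ np.append(X_true, 1.0)
print(triangulate(P1, P2, x1[:2] / x1[2], x2[:2] / x2[2]))  # ~[0.5, 0.2, 4.0]
```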

Data Structures and Signal Parameterizations

  • Efficient representations of signals
  • Grid-based and adaptive data structures
  • Applications in vision tasks
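
One common grid-based parameterization stores features (or signal values) on a regular lattice and answers continuous queries by bilinear interpolation; the sketch below shows that lookup. Grid resolution and feature dimension are arbitrary choices for illustration.

```python
import numpy as np

def grid_lookup(grid, xy):
    """Bilinearly interpolate a 2D feature grid at continuous coordinates.

    grid: (H, W, C) array of stored features; xy: (N, 2) query points in [0, 1]^2.
    """
    H, W, _ = grid.shape
    x = xy[:, 0] * (W - 1)
    y = xy[:, 1] * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    return ((1 - wx) * (1 - wy) * grid[y0, x0] + wx * (1 - wy) * grid[y0, x1]
            + (1 - wx) * wy * grid[y1, x0] + wx * wy * grid[y1, x1])

features = grid_lookup(np.random.rand(16, 16, 8), np.random.rand(4, 2))
print(features.shape)  # (4, 8)
```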

Differentiable Rendering & Novel View Synthesis

  • Sphere tracing and volume rendering
  • Differentiable rendering techniques
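
Per ray, the volume rendering step above reduces to alpha compositing of sampled densities and colors; every operation is differentiable, which is what lets the renderer act as a layer when fitting scene representations. The sample values below are random placeholders.

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Emission-absorption volume rendering along one ray.

    sigmas: (S,) densities, colors: (S, 3) radiance samples, deltas: (S,) spacings.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance to each sample
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights

rgb, w = composite(np.random.rand(64), np.random.rand(64, 3), np.full(64, 0.05))
print(rgb, w.sum())
```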

Differentiable Rendering & Novel View Synthesis 2

  • Gaussian splatting
  • Advanced differentiable rendering methods

Prior-Based 3D Reconstruction and Novel View Synthesis

  • Global inference techniques
  • Light field inference and generative models

Open Problems in Geometry, 3D, and 4D

  • Multi-view generative models
  • Open research directions

Student Holiday: Spring Break

holiday

Student Holiday: Spring Break

holiday

Module 2: Unsupervised Representation Learning and Generative Modeling

Introduction to Representation Learning and Generative Modeling

  • Generative modeling: density estimation, uncertainty modeling
  • Representation learning: task-relevant encoding
  • Surrogate tasks: compression, denoising, imputation

Latent Variable Models and VAEs

  • Latent variable models: unconditional and conditional priors
  • VAEs and generative query networks
  • Comparative analysis of latent spaces
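
For reference, the VAE objective discussed here is the evidence lower bound: a reconstruction term plus a KL term that pulls the approximate posterior toward the prior.

```latex
% Evidence lower bound (ELBO) maximized by a VAE with encoder q_\phi and decoder p_\theta:
\log p_\theta(x)
\;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
- \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
% The gap between the two sides is KL(q_\phi(z|x) || p_\theta(z|x)).
```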

Diffusion Models

  • Optimal Denoiser Perspective on Diffusion
  • Spectral Perspective on Diffusion
  • Generalization in Diffusion Models
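
Two standard identities sit behind the optimal-denoiser perspective above (DDPM-style notation with cumulative noise schedule \bar\alpha_t is assumed): the closed-form forward process, and Tweedie's formula relating the MMSE denoiser to the score of the noisy data distribution.

```latex
% Forward (noising) process:
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
% Tweedie's formula: the optimal (MMSE) denoiser is the posterior mean,
% expressed through the score of the noisy marginal p_t:
\mathbb{E}\!\left[ x_0 \mid x_t \right]
= \frac{x_t + (1 - \bar\alpha_t)\, \nabla_{x_t} \log p_t(x_t)}{\sqrt{\bar\alpha_t}}
```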

Diffusion Models 2

  • Guidance
  • Score Distillation Sampling
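
One standard form of guidance is classifier-free guidance, which at sampling time extrapolates between the unconditional and conditional noise predictions with a scale w (w = 1 recovers plain conditional sampling):

```latex
\tilde\epsilon_\theta(x_t, c)
= \epsilon_\theta(x_t, \varnothing)
+ w \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)
```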

Sequence Generative Models

  • Auto-regressive and full-sequence models
  • Compounding errors and stability
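
The compounding-error discussion above stems from the auto-regressive factorization, where each step conditions on previously generated outputs rather than ground truth:

```latex
% Auto-regressive factorization of a sequence x_1, ..., x_T (e.g. video frames):
p_\theta(x_{1:T}) = \prod_{t=1}^{T} p_\theta\!\left( x_t \mid x_{<t} \right)
% At sampling time each x_t is fed back as context, so per-step errors can
% accumulate over long rollouts; full-sequence models generate x_{1:T} jointly.
```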

Bridging Domain Gaps

  • Neural scene representation and rendering
  • Domain gap challenges in vision

Non-Generative Representation Learning

  • Alternative representation learning techniques
  • Applications in computer vision
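
Contrastive objectives such as InfoNCE are one family of non-generative techniques in this space (the course overview mentions contrastive and masking-based methods); a minimal numpy sketch with arbitrary batch size, embedding dimension, and temperature:

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE contrastive loss between two batches of paired embeddings.

    z_a[i] and z_b[i] are two views of the same sample (positive pair);
    all other pairs in the batch serve as negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                   # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # positives lie on the diagonal

rng = np.random.default_rng(0)
print(info_nce(rng.standard_normal((8, 32)), rng.standard_normal((8, 32))))
```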

Open Problems in Representation Learning

  • What are objects and how to learn them?
  • Discovering geometry in representations

Module 3: Vision for Embodied Agents

Introduction to Robotic Perception

  • Definition and challenges of embodied agents
  • Intersection with vision

Sequence Generative Modeling for Decision-Making

  • Diffusion-based planning and policy models

Vision for Inverse Kinematics and State Estimation

  • Inverse kinematics and state estimation models
  • Applications in robotics
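
As a toy reference for the inverse kinematics piece (independent of any particular vision model), the closed-form solution for a planar two-link arm maps a target end-effector position, e.g. one estimated from vision, to joint angles:

```python
import numpy as np

def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Closed-form inverse kinematics for a planar 2-link arm (one elbow configuration).

    Given a target end-effector position (x, y), return joint angles (theta1, theta2).
    """
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(c2) > 1:
        raise ValueError("target out of reach")
    theta2 = np.arccos(c2)
    theta1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(theta2), l1 + l2 * np.cos(theta2))
    return theta1, theta2

t1, t2 = two_link_ik(1.2, 0.7)
# Forward kinematics check: should print values close to 1.2 and 0.7.
print(np.cos(t1) + np.cos(t1 + t2), np.sin(t1) + np.sin(t1 + t2))
```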