ECE 6123 – Image and Video Processing (Fall 2023)

Course Description: This course introduces the fundamentals of image and video processing, including color image capture and representation; contrast enhancement; spatial domain filtering; two-dimensional (2D) Fourier transform and frequency domain interpretation of linear convolution; image sampling and resizing; multi-resolution image representation using pyramid and wavelet transforms; feature point detection and feature correspondence; geometric transformation, image registration, and image stitching; video motion characterization and estimation; video stabilization and panoramic view generation; image representation using orthogonal transforms; sparsity-based image recovery; basic image compression techniques and standards (JPEG and JPEG2000); video compression using adaptive spatial and temporal prediction; video coding standards (MPEGx/H26x); and stereo and multi-view image and video processing (depth from disparity, disparity estimation, view synthesis, compression). Basics of deep learning for image processing and computer vision will also be introduced. Students will learn to implement selected algorithms in Python. Prior experience with Python and deep learning is not required; you will learn both as the course progresses. A class project, preferably in teams of 2 to 3 people, is required.

Prerequisites: Graduate status. ECE-GY 6113 and ECE-GY 6303 are preferred but not required. Students should have a good background in linear algebra / matrix theory, probability, and signals and systems. Undergraduate students must have completed EE-UY 3054 Signals and Systems, EE-UY 2233 Probability, and linear algebra.

Instructor: Professor Yao Wang, 370 Jay Street, Rm 957, (646)-997-3469, Email: yaowang at nyu.edu. Homepage. Office hours: Mon. 4:00-5:00 PM (online), Wed. 4:30-6:00 PM (after class or in office). The Zoom link for the Monday office hour is available on Brightspace. Contact me via email to schedule other times.

Teaching Assistants: Jinhan Zhang (jz5952 at nyu.edu). Office hour: Tues. 2:30-3:30. Chenhao Zhang (cz2632 at nyu.edu). Office hour: Fri. 11:00-noon. In person: 370 Jay St, Rm 966 (also available over Zoom if you cannot come in person). The Zoom link is available on Brightspace.

Course Schedule:  In person: Wed. 2:00 PM – 4:30 PM, 2 MTC Rm. 907, Brooklyn.  

Textbook/References:

  1. Richard Szeliski, Computer Vision: Algorithms and Applications, 2nd Edition (Sept. 30, 2021 version). Available online: "Link". (Covers most of the material, except sparsity-based image processing and image and video coding.)
  2. (Optional) Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Communications. Prentice Hall, 2002. "Link". (Reference for Fourier transforms, image and video coding, motion estimation, and stereo.)
  3. (Optional) R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd Edition. Prentice Hall, 2008. ISBN 9780131687288. "Link". (Good reference for basic image processing, wavelet transforms, and image coding.)

Course Structure: The class will consist of weekly lectures, weekly written homework assignments (not graded; solutions will be provided), roughly biweekly short quizzes (based on the homework assignments), computer assignments (CA), and a team project (2-3 people per team). There will be two optional tutorials outside of class time: one to introduce Python programming, the other to introduce PyTorch and Google Cloud Platform.

Grading:  Quizzes: 40%, Computer assignments: 30%,  Project: 30%. Project grade depends on project proposal (2%), midterm project report (3%), final report (5%), project presentation (10%), and technical accomplishment (10%). 

Attendance: Students are expected to attend all lectures and quizzes in person.

Homework: Written homework will be assigned after each lecture but not graded, and solutions will be provided. Programming assignments will be due as posted. Each assignment counts for 10 points. Late submissions of programming assignments will be accepted up to 3 days late, with a 2-point deduction for each day. Students can work in teams, but you must submit your own solutions. Solutions to computer assignments will be posted 1 week after the due date. We will aim to complete the grading of each quiz and computer assignment within 1-2 weeks.

Quiz: A quiz will be held biweekly. The total time for each quiz is 20 minutes. The quiz problems will be similar to the written homework problems and/or the review questions in the lecture notes.

Project Guideline: Link

Suggested Project List: Link (Updated 9/16/2023)

Sample Data: Sample Images, Middlebury Stereo Image Database

Links to Resources (lecture notes) in Previous Offerings: 

Other Useful Links 

Tentative Course Schedule (lecture notes may be updated shortly before the lecture date)

 

Lecture Time Lecture content and notes Quiz and Assignment Due Date
Lec. 1
(9/6)
Part 1: Course introduction. Lecture note (Updated 9/06/2023)
Part 2: Image Formation and Representation: 3D to 2D projection, photometric image formation, trichromatic color representation, video format (SD, HD, UHD, HDR). Lecture note (Updated 1/25/2022).
Part 3: Contrast enhancement (concept of histogram, nonlinear mapping, histogram equalization).  Lecture note (Updated 1/25/2022)
CA1 posted.
Tutorial 1
Tutorial on Python (9/8, 10:00 AM-11:30 AM)
Materials (Updated 01/26/2021)
 
Lec. 2
(9/13)
Review of 1D Fourier transform and convolution.
Concept of spatial frequency. Continuous and Discrete Space 2D Fourier transform. Lecture note: “FT.pdf” (updated 09/12/2023)  
Quiz 1 (9/13): Covering lecture 1

CA2 posted.

Lec. 3
(9/20)
2D convolution and its interpretation in frequency domain. Implementation of 2D convolution. Separable filters. Frequency response.
Lecture note: “convolution.pdf” (updated 09/18/2023) 
Linear filtering (2D convolution) for noise removal, image sharpening, and edge detection. Gaussian filters, DOG and LOG filters as image gradient operators. 
Lecture note: “filtering_edge detection.pdf” (updated 09/18/2023) 
Due (9/22)  CA 1 : Learning Python and histogram equalization

Due (9/22): You should have formed a project team and selected a project topic. Feel free to schedule a meeting with the instructor to discuss your project ideas.

Lec. 4
(9/27)
Image sampling and resizing. Antialiasing and interpolation filters. Spatial and temporal resolutions of the human visual system.
Lecture note on ImageSampling (updated 09/30/23). Reference materials (updated 2/15/19): Selesnick_MultirateSystems, Selesnick_SamplingTheorem
Quiz 2 (9/27): covering lectures 2 and 3.

Due (9/29)  CA 2 : 2D filtering

Lec. 5
(10/4)
Image representation using orthonormal transform and dictionary. DCT and KLT; DCT-based image coding (JPEG).
Lecture note on transform coding (updated 10/01/2023).
Quiz 3 (10/4) Covering lecture 4.

Due (10/6) Submit the project proposal

Lec. 6
(10/11)
Multi-resolution representation: Pyramid and Wavelet Transforms.   
Wavelet-based image coding (JPEG2K). 
Lecture note on Wavelet (updated 10/10/2023). 
CA3 posted.
Lec. 7
(10/18)
Sparse-representation-based image recovery. General formulation of image enhancement as an optimization problem. Sparsity for regularization. L0 vs. L1 vs. L2 prior. Optimization techniques for solving L2-L1 problems (soft thresholding, ISTA, ADMM). Applications in denoising, deblurring, inpainting, compressive sensing, and super-resolution.
Lecture note (updated 2/27/2020)  Supplementary materials (updated 10/22/2023)
Quiz 4 (10/18) Covering lectures 5 and 6.
Lec. 8
(10/25)
Overview of machine learning, neural networks, convolutional networks. Convolutional Network for classification. Training and validation.
Lecture note on CNN (part 1) (updated 3/24/2023)
Due (10/27) CA 3: Pyramids and wavelet transforms
Tutorial 2
Tutorial on using PyTorch and Google Cloud Platform for deep learning (10/27, 10:00AM-11:30AM)
Materials (updated 3/25/2023)
CA4 posted.
Lec. 9
(11/1)
Convolutional Networks for Image Processing, including segmentation, denoising, object detection.
Lecture note on CNN (part2) (updated 3/30/2023)
 
Lec. 10
(11/8)
Feature detection (Harris corner, scale space, SIFT), feature descriptors (SIFT). Learning-based feature detection. Bag of Visual Words representation for image classification.
Lecture note on Features  (updated 3/27/2022)
Quiz 5 (11/8) (covering lectures 8 and 9)

Due (11/10)
Submit the midterm project report

Lec. 11
(11/15)
Geometric mapping (affine, homography), feature-based camera motion estimation (RANSAC). Image warping. Image registration. Panoramic view stitching. Video stabilization. Lecture note (updated 11/15/2023)
Due (11/17) CA 4: Training a U-Net for image segmentation

CA5 posted.

Lec. 12
(11/29)
Dense motion/displacement estimation: optical flow equation, optical flow estimation (Lucas-Kanade method, KLT tracker); block matching, multi-resolution estimation. Deformable registration (medical applications). Deep learning approach for motion estimation. Lecture note (updated 04/19/2023)
Quiz 6 (11/29) (covering lectures 10 and 11)
Lec. 13
(12/6)
Video Coding Part 1: block-based motion-compensated prediction and interpolation, adaptive spatial prediction, block-based hybrid video coding, rate-distortion optimized mode selection, rate control, Group of pictures (GoP) structure, the tradeoff between coding efficiency, delay, and complexity. Learning-based image and video compression. Lecture note (updated 4/26/2023)
Due (12/6) CA 5: Stitching a panoramic picture.
Lec. 14
(12/13)
Stereo and multiview video: depth from disparity, disparity estimation, view synthesis. Multiview video compression. Depth camera (Kinect). 360 video camera and view stitching. Lecture note (updated 12/13/2023).
Video Coding Part 2: Overview of video coding standards (AVC/H.264, HEVC/H.265); Layered video coding: general concept and H.264/SVC. Lecture note (updated 12/13/2023)
Quiz 7 (12/13): covering lectures 12 and 13
Week 15
(12/20) 
Project Presentation
Due (12/22) CA 6: Video Coding

Due (12/22)
Project Report and all other material must be uploaded

Sample Exams: 

Sample Images: 

Policy on Academic Integrity:  The School of Engineering encourages academic excellence in an environment that promotes honesty, integrity, and fairness. Please see the policy on academic dishonesty: Link to NYU Tandon Policy,  Link to NYU Policy.

Inclusion Statement: The NYU Tandon School values an inclusive and equitable environment for all our students. I hope to foster a sense of community in this class and consider it a place where individuals of all backgrounds, beliefs, ethnicities, national origins, gender identities, sexual orientations, religious and political affiliations, and abilities will be treated with respect.   It is my intent that all students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit.  If this standard is not being upheld, please feel free to speak with me. Please visit this link for NYU Tandon’s effort in diversity and inclusion.

Moses Center Statement of Disability: If you are a student with a disability and would like to request accommodations, please contact New York University’s Moses Center for Students with Disabilities (CSD).  You must be registered with CSD to receive accommodations.  Information about the Moses Center can be found at www.nyu.edu/csd. The Moses Center is located at 726 Broadway on the 3rd floor.