ECE6123: Image and Video Processing (Spring 2019)
Course Description:
This course introduces the fundamentals of image and video processing, including color image capture and representation; contrast enhancement; spatial-domain filtering; the two-dimensional (2D) Fourier transform and the frequency-domain interpretation of linear convolution; image sampling and resizing; multi-resolution image representation using pyramid and wavelet transforms; feature point detection and global alignment between images based on feature correspondence; geometric transformation and image registration; video motion characterization and estimation; video stabilization and panoramic view generation; image and video segmentation; selected advanced image processing techniques; basic compression techniques and standards (the JPEG image compression standard; the wavelet transform and the JPEG2000 standard; video compression using adaptive spatial and temporal prediction; video coding standards such as MPEGx/H.26x); and stereo and multi-view image and video processing (depth from disparity, disparity estimation, view synthesis, compression). Students will learn to implement selected algorithms in Python. A term project will be required.
Prerequisites:
Graduate status. ECE-GY 6113 and ECE-GY 6303 are preferred but not required. Students should have a good background in linear algebra. Undergraduate students must have completed EE-UY 3054 Signals and Systems, EE-UY 2233 Probability, and linear algebra.
Instructor:
Professor Yao Wang, MTC2 Room 9.122, (646)-997-3469, Email: yaowang at nyu dot edu. Homepage
Teaching Assistants:
Feng Wang (fw778 at nyu dot edu), Nitin Nair (nn1174 at nyu dot edu), Weixi Zhang (wz1219 at nyu dot edu).
Course Schedule:
Thursday 3:20 PM – 5:50 PM at 370 Jay St., Room 202
Office Hour:
Yao Wang: Mon 4-5 PM, Wed 4-5 PM, or by appointment via email.
Feng Wang: Tues 1-3 PM, 2 MetroTech Center, 9th floor, in the lounge area outside the NYU Wireless back conference room
Nitin Nair: Wed 1-2PM, outside MTC2 Room 9.122
Weixi Zhang: Thurs 11 AM-12 PM, outside MTC2 Room 9.122
Text Book/References:
- Richard Szeliski, Computer Vision: Algorithms and Applications. (Available online: Link) (Covers most of the material, except sparsity-based image processing and image and video coding.)
- (Optional) Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Communications. Prentice Hall, 2002. Link (Reference for image and video coding, motion estimation, and stereo.)
- (Optional) R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd Edition, Prentice Hall, 2008. ISBN 9780131687288. Link (Good reference for basic image processing, wavelet transforms, and image coding.)
Grading Policy:
Exam: 40%, Final Project: 30%, Programming assignments: 20%, Written assignments: 10%.
Homework Policy:
Written HW will be assigned after each lecture and is due at the beginning of the following lecture. Programming assignments are due as posted and must be submitted through NYU Classes. Each assignment counts for 10 points. Late submission of written and programming assignments will be accepted up to 3 days late, with a 2-point deduction for each day. Students can work in teams, but you must submit your own solutions.
Project Guideline: Link
Suggested Project List: Link (Updated 2/18/2019)
Sample Data:
Sample Images
Middlebury Stereo Image Database
Links to Resources (lecture notes and sample exams) in Previous Offerings:
- EL 5123 Image Processing
- EL 6123 Video Processing
- EL 6123 Image and Video Processing (S16)
- EL 6123 Image and Video Processing (S18)
- The Coursera image processing course by Prof. Katsaggelos: Link
- The image processing course at Stanford: Link
- The computer vision course at U. Washington: Link
Other Useful Links
- Basics of Python and Its Application to Image Processing Through OpenCV: Link
- Example codes and images used in the above guide: Link
- OpenCV: an open-source package including many computer vision algorithms (a short usage sketch follows this list)
- Numpy
- Scipy
- Matrix Reference Manual
- Codecademy: Python
- Anaconda
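As a quick starting point with the packages listed above, here is a minimal, illustrative sketch that reads a grayscale image with OpenCV and applies histogram equalization (the topic of computer assignment 1 in the schedule below), once with the built-in routine and once written out with NumPy. The file name 'test.png' is a placeholder for any local test image, not a course file.

```python
import cv2
import numpy as np

# Read an image as grayscale ('test.png' is a placeholder for any local image).
img = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)

# Built-in histogram equalization, for comparison.
eq_cv = cv2.equalizeHist(img)

# The same idea written out with NumPy: map each gray level through the
# scaled cumulative distribution function (CDF) of the image histogram.
hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
cdf = hist.cumsum()
cdf_scaled = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # scale CDF to [0, 1]
lut = np.round(255 * cdf_scaled).astype(np.uint8)          # lookup table
eq_np = lut[img]                                           # apply the mapping

cv2.imwrite('equalized.png', eq_np)
```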
Tentative Course Schedule
- Week 1 (1/31): Course introduction. Part 1: Image Formation and Representation: 3D to 2D projection, photometric image formation, trichromatic color representation, video format (SD, HD, UHD, HDR). Lecture note (Updated 1/27/2019). Part 2: Contrast enhancement (concept of histogram, nonlinear mapping, histogram equalization). Lecture note (Updated 1/27/2019)
- Tutorial on Python (2/1, 10:30 AM-12 PM)
- Computer assignment 1 (Learning Python and histogram equalization) (Due 2/7)
- Week 2 (2/7): Review of 1D Fourier transform and convolution. Concept of spatial frequency. Continuous and Discrete Space 2D Fourier transform. 2D convolution and its interpretation in frequency domain. Implementation of 2D convolution. Separable filters. Frequency response. Linear filtering (2D convolution) for noise removal, image sharpening, and edge detection. Gaussian filters, DOG and LOG filters as image gradient operators. Lecture note (Updated 2/7/2019).
- Computer assignment 2 (2D filtering) (Due 2/21); see the filtering sketch after the schedule
- Week 3 (2/14): Image sampling and resizing. Antialiasing and interpolation filters. Spatial and temporal resolutions of human visual systems. Lecture note on ImageSampling (updated 2/14/19). Reference materials (updated 2/15/19): Selesnick_MultirateSystems, Selesnick_SamplingTheorem
- Week 4 (2/21): Image representation using orthonormal transform. DCT and KLT; multi-resolution representation: Pyramid and Wavelet Transforms. Transform-based image coding. Lecture note on transform (updated 2/21/2019), Lecture note on Wavelet (updated 2/21/2019).
- Week 5 (2/28): Sparse-representation based image recovery. General formulation of image enhancement as an optimization problem. Sparsity for regularization. L0 vs. L1 vs. L2 prior. Optimization techniques for solving L2-L1 problems (soft thresholding, ISTA, ADMM). Applications in denoising, deblurring, inpainting, compressive sensing, and super-resolution. Lecture note (updated 2/26/2019).
- Programming assignment 3 (Pyramids and wavelet transforms) (Due 3/18); see the pyramid sketch after the schedule
- Week 6 (3/7): Overview of deep convolutional networks. Applications for image denoising, super resolution, image segmentation, image classification, object detection and classification. Lecture note (updated 3/7/2019)
- Tutorial on using PyTorch and Google Cloud Platform for deep learning (3/11)
- Week 7 (3/14): Convolutional Networks for Image Processing, part 2, part 3 (updated 3/13/2019)
- Programming assignment 4 (Training a U-Net for image segmentation) (Due 4/4)
- 3/18–3/22: Spring Recess
- Week 8 (3/28): Project proposal due. (Prepare the proposal following the format described in the project guideline. By this point you should have read a couple of reference papers, and the proposal should include a detailed milestone chart and a partition of project roles among the team members.)
- Week 8 (3/28): Feature detection (Harris corner, scale space, SIFT), feature descriptors (SIFT). Bag of Visual Words representation for image classification. Lecture note (updated 3/28/2019)
- Week 9 (4/4): Geometric mapping (affine, homography), Feature based camera motion estimation (RANSAC). Image warping. Image registration. Panoramic view stitching. Lecture note (updated 4/2/2019)
- Programming assignment 5 (Due 4/18): Stitching a panoramic picture (feature detection, finding the global mapping, warping, combining); see the homography sketch after the schedule.
- Week 10 (4/11): Dense motion/displacement estimation: optical flow equation, optical flow estimation (Lucas-Kanade method, KLT tracker); block matching, multi-resolution estimation. Deformable registration (medical applications). Deep learning approach. Lecture note (updated 4/11/2019). A Lucas-Kanade tracking sketch appears after the schedule.
- Week 11 (4/18): Moving object detection (background/foreground separation): Robust PCA (low rank + sparse decomposition). Global camera motion estimation from optical flows. Video stabilization. Video scene change detection. Lecture note. (updated 4/18/2019)
- Week 12 (4/25): Exam
- Week 13 (5/2): Stereo and multiview video: depth from disparity, disparity estimation, view synthesis. Depth camera (Kinect). 360 video camera and view stitching. Lecture note (updated 5/2/2019)
- Week 14 (5/9): Video Coding. Part 1: block-based motion compensated prediction and interpolation, adaptive spatial prediction, block-based hybrid video coding, rate-distortion optimized mode selection, rate control, group of pictures (GoP) structure, tradeoff between coding efficiency, delay, and complexity. Lecture note (updated 5/9/2019). Part 2: Overview of video coding standards (AVC/H.264, HEVC/H.265); layered video coding: general concept and H.264/SVC. Multiview video compression. Lecture note (updated 5/9/2019)
- Programming assignment 6 (Due 5/18): Video Coding
- Week 15 (5/16): Project Presentation.
- 5/18: Project Report and all other material must be uploaded.
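The sketches below are minimal, illustrative starting points for some of the programming topics above; file names, kernel sizes, and other parameter choices are placeholders, not required settings.

For Week 2 / computer assignment 2 (2D filtering), a separable Gaussian blur implemented with SciPy's 2D convolution (the sigma and kernel radius are arbitrary examples):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_1d(sigma, radius):
    """Sampled 1D Gaussian, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma=2.0):
    """Separable 2D Gaussian filtering: filter the rows, then the columns."""
    k = gaussian_kernel_1d(sigma, radius=int(3 * sigma))
    tmp = convolve2d(img, k[np.newaxis, :], mode='same', boundary='symm')
    return convolve2d(tmp, k[:, np.newaxis], mode='same', boundary='symm')

# Example on a random test image; a real image array works the same way.
img = np.random.rand(128, 128)
blurred = gaussian_blur(img, sigma=2.0)
```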
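For Week 4 / programming assignment 3 (pyramids and wavelet transforms), a Gaussian/Laplacian pyramid built with OpenCV's pyrDown/pyrUp; the wavelet part of the assignment is not shown, and 'test.png' is a placeholder image file:

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    """Build Gaussian and Laplacian pyramids of a grayscale image."""
    gauss = [img.astype(np.float32)]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))   # blur + downsample by 2
    lap = []
    for i in range(levels):
        up = cv2.pyrUp(gauss[i + 1], dstsize=(gauss[i].shape[1], gauss[i].shape[0]))
        lap.append(gauss[i] - up)              # band-pass detail at level i
    lap.append(gauss[-1])                      # coarsest low-pass residual
    return gauss, lap

def reconstruct(lap):
    """Invert the Laplacian pyramid: upsample and add the details back."""
    img = lap[-1]
    for detail in reversed(lap[:-1]):
        img = cv2.pyrUp(img, dstsize=(detail.shape[1], detail.shape[0])) + detail
    return img

img = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)
gauss, lap = laplacian_pyramid(img, levels=4)
rec = reconstruct(lap)   # should match the input up to rounding
```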
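For Week 9 / programming assignment 5 (panoramic stitching), feature matching and robust homography estimation with RANSAC. ORB features are used here as a stand-in for SIFT, the image file names are placeholders, and blending of the warped images is omitted:

```python
import cv2
import numpy as np

# Read the two views to be aligned (file names are placeholders).
img1 = cv2.imread('view1.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('view2.png', cv2.IMREAD_GRAYSCALE)

# Detect keypoints and descriptors (ORB here; SIFT can be used similarly).
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-checking; keep the best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]

# Estimate the homography robustly with RANSAC from the matched points.
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)

# Warp image 1 into the coordinate frame of image 2 (stitching/blending omitted).
warped = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))
```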
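For Week 10 (dense motion/displacement estimation), pyramidal Lucas-Kanade tracking of corner features between two frames; the frame file names and window/pyramid parameters are illustrative choices:

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (file names are placeholders).
prev = cv2.imread('frame0.png', cv2.IMREAD_GRAYSCALE)
curr = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)

# Pick corner-like points that are well-conditioned for the Lucas-Kanade step.
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: solve the optical flow equation in a window
# around each point, from coarse to fine resolution.
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                           winSize=(21, 21), maxLevel=3)

# Keep points that were tracked successfully and compute their displacements.
good_new = p1[status.ravel() == 1]
good_old = p0[status.ravel() == 1]
flow_vectors = good_new - good_old
print('median displacement (dx, dy):', np.median(flow_vectors, axis=0))
```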
Sample Exams:
- S15_midterm_w_solution
- S15 Final Exam solution
- S16_midterm solution
- S16 final exam solution
- S17 exam solution (updated 4/17/2019)
- S18 exam solution (updated 4/19/2019)
Policy on Academic Dishonesty:
The School of Engineering encourages academic excellence in an environment that promotes honesty, integrity, and fairness. Please see the policy on academic dishonesty: Link