CS-GY 9223 Selected Topics in Computer Science / EL-GY 9173 Selected Topics in Signal Processing: Audio Content Analysis
Meetings (Spring 2018): Wednesdays 12.25 – 2.55pm, Room 774, Jacobs Academic Building.
Office hours: Tuesdays 2-5pm, Room 626, 6th floor, 35 W 4th street
__________________________________________________________________________________________
Overview: The field of audio content analysis groups systems and techniques intended for the automatic analysis and understanding of sounds, in other words, the development of “listening machines”. It combines knowledge and methods from a variety of disciplines including signal processing, machine learning, psychoacoustics, cognition, and music. Example problems include determining the instrumentation, key or genre of a music track, classifying species in wildlife field recordings, or recognizing speech in noisy environments. This course will introduce students to a variety of techniques for the computer-based analysis of audio signals, with a focus on music and environmental sounds. In the process, the students will gain the ability to review, understand and implement computer audition approaches and explore their application to real-world problems.
Goals: Students will undergo advanced training in techniques for audio content analysis. They will read and understand the literature describing state-of-the-art methods and gain hands-on experience with the implementation and application of standard and advanced techniques by means of programming assignments and a final project. The knowledge they acquire will be relevant for future careers in fields as diverse as multimedia analysis, processing and distribution, and the development of context-aware technologies for robotics, mobile applications, defense and surveillance, and hearing aids.
Pre-requisites: For CSE students: undergrad-level multivariate calculus and linear algebra; for ECE students: EL6113 Signals, Systems and Transforms.
__________________________________________________________________________________________
Calendar and Lecture notes:
Lecture notes will be added and/or updated (as PDF files) as the course progresses, sometimes just before the corresponding lecture. Dates are tentative and subject to change.
- 01.24 Introduction / Time-Frequency Representations
- 01.31 Time-frequency representations (cont’d)
- 02.07 Novelty: onset detection
- 02.14 Novelty: onset detection (cont’d)
- 02.21 Low-level features: timbre analysis
- 02.28 Low-level features: timbre analysis (cont’d)
- 03.07 Periodicity: pitch detection
- 03.14 Spring Break
- 03.21 Periodicity: pitch and beat tracking (cont’d)
- 03.28 Mid-Term Exam
- 04.04 Pitch distribution: chroma features
- 04.11 Pitch distribution: chord and key recognition (cont’d)
- 04.18 Sound classification: genre, artist and instrument ID
- 04.25 Sound classification: genre, artist and instrument ID (cont’d)
- 05.02 Content/project review
- 05.09 Project presentation
The Instructor will provide individual guidance during office hours and by email.
Additionally, a tutor will be available to assist with class content and other practical issues:
Vincent Lostanlen (vl1019@nyu.edu), Fridays 10am-12pm, 13th floor, 370 Jay Street
____________________________________________________________________________________________
Assessment:
- Assignments: 40% (10% each) — see details below.
- Project: 30% — see details below
  - Project Proposal: 5%
  - Final project report + presentation: 25%
- Mid-term Exam: 30% (choose 3 out of 4 questions)
- Class Participation: extra points (discussions, questions, attendance, interest and enthusiasm)
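As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course grade. The 0-100 score scale, the function name `course_grade`, and the example scores are assumptions made for illustration only; extra points for class participation are left out because their size is not specified.

```python
# Weights taken from the assessment list above.
# Assumption: every component is scored on a 0-100 scale.
WEIGHTS = {
    "assignments": 0.40,    # four assignments, 10% each
    "proposal": 0.05,       # project proposal
    "final_project": 0.25,  # final report + presentation
    "midterm": 0.30,
}


def course_grade(assignments, proposal, final_project, midterm):
    """Weighted course grade, before any extra participation points."""
    if len(assignments) != 4:
        raise ValueError("expected exactly 4 assignment scores")
    assignment_avg = sum(assignments) / len(assignments)
    return (
        WEIGHTS["assignments"] * assignment_avg
        + WEIGHTS["proposal"] * proposal
        + WEIGHTS["final_project"] * final_project
        + WEIGHTS["midterm"] * midterm
    )


# Hypothetical scores, for illustration only.
example = course_grade([90, 80, 100, 70], proposal=100, final_project=80, midterm=75)
print(round(example, 2))
```

With these made-up scores the assignments contribute 34 points, the project 25 and the mid-term 22.5.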
____________________________________________________________________________________________
Assignments:
There are 4 assignments to be distributed during the semester. Please read carefully and follow these instructions:
- All assignments consist of two parts: (A) Implementation and (B) Analysis.
- All submissions consist of a single zip file of the folder named YourLastNameAssignment# (e.g. Bello1.zip or Bello3.zip).
- The folder should include the code implementation, following the exact naming and I/O conventions of Part A of the assignment, and a PDF addressing the questions in Part B.
- Submissions that fail to follow these conventions will not be accepted as valid.
- Clearly indicate your name in the body of every file (at the top of your PDF document, and as a comment in your code).
- All submissions must be uploaded to the course’s NYU Classes page before 11.55pm on the due date. A penalty of 0.5 pts will be applied for every hour of delay until assignments are no longer accepted at 7.55pm of the following day.
Keep checking this space regularly for assignments and due dates.
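The naming convention and late-penalty rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that every started hour of delay counts as a full hour, which the instructions do not state explicitly. The 20-hour window is the time between 11.55pm on the due date and 7.55pm the following day.

```python
import math

PENALTY_PER_HOUR = 0.5   # points per hour of delay, from the instructions
LATE_WINDOW_HOURS = 20   # 11.55pm on the due date to 7.55pm the next day


def submission_name(last_name, assignment_number):
    """Zip file name following the YourLastNameAssignment# pattern."""
    if assignment_number not in (1, 2, 3, 4):
        raise ValueError("there are 4 assignments, numbered 1 to 4")
    return f"{last_name}{assignment_number}.zip"


def late_penalty(hours_late):
    """Penalty in points, or None once submissions are no longer accepted.

    Assumption: every started hour counts as a full hour of delay.
    """
    if hours_late <= 0:
        return 0.0
    if hours_late > LATE_WINDOW_HOURS:
        return None
    return PENALTY_PER_HOUR * math.ceil(hours_late)


print(submission_name("Bello", 1))
print(late_penalty(2.5))
```

Under that rounding assumption, a submission 2.5 hours late loses 1.5 pts, and the penalty reaches 10 pts at the end of the 20-hour window.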
____________________________________________________________________________________________
Projects and presentations:
- Projects are done in groups of 2 students each.
- The project consists of proposing and implementing a solution to a selected problem in music, speech or environmental sound analysis (preferably in MATLAB or Python); writing a report discussing the specifics of the problem, the approach taken, and its results; and demonstrating your implementation and presenting the highlights of your work to the class.
- Projects should go beyond the materials covered in the assignments and include a combination of signal processing and machine learning components and challenges. They do not need to be new or original, i.e. they can be a reimplementation of existing work, but all code and results must be your own.
- Students should select a group and topic early in the semester, and attend office hours to seek guidance and advice about topic selection and the execution of the project.
Important dates:
04.06 Project Proposals (5%, 4 pages or less): this document should include a project title and clearly explain the proposal by introducing: context, problem, proposed algorithm(s), evaluation method (and data to be used), brief work-plan and at least 5 bibliographical references that are directly relevant. The document should also name the group members and briefly discuss how the work will be divided amongst them. Teams should identify an existing experimental data set for their project by the time of the proposal.
05.09 Project demonstration and final report (25%, both a written report of no more than 8 pages and source code should be submitted).
Instructions for Final Project Submission and Presentation (PLEASE READ CAREFULLY):
- Presentation slots are strictly 10 minutes long: 7 minutes for the presentation and 3 minutes for questions. This time includes changeover time (the time it takes you to set up everything). You should bring the presentation on a laptop and try the projector and sound in the classroom beforehand. This is especially important for those of you who want to use any additional piece of equipment for the demonstration.
- Attendance and participation in all presentations is mandatory (ask questions, make comments).
- The final project report should not exceed 8 pages and should be written like a conference paper. Structure should be more or less as follows: introduction (including motivation), theoretical background, your approach and its implementation, experimental part (including a detailed explanation of the evaluation method and the data used), discussion, conclusions and future work.
- Students are encouraged to develop their project using MATLAB or Python, but are welcome to use other languages as they see fit.
- On the project due date you will be expected to demonstrate your software application.
____________________________________________________________________________________________
Recommended Books:
- Virtanen, T., Plumbley, M., and Ellis, D. (Eds.) “Computational Analysis of Sound Scenes and Events”. Springer (2018)
- Lerch, A. “An Introduction to Audio Content Analysis”. John Wiley & Sons (2012)
- Müller, M. “Fundamentals of Music Processing: Audio, Analysis, Algorithms and Applications”. Springer (2015)
- Klapuri, A. and Davy, M. (Eds.) “Signal Processing Methods for Music Transcription”. Springer (2006)
- Smith, J.O. “Mathematics of the Discrete Fourier Transform (DFT)”. 2nd Edition, W3K Publishing (2007)
- Witten, I. and Frank, E. “Data Mining: Practical Machine Learning Tools and Techniques”. Morgan Kaufmann (2005)
- Further reading will be recommended as the course progresses (see last slide of every lecture).
____________________________________________________________________________________________
Tools:
- Librosa: https://github.com/librosa/librosa
- scikit-learn: http://scikit-learn.org/stable/
- MATLAB documentation, tutorials, examples: http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
- Signal Processing Toolbox documentation, tutorials, examples: http://www.mathworks.com/access/helpdesk/help/toolbox/signal/
- MATLAB array manipulation tips and tricks by Peter J. Acklam
- Data Mining Software: http://www.cs.waikato.ac.nz/ml/weka/
- Sonic Visualiser: http://www.sonicvisualiser.org/
Research Resources:
- IEEE: https://2018.ieeeicassp.org/ , http://www.waspaa.com/ , http://www.asru2013.org/ , http://www.signalprocessingsociety.org/technical-committees/list/audio-tc/ , http://www.signalprocessingsociety.org/publications/periodicals/
- ISCA: http://www.isca-speech.org/ , http://www.interspeech2013.org/ , http://www.journals.elsevier.com/speech-communication
- AES: http://www.aes.org/events/conventions/ , http://www.aes.org/events/conferences/ , http://www.aes.org/journal/
- ASA: http://acousticalsociety.org/meetings , http://asadl.org/jasa/
- EURASIP: http://www.eurasip.org/index.php , http://www.eusipco2013.org/
- ISMIR: http://www.ismir.net/ , http://www.ismir.net/all-papers.html