Author: vl1019

“PCEN: Why and How” published in IEEE SPL

January 16, 2019 / vl1019

We are happy to announce that our article: “Per-channel energy normalization: Why and How” is featured in the latest issue of IEEE Signal Processing Letters.

In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of mel-frequency spectrogram (logmelspec) as an acoustic frontend. This article investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, both from theoretical and practical standpoints. First, we apply PCEN on various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Secondly, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Thirdly, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. As it converts a large class of real-world soundscapes into additive white Gaussian noise (AWGN), PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.

Per-channel energy normalization: Why and How
V. Lostanlen, J. Salamon, M. Cartwright, B. McFee, A. Farnsworth, S. Kelling, and J. P. Bello
In IEEE Signal Processing Letters, vol. 26, no. 1, pp. 39-43, January 2019.
[PDF][IEEE][Companion website][BibTeX][Copyright]

Below is a plot from our paper comparing the application of log vs PCEN on a mel-spectrogram computed from an audio recording captured by a remote acoustic sensor for avian flight call detection (as part of our BirdVox project). In the top plot (log) we clearly see energy from undesired noise sources such as insects and a car, whereas in the bottom plot (PCEN) we see these confounding factors have been attenuated, while the flight calls we wish to detect (which appear as very short chirps) are kept.
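For readers who want to see the three components named in the abstract (temporal integration, gain control, and dynamic range compression) side by side, here is a minimal sketch in plain Python. It follows the commonly used PCEN formulation; the function name, the parameter values, and the choice to initialize the smoother with the first frame are illustrative assumptions, not settings taken from the paper.

```python
def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to E, a list of frames (each a list of non-negative
    mel-band energies). All default parameter values are illustrative."""
    out = []
    # Assumption: the smoother starts at the first frame's energies.
    M = list(E[0])
    for frame in E:
        # Temporal integration: first-order IIR low-pass filter per band.
        M = [(1.0 - s) * m + s * e for m, e in zip(M, frame)]
        # Gain control: divide by the smoothed energy raised to alpha.
        # Dynamic range compression: add delta, raise to r, subtract delta**r.
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, M)
        ])
    return out
```

With alpha close to 1, a band carrying stationary noise is mapped to roughly the same value whatever its loudness, whereas a sudden onset such as a flight call stands out because the smoother has not yet caught up with it.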

Elizabeth Mendoza joins BirdVox

July 23, 2018 / vl1019

We are delighted to announce that Elizabeth Mendoza, a junior at Forest Hills High School (in the New York City borough of Queens), is joining BirdVox for a one-month research internship under the mentorship of Dr. Vincent Lostanlen, as part of the ARISE program at the NYU Tandon School of Engineering. Below is her research proposal.

ARISE (Applied Research Innovations in Science and Engineering) is an intensive program for academically strong, current 10th and 11th grade New York City students with a demonstrated interest in science, technology, engineering and math. More information on ARISE can be found at this address.

Synthesizing training data for automatic detection and classification of bird songs

Annual variations in the migratory routes of passerines are among the predominant markers of ecological disruption at temperate latitudes. Yet, although it is well established that migratory birds face an ever-increasing number of threats (including habitat loss, invasive species, and collisions with buildings or vehicles), little is known about the respective risk factors influencing the abundance of a given species at a fine spatiotemporal resolution. In this context, the deployment of an acoustic sensor network of autonomous recording units (ARU) offers an interesting trade-off between a relatively low cost and a highly informative output. However, despite the growing interest in bioacoustic analysis in avian ecology, the scalability of ARU deployment is currently hampered by the shortage of human experts who are trained to pinpoint and identify bird vocalizations in continuous audio recordings. Closing the discrepancy between the cost of hardware ($1k/year) and the cost of human labor ($1M/year) is therefore crucial to achieving the long-term goal of deploying an acoustic sensor network for bird migration monitoring at the continental scale. One way to reduce this annotation overhead is to replace human experts with software. As the past years have witnessed a relative democratization of high-performance computing (HPC), it has become possible to design more ambitious software architectures, notably deep learning models, for large-scale automated species classification of bird songs and calls.

The main contribution of this ARISE internship is to address the lack of diversity in training data in the context of avian flight call detection in audio. To this end, the intern will synthesize artificial sound recordings containing bird calls, each accompanied by a computer-generated annotation. The release of these synthetic recordings to the international research community could enable the deployment of larger deep learning models while avoiding statistical overfitting, by virtue of a source of training data that is virtually infinite.
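To give a rough idea of what "a synthetic recording alongside a computer-generated annotation" can look like, here is a toy sketch in plain Python. It is not the method used in the internship: the chirp model, the white-noise background, and every name and parameter value below are hypothetical choices made purely for illustration.

```python
import math
import random


def synth_chirp(sr, duration=0.05, f0=3000.0, f1=6000.0):
    """Hann-windowed linear chirp standing in for a flight call (toy model)."""
    n = int(sr * duration)
    samples = []
    for i in range(n):
        t = i / sr
        # Phase of a linear frequency sweep from f0 to f1 over `duration`.
        phase = 2.0 * math.pi * (f0 * t + 0.5 * (f1 - f0) * t * t / duration)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))
        samples.append(window * math.sin(phase))
    return samples


def synth_recording(sr=22050, duration=2.0, n_calls=3, noise_level=0.05, seed=0):
    """Return (signal, annotation), where annotation lists the
    (start, end) time of each inserted call, in seconds."""
    rng = random.Random(seed)
    n = int(sr * duration)
    # Background: white Gaussian noise (a crude stand-in for a soundscape).
    signal = [rng.gauss(0.0, noise_level) for _ in range(n)]
    chirp = synth_chirp(sr)
    onsets = sorted(rng.randrange(0, n - len(chirp)) for _ in range(n_calls))
    for onset in onsets:
        for i, value in enumerate(chirp):
            signal[onset + i] += value
    annotation = [(onset / sr, (onset + len(chirp)) / sr) for onset in onsets]
    return signal, annotation
```

Because the call times are chosen by the program, the annotation is exact by construction, and changing the seed yields as many distinct recordings as needed.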

Interview of BirdVox for NYU Scienceline podcast

June 24, 2018 / vl1019

Brianna Abbott, a graduate student in the Science, Health, and Environmental Reporting Program at New York University, interviewed Andrew Farnsworth and Vincent Lostanlen about their research as part of the BirdVox project. Her podcast is published by Scienceline, the online science journalism outlet of the Arthur L. Carter Journalism Institute.

It’s a bird! It’s a plane! No, wait, can you hear that? It actually is a bird. Keeping tabs on our feathered friends during migration is vital for conservation efforts, though dark skies and massive amounts of data make it tricky to do so. But individual species of birds talk with each other through flight calls, so we can listen in to determine exactly which species are flying overhead. And now, researchers are developing a machine learning system — dubbed BirdVox — that automatically picks out and identifies the different calls. In this podcast, creators of BirdVox lay out how they cut through the noise to get to the birds.
— Brianna Abbott, June 2018

http://scienceline.org/2018/06/monitoring-bird-communications-birdvox/

Kendra Oudyk joins BirdVox

June 18, 2018 / vl1019

We are delighted to announce that Kendra Oudyk from the University of Jyväskylä (Finland) is joining BirdVox for a research internship. She is developing new computational tools for understanding how humans imitate bird songs.

Below is her research proposal and biography.

What was that bird? Birdsong query-by-humming using asymmetric set inclusion of pitch-curve segments

The purpose of this project is to create a query-by-humming system for birdsong. Such a system would take a human imitation of birdsong as input, and output likely species classifications, as well as retrieved bird audio recordings that resemble the query.

This presents a unique methodological situation for two reasons:

many methods for birdsong classification may not be applicable because they rely on spectral features that may not be imitable by humans; and
alternatively, many methods for music query-by-humming may not be ideal because birdsong query by humming involves classifying a species rather than a particular song, and birdsong may vary both between and within individual birds of a species.

Therefore, this project will test a novel methodology for query-by-humming; the proposed method involves asymmetric set inclusion of query pitch-curve segments in the set of birdsong pitch-curve segments for each species in the system. This proof-of-concept research may have applications for creating a birdsong query-by-humming tool for everyday users, and additionally it may further our understanding of how humans imitate birdsong.
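As a rough illustration of what asymmetric set inclusion could mean in practice, the sketch below scores a species by the fraction of query segments that find a close match among that species' segments. This is only one possible reading of the proposal: the segment representation (fixed-length tuples of pitch values), the distance, the tolerance, and all names are hypothetical.

```python
def segment_distance(a, b):
    """Mean absolute pitch difference between two equal-length segments."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def inclusion_score(query_segments, species_segments, tol=1.0):
    """Fraction of query segments that have a match in species_segments.

    The score is asymmetric: species segments that are absent from the
    query do not lower it, so a partial imitation can still score 1.0.
    """
    if not query_segments:
        return 0.0
    matched = sum(
        1
        for q in query_segments
        if any(
            len(q) == len(s) and segment_distance(q, s) <= tol
            for s in species_segments
        )
    )
    return matched / len(query_segments)


def rank_species(query_segments, library, tol=1.0):
    """Sort species names in `library` by decreasing inclusion score."""
    scores = {
        name: inclusion_score(query_segments, segments, tol)
        for name, segments in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Swapping the two arguments of inclusion_score generally gives a different value, which is the point: a human imitation is expected to cover only part of a species' repertoire, not the other way around.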

Kendra is in the second and final year of the Music, Mind, and Technology Master's Degree Programme at the University of Jyväskylä in Finland, where she is also completing a minor in Cognitive Neuroscience. For her master's thesis, she used functional Magnetic Resonance Imaging to investigate how personality modulates brain responses to emotion in music, under the supervision of Drs. Iballa Burunat, Elvira Brattico, and Petri Toiviainen. She received funding from the European Commission to work on this project during the summer of 2017 at the Center for Music in the Brain at Aarhus University in Denmark. Previously, Kendra completed her undergraduate studies in Music Cognition as well as a Diploma in Music Performance (piano) at McMaster University in Canada. At McMaster, she received two Undergraduate Student Research Awards to investigate choral-conducting gestures using three-dimensional motion-capture technology, under the supervision of Drs. Steven Livingstone and Rachel Rensink-Hoff. Additionally, she has worked as a research assistant, teaching assistant, private piano teacher, and leader of wilderness camping trips. Kendra will begin doctoral studies in September at McGill University in the Integrated Program in Neuroscience’s Rotation Program.

New publication in ICASSP 2018: BirdVox-full-night

May 16, 2018 / vl1019

We have recently released BirdVox-full-night, a challenging new dataset for machine learning on bioacoustic data.

Details about the dataset and the models we benchmarked are provided in our ICASSP 2018 paper:

BirdVox-full-night: a dataset and benchmark for avian flight call detection
V. Lostanlen, J. Salamon, J. P. Bello, A. Farnsworth, and S. Kelling
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, April 2018.
[PDF][Poster][Companion website][BibTeX][Copyright]

This article addresses the automatic detection of vocal, nocturnally migrating birds from a network of acoustic sensors. Thus far, owing to the lack of annotated continuous recordings, existing methods had been benchmarked in a binary classification setting (presence vs. absence). Instead, with the aim of comparing them in event detection, we release BirdVox-full-night, a dataset of 62 hours of audio comprising 35402 flight calls of nocturnally migrating birds, as recorded from 6 sensors. We find a large performance gap between energy-based detection functions and data-driven machine listening. The best model is a deep convolutional neural network trained with data augmentation. We correlate recall with the density of flight calls over time and frequency and identify the main causes of false alarms.

You can download the dataset after filling in the form on the companion website of the paper: https://wp.nyu.edu/birdvox/birdvox-full-night/
