BirdVox

Machine Listening for Bird Migration Monitoring


“PCEN: Why and How” published in IEEE SPL

We are happy to announce that our article, “Per-Channel Energy Normalization: Why and How,” is featured in the latest issue of IEEE Signal Processing Letters.

In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of the mel-frequency spectrogram (logmelspec) as an acoustic frontend. This article investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, both from theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. As it converts a large class of real-world soundscapes into additive white Gaussian noise (AWGN), PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.

Per-Channel Energy Normalization: Why and How
V. Lostanlen, J. Salamon, M. Cartwright, B. McFee, A. Farnsworth, S. Kelling, and J. P. Bello
In IEEE Signal Processing Letters, vol. 26, no. 1, pp. 39-43, January 2019.
[PDF][IEEE][Companion website][BibTeX][Copyright]

Below is a plot from our paper comparing the application of the logarithm vs. PCEN to a mel-spectrogram computed from an audio recording captured by a remote acoustic sensor for avian flight call detection (as part of our BirdVox project). In the top plot (log) we clearly see energy from undesired noise sources such as insects and a car, whereas in the bottom plot (PCEN) these confounding factors have been attenuated, while the flight calls we wish to detect (which appear as very short chirps) are preserved.

A soundscape comprising bird calls, insect stridulations, and a passing vehicle, as recorded from an omnidirectional acoustic sensor. The logarithmic transformation of the mel-frequency spectrogram (a) maps all magnitudes to a decibel-like scale, whereas per-channel energy normalization (b) enhances transient events (bird calls) while discarding stationary noise (insects) as well as slow changes in loudness (vehicle). Data provided by BirdVox.
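For readers who want to experiment, below is a minimal NumPy sketch of the PCEN recursion described in the paper. The parameter names (smoothing coefficient `s`, gain `alpha`, bias `delta`, exponent `r`, `eps`) follow common usage, and the default values are illustrative rather than the paper's tuned settings.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a magnitude spectrogram.

    E: array of shape (n_bands, n_frames), non-negative magnitudes.
    M is a first-order IIR low-pass of E along time (temporal integration);
    dividing by (eps + M)**alpha applies adaptive gain control, and the
    (x + delta)**r - delta**r step applies dynamic range compression.
    """
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    gain = (eps + M) ** (-alpha)
    return (E * gain + delta) ** r - delta ** r
```

Because stationary energy satisfies E ≈ M, it is flattened by the division, while transients (E ≫ M) are enhanced before M catches up; this is what attenuates the insects and the car while keeping the flight calls. An optimized implementation is available as `librosa.pcen`.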

 

 

Elizabeth Mendoza joins BirdVox

We are delighted to announce that Elizabeth Mendoza, a junior at Forest Hills High School (in the New York City borough of Queens), is joining BirdVox for a one-month research internship under the mentorship of Dr. Vincent Lostanlen, as part of the ARISE program at the NYU Tandon School of Engineering. Below is her research proposal.

ARISE (Applied Research Innovations in Science and Engineering) is an intensive program for academically strong, current 10th- and 11th-grade New York City students with a demonstrated interest in science, technology, engineering, and math. More information on ARISE can be found at this address.

Mendoza and Lostanlen working at the whiteboard

Synthesizing training data for automatic detection and classification of bird songs

Annual variations in the migratory routes of passerines are among the predominant markers of ecological disruption at temperate latitudes. Yet, although it is well established that migratory birds face an ever-increasing number of threats — including habitat loss, invasive species, and collisions with buildings or vehicles — little is known about the respective risk factors influencing the abundance of a given species at a fine spatiotemporal resolution. In this context, the deployment of an acoustic sensor network of autonomous recording units (ARUs) offers an interesting trade-off between a relatively low cost and a highly informative output. Yet, despite the growing interest in bioacoustic analysis in avian ecology, the scalability of ARU deployment is currently hampered by the shortage of human experts who are trained to pinpoint and identify bird vocalizations in continuous audio recordings. Closing the discrepancy between the cost of hardware ($1k/year) and the cost of human labor ($1M/year) is therefore crucial to achieving the long-term goal of deploying an acoustic sensor network for bird migration monitoring at the continental scale. One way to reduce this annotation overhead is to replace human experts with software. As the past years have witnessed a relative democratization of high-performance computing (HPC), it has become possible to design more ambitious software architectures, notably deep learning, for large-scale automated species classification of bird songs and calls. The main contribution of this ARISE internship is to address the lack of diversity in training data in the context of avian flight call detection in audio. To this end, the intern will synthesize artificial sound recordings containing bird calls, alongside computer-generated annotations.
The release of these synthetic recordings to the international research community could enable the deployment of larger deep learning models while avoiding statistical overfitting, by virtue of a source of training data that is virtually infinite.

Interview of BirdVox for NYU Scienceline podcast

Brianna Abbott, a graduate student in the Science, Health, and Environmental Reporting Program at New York University, interviewed Andrew Farnsworth and Vincent Lostanlen to discuss their research as part of the BirdVox project. Her podcast is published by Scienceline, the online science journalism publication of NYU's Arthur L. Carter Journalism Institute.

 

It’s a bird! It’s a plane! No, wait, can you hear that? It actually is a bird. Keeping tabs on our feathered friends during migration is vital for conservation efforts, though dark skies and massive amounts of data make it tricky to do so. But individual species of birds talk with each other through flight calls, so we can listen in to determine exactly which species are flying overhead. And now, researchers are developing a machine learning system — dubbed BirdVox — that automatically picks out and identifies the different calls. In this podcast, creators of BirdVox lay out how they cut through the noise to get to the birds.
— Brianna Abbott, June 2018

http://scienceline.org/2018/06/monitoring-bird-communications-birdvox/

Kendra Oudyk joins BirdVox

We are delighted to announce that Kendra Oudyk from the University of Jyväskylä (Finland) is joining BirdVox for a research internship. She is working on developing new computational tools for understanding how humans imitate bird songs.

Below is her research proposal and biography.

 

What was that bird? Birdsong query-by-humming using asymmetric set inclusion of pitch-curve segments 

The purpose of this project is to create a query-by-humming system for birdsong. Such a system would take a human imitation of birdsong as input, and output likely species classifications, as well as retrieved bird audio recordings that resemble the query.
This presents a unique methodological situation for two reasons:
  1. many methods for birdsong classification may not be applicable because they rely on spectral features that may not be imitable by humans; and
  2. alternatively, many methods for music query-by-humming may not be ideal because birdsong query by humming involves classifying a species rather than a particular song, and birdsong may vary both between and within individual birds of a species.
Therefore, this project will test a novel methodology for query-by-humming; the proposed method involves asymmetric set inclusion of query pitch-curve segments in the set of birdsong pitch-curve segments for each species in the system. This proof-of-concept research may have applications for creating a birdsong query-by-humming tool for everyday users, and additionally it may further our understanding of how humans imitate birdsong.
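To make the proposed matching criterion concrete, here is a hypothetical sketch of asymmetric set inclusion over pitch-curve segments. All names (`segment_distance`, `inclusion_score`, the threshold `tau`, the mean-removal step) are illustrative assumptions, not the intern's actual method: the score asks only how much of the *query* is covered by a species' segment set, not the reverse.

```python
import numpy as np

def segment_distance(a, b):
    """RMS distance between two equal-length pitch-curve segments,
    after removing each segment's mean (transposition invariance,
    since imitators rarely hum at the bird's absolute pitch)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return np.sqrt(np.mean((a - b) ** 2))

def inclusion_score(query_segments, species_segments, tau=1.0):
    """Asymmetric set inclusion: fraction of query segments that have
    at least one close match (distance < tau) in the species' set.
    Species segments absent from the query incur no penalty, because
    an imitator need only reproduce part of a bird's repertoire."""
    matched = sum(
        any(segment_distance(q, s) < tau for s in species_segments)
        for q in query_segments
    )
    return matched / len(query_segments)
```

Ranking species by this score (highest first) would yield the "likely species classifications" the proposal describes; a real system would also need to segment the hummed query into pitch curves, e.g. with a pitch tracker.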

 

Kendra is in the second and final year of the Music, Mind, and Technology Masters Degree Program at the University of Jyväskylä in Finland, where she is also completing a minor in Cognitive Neuroscience. For her master's thesis, she used functional Magnetic Resonance Imaging to investigate how personality modulates brain responses to emotion in music, under the supervision of Drs. Iballa Burunat, Elvira Brattico, and Petri Toiviainen. She received funding from the European Commission to work on this project during the summer of 2017 at the Center for Music in the Brain at Aarhus University in Denmark. Previously, Kendra completed her undergraduate studies in Music Cognition as well as a Diploma in Music Performance (piano) at McMaster University in Canada. At McMaster, she received two Undergraduate Student Research Awards to investigate choral-conducting gestures using three-dimensional motion-capture technology, under the supervision of Drs. Steven Livingstone and Rachel Rensink-Hoff. Additionally, she has worked as a research assistant, teaching assistant, private piano teacher, and leader of wilderness camping trips. Kendra will begin doctoral studies in September at McGill University in the Integrated Program in Neuroscience's Rotation Program.

New publication in ICASSP 2018: BirdVox-full-night

We have recently released BirdVox-full-night, a new, challenging dataset for machine learning on bioacoustic data.

Details about the dataset and the models we benchmarked are provided in our ICASSP 2018 paper:

BirdVox-full-night: a dataset and benchmark for avian flight call detection

V. Lostanlen, J. Salamon, J. P. Bello, A. Farnsworth, and S. Kelling
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, April 2018.
[PDF][Poster][Companion website][BibTeX][Copyright]

This article addresses the automatic detection of vocal, nocturnally migrating birds from a network of acoustic sensors. Thus far, owing to the lack of annotated continuous recordings, existing methods had been benchmarked in a binary classification setting (presence vs. absence). Instead, with the aim of comparing them in event detection, we release BirdVox-full-night, a dataset of 62 hours of audio comprising 35,402 flight calls of nocturnally migrating birds, as recorded from 6 sensors. We find a large performance gap between energy-based detection functions and data-driven machine listening. The best model is a deep convolutional neural network trained with data augmentation. We correlate recall with the density of flight calls over time and frequency, and identify the main causes of false alarm.
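The abstract contrasts energy-based detection functions with learned models. For readers unfamiliar with the former, the sketch below is a generic illustration of that family of baselines, not the paper's exact implementation: frame-wise energy is computed, and frames exceeding an adaptive threshold (here, an assumed median-plus-MAD rule) are flagged as candidate events.

```python
import numpy as np

def energy_detection_function(x, frame_length=256, hop_length=128):
    """Short-term log-energy of a mono signal, one value per frame."""
    n_frames = 1 + (len(x) - frame_length) // hop_length
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop_length : i * hop_length + frame_length]
        energies[i] = np.log1p(np.sum(frame ** 2))
    return energies

def detect_events(x, threshold=None, **kwargs):
    """Indices of frames whose energy exceeds an adaptive threshold
    (median plus two median absolute deviations by default)."""
    e = energy_detection_function(x, **kwargs)
    if threshold is None:
        threshold = np.median(e) + 2.0 * np.median(np.abs(e - np.median(e)))
    return np.flatnonzero(e > threshold)
```

Such detectors fire on any loud transient, bird or not, which is one reason the paper finds a large performance gap in favor of data-driven models under real-world noise.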

You can download the dataset after filling in the form on the companion website of the paper: https://wp.nyu.edu/birdvox/birdvox-full-night/

New publication in ICASSP 2017: Fusing Shallow and Deep Learning

Following on the heels of the PLOS ONE article, the second BirdVox publication will be presented at the ICASSP 2017 conference:

Fusing Shallow and Deep Learning for Bioacoustic Bird Species Classification
J. Salamon, J. P. Bello, A. Farnsworth and S. Kelling
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, March 2017.
[PDF][Copyright]

Abstract:

Automated classification of organisms to species based on their vocalizations would contribute tremendously to our ability to monitor biodiversity, with a wide range of applications in the field of ecology. In particular, automated classification of migrating birds' flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we explore state-of-the-art classification techniques for large-vocabulary bird species classification from flight calls. In particular, we contrast a "shallow learning" approach based on unsupervised dictionary learning with a deep convolutional neural network combined with data augmentation. We show that the two models perform comparably on a dataset of 5,428 flight calls spanning 43 different species, with both significantly outperforming an MFCC baseline. Finally, we show that by combining the models using a simple late-fusion approach we can further improve the results, obtaining a state-of-the-art classification accuracy of 0.96.
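The "simple late-fusion approach" mentioned above can be illustrated by averaging the class-probability outputs of the two models; the paper does not publish its fusion code, so the weighting scheme and function name below are assumptions for exposition.

```python
import numpy as np

def late_fusion(prob_shallow, prob_deep, weight=0.5):
    """Late fusion of two classifiers' per-class probabilities.

    Each input has shape (n_clips, n_species), rows summing to 1.
    Each model runs independently and only their outputs are
    combined, which is what makes the fusion 'late'.
    Returns the fused probabilities and the predicted class per clip.
    """
    fused = weight * prob_shallow + (1.0 - weight) * prob_deep
    return fused, np.argmax(fused, axis=1)
```

The appeal of late fusion is that the two models' errors need not be correlated: when the shallow and deep models disagree, averaging their confidences can recover the correct species even though neither architecture is modified.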

New publication in PLOS ONE

The first study to come out of the BirdVox project has just been published in PLOS ONE:

Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring
J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling
PLOS ONE 11(11): e0166866, 2016. doi: 10.1371/journal.pone.0166866.
[PLOS ONE][PDF][BibTeX]

Abstract:

Automatic classification of animal vocalizations has great potential to enhance the monitoring of species movements and behaviors. This is particularly true for monitoring nocturnal bird migration, where automated classification of migrants’ flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we investigate the automatic classification of bird species from flight calls, and in particular the relationship between two different problem formulations commonly found in the literature: classifying a short clip containing one of a fixed set of known species (N-class problem) and the continuous monitoring problem, the latter of which is relevant to migration monitoring. We implemented a state-of-the-art audio classification model based on unsupervised feature learning and evaluated it on three novel datasets, one for studying the N-class problem including over 5000 flight calls from 43 different species, and two realistic datasets for studying the monitoring scenario comprising hundreds of thousands of audio clips that were compiled by means of remote acoustic sensors deployed in the field during two migration seasons. We show that the model achieves high accuracy when classifying a clip to one of N known species, even for a large number of species. In contrast, the model does not perform as well in the continuous monitoring case. Through a detailed error analysis (that included full expert review of false positives and negatives) we show the model is confounded by varying background noise conditions and previously unseen vocalizations. We also show that the model needs to be parameterized and benchmarked differently for the continuous monitoring scenario. Finally, we show that despite the reduced performance, given the right conditions the model can still characterize the migration pattern of a specific species. The paper concludes with directions for future research.

BirdVox awarded grant from the National Science Foundation (NSF)

BirdVox has been awarded a $1.5 million grant from the National Science Foundation's Big Data program for the project BirdVox: Automatic Bird Species Identification from Flight Calls, conducted jointly by NYU and the Cornell Lab of Ornithology (CLO), which leads the project.

Further information is provided in the NYU press release.

Collecting reliable, real-time data on the migratory patterns of birds can help foster more effective conservation practices, and – when correlated with other data – provide insight into important environmental phenomena. Scientists at CLO currently rely on information from weather surveillance radar, as well as reporting data from over 400,000 active birdwatchers, one of the largest and longest-standing citizen science networks in existence. However, there are important gaps in this information since radar imaging cannot differentiate between species, and most birds migrate at night, unobserved by citizen scientists. The combination of acoustic sensing and machine listening in this project addresses these shortcomings, providing valuable species-specific data that can help biologists complete the bird migration puzzle.

BirdVox is hiring!

The Music Technology program of New York University is accepting applications for at least 4 fully-funded PhD fellowships to start in Fall 2017. Fellowships are for 4 years and include full tuition remission, health insurance, and a yearly stipend. Accepted candidates will join the Music and Audio Research Laboratory (MARL), a multidisciplinary team of scholars and practitioners working at the intersection of sound, music, science, and technology, and will work on a variety of projects including recently-funded initiatives such as the NYU Holodeck, SONYC, and BirdVox.

For further details please see the call for applications.

Juan Pablo Bello talks BirdVox on Science Friday

On Friday, June 24th, the popular Science Friday radio show featured a segment about the BirdVox project. The segment included a live interview with Juan Pablo Bello, as well as sound bites from Andrew Farnsworth and Justin Salamon.

You can listen to the segment here.

UPDATE: PRI has published a follow-up article about BirdVox.

BirdVox Science Friday

From right to left: Science Friday director Charles Bergquist and BirdVox researchers Juan Pablo Bello, Andrew Farnsworth and Justin Salamon.


© 2024 BirdVox
