We are happy to announce that our article, “Long-distance detection of bioacoustic events with per-channel energy normalization”, is featured in the proceedings of the DCASE 2019 workshop. This paper is a collaboration between the BirdVox project; Kaitlin Palmer from San Diego State University; Elly Knight from the University of Alberta; Christopher Clark and Holger Klinck from the Cornell Lab of Ornithology; NYU ARISE student Tina Wong; and Jason Cramer from New York University.

This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also enhances animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux, yet with a tunable time scale for background noise estimation. In comparison with pointwise logarithm, PCEN reduces the false alarm rate by 50x in the near field and 5x in the far field, on both avian and marine bioacoustic datasets. Such improvements come at moderate computational cost and require no human intervention, thus heralding a promising future for PCEN in bioacoustics.
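To make the idea of "pooling the magnitudes of spectrogram frames after PCEN" concrete, here is a minimal NumPy sketch. It follows the standard PCEN formulation (a first-order low-pass filter for background estimation, followed by adaptive gain control and root compression) and then max-pools across frequency to obtain one detection value per frame. The function names, the parameter values, and the choice of max-pooling over frequency are illustrative assumptions, not the settings or the code used in the paper.

```python
import numpy as np


def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a nonnegative spectrogram E.

    E has shape (n_frames, n_bands). All default values are illustrative
    assumptions, not the settings used in the paper.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[0] = E[0]  # initialize the smoother with the first frame
    for t in range(1, E.shape[0]):
        # First-order IIR low-pass filter: running estimate of the
        # background level in each band; s sets its time scale.
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]
    # Adaptive gain control followed by dynamic range compression
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r


def detection_function(E, **pcen_kwargs):
    """Max-pool PCEN magnitudes across frequency: one value per frame."""
    return pcen(E, **pcen_kwargs).max(axis=1)


# Synthetic example: stationary noise with a short burst in band 5
rng = np.random.default_rng(0)
E = rng.uniform(0.5, 1.5, size=(200, 16))
E[120:125, 5] += 50.0
d = detection_function(E)
peak_frame = int(np.argmax(d))  # falls at the onset of the burst
```

In this sketch, the smoothing coefficient `s` plays the role of the tunable time scale for background noise estimation: a smaller `s` means a slower-moving background estimate. Thresholding `d` would then yield candidate events without any training.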

 

Long-distance detection of bioacoustic events with per-channel energy normalization
V. Lostanlen, K. Palmer, E. Knight, C. Clark, H. Klinck, A. Farnsworth, T. Wong, J. Cramer, J.P. Bello
In Proceedings of the Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.
[PDF][Companion website]

 

@inproceedings{lostanlen2019dcase,
    author = "Lostanlen, Vincent and Palmer, Kaitlin and Knight, Elly and Clark, Christopher and Klinck, Holger and Farnsworth, Andrew and Wong, Tina and Cramer, Jason and Bello, Juan",
    title = "Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)",
    address = "New York University, NY, USA",
    month = "October",
    year = "2019",
    pages = "144--148",
}

 

 

The figure below displays the mel-frequency spectrogram of one Common Nighthawk call as recorded at various distances, after processing with either pointwise logarithm (left) or PCEN (right). Atmospheric absorption is particularly noticeable above 200 meters, especially in the highest frequencies. Furthermore, we observe that max-pooled spectral flux is unstable, in that it triggers at different time-frequency bins from one sensor to the next. In comparison, PCEN is more consistent in reaching its maximal magnitude at the onset of the call, and in the same frequency band.


Effect of pointwise logarithm (left) and per-channel energy normalization (right) on the same Common Nighthawk vocalization, as recorded from various distances. White dots depict the time-frequency locations of maximal spectral flux (left) or maximal PCEN magnitude (right). The spectrogram covers a duration of 700 ms and a frequency range between 2 and 10 kHz.
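For readers who want to reproduce the kind of marker shown in the left panel, the following sketch locates the time-frequency bin of maximal spectral flux in a log-scaled spectrogram. Spectral flux is taken here as the half-wave rectified difference between consecutive log-magnitude frames; this definition, the function name `max_flux_location`, and the synthetic input are assumptions made for illustration and may differ from the exact procedure in the paper.

```python
import numpy as np


def max_flux_location(E, eps=1e-6):
    """Return (frame, band) where log-spectrogram flux is largest.

    Spectral flux is computed as the half-wave rectified first-order
    difference of log(E) along time; this definition is an assumption.
    """
    log_E = np.log(np.asarray(E, dtype=float) + eps)
    flux = np.maximum(np.diff(log_E, axis=0), 0.0)
    frame, band = np.unravel_index(np.argmax(flux), flux.shape)
    return int(frame) + 1, int(band)  # +1: diff drops the first frame


# Synthetic example: flat background with an onset at frame 30, band 4
E = np.ones((60, 8))
E[30:40, 4] = 100.0
location = max_flux_location(E)
```

On real recordings, noise in quiet bands can produce large log-differences, which is one plausible reason why this location may shift from one sensor to the next.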