In May 2020, NYU student Sripathi Sridhar presented a new paper by the BirdVox team to the attendees of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). We reproduce the abstract of the paper below.  

Learning the helix topology of musical pitch.

Vincent Lostanlen, Sripathi Sridhar, Andrew Farnsworth, Juan Pablo Bello.

To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively. This article addresses the problem of discovering this helical structure from unlabeled audio data. We measure Pearson correlations in the constant-Q transform (CQT) domain to build a K-nearest neighbor graph between frequency subbands. Then, we run the Isomap manifold learning algorithm to represent this graph in a three-dimensional space in which straight lines approximate graph geodesics. Experiments on isolated musical notes demonstrate that the resulting manifold resembles a helix which makes a full turn at every octave. A circular shape is also found in English speech, but not in urban noise. We discuss the impact of various design choices on the visualization: instrumentarium, loudness mapping function, and number of neighbors K.
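The pipeline described in the abstract (Pearson correlations between subbands, a K-nearest neighbor graph, then Isomap in three dimensions) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the "CQT" here is a synthetic toy matrix of harmonic notes rather than real audio, the correlation-to-distance mapping `sqrt(2 * (1 - rho))` and the value `k=3` are arbitrary choices made for the example, and Isomap is written out by hand (Floyd-Warshall shortest paths followed by classical multidimensional scaling) instead of calling a library. All function names are hypothetical.

```python
import numpy as np


def synthetic_cqt(n_notes=2000, bins_per_octave=12, n_octaves=3, seed=0):
    """Toy stand-in for CQT magnitudes of isolated notes (not real audio)."""
    rng = np.random.default_rng(seed)
    # Bin offsets of harmonics 1..4 on a log-frequency axis: 0, 12, 19, 24.
    offsets = np.round(bins_per_octave * np.log2(np.arange(1, 5))).astype(int)
    n_f0 = bins_per_octave * n_octaves
    n_bins = n_f0 + int(offsets[-1])
    X = np.zeros((n_bins, n_notes))
    f0 = rng.integers(0, n_f0, size=n_notes)      # one fundamental per note
    gain = rng.uniform(0.5, 1.5, size=n_notes)    # random loudness per note
    for h, off in enumerate(offsets, start=1):
        X[f0 + off, np.arange(n_notes)] += gain / h
    # Assumption: mimic spectral leakage into adjacent subbands.
    kernel = np.array([0.5, 1.0, 0.5])
    X = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, X)
    return X + 0.01 * rng.random(X.shape)         # small noise floor


def knn_graph(D, k):
    """Symmetric K-nearest neighbor graph; missing edges are +inf."""
    n = D.shape[0]
    masked = D.copy()
    np.fill_diagonal(masked, np.inf)              # a node is not its own neighbor
    nn = np.argsort(masked, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    W = np.full((n, n), np.inf)
    W[rows, nn.ravel()] = D[rows, nn.ravel()]
    W = np.minimum(W, W.T)                        # keep an edge if either end chose it
    np.fill_diagonal(W, 0.0)
    return W


def geodesics(W):
    """All-pairs shortest paths on the graph (Floyd-Warshall)."""
    G = W.copy()
    for m in range(G.shape[0]):
        G = np.minimum(G, G[:, m, None] + G[None, m, :])
    if not np.isfinite(G).all():
        raise ValueError("neighbor graph is disconnected; try a larger k")
    return G


def classical_mds(G, n_components=3):
    """Embed points so that Euclidean distances approximate G."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    top = np.argsort(eigval)[::-1][:n_components]
    return eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))


X = synthetic_cqt()                               # shape: (subbands, notes)
R = np.corrcoef(X)                                # Pearson correlations between subbands
D = np.sqrt(np.clip(2.0 * (1.0 - R), 0.0, None))  # assumed correlation-to-distance map
embedding = classical_mds(geodesics(knn_graph(D, k=3)))
print(embedding.shape)
```

The result is an array with one row per subband and three columns, which can be passed to any 3-D scatter plot. Whether the points trace a helix depends on the data and on the choices discussed in the abstract (instrumentarium, loudness mapping, and K), so this toy example should not be read as a reproduction of the paper's figures.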

 

We have uploaded the video of Sripathi’s presentation to YouTube.

The preprint of the ICASSP paper can be found at:
https://arxiv.org/abs/1910.10246

The TinySOL and SONYC-UST datasets can be downloaded from Zenodo:
https://zenodo.org/record/3685367
https://zenodo.org/record/3873076

The source code to reproduce the figures of the paper can be cloned from GitHub: https://github.com/BirdVox/lostanlen2020icassp