Onset detection and stroke recognition for percussion instruments in Carnatic music
In Carnatic music, several percussion instruments are used, namely the mridangam, the ghatam, the kanjira, the morsing and the tavil. The mridangam is the main percussion instrument, while the others are secondary. Some of these instruments are pitched, in that they are tuned to the tonic, while others, for example the kanjira, are not. In the work proposed in this paper, onset detection is performed using group delay functions, which are used to segment solo percussion into strokes. The strokes correspond to the syllables (aksharas) produced by the mridangam. The mridangam waveform is treated as an amplitude- and angle-modulated signal. The Hilbert envelope of the differenced waveform is obtained first, followed by envelope tracking. The tracked envelope is then subjected to group delay processing, which yields the syllable (akshara) boundaries. A sequence of aksharas makes up a matra, and a sequence of matras makes up a cycle; a composition in Carnatic music consists of a finite number of cycles. The ultimate objective is to determine the start of a cycle (sama) using properties of the aksharas, and to exploit the properties of a sequence of aksharas to segment a composition or tani avarthanam into different parts. The onset detection accuracy is as high as 95%, and the algorithm performs well even when the tempo varies significantly. Its performance is much better than that of our earlier work and is comparable to state-of-the-art machine learning approaches, although the proposed algorithm does not require any machine learning. The second part of the paper demonstrates an application of the proposed algorithm to stroke recognition. Mridangam tani avarthanam waveforms are first segmented using the onset detection algorithm and then passed to an isolated-style hidden Markov model (HMM) based recognition system. The stroke classification accuracy increases significantly, by about 20%, compared with traditional flat-start embedded reestimation in the HMM framework.
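The onset detection chain described above (differencing, Hilbert envelope, envelope tracking, group delay processing) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the moving-average envelope tracker, the root-cepstrum formulation with exponent `gamma`, the half-Hann lifter, the peak picker and every parameter value (`hop_s`, `smooth_s`, `lifter_frac`, `floor`, `rel_thresh`, `min_gap_s`) are illustrative choices, and all function names are hypothetical. The idea is to treat the tracked envelope as if it were a magnitude spectrum; the group delay of the causal part of its root cepstrum then shows positive peaks near the stroke attacks, and those peaks are taken as onsets. The Hilbert envelope uses the same FFT construction as `scipy.signal.hilbert`.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal, via an FFT-based Hilbert transform."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def group_delay(w, nfft):
    """Group delay -d(phase)/d(omega) of w, computed from the DFTs of w[n] and n*w[n]."""
    n = np.arange(len(w))
    X = np.fft.fft(w, nfft)
    Y = np.fft.fft(n * w, nfft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)


def pick_peaks(gd, rel_thresh, min_gap):
    """Local maxima above rel_thresh * max(gd), kept at least min_gap frames apart."""
    cands = [i for i in range(1, len(gd) - 1)
             if gd[i] > gd[i - 1] and gd[i] >= gd[i + 1] and gd[i] > rel_thresh * gd.max()]
    kept = []
    for i in sorted(cands, key=lambda k: gd[k], reverse=True):  # strongest peaks first
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return sorted(kept)


def detect_onsets(x, fs, hop_s=0.005, smooth_s=0.01, gamma=0.01,
                  lifter_frac=0.5, floor=1e-3, rel_thresh=0.3, min_gap_s=0.05):
    d = np.diff(x, prepend=x[0])                      # 1. differenced waveform
    env = hilbert_envelope(d)                         # 2. Hilbert envelope
    win, hop = max(1, int(smooth_s * fs)), max(1, int(hop_s * fs))
    env = np.convolve(env, np.ones(win) / win, mode="same")[::hop]  # 3. envelope tracking
    env = env / env.max() + floor                     # normalise; floor avoids zeros
    n = len(env)
    # 4. Treat the envelope as a magnitude spectrum: mirror it to make it even,
    #    root-compress it, and take the inverse DFT to get a real root cepstrum.
    spec = np.concatenate([env, env[-2:0:-1]]) ** gamma
    c = np.real(np.fft.ifft(spec))
    m = len(spec)
    # 5. Keep only the causal part, weighted by a half-Hann lifter of length nc.
    nc = min(n - 1, max(2, int(lifter_frac * n)))
    taper = 0.5 * (1 + np.cos(np.pi * np.arange(nc) / nc))
    w = np.zeros(m)
    w[0] = c[0]
    w[1:nc] = 2 * c[1:nc] * taper[1:nc]
    # 6. Group delay on the first n bins (one bin per envelope frame); peaks = onsets.
    gd = group_delay(w, m)[:n]
    frames = pick_peaks(gd, rel_thresh, max(1, round(min_gap_s / hop_s)))
    return np.array(frames, dtype=float) * hop / fs


# Synthetic "strokes": four decaying 500 Hz bursts standing in for a real recording.
fs = 8000
t = np.arange(fs) / fs
true_onsets = [0.1, 0.35, 0.6, 0.85]
x = np.zeros_like(t)
for t0 in true_onsets:
    dt = np.clip(t - t0, 0, None)
    x += (t >= t0) * np.exp(-dt / 0.02) * np.sin(2 * np.pi * 500 * dt)
estimated = detect_onsets(x, fs)
print(np.round(estimated, 3))
```

A small `gamma` keeps the root-compressed envelope close to `1 + gamma * log(envelope)`, so the causal root cepstrum behaves approximately like a minimum-phase sequence whose group delay peaks where the envelope peaks. The half-Hann lifter is used here instead of a rectangular one to reduce ripple that could otherwise create spurious peaks; the lifter length trades time resolution against smoothness.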
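Isolated-style recognition can be illustrated with a toy example. Because onset detection already delimits each stroke, every segment can be scored independently against one HMM per stroke class and given the label of the most likely model, rather than aligning a whole recording against concatenated models as in flat-start embedded reestimation (this is our reading of the contrast, stated as an assumption). The sketch assumes discrete observation symbols and hand-set parameters; the stroke labels `tha` and `dhin`, the helper names and all probabilities are hypothetical, and a practical system would use continuous acoustic features with trained emission densities.

```python
import numpy as np


def log_forward(obs, log_pi, log_a, log_b):
    """log P(obs | HMM) for a discrete-emission HMM, via the forward algorithm in log space.

    log_pi: (S,) initial-state log-probabilities
    log_a:  (S, S) transition log-probabilities, rows = "from" state
    log_b:  (S, V) emission log-probabilities over V symbols
    obs:    sequence of symbol indices in range(V)
    """
    alpha = log_pi + log_b[:, obs[0]]
    for o in obs[1:]:
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log a_ij) + log b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_a, axis=0) + log_b[:, o]
    return np.logaddexp.reduce(alpha)


def classify_segments(segments, models):
    """Isolated-style recognition: each pre-segmented stroke gets the best-scoring label."""
    labels = []
    for seg in segments:
        scores = {name: log_forward(seg, *params) for name, params in models.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Hypothetical two-state models over a 3-symbol alphabet (made-up probabilities).
pi = np.array([0.9, 0.1])
A = np.array([[0.7, 0.3],
              [0.1, 0.9]])
B_tha = np.array([[0.8, 0.1, 0.1],
                  [0.7, 0.2, 0.1]])
B_dhin = np.array([[0.1, 0.1, 0.8],
                   [0.1, 0.2, 0.7]])
models = {
    "tha": (np.log(pi), np.log(A), np.log(B_tha)),
    "dhin": (np.log(pi), np.log(A), np.log(B_dhin)),
}

# Symbol sequences as they might look after onset-based segmentation and quantisation.
segments = [[0, 0, 1, 0], [2, 2, 2, 1]]
labels = classify_segments(segments, models)
print(labels)  # ['tha', 'dhin']
```

Scoring each segment on its own means a boundary error can affect only the strokes on either side of it, which is one plausible reason why accurate onsets help the isolated-style system.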