MISNA - A musical instrument segregation system from noisy audio with LPCC-S features and extreme learning

Mukherjee, Himadri; Obaidullah, Sk Md; Phadikar, Santanu; Roy, Kaushik

doi:10.1007/s11042-018-5993-6

MISNA - A musical instrument segregation system from noisy audio with LPCC-S features and extreme learning

Published: 26 April 2018

Volume 77, pages 27997–28022, (2018)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Multimedia Tools and Applications Aims and scope Submit manuscript

MISNA - A musical instrument segregation system from noisy audio with LPCC-S features and extreme learning

Download PDF

Himadri Mukherjee¹,
Sk Md Obaidullah²,
Santanu Phadikar³ &
…
Kaushik Roy ORCID: orcid.org/0000-0002-3360-7576¹

198 Accesses
13 Citations
Explore all metrics

Abstract

Technology has developed a lot over the last decades and has made a profound impact in almost every field. The field of Music Information Retrieval (MIR) has not been an exception to this as well, one of its most promising applications being Automatic Music Transcription (AMT). It is important to identify the active regions of various Instruments in a piece before transcription and the challenge elevates even more when the audio clips are contaminated with noise. MISNA (Musical Instrument Segregation from Noisy Clips) is a system proposed towards the identification of isolated Instruments from noisy clips which can aid towards AMT in noisy environments. The system works using statistical features (LPCC-S) derived from raw Linear Predictive Cepstral Coefficient values on very short clips of lengths 1 and 2 seconds. The system has been tested for various SNR scenarios and highest accuracies of 98.63% and 97.42% for Individual Instruments and Instrument Family identification has been obtained with the aid of Extreme Learning based classifier for a highest of 2626 clips.

Segregation of Speech and Songs - A Precursor to Audio Interactive Applications

SMIL - A Musical Instrument Identification System

Automatic Classification of Carnatic Music Instruments Using MFCC and LPC

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The field of MIR has fascinated the research community for a long time and one of its most promising applications has come to us in the form of Automatic Music Transcription (AMT). AMT is the process of identification of the notes played by an Instrument from an audio clip. In a music piece, more than 1 Instrument is played at a time and not all the Instruments are played through the entire length of the piece. Thus it is essential to identify the active regions of the Instruments in a piece before transcription. The challenge of identifying such Instruments increases even more when a piece is accompanied by noise. It is important to be able to identify musical Instruments in isolation from noisy clips before identifying the same from a piece and MISNA is a system proposed towards such a task. The main contribution of our work includes the use of proposed lower dimensional features (LPCC-S) derived from standard LPCC values for minimizing computational overhead and overcoming uneven dimensionality issue, the use of Extreme Learning Machine based classification which is a faster version of standard neural network based classifier, experimentation with various levels and types of noisy environments and verification of the generalization capability of the proposed system for both Individual Instruments and Instrument families using clips of short durations.

2 Related works

Masood et al. [22] identified 5 different Instruments using MFCC and Timbral features with an accuracy of 89.17%. They used a neural network based classifier, which was trained using Conjugate gradient back propagation and Fletcher-Reeves updation technique. Patil et al. [25] classified 15 Instruments with an accuracy of 86.04% using a SVM and concept analysis based technique. Eronen et al. [7] used features based on temporal and spectral properties of sound to classify 30 orchestral Instruments from the bass, string and woodwind families and obtained an accuracy of 94% in identification of the correct Family. A system to identify 7 different Instruments was presented by Sturm et al. [30] using multiscale MFCC based features. A highest accuracy of 84.69% was obtained for the system using a SVM based classifier. Martin et al. [21] used a statistical pattern recognition based approach to identify 15 different orchestral Instruments using acoustic features related to physical properties of source excitation and resonance structure. Accuracies of 90% and 70% were obtained for Instrument Family and identification Individual Instrument identification using Gaussian models and Fisher multiple discriminant analysis. Takashi et al. [31] designed a system to identify 12 musical Instruments using zero crossing, pitch, brightness and spectral centroid based features. They obtained highest average accuracies of 82.1% and 56.2% for the University of Iowa musical Instrument database and RWC music databases using Random Forest and Linear Discriminant Analysis technique respectively. A system to classify 19 different musical Instruments was presented by Kitahara et al. [16] with the help of 18 dimensional features. The feature set was composed of F0 normalized covariance and mean which produced an accuracy of 79.73%. Benetos et al. [2] used various classification techniques to distinguish 20 Instruments with the help of MPEG-7 audio descriptors as well as zero crossing, spectrum flatness, MFCC, auto correlation, spectrum roll off frequency, specific loudness sensation and total loudness and produced accuracies as low as 88.7% and as high as 95.3%. Livshin et al. [19] presented a real time Instrument recognition technique from solos for 7 Instruments. Post 62 dimensional feature extraction, a dimension reduction technique using Gradual Descriptor Elimination was applied to reduce the computational overhead. Accuracies of 88.13% and 85.24% were obtained respectively for the non reduced and reduced sets with the aid of KNN classification and LDA transformed learning set. Kaminskyj and Czaszejko [15] classified 19 Instruments from 9 major and sub families. They extracted 6 features for each namely cepstral coefficients, multidimensional scaling analysis trajectories, constant transform frequency spectrum, RMS amplitude envelope, presence of vibrato and spectral centroid. They obtained a highest accuracy of 97% using KNN classification technique for Family Identification. Lita et al. [17] presented a smart sound sensor based system for the identification of 3 musical Instruments in real time and obtained an average accuracy of 98.33%. Kaminskyj et al. [14] distinguished 4 different Instruments from 4 different families by employing various mechanisms in the pre processing stage including short term RMS energy envelope computation, Principal Component Analysis and Ratio or Product transformations of the same. Artificial Neural network and nearest neighbour based classifiers were applied for the same and accuracies in the range of 93.8% - 100% were obtained. Yu et al. [34] differentiated 14 Instruments from 4 Chinese folk Instrument families and obtained a highest accuracy of 89% by combining perceptron based features along with Mel Scale Cepstral Coefficients. Liu et al. [18] designed a system for identification of 4 Instrument families for both Chinese and Western Instruments. They experimented with various classifiers and features for both Chinese and Western genres and concluded that Spectral Flatness Measure coupled with KNN classifier produced the best result in the case of Chinese Instruments and the same feature coupled with SVM or MFCC coupled with KNN produced the highest accuracy for Western Instruments. They obtained a difference of 28% in the accuracy between the best and worst classification scheme. Agostini et al. [1] presented a system for the identification of 30 musical Instruments from the McGill University Master samples database using spectral features. Various classification techniques encompassing k-Nearest Neighbour, Canonical Discriminant Analysis, Quadratic Discriminant Analysis (QDA) and SVM were applied out of which highest accuracies of 80.2%, 78.6% and 69.7% were obtained for 17, 20 and 27 instruments respectively using SVM with a RBF kernel. They further obtained accuracies of 81% and 92.2% for the 27 instruments family and pizzicato-sustained discriminations respectively using QDA. They also highlighted obtained accuracies of 89%, 94% and 96% using QDA for rock strings, woodwind and brass families respectively as well. Livshin et al. [20] presented algorithms for outlier or bad sample Detection to improve musical Instrument identification. Sliding window of 60 ms along with a 66% overlap were used for calculation of features which helped in successfully discarding 70.1% of the bad samples which generally degrade Instrument recognition performance. Fragoulis et al. [8] designed a system to recognize 2 different Instruments namely guitar and piano using tonal spectral content for clips of average length of 1.8 sec. An accuracy of 100% was obtained for 926 isolated piano notes and 612 similar guitar notes. Röver et al. [29] presented a Hough transformation based approach to identify musical Instruments. They used a hybrid of Linear Discriminant Analysis and Quadratic Discriminant Analysis known as Regularised Discriminant Analysis to identify 25 Instruments and obtained a lowest misclassification rate of 26.1%. Donnelly et al. [6] used different Bayesian Networks to classify 24 different orchestral Instruments. Bayesian networks along with conditional dependencies in the frequency and time domain produced accuracies of 98% and 97% for Individual Instrument and Instrument Family identification. Yu et al. [33] proposed an improved matching pursuit algorithm for the identification of musical Instruments. They extracted atomic parameters for Instruments from the algorithm and fed it to a SVM in order to differentiate 10 musical Instruments and obtained an accuracy of 87.44% in only 1/3^rd of the time as required by standard matching pursuit algorithm. Jadhav [12] obtained accuracies of 88%, 84% and 73.33% for 5, 10 and 15 different Instruments with the help of timbral audio descriptors and Binary Tree classifier. Accuracies of 90%, 77% and 75.33% were obtained for the same set using KNN classifier along with MFCC features.

3 Dataset development

One of the most important facets of any experiment is data collection. The database of our experiment was put together with the aid of synthesized tones of 7 different Instruments namely Flute, Grand Piano, Guitar, Saxophone, Harmonium, Violin and Santoor. The Instruments hailed from 3 families namely Wind (Flute and Saxophone), Keyboard (Grand Piano and Harmonium) and String (Violin, Nylon String Guitar and Santoor). Such Instruments were chosen to include both Indian as well as Western Instruments from the various families which are some of the most essential ingredients of melody. All the 22 natural notes in the scale of C from C2 to C5 were played 20 to 30 times for every Instrument in various playing styles including Fortississimo, Fortissimo, Mezzo forte, Forte, Marcato, Staccato, Legato, Pianissimo and Pianississimo. These clips were used to engender 2 datasets (D1 and D2) consisting of 2626 (1 second each) and 1311 (2 seconds each) clips respectively. The clips were stored in uncompressed .wav stereo format at a bitrate of 1411 kbps. The number of clips for both individual Instruments as well as for Instrument Families is presented in Table 1. Each of the presented datasets were used for both the recognition of Individual Instruments as well as Instrument families.

Table 1 Individual Instrument level and Instrument family level details of datasets D1 and D2 along with total (T) clips

MISNA - A musical instrument segregation system from noisy audio with LPCC-S features and extreme learning

Abstract

Similar content being viewed by others

Segregation of Speech and Songs - A Precursor to Audio Interactive Applications

SMIL - A Musical Instrument Identification System

Automatic Classification of Carnatic Music Instruments Using MFCC and LPC

Explore related subjects

1 Introduction

2 Related works

3 Dataset development

3.1 Instruments in the dataset

4 Proposed methodology

4.1 Pre-processing

4.1.1 Framing

4.2 Windowing

4.3 Feature extraction

4.4 LPCC-S generation

4.5 Classification with extreme learning machine (ELM)

Feature mapping

ELM learning:

4.6 Statistical significance test

5 Result and discussion

5.1 Individual instrument level

5.2 Instrument family level

6 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Appendix

Appendix

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation