Abstract
We present an algorithm that predicts musical genre and artist from an audio waveform. Our method uses the ensemble learner ADABOOST to select from a set of audio features that have been extracted from segmented audio and then aggregated. Our classifier proved to be the most effective method for genre classification at the recent MIREX 2005 international contests in music information extraction, and the second-best method for recognizing artists. This paper describes our method in detail, from feature extraction to song classification, and presents an evaluation of our method on three genre databases and two artist-recognition databases. Furthermore, we present evidence collected from a variety of popular features and classifiers that the technique of classifying features aggregated over segments of audio is better than classifying either entire songs or individual short-timescale features.
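The pipeline the abstract outlines (frame-level features, aggregation over segments, boosted classification of segments, then a song-level decision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the feature set, the segment length, the aggregate statistics, or the boosting variant. The sketch assumes mean and variance as the aggregates, binary AdaBoost over decision stumps, and a summed segment score for the song-level label. All function names and the toy data are hypothetical.

```python
import math
from statistics import fmean, pvariance


def aggregate_segments(frames, frames_per_segment):
    """Collapse frame-level feature vectors into one [means + variances] vector per segment."""
    segments = []
    for start in range(0, len(frames) - frames_per_segment + 1, frames_per_segment):
        block = frames[start:start + frames_per_segment]
        columns = list(zip(*block))  # one tuple per feature dimension
        segments.append([fmean(c) for c in columns] + [pvariance(c) for c in columns])
    return segments


def stump_predict(x, feature, threshold, sign):
    """Decision stump: vote `sign` if the feature exceeds the threshold, else `-sign`."""
    return sign if x[feature] > threshold else -sign


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, feature, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def adaboost(X, y, rounds):
    """Binary AdaBoost (labels +1 / -1) over decision stumps; an assumed simplification."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, feature, threshold, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, threshold, sign))
        # Reweight so that misclassified segments count more in the next round.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feature, threshold, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def classify_song(model, frames, frames_per_segment):
    """Sum the weighted stump votes over all segments; the sign is the song-level label."""
    total = sum(alpha * stump_predict(seg, feature, threshold, sign)
                for seg in aggregate_segments(frames, frames_per_segment)
                for alpha, feature, threshold, sign in model)
    return 1 if total > 0 else -1


# Toy usage with made-up two-dimensional frame features (hypothetical data).
song_a = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.1, 0.0]]      # class +1
song_b = [[-1.0, 0.0], [-1.2, 0.1], [-0.8, -0.1], [-1.1, 0.0]]  # class -1
X = aggregate_segments(song_a, 2) + aggregate_segments(song_b, 2)
y = [1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
label_a = classify_song(model, song_a, 2)
label_b = classify_song(model, song_b, 2)
```

Classifying aggregated segments sits between the two alternatives the abstract compares against: one vector per whole song, and one vector per short-timescale frame. Setting `frames_per_segment` to the song length or to 1 recovers those two extremes in this sketch.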
Additional information
Editor: Gerhard Widmer
Bergstra, J., Casagrande, N., Erhan, D. et al. Aggregate features and ADABOOST for music classification. Mach Learn 65, 473–484 (2006). https://doi.org/10.1007/s10994-006-9019-7
DOI: https://doi.org/10.1007/s10994-006-9019-7