1 Introduction

Music is often referred to as the language of emotions [32]. Music is generally classified into emotional categories, as it evokes different emotions in the listener. Mood identification from music is more challenging than in domains such as movie reviews and e-mails, because the emotions in music are highly subjective. The wide availability and easy accessibility of online music libraries over the past few years have led Music Information Retrieval (MIR) researchers to accelerate the development and maintenance of automated MIR systems. The Music Information Retrieval Evaluation eXchange (MIREX) has been a community-based framework for formally evaluating Music-IR systems and algorithms since 2004 [7], and it introduced audio-based music mood classification as a task in 2007 [8]. The main modalities considered in the literature for music mood classification are audio, lyrics, and a combination of both. Very few works have focused on using metadata such as social tags and user reviews [4, 15].

The relevance of each dimension of music depends on the musical style, e.g., audio for dance music and lyrics for poetic music [26]. Early systems for music mood classification were audio based [17, 23, 24]. Later, bi-modal research (combining audio and lyrics) gained importance and demonstrated increased accuracy [13, 33, 44, 47]. Recently, lyric-based MIR has been receiving growing attention, since lyrics explicitly express the semantic content of part or all of a song and play a key role in determining its mood from the perspective of both reader and listener [26].

Generally, any music mood classification system follows the architecture shown in Fig. 1.

Fig. 1. General architecture of a music mood categorization system

The remainder of the paper is organized as follows. Section 2 briefly discusses data collection and standardization techniques. Mood taxonomies are described in Sect. 3. Section 4 discusses the mood categorization techniques implemented to date, and Sect. 5 describes the frameworks developed. Section 6 presents open challenges, and Sect. 7 concludes the paper.

2 Data Collection and Standardization

Choosing an appropriate data set that fits the chosen mood categories is important for any mood classification task, and data standardization helps maintain the quality of the data, which is crucial for any classification task. One of the challenges faced by the MIR research community is the development of standard data sets annotated for mood. To date, gold-standard mood-annotated data sets are available only for the AMC task at MIREX (and only to its participants), and no such data is available for Indian languages [30]. Because of the wide availability of online music sources and the lack of a standard data set, researchers have reported their work on data sets they developed themselves.

Ground truth data can be collected through one of the following means:

  1. Employing human annotators.

  2. Crawling websites.

  3. Creating and playing annotation games.

A dataset comprising 4578 English song lyrics, in which every song has an associated social tag from last.fm, was developed in [44]. To filter out noisy social tags, WordNet-Affect was used to assign mood labels; a total of 18 mood categories were identified and experimented with. A large-scale data set of 5296 songs, comprising both audio and lyrics for every song and representing 18 mood categories for which social tags are available on last.fm, was developed in [13], and the same data set was used by the authors for further experiments reported in [43]. [26] used a manually annotated data set of 180 song lyrics. A survey was conducted [41] to collect the CAL500 data set, in which songs by 500 unique artists are annotated using a vocabulary of 174 words.
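As a rough illustration of this kind of tag filtering (not the exact WordNet-Affect procedure of [44]), the sketch below resolves raw last.fm-style tags to a small set of mood labels through a hand-built affect dictionary and drops everything else as noise; the dictionary entries and labels are assumptions for illustration.

```python
# Tiny hand-built affect dictionary standing in for WordNet-Affect lookups.
AFFECT_MAP = {
    "upbeat": "happy", "cheerful": "happy",
    "melancholy": "sad", "gloomy": "sad", "mellow": "calm",
}

def tags_to_moods(tags):
    """Keep only tags that resolve to a known mood label; anything else
    (e.g. "seen live") is treated as noise and discarded."""
    return sorted({AFFECT_MAP[t] for t in tags if t in AFFECT_MAP})

print(tags_to_moods(["upbeat", "seen live", "gloomy"]))  # -> ['happy', 'sad']
```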

In the context of Indian languages, [30] developed a mood-annotated data set of audio and lyrics for both Hindi and Western songs, collecting the songs from music CDs and the lyrics by crawling the web. They annotated the data with the help of human annotators belonging to various age groups. [10] created their own data set of 300 Telugu songs and lyrics collected from YouTube.

All the works discussed above annotated their data manually, which involves considerable human labour and is at the same time an expensive and time-consuming task. To overcome these issues, researchers have developed online games to collect ground truth data [27]. [42] designed the Listen Game, an online multi-player game for associating semantic labels with music pieces. An audio-based two-player game, in which a human plays with a simulated player as a partner, was implemented to annotate sounds and music [21]. An arousal-valence based two-player online annotation game called MoodSwings, which records dynamic labels of music mood, was developed by [16]. The game consists of five rounds, with scores calculated in each round based on the overlap of the two players’ cursor positions, which indicates maximum mood agreement between the players.

To the best of the authors’ knowledge, no such gaming strategy has been developed for Indian-language music mood annotation. However, [6] proposed an interactive game called “Dr. Sentiment” that creates and validates SentiWordNet(s) for three Indian languages (Bengali, Hindi and Telugu) with the help of the Internet population, and it is helpful for general sentiment analysis tasks in Indian languages.

The authors feel that, although the data sets collected using the above approaches are claimed to be highly accurate, their quality is still compromised, as the contributors belong to varied communities with different psychological moods.

3 Mood Taxonomy

A taxonomy assigns descriptor labels to multiple levels of music content description, ranging from low-level acoustic features to high-level structural descriptions, and acts as a bridge between system development and user interfaces [28]. The primary focus of any music mood classification system is the design and selection of an appropriate mood model. The literature supports two types of mood representation: categorical and dimensional. Apart from these two, researchers have also reported work based on social tagging. However, there is no universally accepted model that describes music mood [44].

3.1 Categorical Mood Representation

This approach denotes mood as a set of categories, represented by a list of distinct adjectives or tags assigned according to their relevance to a music piece. The study conducted by Hevner in 1936 was one of the earliest of this type. She proposed a taxonomy of 8 related clusters comprising a total of 66 adjectives [11], arranged in a circular fashion so that adjacent clusters differ only by a small factor. The adjectives within a cluster are close in meaning, while those in opposite clusters differ by a larger factor. Hevner’s mood model is shown in Fig. 2.

Fig. 2. Hevner’s mood model

Categorizing moods into 66 different categories is difficult, especially for automatic systems. Some of the adjectives used in Hevner’s circle may no longer describe present-day moods, as language evolves across generations [12]. Moreover, a data set collected using any of the approaches discussed in Sect. 2 might not be annotated for all the categories defined by the model.

It is also believed that a taxonomy with fewer mood categories may help automatic systems achieve better performance. A five-cluster categorical mood taxonomy, shown in Table 1, was proposed by the MIREX community for the Audio Mood Classification (AMC) task.

Table 1. MIREX mood taxonomy

The categorization of Indian art, drama and music is generally based on Navarasa (meaning nine rasas), a Sanskrit term introduced by the Indian musicologist Bharata [19]. This model serves as the base for Indian mood categorization tasks; Fig. 3 shows the Navarasa mood model. In Indian music, a rasa is a combination of a few emotional states. However, the number of rasas is a subject of debate [18].

Fig. 3. Navarasa mood model

3.2 Dimensional Mood Representation

This approach categorizes mood with respect to a specific number of dimensions, or axes, that represent human emotions, denoting a mood as a point in a dimensional space of emotions.

Two well-known models of this type are Russell’s circumplex model and Thayer’s model. Russell’s model positions mood adjectives on a two-dimensional plane, with the horizontal axis indicating valence and the vertical axis indicating arousal [35]. He stated that moods are not independent or unique but are connected to each other in an orderly manner, and he proposed a taxonomy of 8 related groups consisting of a total of 28 affect words, arranged meaningfully along the circumference of a circle in the two-dimensional space. Figure 4 shows Russell’s mood taxonomy.

Fig. 4. Russell’s mood model

Thayer’s model, a variant of Russell’s, is based on two dimensions: energy along the vertical axis and stress along the horizontal axis [38]. It arranges moods into four clusters, namely Contentment, Depression, Exuberance and Anxious. Figure 5 shows Thayer’s taxonomy. The adjectives of Thayer’s model can be mapped to the unique quadrants of Russell’s model [9].

Fig. 5. Thayer’s mood model
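To make the dimensional view concrete, the following minimal sketch maps a (valence, arousal) point onto the four Thayer-style clusters; treating the centre of the plane as the origin, with zero as the threshold, is an assumption for illustration.

```python
def thayer_quadrant(valence: float, arousal: float) -> str:
    """Map a point in the valence-arousal plane to one of Thayer's four
    clusters; using (0, 0) as the centre is an illustrative assumption."""
    if valence >= 0:
        return "Exuberance" if arousal >= 0 else "Contentment"
    return "Anxious" if arousal >= 0 else "Depression"

# A calm, positive song falls in the Contentment quadrant.
print(thayer_quadrant(0.6, -0.4))  # -> Contentment
```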

Most of the models discussed above have been criticized for lacking the social context of music listening, since they were laboratory based [14].

To avoid the confusion caused by a large number of mood categories, [30] proposed an extended mood taxonomy consisting of five mood classes, each with three subclasses, obtained by grouping closely related mood adjectives of Russell’s circumplex model. [5] derived a folksonomy representation consisting of four clusters by applying an unsupervised clustering method. With respect to Indian music, [18] considered a taxonomy consisting of ten rasas.

3.3 Social Tagging

A tag is a phrase or label assigned to an item by a non-expert and carries relevant information. Social tagging of music helps create better ground truth [5, 22] and is a good source of human-generated contextual knowledge [20]. To date, no work related to social tagging has been reported in the context of Indian languages [30]. Since this approach involves laymen, the quality of the data collected may be compromised.

4 Mood Categorization

Mood categorization has been investigated in the literature based on three modalities: audio, lyrics and multimodal (audio + lyrics). Some researchers have also considered metadata for mood classification. Early music mood classification research was purely audio based. [24] proposed a framework to track moods across the four principal valence-arousal quadrants by dividing a music piece into independent segments; the authors extended their mood detection approach to mood tracking because the mood changes over the entire duration of a musical piece. [31] proposed an unsupervised classifier for classifying Hindi music by mood, using a manually annotated dataset of 250 Hindi music clips, and reported an accuracy of 48%.
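As a rough sketch of segment-wise mood tracking of this kind (not the specific method of [24]), the code below splits an audio file into fixed-length windows, extracts mean-MFCC features with librosa, and predicts one label per window with a pre-fitted scikit-learn classifier; the window length, the feature choice and the `clf` model are all assumptions.

```python
import librosa

def track_mood(path: str, clf, window_s: float = 5.0):
    """Predict a coarse mood label for each fixed-length window of an
    audio file; `clf` is any fitted scikit-learn classifier trained on
    mean-MFCC vectors (an assumed stand-in for richer audio features)."""
    y, sr = librosa.load(path, sr=22050)
    hop = int(window_s * sr)
    labels = []
    for start in range(0, len(y) - hop + 1, hop):
        segment = y[start:start + hop]
        mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=13)
        labels.append(clf.predict(mfcc.mean(axis=1).reshape(1, -1))[0])
    return labels  # one mood label per window, i.e. a mood track
```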

Since spectral features alone fail to capture several high-level musical properties, the music research community started combining audio and lyrics, showing improved performance; [13] and [47] are considered among the earliest works of this kind. Later, [34] applied several natural language processing techniques to extract features and performed a bi-modal analysis on a dataset of 764 samples, combining the best audio and lyric features and attaining an F-measure of 63.9. They ran SVM, k-NN, C4.5 and NB algorithms. Among the lyrical features, bag-of-words gave better results, reinforcing the importance of content-based features. A fine-grained classification of emotions was addressed using a novel corpus of 100 songs whose music and lyrics are annotated for emotion at the line level [33]. The authors considered the six Ekman emotions and ran three sets of experiments using linear regression: with textual features, with musical features, and with both combined. This work is considered the first of its kind for mood classification at the line level, and it showed that bi-modal classification improves the results.
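A minimal sketch of the bag-of-words lyric classification recurring in these studies, using scikit-learn; the toy lyrics, the mood labels and the choice of a linear SVM (one of the algorithms named above) are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus: lyric snippets paired with mood labels (illustrative only).
lyrics = ["sunshine and dancing all night long", "tears fall in the cold rain"]
moods = ["happy", "sad"]

# Bag-of-words features feeding a linear SVM.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(lyrics, moods)
print(model.predict(["dancing in the sunshine"]))  # expected: ['happy']
```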

In the Indian-language context, [30] reported a study on multimodal mood classification of Hindi and Western songs using LibSVM and a feed-forward neural network (FFNN), with the FFNN emerging as the best performing system, achieving F-measures of 0.751 for Hindi and 0.835 for Western songs. A correlation-based supervised feature selection technique was used to identify the important audio and lyric features. [10] devised an approach to combine the audio and text features of 100 Telugu songs. Audio features were extracted from the beginning, the end and the whole of each song, and lyrical features were extracted as BOW; the authors reported that using the beginning of a song gave better results than using the whole song or its end. They ran SVM, NB and GMM algorithms to classify mood.
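A common way to realize such multimodal classification is early fusion, i.e. concatenating the per-song audio and lyric feature vectors before training a single classifier. The sketch below illustrates this with randomly generated placeholder features; the feature dimensions, labels and SVM choice are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder features: one row per song (shapes are illustrative only).
rng = np.random.default_rng(0)
audio_feats = rng.random((100, 20))    # e.g. intensity/rhythm descriptors
lyric_feats = rng.random((100, 300))   # e.g. bag-of-words counts
labels = rng.choice(["happy", "sad", "calm"], size=100)

# Early fusion: concatenate the audio and lyric vectors of each song.
fused = np.hstack([audio_feats, lyric_feats])
clf = SVC(kernel="linear").fit(fused, labels)
```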

Lyric-based music emotion recognition has gained momentum over the last decade. Although the lyrics of a song play a prominent role in determining its mood, the task is considered challenging because lyrics are more abstract and much shorter than texts such as reviews. An unsupervised fuzzy clustering technique that works effectively on small devices was proposed by [45] for detecting emotions in 500 Chinese song lyrics using an affective lexicon, the Affective Norms for English Words (ANEW). [1] performed automatic mood classification of Chinese lyrics using a Naive Bayes approach and reported a final accuracy of 68%.
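The lexicon-based core of such approaches can be sketched as follows: each lyric word is looked up in an ANEW-style table of valence/arousal norms and the hits are averaged into a song-level score. The tiny lexicon below is a made-up stand-in, not the real ANEW data.

```python
# Made-up stand-in for ANEW-style (valence, arousal) norms on a 1-9 scale.
ANEW_LIKE = {"love": (8.7, 6.4), "rain": (5.0, 3.7), "alone": (2.4, 4.0)}

def va_score(lyric: str):
    """Average the valence/arousal norms of the lyric words found in the
    lexicon; words missing from the lexicon are simply skipped."""
    hits = [ANEW_LIKE[w] for w in lyric.lower().split() if w in ANEW_LIKE]
    if not hits:
        return None  # no affective evidence in this line
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return valence, arousal

print(va_score("Alone in the rain"))  # -> (3.7, 3.85)
```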

A novel psychological emotion model covering 23 specific mood categories was developed in [46] using a training set of 1032 randomly chosen songs; the approach explored 182 psychological features of each song lyric. The challenges behind lyric-based mood classification were examined in [2]. The authors employed NLP techniques to identify the polarity of a song and concluded that a corpus-specific lexicon improves accuracy more than a generic subjectivity lexicon does. The work of [44] reported that lyric features outperform audio features when the mood categories are semantically bound to the lyrics. They also reported that combined features improve performance for most of the categories, but not all of them. A later feature analysis study by the same authors [43] showed that certain lyric features outperformed audio features in seven of the 18 mood categories used in the study, over a dataset of 5296 songs. While this study showed that lyric-based mood classification can work well, every single lyric feature underperformed audio features in the negative-valence, negative-arousal quadrant.

Apart from the modalities listed above, some studies have explored the usefulness of metadata in music mood classification. User-generated interpretations of lyrics collected from songmeaning.com were used to develop a system that classifies music subjects automatically [15]. The authors ran four classifiers (linear SVM, RBF SVM, NB and k-NN) on a dataset of 800 songs across 8 categories and reported that user-generated interpretations outperformed lyrics. They also reported that interpretation terms are more semantically related to the subject categories than lyric terms are.

A different approach from all of the above is presented in [37]. The authors attempted to predict the decade to which a song belongs using lyric-based features and observed meaningful general changes in lyrics over time.

5 Various Frameworks

Another line of work in the music research community that helps users browse music collections efficiently is the development of applications and interfaces that allow users to listen to and retrieve music. A technique for building a search engine over a large music collection that responds to natural language queries was proposed in [17]: relevant web pages for each song in the dataset are retrieved and represented as term vectors in order to index the content for retrieval. A web service called Lyric Jumper, developed by [40], allows users to explore music by browsing lyrics according to their topics, and [36] proposed a lyric retrieval system called LyricsRadar that analyses lyric topics using a text analysis method called latent Dirichlet allocation (LDA). The system automatically generates a two-dimensional space from the LDA analysis of topics common to the lyrics of several music pieces. A music playback interface called LyricListPlayer lets the user view lyrics while listening and also view word sequences of other songs that are similar to the currently playing lyrics, based on local lyric similarity [29].
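The topic analysis underlying such systems can be sketched with scikit-learn's LDA implementation; the toy corpus and the number of topics are assumptions for illustration, not details of LyricsRadar itself.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy lyric corpus (illustrative only).
docs = [
    "love heart forever love",
    "dance floor beat night dance",
    "rain tears alone night",
    "heart beat love night",
]

# Term counts feeding LDA; systems like LyricsRadar project the resulting
# per-song topic mixtures onto a 2-D space for browsing.
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))  # one topic-mixture row per song
```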

A music visualization technique called Lyricon, presented in [25], automatically selects multiple icons for tunes using musical and lyrical features and helps users choose songs of interest from the visual representation of the mood icons. [3] proposed an application named Songwords that allows users to explore music collections through song lyrics, and [39] proposed a system named SLAVE (Score Lyrics Audio Video Explorer) that allows users to explore multimedia music collections using different varieties of music documents.

6 Open Challenges

Music has become part of the lifestyle of all groups of people, and various researchers have contributed to automating mood identification from music. A comparison of various music mood categorization systems is given in Table 2. It can be observed from the table that most of the reported works derived their own mood taxonomies, and that the most commonly used audio features are intensity and rhythm, whereas the most common lyric features are lexicon-based and text-stylistic. Of the various approaches used, SVM is the most common for audio, lyric and multimodal classification alike. Very few works have been reported for Indian languages, among which the best performance is an F-measure of 0.751 for multimodal classification.

Table 2. Comparison of different music mood categorization systems

Based on the works covered in this survey, the following open challenges can be identified.

  1. The scarcity of standard mood-annotated data has forced many researchers in the music research community to prepare and standardize their own data, which is expensive and time consuming and requires considerable human annotation effort. A few works have proposed data collection and standardization for Western music in the form of online games, but no such data exists for Indian languages in the music domain.

  2. Work on sentiment analysis for Indian languages (Telugu) and a few works on music mood categorization (especially for Hindi and Bengali) have been reported. However, the step from polarity classification to mood categorization (especially for Telugu) has yet to be addressed.

  3. Mood taxonomies refined by combining theoretical models with social tags may better match present-day music moods.

  4. A data set collected for Indian music might not include all nine rasas of the Navarasa model, since that model is also used generally for Indian art and drama. A mood taxonomy specific to music in Indian languages can be developed.

  5. A considerable amount of work has been reported on audio and on combining audio and lyrics, whereas lyric-based music information retrieval is still at a budding stage.

  6. Most lyrical features considered to date (BOW, PoS, content words, lexicon-based and text-stylistic features) are those generally used in regular text mining tasks. But lyrics are generally considered different from ordinary text because of their abstract nature, and lyric mining therefore differs from text mining. Hence, introducing features dedicated to lyrics, with due consideration of lyric dimensionality, may improve classification accuracy.

7 Conclusion

Humans experience different kinds of emotions while listening to music, as the mood expressed by the same music piece is often ambiguous, changes over its duration, and also depends on the psychological condition of the listener. This paper has presented a survey of recent developments in automatic music mood classification, summarizing contributions to mood taxonomies, data creation and standardization, approaches, and systems. The analysis reveals that most previous works have concentrated on combining audio and lyrics, whereas purely lyric-based MIR is an emerging area of interest. The most frequently used algorithms for the mood classification task are Naive Bayes, SVMs and k-NN. Although the techniques and methods for automatic music mood classification are advancing, open challenges remain, and there is good scope for research contributions on Indian languages and Indian music.