Dynamic Time Warping for Automatic Musical Form Identification in Symbolical Music Files

Bañuelos, Cristian; Orduña, Felipe

doi:10.1007/978-3-319-71827-9_19

Cristian Bañuelos¹⁶ &
Felipe Orduña¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10527))

Included in the following conference series:

International Conference on Mathematics and Computation in Music

1175 Accesses
1 Citations

Abstract

Music information retrieval techniques are used to automatically extract structural data of a piece, however there have been few attempts to study ways to automatically identify the musical form of digital files. In this work we present an implementation of the dynamic time warping algorithm for the automatic identification of musical form structure by means of a segmentation matrix in which we group elements according to maximal similarity. The system was implemented in symbolic files parsed with the music21 library. We tested it in two pieces: Bagatelle No. 25 in A minor by L.V. Beethoven, and Piano Sonata No. 11 in A major, K331, movement 3 by W.A. Mozart. The system obtained a correct identification of the similar sections, both with a rondo form. We foresee that this algorithm can be extended to measure harmonic similarity and with this be able to analyze more complex forms, like a sonata.

C. Bañuelos—This work was made possible by the doctoral financial support of the Consejo Nacional de Ciencia y Tecnología (CONACYT), México. Grant number: CVU 662627 no: 583329.

Access provided by CONRICYT-eBooks. Download conference paper PDF

Computational Analysis of Musical Form

Automatic Recognition of Key Modulations in Symbolic Musical Pieces Using Information Theory

Information Synthesis of Time-Geometry QCurve for Music Retrieval

Keywords

1 Introduction

Music Information Retrieval uses different algorithms and techniques to extract structural content from musical files. There are mainly two approaches to this, on one hand, we can study an audio signal, like in [9, 10], on the other, we can take a symbolical musical file, like MIDI or MusicXML, and apply pattern recognition techniques to extract harmonic or melodic content, this approach is taken in [3, 5].

Many of the Music Information Retrieval problems addressed revolve around developing genre recommendation systems [7], style imitation with artificial intelligence and machine learning [4], chord recognition and harmonic extraction [12], instrument recognition from an audio file [14]. However there have been few attempts to analyze and extract musical form, among these [8], in which they find internal similarity within an audio file to find sections that relate among themselves and can be used as transitions between them. In [1], they analyze internal similarity of an audio file to create a general thumbnail, that is the minimal amount of music that represents the whole piece, they use this in order to simplify the search in extensive databases.

In symbolical music files analysis, there is a need to apply these kind of similarity analysis to create a system that classifies internal similarity of a musical piece in order to find its musical form. We argue that a system that can extract and identify the formal structure of symbolical music files can be useful to simplify other tasks, like automatic harmonic and chord labeling, removing redundancies in the calculation of repeating or similar sections. It can also be a useful pedagogical tool that can be used in music education contexts to help the student to better understand the concept of form.

In this work, we propose to apply the time series technique Dynamic Time Warping to a symbolical musical file. This algorithm is normally used in audio signals and it’s useful to calculate a measure of how different two signals are by finding the cost of transforming one into another by means of time stretching. The novelty of this work is that instead of working with audio files, we will use the much smaller symbolical representation of music. In order to classify the musical form, we run through all possible sub segments of the piece and compare them in a similarity matrix, similar to the one used in [1], we then group segments with maximal similarity and label them as a new section. The process is repeated until all the maximal similarities have been found. In the present work we apply the algorithm to find the repeating section of a rondo form.

2 Dynamic Time Warping

The DTW algorithm uses dynamic programming to find the optimum alignment between two time series. It does this by calculating the cost to align each point of the first to the second. Afterwards, it takes the minimal path of change needed to transform each point to the other. It’s very useful in cases where we need to compare sequences that are time stretched or transposed. Because of this flexibility we consider that its application to a musical context would bring optimal results.

The following is a brief review of the DTW as presented by Müller in [11]. If we have sequences $X := (x_1, x_2, . . . , x_N)$ and $Y :=(y_1, y_2, . . . , y_N)$, the warping path $p=(p_1,\dots ,p_L)$ is defined as the assignment of $x_{nl}$ to $y_{ml}$. The path must satisfy the boundary condition that it always aligns the first elements, and the last elements of the sequences, respectively; it must be monotonic; and only move by unitary steps; also all elements from X and Y must be paired and there can be no repetitions. Müller also defines the total cost as

$$\begin{aligned} c_p(X,Y) := \sum _{l=1}^L c(x_{nl},y_{ml}). \end{aligned}$$

(1)

where c is the local cost, in this case we will use the euclidean distance as a measure of difference. To find the optimal cost we search for the path with the minimum value.

$$\begin{aligned} DTW(X,Y)=\min \{c_p(X, Y ) \,|\, p \,\,\,\text {is a transformation path}\} \end{aligned}$$

(2)

With this we find the minimum cost of the different possible paths of transforming X into Y. This measure will be useful to compare similarity segments within the piece. The system developed in this work uses the Fast DTW [13], which is based on dynamic programming concepts. The DTW measure can be used for query systems by comparing one small segment to different windows of a larger segment, this methodology is used in [6]. In order to apply this to our symbolic musical file, we shall use the sequence of notes given by a MusicXML file and treat them as a time series.

3 Segmentation Matrix

The proposed system was implemented in Python with the library Music21 [2], which is optimal for the parsing and processing of symbolical musical files like MIDI and MusicXML. In order to apply the DTW algorithm we need to prepare the data so it takes the form of a time series. We use the flat functionality in the music21 library to convert the xml file to a linear representation of notes and time offsets. Next, we translate the musical information into a time series of all the notes, thus having a sequence of order pairs giving the time stamp and the pitch of a particular event.

$$\begin{aligned} P= \{(time_i,pitch_i)\} \end{aligned}$$

(3)

Where $0 \le i\le N$, and N is the total number of notes in the piece to analyze; we will call this: the events list.

As a next step we will create a list of all the possible sub segmentations of the events list, that is all the possible subsets of P in which all consecutive elements from i to j are present.

$$\begin{aligned} Segs =\{U(i,j) \subset P|\,\, \text { if}\,\, i \le k\le j, \implies (t_k,p_k)\in U(i,j) \} \end{aligned}$$

(4)

We will use the notation $Segs_{i,j}$ to indicate the segment time consecutive notes from element i to j in the events list. We have $N^2$ different segments, and we need to group them according to their similarity. In order to do this, we will create a similarity matrix in which each entry (p, q) will have the similarity measure assigned between $Segment_p$ and $Segment_q$, that is

(5)

$$\begin{aligned} a_{p,q}=DTW(Seg_p,Seg_q) \end{aligned}$$

(6)

In this matrix we calculate the DTW similarity measure between all segments and thus can be used to obtain a measure to classify and group similar sections of the piece. The method for computing the DTW similarity value is exactly the same used for acoustic time series as shown by Müller [11], which uses dynamic programming to find the cost of transforming one series to the other, as was explained in the previous section.

4 Musical Tests

For the purpose of testing the system we choose two pieces that have a rondo form, in order to see if the algorithm correctly predicts the repeating pattern of the analysis.

In the Bagatelle in A minor, Fur Elise, from Beethoven we have a rondo structure in which there is a section A at the beginning, and the musical form as a whole is A - B - A - C - A. When we apply the DTW measures to this piece we obtain a clear indication of a repeating section. In this work we show an example of use of the algorithm to find the repeating section of the rondo piece, that is, the A part. In Fig. 1a, we have a plot of time offset to the cost of the current test window. The graph takes an initial window of size 30, and runs through the rest of the piece in steps of 10 notes, obtaining the cost of comparing the test window to each of the other segments, the initial window size and step length were chosen empirically. The program then increases the window size to check if a larger frame would give a better matching result for some sections. The windows change in size from 30 to 100 in steps of 10, Fig. 1 show the costs of different windows sizes. Given the local minimums that appear around offsets 20, 55 and 130 we see that there appears a clear similarity between these sections and the test window. Given that the test window was the initial part of the piece and repeats several times, we will call it section A, that in fact corresponds to the repeating section of the rondo. To analyze if the parts that differ from A are in fact sections B and C, we would need to apply the algorithm starting in the first offset not classified as section A, and search for similarities, this will be addressed in future work, but currently the algorithm shows a good result in classifying one section at a time.

In Mozart’s Piano Sonata No. 11 in A major, K331, movement 3, also known as Rondo Alla Turca, we also have a similar form like in Beethoven, and if we apply the DTW algorithm to the piece we get the cost graph shown in Fig. 1b. We can see that the beginning section of the piece repeats itself at time offsets around 40, 140, and 160. In 140 we have an almost identical similarity regardless of window size, while in the other cases we have some minor variation in the structure, as shown by the higher cost obtained. We will call this repeating section A. Again, to find the rest of the sections we would need to apply the algorithm to the rest of the piece that was not labeled as A, this analysis will be applied in future work.

5 Conclusions

The use of DTW to compare internal sections of a musical piece gives us enough flexibility to compare sections of a work by allowing us to compare all the possible combinations of windows sizes, we use this to find section repetitions, variations and changes. The current work presents a way to identify a repeating section, and it will be further developed to find and classify all repeating sections. One of the main challenges to solve is that we don’t have yet a clear maximum value for the window size, so we have to decide what is the cost tolerance of a section and assign a degree of belonging of each part of the score, to the test window. In future work, this will lead to establish some fuzzy belonging functions to help us decide how we should segment the similarity sections given by the DTW algorithm. The current implementation only takes into account the height and time stamp of the note, however, in future work we could implement other kind of metrics to include more parameters, like harmonic distances, which can help to further classify similar parts and sections of a symbolical music file using features others than just the pitch of the notes. Another thing to consider is the range of the costs values, in this version, the cost is given directly by the DTW, but we can see that it can range from hundreds to thousands, it would be a real benefit to find an optimal normalization of values, so we can have a clearer estimation of what the cost is indicating. In conclusion, Dynamic Time Warping gives us a good estimation of how to find internal similarity on a musical work to find its structural form, we foresee that this kind of approaches, complemented with other similarity measures will contribute to future work in Music Information Retrieval research.

References

Cooper, M.L., Foote, J.: Automatic music summarization via similarity analysis. In: ISMIR 2002 (2002)
Google Scholar
Cuthbert, M.S., Ariza, C.: music21: a toolkit for computer-aided musicology and symbolic music data. In: ISMIR 2010, pp. 637–642 (2010)
Google Scholar
Cuthbert, M.S., Ariza, C., Friedland, L.: Feature extraction and machine learning on symbolic music using the music21 toolkit. In: ISMIR 2011, pp. 387–392 (2011)
Google Scholar
Dubnov, S., Assayag, G., Lartillot, O., Bejerano, G.: Using machine-learning methods for musical style modeling. IEEE Comput. 36(10), 73–80 (2003)
Article Google Scholar
Droettboom, M., Fujinaga, I., MacMillan, K., Patton, M., Warner, J., Sayeed Choudhury, G., DiLauro, T.: Expressive and efficient retrieval of symbolic musical data. In: ISMIR 2001 (2001)
Google Scholar
Hu, N., Dannenberg, R.B., Tzanetakis, G.: Polyphonic audio matching and alignment for music retrieval. In: 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 185–188. IEEE (2003)
Google Scholar
Kaminskas, M., Ricci, F.: Contextual music information retrieval and recommendation: state of the art and challenges. Comput. Sci. Rev. 6(2), 89–119 (2012)
Article Google Scholar
Lamere, P.: The Infinite Jukebox (2012). www.infinitejuke.com. Accessed 24 May 2016
Lee, K., Slaney, M.: Automatic chord recognition from audio using a HMM with supervised learning. In: ISMIR 2006, pp. 133–137 (2006)
Google Scholar
McVicar, M., Santos-Rodríguez, R., Ni, Y., De Bie, T.: Automatic chord estimation from audio: a review of the state of the art. IEEE/ACM Trans. Audio Speech Lang. Process. 22(2), 556–575 (2014)
Article Google Scholar
Müller, M.: Dynamic time warping. In: Information Retrieval for Music and Motion, pp. 69–84. Springer, Heidelberg (2007)
Google Scholar
Pardo, B., Birmingham, W.P.: Algorithms for chordal analysis. Comput. Music J. 26(2), 27–49 (2002)
Article Google Scholar
Salvador, S., Chan, P.: FastDTW: toward accurate dynamic time warping in linear time and space. In: 3rd Workshop on Mining Temporal and Sequential Data, ACM KDD 2004, Seattle, Washington, 22–25 August 2004 (2004)
Google Scholar
Park, T., Lee, T.: Musical instrument sound classification with deep convolutional neural network using feature fusion approach. arXiv preprint arXiv:1512.07370 (2015)

Download references

Author information

Authors and Affiliations

Universidad Nacional Autónoma de México, México City, México
Cristian Bañuelos
Universidad Nacional Autónoma de México, Centro de Ciencias Aplicadas y Desarrollo Tecnológico, México City, México
Felipe Orduña

Authors

Cristian Bañuelos
View author publications
You can also search for this author in PubMed Google Scholar
Felipe Orduña
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Cristian Bañuelos .

Editor information

Editors and Affiliations

UNCA and UTM, Oaxaca, Mexico
Octavio A. Agustín-Aquino
UNAM, Mexico City, Mexico
Emilio Lluis-Puebla
Georgia State University, Atlanta, Georgia, USA
Mariana Montiel

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bañuelos, C., Orduña, F. (2017). Dynamic Time Warping for Automatic Musical Form Identification in Symbolical Music Files. In: Agustín-Aquino, O., Lluis-Puebla, E., Montiel, M. (eds) Mathematics and Computation in Music. MCM 2017. Lecture Notes in Computer Science(), vol 10527. Springer, Cham. https://doi.org/10.1007/978-3-319-71827-9_19

Download citation

DOI: https://doi.org/10.1007/978-3-319-71827-9_19
Published: 18 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71826-2
Online ISBN: 978-3-319-71827-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Dynamic Time Warping for Automatic Musical Form Identification in Symbolical Music Files

Abstract