Correspondence Analysis, Cross-Autocorrelation and Clustering in Polyphonic Music

Cocco, Christelle; Bavaud, François

doi:10.1007/978-3-662-44983-7_35

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization ((STUDIES CLASS))

Abstract

This paper proposes to represent symbolic polyphonic musical data as contingency tables based upon the duration of each pitch for each time interval. Exploratory data analytic methods involve weighted multidimensional scaling, correspondence analysis, hierarchical clustering, and general autocorrelation indices constructed from temporal neighborhoods. Beyond the analysis of single polyphonic musical scores, the methods support inter-voice as well as inter-score comparisons, through the introduction of ad hoc measures of configuration similarity and cross-autocorrelation. Rich musical patterns emerge in the related applications, and preliminary results are encouraging for clustering tasks.

1 Introduction

This paper aims to produce an exploratory data analysis of symbolic polyphonic musical data represented as contingency tables, which count the duration of each pitch for each time interval, given a predefined partition of the musical score into equal durations. This representation, close to the piano-roll representation and to the chroma representation for audio files (see, e.g., Müller and Ewert 2011 or Ellis and Poliner 2007), has three advantages: it can represent digital polyphonic musical scores, it is usable with common data analytic methods such as correspondence analysis, and it is aggregation-invariant (Sect. 2).

In Sect. 3.1, analyses of whole music pieces are proposed, by means of correspondence analysis and a flexible autocorrelation index able to deal with general neighborhoods. Both methods grasp intrinsic structures of musical scores and provide pattern visualizations. Multiple voices within a single musical score are analyzed through soft multiple correspondence analysis and a cross-autocorrelation index (Sect. 3.2). Finally, based on the chosen contingency table, a similarity measure aimed at clustering music pieces according to composers is proposed and illustrated (Sect. 3.3).

2 Data Representation

In this contribution, symbolic music files are used, and especially files in Humdrum **kern format, as they are well structured with all voices, independent of the performer and freely available on the web (http://kern.ccarh.org/). Moreover, Humdrum extras (http://extra.humdrum.org/) are used when modifications, such as transposition, are needed, as well as to transform **kern files into Melisma format (http://www.link.cs.cmu.edu/music-analysis/), which is easy to handle for the representation proposed in this paper. Note that the representation proposed in the following could also have been obtained with other digital score formats, such as ABC or MIDI files, especially if the latter are performed with a constant tempo.

Each musical score is represented, with all repeated passages, as a contingency table X = (x_tj) crossing pitches (j = 0, …, m) and time intervals (t = 1, …, n). The table gives the duration of each pitch in each time interval. Notice that the repetition of notes of the same pitch within a time interval is not coded. In more detail, MIDI note numbers (0 to 127) are reduced modulo 12 to a 12-note octave-equivalent pitch set, where 0 stands for C; 1, for C♯ or D♭; 2, for D; etc. Moreover, a true rest z is added whenever no note is played. Thus, j can take on 13 different values: 0 to 11 and z. Regarding time intervals, each one has a constant duration τ, which can take any value, such as a 16th note, a measure or a number of milliseconds. Consequently, the total duration of the musical score is τ_tot = nτ. An example of the transposed contingency table is given in Fig. 1 for two different values of τ.
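The construction of the table can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure actually used for the paper: notes are assumed to be given as hypothetical triples (MIDI number, onset, duration) measured in integer ticks, a pitch class sounding in several voices at once is counted once, the true rest z is stored in a thirteenth column (index 12), and the total length is assumed to be a multiple of τ. The function name `contingency_table` and the toy four-note motif (C, D, E, C in quarter notes, with one tick per eighth note) are introduced here for illustration.

```python
import numpy as np

REST = 12  # column index chosen here for the true rest z (assumption)

def contingency_table(notes, ticks_per_interval, total_ticks):
    """Duration (in ticks) of each pitch class, plus rest, per time interval.

    notes: iterable of (midi_number, onset_tick, duration_ticks), a
    hypothetical input format; total_ticks must be a multiple of
    ticks_per_interval (the interval duration tau expressed in ticks).
    """
    if total_ticks % ticks_per_interval:
        raise ValueError("total_ticks must be a multiple of ticks_per_interval")
    n = total_ticks // ticks_per_interval
    # sounding[k, j] is True when pitch class j sounds during tick k
    sounding = np.zeros((total_ticks, 12), dtype=bool)
    for midi, onset, duration in notes:
        sounding[onset:onset + duration, midi % 12] = True
    rest = ~sounding.any(axis=1)  # ticks during which no note is played
    per_tick = np.column_stack([sounding, rest]).astype(float)
    # Sum the ticks belonging to each time interval
    return per_tick.reshape(n, ticks_per_interval, 13).sum(axis=1)

# Toy motif: C, D, E, C as quarter notes, one tick = one eighth note
notes = [(60, 0, 2), (62, 2, 2), (64, 4, 2), (60, 6, 2)]
X_quarter = contingency_table(notes, ticks_per_interval=2, total_ticks=8)
X_measure = contingency_table(notes, ticks_per_interval=8, total_ticks=8)
```

With τ equal to a quarter note the table has four rows, each carrying a single pitch; with τ equal to the whole measure it has a single row in which C, D and E hold 4, 2 and 2 ticks respectively.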

Besides its ability to deal with polyphonic music, this representation is aggregation-invariant in the sense that doubling τ amounts to summing the counts of two consecutive intervals. So, considering an interval T made up of smaller intervals t, the new counts are $\tilde{x}_{\mathit{Tj}} =\sum _{t\in T}x_{\mathit{tj}}$. Lavrenko and Pickens (2003) and Morando (1981) use a quite similar representation, except that the former do not take into account the duration of and between notes and the latter bases his representation upon the succession of chords. In contrast to the present representation, however, theirs are not aggregation-invariant.
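As a small illustration of aggregation invariance, the following sketch merges every k consecutive rows of a table by summation. The helper name `aggregate` and the three-column toy table are assumptions made for this example only.

```python
import numpy as np

def aggregate(X, k):
    """Merge every k consecutive time intervals by summing their counts."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if n % k:
        raise ValueError("the number of intervals must be a multiple of k")
    return X.reshape(n // k, k, m).sum(axis=1)

# Toy table: 4 time intervals (rows) and 3 pitches (columns)
X_fine = np.array([[2., 0., 0.],
                   [0., 2., 0.],
                   [0., 0., 2.],
                   [2., 0., 0.]])
X_coarse = aggregate(X_fine, 2)  # same counts as a table built with a doubled tau
```

Aggregating twice by a factor of two gives the same table as aggregating once by a factor of four, which is the sense in which the counts do not depend on the path taken to reach a given τ.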

Then, as a second step, the contingency table X = (x_tj) is normalized to Ξ = (ξ_tj) so that each row sum ξ_t• = ∑_j ξ_tj equals 1, that is $\xi _{\mathit{tj}} = \frac{x_{\mathit{tj}}} {x_{t\bullet }}$. Thus, the same importance is given to each time interval, regardless of the duration and the number of pitches.

3 Methods and Applications

3.1 Single Score Analysis

3.1.1 Correspondence Analysis

To perform the correspondence analysis (CA) on the Ξ matrix, an equivalent method is used which consists in applying a weighted multidimensional scaling on the chi-squared dissimilarities between time intervals $\hat{D} = (\hat{D}_{\mathit{st}})$ and between pitches $\check{D} = (\check{D}_{\mathit{ij}})$:

$$\displaystyle{ \hat{D}_{\mathit{st}} =\sum _{j}\rho _{j}(q_{\mathit{sj}} - q_{\mathit{tj}})^{2}\qquad \quad \check{D}_{\mathit{ ij}} =\sum _{t}f_{t}(q_{\mathit{ti}} - q_{\mathit{tj}})^{2} }$$

(1)

where $f_{t} = 1/n$ is the relative weight of time intervals, $\rho _{j} =\xi _{\bullet j}/n$ is the relative weight of pitches, and $q_{\mathit{tj}} =\xi _{\mathit{tj}}n/\xi _{\bullet j}$ is the independence ratio.
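The quantities entering Eq. (1) can be computed directly from a contingency table, as in the following sketch. It assumes that every row of X has a positive total (which the rest column guarantees) and, as a choice made here to avoid divisions by zero, drops pitches that never occur. The function name `chi2_dissimilarities` and the toy table are illustrative.

```python
import numpy as np

def chi2_dissimilarities(X):
    """Row-normalize X and return the ingredients of Eq. (1)."""
    X = np.asarray(X, dtype=float)
    X = X[:, X.sum(axis=0) > 0]             # assumption: absent pitches are dropped
    Xi = X / X.sum(axis=1, keepdims=True)   # xi_tj = x_tj / x_t.
    n = Xi.shape[0]
    f = np.full(n, 1.0 / n)                 # weights of time intervals
    rho = Xi.sum(axis=0) / n                # weights of pitches
    Q = Xi / rho                            # independence ratios q_tj
    # D_time[s, t] = sum_j rho_j (q_sj - q_tj)^2
    D_time = ((Q[:, None, :] - Q[None, :, :]) ** 2 * rho).sum(axis=2)
    # D_pitch[i, j] = sum_t f_t (q_ti - q_tj)^2
    QT = Q.T
    D_pitch = ((QT[:, None, :] - QT[None, :, :]) ** 2 * f).sum(axis=2)
    return Xi, f, rho, Q, D_time, D_pitch

# Toy monophonic table: 4 intervals, 3 pitches, one pitch per interval
X = np.array([[2., 0., 0.],
              [0., 2., 0.],
              [0., 0., 2.],
              [2., 0., 0.]])
Xi, f, rho, Q, D_time, D_pitch = chi2_dissimilarities(X)
```

In this toy table the first and last intervals carry the same pitch, so their dissimilarity vanishes, whereas intervals carrying different pitches are separated by a positive chi-squared dissimilarity.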

In a nutshell, scalar products between time intervals $\hat{B} = (\hat{b}_{\mathit{st}})$ and between pitches $\check{B} = (\check{b}_{\mathit{ij}})$ are computed from the dissimilarity matrices as:

$$\displaystyle{\hat{B} = -\frac{1} {2}H^{f}\hat{D}(H^{f})^{{\prime}}\qquad \quad \check{B} = -\frac{1} {2}H^{\rho }\check{D}(H^{\rho })^{{\prime}}}$$

where $H^{f} = I -\mathbf{1}f^{{\prime}}$, $H^{\rho } = I -\mathbf{1}\rho ^{{\prime}}$ are the corresponding centering matrices. Then, weighted scalar products $\hat{K} = (\hat{k}_{\mathit{st}})$ and $\check{K} = (\check{k}_{\mathit{ij}})$ are defined as:

$$\displaystyle{ \hat{k}_{\mathit{st}} = \sqrt{f_{s } f_{t}}\hat{b}_{\mathit{st}}\qquad \quad \check{k}_{\mathit{ij}} = \sqrt{\rho _{i } \rho _{j}}\check{b}_{\mathit{ij}} }$$

(2)

The spectral decomposition of the matrix $\hat{K}$ (respectively $\check{K}$) provides the eigenvectors $u_{t\alpha}$ (resp. $v_{j\alpha}$) and the corresponding eigenvalues $\lambda_{\alpha}$ (identical for both matrices) from which stem the factor coordinates for time intervals ($x_{t\alpha}$) and for pitches ($y_{j\alpha}$):

$$\displaystyle{x_{t\alpha } = \frac{\sqrt{\lambda _{\alpha }}} {\sqrt{f_{t}}}u_{t\alpha } = \frac{1} {\sqrt{\lambda _{\alpha }}}\sum _{j=1}^{m}\rho _{ j}q_{\mathit{tj}}y_{j\alpha }\qquad y_{j\alpha } = \frac{\sqrt{\lambda _{\alpha }}} {\sqrt{\rho _{j}}}v_{j\alpha } = \frac{1} {\sqrt{\lambda _{\alpha }}}\sum _{t=1}^{n}f_{ t}q_{\mathit{tj}}x_{t\alpha }}$$
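The chain from dissimilarities to factor coordinates can be sketched with NumPy as follows. This is a minimal sketch rather than the authors' implementation: the function name `ca_via_mds`, the tolerance used to discard null eigenvalues, the removal of absent pitches and the random toy table are all assumptions of this example.

```python
import numpy as np

def ca_via_mds(X, tol=1e-12):
    """Correspondence analysis of X through weighted MDS on time intervals."""
    X = np.asarray(X, dtype=float)
    X = X[:, X.sum(axis=0) > 0]               # assumption: drop absent pitches
    Xi = X / X.sum(axis=1, keepdims=True)
    n = Xi.shape[0]
    f = np.full(n, 1.0 / n)
    rho = Xi.sum(axis=0) / n
    Q = Xi / rho
    D = ((Q[:, None, :] - Q[None, :, :]) ** 2 * rho).sum(axis=2)
    H = np.eye(n) - np.outer(np.ones(n), f)   # centering matrix H^f = I - 1 f'
    B = -0.5 * H @ D @ H.T                    # scalar products
    K = np.sqrt(np.outer(f, f)) * B           # weighted scalar products, Eq. (2)
    lam, U = np.linalg.eigh(K)                # K is symmetric
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    keep = lam > tol                          # discard numerically null dimensions
    lam, U = lam[keep], U[:, keep]
    x = U * np.sqrt(lam) / np.sqrt(f)[:, None]   # time-interval coordinates
    y = (Q * f[:, None]).T @ x / np.sqrt(lam)    # pitch coordinates (transition formula)
    return lam, x, y, f, rho

rng = np.random.default_rng(0)
X = rng.random((8, 5))                        # hypothetical 8 intervals x 5 pitches
lam, x, y, f, rho = ca_via_mds(X)
explained = lam / lam.sum()                   # proportion of inertia per factor
```

The coordinates are centered with respect to the weights f and ρ, and the weighted variance of each factor equals the corresponding eigenvalue, which gives a quick sanity check of any implementation.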

An example of this formalism for the well-known French monophonic nursery melody Frère Jacques (Are you sleeping? in English) is given in Fig. 2. The graph on the left shows the result obtained with τ equal to an eighth note, which means that no more than one pitch is played during each time interval, i.e. the representation is totally monophonic. In that case, chi-squared dissimilarities between time intervals are “star-like”, i.e. of the form $\hat{D}_{\mathit{st}} = a_{s} + a_{t}$ (see, e.g., Critchley and Fichet 1994). Consequently all $\lambda_{\alpha}$ are equal and data are difficult to compress by factor analysis. When τ is equal to a measure (graph in the middle), the graph reveals the structure of the music piece, with each measure played twice. Note the “horseshoe effect” resulting from the temporal ordering of time intervals. The right graph highlights that the percentage of explained inertia climbs as the duration τ increases, except when τ is smaller than or equal to an eighth note, the smallest duration of a note, and between τ equal to a whole note (corresponding to a measure) and τ equal to two whole notes, due to the repeated structure of the piece.
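The star-like form can be checked directly from Eq. (1). The notation j(t), for the single pitch played during interval t, is introduced here for illustration only. In the monophonic case $\xi_{\mathit{tj}} = 1(j = j(t))$, hence $q_{\mathit{tj}} = 1(j = j(t))/\rho_{j}$, and for two intervals carrying different pitches:

$$\displaystyle{ \hat{D}_{\mathit{st}} =\sum _{j}\rho _{j}(q_{\mathit{sj}} - q_{\mathit{tj}})^{2} =\rho _{j(s)}\frac{1} {\rho _{j(s)}^{2}} +\rho _{j(t)}\frac{1} {\rho _{j(t)}^{2}} = \frac{1} {\rho _{j(s)}} + \frac{1} {\rho _{j(t)}} }$$

so that $a_{t} = 1/\rho _{j(t)}$, while $\hat{D}_{\mathit{st}} = 0$ when both intervals carry the same pitch.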

Another example is given in Fig. 3 for a Mazurka by Chopin, with three different interval durations. The structure emerges more clearly for large values of τ. In particular, the right graph, with τ equal to eight measures, reveals the similar (e.g., 1, 3, 6, 9 and 13) and different passages (e.g., 2 against 3).

While these two examples clearly highlight the structure of the piece, results are less comprehensible when a motif is transposed in the same piece or when a true rest appears. In fact, in the latter case, the first factor often exclusively expresses the contrast between true rests and pitches.

3.1.2 Autocorrelation Index

Consider now the neighborhood analysis between ordered time intervals, represented by the rows of Ξ. Temporal neighborhoods can be defined by a non-negative symmetric exchange matrix E = (e _st) obeying $e_{t\bullet } = e_{\bullet t} = f_{t} = 1/n$. The associated autocorrelation index (Bavaud et al. 2012) is calculated as:

$$\displaystyle{ \delta:= \frac{\varDelta -\varDelta _{\text{loc}}} {\varDelta } \in [-1,1] }$$

(3)

where Δ is the (global) inertia and Δ _loc is the local inertia:

$$\displaystyle{ \varDelta:= \frac{1} {2}\sum _{\mathit{st}}f_{s}f_{t}\hat{D}_{\mathit{st}} = \frac{1} {2n^{2}}\sum _{\mathit{st}}\hat{D}_{\mathit{st}}\qquad \varDelta _{\text{loc}}:= \frac{1} {2}\sum _{\mathit{st}}e_{\mathit{st}}\hat{D}_{\mathit{st}} }$$

(4)

Thus, the autocorrelation index measures the difference between the overall variability of chi-squared interval dissimilarities and the local variability within some neighborhood defined by E, generalizing the usual “immediate left-right neighborhood” (see, e.g., Morando (1981) for a musical data-analytic approach). A large positive (resp. negative) autocorrelation means that the pitch distributions are more (resp. less) similar in the neighborhood than in randomly chosen intervals.

Among all possible exchange matrices, it turns out to be convenient to define a periodic exchange matrix, with a neighborhood at temporal distance (or lag) r (right and left) of the current interval, E ^(r):

$$\displaystyle{e_{\mathit{st}}^{(r)} = \frac{1} {2n}[1(t = (s \pm r)\bmod n) + 1((s \pm r)\bmod n = 0) \cdot 1(t = n)]}$$

For statistical testing of the autocorrelation index, see, e.g., Cliff and Ord (1981) and Bavaud (2013). Note that in contrast to the usual autocorrelation function in time series analysis (see, e.g., Box and Jenkins 1976) which considers a single numerical variable, the autocorrelation index can deal with multiple simultaneous categorical variables.
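The periodic exchange matrix and the index of Eq. (3) can be sketched as follows. The code uses 0-based indices, so that the special case "t = n" of the formula above reduces to a plain modulo operation. The function names and the toy pitch sequence (one pitch per interval, a four-interval pattern played twice) are assumptions made for this illustration.

```python
import numpy as np

def chi2_time_dissimilarities(Xi):
    """Chi-squared dissimilarities between the rows of a row-normalized table."""
    Xi = np.asarray(Xi, dtype=float)
    Xi = Xi[:, Xi.sum(axis=0) > 0]            # assumption: drop absent pitches
    rho = Xi.mean(axis=0)                     # rho_j = xi_.j / n
    Q = Xi / rho
    return ((Q[:, None, :] - Q[None, :, :]) ** 2 * rho).sum(axis=2)

def periodic_exchange(n, r):
    """E^(r): interval s exchanges with s + r and s - r, indices taken modulo n."""
    E = np.zeros((n, n))
    s = np.arange(n)
    for shift in (r, -r):
        E[s, (s + shift) % n] += 1.0 / (2 * n)
    return E

def autocorrelation_index(D, E):
    """delta = (inertia - local inertia) / inertia, as in Eqs. (3) and (4)."""
    n = D.shape[0]
    inertia = D.sum() / (2 * n ** 2)
    local_inertia = 0.5 * (E * D).sum()
    return (inertia - local_inertia) / inertia

# Toy monophonic sequence: the pattern C, D, E, C played twice
pitches = [0, 2, 4, 0, 0, 2, 4, 0]
n = len(pitches)
Xi = np.eye(12)[pitches]                      # one-hot rows, already normalized
D = chi2_time_dissimilarities(Xi)
deltas = [autocorrelation_index(D, periodic_exchange(n, r)) for r in range(n)]
```

In this toy sequence the index equals 1 at lag 0 and again at lag 4, the length of the repeated pattern, and the values are symmetric in r and n − r because the neighborhood is periodic.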

The autocorrelation index is computed on three musical scores (Fig. 4). As expected, δ = 1 for r = 0 and the figures are symmetric, since the neighborhood is periodic ($E^{(r)} = E^{(n-r)}$). Moreover, noticeable peaks appear in all graphs. For the monophonic music piece Are you sleeping?, the highest value (δ = 0.495) appears for r = 4, which corresponds to the duration of a measure. In fact, due to the systematic repetition of each measure, at each point the same pitches are played at a distance equal to four, sometimes on the left, sometimes on the right. For Chopin’s piece, peaks occur every eight measures, as expected from the results obtained in Fig. 3. Finally, for Scarlatti’s sonata, there are two remarkable peaks (δ = 0.25 and δ = 0.21), for r = 54 and r = 61 measures, corresponding to the lengths of the two repeated parts which compose the whole piece. In conclusion, peaks of δ appear to detect strict or approximate repetitions, but do not detect transposed repetitions.

3.2 Between Voices Analysis

3.2.1 Soft Multiple Correspondence Analysis

Let Ξ^v denote the row-normalized contingency table for voice v = 1, …, V occurring in a music piece. The complete contingency table of the musical score is given as Ξ^COMP = (Ξ^1 | Ξ^2 | … | Ξ^V), on which a CA is carried out. Whereas a usual multiple correspondence analysis (MCA) is computed on a disjunctive table, the present procedure is applied to row cells containing, due to row-normalization, the pitch proportions of the voice during a given t, and hence constitutes a soft variant of MCA.

Figure 5 shows the results obtained for Pachelbel’s canon. In the right graph, different zones appear depending on the number of instruments which are playing. For instance, in the bottom zone, only the harpsichord is playing, and so there are true rests for the three violins. Again, as for CA, the clarity of pattern representation largely depends upon the value of τ.

3.2.2 Cross-Autocorrelation Index

Define the “raw” coordinates of the voice Ξ ^v as $^{{\ast}}\xi _{\mathit{tj}}^{v} = \sqrt{\rho _{j }^{v}}(q_{\mathit{tj}}^{v} - 1)$, with the property that the associated squared Euclidean distances $D_{\mathit{st}} =\sum _{j}(^{{\ast}}\xi _{\mathit{sj}}^{v} -^{{\ast}}\xi _{\mathit{tj}}^{v})^{2}$ are equal to the chi-squared distances $\hat{D}_{\mathit{st}}$ of Eq. (1).

To extend the autocorrelation index to two voices (α and β), one proposes a cross-autocorrelation index for multidimensional variables Ξ ^α and Ξ ^β, which measures the similarity between the pitch distribution of α and the pitch distribution of β within a fixed lag or, more generally, a defined neighborhood, namely:

$$\displaystyle{\delta (\varXi ^{\alpha },\varXi ^{\beta }):= \frac{\varDelta (\varXi ^{\alpha },\varXi ^{\beta }) -\varDelta _{\text{loc}}(\varXi ^{\alpha },\varXi ^{\beta })} {\sqrt{\varDelta (\varXi ^{\alpha })\varDelta (\varXi ^{\beta })}} \in [-1,1]}$$

In the latter, Δ(Ξ^v) is the inertia of the voice v [see the first part of (4)], $\varDelta (\varXi ^{\alpha },\varXi ^{\beta }) = \frac{1} {2}\sum _{\mathit{st}}f_{s}f_{t}D_{\mathit{st}}^{\alpha \beta } =\sum _{s}f_{s}\sum _{j}{}^{\ast }\xi _{\mathit{sj}}^{\alpha }\,{}^{\ast }\xi _{\mathit{sj}}^{\beta } -\sum _{j}{}^{\ast }\bar{\xi }_{j}^{\alpha }\,{}^{\ast }\bar{\xi }_{j}^{\beta }$ is the cross-inertia between the voice α and the voice β, where $D_{\mathit{st}}^{\alpha \beta } =\sum _{j}({}^{\ast }\xi _{\mathit{sj}}^{\alpha } -{}^{\ast }\xi _{\mathit{tj}}^{\alpha })({}^{\ast }\xi _{\mathit{sj}}^{\beta } -{}^{\ast }\xi _{\mathit{tj}}^{\beta })$ is the cross-dissimilarity between two time intervals of two voices, and finally $\varDelta _{\text{loc}}(\varXi ^{\alpha },\varXi ^{\beta }) = \frac{1} {2}\sum _{\mathit{st}}e_{\mathit{st}}D_{\mathit{st}}^{\alpha \beta } =\sum _{s}f_{s}\sum _{j}{}^{\ast }\xi _{\mathit{sj}}^{\alpha }\,{}^{\ast }\xi _{\mathit{sj}}^{\beta } -\sum _{\mathit{st}}e_{\mathit{st}}\sum _{j}{}^{\ast }\xi _{\mathit{sj}}^{\alpha }\,{}^{\ast }\xi _{\mathit{tj}}^{\beta }$ is the local cross-inertia between voices α and β.

In particular, Δ(Ξ, Ξ) = Δ(Ξ) and Δ _loc(Ξ, Ξ) = Δ _loc(Ξ), so $\delta (\varXi,\varXi ) =\delta (\varXi ) =\delta$ given in (3). It must be noticed that this formalism works in this specific context because $f_{t}^{\alpha } = f_{t}^{\beta } = f_{t} = \frac{1} {n}$ due to the normalization of Ξ or Ξ ^v and since all voices have the same number of time intervals.
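The cross-autocorrelation index can be sketched from the raw coordinates as follows. Two choices are assumptions of this example and are not stated in the text: both voices are coded on the same set of columns, and the raw coordinate of a pitch absent from a voice is set to zero. The function names and the toy "canon" (a second voice equal to the first one delayed by two intervals, circularly) are hypothetical.

```python
import numpy as np

def raw_coordinates(Xi):
    """Raw coordinates sqrt(rho_j) (q_tj - 1) = (xi_tj - rho_j) / sqrt(rho_j)."""
    Xi = np.asarray(Xi, dtype=float)
    rho = Xi.mean(axis=0)
    out = np.zeros_like(Xi)
    present = rho > 0                         # assumption: absent pitches get 0
    out[:, present] = (Xi[:, present] - rho[present]) / np.sqrt(rho[present])
    return out

def periodic_exchange(n, r):
    """Periodic neighborhood at lag r, with 0-based indices modulo n."""
    E = np.zeros((n, n))
    s = np.arange(n)
    for shift in (r, -r):
        E[s, (s + shift) % n] += 1.0 / (2 * n)
    return E

def cross_inertias(A, B, E):
    """Cross-inertia and local cross-inertia of two sets of raw coordinates."""
    n = A.shape[0]
    f = np.full(n, 1.0 / n)
    same_time = (f[:, None] * A * B).sum()    # sum_s f_s sum_j a_sj b_sj
    glob = same_time - (f @ A) @ (f @ B)      # minus the product of weighted means
    loc = same_time - (E * (A @ B.T)).sum()   # minus sum_st e_st sum_j a_sj b_tj
    return glob, loc

def cross_autocorrelation(Xi_a, Xi_b, E):
    A, B = raw_coordinates(Xi_a), raw_coordinates(Xi_b)
    glob, loc = cross_inertias(A, B, E)
    inertia_a, _ = cross_inertias(A, A, E)
    inertia_b, _ = cross_inertias(B, B, E)
    return (glob - loc) / np.sqrt(inertia_a * inertia_b)

# Toy canon: voice b repeats voice a two intervals later (circular shift)
pitches = [0, 2, 4, 5, 7, 9, 11, 1]
Xi_a = np.eye(12)[pitches]
Xi_b = np.roll(Xi_a, 2, axis=0)
n = len(pitches)
lag0 = cross_autocorrelation(Xi_a, Xi_b, periodic_exchange(n, 0))
lag2 = cross_autocorrelation(Xi_a, Xi_b, periodic_exchange(n, 2))
```

In this toy canon the value at lag 2 exceeds the value at lag 0, reflecting the delay between the two voices, and applying the function to a voice and itself returns the autocorrelation index of that voice.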

This cross-autocorrelation index is computed on two multiple-voice music pieces with the same exchange matrix as the one proposed for the autocorrelation index (Fig. 6). For Pachelbel’s canon, the highest peaks in the left graph appear at r = 2 for the cross-autocorrelation between violins I and II and between violins II and III, and at r = 4 between violins I and III, corresponding to the lag of two or four measures between the starts of each violin. For Beethoven’s string quartet (center and right graphs), peaks at r = 0 reveal the largest melodic similarities between violin I and violin II on the one hand, and between viola and cello on the other hand. Moreover, both graphs exhibit large peaks at r = 114 measures, corresponding to a repetition in the music piece.

Thus, the cross-autocorrelation index allows the comparison of different voices of a music piece. It can also be used to compare two variants of a music piece. See, e.g., Ellis and Poliner (2007), who apply cross-correlation on audio files.

3.3 Between Scores Analysis

To measure the configuration similarity between two musical scores a and b, a weighted dual version of the RV-Coefficient proposed by Robert and Escoufier (1976) is computed:

$$\displaystyle{\text{CS}_{\mathit{ab}} = \frac{\text{Tr}(\check{K}^{a}\check{K}^{b})} {\sqrt{\text{Tr} ((\check{K}^{a } )^{2 } )\text{Tr} ((\check{K}^{b } )^{2 } )}}}$$

where $\check{K}^{a}$ (resp. $\check{K}^{b}$) is the weighted scalar product between pitches of the musical score a (resp. b) as defined in the second part of the Eq. (2). By construction, the components of $\check{K}^{a}$ (or $\check{K}^{b}$) are zero for a pitch absent in the corresponding musical score. Both $\check{K}^{a}$ and $\check{K}^{b}$ depend upon the reference duration τ, chosen as identical for both music pieces.

Define the dissimilarity between two musical scores as $D_{\mathit{ab}} = 1 -\text{CS}_{\mathit{ab}}$. This dissimilarity can be seen as a generalization of the well-known cosine distance (see, e.g., Weihs et al. 2007), and turns out to be squared Euclidean. Usual clustering methods between musical scores, based upon D _ab, can in turn be applied.
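The configuration similarity and the derived dissimilarity can be sketched as follows. Each score is summarized by its weighted scalar products between pitches, stored in a matrix of fixed size with zeros for absent pitches, so that scores with different numbers of time intervals remain comparable. The function names and the random toy tables are assumptions made for this illustration.

```python
import numpy as np

def pitch_kernel(X):
    """Weighted scalar products between pitches, zero-padded for absent pitches."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    Xi = X / X.sum(axis=1, keepdims=True)
    f = np.full(n, 1.0 / n)
    rho = Xi.sum(axis=0) / n
    present = np.flatnonzero(rho > 0)
    rho_p = rho[present]
    Q = Xi[:, present] / rho_p
    # D[i, j] = sum_t f_t (q_ti - q_tj)^2, second part of Eq. (1)
    D = (f[:, None, None] * (Q[:, :, None] - Q[:, None, :]) ** 2).sum(axis=0)
    H = np.eye(len(present)) - np.outer(np.ones(len(present)), rho_p)
    B = -0.5 * H @ D @ H.T
    K = np.zeros((m, m))
    K[np.ix_(present, present)] = np.sqrt(np.outer(rho_p, rho_p)) * B
    return K

def configuration_similarity(Ka, Kb):
    """CS_ab = Tr(Ka Kb) / sqrt(Tr(Ka^2) Tr(Kb^2))."""
    return np.trace(Ka @ Kb) / np.sqrt(np.trace(Ka @ Ka) * np.trace(Kb @ Kb))

rng = np.random.default_rng(1)
X_a = rng.random((10, 13))        # hypothetical score a: 10 intervals
X_b = rng.random((14, 13))        # hypothetical score b: 14 intervals
X_b[:, 3] = 0.0                   # pitch 3 never occurs in score b
K_a, K_b = pitch_kernel(X_a), pitch_kernel(X_b)
cs_ab = configuration_similarity(K_a, K_b)
d_ab = 1.0 - cs_ab                # dissimilarity between the two scores
```

The similarity of a score with itself equals 1, so its dissimilarity to itself is 0, and the measure is symmetric in a and b.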

Figure 7 presents the results obtained with an agglomerative hierarchical clustering on a dataset made up of 20 music pieces written by four composers:

Scarlatti: Sonatas L. 1 (K. 514), L. 16 (K. 306), L. 336 (K. 93), L. 345 (K. 113), and L. 346 (K. 408). They all have a 2/2 time signature.
Mozart: First movement of piano sonatas Nos. 1, 2, 3, 4, and 5.
Beethoven: First movement of piano sonatas Nos. 1, 2, 3, 4, and 5.
Chopin: Mazurkas Op. 6 (No. 1), Op. 7 (No. 1), Op. 17 (No. 1), Op. 24 (No. 1), and Op. 30 (No. 1).

For the sake of comparison, the 20 music pieces are all transposed to C, with a common τ value of one measure. Although the dataset is small, this first result is encouraging, producing well-grouped music pieces with respect to each composer, especially for Beethoven.
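An agglomerative clustering on a matrix of dissimilarities between scores can be sketched as follows. The text does not state which aggregation criterion was used; average linkage is an assumption of this example, as are the function name `average_linkage` and the toy dissimilarity matrix, which does not reproduce the data of Fig. 7.

```python
import numpy as np

def average_linkage(D, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    D = np.asarray(D, dtype=float)
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean dissimilarity between the members of the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]               # b > a, so index a stays valid
    return [sorted(c) for c in clusters]

# Toy dissimilarities between four scores forming two obvious groups
D_toy = np.array([[0.00, 0.10, 0.90, 0.80],
                  [0.10, 0.00, 0.85, 0.90],
                  [0.90, 0.85, 0.00, 0.05],
                  [0.80, 0.90, 0.05, 0.00]])
groups = average_linkage(D_toy, n_clusters=2)
```

On the toy matrix the procedure returns the two groups {0, 1} and {2, 3}; cutting the hierarchy at four clusters on the 20 pieces would correspond to one candidate group per composer.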

4 Conclusion

The present data-analytic treatment of musical scores is based upon two primitives, namely a dissimilarity matrix and a neighborhood matrix between time intervals, defined with respect to a reference duration. It covers and generalizes well-known multi-categorical, factorial and time-series techniques, and is able to treat polyphonic pieces as well as to perform between-voices and between-scores analyses, with encouraging clustering results. Its modest computational cost makes it amenable to the automatic treatment of large symbolic musical data sets. Furthermore, it allows the consideration of flexible alternatives, both for the dissimilarity matrix (other than the chi-square) and for the exchange matrix (other than periodic neighborhood), deserving further investigation.

So far, exploratory analyses are interpretable in a fairly satisfactory way, although the complex factorial structures exhibited by rich music pieces certainly deserve further attention. In the near-future agenda, within the present formalism, we hope to progress in the automatic detection of τ, motif recognition and large dataset clustering or classification.

References

Bavaud, F. (2013). Testing spatial autocorrelation in weighted networks: The modes permutation test. Journal of Geographical Systems, 15, 233–247.
Bavaud, F., Cocco, C., & Xanthos, A. (2012). Textual autocorrelation: Formalism and illustrations. In 11èmes Journées internationales d’analyse statistique des données textuelles (pp. 109–120). Liège: Université de Liège.
Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
Cliff, A. D., & Ord, J. K. (1981). Spatial processes: Models and applications. London: Pion.
Critchley, F., & Fichet, B. (1994). The partial order by inclusion of the principal classes of dissimilarity on a finite set, and some of their basic properties. In B. Van Cutsem (Ed.), Classification and dissimilarity analysis (pp. 5–65). New York: Springer.
Ellis, D. P. W., & Poliner, G. E. (2007). Identifying ‘Cover Songs’ with chroma features and dynamic programming beat tracking. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2007 (pp. IV-1429–IV-1432).
Lavrenko, V., & Pickens, J. (2003). Polyphonic music modeling with random fields. In Proceedings of the Eleventh ACM International Conference on Multimedia (pp. 120–129), Berkeley, CA.
Morando, M. (1981). L’analyse statistique des partitions de musique. In J.-P. Benzécri, et al. (Eds.), Pratique de l’analyse des données, tome 3: Linguistique et lexicologie (pp. 507–522). Paris: Dunod.
Müller, M., & Ewert, S. (2011). Chroma toolbox: Matlab implementations for extracting variants of chroma-based audio features. In Proceedings of the 12th International Conference on Music Information Retrieval (pp. 215–220).
Robert, P., & Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: The RV-coefficient. Journal of the Royal Statistical Society. Series C (Applied Statistics), 25, 257–265.
Weihs, C., Ligges, U., Mörchen, F., & Müllensiefen, D. (2007). Classification in music research. Advances in Data Analysis and Classification, 1, 255–291.

Author information

Authors and Affiliations

University of Lausanne, Lausanne, Switzerland
Christelle Cocco & François Bavaud

Corresponding author

Correspondence to Christelle Cocco.

Editor information

Editors and Affiliations

University of Essex, Colchester, United Kingdom
Berthold Lausen
University of Luxembourg, Walferdange, Luxembourg
Sabine Krolak-Schwerdt
University of Luxembourg, Walferdange, Luxembourg
Matthias Böhmer

Cite this paper

Cocco, C., Bavaud, F. (2015). Correspondence Analysis, Cross-Autocorrelation and Clustering in Polyphonic Music. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_35

DOI: https://doi.org/10.1007/978-3-662-44983-7_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44982-0
Online ISBN: 978-3-662-44983-7
eBook Packages: Mathematics and Statistics (R0)
