1 Introduction

Early twenty-first century music theory explored a two-pronged generalization of traditional set theory. One prong situated sets and set classes in continuous, non-Euclidean spaces whose paths represented voice leadings, or ways of moving notes from one chord to another [4, 13, 16]. This endowed set theory with a contrapuntal aspect it had previously lacked, embedding its discrete entities in a robustly geometrical context. Another prong involved the Fourier transform as applied to pitch-class distributions: this provided alternative coordinates for describing chords and set classes, coordinates that made manifest their harmonic content [1, 3, 8, 10, 19, 20, 21]. Harmonies could now be described in terms of their resemblance to various equal divisions of the octave, paradigmatic objects such as the augmented triad or diminished seventh chord. These coordinates also had a geometrical aspect, similar to yet distinct from voice-leading geometry.

In this paper, we describe a new convergence between these two approaches. Specifically, we show that there exists a class of simple circular voice-leading spaces corresponding, in the case of n-note nearly even chords, to the nth Fourier “phase spaces.” An isomorphism of points exists for all chords regardless of structure; when chords divide the octave evenly, we can extend the isomorphism to paths, which can then be interpreted as voice leadings. This leads to a general technique for replacing individual components of a Fourier analysis with qualitatively similar voice-leading calculations.

2 Voice Leading and Fourier Phase

We begin by considering transpositions of a single n-note chord type lying in some c-note scale. We first explain how the nth Fourier component represents chords on a circular space, sharing the same angular coordinate when related by O/n semitone transposition. (Here O is the size of the octave.) We then show that voice-leading spaces contain very similar subspaces, only now with the sum of the chord’s pitch classes determining the angular coordinate. Thus when restricting our attention to the transpositions of a single chord, the nth Fourier phase is equivalent to pitch-class sum.

In what follows we represent pitches by real numbers in \(\mathbb {R}\) rather than by the discrete values in \(\mathbb {Z}\) used in much previous music-theoretical work (footnote 1). Pitch classes arise by identifying octave-related pitches, and can be represented by real numbers in the range \(0 \le p < O\); the collection of distinct pitch classes forms a one-dimensional circular space known as the “pitch-class circle.” These basic definitions are common to both theoretical approaches considered below. We will generally consider the octave to have size 12, with C corresponding to 0, C\(\sharp \) to 1, and so on. In scalar contexts, it is useful to consider an octave of size c, the number of notes in the scale. This amounts to using the scale as a metric, so that, by definition, it divides the octave evenly [16].

The Fourier transform represents musical objects as a collection of complex numbers or components. The kth Fourier component of a pitch class p is a vector with magnitude 1 and angular position p mod c/k; the resulting space can be understood as the quotient of the familiar pitch-class circle by rotation, as if the octave had been “reduced” to size c/k (here the octave is measured in scale steps, so that O = c). The kth Fourier component of a chord is the vector sum of the components of its pitch classes. For a finite collection of notes, \(X = \{x_1, x_2, \dots , x_n\} \subset \mathbb {R}/c\mathbb {Z}\), we have:

$$\begin{aligned} F_k(X) = \sum _{x\in X}e^{-2i\pi kx/c} \end{aligned} \tag{1}$$

(Again, many previous authors consider only equal-tempered pitches with values in \(\mathbb {Z}_c\), but the approach extends naturally to continuous pitch classes.) The angle of the resulting vector, arg(\(F_k\)), is the chord’s phase in the kth component. The combination of reduced octave and vector sum gives rise to many of the Fourier transform’s distinctive properties.

The left side of Fig. 1 presents the third Fourier component for single pitch classes and for major triads in the familiar twelve-tone chromatic universe. Because 3 divides 12 evenly, major-third transpositions leave angular position unchanged. The twelve equal-tempered triads occupy four separate angular positions dividing the circle into four equal parts; transposing a chord by descending semitone moves its angular position a quarter-turn clockwise. For some combinations of chord- and scale-size, the reduced octave may not be equivalent to an integer pitch-class interval, and no two distinct pitch-classes or transpositions of a chord have the same angular position. The right side of Fig. 1 shows the \(F_5\) position of the twelve chromatic pitch classes, and minor ninth chords. Here the octave has size 12/5 and semitone transposition corresponds to rotation by \((5/12)2\pi \). The distinct transpositions of any 12-tone equal-tempered chord will therefore have unique angular positions, with fifth-related chords adjacent to one another.
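The counts described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names are hypothetical, and the voicing chosen for the minor ninth chord (C, E♭, G, B♭, D) is an assumption, since the text does not spell the chord out.

```python
import cmath
import math

def fourier_component(chord, k, octave=12.0):
    """k-th Fourier component of a chord: a sum of unit vectors, as in Eq. (1)."""
    return sum(cmath.exp(-2j * math.pi * k * x / octave) for x in chord)

def phase_fraction(chord, k, octave=12.0):
    """Phase of the k-th component as a fraction of a full turn, in [0, 1)."""
    angle = cmath.phase(fourier_component(chord, k, octave))
    return (angle / (2 * math.pi)) % 1.0

def distinct_positions(chord, k, octave=12):
    """Distinct phase positions over all semitone transpositions of a chord."""
    positions = set()
    for t in range(octave):
        transposed = [x + t for x in chord]
        # Round so that floating-point noise does not split equal positions.
        positions.add(round(phase_fraction(transposed, k, octave), 6))
    return positions

major_triad = (0, 4, 7)
minor_ninth = (0, 3, 7, 10, 14)  # assumed voicing: C, E-flat, G, B-flat, D

print(len(distinct_positions(major_triad, 3)))  # number of Ph_3 positions
print(len(distinct_positions(minor_ninth, 5)))  # number of Ph_5 positions
```

Under these assumptions the twelve major triads should fall on four positions in the third component, while the twelve minor ninth chords should occupy twelve separate positions in the fifth.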

Fig. 1. The third Fourier component in complex space for pitch classes and major triads, and the fifth component for pitch classes and minor ninth chords.

On the surface, voice-leading spaces are very different from Fourier spaces, as they use neither the reduced octave nor vector summation of pitch classes. Instead, the theory of voice leading represents pitch classes as points on a circle whose size is equal to the octave O. The main objects of interest are paths in pitch-class space, which represent motion along the circle: thus C\(\xrightarrow {4}\)E corresponds to the ascending major third, moving a third of a turn, while C\(\xrightarrow {-8}\)E represents the descending minor sixth, moving two-thirds of a turn in the other direction. A voice leading is a multiset of paths in pitch-class space, representing a way of moving from one chord to another.

This situation can be modeled geometrically by configuration spaces in which points represent entire chords and paths represent voice leadings; distance in these spaces can therefore be understood as the aggregate physical distance required to move one set of notes to another on an instrument like the piano. Different paths between the same points correspond to different voice leadings. These spaces are quotients of \(\mathbb {R}^n\) modulo octave equivalence and permutation of their coordinates: for an n-note chord, the configuration space is \(T^n/S_n\), the n-torus modulo the symmetric group on n letters. (Starting with \(\mathbb {R}^n\), with each dimension representing the pitch of one voice, we can derive these spaces by identifying octave-related pitches and permutationally related chords, or those with the same notes in different voices; the resulting orbifold is known as n-note chord space [13, 16].) These chord spaces have one circular dimension representing transposition; the remaining (“horizontal”) dimensions form an (\(n-1\))-dimensional simplex with singular boundaries. These horizontal cross-sections can be taken to contain all chords whose pitch classes sum to the same value modulo O. For a c-note scale, there will be c distinct cross-sections containing chords lying in that scale. (Here, pitch-class sum is computed by scalar addition modulo O, as opposed to vector addition.) A fundamental and counterintuitive fact is that the line containing transpositionally related n-note chords winds n times around the circular dimension, since transposition by O/n leaves a chord’s pitch-class sum unchanged. Since a complete turn along the circular dimension represents transposition by O/n, chords have the same circular coordinate if they are related by transposition by O/n semitones.
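The behavior of pitch-class sum under transposition can be illustrated with a few lines of code. This is only a sketch under the assumption of a twelve-note chromatic octave; the function names are hypothetical.

```python
from itertools import combinations_with_replacement

OCTAVE = 12  # assumed chromatic octave

def pitch_class_sum(chord, octave=OCTAVE):
    """Scalar (not vector) sum of a chord's pitch classes, modulo the octave."""
    return sum(chord) % octave

def transpose(chord, x):
    """Move every note of the chord by the same interval x."""
    return tuple(p + x for p in chord)

triad = (0, 4, 7)
n = len(triad)

# Transposition by OCTAVE / n returns the chord to the same cross-section.
same_section = pitch_class_sum(transpose(triad, OCTAVE // n)) == pitch_class_sum(triad)

# The twelve transpositions therefore visit only OCTAVE / n distinct sums.
triad_sums = {pitch_class_sum(transpose(triad, t)) for t in range(OCTAVE)}

# Across all three-note chord types, every one of the c = 12 sums occurs.
all_sums = {pitch_class_sum(ch)
            for ch in combinations_with_replacement(range(OCTAVE), n)}

print(same_section, sorted(triad_sums), len(all_sums))
```

Because the sum advances by n for each semitone of transposition, one pass through the twelve transpositions carries the triad n = 3 times around the circular dimension.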

In other words, we find a role for the “reduced octave” O/n in both models. This quantity is manifest in the basic definition of Fourier space but arises as a non-obvious consequence of the fundamental geometry of voice-leading space. Figure 2 compares the two perspectives for the case of major triads in the chromatic scale. On the left we show the phases of the chords’ F\(_3\) component; on the right we represent them using a spiral diagram devised by Tymoczko [11, 17], where the angular component corresponds to the circular dimension (with coordinates given by pitch-class sum) and the “line of transposition” winds n times around it.

Fig. 2. Major triads in Ph\(_3\) and the circular dimension of voice-leading space.

This correspondence can be generalized.

Proposition 1

Consider the collection \(T = \{t_x\}\) consisting of the transpositions of any n-note chord, A, in any c-note scale. There is an equivalence between (a) differences between the pitch-class sums of the \(t_x\) and (b) differences between their component-n phase values (Ph\(_n\)), with phase measured so that a full turn equals 12. That is, \(\varSigma \text {T}_x(A) - \varSigma A \equiv \text {Ph}_n(A) - \text {Ph}_n(\text {T}_x(A)) \pmod {12}\) (where \(\varSigma \) denotes pitch-class sum).

Proof

Transposition by x multiplies the nth Fourier component by the unit complex number \(e^{-2i\pi nx/12}\), so it rotates Ph\(_n\) by nx mod 12. Similarly, T\(_x\) adds a constant of nx, mod 12, to the pitch-class sum. Therefore the change in Ph\(_n\) matches the change in pitch-class sum, with the direction of rotation fixed by the sign convention of Eq. (1).
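Proposition 1 lends itself to a numerical check. The sketch below assumes a twelve-unit octave and rescales phase so that a full turn equals 12; the helper names are hypothetical, and the chords tested are arbitrary illustrations.

```python
import cmath
import math

OCTAVE = 12.0  # assumed chromatic octave

def pc_sum(chord):
    """Pitch-class sum, computed by scalar addition modulo the octave."""
    return sum(chord) % OCTAVE

def phase_units(chord, n):
    """Ph_n rescaled so that a full turn equals OCTAVE units."""
    f = sum(cmath.exp(-2j * math.pi * n * x / OCTAVE) for x in chord)
    return (cmath.phase(f) / (2 * math.pi) * OCTAVE) % OCTAVE

def circular_gap(a, b):
    """Distance between two values on a circle of circumference OCTAVE."""
    d = (a - b) % OCTAVE
    return min(d, OCTAVE - d)

def proposition_holds(chord, x, tol=1e-9):
    """Compare the change in sum with the change in Ph_n under T_x."""
    n = len(chord)
    transposed = [p + x for p in chord]
    sum_change = pc_sum(transposed) - pc_sum(chord)
    phase_change = phase_units(chord, n) - phase_units(transposed, n)
    return circular_gap(sum_change, phase_change) < tol

print(all(proposition_holds((0, 4, 7), x) for x in range(12)))
print(proposition_holds((0, 4, 7, 10), 0.5))  # a non-integer transposition
```

The second call uses a quarter-tone transposition, which illustrates that the relation does not depend on the chord lying in an equal-tempered scale.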

This result also holds in continuous pitch space via the limit \(c \rightarrow \infty \). The two different music-theoretical approaches thus converge on very similar graphs, so long as we restrict our attention to just a single chord type. Such graphs can be found throughout the analytical literature, as it is often useful to focus on e.g. diatonic triads or chromatic dominant sevenths. Historically, the circular coordinate was of crucial importance, as the initial exploration of both kinds of space was motivated by the goal of understanding set classes; in the Fourier realm this space can be constructed by ignoring phase while in the voice-leading case it involves focusing on (quotients of) the cross-sections with fixed pitch-class sum [2, 9].

3 Glide Paths in Fourier Space

One of the central ideas in the theory of voice leading is to associate discrete events (voice leadings) with continuous paths in configuration space. Specifically, to the discrete pitch succession \(X \rightarrow Y: (x_1, x_2, ..., x_n) \rightarrow (y_1, y_2, ..., y_n)\) we associate the image, in \(T^n/S_n\), of the line segment \(X \rightarrow Y\) in \(\mathbb {R}^n\). These images, or “generalized line segments,” trace the sonorities that result from a continuous linear interpolation between chords X and Y, with each note \(x_i\) of chord X gliding smoothly to its destination \(y_i\). The resulting paths in \(T^n/S_n\) can be associated with ways of moving the notes of X to the notes of Y, or voice leadings as musicians think of them. Equivalently, voice leadings can be understood as homotopy classes of paths in the orbifold \(T^n/S_n\), since there is exactly one homotopy class for each generalized line segment [6, 7]. The homotopy classes of paths in a circular space such as Fig. 2 can be associated with a special kind of voice leading: bijective, strongly crossing-free voice leadings, or one-to-one mappings that have no crossings no matter how their voices are arranged in register [14]. These can in turn be decomposed into the product of a transposition \(T_x\) and a “zero-sum” voice leading, or strongly crossing-free voice leading Z = \(X \rightarrow T_{-x}Y\) whose paths sum to 0. (Since the latter voice leading need not connect chords lying in the same scale, we need continuous space for this decomposition [16].) The angular component of a path in voice-leading space is given entirely by the transposition.
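The decomposition into a transposition and a zero-sum remainder can be sketched as follows. The function names are hypothetical, and the example voice leading from C major to A minor is our own illustration rather than one taken from the text; exact fractions are used so that the zero-sum property can be verified without rounding.

```python
from fractions import Fraction

def decompose(paths):
    """Split per-voice paths into a transposition x and a zero-sum remainder.

    Since the paths of X -> T_{-x}Y must sum to zero, x is the mean path.
    """
    steps = [Fraction(p) for p in paths]
    x = sum(steps) / len(steps)
    zero_sum = tuple(s - x for s in steps)
    return x, zero_sum

def glide(chord, paths, t):
    """Chord reached a fraction t of the way along the linear glide."""
    return tuple(p + t * d for p, d in zip(chord, paths))

# (C, E, G) -> (C, E, A): only the top voice moves, by two semitones.
x, z = decompose((0, 0, 2))
print(x, z, sum(z))

# Halfway along the glide the top voice sits on G-sharp.
print(glide((0, 4, 7), (0, 0, 2), Fraction(1, 2)))
```

Here the transpositional component is two-thirds of a semitone, so the zero-sum remainder ends on a chord outside the chromatic scale, which is why the decomposition calls for continuous space.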

We can apply a similar approach in the Fourier domain as well, using voice leadings to define glide paths in the complex plane of the nth Fourier component. A continuous path \(X \rightarrow Y\) is given by a vector sum \(e^{ict}z_1(t) + e^{ict}z_2(t) + ... + e^{ict}z_n(t)\) in this space, where \(e^{ict}\) represents the voice leading’s transpositional component (c here being a constant that fixes the amount of transposition, not the scale size), each \(z_i(t)\) is a voice leading moving a single voice and \(Z = \sum {z_i(t)}\) is a zero-sum voice leading. The question immediately arises whether the resulting paths are homotopically equivalent to those in circular voice-leading space.

By complex linearity, we can factor out the transpositional component \(e^{ict}\), rewriting this vector sum as \(e^{ict}Z(t)\); the total angular motion in Fourier space will be the sum of the angular motions of \(e^{ict}\) and Z(t). From this it follows that we can restrict our attention to the zero-sum component: if a voice leading \(X \rightarrow Y\) produces homotopically distinct paths in the two spaces, then its zero-sum component will do the same. Thus we ask whether we can find bijective voice leadings Z(t) connecting transpositionally related chords, whose paths sum to zero (using standard addition), but which traverse one or more complete circles in Fourier space.

The answer is that we can, but only when the chord divides the octave somewhat unevenly. For example, consider the voice leading (C, D, E)\(\xrightarrow {(-4,2,2)}\)(G\(\sharp \), E, F\(\sharp \)). Figure 3 shows that the two voices D\(\rightarrow \)E and E\(\rightarrow \)F\(\sharp \) point in opposite directions in Ph\(_3\), adding to 0 by vector addition; since both rotate counterclockwise one half-turn they contribute nothing to the vector sum. That sum is instead determined by the voice C\(\rightarrow \)G\(\sharp \), which makes a complete clockwise turn. So this is a bijective, zero-sum voice leading between transpositionally related chords that has no angular component in voice-leading space but makes a complete turn in Fourier space. The example generalizes to the case where \(n-1\) voices divide the reduced circle equally (summing to 0); these can each be moved by \(1/(n-1)\) of the reduced octave O/n in one direction while a final voice makes a complete circle in the other direction.
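The complete turn made by this voice leading can be confirmed by sampling its glide path in the third Fourier component. This is a sketch using NumPy, with hypothetical function names; the sign of the result depends on the orientation convention used for plotting, so only its absolute value is reported.

```python
import numpy as np

def glide_component(chord, paths, k, octave=12.0, steps=2001):
    """k-th Fourier component sampled along a linear glide from a chord."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    notes = np.asarray(chord, float) + t * np.asarray(paths, float)
    return np.exp(-2j * np.pi * k * notes / octave).sum(axis=1)

def net_turns(values):
    """Net number of turns about the origin made by a sampled path."""
    phases = np.unwrap(np.angle(values))
    return (phases[-1] - phases[0]) / (2 * np.pi)

chord = (0, 2, 4)     # C, D, E
paths = (-4, 2, 2)    # balanced: the paths sum to zero

values = glide_component(chord, paths, k=3)
print(sum(paths), abs(net_turns(values)), np.abs(values).min())
```

The minimum magnitude along the path is printed as well, since the phase (and hence the turn count) is only meaningful while the vector sum stays away from the origin.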

Fig. 3. The voice leading (C, D, E)\(\xrightarrow {(-4,2,2)}\)(G\(\sharp \), E, F\(\sharp \)) in F\(_3\).

By contrast, when chords divide the octave relatively evenly, the paths will be homotopically equivalent. Figure 4 shows the paths corresponding to the voice leading (C, E, G)\(\xrightarrow {(-1,0,1)}\)(B, E, G\(\sharp \)) in the third Fourier space. Here the vectors marked \(x_1\) and \(x_3\) simply switch positions, so that the chord’s vector sum remains pointing in the upper right quadrant. It is clear that a complete circle will never result so long as all the chord’s vectors remain pointing in the same half-plane throughout the voice leading. It follows from basic voice-leading geometry that the bijective, strongly crossing-free voice leadings of a nearly even chord will always be of this form.
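The half-plane condition can be tested directly by sampling a glide and asking whether the voices' unit vectors ever fail to fit inside an open half-plane. The following sketch uses hypothetical helper names and a small numerical margin, which is an implementation choice of ours; vectors fit in an open half-plane exactly when the largest angular gap between neighbors exceeds a half-turn.

```python
import numpy as np

def voice_angles(chord, paths, k, t, octave=12.0):
    """Angles of each voice's unit vector in the k-th component at time t."""
    notes = np.asarray(chord, float) + t * np.asarray(paths, float)
    return (-2 * np.pi * k * notes / octave) % (2 * np.pi)

def in_open_half_plane(angles, margin=1e-9):
    """True when the largest gap between neighboring angles exceeds pi."""
    a = np.sort(np.asarray(angles) % (2 * np.pi))
    gaps = np.diff(np.append(a, a[0] + 2 * np.pi))
    return bool(gaps.max() > np.pi + margin)

def stays_in_half_plane(chord, paths, k, steps=201):
    """Check the condition at evenly spaced moments of the glide."""
    return all(in_open_half_plane(voice_angles(chord, paths, k, t))
               for t in np.linspace(0.0, 1.0, steps))

print(stays_in_half_plane((0, 4, 7), (-1, 0, 1), k=3))   # nearly even triad
print(stays_in_half_plane((0, 2, 4), (-4, 2, 2), k=3))   # uneven trichord
```

As the text notes, this criterion is sufficient rather than necessary, so a negative result does not by itself show that a glide path completes a circle.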

Fig. 4. The voice leading (C, E, G)\(\xrightarrow {(-1,0,1)}\)(B, E, G\(\sharp \)) in F\(_3\).

We can understand this phenomenon heuristically as follows: when a chord is nearly even, its bijective, strongly crossing-free voice leadings are all transposition-like in the sense that they move all their notes by approximately the same distances [14, 16]. Thus when we factor out transpositional motion, what remains is something close to the identity, which by continuity will involve small changes in Fourier space. By contrast, when a chord is very uneven, its bijective, strongly crossing-free voice leadings are not at all transposition-like; hence factoring out transposition can produce a voice leading that traverses a full circle in Fourier space. We will return to this point shortly.

As of this writing, we cannot specify precisely how uneven a chord may become before the equivalence breaks down. The criterion that all voice leadings remain in a single half-plane is sufficient to ensure the correspondence, and covers many common musical cases (e.g. equal-tempered triads and seventh chords). It is not necessary for convergence, though: the (025) and (015) trichords also have balanced voice leadings that do not produce phase-space cycles. Nor is it straightforward to characterize the cases in which the correspondence fails: the 18-tone equal-tempered voice leading \((0, \frac{16}{3}, 6)\xrightarrow {\frac{4}{3}, -\frac{10}{3}, 2}(\frac{4}{3}, 2, 8)\) results from a deformation of Fig. 3 above; even though no single voice makes a complete circle in phase space, the glide path does. Establishing precise bounds on the correspondence between the two spaces is thus a project for future work.

4 Crossfade Paths in Fourier Space

While some theorists [1, 3, 15] have applied the Fourier transform in continuous pitch-class space, the more common approach [1, 10, 20] assumes pitches lying in a particular equal division of the octave; these may be assigned real-valued weights representing musical salience. In this context, paths have been defined by gradually “fading out” some pitch classes while fading in others, a smooth interpolation of magnitudes that never leaves the equal-tempered domain (footnote 2). That is, we define a crossfade path from chord X to \(X'\) as \((1-t)X + tX'\) for \(t: 0 \rightarrow 1\), where the multiplication can be understood as applying to the weightings of either individual pitch classes or the resultant chordal vectors (footnote 3). Such “crossfade paths” do not, on the surface, carry any implications about voice leading. Nevertheless, Yust, in [20] and subsequently in [21,22,23], has used the language of voice leading to interpret paths in these spaces. Here we consider the justification for this association.

Clearly, for non-antipodal points, “crossfade paths” will trace out a minimal trajectory through Fourier phase space. Figure 5, from [21], records the fifth Fourier phase of the twelve diatonic scales in the familiar chromatic universe; it is equivalent to the circular voice-leading space for equal-tempered diatonic scales. Imagine fading out the F of C major (0\(\sharp \)) while fading in F\(\sharp \) to move to G major (1\(\sharp \)); the phase of the resultant vector will move clockwise by one twelfth of a cycle. By our previous work, this is the same path in phase that would be traced out by a maximally efficient voice leading between the same scales, one in which F ascends by semitone to F\(\sharp \). Thus there is indeed justification for associating “crossfade paths” with particular voice leadings.
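The crossfade from C major to G major can be simulated by interpolating pitch-class weights linearly, from the first scale at t = 0 to the second at t = 1, and tracking the fifth Fourier phase. This is a sketch with hypothetical function names; the direction of rotation it reports depends on the orientation convention, so the turn is given as an absolute value.

```python
import numpy as np

def weights(pcs, size=12):
    """Weight vector with 1.0 on each pitch class of an equal-tempered set."""
    w = np.zeros(size)
    w[list(pcs)] = 1.0
    return w

def component(w, k):
    """k-th Fourier component of a weighted pitch-class distribution."""
    x = np.arange(len(w))
    return np.sum(w * np.exp(-2j * np.pi * k * x / len(w)))

def crossfade_turns(w_from, w_to, k, steps=501):
    """Net phase motion, in turns, as weights fade from w_from to w_to."""
    ts = np.linspace(0.0, 1.0, steps)
    values = np.array([component((1 - t) * w_from + t * w_to, k) for t in ts])
    phases = np.unwrap(np.angle(values))
    return (phases[-1] - phases[0]) / (2 * np.pi)

c_major = weights({0, 2, 4, 5, 7, 9, 11})
g_major = weights({0, 2, 4, 6, 7, 9, 11})

print(abs(crossfade_turns(c_major, g_major, k=5)), 1 / 12)
```

Only the weights of F and F♯ change during the fade, and the resulting phase motion should come out at one twelfth of a cycle.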

Fig. 5. Diatonic scales (labeled by number of accidentals) in Ph\(_5\).

However, one must be careful when drawing theoretical conclusions from this association, as the two forms of path arise in very different ways: changes of weighting rather than paths along the circle. Consider Yust’s identification of enharmonicism with complete circles in Fig. 5. Tymoczko [12, 16] and Hook [5] have argued that notation reflects the logic of voice leading, with each letter name recording an abstract musical voice: when C major moves to G major, the “F voice” ascends by semitone to F\(\sharp \). Yust [18] explains enharmonicism in exactly these (voice-leading-based) terms, using similar language to describe Ph\(_5\) cycles [20, 22, 23]. In particular, he points out that a sequence of modulations that travels a full circle in this space will involve enharmonic respelling, sending C major to either B\(\sharp \) major (for a clockwise path) or D\(\flat \flat \) major (counterclockwise) (footnote 4). The subtle question is whether this phenomenon is to be explained by voice leading or the Fourier transform.

Here the important observation is that distinct voice leadings will generally produce different paths, whether in voice-leading space or using glide paths in Fourier space. Therefore, voice-leading methods can distinguish the modulation from C major to \(D\flat \) major, \(C\rightarrow D\flat \), that descends five semitones from the modulation \(C\rightarrow C\sharp \) that ascends by seven semitones. (Note that we are making this point using notation, but as Tymoczko [16] argues, the notation serves to distinguish different voice leadings between background scales, and these can be present even in non-notated contexts.) The crossfade method will always choose the shortest way in Fourier phase space, so making this kind of distinction requires, e.g., adding some other intermediary. We conclude that only voice leading accurately represents enharmonicism as it can be modeled by scalar context, and hence that voice leading provides a sufficient explanation of enharmonicism as we most commonly encounter it (footnote 5).
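The distinction between the two modulations can be made concrete with glide paths. The sketch below rests on an assumption that should be stated plainly: it uses the seventh Fourier component, matching the cardinality of the scale as in Sect. 2, rather than the Ph\(_5\) of Fig. 5. For equal-tempered sets the seventh component is the complex conjugate of the fifth, so the two phase circles are mirror images of one another. Function names are hypothetical.

```python
import numpy as np

def glide_component(chord, paths, k, octave=12.0, steps=2001):
    """k-th Fourier component sampled along a linear glide from a chord."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    notes = np.asarray(chord, float) + t * np.asarray(paths, float)
    return np.exp(-2j * np.pi * k * notes / octave).sum(axis=1)

def net_turns(values):
    """Net number of turns about the origin made by a sampled path."""
    phases = np.unwrap(np.angle(values))
    return (phases[-1] - phases[0]) / (2 * np.pi)

C_MAJOR = (0, 2, 4, 5, 7, 9, 11)
# C -> D-flat: D, E, G, A, B each descend a semitone (five semitones in all).
FLATWARD = (0, -1, -1, 0, -1, -1, -1)
# C -> C-sharp: all seven notes ascend a semitone (seven semitones in all).
SHARPWARD = (1, 1, 1, 1, 1, 1, 1)

flat_turns = net_turns(glide_component(C_MAJOR, FLATWARD, k=7))
sharp_turns = net_turns(glide_component(C_MAJOR, SHARPWARD, k=7))
print(abs(flat_turns), abs(sharp_turns), abs(flat_turns - sharp_turns))
```

Both glides end on the same pitch-class set, so a crossfade between the endpoints cannot tell them apart, whereas the two glide paths should cover about 5/12 and 7/12 of a turn in opposite directions and so differ by one complete circle.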

5 Simulating Fourier Methods with Voice Leading

The correspondence we have been exploring is a delicate one that arises only in certain special and limiting cases. This is illustrated by Fig. 6, which graphs the position of {CEG} and {C\(\sharp \)EF\(\sharp \)} in the two spaces; here chords with the same sum have different Fourier phases, a divergence that reflects differences between vector and scalar addition. The mere introduction of a second chord type thus breaks the correspondence between the two worlds.

Fig. 6. Positions of {CEG} and {C\(\sharp \)EF\(\sharp \)} in F\(_3\) and circular voice-leading space.

However, from another point of view the connection is more robust. In earlier work Tymoczko [15] argued that there is a close correspondence between Fourier magnitudes and voice-leading distance: specifically, the magnitude of the nth Fourier component is closely correlated with the voice-leading proximity to the nearest “doubled subset” of the nearest perfectly even n-note chord (footnote 6). We can now give a similar characterization of Fourier phase as well: the phase of a chord X’s nth Fourier component is closely correlated with the transposition of E, the perfectly even n-note chord that is “nearest” to X, with distance measured by the size of the smallest voice leading from X to some subset of E’s notes. (These subsets are represented by unisons in the nth Fourier space.) Fig. 7 plots Fourier phase against the transposition of E for 100,000 randomly chosen trichords, tetrachords, and pentachords in continuous space. The correlation and its approximate nature are both clear: while the two quantities are generally related, it is possible for them to diverge substantially, particularly in the case of chords with small Fourier magnitude.

Fig. 7. Plots of Ph\(_n\) versus pitch-class sum for 100,000 randomly chosen trichords, tetrachords, and pentachords.

The difference between the perspectives is largely attributable to the divergence between scalar and vector addition. In Fourier space we compute the magnitude and phase by adding the vectors representing the pitch classes of a chord; in voice-leading space, we can perform a similar calculation by asking which pitch class in the “reduced octave” of size O/n has the smallest voice leading to the pitch classes of the chord; if we adopt the Euclidean metric, the resulting vector is one of the n vectors that can serve as the “average” of the n points on the circle. Figure 8 illustrates this in the case of the “fourth chord” (C, F, B\(\flat \)), where the two methods coincide (compare Fig. 1).
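The two calculations can be compared side by side for the fourth chord. The sketch below finds the nearest reduced-octave pitch class by brute-force search over a fine grid, using squared circular distance as the Euclidean cost; the grid resolution and function names are our own assumptions, not a method prescribed by the text.

```python
import numpy as np

def nearest_reduced_pitch_class(chord, n, octave=12.0, resolution=4000):
    """Point of the reduced octave (size octave / n) closest to the chord.

    Each note is measured to the candidate by circular distance within the
    reduced octave; the cost is the sum of squared distances.
    """
    size = octave / n
    candidates = np.linspace(0.0, size, resolution, endpoint=False)
    notes = np.asarray(chord, float)[None, :]
    diffs = (notes - candidates[:, None] + size / 2) % size - size / 2
    cost = (diffs ** 2).sum(axis=1)
    return candidates[np.argmin(cost)]

def fourier_position(chord, n, octave=12.0):
    """Ph_n converted to a position in the reduced octave."""
    f = np.exp(-2j * np.pi * n * np.asarray(chord, float) / octave).sum()
    size = octave / n
    return (-np.angle(f) / (2 * np.pi) * size) % size

fourth_chord = (0, 5, 10)  # C, F, B-flat
print(nearest_reduced_pitch_class(fourth_chord, 3))
print(fourier_position(fourth_chord, 3))
```

For this chord both functions should return a value close to 1, that is, C♯ in the four-semitone reduced octave; for less even chords the two values can be expected to drift apart.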

Fig. 8. Calculation of Ph\(_3\) and the nearest subset of a perfectly even chord for {CFB\(\flat \)}.

This twofold correspondence gives us a general strategy for using voice leading to approximate the results of Fourier analysis: we replace the phase of the nth Fourier component with the transposition of the “nearest” perfectly even n-note chord (as just defined), and the magnitude with the voice-leading proximity (a decreasing function of distance) to that chord. While these quantities will not reproduce the Fourier transform exactly, they often provide an acceptable approximation. Furthermore, there is no obvious musical reason to privilege Fourier analysis over voice leading: at present, it remains controversial whether Fourier analysis, for all its mathematical elegance and familiarity, directly models anything in the minds of composers or listeners; while voice leading is more straightforwardly connected to the basic mechanics of music-making. Thus divergences between the two methods need not count against the voice-leading approach.

This more general connection between the two worlds provides a way to understand some puzzling features of the Fourier transform. Consider for example the divergence between pitch-class sum and Fourier phase noted at the beginning of this section: this results from the fact that chords with the same sum can be close to different subsets of the same perfectly even chord, or even different subsets of different perfectly even chords. For example, the first chord in Fig. 6, (C, E, G) is maximally close to the augmented triad a third of a semitone below C, while the second, (C\(\sharp \), E, F\(\sharp \)) is maximally close to (C\(\sharp \), F, F); in the context of our approximation, this is straightforward. Likewise, when we restrict our attention to a collection of highly even chords, all representing small perturbations of perfectly even chords, then we can expect a convergence between the methods. Thus for example, the positions of major and minor triads are consistent in both Ph\(_3\) and circular voice-leading space.
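The two nearest-chord claims in this paragraph can be reproduced with a short calculation. The sketch assumes the Euclidean metric and relies on the fact that the best point on a circle is the ordinary mean of the notes for some way of cutting the circle open, so it tries each of the n cuts in turn; the function name and return format are hypothetical.

```python
def nearest_even_chord(chord, octave=12.0):
    """Nearest perfectly even n-note chord under the Euclidean metric.

    Returns the chord's position p in the reduced octave, the note of the
    even chord that each voice moves to, and the squared distance.
    """
    n = len(chord)
    size = octave / n
    reduced = sorted(x % size for x in chord)

    def circ(d):
        # Signed circular distance within the reduced octave.
        return (d + size / 2) % size - size / 2

    best = None
    for k in range(n):
        # Cut the circle after the k lowest points and average the result.
        lifted = [r + size if i < k else r for i, r in enumerate(reduced)]
        p = (sum(lifted) / n) % size
        cost = sum(circ(x - p) ** 2 for x in chord)
        if best is None or cost < best[0] - 1e-12:
            best = (cost, p)
    cost, p = best
    targets = tuple(x - circ(x - p) for x in chord)
    return p, targets, cost

print(nearest_even_chord((0, 4, 7)))   # C, E, G
print(nearest_even_chord((1, 4, 6)))   # C-sharp, E, F-sharp
```

For (C, E, G) the position should come out near 11/3, the augmented triad a third of a semitone below C, while (C♯, E, F♯) should map to the notes (1, 5, 5), that is, (C♯, F, F).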

Return now to the voice leading (C, D, E)\(\xrightarrow {(-4,2,2)}\)(G\(\sharp \), E, F\(\sharp \)), discussed in Sect. 3 above. Earlier we presented this as a case of divergence between the voice-leading and Fourier worlds: a “balanced” voice leading that involves no change in pitch-class sum, but traverses a full circle in Ph\(_3\). The voice-leading-based approximation directs us, not to the sum of the pitch classes, but to the nearest perfectly even chord (represented by a unison in the reduced octave of 4 semitones). In the picture we have just described, this “nearest” chord indeed traverses a (discontinuous) circle in the reduced octave, much like Fourier phase. Thus what began as a delicate convergence between two fundamentally different ways of thinking leads, in the end, to a much more robust and general connection.

It thus appears that many music-theoretical uses of the Fourier transform can be reconceived in terms of voice leading. An interesting future project is specifying those musically relevant aspects of Fourier analysis that resist such reconceptualization – presumably, distinctively harmonic features that complement the broadly contrapuntal perspective we have been considering.