Introduction

Musical expression

Music is alluring from an early age, which suggests that the temporal patterns in music might reflect basic temporal patterns in the brain (Trehub 2003; Trevarthen 1999). Observations on brain activity related to the awareness and the performance of music are yielding insights about brain function (Keysers et al. 2003; Peretz and Zatorre 2005; Schlaug 2001). When listening to musicians playing expressively we are privy to temporal patterns in their brains that emerge in the temporal patterns of their sound-producing movements and the sounds themselves (Janata and Grafton 2003; Meister et al. 2004; Popescu et al. 2004; Ramani and Miall 2004; Zatorre 2003). These brain patterns find expression in rhythm and melody and in the acoustic transitions between tones. Our brains resonate to the sounds with related temporal patterns of neural activity. Drawn to dance to the music, we express related temporal patterns in our own movements (Hanna 1979; Mitchell and Gallaher 2001). And if the musicians are interacting with us their brains will resonate to the temporal patterns of our dancing movements, and this will influence the temporal pattern of their playing. An interactive cycle is thus formed.

Music conveys thought and feeling (Clynes 1973; Scholes 1960). In performance, this thought and feeling is conveyed by temporally modulating patterns in the musical sound, as exhibited, e.g., in melody, rhythm, stress, attack, intonation, portamento, vibrato, timbre, articulation, accelerandi, ritardandi and phrase-final lengthening (Clarke 1988; Juslin and Sloboda 2001; Panksepp and Bernatzky 2002; Repp 1990, 1995; Timmers et al. 2000; Todd 1994). Singers and instrumentalists achieve this modulation of musical sound by how they move—how they regulate their vocal apparatus, how they draw the bow across the strings, etc. Their finely controlled movements interact with the physics of the musical instrument to sculpt the musical sound. Remarkably, similar musical expressions can be achieved on different instruments (e.g., voice and strings), using quite different movements, and producing quite different sounds. This led us to conjecture that there may be fundamental temporal patterns, probably expressible as mathematical functions and reflecting basic informational variables in the nervous system, that underlie the form of expressive movements and sounds. In this study we sought such fundamental expressive temporal patterns in the musical sounds and sound-producing movements of performers.

When a tune is sung or played there is normally a continuous flow of sound both between and during the tones, except when pauses or staccato check the flow. We focused on two transient events in the continuous flow of musical sound. First, we analyzed the continuous transition, or portamento, between successive tones when singing. This often entails, as we found, an inter-tone f 0(t)-glide, a continuous change in the fundamental frequency, f 0(t), of the sound in passing from one tone to the next. Second, we analyzed the transient change in sound at the initiation of the tone, or attack, which significantly affects the perception of the tone (Galembo et al. 2001). This often entails, as we found in bass-playing, an attack intensity-glide, a continuous increase in sound intensity to a peak level. Inter-tone f 0(t)-glides and attack intensity-glides are evident in many forms of music, if the instruments permit them, perhaps most obviously in vocal and instrumental jazz.

General Tau Theory

For the analysis we used a general theory of movement guidance, General Tau Theory (Lee 1998, 2005), which was developed from the work of Gibson (1966) and Bernstein (1967) on how movement is perceptually and intrinsically guided. The theory is supported by experiments spanning a wide range of skills (Lee 2005). To see how the theory can be applied to music, consider a soprano singing a song involving inter-tone f 0(t)-glides. The activity entails both intrinsic and perceptual guidance. On the intrinsic side, the singer prescribes how the song, including the inter-tone f 0(t)-glides, should be sung. On the perceptual side, the singer needs information through, e.g., hearing her voice to ensure that she is following what her brain is prescribing. General Tau Theory provides an explanation of how an activity like singing is prescriptively-cum-perceptually guided. Two basic premises of the theory are that (1) all purposeful movement entails controlling the closure of action gaps between the current states a person or animal is in and the goal states to be achieved through movement, and (2) a single informational variable is used in regulating the closure of any action gap, X(t), namely τ X (t), the τ function of the gap. τ X (t) equals \( X(t)/\dot X(t) \), or the time-to-closure of the action gap at the current closure-rate (\( \dot X(t) \) is the time derivative of X(t)). Only τ information is required: information about, e.g., the size and speed of closure of an action gap is not needed to control its closure. The τ of an action gap is directly perceptible in any sensory modality, in contrast with other quantities such as size or speed, which are generally not directly perceptible (Lee 1998). Thus, there is reason to consider τ as a primary informational variable for controlling movement.
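As a toy numerical illustration of the τ definition (ours, not part of the original study), consider a 0.3 m reach-to-target gap closing at constant speed: τ computed as \( X(t)/\dot X(t) \) directly yields the signed time remaining to closure, with no separate knowledge of the gap's size or speed.

```python
import numpy as np

# Toy example (ours, not from the study): a 0.3 m reach-to-target gap
# closing at constant speed over T = 1 s.  With the gap measured as the
# distance still to cover, Xdot is negative, so tau = X/Xdot is the
# negative of the time remaining until closure.
T = 1.0
t = np.linspace(0.01, 0.99, 99)
X = 0.3 * (1 - t / T)            # remaining gap (m)
Xdot = np.gradient(X, t)         # closure rate (m/s), here constant -0.3
tau = X / Xdot

# tau alone specifies time-to-closure: tau(t) = -(T - t); no separate
# information about gap size or closure speed is needed
assert np.allclose(tau, -(T - t))
```

The sign convention follows from measuring the gap as the positive distance still to be closed, so that τ is negative before closure and reaches zero at closure.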

The principal way that τ is used in guiding the closure of action gaps is through the coordinating principle of τ-coupling, whereby the τs of two action gaps, X(t) and Y(t), are kept in a constant ratio, k X,Y , during the closure of the action gaps (i.e., \( \tau _X (t) = k_{X,Y} \tau _Y (t) \)). This ensures that the action gaps, X(t) and Y(t), reach closure simultaneously. For instance, when subjects moved a hand cursor to a goal position to arrive at the same time as a moving object, they τ-coupled the hand/goal and hand/object gaps (Lee et al. 2001). A musical example would be a drummer hitting two drums simultaneously. In both these cases X(t) and Y(t) are extrinsic gaps. When no extrinsic guide is available, as when singing unaccompanied, it is hypothesized, following the principle of parsimony, that guidance is again achieved through τ-coupling. But now the action gap (e.g., the f 0(t) gap when singing an inter-tone f 0(t)-glide) is τ-coupled onto a changing canonical energy gap, G(t), generated in the nervous system that closes in a simple manner, namely with constant acceleration from rest. The function τ G(t), the τ of the gap, G(t), is derived from Newton’s equations of motion as

$$ \tau _{\text{G}} (t) = \tfrac{1}{2}(t - T_{\text{G}}^2 /t) $$
(1)

where time, t, runs from zero to T G, the duration of closure of the gap, G(t). The hypothesis predicts that skilled, self-timed closure from rest of an action gap, X(t), will follow the τ G-guidance equation

$$ \tau _X (t) = k_{X,{\text{G}}} \tau _{\text{G}} (t) $$
(2)

where k X,G is a constant during the movement. Only when k X,G = 1 does the action gap, X(t), change with constant acceleration like G(t). Otherwise, the action gap accelerates then decelerates, and the value of k X,G, which is assumed to be set in the nervous system for each particular movement, controls the shapes of the velocity and acceleration–deceleration profiles of the action gap. The higher the value of k X,G (for \( 0 < k_{X,{\text{G}}} < 1 \)), the more delayed is the peak velocity (Fig. 1a) and the shorter and steeper is the final deceleration (Fig. 1b). Thus the action gap closes with more “oomph”.
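The τ G-guidance equation can be integrated in closed form, which makes the effect of k X,G easy to verify numerically. The sketch below (our illustration, with arbitrary parameter values; not the authors' code) checks the constant coupling ratio and the delayed velocity peak described above.

```python
import numpy as np

# Integrating tau_X = k * tau_G with tau_G(t) = 0.5*(t - T**2/t) (Eq. 1),
# i.e. dX/X = dt/(k*tau_G), gives the closed form
# X(t) = X0 * (1 - (t/T)**2)**(1/k).  The value of k then shapes the
# velocity profile as in Fig. 1.
def closure(k, T=1.0, X0=1.0, n=20001):
    t = np.linspace(T / n, T * (1 - 1.0 / n), n)   # open interval (0, T)
    X = X0 * (1 - (t / T) ** 2) ** (1.0 / k)
    return t, X

# Check the coupling: tau_X / tau_G should equal k throughout the closure
t, X = closure(0.6)
tau_X = X / np.gradient(X, t)
tau_G = 0.5 * (t - 1.0 / t)                  # Eq. 1 with T = 1
ratio = tau_X[100:-100] / tau_G[100:-100]    # trim numerically noisy ends
assert np.allclose(ratio, 0.6, atol=1e-2)

# Higher k (0 < k < 1) delays the peak closure speed; analytically the
# peak lies at t = T*sqrt(k/(2 - k))
def peak_speed_time(k):
    t, X = closure(k)
    return t[np.argmax(np.abs(np.gradient(X, t)))]

for k in (0.3, 0.5, 0.7, 0.9):
    assert abs(peak_speed_time(k) - np.sqrt(k / (2 - k))) < 5e-3
assert peak_speed_time(0.9) > peak_speed_time(0.5)   # more "oomph": later peak
```

The closed form also makes explicit why k X,G = 1 gives constant acceleration: the exponent 1/k is then 1, so X(t) is quadratic in t, like G(t).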

Fig. 1
figure 1

Effect of varying the parameter k X,G (labelled k in the figure) on: a the changing speed of closure, \( \dot X(t) \), and b the changing acceleration of closure, \( \ddot X(t) \), of a movement gap, X(t), that is closing from rest and is governed by the τ G-guidance equation (Eq. 2)

There is evidence from measurements of skilled purposive movements that τ G-guidance spans a range of human actions, including newborn babies suckling (Craig and Lee 1999), and adults reaching (Lee et al. 1999), intercepting beats (Craig et al. 2005), and putting at golf (Craig et al. 2000a). There is also neurophysiological evidence for τ G-guidance in neural firing patterns in the brains of monkeys engaged in skilled goal-directed reaching (Lee et al. 2008). Unskilled movements, however, tend not to be τ G-guided—e.g., babies less than around 20 weeks of age reaching for seen objects (von Hofsten 1983) and developmentally neurologically delayed babies controlling suction when suckling (Craig et al. 2000b). Practice appears to be required to hone τ G-guidance of movement.

Aims of study

Our aims were to measure, in the sounds and in the movements producing the sounds, the degree to which skilled singers τ G-guide inter-tone f 0(t)-glides, and skilled string players τ G-guide attack intensity-glides when initiating tones. Further, we sought to determine which τ G-guidance parameters of the glides, if any, the performers modulate to contribute to musical expression.

Methods

Singing study

Two semi-professional, classically trained female sopranos sang the high and low parts of Pergolesi’s “Vanne, Vale, Dico Addio” without accompaniment. Both singers received regular vocal tuition and had acquired good relative pitch. They had sung duets together frequently, but the Pergolesi piece was new to them and required rehearsal time. They rehearsed the piece (with the original accompaniment omitted) together for half an hour a week for a month and also rehearsed each part alone. At the test session, each singer stood in a 2 m × 2 m vocal sound booth. Singers were recorded using Neumann microphones positioned about 30 cm in front of the singer’s mouth. In addition, a laryngograph (Fourcin 1981) was attached to the throat, with an electrode 2 cm to each side of the voice box, and was adjusted so that it did not constrict the singer’s movement. The laryngograph recorded, as a waveform, the opening and closing movements of the vocal folds by measuring the variation in conduction of a high-frequency signal between the two electrodes on the throat. Direct recordings were made of the audio output of the microphone and the waveform output of the laryngograph; no filtering was employed at this stage of signal acquisition. The audio recordings were made using the Pro Tools digital audio suite (16 bit, 44 kHz, .wav format) via a Digi 002 Rack interface. The singers recorded four performances, two as a duet and two solo, providing eight microphone (audio, .wav) and eight matched laryngograph (audio, .wav) recordings: one matched microphone–laryngograph pair per singer per performance, covering both conditions (duet and solo).

The changing fundamental frequencies, f 0(t), in the voice and laryngograph recordings were analyzed at 500 Hz using the program Praat 3.9 (Boersma and Weenink 2000). The time step was set at 0.002 s and the f 0(t) range at 20–2,000 Hz. All f 0(t) files were Gaussian filtered with a sigma of 8. The singing was predominantly legato, and so, when moving between tones, the voice usually glided continuously through the range of f 0(t) between the tones, producing an inter-tone f 0(t)-glide. The inter-tone f 0(t)-glides were τ G-analyzed following the procedure described below under “τ G-guidance analysis”.

Bass-playing study

This study examined the relation between a movement and the sound it produced. A professional bass player bowed a key phrase from Tchaikovsky’s “The Dance of the Sugar Plum Fairy” (from the popular Nutcracker Suite) on an electric bass, in two moods, “happy” and “sad”. These two moods were chosen because they were readily understandable to the bass player and were expected to produce a difference in performance. The bassist was provided with the score and asked to familiarize himself with it for about 30 min. He was asked to practice bars 5–8 of the treble clef part. This sequence of 20 tones constitutes the most famous and easily recognizable part of the piece and is simple and easy to play. The bassist was asked to play the phrase repeatedly for 45 s in six recording takes. In three of the takes he played with a “sad” interpretation; in the other three he played with a “happy” interpretation. Since the sad interpretation was played at a slower tempo than the happy one, this resulted in five complete renditions of the phrase in the happy interpretation and three in the sad interpretation. Three renditions in each interpretation were analyzed. The instrument was a Steinberger CR4 electric double bass, which has an integrated stand. It has no resonating body, and sound is produced using an electromagnetic pickup system. The audio output of the bass was recorded using the Pro Tools digital audio suite (16 bit, 44 kHz, .wav format) via a Digi 002 Rack interface. No filtering was employed at this signal acquisition stage. The bow movement across the strings was recorded at 500 Hz on a Selspot™ motion capture system. This comprised one camera and three markers (infrared emitting diodes). The camera was placed directly in front of the bassist with its optical axis perpendicular to the approximately planar movement of the bow. The x and y axes of the camera’s image plane were horizontal and vertical, respectively.
One marker was attached to the bridge of the bass, and one to each end of the bow. Thus the markers provided 2D vector information in the approximately planar movement of the bow. A bow-glide across a string was measured by the difference in the x-coordinates of the images of the markers on the bridge and the bow tip. A continuous measure of changing sound intensity was produced from the recorded sound using Praat software (Boersma and Weenink 2000) at a resolution of 500 Hz. The software calculates intensity by first squaring the values in the sound wave and then convolving them with a Gaussian analysis window (Kaiser-20; sidelobes below −190 dB). The attack intensity-glides when initiating tones, and the bow-glides that produced them, were then Gaussian filtered with a sigma of 8 and were τ G-analyzed following the procedure described in the next section.
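The intensity computation described above can be sketched in a few lines. This is our simplified stand-in, not Praat's implementation: Praat uses a Kaiser-20 analysis window and its own calibration, and the window length, reference pressure, and function name below are assumptions for illustration.

```python
import numpy as np

# Simplified stand-in (ours) for the intensity computation: square the
# waveform, smooth with a Gaussian analysis window, convert to dB relative
# to a reference pressure.  Praat's actual algorithm (Kaiser-20 window,
# its calibration) differs; win_s and p_ref here are assumed values.
def intensity_db(wave, fs, win_s=0.032, p_ref=2e-5):
    sigma = win_s * fs / 2                  # Gaussian width in samples
    n = int(6 * sigma)
    g = np.exp(-0.5 * (np.arange(-n, n + 1) / sigma) ** 2)
    mean_sq = np.convolve(wave ** 2, g / g.sum(), mode="same")
    return 10 * np.log10(np.maximum(mean_sq, 1e-30) / p_ref ** 2)

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
tone = 0.1 * np.sin(2 * np.pi * 220 * t)    # steady 220 Hz tone
db = intensity_db(tone, fs)
mid = db[len(db) // 4 : -(len(db) // 4)]    # ignore window edge effects
# steady tone -> flat intensity at 10*log10((0.1**2/2) / (2e-5)**2) dB
expected = 10 * np.log10(0.005 / (2e-5) ** 2)
assert np.all(np.abs(mid - expected) < 0.5)
```

Because the smoothing kernel has unit area, a steady tone yields a flat intensity trace at its mean-square level, while an attack produces the continuous rise in intensity that is τ G-analyzed in the next section.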

τ G-guidance analysis

The procedure for analyzing the action gap data (all recorded at 500 Hz) to measure the degree to which the closure of a gap, X(t), followed the τ G-guidance equation, Eq. 2, was as follows. First, the raw data time series was Gaussian filtered with a sigma value of 8, which yielded a cut-off frequency of around 10 Hz. The resulting smoothed data time series, x(t), was then numerically differentiated to yield the rate-of-change time series, \( \dot x(t) \). Next, the start and end of the gap closure were determined as the first and last times at which \( \dot x(t) \) exceeded 10% of its peak value during the gap closure. (The value of 10% for the cut-offs was chosen to eliminate the noisy computed values of \( \dot x(t) \) at low values.) Finally, the action gap time series, X(t), was calculated by subtracting the value of x(t) at the end of the gap closure from each of the values in the x(t) time series. In Fig. 2, data from a typical inter-tone f 0(t)-glide taken from the singing study are used to illustrate the analysis procedure. Figure 2a plots X(t) and \( \dot X(t) \) against time (\( \dot X(t) \) equals \( \dot x(t) \)). The vertical lines mark the start and end of the action gap. The peak \( \dot X(t) \) is negative because the inter-tone f 0(t)-glide was downward. Next, for each time point from the start to the end of the action gap, the τ X (t) time series was computed using the formula \( \tau _X (t) = X(t)/\dot X(t) \), and the τ G(t) time series was computed using Eq. 1, where T G is the time interval between the start and end of the action gap. Figure 2b shows how τ X (t) and τ G(t) co-varied over time. The data points correspond to the data points in Fig. 2a, and the vertical lines again mark the start and end of the action gap. Figure 2c plots the values of τ X (t) against the corresponding values of τ G(t) in Fig. 2b. The line through the data points is the result of applying a recursive linear regression algorithm.
The algorithm derives two measures of the degree to which closure of a gap, X(t), is τ G-guided—namely the % gap τ G-guided and the % variance explained. The % gap τ G-guided is the highest percentage of data points, up to the end of the movement, that fit the τ G-guidance equation, Eq. 2, with <5% of the variance unaccounted for (i.e., with r 2 of the linear regression >0.95). For the data illustrated in Fig. 2c, the % gap τ G-guided was 97.6% and so the leftmost point, which was 2.4% of the original data set, has been omitted from Fig. 2c. The % variance explained by the τ G-guidance model equals one hundred times the r 2 of the linear regression computed by the algorithm. For the data shown in Fig. 2c this is 99.0%. The slope of the linear regression computed by the algorithm (0.655 in Fig. 2c) is an estimate, \( \hat k_{X,{\text{G}}} \), of the coupling ratio, k X,G, in the τ G-guidance equation, Eq. 2.
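The analysis pipeline just described can be sketched as follows. This is our re-implementation, not the authors' code: the recursive regression is simplified to trimming points from the early end until r² exceeds 0.95, and all parameter names are ours. A synthetic glide generated from the closed-form solution of Eq. 2 with a known coupling ratio checks the recovery.

```python
import numpy as np

def gaussian_smooth(x, sigma=8.0):
    # Gaussian filter, sigma in samples; edge padding avoids end artifacts
    n = int(4 * sigma)
    g = np.exp(-0.5 * (np.arange(-n, n + 1) / sigma) ** 2)
    return np.convolve(np.pad(x, n, mode="edge"), g / g.sum(), mode="valid")

def segment_gap(x, fs, cutoff=0.10):
    # Start/end of closure: where |xdot| exceeds 10% of its peak value;
    # the gap is then measured relative to its value at the detected end
    xs = gaussian_smooth(x)
    xdot = np.gradient(xs, 1.0 / fs, edge_order=2)
    inside = np.abs(xdot) > cutoff * np.max(np.abs(xdot))
    i0 = np.argmax(inside)
    i1 = len(inside) - 1 - np.argmax(inside[::-1])
    return xs[i0:i1 + 1] - xs[i1]

def tau_g_fit(X, fs, min_pts=10):
    # Regress tau_X on tau_G, dropping early points until r^2 > 0.95
    t = np.arange(1, len(X) + 1) / fs
    T = t[-1]
    tau_X = X / np.gradient(X, 1.0 / fs, edge_order=2)
    tau_G = 0.5 * (t - T ** 2 / t)              # Eq. 1
    for i in range(len(X) - min_pts):
        k_hat = np.polyfit(tau_G[i:], tau_X[i:], 1)[0]
        r2 = np.corrcoef(tau_G[i:], tau_X[i:])[0, 1] ** 2
        if r2 > 0.95:
            break
    return k_hat, 100.0 * (len(X) - i) / len(X), 100.0 * r2

# A glide with known coupling ratio: X(t) = X0*(1 - (t/T)^2)^(1/k)
# satisfies the tau_G-guidance equation (Eq. 2) exactly
fs, T, k = 500.0, 0.4, 0.65
t = np.arange(1, 201) / fs
X = 50.0 * (1 - (t / T) ** 2) ** (1 / k)
k_hat, pct_gap, pct_var = tau_g_fit(X, fs)
assert abs(k_hat - k) < 0.02 and pct_var > 99.0

# With flat "tones" around the glide, the 10% cut-offs shift the time
# origin, so the recovered ratio on segmented data is only approximate
padded = np.concatenate([np.full(200, 150.0), 100.0 + X, np.full(200, 100.0)])
k_hat2, pct_gap2, pct_var2 = tau_g_fit(segment_gap(padded, fs), fs)
assert 0.0 < k_hat2 < 0.75 and pct_var2 > 95.0
```

On segmented data the 10% cut-offs shift the effective time origin a little, which is one reason the recursive regression needs to trim poorly fitting early points before the fit quality criterion is met.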

Fig. 2
figure 2

Procedure for analyzing τ G-guidance. Data are illustrative. See “τ G-guidance analysis” for details

It is assumed that τ G-guiding the closure of an action gap, X(t), entails constantly sensing τ X (t) and regulating the muscular forces driving the movement to counteract perturbations caused, e.g., by external forces. The control cannot be perfect, because of perceptuo-motor delays, etc., and so the person or animal needs to aim to keep τ X (t) acceptably close to \( k_{X,{\text{G}}} \tau _{\text{G}} (t) \) at each instant during the movement, like controlling lateral sway when walking along a wall. We measured the pattern of regulation used in keeping τ X (t) close to \( k_{X,{\text{G}}} \tau _{\text{G}} (t) \) by computing the κ X,G(t) (kappaXG) profile of a gap-closing movement, where \( \kappa _{X,{\text{G}}} (t) = \tau _X (t)/\tau _{\text{G}} (t) \) at each moment in time, t, during the closure of the action gap (Fig. 2d). If control were perfect, κ X,G(t) would be constant and so the κ X,G(t) profile would be straight and horizontal. If regulation of the movement were imperfect and unsystematic, then the mean κ X,G(t) profile for a set of similar movements would be approximately horizontal but with random wiggles. However, if the regulation were systematic then this would be reflected in a systematic mean κ X,G(t) profile, which would indicate a style, or control strategy, being employed by the person (or animal) in attempting to keep κ X,G(t) acceptably close to the goal value, k X,G. Furthermore, if a person were to exhibit different styles of κ X,G(t) profile in different situations, the κ X,G(t) profile would provide a measure of the person’s different control strategies. Thus, a skilled singer τ G-guiding an inter-tone f 0(t)-glide, or a skilled bass player τ G-guiding an attack intensity-glide, might systematically vary their κ X,G(t) profile with a style that befits the musical expression they want to convey. We tested these conjectures in the present study.
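The κ X,G(t) profile computation can be sketched as follows (our illustration, with hypothetical trial values): profiles from glides of different durations are put on a common normalized time base before averaging, as in the mean time-normalized profiles of Figs. 6b and 7.

```python
import numpy as np

# Sketch (ours) of the kappa profile: kappa_XG(t) = tau_X(t) / tau_G(t),
# resampled onto a normalized time base (fraction of glide duration) so
# that glides of different durations can be averaged.  Perfectly
# tau_G-guided trials give flat profiles at their k value.
def kappa_profile(X, fs, n_points=50):
    t = np.arange(1, len(X) + 1) / fs
    T = t[-1]
    tau_X = X / np.gradient(X, 1.0 / fs)
    tau_G = 0.5 * (t - T ** 2 / t)           # Eq. 1
    kappa = tau_X[:-1] / tau_G[:-1]          # drop final point (tau_G = 0)
    frac = t[:-1] / T                        # normalized time, 0..1
    grid = np.linspace(0.05, 0.9, n_points)  # avoid the noisy extremes
    return np.interp(grid, frac, kappa)

fs, T = 500.0, 0.4
t = np.arange(1, 201) / fs
profiles = []
for k in (0.55, 0.60, 0.65):                 # three hypothetical trials
    X = 50.0 * (1 - (t / T) ** 2) ** (1 / k) # exact tau_G-guided glide
    profiles.append(kappa_profile(X, fs))
mean_profile = np.mean(profiles, axis=0)
# unperturbed trials -> flat mean profile at the mean k of the trials
assert np.allclose(mean_profile, 0.60, atol=0.01)
```

With real, perturbed movements the individual profiles wiggle around k X,G, and it is the systematic shape of the mean profile, rather than its flatness, that carries the information about control style.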

Results

Inter-tone f 0(t)-glides when singing

Figure 3 shows, for each performance, the (changing) fundamental frequencies, f 0(t), of the singers’ voices that were derived from the audio recordings (“Methods”). For each performance, by simultaneous inspection of both the audio record and the graphic transcript of f 0(t) (Fig. 3), each transition in f 0(t) between adjacent tones that corresponded with an event in the musical score (Fig. 4) was selected for “τ G-guidance analysis” (“Methods”), except when the transition involved the singer articulating a voiceless consonant, since these transitions did not involve a continuous change in f 0(t). This procedure yielded 225 acoustic inter-tone f 0(t)-glides, which were then τ G-analyzed. Figure 5 shows representative data plots of τ X (t) against τ G(t), where X stands for f 0. Across the 225 acoustic inter-tone f 0(t)-glides, the means (SE) of the τ G-guidance measures were: % gap τ G-guided = 99.03% (0.01%); % variance explained = 98.7% (0.1%); \( \hat k_{X,{\text{G}}} \) = 0.552 (0.008). These results strongly indicate that the inter-tone f 0(t)-glides were τ G-guided by the singers’ nervous systems. Because the value of \( \hat k_{X,{\text{G}}} \) was close to 0.5, the closure of the f 0(t) gaps was gentle, ending at low speed and low deceleration (Fig. 1).

Fig. 3
figure 3

Fundamental frequencies, f 0(t), of the singers’ voices when singing Pergolesi’s “Vanne, Vale, Dico Addio” without accompaniment, in duet (a, b) and solo (c, d)

Fig. 4
figure 4

Inter-tone f 0(t)-glides (fundamental frequency glides) when singing Pergolesi’s “Vanne, Vale, Dico Addio” without accompaniment. The sound wave recorded from a singer was computer analyzed to yield the f 0(t) profile shown. Inter-tone f 0(t)-glides are indicated by rapid shifts in f 0(t) that are aligned with the note changes in the musical score. The inter-tone f 0(t)-glide from B to E on Va-ne is shown ringed

Fig. 5
figure 5

Representative plots of τ X (t) against τ G(t) for the inter-tone f 0(t)-glides when singing Pergolesi’s “Vanne, Vale, Dico Addio” without accompaniment, in duet (a–d) and solo (e–h). X stands for f 0

Measuring the emotional power of inter-tone f 0 (t)-glides

To determine whether the shape of an inter-tone f 0(t)-glide was directly related to its emotional power, the singers examined the score using their personal judgment and identified 20 “emotionally neutral” and 20 “emotionally charged” transitions between adjacent notes. These were inter-tone f 0(t)-glides that the singers felt they had used to emotional effect. The data from the τ G-analysis of the inter-tone f 0(t)-glides of these 40 transitions were then compared across the two populations, emotionally charged vs. emotionally neutral. As shown in Fig. 6a, the mean \( \hat k_{X,{\text{G}}} \) value for the emotionally charged inter-tone f 0(t)-glides was 0.645 (SE 0.017), which was significantly higher (P < 0.05, t test) than the mean \( \hat k_{X,{\text{G}}} \) value of 0.480 (SE 0.030) for the emotionally neutral inter-tone f 0(t)-glides. The mean durations of the emotionally charged and emotionally neutral inter-tone f 0(t)-glides were 0.119 s (SE 0.008 s) and 0.107 s (SE 0.007 s), respectively, which were not significantly different. This indicates that k X,G was a parameter of expression of emotional power in the inter-tone f 0(t)-glides, but T G was not. The fact that the mean \( \hat k_{X,{\text{G}}} \) value was significantly higher in the emotionally charged compared with the emotionally neutral transitions between adjacent notes indicates that emotional power was added by giving the τ G-guided inter-tone f 0(t)-glides more “oomph” (cf. Fig. 1).
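As a back-of-envelope check (ours) that the reported difference is of the right order, a Welch t statistic can be computed from the quoted means, standard errors, and sample sizes alone. The paper does not specify which t test was used, so the Welch form is an assumption.

```python
import numpy as np

# Check (ours) using the summary statistics quoted above: emotionally
# charged glides, mean k = 0.645 (SE 0.017, n = 20), vs emotionally
# neutral glides, mean k = 0.480 (SE 0.030, n = 20).  Welch's t and its
# approximate degrees of freedom can be computed from means and SEs:
def welch_t_from_se(m1, se1, n1, m2, se2, n2):
    t = (m1 - m2) / np.sqrt(se1 ** 2 + se2 ** 2)
    df = (se1 ** 2 + se2 ** 2) ** 2 / (
        se1 ** 4 / (n1 - 1) + se2 ** 4 / (n2 - 1))
    return t, df

t_stat, df = welch_t_from_se(0.645, 0.017, 20, 0.480, 0.030, 20)
# |t| of about 4.8 with df of about 30 comfortably exceeds the two-tailed
# 5% critical value (about 2.04), consistent with the reported P < 0.05
assert t_stat > 2.04 and 25 < df < 35
```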

Fig. 6
figure 6

Emotionally charged versus emotionally neutral inter-tone f 0(t)-glides. a Means and standard error bars of \( \hat k_{X,{\text{G}}} \) of a singer’s τ G-guided inter-tone f 0(t)-glides. (\( \hat k_{X,{\text{G}}} \) is the linear regression slope, which estimates the coupling ratio, k X,G, in the τ G-guidance equation, Eq. 2, where X stands for f 0.) b Mean time-normalized κ X,G(t) (\( = \tau _X (t)/\tau _{\text{G}} (t) \)) profiles, with standard error bars, of the emotionally charged and emotionally neutral inter-tone f 0(t)-glides, showing the different styles of control of the f 0(t)-glides. Thicker sections of the curves indicate statistically significant differences between the profiles (P < 0.05, 2-tailed t test)

The singers also used different styles in stabilizing around their chosen k X,G value when singing emotionally charged versus emotionally neutral inter-tone f 0(t)-glides. The κ X,G(t) profiles of the emotionally charged inter-tone f 0(t)-glides were significantly (P < 0.05, 2-tailed t test) higher than the κ X,G(t) profiles of the emotionally neutral inter-tone f 0(t)-glides, during about the first half of the inter-tone f 0(t)-glide (heavily marked sections of the graphs in Fig. 6b). This again indicates that the emotionally charged inter-tone f 0(t)-glides were given more “oomph”, particularly during the first half of the glide.

Laryngeal inter-tone f 0(t)-glides

The laryngograph recorded, as a waveform, the opening and closing movements of the vocal folds. When this waveform was played acoustically, it sounded rather like Donald Duck singing the song. It clearly contained different acoustic information from the microphone recording of the voice. To determine whether, despite this difference, there was invariant information about inter-tone f 0(t)-glides in the acoustic (microphone) and laryngograph records, 42 pairs of acoustic and laryngograph inter-tone f 0(t)-glides were randomly selected from one singer’s records and the degree of τ G-guidance was measured (“Methods”). The acoustic inter-tone f 0(t)-glides were found to be essentially identical to their laryngeal counterparts: (1) the mean (SE) of \( \hat k \) was 0.563 (0.016) for both acoustic and laryngograph inter-tone f 0(t)-glides; (2) the mean (SE) % gap τ G-guided was, respectively, 99.23 (0.198) and 99.09% (0.206) for the acoustic and laryngeal inter-tone f 0(t)-glides; (3) the κ X,G(t) profiles of the acoustic and laryngeal inter-tone f 0(t)-glides were not significantly different at the 0.05 level (2-tailed t test) (Fig. 7). This indicates that the τ G-guided inter-tone f 0(t)-glides in the singer’s voice were the result of neural τ G-guidance of the opening and closing movements of her vocal folds.

Fig. 7
figure 7

Mean time-normalized κ X,G(t) (\( = \tau _X (t)/\tau _{\text{G}} (t) \)) profiles of the acoustic and laryngograph records of a singer’s inter-tone f 0(t)-glides when singing the Pergolesi piece. Vertical bars represent standard errors

Attack intensity-glides when bowing

How the bow glides across a string modulates the attack intensity-glide on the tone being played. An attack intensity-glide is indicated by a rapid rise in sound intensity aligned with the onset of the tone. Corresponding to it is a bow-glide—a displacement of the bow across the strings. Figure 8 shows attack intensity-glides and contemporaneous bow-glides during a rendition of the musical phrase. An attack intensity-glide coincided either with the reversal of the direction of movement of the bow or with the resumption of bow movement in the same direction after a brief pause (e.g., around the time 7.8 s in Fig. 8). Figure 9 shows how the sound intensity, derived from the audio recordings (“Methods”), varied across the musical phrase for each rendition. We measured the temporal forms of the attack intensity-glides and corresponding bow-glides and determined whether these temporal forms were modulated according to the mood, “happy” or “sad”, of the rendition of the musical phrase. One hundred and forty-six matched pairs of bow-glides and attack intensity-glides were τ G-analyzed (“Methods”). Figure 10 shows representative data plots of the τs of the attack intensity-glides and bow-glides against τ G for the happy and sad renditions of the musical phrase. Overall, the mean percentage of the data points that fitted the τ G-guidance equation (Eq. 2), with more than 95% of the variance in the data explained by the equation, was 89.2% for the bow-glides and 86.2% for the attack intensity-glides.

Fig. 8
figure 8

Attack intensity-glides and contemporaneous bow-glides at the beginnings of tones in the bass study. Attack intensity-glides are indicated by large rapid rises in intensity aligned with the onset of notes in the musical score. The contemporaneous bow-glides are indicated by large displacements of the bow across the strings. The attack intensity-glide and contemporaneous bow-glide on the F sharp in bar 1 are singled out

Fig. 9
figure 9

Sound intensity against time for a the “happy” and b the “sad” renditions of the Tchaikovsky phrase, bowed on an electric bass

Fig. 10
figure 10

Representative data plots of τ X (t) against τ G(t), when X is a an attack intensity-glide in the “happy” rendition, b a bow-glide in the “happy” rendition, c an attack intensity-glide in the “sad” rendition, d a bow-glide in the “sad” rendition of the Tchaikovsky phrase, bowed on an electric bass

Figure 11 shows the mean values, with standard error bars, of \( \hat k_{X,{\text{G}}} \) and T G for the τ G-guided attack intensity-glides and accompanying bow-glides. The values of both parameters were significantly (P < 0.001, 2-tailed t test) higher in the “sad” rendition than in the “happy” rendition. That is, in changing the mood of the piece from happy to sad, the player conjointly increased the duration, T G, and the “oomph” factor, k X,G, of both the bow-glides and the attack intensity-glides. At the same time, the player decreased the tempo in the sad rendition (“Methods”). While it may be argued that this decrease in tempo caused the increase in duration of the glides, we can see no reason why it should cause the “oomph” of the glides to increase. Rather, we suggest that the decrease in tempo is a result of, or possibly independent of, increasing the duration and “oomph” of the glides.

Fig. 11
figure 11

Change in values of the τ G-guidance parameters k X,G and T G between the “happy” and “sad” renditions of the Tchaikovsky phrase, bowed on an electric bass. a Mean values of \( \hat k_{X,G} \), an estimate of the parameter k X,G. b Mean values of T G. Vertical bars represent standard errors. Data for the attack intensity-glides and contemporaneous bow-glides are shown

The mean values of \( \hat k_{X,{\text{G}}} \) and T G were also significantly higher in the bow-glides compared with the attack intensity-glides (P < 0.001, 2-tailed t test, except for the \( \hat k_{X,{\text{G}}} \) means in the sad rendition, where P < 0.005). The higher mean duration, T G, of the bow-glides (Fig. 11b) is probably due to the bow being lifted off the strings before stopping, but we cannot be sure because we did not measure this. The lower mean value of the “oomph” factor, k X,G, of the attack intensity-glides (Fig. 11a) is possibly due to the slippage of the bow on the strings reducing the energy transmitted to the strings.

The mood of the rendition also affected the style with which the player moved the bow when stabilizing around his chosen k X,G value. The κ X,G(t) profiles of the bow-glides and the intensity-glides were significantly different for the sad and happy renditions (P < 0.05, 2-tailed t test) during, respectively, 72 and 18% of the glide (heavily marked sections in Fig. 12a, b).

Fig. 12
figure 12

Mean time-normalized κ X,G(t) (\( = \tau _X (t)/\tau _G (t) \)) profiles of a the τ G-guided bow-glides and b the resultant attack intensity-glides, for the “sad” and “happy” renditions of the Tchaikovsky phrase, bowed on an electric bass. Vertical bars represent standard errors. Thicker sections of the curves indicate statistically significant differences between the profiles

Summary and discussion

The experiments examined how performers regulate the temporal patterns of transient musical sounds, demonstrating a possible mechanism for modulating expressive parameters of sound. The transient sounds studied were inter-tone f 0(t)-glides (the continuous change in fundamental frequency, f 0(t), when gliding from one tone to the next), and attack intensity-glides (the continuous rise in sound intensity when “attacking”, or initiating, a tone). We found that the temporal patterns of these sounds and the movements producing them—the movements of the vocal folds when singing inter-tone f 0(t)-glides and of the bow across the strings when producing attack intensity-glides on a bass—were τ G-guided. (That is, the inter-tone f 0(t)-glides and attack intensity-glides, and the movements producing them, conformed to the τ G-guidance equation, \(\tau _X (t) = k_{X, {\text{G}}}\tau _{\text{G}} (t) \) (Eq. 2), where \(\tau _{\text{G}} (t) ={\frac{1}{2}} ({t-T^2_{\text{G}}}/t)\) (Eq. 1); X stands for a changing inter-tone f 0(t)-gap, or intensity-gap, or sound-generating action gap; k X,G is the coupling ratio; T G is the duration of the τ G-guidance; time, t, runs from zero to T G.) Our finding that the τ G-guided inter-tone f 0(t)-glides in the singer’s voice matched the τ G-guided movements of the vocal folds suggests that the inter-tone f 0(t)-glides were τ G-guided by the singer’s nervous system τ G-guiding the tension in her laryngeal muscles (assuming only that the changing fundamental frequency, f 0(t), generated in the larynx is a power function—possibly linear—of the tension in the vocal folds).

When the performers modulated musical expression, they modulated the values of one or both of the parameters, k X,G and T G, of the τ G-guided movements and sounds. They also modulated the κ X,G(t) profiles of the movements and sounds, which measure the manner in which performers followed the τ G-guidance equation. \( (\kappa _{X,{\text{G}}} (t) = \tau _X (t)/\tau _{\text{G}} (t) \) = the instantaneous value of k X,G at each moment during a movement.) In particular, when the singers increased the “emotional charge” of transitions between adjacent tones, they increased the value of k X,G for the inter-tone f 0-glides and generally raised the κ X,G(t) profiles of the inter-tone f 0(t)-glides, which increased the “oomph” of the glides. When the bassist played in a “sad” compared to a “happy” mood, he increased the values of both k X,G and T G, and generally raised the κ X,G(t) profiles of the attack intensity-glides and of the bow-glides that produced them. Thus, in playing sadly he produced longer bow-glides and attack intensity-glides with more “oomph”.

Our study of transient musical movements and sounds raises two further important questions. The first is whether the principle of τ G-guidance of expressive movement and sound extends to longer expressive musical events, such as the temporal patterns of intensity change in crescendi and diminuendi, and of tempo change in accelerandi and ritardandi. Experimental and analytic methods similar to those used in the present study could be applied to answering this question. The second question is: how is τ G-guidance of expressive musical movement and sound enacted in a performer’s nervous system? An answer to this question would help in understanding how nervous systems intrinsically guide not only musical movements and sounds but also purposive movements in general, since τ G-guidance of purposive movement appears to be a common phenomenon. Answering the question will require recording (using, e.g., magnetoencephalography, MEG) the temporal patterns of flow of electrical energy in the brains of musicians when they are performing music, while simultaneously recording their movements and the sounds they are producing. We believe that using General Tau Theory, coupled with the general experimental approach that we have described here, could lead to answers to these two questions and thence to a better understanding of how movement sculpts the beauty and expressiveness of musical sound.