1.1 Fundamental Physical Principles

1.1.1 Sound Pressure

When we hear music, the perceived tonal impression is caused by sound carried to our ears by the air. Relevant in this context are the minute pressure variations which are superimposed on the stationary pressure of the air surrounding us. The pressure variations propagate as waves in space. These more or less periodic deviations from the stationary mean value constitute the so-called sound pressure variations, for which in practice the shorter term “sound pressure” is used.

Since our ear is capable of responding to a wide range of sound pressures, from a barely perceptible sound to the intensity at which the hearing sensation becomes painful, a logarithmic scale is generally used to represent the range of sound pressure values of interest to the acoustician. This compresses the very wide range of values into a manageable span of numbers. The relation of a certain sound pressure to a reference value is given in “decibels” (dB), and one speaks of the sound pressure level, where the concept “level” always refers to a logarithmic scale. To the unaccustomed reader this procedure may initially appear somewhat complicated; however, it has proven very advantageous in practice, particularly once a number of dB values are associated with their corresponding sounds as heard. Furthermore, the logarithmic dB scale reflects hearing perception more closely than a linear representation.

So-called “absolute” dB values for sound pressure levels are obtained when a reference value of 2 × 10⁻⁵ Pa is used. This value was chosen by international agreement. It corresponds approximately to the threshold of hearing in the frequency region where the ear is most sensitive. (The measuring instrument takes the reference value into account and carries out the logarithmic calculations.) As an example, in a Bruckner symphony, depending on the size of the concert hall, the location in the hall, and the size of the orchestra, one can expect values between 90 and 100 dB for a fortissimo, whereas a pianissimo could result in 40–45 dB.

It is, however, possible to use some other, arbitrary sound pressure value as a reference. In that case one obtains a “relative” dB value, particularly suitable for characterizing the difference between two sound levels. A value of 0 dB would then indicate that the two processes being compared have the same sound pressure, not that they are at the lower limit of hearing. If, in the example given above, a fortissimo is measured at 100 dB (absolute) and a pianissimo at 45 dB (absolute), the resulting dynamic difference would be 55 dB (relative). The addition “relative” is dropped in general usage, while in situations which are not clear from the context, the absolute measure is emphasized by an indication of the reference sound pressure.
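The relation between sound pressure and level can be illustrated with a short calculation. The following is a minimal sketch that assumes the conventional definition of the sound pressure level as 20 times the base-10 logarithm of the pressure ratio; the function names are chosen here for illustration only.

```python
import math

P_REF = 2e-5  # reference sound pressure in Pa, as stated in the text


def spl_db(pressure_pa, reference_pa=P_REF):
    """Sound pressure level in dB of pressure_pa relative to reference_pa."""
    return 20.0 * math.log10(pressure_pa / reference_pa)


def pressure_from_spl(level_db, reference_pa=P_REF):
    """Inverse relation: sound pressure in Pa for a given level in dB."""
    return reference_pa * 10.0 ** (level_db / 20.0)


# Fortissimo and pianissimo from the example in the text (absolute levels).
p_fortissimo = pressure_from_spl(100.0)
p_pianissimo = pressure_from_spl(45.0)

# Using the pianissimo pressure as the reference gives the "relative" level,
# which equals the difference of the two absolute levels.
dynamic_range_db = spl_db(p_fortissimo, reference_pa=p_pianissimo)

print(f"fortissimo pressure: {p_fortissimo:.3f} Pa")
print(f"pianissimo pressure: {p_pianissimo:.5f} Pa")
print(f"dynamic difference:  {dynamic_range_db:.1f} dB")
```

Choosing the pianissimo pressure as the reference reproduces the 55 dB dynamic difference, while a pressure equal to the reference yields 0 dB.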

1.1.2 Particle Velocity

The actual cause for generating and propagating pressure variations lies in the fact that individual air particles vibrate about their rest position, and thus collide with neighboring particles in the direction of their motion. The velocity with which the particles move, relative to their rest position, is called the particle velocity. As is the case with sound pressure, the particle velocity likewise is subject to fluctuations. However, as the particles vibrate back and forth, not only is the magnitude changed, but also the direction of motion.

Sound pressure and particle velocity together determine the so-called sound field, which characterizes the complete temporal and spatial properties of a sound process. Here one is concerned not only with the magnitudes of these two quantities, but also with their relative phase. This means that the maximum pressure will not necessarily coincide in time with the highest particle velocity. A relative shift in time between variations in pressure and velocity is quite possible.

In the case of a propagating plane wave, however, as is the case at larger distances from the sound source, i.e., in the far field, pressure and velocity are in phase. Furthermore, there is a direct proportionality between those two quantities: when the sound pressure rises, the particle velocity increases in the same measure. The relationship between pressure and velocity is thus determined by the “resistance” presented by the air to the vibrations. This “characteristic field impedance” (earlier denoted as sound wave resistance) can be considered to be constant for practical purposes.

With that in mind, one can describe the sound field by sound pressure alone, when considering the far field, as is almost always the case for the listener in a musical performance. Furthermore, it should be noted that the ear responds exclusively to sound pressure. On the other hand, for recordings with microphones, it is entirely possible to come so close to the sound source that sound pressure and particle velocity are no longer proportional to each other and are no longer in phase, and thus both must be considered. The so called “near field effect” is well known. For certain microphone types it leads to an unnatural amplification of the low registers.

1.1.3 Sound Power

Describing a sound field by specifying sound pressure levels for a number of points in a room represents a viewpoint oriented toward the listeners, or recording devices, at those points. Naturally the measured sound level depends on the strength of the sound source. It is therefore also of interest to characterize the sound source in a way that describes its strength independently of spatial considerations and the distance from the listener, i.e., in a way that relates exclusively to the sound source itself. Such a quantity is the sound energy radiated by a source in all directions during a unit of time. This quantity is designated as the sound power of the source.

The physical unit for power is the watt; however, since acoustic power values occurring in practice cover a large dynamic range, as is the case for sound pressure, acoustic power is likewise represented on the more manageable dB scale. At the same time this simplifies the connection between the power of a sound source and the resulting sound pressure, which is of particular interest in room acoustics.

A power of 10⁻¹² W serves as reference value for the dB scale of sound power. This value follows from the reference value for sound pressure and the characteristic field impedance of air. Numerically, the sound power level given in dB corresponds to the sound pressure level at the surface of a sphere with a surface area of 1 m² which surrounds the center of the sound source, i.e., it is equal to the sound pressure level at a distance of approximately 28 cm from the center of the source.
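The two numerical statements above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the characteristic field impedance of air is taken as a rounded 400 Pa·s/m (a value not given in the text), and the function names are illustrative.

```python
import math

P_REF = 2e-5        # Pa, reference sound pressure from the text
W_REF = 1e-12       # W, reference sound power from the text
Z0_ASSUMED = 400.0  # Pa*s/m, ASSUMED rounded field impedance of air


def reference_intensity(p_ref=P_REF, z0=Z0_ASSUMED):
    """Intensity in W/m^2 of a plane wave with pressure p_ref: I = p^2 / Z0."""
    return p_ref ** 2 / z0


def sound_power_level_db(power_w, reference_w=W_REF):
    """Sound power level in dB relative to reference_w."""
    return 10.0 * math.log10(power_w / reference_w)


def sphere_radius_m(surface_area_m2):
    """Radius of a sphere with the given surface area (A = 4 * pi * r^2)."""
    return math.sqrt(surface_area_m2 / (4.0 * math.pi))


# The reference intensity passing through 1 m^2 carries the reference power.
print(f"reference intensity: {reference_intensity():.2e} W/m^2")
print(f"radius for 1 m^2:    {sphere_radius_m(1.0) * 100:.1f} cm")
print(f"level of 1 W source: {sound_power_level_db(1.0):.0f} dB")
```

With the assumed impedance, the reference pressure corresponds to an intensity of 10⁻¹² W per square meter, which is why the sound power level and the sound pressure level coincide numerically on a surface of 1 m².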

Inasmuch as the dB scale is built on a logarithmic basis, the dB values for the power of individual sound sources cannot simply be added to determine the combined power during simultaneous excitation. Rather, depending on the factor by which the total power exceeds that of the individual sources, a value must be added to the dB value of the individual sound sources.

Table 1.1 shows some value pairs needed for this calculation. For example, if the sound power is doubled (possibly by doubling the number of performers), the total radiated power level is raised by 3 dB; if, on the other hand, the number of players is raised from 4 to 5, this means an increase of only 1 dB in the radiated sound power level.
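The value to be added for a given power factor follows from the logarithmic definition of the level. The sketch below assumes the usual relation of 10 times the base-10 logarithm of the power ratio and incoherent (power-wise) addition of the sources; the function names are illustrative.

```python
import math


def level_increase_db(power_factor):
    """dB value to add when the total power rises by power_factor."""
    return 10.0 * math.log10(power_factor)


def combined_level_db(levels_db):
    """Total level of several sources whose powers add (incoherent sum)."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


# Doubling the number of equally strong performers.
print(f"factor 2:   +{level_increase_db(2.0):.2f} dB")
# Going from 4 to 5 equally strong players.
print(f"factor 5/4: +{level_increase_db(5.0 / 4.0):.2f} dB")
# Two sources of 80 dB each.
print(f"80 dB + 80 dB = {combined_level_db([80.0, 80.0]):.2f} dB")
```

The results round to the 3 dB and 1 dB quoted in the text.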

Table 1.1

1.1.4 Frequency

The number of pressure fluctuations or vibrations occurring in a certain time period is designated as the frequency. In this context a single vibration is counted from an arbitrary starting point to the adjacent equivalent point where conditions prevail identical to the starting point. For example, in the case of a pendulum, the period of a vibration includes the time from the moment of highest excursion on one side until the subsequent instant of highest excursion on the same side.

The number of vibrations per second is given in Hz (“hertz”). For high frequencies, 1,000 Hz can be written as 1 kHz (“kilohertz”) to avoid large numbers. Vibrations perceived by our ear lie in the range of approximately 16 Hz to 20 kHz. Frequencies above that range are designated as ultrasound, frequencies below it as infrasound.

The musical reference note A4, for example, has a frequency of 440 Hz. A summary of some additional frequency values associated with musical notes is given in Table 1.2; the frequency values are rounded for the sake of clarity. The highest notes in this table occur only extremely rarely; nevertheless these frequencies, and the additional frequency range up to the upper limit of hearing, are of interest in connection with tone color effects of overtones.
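Frequencies of other notes can be derived from the reference note A4. The following sketch assumes equal temperament, which the text does not state explicitly, so its values may differ slightly from tabulated ones; the note-name mapping and function name are illustrative.

```python
A4_HZ = 440.0  # reference frequency from the text

# Semitone offsets of the natural notes above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_frequency_hz(name, octave):
    """Frequency of a natural note, ASSUMING 12-tone equal temperament."""
    semitones_from_a4 = NOTE_OFFSETS[name] - NOTE_OFFSETS["A"] + 12 * (octave - 4)
    return A4_HZ * 2.0 ** (semitones_from_a4 / 12.0)


for name, octave in [("C", 0), ("C", 4), ("A", 4), ("C", 8)]:
    print(f"{name}{octave}: {note_frequency_hz(name, octave):8.2f} Hz")
```

Under this assumption C0 comes out at roughly 16 Hz, close to the lower limit of the audible range quoted above.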

Table 1.2

1.1.5 The Speed of Sound

While the particle velocity indicates the speed with which air particles move relative to their rest position, one designates the speed of propagation with which pressure fluctuations spread in the air space (or some other medium) as the speed of sound. It is thus this speed of sound which is relevant when considering the delay with which a sound process reaches the ear after being released from some distance away.

The speed of sound in air is independent of frequency. It depends in some small measure on the stationary air pressure, as well as on the carbon dioxide content; these effects, however, are of little consequence for practical musical purposes.

More important, on the other hand, is the observation that the speed of sound increases with rising temperature. This influences the tuning of brass instruments, for example. This effect also plays a role in sound propagation through differentially heated layers of air, which can come about through less than optimal heating installations, solar influence, or over a cool water surface in open air performances. For this reason, values of the speed of sound in air at several temperatures are given in Table 1.3.
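The temperature dependence can be estimated with a common ideal-gas approximation. This is a sketch only: the formula and the constant of 331.3 m/s at 0 °C are standard textbook assumptions for dry air and are not taken from the text, so the results need not match the tabulated values exactly.

```python
import math

C_AT_0_CELSIUS = 331.3  # m/s, ASSUMED speed of sound in dry air at 0 degrees C
KELVIN_OFFSET = 273.15  # K


def speed_of_sound_m_per_s(temperature_celsius):
    """Approximate speed of sound in dry air (ideal-gas assumption)."""
    return C_AT_0_CELSIUS * math.sqrt(1.0 + temperature_celsius / KELVIN_OFFSET)


for temperature in (-10, 0, 10, 20, 30):
    print(f"{temperature:4d} C: {speed_of_sound_m_per_s(temperature):6.1f} m/s")
```

In this approximation the speed rises by roughly 0.6 m/s per degree near room temperature.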

Table 1.3

Accordingly, in the range of average room temperatures, a mean value for the speed of sound in air can be taken as 340 m s⁻¹. This means that a listener at a distance of 34 m from the concert podium hears the sound with a delay of 1/10 s. This corresponds to a 1/16 note at a tempo of MM = 152. Such distances are quite common in large concert halls; however, at that distance the eye is no longer in a position to follow the motions of the performer exactly, so that the lack of temporal coincidence only becomes uncomfortably noticeable when using opera glasses.
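The comparison between propagation delay and note duration can be reproduced as follows. The sketch assumes that the metronome marking counts quarter notes per minute, which the text does not state; the function names are illustrative.

```python
SPEED_OF_SOUND = 340.0  # m/s, mean value from the text


def propagation_delay_s(distance_m, speed_m_per_s=SPEED_OF_SOUND):
    """Time the sound needs to travel distance_m."""
    return distance_m / speed_m_per_s


def note_duration_s(metronome_mark, note_fraction=1.0 / 16.0):
    """Duration of a note, ASSUMING the metronome mark counts quarter notes."""
    quarter_note_s = 60.0 / metronome_mark
    whole_note_s = 4.0 * quarter_note_s
    return whole_note_s * note_fraction


print(f"delay at 34 m:           {propagation_delay_s(34.0):.3f} s")
print(f"1/16 note at MM = 152:   {note_duration_s(152):.3f} s")
```

Both values are close to one tenth of a second, in line with the statement above.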

1.1.6 Wavelength

Since sound waves spread with a certain propagation speed, naturally a spatial separation results between two successive pressure maxima (wave peaks). The faster these maxima follow each other (i.e., the higher the frequency), the shorter is the spacing. This spatial distance between two neighboring pressure maxima (or two neighboring pressure minima) is designated as the wavelength. It can be calculated from the formula

$$ \text{Wavelength} = \frac{\text{Speed of sound}}{\text{Frequency}}. $$

For a speed of sound of 340 m s⁻¹, a number of examples of wavelengths in air are given in Table 1.4.
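The wavelength formula is easily evaluated for any frequency. The following minimal sketch uses the 340 m s⁻¹ from the text; the chosen example frequencies are illustrative and are not taken from the table.

```python
SPEED_OF_SOUND = 340.0  # m/s, value used in the text


def wavelength_m(frequency_hz, speed_m_per_s=SPEED_OF_SOUND):
    """Wavelength = speed of sound / frequency."""
    return speed_m_per_s / frequency_hz


for frequency in (16.0, 100.0, 440.0, 1000.0, 10000.0, 20000.0):
    print(f"{frequency:8.0f} Hz: {wavelength_m(frequency):8.3f} m")
```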

Table 1.4

Wavelengths ranging from several meters to a few centimeters occur in the region of audible frequencies. This means that at high frequencies the wavelengths are small compared to musical instrument dimensions, and the sizes of rooms, and surfaces on which the sound impinges. For mid frequencies, in contrast, the wavelengths are of the same order of magnitude as the dimensions mentioned. In the low frequency range the instrument and room dimensions must be considered as small in comparison to the wavelengths. These considerations are of great significance for sound radiation from sources, and for processes dealing with room acoustics.

1.2 Characteristics of the Auditory System

1.2.1 The Sensation of Loudness

A sensation of hearing arises when pressure fluctuations, which reach our ear, occur in a certain frequency region, and do not fall below a minimum sound level. The lowest frequency for which a vibration process is still perceived as a tone is approximately 16 Hz. This corresponds to a C0 which is included in the 32′ register of some large organs. For yet lower frequencies the ear can already follow the temporal process of the vibrations, so that a unique tonal impression can no longer be formed. For younger persons the upper limit of hearing lies in the range of 20,000 Hz. This is, however, subject to individual variations, and decreases with increasing age.

The lower sound level limit for the ability to detect tones is also not the same for all people. However, from a large number of measurements, based on statistics, a mean value can be calculated, which characterizes typical behavior. In this context, the interesting result is noted that this so-called threshold of hearing depends in large measure on frequency. This characteristic is represented in the lowest curve of Fig. 1.1. In this diagram frequency increases from left to right, and the absolute sound level increases in the upward direction. One notes that the ear responds with most sensitivity to tones in the frequency range between 2,000 and 5,000 Hz. In this range the minimum required sound level is the lowest. For higher frequencies, but even more so for lower frequencies, the sensitivity of the ear is reduced, so that in these regions significantly higher sound pressure levels are required for a tone to become audible.

Fig. 1.1 Equal loudness curves for sound incident from the front, with threshold of hearing and threshold of pain indicated, as well as threshold of discomfort (after Winkel, 1969). The heavy lines with end points show that, depending on frequency, a different sound pressure level difference is required for the loudness level difference between 50 and 80 phons

The same tendency is evident when, at higher intensity, tones of different frequencies are compared in relation to their impression of loudness. The sound pressure level as an objective measure of existing physical excitation is by no means equivalent to the loudness as a subjective measure of sensation. In order to establish a connection between the two, the unit of a phon was introduced for loudness level. It is defined such that the dB scale for sound pressure level and the phon scale for loudness level coincide at 1,000 Hz. When tones of different frequency are compared to a tone of 1,000 Hz, the so-called equal loudness curves are obtained, which represent the relationship between the objective sound pressure level and the loudness level as determined by the sensitivity of the ear. Several of these curves are recorded in Fig. 1.1.

For example, following the curve for 80 phons shows that this loudness level at 1,000 Hz is (by definition) caused by a sound pressure level of 80 dB. On the other hand, at 500 Hz, approximately 75 dB are sufficient for the same loudness level impression, while at 100 Hz almost 90 dB are required. A level of 80 dB would accordingly be perceived as somewhat louder at 500 Hz than at 1,000 Hz. At 500 Hz it would be perceived as a loudness level of 84 phons, while the same sound pressure level would cause a loudness level of only 72 phons at 100 Hz.

In the region of lower loudness levels the curves rise more steeply with decreasing frequency. Here the sound pressure level differences are even larger between tones of equal loudness sensation: For a loudness level of 30 phons a sound pressure level of 30 dB at 1,000 Hz contrasts with a value of almost 50 dB at 100 Hz.

This differentially steep slope of the curves at low frequencies leads to the fact that the lines of equal loudness level come closer together, and thus a sound pressure level change here is responsible for a larger loudness level change than at higher frequencies. Thus a rise of the sound pressure level from 60 to 90 dB at 50 Hz corresponds to an increase in loudness level from about 25 to 70 phons, whereas in contrast, at 1,000 Hz, this would correspond to an increase from 60 to 90 phons; the perceived dynamic difference accordingly would be 45 phons at 50 Hz, at 1,000 Hz, however, only 30 phons.

The relationship of phon values to the corresponding sound pressure level in the equal loudness curves is strictly valid only for tones of long duration. When the duration of a tone impulse is less than approximately 250 ms (1 ms = 1/1,000 s) then the perceived loudness level is somewhat lower than expected from the corresponding sound pressure level. The difference in comparison to a long lasting tone is approximately 1 dB at 200 ms, and increases to a value of about 2.5 dB at 100 ms and to 7 dB at 20 ms (Zwicker, 1982). This indicates that the loudness impression for short tones or noises is not determined by the power (energy per time unit), but rather by the total acoustic energy (power times duration) (Roederer, 1977).

In contrast, a brief rise of the sound pressure level at the beginning of a tone of longer duration can increase the perceived loudness level above the level corresponding to the later sound pressure level of that tone. An increase of the sound pressure level by 3 dB during the first 50 ms raises the entire tone by approximately 1 dB, while a corresponding sound pressure level increase at a later point in time is practically not perceived (Kuwano et al., 1991). From this electronic, and thus rather abstract, sound experiment one can conclude that instrumental tones played with firm attack have a loudness edge over tones played equally loudly, but with soft or uncertain attack.

In addition to the threshold of hearing and the equal loudness curves, the diagrams of Fig. 1.1 contain two additional curves which in a sense provide an upper boundary for hearing processes. The so-called threshold of pain indicates those levels above which the hearing sensation becomes painful. At this point a safety measure goes into effect in the middle ear, which protects the inner ear from a damaging overload, so that it no longer transmits the full vibration amplitude (Reichardt, 1968).

However, even below this threshold of pain there are loudness levels which in a musical context are no longer considered to be beautiful, but rather annoying. Such an esthetically determined boundary clearly is less exactly delineated than a transition to a sense of pain. This annoyance boundary, nevertheless, gives an indication of sound pressure levels which should not be exceeded. The decline of this curve for high frequencies is noteworthy; this feature takes on particular significance when tone color is under consideration.

As mentioned earlier, the sensitivity of the ear for high frequencies diminishes with age. This refers not only to the upper frequency boundary, but also to the shape of the threshold of hearing from 2,000 Hz upward (Zwicker, 1982). In comparison to the ear of a 20-year-old (i.e., normal hearing), the healthy ear of a 40-year-old is approximately 8 dB less sensitive at 5,000 Hz; at age 60 the difference increases to 15 dB. For tones of 10,000 Hz the loss of sensitivity for a 60-year-old is on average about 25 dB, so that the threshold of hearing begins to come relatively close to the annoyance limit. This explains why older individuals often feel annoyed by strong high frequency tone components in loudspeaker reproductions, since these, if they can be heard at all, already extend into the annoyance region.

When two sinusoidal tones are sounded simultaneously, one senses a difference in total loudness level depending on their frequency separation. This effect is represented in Fig. 1.2 for two sinusoidal tones, each of 60 dB sound pressure level, located symmetrically about 1,000 Hz, as a function of frequency separation. Three regions of different hearing reactions are noted (Zwicker, 1982). For small frequency separations below approximately 10 Hz an increase in loudness level of about 6 phons is noted when compared with a single tone; this corresponds to the rise in loudness level caused by doubling the sound pressure of the single tone. When one considers that two tones with small frequency separation lead to beats, i.e., rhythmic variations of the sound pressure level, one can conclude that the hearing process is oriented toward the maximum values of sound pressure level rather than toward the time average value.

Fig. 1.2 Loudness level of two simultaneous sinusoidal tones in dependence on their frequency separation (after Zwicker, 1982)

As the frequency difference between the two tones increases, the ear no longer senses the beats as direct fluctuations in time; it is then oriented toward the sound energy of both tones. The energy doubling when compared to a single tone leads to an increase in loudness level corresponding to an increase in sound pressure level of 3 dB. The ear performs this energy addition as long as both tones fall within a so-called “critical band” of the ear. The width of such critical bands is related to the structure of the inner ear (more precisely to the frequency distribution on the basilar membrane). Below 500 Hz the width of a critical band is about 100 Hz; above 500 Hz critical bands have a width corresponding to a major third, i.e., the lower and upper frequency limits of a critical band have a ratio of 4:5.
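The first two hearing regions can be summarized in a small calculation. This is a rough sketch built only on the rules of thumb in the text (beats below about 10 Hz separation, critical bands of 100 Hz below 500 Hz and of a major third above); the function names, the treatment of the lower tone as the lower band edge, and the sharp switching between the regions are simplifying assumptions.

```python
import math

BEAT_LIMIT_HZ = 10.0  # approximate separation below which beats dominate


def critical_band_upper_edge_hz(lower_edge_hz):
    """Upper limit of a critical band that starts at lower_edge_hz."""
    if lower_edge_hz < 500.0:
        return lower_edge_hz + 100.0   # constant width of about 100 Hz
    return lower_edge_hz * 5.0 / 4.0   # major third, ratio 4:5


def level_gain_db(f_low_hz, f_high_hz):
    """Equivalent level rise for two equally strong tones versus one tone.

    Returns None when the tones lie in different critical bands, since the
    ear then forms separate partial loudnesses instead of adding levels.
    """
    if f_high_hz - f_low_hz < BEAT_LIMIT_HZ:
        return 20.0 * math.log10(2.0)  # pressure doubling at the beat maxima
    if f_high_hz <= critical_band_upper_edge_hz(f_low_hz):
        return 10.0 * math.log10(2.0)  # energy doubling within one band
    return None


print(level_gain_db(998.0, 1002.0))   # beating pair, about 6 dB
print(level_gain_db(950.0, 1050.0))   # same critical band, about 3 dB
print(level_gain_db(500.0, 2000.0))   # widely separated, None
```

Real hearing shows gradual transitions between these regions, as the curve in Fig. 1.2 indicates, so the hard thresholds here serve only to make the arithmetic explicit.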

While the energy of all tones in a critical band is combined by the ear, a separate loudness level impression is formed for tones differing in frequency by more than a critical band (one also speaks of partial loudness), and these individual partial loudness levels are combined to a total loudness level. For large frequency separation this leads to a doubling of loudness when two equally loud tones are sounded. The sensation that something is twice as loud corresponds to a loudness level increase of 10 phons, which in the region of 1,000 Hz is a sound pressure level rise of 10 dB.
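The rule that a doubling of loudness corresponds to 10 phons can be turned into a simple conversion. The sketch below extends that rule to arbitrary loudness ratios, which is an assumption going beyond the text (it corresponds to the conventional behavior at medium and high loudness levels); the function names are illustrative.

```python
import math

PHONS_PER_DOUBLING = 10.0  # from the text: twice as loud = +10 phons


def loudness_ratio(phon_difference):
    """Perceived loudness ratio for a given loudness level difference."""
    return 2.0 ** (phon_difference / PHONS_PER_DOUBLING)


def phon_difference(ratio):
    """Loudness level difference in phons for a given loudness ratio."""
    return PHONS_PER_DOUBLING * math.log2(ratio)


print(loudness_ratio(10.0))   # two equally loud, widely separated tones
print(loudness_ratio(30.0))   # e.g. the 30 phon step discussed earlier
print(phon_difference(4.0))   # four times as loud
```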

Aside from the age dependent decrease in hearing sensitivity, illness or exposure to high level sounds can be responsible for influencing hearing ability to a greater or lesser degree. In this context it should be mentioned that the high loudness levels which occur in an orchestra can on occasion lead to a temporary or even permanent shift of the threshold of hearing curve toward higher levels, i.e., to reduced sensitivity; frequencies in the region of 4,000 Hz are of particular concern. Depending on instrument positioning, often only one ear is affected: for example, the left ear for violinists and trombone players, the right ear for piccolo players (Frei, 1979).

From a purely statistical standpoint, however, there appears to be no particularly increased risk based on the high sound levels in an orchestra. Apparently symphonic music has a different effect on the ear than noise, where the emotional attitude relative to the loudness level associated with music plays a role (Karlsson et al., 1983). The mere fact that musicians with a demonstrably impaired threshold of hearing are still valued members of an ensemble shows that measurements of hearing thresholds do not form a unique qualification criterion for musicians. On the one hand musicians are able to compensate for age related limitations by experience; on the other hand it seems reasonable that hearing ability for medium and higher loudness levels remains normal, even if the hearing threshold has deteriorated (Woolford & Carterette, 1989).

1.2.2 Masking

The threshold of hearing and the equal loudness curves of Fig. 1.1 are only valid for single sinusoidal tones which reach the head of the listener from the front under otherwise perfectly quiet conditions. If, in contrast, two tones are sounded simultaneously, it can happen that, by reason of the loudness level of one of the tones, the other is no longer audible, in spite of the fact that it has a sound pressure level which exceeds the threshold of hearing for a single tone. In this situation the softer tone is masked by the louder one.

As an example for the masking effect of a disturbing tone of 1,000 Hz, Fig. 1.3 shows the so-called masking thresholds. These are the thresholds which must be exceeded by the softer test-tone to become audible. In the diagram, the frequency of the test-tone is plotted from left to right, and the test-tone sound pressure level increases to the top. Above the schematically simplified threshold of hearing curve, four lines are plotted. They indicate the required sound pressure level of the test-tone for various levels of the masking tone. The fact that the curves are interrupted at 1,000 Hz, as well as partially at 2,000 and 3,000 Hz, results from the fact that beats, or difference tones between test-tone and masking tone, render an exact measurement impossible.

Fig. 1.3 Masking threshold of a test-tone which is masked by a masking-tone of 1,000 Hz with a level of Ls (after Zwicker, 1982)

As noted, the masking effect is most strongly pronounced in the neighborhood of the frequency of the masking tone. Thus, for example, for a masking tone level of 90 dB, a test-tone of 1,200 Hz must exhibit a sound pressure level of at least 60 dB to become audible. It is particularly characteristic that the curves below the frequency of the masking tone drop off with a rather steep slope, while the slope above that frequency is significantly less, and for larger masking intensity a second maximum is noted at the octave of the masking frequency. This means that, by reason of the masking effect, predominantly high frequency contributions are diminished or even rendered inaudible by low tones in the context of hearing impressions. As is furthermore seen from the curves, the masking effect increases with increasing loudness, with strong dependence on the relation to the frequency region under consideration. This is one of the reasons why polyphonic music sounds more transparent when played softly, rather than at larger loudness levels (Lottermoser and Meyer, 1958).

The masking effect is not limited in time to the duration of the masking tone or sound. Inasmuch as the ear requires a certain recovery phase to regain its “undisturbed” sensitivity, a so-called post-masking effect is observed: after the cessation of the masking tone, the masking threshold remains initially unchanged for several milliseconds, and then goes over into an almost linear decline, reaching the original threshold after about 200 ms. These time indications are practically independent of the strength of the masking sound (Zwicker, 1982). For dynamic and temporally highly structured music, this effect can occasionally be significant. In most cases, however, it is insignificant because of the decay of the instrumental sound and the reverberation in the hall.

Since the ear processes softer sounds somewhat more slowly than louder sounds, pre-masking is also possible, prior to the onset of the masking sound. This is, however, limited to a time frame of less than 20 ms. In a musical context such pre-masking can occasionally become significant when short attack noises, which precede the tone, are made inaudible or are at least weakened. The point of attack perceived by the ear is also of interest in connection with this pre-masking effect. This is the point which is relevant for determining a rhythmic structure in a tone sequence, in other words, the instant at which the ear senses the sound level to be already very close to the final value. This is illustrated by the fact that the perceived point of tone entrance lies about 10 dB below the final sound level, provided the final level lies 40 dB above the threshold of hearing, or the masking threshold (in the presence of pre-existing noise), and this is relatively independent of the speed of the staccato attack. For very soft tones the point of attack can move as close as 7 dB to the final sound pressure level, i.e., it is sensed even later. For very loud tones, the tone entrance is already perceived at a sound pressure level of 15 dB below the final value (Vos and Rasch, 1981).

Naturally the masking effect not only shifts hearing thresholds, but is also noticeable when several audible tones influence each other in their relative loudness levels. In this context the apparent weakening of the higher tones by the lower ones is most significant, whereas masking in the other direction is relatively minor. This partial masking of loudness is particularly pronounced between sound components close in frequency. For example, the loudness impression of a 1,000 Hz tone with a sound pressure level of 60 dB in the presence of 30 dB rush noise corresponds to the loudness level of a 50 dB 1,000 Hz tone without the rush noise. A rush noise level of 40 dB would suffice to render the 1,000 Hz 60 dB tone inaudible (Zwicker, 1982).

Nevertheless, partial masking is also experienced for widely separated frequencies. This phenomenon is represented in Fig. 1.4 for the interaction of two sinusoidal tones of 250 Hz (at a level of L1) and 500 Hz (at level L2). In the left portion of the diagram L2 is constant (83 dB) and the level L1 of the lower tone rises from 43 to 83 dB. In the right portion L1 remains constant and L2 drops to 63 dB as indicated by the lines. The shaded strips indicate the levels required for a tone sounded alone in order to be perceived with the same loudness as the corresponding tone when sounded concurrently with the other one. Inasmuch as the subjective impressions vary somewhat for different persons, the perceived levels L1E and L2E are represented as bands. The distance of these bands for L1E and L2E from the associated curves L1 and L2, which represent the objectively present levels, shows by how much the two tones are perceived as softer when sounded simultaneously. The graph makes clear that the higher tone appears to be weakened by 5 dB when objectively it is 15 dB stronger than the lower tone, where it is still heard with almost its original loudness. When both tones are sounded at the same level, however, they mask each other equally: on average they are sensed as approximately 6 dB softer than when sounded alone. Finally, if the level of the higher tone drops by more than 5 dB below the lower one, the upper tone is masked in preference.

Fig. 1.4 Mutual masking of two sinusoidal tones of 250 Hz (level L1) and 500 Hz (level L2) sounded simultaneously. The shaded region indicates the perceived level

1.2.3 Directional Characteristics

While the eye subtends only a limited angular region in its field of view, the ear senses sound events from all directions. There is, however, a certain dependence of the perceived loudness on the direction of sound incidence. This is mostly due to the fact that the particular ear, which is turned away from the sound source, is shaded by the head. In addition there is the contribution of the shape of the external ear and the ear canal which influence the sound pressure level formation directly in front of the ear drum. The sound pressure level at this location ultimately determines the impression of loudness.

The differing sensitivity of the ear for various directions of sound incidence is designated as the directional characteristic. For frequencies below 300 Hz there is no directional dependence of loudness impressions for the individual ear; for higher frequencies, however, there is a clear preference for those directions from which the sound impinges on the ear without shadowing (Schirmer, 1963). In the normal case of binaural hearing a directional characteristic for loudness perception is present, which corresponds to the addition of the sensitivities of both ears (Jahn, 1963). Although a certain compensation due to the cooperation of the two ears is present, a typical directional dependence of the loudness impression is nevertheless evident at higher frequencies.

Several examples for the directional characteristic in a horizontal plane for binaural hearing are given in Fig. 1.5. It is noted from the diagrams that the 1 kHz sound is perceived as loudest when it arrives from the viewing direction. If it impinges on the listener from behind, it appears – in comparison to the same level in a free field – about 5 dB weaker. In contrast, the direction of greatest sensitivity at 4.5 kHz is from the side, the sound is perceived as approximately 3 dB stronger than if it comes from the direction of view or from behind. For frequencies around 4.5 kHz the angular region of greatest sensitivity is shifted somewhat more forward, while sound arriving from behind is noticed only very weakly. By 8 kHz the preferential hearing directions have again become oriented predominantly sideways.

Fig. 1.5

Directional characteristics of the ear (binaural hearing) for various frequencies (after Jahn, 1963)

The directional characteristics compare the sensitivity of the ears for sound incidence from a certain direction with that in the direction of view. For practical applications, however, another case is of interest: In enclosed rooms, generally a diffuse sound field is formed, i.e., sound impinges on the listener from all directions. In such a diffuse sound field the sensitivity of the ear corresponds to an integral over all spatial directions, and departs therefore from the value for frontal incidence. This difference is shown in Fig. 1.6, where positive values indicate that a sound incident from the front is perceived as louder than the same level in a diffuse sound field. This is particularly the case between 2,000 and 4,000 Hz, while sound contributions between 300 and 1,500 Hz as well as above 5,000 Hz have the effect of appearing louder for uniform sound incidence from all sides.

Fig. 1.6

Difference between sound pressure levels in a diffuse field, and for frontally incident plane waves for the case of equal loudness perception (after Zwicker, 1982)

1.2.4 Directional Hearing

When sound impinges on the head of the listener from a somewhat sideways direction, rather than directly from the front, it arrives at the two ears at slightly different times, since the path to the ear turned away from the source is slightly longer. This time difference is evaluated by the nervous system to determine the direction of incidence of the sound. In this context, the extremely short time span of 0.03 ms is sufficient to evoke a sensation of directional change; this corresponds to a sound path difference between the ears of about 1 cm. Thus, the ears are able to detect a departure of only 3° from the frontal sound incidence direction (Reichardt, 1968).
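The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustrative model, not the method of the cited study: the speed of sound (343 m/s) and the effective ear spacing (0.20 m) are assumed values that do not appear in the text, and the relation "path difference = spacing × sin(angle)" is a simple far-field approximation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees Celsius
EAR_SPACING_M = 0.20        # assumed effective baseline between the ears


def path_difference_m(delay_s: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Path-length difference corresponding to an arrival-time difference."""
    return c * delay_s


def incidence_angle_deg(delay_s: float,
                        spacing_m: float = EAR_SPACING_M,
                        c: float = SPEED_OF_SOUND_M_S) -> float:
    """Angle from the frontal direction, assuming
    path difference = spacing * sin(angle)."""
    ratio = path_difference_m(delay_s, c) / spacing_m
    if abs(ratio) > 1.0:
        raise ValueError("delay too large for the assumed ear spacing")
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    dt = 0.03e-3  # 0.03 ms, the time span quoted in the text
    print(f"path difference: {path_difference_m(dt) * 100:.2f} cm")
    print(f"angle from front: {incidence_angle_deg(dt):.1f} degrees")
```

Under these assumptions the 0.03 ms delay gives a path difference of roughly 1 cm and an angle close to 3°, in line with the values quoted above; a different assumed ear spacing shifts the angle accordingly.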

For incidence from the side, the orientational resolution achieved in this way would drop to 7.5°. Furthermore, confusion with sound incidence from the rear would be possible if additional information based on directional characteristics of the ear were not available for these angular regions. For sound incidence from the side, the two ears do not receive the same sound pressure level; rather, a difference results which is typical for a particular angle of incidence. This phenomenon plays a role especially at high frequencies, and thereby brings about a certain relationship between tone color changes and directional changes. This cooperation between running time difference, intensity variations and tone color changes gives the ears a resolution for sound incidence directions which permits differentiation in steps of 3° within ±45° of the frontal direction, and in steps of 4.5° in the region of 45–90°. It should also be mentioned that the ear reacts rather rapidly to a change between two sound sources located in different directions. A jump from left to right (or right to left) is noticed in about 150 ms, a change from front to rear in less than 250 ms (Blauert, 1970). These time intervals correspond to the duration of very short notes.

When several equal sound signals arrive at the listener simultaneously, as can be the case with the use of loudspeakers for example, then only one sound source located between the original sources is perceived. For two speakers of equal intensity a median impression results. By changing the intensities, the apparent source can be shifted between the speakers. A comparable shift in location can be simulated by changing the running time difference, which, however, must not exceed 3 ms (Hoeg and Steinke, 1972).

For running time differences of more than 3 ms, the sound source is located solely by the direction of the wave front arriving first, even if the following sound signal is stronger than the primary arriving signal. For running time differences between 5 and 30 ms, this level difference can amount to up to 10 dB without influencing the localization of the sound source in relation to the direction of incidence from the first source (Haas, 1951). This phenomenon is labeled as the precedence effect.
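The delay ranges quoted in the last two paragraphs can be collected into a small lookup, sketched here in Python. The function names and return labels are hypothetical, and the text gives no level figure for delays between 3 and 5 ms or above 30 ms, so the sketch returns `None` there rather than guessing.

```python
from typing import Optional

APPARENT_SOURCE_LIMIT_MS = 3.0      # from the text (Hoeg and Steinke, 1972)
HAAS_WINDOW_MS = (5.0, 30.0)        # from the text (Haas, 1951)
HAAS_MAX_LEVEL_ADVANTAGE_DB = 10.0  # from the text (Haas, 1951)


def localization_regime(delay_ms: float) -> str:
    """Classify how two equal signals with the given delay are localized."""
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms <= APPARENT_SOURCE_LIMIT_MS:
        return "apparent source between the sources"
    return "located by first wave front"


def tolerated_lag_advantage_db(delay_ms: float) -> Optional[float]:
    """Level by which the later signal may exceed the first one without
    changing the localization; None where the text states no value."""
    low, high = HAAS_WINDOW_MS
    if low <= delay_ms <= high:
        return HAAS_MAX_LEVEL_ADVANTAGE_DB
    return None


if __name__ == "__main__":
    for delay in (1.0, 4.0, 15.0, 40.0):
        print(delay, localization_regime(delay), tolerated_lag_advantage_db(delay))
```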

Finally, the spectral composition of the sound signal received from the source can influence the directional impression. Among other things, this effect makes it possible to distinguish between the incidence directions “front”, “back” and “up” in the median plane, i.e., the symmetry plane of the head. Frequency contributions below 600 Hz as well as those from about 3,000 to 6,000 Hz support the direction “front”; components between 800 and 1,800 Hz as well as above 10,000 Hz support the direction “back”; and components around 8,000 Hz support the direction “up” (Blauert, 1974).
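As a rough illustration, the frequency bands just listed can be written as a lookup table in Python. The band edges for “front” and “back” are those quoted above; the width of the “up” band is an assumption (the text only says “around 8,000 Hz”), and the function name is hypothetical.

```python
from typing import Optional

# Band edges in Hz as quoted in the text (after Blauert, 1974).
FRONT_BANDS = [(0.0, 600.0), (3000.0, 6000.0)]
BACK_BANDS = [(800.0, 1800.0), (10000.0, float("inf"))]
UP_CENTER_HZ = 8000.0
UP_HALF_WIDTH_HZ = 1000.0  # assumed: the text gives no band width for "up"


def supported_direction(freq_hz: float) -> Optional[str]:
    """Median-plane direction supported by a frequency component,
    or None if the frequency falls outside the bands listed."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if any(low <= freq_hz <= high for low, high in FRONT_BANDS):
        return "front"
    if any(low <= freq_hz <= high for low, high in BACK_BANDS):
        return "back"
    if abs(freq_hz - UP_CENTER_HZ) <= UP_HALF_WIDTH_HZ:
        return "up"
    return None


if __name__ == "__main__":
    for f in (400, 1000, 4000, 8000, 12000):
        print(f, supported_direction(f))
```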

1.2.5 The Cocktail Party Effect

The high directional selectivity of the hearing mechanism rests on processing two distinct sound signals which are transmitted by the ears to the brain. This selectivity enables not only recognition of the direction of incidence of a single sound source, but also facilitates differentiation between multiple sound sources, located in different directions. In this, binaural masking plays an important role. When test sound and masking sound reach the listener from different directions, the masking is not as strong as when they come from the same direction. In the past, most investigations of masking were related to monaural masking, i.e., identical sound signals at both ears, or exposure of only one ear to test- and masking sound; the results quoted in Sect. 1.2.2 fall in this category.

The effect of binaural hearing on masking is represented in Fig. 1.7. The upper curve (a) represents the usual monaural masking threshold for a (sinusoidal) test-tone which is masked by “pink noise”, i.e., a noise with strong low frequency components. In this case, the sinusoidal tone and the noise arrive from the same frontal direction. Curve b applies to the case of the noise arriving at the listener as a plane wave from the front and the test-tone from the side at an angle of 60° relative to the direction of view. The directional difference of the two sound sources lowers the masking threshold by up to 10 dB for the mid frequencies, and above 1,000 Hz the drop is still 6 dB; it thus raises the sensitivity of the ears for the test-tone. Curve c relates to the same position of the tone source, but here the noise reaches the listener as a diffuse sound field, i.e., from all sides. In this case, which is of interest, for example, for locating a sound source in an expansive hall, the sensitivity is raised especially for high frequencies (Prante et al., 1990).

Fig. 1.7

Hearing thresholds of a test-tone masked by pink noise (after Prante et al., 1990)

(a) sinusoidal tone and noise from the front

(b) sinusoidal tone 60° from the side, noise from the front

(c) sinusoidal tone 60° from the side, noise as diffuse sound field

When multiple sound sources are distributed around a listener, the hearing mechanism (inclusive of further information processing in the brain) has the capability to concentrate selectively on one of these sources and emphasize it in comparison to the others. This phenomenon is referred to as the “Cocktail Party Effect” (Theile, 1980), since such a situation is particularly typical for a large number of distributed speaking voices. It is, however, required that the sound pressure level of the sound of interest lies about 10–15 dB above the masking level determined by the masking sound. Otherwise directional location is no longer possible. Through the Cocktail Party Effect, the intelligibility threshold for speech can be enhanced by up to 9 dB for several directionally distributed masking sources in comparison to having all sources come from the same direction (Blauert, 1974).

This concentration on one of many sound sources is particularly important for musicians playing in an ensemble, and the ability for such concentration is a matter of practice. In this context it is of value if the musician can visualize the sound without hearing it. It is this visualization which stimulates the relevant brain section, so that during further information processing in the brain, the already existing stimulation pattern needs only to be compared with the pattern arising from the arriving sound (Kern, 1972).

1.2.6 Masking for the Musician

Masking effects play a particularly important role for those musicians for whom the instrument as a sound source is relatively close to the ear, and who nevertheless need to hear the sound of other instruments and also the sound reflected by the room. Singers naturally have the same problem; however, they have the additional option of controlling the voice by sensing chest vibrations (Sundberg, 1979). The sound level generated by the musician’s own instrument near the ear can reach substantial values. For a string or woodwind forte, this lies in the region of 85–95 dB; for brasses it can be an additional 10 dB. These values refer to the sound traveling directly from the instrument to the ear, without the amplification due to the surrounding room; they thus approximate a free field situation. This limitation is an advantage for subsequent room acoustics considerations. The level may not always be the same for both ears, since many instruments are held at the player’s side. Figure 1.8 shows the level difference at the ears of several instrumentalists. As can be seen, the difference increases significantly with increasing frequency, where the differential path length of sound to both sides around the head introduces additional waviness to the curves.

Fig. 1.8

Difference of sound pressure levels at the left and right ear of the player. Positive values mean that the level is higher at the left ear, negative that the value is higher at the right ear

The consequence of the directional characteristics of the ear, as previously described, and of these additional level differences is that the degree of mutual masking also depends on the direction from which extraneous sounds reach the player. The directionally dependent masking threshold varies by up to 9 dB around 250 or 500 Hz, and for frequencies above 1,000 Hz by values up to 20 dB. For the general description of this effect it is, however, necessary to average the relevant values over the entire tone scale. Of particular interest in all this is the question of which directions of incidence the ear of the player is especially sensitive to, and which it is especially insensitive to.

For the case of symmetrically held wind instruments, such as the oboe, the clarinet or the trumpet, this effect is graphically represented in Fig. 1.9. Individual directions of incidence are indicated by marks which give the sensitivity in 3 dB steps in relation to the direction of greatest sensitivity. Measurement results are given for three frequencies of the extraneous sound and are valid for the entire tone range. Additionally, it should be noted that for 500 Hz all directions fall within the 3 dB range, so there is practically no directional dependence here. At 250 Hz the influence of direction is also small. In contrast, at 1,000 Hz the median plane, except for vertically upwards, is disadvantaged. At 2,000 Hz this direction and the direction of view are most advantageous; disadvantaged are all rising directions of less than 45° as well as the angled horizontal directions to the rear.

Fig. 1.9

Directional dependence of the masking threshold for players with symmetrically held wind instruments (oboe, clarinet, trumpet) in reference to the direction of lowest masking threshold (= 0 dB)

The cello already shows a slight asymmetry. Thus, for the cellist at 250 Hz, the direction left upwards is the only one outside the 3 dB region. For 1,000 Hz the direction vertical from above and angled behind are less sensitive (3–6 dB). On the other hand, at 2,000 Hz the direction left upwards is less sensitive than right upwards, the directions angled front and behind are likewise less sensitive than for the wind instruments of Fig. 1.9. The asymmetry is more strongly pronounced in the trombone, for which the 1,000 Hz sound contributions coming from the right, and in contrast the 2,000 Hz contributions from the left, are disadvantaged.

For the violin and the viola the uneven exposure of the players’ ears to sound is naturally pronounced, yet even here, at 500 Hz, all directions remain within the 3 dB limits. For three other frequencies the relationships are represented in Fig. 1.10. Here, at 1,000 Hz, as at 2,000 Hz, several directions stand out for which the masking threshold lies more than 10 dB higher than for the optimal direction (Meyer and Biassoni de Serra, 1980). As a whole, a certain similarity with the pictures of Fig. 1.9 is recognizable, at least in the directions going up. It is noteworthy that the directional dependence of the masking threshold is practically unchanged when the player has a threshold of hearing elevated by about 10 dB, i.e., when the player has a slight deterioration of hearing ability, as is not uncommon for violinists.

Fig. 1.10

Directional dependence of the masking threshold for a violin player, in reference to the direction of lowest masking threshold (= 0 dB)

For horn players as well, a distinct asymmetry exists when it comes to sensitivity to extraneous sound. True, at 250 Hz, the masking threshold is practically direction independent, however, at 500 Hz, the left side, namely the side away from the instrument, is less sensitive. For 1,000 Hz the directions left and right are the most favorable, while front and back are particularly insensitive. At 2,000 Hz there is a relatively low sensitivity for the side turned toward the instrument, particularly in the direction upwards and to the right, for which the masking threshold is raised by more than 10 dB in comparison to the direction of view as the most favorable direction (Meyer and Biassoni de Serra, 1980).

1.2.7 Sensitivity to Changes in Frequency and Sound Pressure Level

Periodic changes in frequency or amplitude of a tone are perceived differently by the ear, depending on how fast they occur. When the number of fluctuations per second remains below 5, the change in pitch or loudness, respectively, can be followed in its time sequence (Winckel, 1960). If a vibrato is carried out that slowly, it can therefore easily become a whine. From 6 Hz upwards, in contrast, one senses a uniform pitch or loudness, which is associated with an internal motion. The perceived pitch corresponds quite accurately to the central pitch about which vibrato fluctuates (Meyer, 1979); the loudness impression, on the other hand, is oriented toward the maximum level of the fluctuating sound (see Fig. 1.2). If the fluctuations occur more rapidly, the impression is given (approximately between 10 and 15 Hz) of a tonal roughness. The strength of this effect depends on the frequency position of the modulated tones, the fluctuation frequency, and also on the relative strength of the amplitude fluctuation; however, it is practically independent of the absolute sound pressure level (Terhardt, 1973, 1974).

The upper limit for fluctuation frequencies which produce roughness for low tones (below 400 Hz, i.e., below G4) lies at about 100 Hz and rises to a value of 250 Hz for very high frequencies. The impression of roughness for tones around 100 Hz (G2) is largest if the temporal fluctuations occur with a frequency of about 20 Hz. With rising tone frequency, the maximal roughness producing frequency increases, and for 3,000 Hz it reaches a frequency of 100 Hz. Considering that these temporal variations are caused not only by the superpositioning of several tones but also by the overtones of the very same tones, it follows that low tones can become rough if they are too rich in overtones. The phase position of the partials is relevant in this context: if the tone contains sharp impulses, as for example in brass instruments, the roughness is strong. In contrast, the roughness of a tone is weak, when individual partials stand out in relation to neighboring partials, particularly if in each critical band (see Sect. 1.2.1) one partial clearly dominates over all other partials, as for example in the plenum sound of the organ.

Naturally, all frequency or amplitude fluctuations must have a certain magnitude to become audible at all. For the case of frequency fluctuations of pure sinusoidal tones, three curves are plotted in Fig. 1.11, which show the smallest perceptible frequency rise in dependence on vibrato frequency. Two things are noted from these diagrams. First, the ear is most sensitive to pitch fluctuations between 2 and 5 Hz, where all three curves exhibit a minimum; if the frequency fluctuation is more rapid or slower, it is not as strongly noticeable. Second, the sensitivity of the ear increases with increasing frequency of the vibrating tone: in the region between 1,000 and 2,000 Hz variations of ±5 cents are sufficient, whereas at 200 Hz variations of about three times that strength are required (100 cents correspond to one half step in a tempered scale). These sensitivity limits are of interest in connection with the strength of a vibrato, for example.
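To give a feeling for how small these deviations are in hertz, the cent values can be converted with the standard relation of 1,200 cents per octave (which follows from 100 cents per tempered half step). The following Python sketch is illustrative only; the function names are hypothetical, and “about three times” is taken as exactly 15 cents for the 200 Hz case.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio corresponding to an interval in cents
    (1,200 cents per octave)."""
    return 2.0 ** (cents / 1200.0)


def deviation_hz(freq_hz: float, cents: float) -> float:
    """Upward frequency shift in Hz when a tone is raised by `cents`."""
    return freq_hz * (cents_to_ratio(cents) - 1.0)


if __name__ == "__main__":
    # Values from the text: 5 cents at 1,000 Hz, about 15 cents at 200 Hz.
    print(f"1000 Hz, 5 cents:  {deviation_hz(1000.0, 5.0):.2f} Hz")
    print(f" 200 Hz, 15 cents: {deviation_hz(200.0, 15.0):.2f} Hz")
```

With these inputs the just-noticeable rise comes out at roughly 2.9 Hz for the 1,000 Hz tone and roughly 1.7 Hz for the 200 Hz tone.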

Fig. 1.11

Smallest noticeable frequency rise for tones of different frequencies in dependence on vibrato frequency (recalculated after Zwicker, 1982)

For periodic changes in sound pressure level, the greatest sensitivity likewise lies in the region around 4 Hz. The level fluctuation required for audibility drops with increasing loudness. While in the neighborhood of the threshold of hearing fluctuations first become audible at 4 dB, at sound pressure levels of 80 dB fluctuations of approximately 0.3 dB are already sufficient (Reichardt, 1968). This sensitivity to amplitude fluctuations, however, depends somewhat on frequency. In the region of 1,000 Hz the sensitivity is even greater than the 0.3 dB figure suggests, whereas for low notes below 200 Hz it is lower. For the dynamic range available in musical practice, one can deduce from this sensitivity to sound pressure variations that the ear can differentiate approximately 130–140 loudness steps (Winckel, 1960).
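The level differences quoted here can be translated into sound pressure ratios using the usual definition of the sound pressure level, in which a difference of ΔL dB corresponds to a pressure ratio of 10^(ΔL/20). The short Python sketch below is only an illustration of that conversion; the function names are hypothetical.

```python
def level_difference_to_pressure_ratio(delta_db: float) -> float:
    """Sound pressure ratio corresponding to a level difference in dB."""
    return 10.0 ** (delta_db / 20.0)


def pressure_change_percent(delta_db: float) -> float:
    """Relative change in sound pressure, in percent, for a level step."""
    return (level_difference_to_pressure_ratio(delta_db) - 1.0) * 100.0


if __name__ == "__main__":
    # Values from the text: 4 dB near the threshold of hearing, 0.3 dB at 80 dB.
    for step_db in (4.0, 0.3):
        print(f"{step_db} dB -> {pressure_change_percent(step_db):.1f} % pressure change")
```

A 0.3 dB fluctuation thus corresponds to a change in sound pressure of only about 3.5 %, whereas the 4 dB step near the threshold of hearing amounts to nearly 60 %.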