In the first half of the 20th century, studies were mostly concentrated on temporal aspects of the sound field. In 1900 Sabine [10.2] initiated the science of architectural acoustics, developing a formula to quantify reverberation time. In 1949, Haas [10.3] investigated the echo disturbance effect from adjustment of the delay time of the early reflection by moving head positions on a magnetic tape recorder. He showed the disturbance of speech echo to be a function of the delay time, with amplitude as a parameter.

After investigations with a number of existing concert halls throughout the world, Beranek in 1962 [10.4] proposed a rating scale with eight factors applied to sound fields from data obtained by questionnaires on existing halls given to experienced listeners. Too much attention, however, has been given to monaural temporal factors of the sound field since Sabineʼs formulation of reverberation theory. For example, the Philharmonic Hall of Lincoln Center in New York, opened in 1962, was not satisfactory to listeners even after many improvements.

On the spatial aspect of the sound field, Damaske in 1967 [10.5] investigated subjective diffuseness by arranging a number of loudspeakers around the listener. In 1968, Keet [10.6] reported the variation of apparent source width (ASW) in relation to the short-term cross-correlation coefficient between two signals fed into the stereo loudspeakers as well as the sound pressure level. Marshall in 1968 [10.7] stressed the importance of early lateral reflections of just 90°, and Barron in 1971 [10.8] reported spatial impressions or envelopment of sound fields in relation to the short-term interaural cross-correlation coefficient.

As a typical spatial factor of the sound field, Damaske and Ando in 1972 [10.9] defined the IACC as the maximum absolute value of the interaural cross-correlation function (IACF) within the possible maximum interaural delay range for the human head, such that

$$\mathrm{IACC} = |\phi_{lr}(\tau)|_{\max}, \quad \text{for } |\tau| < 1\ \mathrm{ms}, \tag{10.1}$$

and proposed a method of calculating the IACF for a sound field. In 1974, Schroeder et al. [10.10] reported results of paired-comparison tests asking listeners which of two music sound fields was preferred. In an anechoic chamber, sound fields in existing concert halls were reproduced at the ears of listeners through dummy head recordings and two loudspeaker systems, with filters reproducing spatial information. They found that two significant factors, the reverberation time and the IACC, had a strong influence on subjective preference.
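The definition in (10.1) can be illustrated numerically. The following Python sketch is not the calculation method of [10.9]; it assumes two equal-length ear signals sampled at `fs`, normalizes the cross-correlation by the geometric mean of the two signal energies, and searches lags within ±1 ms (inclusive in this sketch). The function name `iacc` and the sign convention of the lag are choices made for this example only.

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Return (IACC, tau_IACC in ms) for two equal-length ear signals.

    Assumption of this sketch: a positive lag means the right-ear
    signal is delayed relative to the left-ear signal.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    max_lag = int(round(max_lag_ms * 1e-3 * fs))   # search range in samples
    # Normalize by the geometric mean of the two signal energies.
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    full = np.correlate(right, left, mode="full")  # lags -(N-1) .. (N-1)
    zero = len(left) - 1                           # index of lag 0
    lags = np.arange(-max_lag, max_lag + 1)
    iacf = full[zero - max_lag: zero + max_lag + 1] / norm
    k = np.argmax(np.abs(iacf))                    # maximum absolute value
    return abs(iacf[k]), lags[k] / fs * 1e3
```

With identical signals at both ears the function returns a magnitude of 1 at zero lag; a pure delay of one channel mainly shifts the returned lag.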

In 1977, Ando discussed subjective preference in relation to the temporal and spatial factors of the sound field simulated with a single reflection [10.11]. In 1983, he described a theory of subjective preference in relation to four orthogonal factors consisting of the temporal and spatial factors for the sound field, which enable us to calculate the scale value of the subjective preference at each seat [10.12,13]. Cocchi et al. reconfirmed this theory in an existing hall in 1990 [10.14]. In 1997, Sato et al. [10.15] reconfirmed it clearly using paired-comparison judgments in an existing hall, switching between identical loudspeakers located at different positions on the stage instead of changing the seat of each subject. This method may avoid the effects of physical factors other than the acoustic factors. In addition to the orthogonal factors, they found that the interaural delay in the IACF, $\tau_{\mathrm{IACC}}$, which is a measure of the image shift of the sound source, should be kept at zero to realize a good balance in the sound field. Using this method, the dissimilarity distance was also described by the temporal and spatial factors of the sound field in 2002 [10.16].

Thus far, the theory has been based on the global subjective attributes for a number of subjects. The theory may be applied for enhancing each individualʼs satisfaction by adjusting the weighting coefficient of each orthogonal factor [10.1], even though a certain amount of individual differences exist [10.17]. The seat selection system [10.18] was introduced in 1994 after construction of the Kirishima International Concert Hall.

For the purpose of identifying a model of the auditory–brain system, experiments point toward the possibility of establishing correlations with brain activity measurable by electroencephalography (EEG) [10.19,20]. Correspondences between subjective preference and brain activity have been found from EEG and magnetoencephalography (MEG) studies in human subjects [10.21,22], but details of these results are not included in this chapter due to limited space. Results show that the orthogonal factors comprise two temporal factors, the initial time delay gap between the direct sound and the first reflection ($\Delta t_1$) and the subsequent reverberation time ($T_{\mathrm{sub}}$), which are associated with the left hemisphere, and two spatial factors, the listening level (LL) and the magnitude of the IACF (IACC), which are associated with the right hemisphere. The information corresponding to subjective preference of the sound field can be found in the effective duration of the autocorrelation function (ACF) of the alpha ($\alpha$) waves of both EEG and MEG. A repetitive feature in the $\alpha$-wave, as measured in its ACF at the preferred condition, has been found. This evidence indicates that the basic theory of subjective preference may also be applied to each individual preference [10.21].

Since individual differences of subjective preference in relation to the IACC in the spatial factor are small enough, we can determine the architectural space form of the room at the first stage of acoustic design. The temporal factors are closely related to the dimensions of a specific concert hall, which can be altered to suit specific types of music, such as organ music, chamber music or choral works.

On the neural mechanism in the auditory pathway, a possible mechanism for the interaural time difference and correlation processors in the time domain was proposed by Jeffress in 1948 [10.23], and by Licklider in 1951 [10.24]. By recording left and right auditory brainstem responses (ABR), Ando et al. in 1991 [10.25] revealed that the maximum neural activity (wave V at the inferior colliculus) corresponds well to the magnitude of the interaural cross-correlation function. Also, the left and right waves IV$_{l,r}$ correspond closely to the sound energies at the right- and left-ear entrances. In fact, as reported by Secker-Walker and Searle in 1990 [10.26], time-domain analysis of the firing rate of the auditory nerve of a cat reveals a pattern of the ACF rather than that of a frequency-domain analysis. Cariani and Delgutte in 1996 showed that pooled inter-spike interval distributions resemble the short-time or running autocorrelation function (ACF) for the low-frequency component. In addition, pooled interval distributions for sound stimuli consisting of the high-frequency component resemble the envelope of the running ACF [10.27,28].

Remarkably, primary sensations such as the pitch of the missing fundamental [10.29], loudness [10.30], and duration sensation [10.31] can be well described by the temporal factors extracted from the ACF [10.32,33]. Timbre investigated by the dissimilarity judgment [10.16] of the sound field has been described by both the temporal and spatial factors. The typical spatial attributes of the sound field, such as subjective diffuseness [10.34] and apparent source width (ASW), as well as subjective preference, are well described by the spatial factor [10.32,33,34,35,36].

Besides the design of concert halls, other acoustical applications such as speech identification [10.36], environmental noise measurement [10.37], and sound localization in the median plane [10.38] should benefit from guidelines derived from this model.

Theory of Subjective Preference for the Sound Field

Subjective preference judgment is the most primitive response in any subjective attribute and entails judgments that steer an organism in the direction of maintaining and/or animating life. Subjective preference, therefore, may relate to an aesthetic issue. It is known that judgment in an absolute manner presents a problem in reliability; rather, data is judged in a relative manner such as by paired-comparison tests. This is the simplest method, in that any person may participate, and the resulting scale value may be utilized in a wide range of applications. From the results of subjective preference studies in relation to the temporal factor and the spatial factor of the sound field, the theory of subjective preference is derived. Examples of calculating subjective preference at each listenerʼs position are demonstrated in Sect. 10.2 for the global listener and Sect. 10.3 for the individual listener. The relationship between the resulting scale value of subjective preference in an existing hall and the physical factors obtained by calculation using architectural plan drawings is examined by factor analysis in Sect. 10.4.

Sound Fields with a Single Reflection

Preferred Delay Time of a Single Reflection

First of all, the simplest sound field was investigated, which consists of the direct sound arriving at the listener from the horizontal angle $\xi_0 = 0^\circ$ (elevation angle $\eta_0 = 0^\circ$) and a single reflection from a fixed direction $\xi_1 = 36^\circ$ ($\eta_1 = 9^\circ$). These angles were selected since they are typical in a concert hall. The delay time $\Delta t_1$ of the reflection was adjusted in the range of 6–256 ms. The paired-comparison test was performed for all pairs in an anechoic chamber using normal-hearing subjects with two different music motifs A and B (Table 10.1). The effective duration of the ACF of the sound source, $\tau_e$, is defined as the delay at which the envelope of the normalized autocorrelation function (ACF) becomes 0.1 (10-percentile delay), as shown in Fig. 10.1. The value of $(\tau_e)_{\min}$ indicated in Table 10.1 is the minimum value obtained from the running ACF with $2T = 2$ s and an interval of 100 ms. The recommended $2T$ is given by (10.14). As far as these source signals are concerned, values of $\tau_e$ extracted from the long-term ACF were similar to the minimum values from the running ACF, $(\tau_e)_{\min}$.

Table 10.1 Music and speech source signals and their minimum effective duration of the running ACF, $(\tau_e)_{\min}$

Fig. 10.1 Determination of the effective duration of the running ACF, $\tau_e$. The value of $\tau_e$ may be obtained as the delay at which the envelope of the normalized ACF becomes 0.1 or −10 dB (10-percentile delay)
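As an illustration of the −10 dB criterion in Fig. 10.1, the sketch below estimates $\tau_e$ from a normalized ACF. The text does not specify how the envelope is extracted, so the choices here are assumptions of this example: local maxima of $|\mathrm{ACF}|$ stand in for the envelope, a straight line is fitted to the initial decay in dB (down to a hypothetical `fit_floor_db` of −5 dB), and the line is extrapolated to −10 dB. The function names are illustrative.

```python
import numpy as np

def normalized_acf(x, max_lag):
    """Normalized autocorrelation of x for lags 0..max_lag (in samples)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x, 2 * n)      # zero-padding avoids circular wrap-around
    acf = np.fft.irfft(np.abs(spectrum) ** 2)[: max_lag + 1]
    return acf / acf[0]

def effective_duration(acf, fs, fit_floor_db=-5.0):
    """Estimate tau_e (seconds): delay where the fitted ACF envelope reaches -10 dB."""
    mag = np.abs(np.asarray(acf, dtype=float))
    # Local maxima of |ACF| stand in for the envelope; lag 0 is always included.
    peaks = [0] + [i for i in range(1, len(mag) - 1)
                   if mag[i - 1] < mag[i] >= mag[i + 1]]
    peaks = np.array(peaks)
    level_db = 10.0 * np.log10(mag[peaks])
    # Use only the initial decay, up to the first peak below fit_floor_db.
    below = np.nonzero(level_db < fit_floor_db)[0]
    end = below[0] if below.size else len(peaks)
    if end < 2:
        raise ValueError("not enough envelope peaks to fit the initial decay")
    slope, intercept = np.polyfit(peaks[:end] / fs, level_db[:end], 1)
    return (-10.0 - intercept) / slope
```

For a running analysis, `normalized_acf` would be applied to successive windows of length $2T$ and the smallest resulting value taken as $(\tau_e)_{\min}$.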

For simplicity, the score in this section was obtained by assigning +1 and −1 to positive and negative judgments, respectively, and accumulating these values; the total score is divided by $S(F-1)$ to get the normalized score, where $S$ is the number of subjects and $F$ is the number of sound fields tested. The normalized scores for the two motifs and the percentage of preference for the speech signal as a function of the delay time are shown in Fig. 10.2.
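The scoring rule can be written out in a few lines. This is a minimal sketch; the input layout (one list of `(preferred, rejected)` index pairs per subject) and the function name are assumptions made for the example.

```python
def normalized_scores(judgments_by_subject, n_fields):
    """Normalized preference score per sound field.

    judgments_by_subject: one list per subject of (preferred, rejected)
    sound-field indices, one entry per presented pair.
    """
    totals = [0] * n_fields
    for pairs in judgments_by_subject:
        for preferred, rejected in pairs:
            totals[preferred] += 1   # positive judgment
            totals[rejected] -= 1    # negative judgment
    n_subjects = len(judgments_by_subject)
    # Divide by S(F - 1) so that a unanimously preferred field scores +1.
    return [t / (n_subjects * (n_fields - 1)) for t in totals]
```

A field preferred in every pair by every subject scores +1, and one rejected in every pair scores −1.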

Fig. 10.2 Preference scores of the sound fields as a function of the delay time, $A_1 = 1.0$ (six sound fields and 13 subjects) (after [10.11]). Preference scores are directly obtained for each pair of sound fields by assigning the values +1 and −1, corresponding to positive and negative judgments, respectively. The normalized score is obtained by accumulating the scores for all sound fields ($F$) tested and all subjects ($S$), and then dividing by the factor $S(F-1)$ for 13 subjects. A: music motif A, Royal Pavane by Gibbons, $(\tau_e)_{\min} = 125$ ms; B: music motif B, Sinfonietta, opus 48, III movement by Malcolm Arnold, $(\tau_e)_{\min} = 40$ ms

Obviously, the most preferred delay time with the maximum score differs greatly between the two motifs. When the amplitude of reflection $A_1 = 1$, the most preferred delays are around 130 ms for music motif A, 35 ms for music motif B (Fig. 10.2a), and 16 ms for speech [10.36]. Later, it was found that these delays correspond well to the minimum values of the effective durations of the running ACF [10.39] of the source signals, which were 125 ms (motif A), 40 ms (motif B) and 10 ms (speech). After inspection, the preferred delay is found roughly at a certain duration of the ACF, defined by $\tau_p$, such that the envelope of the ACF becomes $0.1 A_1$. Thus, $[\Delta t_1]_p \approx \tau_e$ only when $A_1 = 1$. As shown in Fig. 10.3, when the amplitude $A_1$ is changed, the collected data of $[\Delta t_1]_p$ are expressed approximately by

$$[\Delta t_1]_p = \tau_p \approx (1 - \log_{10} A_1)\,(\tau_e)_{\min}. \tag{10.2}$$

Note that the amplitude of reflection relative to that of the direct sound should be measured by the most accurate method (e.g., the square-root value of the ACF at the origin of the delay time).

Fig. 10.3 The relationship between the preferred delay time of the single reflection and the duration of the ACF such that its envelope becomes $0.1 A_1$ (for $k = 0.1$ and $c = 1.0$). The ranges of the preferred delays are graphically obtained at 0.1 below the maximum score. A, B and S refer to motif A, motif B, and speech, respectively. Different symbols indicate the center values obtained at the reflection amplitudes of +6 dB, 0 dB, and −6 dB, respectively (13–19 subjects) (after [10.11])

Two reasons can be given for why the preference decreases for the short delay range of reflection, $0 < \Delta t_1 < [\Delta t_1]_p$:

  1. Tone coloration effects occur because of the interference phenomenon in the coherent time region; and

  2. The IACC increases when $\Delta t_1$ is near 0. The definition of the IACC, which may be extracted from the IACF, as given by [10.1], is shown in Fig. 10.4.

Fig. 10.4 Definition of the three spatial factors IACC, $\tau_{\mathrm{IACC}}$ and $W_{\mathrm{IACC}}$, extracted from the interaural cross-correlation function (IACF)

On the other hand, echo disturbance effects can be observed when Δt 1 is greater than [Δt 1]p.

Preferred Horizontal Direction of a Single Reflection to a Listener

The direction was specified by loudspeakers located at $\xi_1 = 0^\circ$ ($\eta_1 = 27^\circ$), and $\xi_1 = 18^\circ, 36^\circ, \ldots, 90^\circ$ ($\eta_1 = 9^\circ$), where the delay time of the reflection was fixed at 32 ms [10.11]. Results of the preference tests for the two motifs are shown in Fig. 10.5. No fundamental differences are observed between the curves for the two sound sources in spite of the large difference in the value of $(\tau_e)_{\min}$. The preference score increases roughly with decreasing IACC, the typical spatial factor. The correlation coefficient between the score and the IACC is −0.8 (p < 0.01). The score with motif A at $\xi_1 = 90^\circ$ drops to a negative value, indicating that lateral reflections coming only from around $\xi_1 = 90^\circ$ are not always preferred. The figure shows that there is a preference for angles less than $\xi_1 = 90^\circ$, and on average there may be an optimum range centered on about $\xi_1 = 55^\circ$. Similar results can be seen in the data from speech signals [10.40]. These results are due to the spatial factor, which is independent of the temporal factor determined by the source signal.

Fig. 10.5 Preference scores and the IACC of the sound fields with extreme music motifs A and B, as a function of the horizontal angle of a single reflection, $A_1 = 0$ dB (six sound fields and 13 subjects)

Optimal Conditions Maximizing Subjective Preference

According to a systematic investigation of simulating the sound field with multiple reflections and reverberation by the aid of a computer and the listening test, the optimum design objectives and the linear scale value of subjective preference may be derived. The optimum design objectives can be described in terms of the subjectively preferred sound qualities, which are related to the four orthogonal factors describing the sound signals arriving at the two ears. They clearly lead to comprehensive criteria for achieving the optimal design of concert halls as summarized below [10.11,12,13].

Listening Level (LL)

The listening level is, of course, the primary criterion for listening to sound in a concert hall. The preferred listening level depends upon the music and the particular passage being performed. The preferred levels obtained with 16 subjects were similar for two extreme music motifs: 77–79 dBA in peak ranges for music motif A (Royal Pavane by Gibbons) with a slow tempo, and 79–80 dBA for music motif B (Sinfonietta by Arnold) with a fast tempo (Fig. 10.7a).

Fig. 10.7 Scale values of subjective preference obtained by the paired-comparison test for simulated sound fields in an anechoic chamber. Different symbols indicate scale values obtained from different source signals (after [10.12]). Even if different signals are used, a consistency of scale values as a function of each factor is observed, fitting a single curve. (a) As a function of the listening level, LL. The most preferred listening level, $[\mathrm{LL}]_p = 0$ dB. (b) As a function of the initial delay time of the first reflection normalized by the most preferred delay time calculated by (10.4), $\Delta t_1/[\Delta t_1]_p$. (c) As a function of the reverberation time normalized by the most preferred value calculated by (10.6), $T_{\mathrm{sub}}/[T_{\mathrm{sub}}]_p$. (d) As a function of the IACC

Early Reflection after the Direct Sound (Δt 1)

An approximate relationship for the most preferred delay time has been discovered in terms of the ACF envelope of source signals and the total amplitude of reflections $A$. Generally, it is expressed by $[\Delta t_1]_p = \tau_p$, such that

$$|\phi_p(\tau)|_{\mathrm{envelope}} \approx k A^c, \quad \text{at } \tau = \tau_p, \tag{10.3}$$

where k and c are constants that depend on the subjective attributes [10.1, Fig. 41]. If the envelope of ACF is exponential, then

$$[\Delta t_1]_p = \tau_p \approx [\log_{10}(1/k) - c \log_{10} A]\,(\tau_e)_{\min}, \tag{10.4}$$

where the total pressure amplitude of reflection is given by

$$A = (A_1^2 + A_2^2 + A_3^2 + \cdots)^{1/2}. \tag{10.5}$$

The relationship given by (10.2) for a single reflection may be obtained by putting A = A 1, k = 0.1 and c = 1.
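Equations (10.4) and (10.5) are simple enough to evaluate directly. The sketch below assumes reflection amplitudes given relative to the direct sound and uses the single-reflection constants $k = 0.1$ and $c = 1$ as defaults; the function names are illustrative.

```python
import math

def total_amplitude(amplitudes):
    """Total pressure amplitude of the reflections, as in (10.5)."""
    return math.sqrt(sum(a * a for a in amplitudes))

def preferred_delay(tau_e_min, amplitudes, k=0.1, c=1.0):
    """Most preferred delay from (10.4), in the same time unit as tau_e_min.

    k and c depend on the subjective attribute; the defaults reproduce
    the single-reflection case (10.2).
    """
    a_total = total_amplitude(amplitudes)
    return (math.log10(1.0 / k) - c * math.log10(a_total)) * tau_e_min

# Single reflection with A_1 = 1: the preferred delay equals (tau_e)_min.
delay_motif_a = preferred_delay(125.0, [1.0])
```

A stronger reflection ($A > 1$) shortens the preferred delay, and a weaker one lengthens it.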

The value of $(\tau_e)_{\min}$ is observed at the most active part of a piece of music containing artistic information such as a vibrato, a quick passage in the music flow, and/or a very sharp sound signal. Echo disturbance, therefore, may be perceived at $(\tau_e)_{\min}$. Even for a long musical composition, the minimum value $(\tau_e)_{\min}$ of the running ACF over the whole piece, which determines the preferred temporal condition, may be taken into consideration in choosing the music program to be performed in a given concert hall. A method of controlling the minimum value $(\tau_e)_{\min}$ in performance, which determines the preferred temporal condition for vocal music, has been discussed for blending the sound source and a given concert hall [10.41,42]. If vibrato is introduced during singing, for example, it decreases $(\tau_e)_{\min}$, blending the sound field with a short reverberation time.

Reverberation Time After the Early Reflection (T sub)

It has been observed that the most preferred frequency response to the reverberation time is a flat curve [10.39]. The preferred reverberation time, which is equivalent to that defined by Sabine [10.2], is expressed approximately by

$$[T_{\mathrm{sub}}]_p \approx 23\,(\tau_e)_{\min}. \tag{10.6}$$

The total amplitudes of reflections A tested were 1.1 and 4.1, which cover the usual conditions of sound fields in a room. Recommended reverberation times for several sound sources are shown in Fig. 10.6. A lecture and conference room must be designed for speech, and an opera house mainly for vocal music but also for orchestra music. For orchestral music, there may be two or three types of concert hall designs according to the effective duration of the ACF. For example, symphony no. 41 by Mozart, Le Sacre du Printemps by Stravinsky, and Arnoldʼs Sinfonietta have short ACFs and fit orchestra music of type (A). On the other hand, symphony no. 4 by Brahms and symphony no. 7 by Bruckner are typical of orchestra music (B). Much longer ACFs are typical for pipe-organ music, for example, by Bach.

Fig. 10.6 Recommended reverberation time for several sound programs

The most preferred reverberation times for each sound source given by (10.6) might play an important role in the selection of music motifs to be performed. Of interest is that the most preferred reverberation time expressed by (10.6) is about four times the reverberation time contained in the source signal itself.
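A short calculation shows how (10.6) scales with the source and offers one possible reading of the factor of about four. The reading is an assumption of this sketch, not a statement from the text: if the ACF envelope decays exponentially and reaches −10 dB at $\tau_e$, a 60 dB decay of the source signal itself would take about $6\tau_e$, and $23/6 \approx 3.8$.

```python
def preferred_reverberation_time(tau_e_min):
    """[T_sub]_p from (10.6), in the same time unit as tau_e_min."""
    return 23.0 * tau_e_min

def source_decay_time(tau_e_min, decay_db=60.0):
    """Assumed 'reverberation time' of the source signal itself.

    Assumption: exponential ACF envelope that is -10 dB at tau_e,
    so a decay of decay_db takes (decay_db / 10) * tau_e.
    """
    return decay_db / 10.0 * tau_e_min

# Motif A, (tau_e)_min = 0.125 s; motif B, (tau_e)_min = 0.040 s.
t_sub_a = preferred_reverberation_time(0.125)
t_sub_b = preferred_reverberation_time(0.040)
ratio = preferred_reverberation_time(1.0) / source_decay_time(1.0)
```

Under these assumptions the preferred reverberation time is roughly 2.9 s for motif A and 0.9 s for motif B.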

Magnitude of the Interaural Cross-Correlation Function (IACC)

All individual data indicated a negative correlation between the magnitude of the IACC and subjective preference, i.e., dissimilarity of signals arriving at the two ears is preferred. This holds only under the condition that the maximum value of the IACF is maintained at the origin of the time delay, keeping a balance of the sound field at the two ears. If not, then an image shift of the source may occur (Sect. 10.4.2). To obtain a small magnitude of the IACC in the most effective manner, the directions from which the early reflections arrive at the listener should be kept within a certain range of angles from the median plane centered on ±55°. It is obvious that sound arriving from the median plane (±0°) makes the IACC greater. Sound arriving from ±90° in the horizontal plane is not always advantageous, because the similar detour paths around the head to both ears cannot decrease the IACC effectively, particularly for frequency ranges higher than 500 Hz. For example, the most effective angles for the frequency ranges of 1 kHz and 2 kHz are centered on ±55° and ±36°, respectively. To realize this condition simultaneously, a geometrically uneven surface has been proposed [10.43].

Theory of Subjective Preference for the Sound Field

Theory

Since the number of orthogonal acoustic factors of the sound field, which are included in the sound signals at both ears, is limited [10.12], the scale value of any one-dimensional subjective response may be expressed by

$$S = g(x_1, x_2, \ldots, x_i). \tag{10.7}$$

It has been verified by a series of experiments that four objective factors act independently on the scale value when changing two of the four factors simultaneously [10.13]. Results indicate that the units of the scale value of subjective preference derived from a series of experiments with different sound sources and different subjects appear to be constant [10.12], so that we may add scale values to obtain the total scale value such as

$$S = g(x_1) + g(x_2) + g(x_3) + g(x_4) = S_1 + S_2 + S_3 + S_4, \tag{10.8}$$

where S i (i = 1, 2, 3, 4) is the scale value obtained relative to each objective parameter. Equation (10.8) indicates a four-dimensional continuity.

A Common Formula for the Four Normalized Orthogonal Factors

The dependence of the scale value on each objective parameter is shown graphically in Fig. 10.7. From the nature of the scale value, it is convenient to put a zero value at the most preferred conditions, as shown in this figure. These results of the scale value of subjective preference obtained from the different test series, using different music programs, yield the following common formula

$$S_i \approx -\alpha_i |x_i|^{3/2}, \quad i = 1, 2, 3, 4, \tag{10.9}$$

where values of α i are weighting coefficients as listed in Table 10.2, which were obtained with a number of subjects. These coefficients depend on the individual. If α i is close to zero, then a lesser contribution of the factor x i on subjective preference is signified.

Table 10.2 Four orthogonal factors of the sound field and their weighting coefficients α i obtained by a paired-comparison test of subjective preference with a number of subjects in conditions without any image shift of the source sound (τ IACC = 0)

The factor x 1 is given by the sound pressure level (SPL) difference, measured by the A-weighted network, so that

$$x_1 = 20 \log P - 20 \log [P]_p, \tag{10.10}$$

$P$ and $[P]_p$ being, respectively, the sound pressure at a specific seat and the most preferred sound pressure that may be assumed at a particular seat position in the room under investigation;

$$x_2 = \log(\Delta t_1 / [\Delta t_1]_p), \tag{10.11}$$

$$x_3 = \log(T_{\mathrm{sub}} / [T_{\mathrm{sub}}]_p), \tag{10.12}$$

$$x_4 = \mathrm{IACC}. \tag{10.13}$$

Scale values of preference have been formulated approximately in terms of the 3/2 powers of the normalized objective parameters, expressed in the logarithm for the parameters, x 1, x 2 and x 3. Thus, scale values are not greatly changed in the neighborhood of the most preferred conditions, but decrease rapidly outside of this range. The remarkable fact is that the spatial binaural parameter x 4 is expressed in terms of the 3/2 powers of its real values, indicating a greater contribution than those of the temporal parameters.
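Equations (10.8) through (10.13) can be combined into a short calculation. The sketch below makes two assumptions that are not stated in this section: the logarithms in (10.11) and (10.12) are taken to base 10, and the weighting coefficients are passed in by the caller because the values of Table 10.2 are not reproduced here. Any numbers used for `alpha` in a call are therefore placeholders, not the published coefficients.

```python
import math

def factor_values(level_db, preferred_level_db, dt1, dt1_pref,
                  t_sub, t_sub_pref, iacc):
    """Normalized factors x_1..x_4 of (10.10) to (10.13)."""
    x1 = level_db - preferred_level_db        # 20 log P - 20 log [P]_p, in dB
    x2 = math.log10(dt1 / dt1_pref)           # base 10 is an assumption
    x3 = math.log10(t_sub / t_sub_pref)       # base 10 is an assumption
    x4 = iacc
    return x1, x2, x3, x4

def scale_value(x, alpha):
    """Total scale value S = sum of -alpha_i * |x_i|**1.5, from (10.8) and (10.9)."""
    return sum(-a * abs(xi) ** 1.5 for xi, a in zip(x, alpha))

# At the most preferred conditions with IACC = 0, every x_i is zero and S = 0.
x_best = factor_values(78.0, 78.0, 0.04, 0.04, 0.92, 0.92, 0.0)
s_best = scale_value(x_best, (1.0, 1.0, 1.0, 1.0))   # placeholder weights
```

Because each term is non-positive, any departure from the preferred conditions lowers the total scale value, most sharply for factors with a large weighting coefficient.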

Limitation of Theory

Since experiments were conducted to find the optimal conditions, this theory holds in the range of preferred conditions obtained by the test. In order to demonstrate the independence of the four orthogonal factors, under the conditions of fixed $\Delta t_1$ and $T_{\mathrm{sub}}$ around the preferred conditions, scale values of subjective preference calculated by (10.8) for the LL with (10.10) and the IACC with (10.13), with the constants listed in Table 10.2, are shown in Fig. 10.8. Agreement between the calculated and observed values is satisfactory [10.13]. Even though both LL and the IACC are spatial factors, which are associated with the right cerebral hemisphere (Table 10.3), these are quite independent of each other. The same is true for the temporal factors $\Delta t_1$ and $T_{\mathrm{sub}}$, associated with the left hemisphere. Of course, the spatial factor and the temporal factor are highly independent.

Fig. 10.8 Scale values of subjective preference for the sound field with music motif A as a function of the listening level, with the IACC as a parameter (after [10.13]). Solid line: calculated values based on (10.8) together with (10.9) taking the two factors (10.10) and (10.13) into consideration; dashed line: measured values

Table 10.3 Hemispheric specializations determined by analyses of AEP (SVR), EEG and MEG

Example of Calculating the Sound Quality at Each Seat

As a typical example, we shall discuss the quality of the sound field at each seating position in a concert hall with a shape similar to that of Symphony Hall in Boston. Suppose that a single source is located at the center, 1.2 m above the stage floor. Receiving points at a height of 1.1 m above the floor level correspond to the ear positions. Reflections with their amplitudes, delay times, and directions of arrival at the listeners are taken into account using the image method. Contour lines of the total scale value of preference calculated for music motif B are shown in Fig. 10.9. Results shown in Fig. 10.9b demonstrate the effects of the reflection from the sidewalls adjusted to the stage, which produce decreasing values of the IACC for the audience area. Thus the preference value at each seat is increased compared with that in Fig. 10.9a. In this calculation, the reverberation time is assumed to be 1.8 s throughout the hall and the most preferred listening level, [LL]p = 20 log [P]p in (10.10), is set for a point on the center line 20 m from the source position.

Fig. 10.9 An example of calculating the scale value with the four orthogonal factors using (10.8) through (10.13) with weighting coefficients (Table 10.2). (a) Contour lines of the total scale value for the Boston Symphony Hall, with original side reflectors on the stage. (b) Contour lines of the total scale values for the side reflectors optimized on the stage

Auditory Temporal Window for ACF and IACF Processing

Auditory Temporal Window for ACF Processing

In analyzing the running ACF, the so-called auditory–temporal window 2T must be carefully determined. The initial part of the ACF within the effective duration of the ACF contains important information about the source signal. In order to determine the auditory–temporal window, successive loudness judgments in pursuit of the running SPL have been conducted. Results shown in Fig. 10.10 indicate that the recommended signal duration (2T)r to be analyzed is approximately expressed by [10.46]

$$(2T)_r \approx 30\,(\tau_e)_{\min}, \tag{10.14}$$

where $(\tau_e)_{\min}$ is the minimum effective duration, extracted from the running ACF [10.47]. This signifies an adaptive temporal window depending on the temporal characteristics of the sound signal in the auditory system. For example, the temporal window may differ according to music pieces ($(2T)_r = 0.5$–5 s) and to the vowels ($(2T)_r = 50$–100 ms) and consonants ($(2T)_r = 5$–10 ms) in continuous speech signals. It is worth noting that the fixed time constant of a sound level meter, i.e., the fast or slow setting, should be replaced by this temporal window, which depends on the effective duration of the ACF of the source signal. The running step ($R_s$), which signifies the degree of overlap of the signal to be analyzed, is not critical. It may be selected as $K_2 (2T)_r$, with $K_2$ chosen, say, in the range 0.25–0.5.
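The window and running step of (10.14) translate directly into frame positions for a running analysis. In this sketch the function names, the default `k2 = 0.5` (one end of the suggested range), and the rounding to whole samples are assumptions made for illustration.

```python
def analysis_window(tau_e_min, k2=0.5):
    """Return ((2T)_r, R_s) in seconds from (10.14) and R_s = K_2 (2T)_r."""
    two_t = 30.0 * tau_e_min
    return two_t, k2 * two_t

def frame_starts(n_samples, fs, two_t, step):
    """Start indices (in samples) of the successive analysis windows."""
    window = int(round(two_t * fs))
    hop = max(1, int(round(step * fs)))
    return list(range(0, n_samples - window + 1, hop))

# Motif B, (tau_e)_min = 40 ms: a window of about 1.2 s with a 0.6 s step.
two_t_b, step_b = analysis_window(0.040)
starts_b = frame_starts(3 * 48000, 48000, two_t_b, step_b)
```

For a 3 s signal sampled at 48 kHz this gives four overlapping windows; each would then be passed to the ACF analysis.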

Fig. 10.10 Recommended temporal window $(2T)_r$ for ACF processing as a function of the minimum value of the effective duration of the ACF, $(\tau_e)_{\min}$. Different symbols represent experimental results using different sound sources

Auditory Temporal Window for IACF Processing

For a sound source fixed on the stage in a concert hall, as is usual, the value of $2T$ can be selected to be longer than 1.0 s for the measurement of the spatial factor at a fixed audience seat. But when a sound source is moving in the horizontal direction on the stage, we must know a suitable temporal window $2T$ for analyzing the running IACF, which describes the moving image of sound localization. For a sound source moving sinusoidally in the horizontal plane at less than 0.2 Hz, $2T$ may be selected in a wide range from 30 to 1000 ms. If a sound source is moving at 4.0 Hz or below, $2T = 30$–100 ms is acceptable. In order to obtain a reliable result, it is recommended that such a temporal window for the IACF, covering a wide range of movement velocities in horizontal localization, be fixed at about 30 ms.

Specialization of Cerebral Hemispheres for Temporal and Spatial Factors of the Sound Field

The independent influence of the aforementioned temporal and spatial factors on subjective preference judgments has been achieved by the specialization of the human cerebral hemispheres [10.44]. Recording over the left and right hemispheres of the slow vertex response (SVR) with latency of less than 500 ms, electroencephalograms (EEG) and magnetoencephalograms (MEG) have revealed various pieces of evidence (Table 10.3), with the most significant results being:

  1. The left and right amplitudes of the evoked SVR, $A(P_1 - N_1)$, indicate that the left and right hemispheric dominance are due to temporal factors ($\Delta t_1$) and spatial factors (LL and IACC), respectively [10.35,48].

  2. Both left and right latencies of $N_2$ of the SVR correspond well to the IACC [10.35].

  3. Results of EEGs for the cerebral hemispheric specialization of the temporal factors, i.e., $\Delta t_1$ and $T_{\mathrm{sub}}$, indicated left-hemisphere dominance [10.49,50], while the IACC indicated right-hemisphere dominance [10.45]. Thus, a high degree of independence between temporal and spatial factors was indicated.

  4. The scale value of subjective preference is well described in relation to the value of $\tau_e$ extracted from the ACF of $\alpha$-wave signals over the left hemisphere and the right hemisphere according to changes in temporal and spatial factors of the sound field, respectively [10.45,49,50].

  5. Amplitudes of MEGs recorded when $\Delta t_1$ was changed reconfirm the left-hemisphere specialization [10.51].

  6. The scale values of individual subjective preference relate directly to the value of $\tau_e$ extracted from the ACF of the $\alpha$-wave of the MEG [10.51]. It is worth noting that the amplitudes of the $\alpha$-wave in both the EEG and the MEG do not correspond well to the scale value of subjective preference.

In addition to the aforementioned activities in the time domain, in both the left and right hemispheres, spatial activity waves were analyzed by the cross-correlation function of alpha waves from the EEGs and MEGs. The results showed that a larger area of the brain is activated when the preferred sound field is presented than when a less preferred one is presented [10.52]. This implies that the brain repeats a similar temporal rhythm in the α-wave range over a wider area of the scalp under the preferred conditions.

It has been reported that the left hemisphere is mainly associated with speech and time-sequential identifications, and the right is concerned with nonverbal and spatial identification [10.53,54]. However, when the IACC was changed using speech and music signals, right-hemisphere dominance was observed, as indicated in Table 10.3. Therefore, hemispheric dominance is a relative response depending on which factor is changed in the comparison pair, and no absolute behavior could be observed.

To date, it has been discovered that the LL and the IACC are dominantly associated with the right cerebral hemisphere, and the temporal factors, Δt 1 and T sub are associated with the left. This implies that such specialization of the human cerebral hemisphere may relate to the highly independent influence of spatial and temporal criteria on any subjective attribute. It is remarkable, for example, that cocktail party effects might well be explained by such specialization of the human brain, because speech is processed in the left hemisphere, and spatial information is processed in the right hemisphere independently.

Design Studies

Using the scale values in the four orthogonal factors of the sound field obtained by a number of listeners, the principle of superposition expressed by (10.8) together with (10.9) through (10.13) can be applied to calculate the scale value of preference for each seat. Comparison of the total preference values for different configurations of concert halls allows a designer to choose the best scheme. Temporal factors relating to its dimensions and the absorbing material on its walls are carefully determined according to the purpose of the hall in terms of a range of specific music programs (Fig. 10.6). In this section, we discuss mainly the spatial form of the hall, the magnitude of the interaural correlation function and the binaural listening level.
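The superposition step can be sketched as follows. The sketch assumes that (10.8) is a plain sum of four terms and that each term in (10.9) has the form S i ≈ −α i |x i|^(3/2), with x 1 the listening level relative to the preferred level in dB, x 2 and x 3 the base-10 logarithms of Δt 1 and T sub normalized by their preferred values, and x 4 the IACC. These forms are not reproduced in this section and are assumptions here; the single weighting coefficient per factor and the preferred values are those reported for subject BL in Fig. 10.26.

```python
import math

def scale_value(ll_db, dt1_ms, tsub_s, iacc, pref, alpha):
    """Total scale value of preference and its four components (assumed form of (10.8), (10.9))."""
    x = (
        ll_db - pref["LL"],                  # x1: level relative to preferred, dB
        math.log10(dt1_ms / pref["dt1"]),    # x2: normalized initial time delay gap
        math.log10(tsub_s / pref["Tsub"]),   # x3: normalized reverberation time
        iacc,                                # x4: IACC itself
    )
    parts = [-a * abs(v) ** 1.5 for a, v in zip(alpha, x)]
    return sum(parts), parts

# Values for subject BL as reported in Fig. 10.26.
PREF_BL = {"LL": 83.0, "dt1": 26.8, "Tsub": 2.05}
ALPHA_BL = (0.06, 1.86, 1.46, 1.96)

total, parts = scale_value(80.0, 20.0, 1.7, 0.4, PREF_BL, ALPHA_BL)
```

Evaluating such a function at every seat, with the four factors taken from a simulation of the hall, yields the per-seat preference values that are then compared across candidate designs.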

Study of a Space-Form Design by Genetic Algorithms (GA)

A large number of concert halls have been built since the time of the ancient Greeks, but only the halls with good sound quality are well liked. In order to increase the measure of success, a genetic algorithm (GA) system [10.55], a form of evolutionary computing, can be applied to the acoustic design of concert halls [10.56,57]. The GA system is applied here to generate the alternative scheme on the left-hand side of Fig. 10.11 for listeners.

Fig. 10.11

The procedure for designing the sound field in a concert hall maximizing the scale values of subjective preference for a number of listeners (including a conductor) and performers. Data for global values of subjective preference may be utilized when designing a public hall

Procedure

In this calculation, linear scale values of subjective preference S 1 and S 4 given by (10.9) are employed as fitness functions for the LL and the IACC, because the geometrical shape of a hall is directly affected by these spatial factors. The spatial factor for a source on the stage was calculated at a number of seating positions. For the sake of convenience, the single omnidirectional source was assumed to be at the center of the stage, 1.5 m above the stage floor. The receiving points that correspond to the ear positions were 1.1 m above the floor of the hall. The image method was used to determine the amplitudes, delay times, and directions of arrival of reflections at these receiving points. Reflections were calculated up to the second order. In fact, there was no change in the relative relationship among the factors obtained from calculations performed up to the second, third, and fourth order of reflection. The averaged values of the IACC for five music motifs (motifs A through E [10.13]) were used for the calculation.
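As an illustration of the image method, the following sketch computes first-order reflections only, for a rectangular box with perfectly reflecting walls and spherical spreading. The text above uses reflections up to the second order and also tracks directions of arrival, which are omitted here. The coordinate layout, the speed of sound, and the function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def first_order_images(src, room):
    """Mirror the source in each of the six walls of the box [0,Lx] x [0,Ly] x [0,Lz]."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]
            images.append(tuple(img))
    return images

def reflections(src, rcv, room):
    """Sorted list of (delay after the direct sound in ms, amplitude relative to direct)."""
    d0 = math.dist(src, rcv)
    out = []
    for img in first_order_images(src, room):
        d = math.dist(img, rcv)
        # Spherical spreading: pressure amplitude falls off as 1/distance.
        out.append((1000.0 * (d - d0) / SPEED_OF_SOUND, d0 / d))
    return sorted(out)

# Hypothetical layout: box 20 m wide, 30 m long, 15 m high; source 1.5 m and
# receiver 1.1 m above the floor, both on the centerline.
refl = reflections((10.0, 4.0, 1.5), (10.0, 14.0, 1.1), (20.0, 30.0, 15.0))
```

Higher orders follow by mirroring the image sources again, which is why the cost grows quickly with reflection order.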

Those hall shapes that produced greater scale values are selected as parent chromosomes. An example of the encoding of the chromosome is given in Fig. 10.12. The first bit indicated the direction of motion for the vertex. The other (n − 1) bits indicated the range over which the vertex moved. To create a new generation, the room shapes are modified and the corresponding movement of the vertices of the walls is encoded in chromosomes, i.e., binary strings. After GA operations that include crossover and mutation, new offspring are created. The fitness of the offspring is then evaluated in terms of the scale value of subjective preference. This process is repeated until the end condition of about 2000 generations is satisfied.
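The encoding and the GA operators described above can be sketched as follows. The first bit is read as the direction of motion and the remaining n − 1 bits as the distance moved; the linear mapping of those bits to metres, the single-point crossover, and the bit-flip mutation are common choices assumed for illustration and are not necessarily those of [10.56,57].

```python
import random

def decode(bits, max_shift_m):
    """First bit: direction of motion; remaining n-1 bits: distance, scaled to max_shift_m."""
    sign = 1.0 if bits[0] == 1 else -1.0
    magnitude = int("".join(str(b) for b in bits[1:]), 2)
    full_scale = 2 ** (len(bits) - 1) - 1
    return sign * max_shift_m * magnitude / full_scale

def crossover(parent_a, parent_b, rng):
    """Single-point crossover producing two offspring."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

rng = random.Random(1)
child_a, child_b = crossover([0] * 8, [1] * 8, rng)
shift_m = decode(mutate(child_a, 0.05, rng), max_shift_m=5.0)
```

In a full loop, each decoded displacement would move a wall or vertex, the resulting shape would be scored by the averaged scale value of preference, and the better-scoring shapes would become the parents of the next generation.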

Fig. 10.12

An example of the binary strings used in encoding of the chromosome to represent modifications to the room shape

Shoe-Box Optimized

First, the proportions of the shoe-box hall were optimized (model 1). The initial geometry is shown in Fig. 10.13. In this form, the hall was 20 m wide, the stage was 12 m deep, the room was 30 m long, and the ceiling was 15 m above the floor. The point source was located at the center of the stage, 4.0 m from the front of the stage; 72 listening positions were selected. The range of motion for each sidewall and the ceiling was ±5 m from the respective initial positions, and the distance through which each was moved was coded on the chromosome of the GA. Scale values at all listening positions except those within 1 m of the sidewalls were included in the averages (S 1 ¯ and S 4 ¯), which were employed as the measure of fitness. In this calculation, the most preferred listening level [LL]p was chosen at the frontal seat near the stage. Results of optimization of the hall for S 1 ¯ and S 4 ¯ are shown in Fig. 10.14a,b, respectively. The width and length were almost the same in the two results, but the indicated heights were quite different. The ceiling height that maximizes S 1 ¯ was as low as possible within the allowed range of motion, to obtain a constant LL (Fig. 10.14a); a low ceiling provides more reflected energy to the listening positions throughout the seating area. The height that maximizes S 4 ¯, on the other hand, was at the upper limit of the allowed range of motion, to obtain small values of the IACC (Fig. 10.14b): reflections from the flat ceiling did not decrease the IACC, but those from the sidewalls did. Table 10.4 compares the proportions obtained here with those of the Grosser Musikvereinssaal, which is a typical example of an excellent concert hall. The length/width ratios are almost the same. The height/width ratio of the Grosser Musikvereinssaal is intermediate between our results for the two factors.

Fig. 10.13

The initial scheme of a concert hall (model 1). The range of sidewall and ceiling variation was ±5 m from the initial scheme

Fig. 10.14

Results for the model 1. (a) Geometry optimized for S 1 ¯ . (b) Geometry optimized for S 4 ¯ (after [10.56])

Table 10.4 Comparison of proportions for the optimized spatial form of shoe-box type and the Grosser Musikvereinssaal

Modification from the Shoe-Box

Next, to obtain even better sound fields, a slightly more complicated form (model 2), as shown in Fig. 10.15, was examined. The floor plan optimized according to the above results was applied as a starting point. The hall in its initial form was 14 m wide, the stage was 9 m deep, the room was 27 m long, and the ceiling was 15 m above the stage floor. The sound source was again 4.0 m from the front of the stage, but was 0.5 m to one side of the centerline and 1.5 m above the stage floor. The front and rear walls were vertically bisected to obtain two faces, and each stretch of wall along the side of the seating area was divided into four faces. The walls were kept vertical (i.e., tilting was not allowed) to examine only the plan of the hall in terms of maximizing S 1 ¯ and S 4 ¯ . Each wall was moved independently of the other walls. The openings between the walls, in the acoustical simulation using the image method, were assumed not to reflect the sound. Forty-nine listening positions distributed throughout the seating area on a 2 m × 4 m grid were selected. In the GA operation, the sidewalls were moved so that none of these 49 listening positions was excluded. The moving range of each vertex was ±2 m in the direction of the line normal to the surface. The coordinates of the two bottom vertices of each surface were encoded on the chromosomes for the GA. In this calculation, the most preferred listening level was set for a point on the hallʼs long axis (central line), 10 m from the source position.

Fig. 10.15

Initial scheme of the concert hall (model 2). The rear wall of the stage and the rear wall of the audience area were divided into two. Sidewalls were divided into four

Leaf-Shape Concert Hall

The result of optimizing the hall for S 1 ¯ is shown in Fig. 10.16 and the contour lines of equal S 1 ¯ values are shown in Fig. 10.17. To maximize S 1 ¯ , the rear wall of the stage and the rear wall of the audience area took on concave shapes. The result of optimizing for S 4 ¯ is shown in Fig. 10.18 and contour lines of equal S 4 ¯ values are shown in Fig. 10.19. To maximize S 4 ¯ , on the other hand, the rear wall of the stage and the rear wall of the audience area took on convex shapes. With regard to the sidewalls, both S 1 ¯ and S 4 ¯ were maximized by the leaf-shaped plan, which is discussed in the following section. As for the conflicting requirements for S 1 and S 4, the maximization of S 4 may take priority over that of S 1, because the preference increases with decreasing IACC without exception [10.1], while there is a large individual difference in the preferred LL [10.58]. It is worth noting that listeners themselves can usually choose the best seat with respect to the preferred LL in a real concert hall.

Fig. 10.16

A result for the model 2 optimized for S 1 ¯

Fig. 10.17

Contour lines of equal S 1 ¯ values calculated for the geometry shown in Fig. 10.16

Fig. 10.18

A result for model 2 optimized for S 4 ¯

Fig. 10.19

Contour lines of equal S 4 ¯ values calculated for the geometry shown in Fig. 10.18

A conductor and/or music director must be aware of the sound field characteristics of a concert hall. One is then able to select a program of music so that the sound is best in that hall in terms of the temporal factors (Fig. 10.6).

Actual Design Studies

In tests with more than 200 listeners, a small value of the IACC, which corresponds to different sound signals arriving at the two ears, was demonstrated to be the preferred condition for individuals without exception. This design theory was applied in practice to the Kirishima International Concert Hall (Miyama Conceru), which is characterized by a leaf shape (Fig. 10.20).

Fig. 10.20

A leaf shape for the plan proposed for the Kirishima International Concert Hall. (a) Original leaf shape. (b) Proposed shape for the plan. As usual, the sound field in the circled seating area close to the stage must be carefully designed to obtain reflections from the walls on the stage and the tilted sidewalls

Temporal Factors of the Sound Field for Listeners

When the space is designed for pipe-organ performance, the range of (τ e)min, which may be selected to be centered on 200 ms, determines the typical temporal factor of the hall: [T sub]p ≈ 4.6 s (10.6). When designed for the performance of chamber music, the range is selected to be near the value of 65 ms ([T sub]p ≈ 1.5 s). The conductor and/or the sound coordinator select suitable musical motifs with a satisfactory range of effective duration of the ACF to achieve a music performance that blends the music and the sound field in a hall (Fig. 10.6). To adjust the preferred condition of Δt 1, on the other hand, since the value of (τ e)min for violins is usually shorter than that of contrabasses in the low-frequency range, the position of the violins is shifted closer to the left wall on the stage, and the position of the contrabasses is shifted closer to the center, as viewed from the audience.
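The two figures quoted above can be checked with a one-line relation. Equation (10.6) is not reproduced in this section; the values given here are consistent with a simple proportionality, [T sub]p ≈ 23 (τ e)min, which is what the sketch below assumes. The function name is illustrative.

```python
def preferred_tsub(tau_e_min_s):
    """Preferred subsequent reverberation time in seconds, assuming [Tsub]p = 23 (tau_e)min."""
    return 23.0 * tau_e_min_s

organ_s = preferred_tsub(0.200)    # pipe-organ music, (tau_e)min centered on 200 ms
chamber_s = preferred_tsub(0.065)  # chamber music, (tau_e)min near 65 ms
```

Under this assumption the pipe-organ case gives 4.6 s and the chamber-music case gives about 1.5 s, matching the values in the text.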

Spatial Factors of the Sound Field for Listeners

The IACC should be kept as small as possible, maintaining τ IACC = 0. This is realized by suppressing the strong reflection from the ceiling, and by appropriate reflections from the sidewalls at certain angles. When the source signal mainly contains frequency components around 1 kHz, the reflection from the sidewalls is adjusted to be centered roughly 55° to each listener, measured from the median plane. Under actual hearing conditions, the perceived IACC depends on whether or not the amplitudes of the reflections exceed the hearing threshold level. This may be one of the reasons why a more diffuse sound field can be perceived with increasing power of the sound source. When the source is so weak that only the direct sound is heard, the actual IACC being processed in the auditory–brain system approaches unity, resulting in no diffuse sound impression. Thus, the IACC should be small enough with only strong early reflections.

Sound Field for Musicians

For music performers, the temporal factor is considered to be much more critical than the spatial factor (Sect. 10.3.2). Since musicians perform over a sequence of time, reflections with a suitable delay in terms of the value of (τ e)min of the source signals are of particular importance. Without any spatial subjective diffuseness, the preferred directions of reflections are in the median plane of music performers, resulting in IACC ≈ 1.0 [10.61,62]. In order to satisfy these acoustic conditions, some design iterations are required, maximizing scale values for both musicians and listeners and leading to the final scheme of the concert hall as shown in Fig. 10.11.

Sound Field for the Conductor

It is recommended that the sound field for the conductor on the stage should be designed as that of a listener with appropriate reflections from the sidewalls on the stage [10.63].

Acoustic Design with Architects

From the historical viewpoint, architects have been more concerned with spatial criteria from the visual standpoint, and less so from the point of view of temporal criteria for blending human experience and the environment with design. On the other hand, acousticians have mainly been concerned with temporal criteria, represented primarily by the reverberation time, from the time of Sabine [10.2] onward. No comprehensive theory of design including the spatial criterion as represented by the IACC existed before 1977, so that discussions between acousticians and architects were rarely on the same subject. As a matter of fact, both temporal and spatial factors are deeply interrelated with both acoustic design and architectural design [10.64,65].

As an initial design sketch of the Kirishima International Concert Hall, a plan shape like a leaf (Fig. 10.20a) was presented at the first meeting for discussion with the architect Fumihiko Maki and associates with the explanation of the temporal and spatial factors of sound field. After some weeks, Maki and Ikeda indicated a scheme of the concert hall as shown in Fig. 10.21 [10.59,60]. Without any change of plan and cross sections, the calculated results indicated the excellent sound field as shown in Fig. 10.22 [10.66,67].

Fig. 10.21

Scheme of the Kirishima International Concert Hall, Kagoshima, Japan, designed by the architect Maki and associates (1997) (after [10.59,60]). (a) Longitudinal section; (b) plan of balcony level; (c) plan of audience level. (a) Cross section; (b) tilted sidewalls and triangular ceilings after construction of the hall

Fig. 10.22

Calculated orthogonal factors at each seat with a performing position S for an initial design of the hall. In the final design of the hall, the width was enlarged by about 1 m to increase the number of seats. The designed reverberation time was about 1.7 s for the 500 Hz band. (a) Relative listening level; (b) initial time delay gap between the direct sound and the first reflection Δt 1 [ms]; (c) A-value, the total amplitude of reflections; (d) IACC for white noise

The final architectural schemes, together with the special listening room for testing individual preference of sound field and selecting the appropriate seats for maximizing individual preference of the sound field, are shown in Fig. 10.21b. In these figures, the concert courtyard, the small concert hall, several rehearsal rooms and dressing rooms are also shown. The concert hall under construction is shown in Fig. 10.21e, in which the leaf shape may be seen; it was opened in 1994 (Fig. 10.21f).

Details of Acoustic Design

For Listeners on the Main Floor

In order to obtain a small value of the IACC for most of the listeners, ceilings were designed using a number of triangular plates with adjusted angles, and the sidewalls were given a 10% tilt with respect to the main audience floor, as shown in Fig. 10.21d and Fig. 10.23. In addition, diffusing elements were designed on the sidewalls to avoid the image shift of sound sources on the stage caused by the strong reflection in the high-frequency range above 2 kHz. These diffusers on the sidewalls were designed as a modification of the Schroeder diffuser [10.68] without the wells, as shown by the detail of Fig. 10.24.

Fig. 10.23

Details of the cross section, including a sectional detail of the rear wall on the stage at the lower left of the figure

Fig. 10.24

Detail of the diffusing sidewalls effective for the higher-frequency range above 1.5 kHz, avoiding image shift of the sound source on the stage. The surface is deformed from the Schroeder diffuser by removal of the well partitions (after [10.68])

For Music Performers on Stage

In order to provide reflections from places near the median plane of each of the performers on the stage, the back wall on the stage is carefully designed as shown in the lower left in Fig. 10.23. The tilted back wall consists of six sub-walls with angles adjusted to provide appropriate reflections within the median plane of the performer. It is worth noting that the tilted sidewalls on the stage provide good reflections to the audience sitting close to the stage, at the same time resulting in a decrease of the IACC. Also, the sidewall on the stage of the left-hand side looking from the audience may provide useful reflections arriving from the back for a piano soloist.

Stage Floor Structure

For the purpose of suppressing the vibration [10.69] of the stage floor and anomalous sound radiation from the stage floor during a performance, the joists form triangles without any neighboring parallel structure, as shown in Fig. 10.25. The thickness of the floor is designed to be relatively thin (27 mm) in order to radiate sound effectively from the vibration of instruments such as the cello and contrabass. During rehearsal, music performers may control the radiation power somewhat by adjusting their position or by the use of a rubber pad between the floor and the instrument.

Fig. 10.25

Detail of the triangular joist arrangement for the stage floor, avoiding anomalous radiation due to the normal modes of vibration from certain music instruments that touch the floor

Individual Preferences of a Listener and a Performer

The minimum unit of society to be satisfied by the environment is one individual, which leads to a unique personal existence. Here, we demonstrate that the individual subjective preferences of each listener and each cellist may be described by the theory in Sect. 10.1, which resulted from observing a number of subjects.

Individual Subjective Preference of Each Listener

In order to enhance individual satisfaction for each listener, a special facility for seat selection, testing each listenerʼs own subjective preference [10.58,70], was first introduced at the Kirishima International Concert Hall in 1994. The sound simulation system employed multiple loudspeakers. It used arrows for testing the subjective preference of four listeners at the same time. Since the four orthogonal factors of the sound field influence the preference judgment almost independently [10.1], each single factor is varied, while the other three are fixed at the preferred condition for the average listener. Results of testing acousticians who participated in the International Symposium on Music and Concert Hall Acoustics (MCHA95), which was held in Kirishima in May 1995, are presented here [10.1].

Individual Preference and Seat Selection

The music source was orchestral, the Water Music by Händel; the effective duration of the ACF was 62 ms [10.11]. The total number of listeners participating was 106. Typical examples of the test results for listener BL as a function of each factor are shown in Fig. 10.26. Scale values of this listener were rather close to the averages for subjects previously collected: the most preferred [LL]p was 83 dBA; the value [Δt 1]p ≈ [(1 − log 10 A)(τ e)min] was 26.8 ms (the global preferred value calculated by (10.4) with the simulated total sound pressure A = 4.0 is 24.8 ms); and the most preferred reverberation time was 2.05 s (the global preferred value calculated by (10.6) is 1.43 s). Thus, as was designed, the center area of seats was preferred for listener BL (Fig. 10.27). With regard to the IACC, the result for all listeners was that the scale value of preference increased with decreasing IACC. Since listener KH preferred a very short delay time Δt 1, his preferred seats were located close to the boundary wall, as shown in Fig. 10.28. Listener KK indicated a preferred listening level exceeding 90 dBA. For this listener, the front seating area close to the stage was preferable, as shown in Fig. 10.29. For listener DP, whose preferred listening level was rather weak (76.0 dBA) and whose preferred initial delay time was short (15.0 ms), the preferred seat was in the rear part of the hall, as shown in Fig. 10.32. The preferred initial time delay gap for listener CA exceeded 100.0 ms, but was not critical. Thus, any initial delay time was acceptable, but the IACC was critical. Therefore, the preferred areas of seats were located as shown in Fig. 10.33.
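The global preferred initial time delay gap quoted above can be reproduced with a short calculation. The sketch assumes that (10.4) has the general form [Δt 1]p ≈ [log 10 (1/k) − c log 10 A](τ e)min, which reduces to the bracketed expression in the text for k = 0.1 and c = 1, the constants quoted for listeners later in this section. The function name is illustrative.

```python
import math

def preferred_dt1(tau_e_min_ms, amplitude, k=0.1, c=1.0):
    """Preferred initial time delay gap in ms from the assumed form of (10.4)."""
    return (math.log10(1.0 / k) - c * math.log10(amplitude)) * tau_e_min_ms

global_dt1_ms = preferred_dt1(62.0, 4.0)  # Water Music: (tau_e)min = 62 ms, A = 4.0
```

With these inputs the result is roughly 24.7 ms, close to the 24.8 ms quoted in the text; the small difference presumably reflects rounding of the inputs.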

Fig. 10.26

Scale values of preference obtained by paired-comparison tests for the four orthogonal factors, subject BL. (a) The most preferred listening level was 83 dBA, the individual weighting coefficient in (10.9): α 1 = 0.06; (b) the preferred initial time delay gap between the direct sound and first reflection was 26.8 ms, the individual weighting coefficient in (10.9): α 2 = 1.86, where [Δt 1]p calculated by (10.4) with (τ e)min = 62 ms for the music used (A = 4) is 24.8 ms; (c) the preferred subsequent reverberation time is 2.05 s, the individual weighting coefficient in (10.9): α 3 = 1.46, where [T sub]p calculated by (10.6) with (τ e)min = 62 ms for the music used is 1.43 s; (d) individual weighting coefficient in (10.9) for IACC: α 4 = 1.96.

Fig. 10.27

Preferred seating area calculated for subject BL. The seats are classified in three parts according to the scale value of subjective preference calculated by the summation of S 1 through S 4 (10.8) together with (10.9). Black indicates preferred seating areas, about one third of all seats in this concert hall, for subject BL

Fig. 10.28

Preferred seating area calculated for subject KH

Fig. 10.29

Preferred seating area calculated for subject KK

Fig. 10.32

Preferred seating area calculated for subject DP

Fig. 10.33

Preferred seating area calculated for subject CA

Cumulative Frequency of Preferred Values

Cumulative frequencies of the preferred values with 106 listeners are shown in Fig. 10.30 through Fig. 10.34 for three factors. As indicated in Fig. 10.30, about 60% of listeners preferred the range 80–84.9 dBA when listening to music, but some listeners indicated that the most preferred LL was above 90 dBA, and the total range of the preferred LL was scattered, exceeding a 20 dB range. As shown in Fig. 10.31, about 45% of listeners preferred initial delay times of 20–39 ms, which were around the calculated preference of 24.8 ms (10.4) with k = 0.1, c = 1 and A = 4.0; however, some listeners indicated 0–9 ms and others more than 80 ms. With regard to the reverberation time, as shown in Fig. 10.34, about 45% of listeners preferred 1.0–1.9 s, which is centered on the preferred value of 1.43 s calculated by (10.6), but some listeners indicated preferences lower than 0.9 s or more than 4.0 s.
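Statistics of this kind can be computed from a list of individual preferred values as sketched below. The sample values are hypothetical and serve only to show the calculation; they are not the data of the 106 listeners. The function names and the half-open bin convention are assumptions.

```python
def cumulative_frequency(values, upper_edges):
    """Percentage of values lying strictly below each upper edge."""
    n = len(values)
    return [100.0 * sum(1 for v in values if v < edge) / n for edge in upper_edges]

def share_in_range(values, low, high):
    """Percentage of values in the half-open interval [low, high)."""
    n = len(values)
    return 100.0 * sum(1 for v in values if low <= v < high) / n

# Hypothetical preferred listening levels in dBA for ten listeners.
preferred_ll = [72.0, 78.0, 81.0, 82.0, 83.0, 84.0, 84.5, 86.0, 88.0, 91.0]
curve = cumulative_frequency(preferred_ll, [75.0, 80.0, 85.0, 90.0, 95.0])
share_80_to_85 = share_in_range(preferred_ll, 80.0, 85.0)
```

The step in the cumulative curve between two neighbouring edges equals the share of listeners in that bin, which is how a statement such as "about 60% preferred 80–84.9 dBA" can be read off a cumulative plot.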

Fig. 10.30

Cumulative frequency of preferred listening level [LL]p (106 subjects). About 60% of subjects preferred the range of 80–84.9 dBA

Fig. 10.34

Cumulative frequency of the preferred subsequent reverberation time [T sub]p (106 subjects). About 45% of subjects preferred the range 1.0–1.9 s. The value of [T sub]p calculated using (10.6) is 1.43 s

Fig. 10.31

Cumulative frequency of the preferred initial time delay gap between the direct sound and the first reflection [Δt 1]p (106 subjects). About 45% of subjects preferred the range of 20–39 ms. The value of [Δt 1]p calculated using (10.4) is 24.8 ms with c = 1 and k = 0.1

Independence of the Preferred Conditions

It was thought that the initial delay time and the subsequent reverberation time might be mutually related through a kind of liveness of the sound field, and that there would be a strong interdependence between these factors for each individual. However, as shown in Fig. 10.35, there was little correlation between the preferred values of [Δt 1]p and [T sub]p (the correlation is 0.06). The same is true for the correlation between values of [T sub]p and [LL]p and for that between values of [LL]p and [Δt 1]p, both of which are less than 0.11. Figure 10.36 shows the three-dimensional plots of the preferred values of [LL]p, [Δt 1]p and [T sub]p for the 106 listeners. Because the preferred values are continuously distributed, no specific groupings of individuals can be identified from the data.
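The correlations quoted above are ordinary correlation coefficients between pairs of preferred values across listeners. A minimal sketch of the Pearson coefficient is given below; the short lists are hypothetical and are not the measured preferences.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical preferred delays (ms) and reverberation times (s) for four listeners.
dt1_pref = [10.0, 20.0, 30.0, 40.0]
tsub_pref = [2.0, 1.0, 1.0, 2.0]
r = pearson_r(dt1_pref, tsub_pref)
```

A value near zero, as in this constructed example, means that knowing a listener's preferred delay says nothing about that listener's preferred reverberation time.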

Fig. 10.35

The relationship between the preferred values of [Δt 1]p and [T sub]p for each subject. No significant correlation between the values was found

Fig. 10.36

Three-dimensional illustration of the three preferred orthogonal factors for the sound field for each individual subject (squares). All 106 listeners tested indicated that a smaller value of the IACC was preferred, and these data are not included in this figure. Preferred conditions are distributed in a certain range of each factor, so that subjects could not be classified into any specific groups

Another important fact is that there was no correlation between the weighting coefficients α i and α j , i ≠ j (i, j = 1, 2, 3, 4) in (10.9) for each individual listener [10.1]. A listener indicating a relatively small value of one factor will not always indicate a relatively small value for another factor. Thus, a listener can be critical about a preferred condition as a function of a certain factor, while insensitive to other factors, resulting in characteristic differences from other listeners.

Individual Subjective Preference of Each Cellist

To realize an excellent concert, we need to know the optimal conditions not only for the stage enclosure design for performers, but also for the style of the performance. The primary issue is that the stage enclosure should provide a sound field in which performers can play easily. Marshall et al. [10.71] investigated the effects of stage size on the playing of an ensemble. The parameters related to stage size in their study were the delay time and the amplitude of reflections. Gade [10.72] performed a laboratory experiment to investigate the preferred conditions for the total amplitude of the reflections for performers. Nakayama [10.61] showed a relationship between the preferred delay time and the effective duration of the long-time ACF of the source signal for alto-recorder soloists (the data were rearranged in [10.1]). For a wide range of music signals containing large fluctuations, the preferred delay time is more accurately expressed by the minimum value of the effective duration (τ e)min of the running ACF of the source signals [10.39]. For individual singers, Noson et al. [10.73,74] reported that the most preferred condition of the single reflection may be described by (τ e)min and a modified amplitude of reflection that accounts for the overestimate and bone conduction effects (for control of (τ e)min see [10.41,42]).

Preferred Delay Time of the Single Reflection for Cellists

As a good example, the preferred delay time of the single reflection for an individual cello soloist is described by the minimum value of the effective duration of the running ACF of the music motifs played by that cellist [10.75]. The same music motifs (motifs I and II) used in the experiment by Nakayama were applied here [10.61]. The tempo of motif I was faster than that of motif II, as shown in Fig. 10.37. A microphone in front of the cellist picked up the music signals performed three times by each of five cellists.

Fig. 10.37

Music motifs I and II composed by Tsuneko Okamoto for the investigation (after [10.61])

Figure 10.38 shows an example of the regression curve for the scale value of preference fitted by (10.9). The peak of this curve denotes the most preferred delay time [Δt 1]p. The values for individual cellists are listed in Table 10.5. Global and individual results (except for that of subject E) for music motif II were longer than those for music motif I.

Fig. 10.38

An example of the regression curve for the preferred delay time (subject D, music motif I, −15 dB): log 10 [Δt 1]p ≈ 1.35, thus [Δt 1]p ≈ 22.6 ms

Table 10.5 Judged and calculated preferred delay times of a single reflection for each cello soloist. Calculated values of [Δt 1]p are obtained by (10.15) using the amplitude of the reflection A1 and (τ e)min for music signals performed by each cellist

The most preferred delay time of a single reflection is approximately expressed by the duration τp of the ACF, in a form similar to that for listeners (10.4), so that

\[
[\Delta t_1]_p = \tau_p = \left[\log_{10}(1/k') - c' \log_{10} A'\right](\tau_e)_{\min} ,
\tag{10.15}
\]

where the values k′ and c′ are constants that depend on the individual performer and the musical instrument used. A substantial difference from (10.4) for listeners is that the amplitude of the reflection A′ is defined such that A′ = 1 corresponds to −10 dB relative to the direct sound as measured at the earʼs entrance. This is due to the phenomenon of missing reflection (i.e., a performer overestimating the reflection) [10.1].
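A minimal sketch of (10.15) follows. It reads the definition of A′ as a pressure ratio referenced to a reflection 10 dB below the direct sound, so that a reflection at −10 dB gives A′ = 1; this reading, the function names, and the example value of (τ e)min are assumptions. The default constants k′ = 1/2 and c′ = 1 are the average values reported for the five cellists.

```python
import math

def performer_amplitude(reflection_db_re_direct):
    """A' as a pressure ratio, with A' = 1 for a reflection at -10 dB re the direct sound."""
    return 10.0 ** ((reflection_db_re_direct + 10.0) / 20.0)

def preferred_delay_performer(tau_e_min_ms, reflection_db, k_prime=0.5, c_prime=1.0):
    """Preferred delay of a single reflection in ms, following the form of (10.15)."""
    a_prime = performer_amplitude(reflection_db)
    return (math.log10(1.0 / k_prime) - c_prime * math.log10(a_prime)) * tau_e_min_ms

# Hypothetical (tau_e)min of 40 ms with the two reflection levels used in the tests.
delay_at_minus_15 = preferred_delay_performer(40.0, -15.0)
delay_at_minus_21 = preferred_delay_performer(40.0, -21.0)
```

Because the logarithm of A′ enters with a negative sign, a weaker reflection leads to a longer preferred delay for the same music signal.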

Fig. 10.39

The relationship between the measured most preferred delay time [Δt 1]p and the duration τp (= [Δt 1]p) calculated using (10.15). Correlation coefficient, r = 0.91 (p < 0.01). Different symbols denote the four test series: music motif I at −15 dB and −21 dB, and music motif II at −15 dB and −21 dB

Individual Subjective Preference

Using the quasi-Newton method, the resulting constants on average were k′ ≈ 1/2 and c′ ≈ 1 for the five cellists. (It is worth noting that the coefficients k′ and c′ for alto-recorder soloists were 2/3 and 1/4, respectively [10.1].) After setting k′ = 1/2, the coefficient c′ for each individual was determined as listed in Table 10.6. The average value of the coefficient c′ obtained for the five cellists was 1.03. The relationship between the most preferred delay time [Δt 1]p obtained by the judgment and the duration τp (= [Δt 1]p) of the ACF calculated by (10.15) is shown in Fig. 10.39. Different symbols indicate values obtained in different test series with two music motifs. The correlation coefficient between calculated values of [Δt 1]p and measured values is 0.91 (p < 0.01). The scale values of preference for each of the five cellists as a function of the delay time of the single reflection normalized by the calculated [Δt 1]p are shown in Fig. 10.40.
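Once k′ is fixed, (10.15) is linear in c′, so an individual coefficient can also be estimated by ordinary least squares. The sketch below uses that closed-form fit as a simple stand-in for the quasi-Newton method mentioned in the text; it is not the authors' procedure. The sample data are synthetic, generated from (10.15) itself with c′ = 1.03, and the function name is illustrative.

```python
import math

def fit_c_prime(samples, k_prime=0.5):
    """Least-squares estimate of c' from (measured delay in ms, A', (tau_e)min in ms) triples."""
    a = math.log10(1.0 / k_prime)
    num = den = 0.0
    for dt_ms, amp, tau_ms in samples:
        u = dt_ms - a * tau_ms         # part of the delay not explained by the k' term
        v = -math.log10(amp) * tau_ms  # regressor that multiplies c'
        num += u * v
        den += v * v
    return num / den

# Synthetic observations generated with c' = 1.03 (not measured data).
conditions = [(0.562, 40.0), (0.282, 40.0), (0.562, 60.0), (0.282, 60.0)]
samples = [((math.log10(2.0) - 1.03 * math.log10(amp)) * tau, amp, tau)
           for amp, tau in conditions]
c_estimate = fit_c_prime(samples)
```

On noise-free synthetic data the estimate recovers the generating value; with real judgments it would return the c′ that best matches the measured preferred delays in the least-squares sense.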

Table 10.6 The coefficient c′ for each cellist in (10.15), calculating the preferred delay time of reflection for individual results and the global result, with k′ = 1/2 (fixed)

Fig. 10.40

Scale values of preference for each of five cellists as a function of the delay time of the single reflection normalized by its most preferred delay time calculated by (10.15). : music motif I, −15 dB; : music motif I, −21 dB; : music motif II, −15 dB; : music motif II, −21 dB. The regression curve is expressed by (10.9), i = 2

Subjective Responses as a Function of [Δt 1]p/(τ e)min

Figure 10.41 shows the relative amplitude of the single reflection to that of the direct sound for the preference of cello soloists as a function of the delay time of the single reflection normalized by the minimum value of the effective duration (τ e)min of the running ACF of the source signal. Several other subjective responses in terms of the amplitude are shown together as a function of the delay time of the single reflection normalized by the value of the effective duration τ e of the long-time ACF. All these values can generally be expressed by (10.4) in relation to the effective duration of the ACF with the constants k and c, which depend on the subjective response concerned. An alto-recorder soloistʼs preference is also plotted in this figure [10.1]. The values for performers are below or close to the threshold of perception of listeners [10.76]. These results reconfirm the phenomenon of missing reflection for performers.

Fig. 10.41

Relative amplitude of the single reflection for the subjective preference of cello soloists as a function of the delay time of a single reflection normalized by the value of (τ e)min. Also, the amplitudes of several subjective responses as a function of the delay time of the single reflection normalized by the value of τ e of the source signal. Note that threshold has been rearranged using the typical ACF of the speech signal (after [10.1])

In order to blend the source music under performance and the sound field in a given concert hall, a performer can, to some extent, control the value of (τ e)min by introducing vibrato. Such an introduction of vibrato may decrease the value of (τ e)min to obtain a more preferred condition for listeners as well, even when the reverberation time in a given concert hall is short [10.41,42].

Acoustical Measurements of the Sound Fields in Rooms

Acoustical measurements were made in an existing hall for the purpose of testing acoustic factors that were calculated using the architectural scheme at the design stage. Also, subjective preference judgments for different source locations on the stage were performed by the paired-comparison test at each set of seats. The relationship between the resulting scale values of subjective preference and the physical factors obtained by simulation using architectural plan drawings was examined by factor analysis. The accumulation and understanding of the field data, in turn, may improve details of future methods for calculating acoustic factors.

Acoustic Test Techniques

Binaural Impulse Response

A diagnostic system for measuring the impulse response at the two ear entrances determining the four orthogonal factors, and for further evaluation of the subjective attributes of sound field at each seat in a hall is shown in Fig. 10.42. A test signal is radiated from the loudspeaker to measure impulse responses using two small microphones placed at the ear entrances of a real head (1.1 m above the floor). Then spatial factors associated with the right-hemisphere specialization (LL and IACC) and the temporal factors of left-hemisphere specialization (Δt 1 and T sub) are analyzed. When the effective duration of the ACF of the source signal (τ e) is calculated, the total scale value may be obtained by adding the scale values of the orthogonal factors referred to the most preferred conditions. The value of (τ e)min is used to determine the most preferred temporal values for [Δt 1]p and [T sub]p (Sect. 10.1.2). If the source signal is fed into the ACF processor, then outputs (numbered 1–4) may be used to control the sound field simultaneously with an electro-acoustic system, without any manual adjustment, preserving the preferred conditions of the four factors.

Fig. 10.42

A system for measuring the four orthogonal factors of sound fields and evaluating subjective qualities at each seat in a room. TS: test signal (maximum-length sequence signal); IPR: impulse response analyzer; RH: right-hemispheric factors (listening level and the IACC); LH: left-hemispheric factors (Δt 1, T sub, and the A value); CP: comparators with the most preferred condition based on the effective duration of ACF, (τ e)min; ACF: autocorrelation function; SIG: source signals; g r(x): scale values from the right-hemispheric factors; g l(x): scale values from the left-hemispheric factors; Σ: total scale value of subjective preference

Examples of measuring binaural impulse responses at a seat close to the stage (seat a, left ear) and at a rear seat b (right ear) in the Kirishima International Concert Hall are demonstrated in Fig. 10.43. In this measurement, an omnidirectional dodecahedron loudspeaker with 12 full-range drivers was placed on the stage 1.5 m above the floor as a sound source. The total amplitude of reflections A at a seat close to the stage is usually smaller than that at a seat far from the stage.

Fig. 10.43

Examples of the impulse response measured at seats: (a) (near to the stage) and (b) (far from the stage) in the Kirishima International Concert Hall. The amplitude of the impulse response measured at seat a is attenuated because of a strong direct sound

Another powerful signal to be radiated from the loudspeaker to measure the binaural impulse responses is the pulse signal generated by inverse Fourier transformation [10.77].

Reverberation Time

After the impulse response is obtained, the reverberation time is measured by Schroederʼs method [10.78,79]. The integrated decay curve as a function of time may be obtained by squaring and integrating the impulse response of the sound field in a room, such that

$$s^2(t) = K \int_{t}^{t+T} h^2(x)\,\mathrm{d}x,$$
(10.16)

where the time T should be chosen sufficiently longer than the reverberation time.
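
The integration in (10.16) can be sketched in a few lines. This is a minimal illustration rather than the measurement procedure used in the hall: it assumes a sampled impulse response `h` at sampling rate `fs`, takes the upper limit t + T as the end of the recorded response, normalizes away the constant K, and estimates the reverberation time from a least-squares line fitted between −5 and −35 dB (the interval mentioned later in this section). All function names are hypothetical.

```python
import math

def schroeder_decay_db(h):
    """Backward-integrated squared impulse response, in dB re its value at t = 0."""
    energy = 0.0
    tail = []
    for sample in reversed(h):
        energy += sample * sample
        tail.append(energy)
    tail.reverse()
    total = tail[0]
    # Trailing zero-energy samples (if any) are dropped to avoid log10(0).
    return [10.0 * math.log10(e / total) for e in tail if e > 0.0]

def reverberation_time(h, fs, upper_db=-5.0, lower_db=-35.0):
    """Fit a line to the decay curve between upper_db and lower_db; extrapolate to -60 dB."""
    decay = schroeder_decay_db(h)
    pts = [(i / fs, d) for i, d in enumerate(decay) if lower_db <= d <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second, negative
    return -60.0 / slope

# Synthetic check: an ideal exponential decay whose energy falls by 60 dB in 2 s.
fs = 8000
rt_true = 2.0
a = 3.0 * math.log(10.0) / rt_true
h = [math.exp(-a * n / fs) for n in range(4 * fs)]
rt_est = reverberation_time(h, fs)
print(round(rt_est, 2))
```

The synthetic response is twice as long as its reverberation time, in line with the requirement that T be sufficiently longer than the reverberation time; a record that is too short bends the end of the decay curve downward and biases the fit.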

Fig. 10.44

Integrated decay curves obtained from the impulse responses at the two ear entrances, at seat a in the Kirishima International Concert Hall

For the 500 Hz octave band, examples of the measured decay curve and the decay rate of both left and right ears at seating position a are shown in Fig. 10.44. The reverberation times measured are both 2.07 s. The measured reverberation times with octave band filters in the Kirishima International Concert Hall (without audience) are plotted as filled circles in Fig. 10.45. The empty circles are estimated values of the reverberation time for a full audience. It is worth noting that Jordan [10.80] showed that the values of the early decay time (EDT) measured over the first 10 dB of decay are close to values of the reverberation time averaged over the interval from −5 to −35 dB.

Fig. 10.47

Measured IACC as a function of the integration interval 2T of the impulse responses for each octave band range. The value of IACC converged for 2T > 200 ms (after [10.81])

Fig. 10.45

Reverberation time measured in the Kirishima International Concert Hall. Filled circles: measured values without audience; empty circles: estimated values with full audience

The total amplitude of reflection A, defined by (10.5), is obtained as its square

$$A^2 = \frac{\int_{\varepsilon}^{\infty} h^2(x)\,\mathrm{d}x}{\int_{0}^{\varepsilon} h^2(x)\,\mathrm{d}x},$$
(10.17)

where ε signifies a small delay time just large enough to cover the duration of the direct sound.
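
A minimal sketch of (10.17) for a sampled impulse response follows. The function name and the example values (a 1 ms window for the direct sound and two reflections of amplitude 0.5) are hypothetical, and the infinite upper limit is replaced by the end of the recorded response.

```python
import math

def total_reflection_amplitude(h, fs, epsilon_s):
    """A = sqrt(energy after epsilon / energy within [0, epsilon)), per Eq. (10.17)."""
    split = int(round(epsilon_s * fs))
    direct_energy = sum(x * x for x in h[:split])
    reflection_energy = sum(x * x for x in h[split:])
    return math.sqrt(reflection_energy / direct_energy)

# Hypothetical response: unit direct sound followed by two reflections of amplitude 0.5.
fs = 48000
h = [0.0] * 4800
h[0] = 1.0
h[960] = 0.5    # reflection at 20 ms
h[1920] = 0.5   # reflection at 40 ms
a_value = total_reflection_amplitude(h, fs, epsilon_s=0.001)
print(round(a_value, 3))
```

In practice the choice of ε matters: it has to cover the full duration of the direct sound without capturing the first reflection, so the time origin of `h` is assumed here to coincide with the arrival of the direct sound.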

Measurement of Acoustic Factors at Each Seat in a Concert Hall

In Sect. 10.3.1 we discussed the seat selection system, designed for the purpose of enhancing individual satisfaction. To begin with, four orthogonal factors are measured at each seat in a concert hall [10.66,67]. Measured values of the listening level (LL), the total amplitude of reflection (A), the initial time delay gap (Δt 1) between the direct sound and the first reflection excluding the reflection from the floor, and the IACC at each seat in the Kirishima International Concert Hall are shown in Fig. 10.46. The reverberation times at all the seats had almost the same value, about 2.05 s for the 500 Hz band.

Fig. 10.46

Orthogonal factors measured at each seat in the Kirishima International Concert Hall, other than the reverberation time, which is almost constant throughout the hall: (a) listening level; (b) Δt 1; (c) A value, the total amplitude of reflections; (d) IACC

Even though the final scheme of the concert hall was changed in terms of the width of the hall (one meter wider) from the scheme at the design stage, values of each physical factor measured as shown in Fig. 10.46 are not very different from the values calculated (Fig. 10.22).

Recommended Method for IACC Measurement

There are two purposes for measuring IACC, as needed for subjective evaluations and acoustic comparison of existing halls:

  1. In order to evaluate the subjective quality of the sound field in an existing hall, the IACF (with values of IACC, τ IACC and W IACC defined in Fig. 10.4 as well as LL) together with the other three factors Δt 1, T sub and A are measured. Without any octave band filtering, measurements must be performed after passage of the music or the speech signal through an A-weighting network, under conditions identical to those of the subjective judgment.

  2. In order to compare values of the IACC as well as T sub for the sound field in existing halls, measurements with octave band filtering are performed. With a fixed sound source on the stage, the IACC is defined by a long integration interval, which includes the effects of the direct sound and all reflections, including reverberation, without any temporal subdivisions.

A typical example of measuring the IACC as a function of the integration interval, which was performed in Symphony Hall, Boston, is shown in Fig. 10.47 [10.81]. It is remarkable that the measured values of IACC almost converged for 2T ≈ 200 ms, and the values do not differ much for longer intervals.
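
The dependence of the IACC on the integration interval can be explored with a short sketch such as the one below. It is an illustration under stated assumptions, not the analyzer used for the measurements: the IACC is taken as the maximum absolute value of the normalized interaural cross-correlation function within |τ| ≤ 1 ms, a positive lag means the right-ear signal lags the left-ear signal, no octave band filtering is applied, and all function names and the test signal are hypothetical.

```python
import math
import random

def iacc(left, right, fs, max_lag_s=0.001):
    """Return (IACC, tau_IACC in seconds) for two ear signals of equal length."""
    n = min(len(left), len(right))
    norm = math.sqrt(sum(x * x for x in left[:n]) * sum(y * y for y in right[:n]))
    max_lag = int(round(max_lag_s * fs))
    best_value, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        start, stop = max(0, -lag), min(n, n - lag)
        value = sum(left[i] * right[i + lag] for i in range(start, stop)) / norm
        if abs(value) > abs(best_value):
            best_value, best_lag = value, lag
    return abs(best_value), best_lag / fs

def iacc_vs_interval(h_left, h_right, fs, intervals_s):
    """IACC computed from the first 2T seconds of the responses, for each 2T given."""
    result = []
    for interval in intervals_s:
        count = int(round(interval * fs))
        result.append(iacc(h_left[:count], h_right[:count], fs)[0])
    return result

# Hypothetical check signal: the right-ear signal is the left-ear signal delayed by 10 samples.
fs = 48000
rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(2000)]
right = [0.0] * 10 + left[:-10]
value, tau = iacc(left, right, fs)
curve = iacc_vs_interval(left, right, fs, [0.01, 0.02, 0.04])
```

For measured binaural impulse responses, plotting `curve` against the interval 2T gives a convergence curve of the kind shown for each octave band in Fig. 10.47.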

If a room is used for dance, ice-skating or a party, then the listeners face in various directions. In this case, the values of IACC and τ IACC are measured as a function of the direction of the head. The results measured with 500 Hz octave band noise in an oblong atrium of a hotel at a distance of 10 m from the source position are demonstrated in Fig. 10.48 [10.82]. When the listener is facing the sound source, IACC = 0.41 and τ IACC = 0, and thus no image shift occurs. These values are nearly unchanged for head directional angles less than 30°. When the listener is facing the lateral side at 90°, the IACC is greater than 0.50 and τ IACC is about 600 μs, due to the interaural delay time.

Fig. 10.48

Measured IACC and τ IACC as a function of head direction relative to the sound source in a narrow hotel atrium (after [10.82])

Subjective Preference Test in an Existing Hall

The subjective preference judgments for different source locations on the stage at Uhara Hall, Kobe, were performed by a paired-comparison test at each set of seats. The relationship between the resulting scale value of subjective preference and the physical factors obtained by calculation, using architectural plan drawings, was examined by factor analysis [10.15]. Calculated scale values of subjective preference were reconfirmed for the Uhara Hall (Fig. 10.49). The physical factors at each set of seats for four source locations on the stage were calculated. In the simulation, the directional characteristics of the four loudspeakers used in preference tests were not taken into consideration for the sake of convenience. The simulation calculation was performed up to the third order of reflection. Due to a floor structure with a fair amount of acoustic transparency, the floor reflection was not taken into account for the calculation, and part of the diffuser ceiling was regarded as a nonreflective plane for the sake of convenience. In the calculation of the IACC, the listeners faced toward the center of the stage, so that the IACC was not always a maximum at the interaural time delay τ = 0.

Fig. 10.49

(a) Plan; (b) and (c) cross sections of the Uhara Hall, Kobe. Four source positions, 1, 2, 3 and 4 on the stage, which were switched in the paired-comparison test of subjective preference without moving subjects from seat to seat. There were 21 listening positions including neighboring seats

The hall contains 650 seats with a volume of 4870 m³. Four loudspeakers with identical characteristics were placed 0.8 m above the stage floor, and sixty-four listeners, divided into 21 groups, were seated in the specified sets of seats. Without moving from seat to seat and excluding the effects of other physical factors such as visual and tactual senses on judgments, subjective preference tests by the paired-comparison method were conducted, switching only the loudspeakers on the stage. As a source signal, music motif B was selected in the tests. Scale values of preference were obtained by applying the law of comparative judgment and were reconfirmed by the goodness of fit [10.83,84]. The session was repeated five times, exchanging seats, and thus data for 14–16 subjects in total were obtained for each set of seats.
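
To illustrate how scale values can be derived from paired-comparison data, the sketch below applies Thurstoneʼs Case V simplification of the law of comparative judgment. The text does not state which case was used, so Case V is an assumption here, as are the function name, the clipping of unanimous proportions, and the preference counts, which are invented for illustration and are not data from the Uhara Hall test. The goodness-of-fit check mentioned above is not implemented.

```python
from statistics import NormalDist

def case_v_scale_values(wins, clip=0.01):
    """Case V scale values; wins[i][j] = times stimulus i was preferred over stimulus j."""
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # a stimulus compared with itself contributes z = 0
            p = wins[i][j] / (wins[i][j] + wins[j][i])
            p = min(max(p, clip), 1.0 - clip)  # avoid infinite z for unanimous pairs
            z_sum += inv_cdf(p)
        scale.append(z_sum / n)
    return scale

# Hypothetical counts for four source positions judged by 16 subjects per pair.
wins = [
    [0, 9, 12, 5],
    [7, 0, 11, 4],
    [4, 5, 0, 2],
    [11, 12, 14, 0],
]
scale = case_v_scale_values(wins)
print([round(s, 2) for s in scale])
```

The resulting values lie on an interval scale with an arbitrary origin; in this formulation they sum to zero, so only differences between stimuli are meaningful.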

Results of Multiple-Dimensional Analyses

In order to examine the relationship between scale values of subjective preference and physical factors obtained by simulation of an architectural scheme, the data were analyzed by factor analysis [10.78,79,85].

Of the four orthogonal factors, the reverberation time was almost constant for the source location and the seat location throughout the hall, and thus was not involved in the analysis. As previously discussed, as a condition for calculating the scale value of preference, the maximum value of the interaural cross-correlation function must be maintained at τ IACC = 0 to ensure frontal localization of the sound source. However, the maximum was not always maintained at τ = 0 due to the loudspeaker locations, because the subjects were facing the center of the stage. In this analysis, therefore, the interaural time delay of the IACC was added as an additional factor. Thus, the outside variable to be predicted with factor analysis was the scale value obtained by subjective judgments, and the explanatory factors were: (1) the listening level, (2) the initial time delay gap, (3) the IACC, and (4) the interaural time delay (τ IACC).

Scores for Each Factor

The scores for each category of the factors obtained from the factor analysis are shown in Fig. 10.50. As shown in Fig. 10.50a, the scores for the listening level indicate a peak at the subcategory of 83–85.9 dB, with decreasing scores moving away from the preferred listening level. For the IACC, the preference score increases with a decrease in the IACC (Fig. 10.50c). It is worth noting that the scores for the aforementioned two factors are in good agreement with preference scale values obtained from preference judgments for a simulated sound field (Fig. 10.7a and 10.7d). The scores of the initial time delay gap normalized to the optimum value (Δt 1/[Δt 1]p) peaked at smaller values (Fig. 10.50b) than the most preferred value of the initial time delay gap obtained from the simulated sound fields (Fig. 10.7b). It is considered that, due to the limited range of Δt 1 in the existing concert hall and the limited data for the short range of Δt 1, the effect of Δt 1 of the sound fields was rather minor in this investigation. Concerning τ IACC, as shown in Fig. 10.50d, the score decreases monotonically as the delay is increased. This may be caused by an image shift without balancing of the sound field.

Fig. 10.50

Scores for each category of four physical factors obtained by the factor analyses. The number indicated at the upper left part of each figure signifies the partial correlation coefficient between the score and each factor. (a) Listening level; (b) normalized initial time delay gap between the direct sound and the first reflection; (c) IACC; (d) interaural time delay of the IACC, τ IACC, found as the most significant factor in this investigation with loudspeaker reproductions on the stage. Tendencies obtained here are similar to those of the scale value shown in Fig. 10.7, which were obtained from the simulated sound field

Measured and Calculated Subjective Preference Values

The relationship between the scale value obtained by subjective judgments and the total score at each center of three or four seats is shown in Fig. 10.51. The scale values of preference are well predicted with the total score for four loudspeaker locations (r = 0.70, p < 0.01). In some cases there is a certain degree of apparent coherence between physical factors, for example, the calculated listening level and the IACC for sound fields in existing concert halls. However, these factors are theoretically orthogonal, and therefore the preference scores obtained were in good agreement with the calculated preference scale values obtained by the simulation.

Fig. 10.51

Relationship between the scale value of subjective preference obtained by the paired-comparison test in the existing hall and the total score calculated by using the scores shown in Fig. 10.50. The correlation coefficient was r = 0.70 (p < 0.01)

So far the subjective preference of source locations on the stage has been examined at each set of seats. The rear source position (#4) on the stage is preferred over the other source locations. The side source position (#3) indicates a low preference, due to the interaural time delay. The initial time delay gap had only a small influence on the total score because of its limited range.

Conclusions

Results of the analysis demonstrate that the theory of calculating subjective preference by the use of orthogonal parameters obtained in the laboratory is supported by experiments in a real hall. This may hold only when the maximum value of the interaural cross-correlation is maintained at τ = 0. However, this condition is usually realized by introducing certain diffusing elements on the sidewalls in a real concert hall when the listeners face a performer.

Cerebral hemisphere specialization in the human brain may play an important role in the independent effects of temporal and spatial factors on preference, as discussed in Sect. 10.1.5. The scale values of subjective preference of both the sound field and the visual field have been described by both temporal and spatial factors [10.86]. Accordingly, a generalized theory of environmental planning incorporating the temporal and spatial values for the left and right cerebral hemispheres, respectively, has been proposed by blending the built environment and nature [10.87]. In particular, examples of temporal design associated with the left hemisphere are open for discussion, introducing the third stage of human life [10.88].