Introduction

The perception of speech rhythm is a key issue in psycholinguistics. Speech rhythm is a major component of prosody, distinct from intonation and from phonetic-phonological cues. Knowing the rhythmic typology of a language is important, because infants appear to begin processing and learning language from the rhythm of their native language. Some hypotheses hold that discrimination between languages, acquisition of syllable structure, rudimentary segmentation of fluent speech, and acquisition of the Head-Complement order of phrases all develop through rhythm, so that the rhythmic typology of a language is tied to such knowledge (Cutler et al. 1986; Otake et al. 1993; Jusczyk et al. 1993; Kemler Nelson et al. 1989; Christophe et al. 2003a, b).

To learn their native language, infants must first tune in to it and separate it from other languages; otherwise, language acquisition cannot proceed. Several hypotheses about infants’ ability to discriminate between languages have been proposed and tested: (a) infants might discriminate utterances of their native language from those of any other language, because every language has its own distinctive phonetic, phonological and prosodic system (Moon et al. 1993; Bahrick and Pickens 1988); (b) for the same reasons, infants might also distinguish between utterances of two different foreign languages (Mehler et al. 1988; Mehler and Christophe 1995); and (c) infants might distinguish languages according to a small number of rhythmic categories, at a stage when they have no a priori linguistic knowledge (Mehler et al. 1996; Nazzi et al. 1998; Ramus et al. 1999; Christophe and Morton 1998). Several recent findings make the third of these, the “rhythm-based language discrimination hypothesis”, surprisingly convincing.

The idea of classifying languages as “stress-timed”, “syllable-timed” or “mora-timed” according to different impressions of timing has been proposed by linguists (Lloyd James 1940; Pike 1945; Abercrombie 1967; Ladefoged 1975). “Timing” in speech refers to its rhythmic qualities. In principle, there are three ways to assign time units in a language: to each syllable, to each mora, or to each stressed syllable. In syllable-timed languages, every syllable has roughly the same duration, while in mora-timed languages, every mora does. In stress-timed languages, neither syllables nor morae constitute equal temporal units; instead, the interval between two consecutive stressed syllables is roughly constant. Linguists hypothesized that “isochronic units” are the source of the distinctive impressions of these rhythms (Pike 1945; Abercrombie 1967; Catford 1977; Kiparsky 1979; Selkirk 1980; Lehiste 1972; Shockey et al. 1972; Kozhevnikov and Chistovich 1965), but measurements failed to find physically isochronic units (Classe 1939; Bolinger 1965; De Manrique and Signorini 1983; Wenk and Wioland 1982), and these classifications were abandoned.

Even though researchers have failed to find real, objective isochronic units in speech, two kinds of phenomena have recently been considered as the basis of such rhythms: the imprecise perception of physical isochrony by the human cognitive system, such as the “perceptual center effect” (Allen 1975; Lehiste 1977; Morton et al. 1976; Fowler 1979), and phonological processes that maintain temporal regularity within words, such as “compression” (Dasher and Bolinger 1982; Dauer 1983). Moreover, decisive evidence for the existence of such rhythmic categories has recently been obtained: infants discriminate between languages on the basis of categorical perception of rhythm (Nazzi et al. 1998; Ramus et al. 1999). These findings provide strong evidence that rhythm categories exist.

However, there is no firm agreement among linguists on the rhythm typology of Korean. Some researchers have considered it syllable-timed, others stress-timed (Han 1964; Ko 1988; Ji 1993; Park 1990), but no one has proposed that it is mora-timed. These proposals were based not on perceptual research but solely on theoretical hypotheses.

The rhythm category of Korean can be determined by observing how adults discriminate the pure rhythms of three language pairs: Korean–Italian (Italian being syllable-timed), Korean–English (stress-timed), and Korean–Japanese (mora-timed). According to recent psychological findings, when listeners, whether infants or adults, hear only the pure rhythms, they distinguish languages that belong to different rhythmic categories but not languages that belong to the same category.

Here, we present perceptual experiments designed both to confirm the existence of rhythm classes and to determine the rhythm typology of Korean speech.

Experiment

Construction of material

The method developed by Ramus et al. (1999) was used in the present research. For this study, sentences in four languages (English, Italian, Japanese, Korean) were constructed, recorded, and digitized at 16 kHz. Four speakers per language (two men, two women) each read five sentences, giving 80 utterances in total (4 languages × 4 speakers × 5 sentences). The sentences were short declarative statements, matched across languages for number of syllables (about 20) and average duration (about 3 s).

The 80 sentences were segmented as precisely as possible into consonants and vowels in the sound editing software PRAAT, using both auditory and visual cues. Glides (/w/) were treated as consonants in pre-vocalic position within a syllable and as vowels in post-vocalic position.
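The glide rule can be made explicit in a few lines. The sketch below is a minimal illustration, assuming a hand-labelled segment sequence and approximating syllable position by adjacency: a glide directly followed by a vowel is treated as pre-vocalic (consonant); otherwise, a glide directly preceded by a vowel is treated as post-vocalic (vowel). The inventories `VOWELS` and `GLIDES` and the function `classify` are hypothetical and are not taken from the study, which segmented by hand in PRAAT.

```python
# Hypothetical inventories for illustration; the study segmented by hand in PRAAT.
VOWELS = {"a", "e", "i", "o", "u"}
GLIDES = {"w"}  # the text names /w/; other glides could be added here


def classify(segments):
    """Label each segment 'C' or 'V', resolving glides by position.

    Assumed approximation of syllable position:
    - glide followed by a vowel       -> pre-vocalic  -> 'C'
    - otherwise, glide after a vowel  -> post-vocalic -> 'V'
    - glide with no adjacent vowel    -> 'C' (default)
    """
    labels = []
    for k, seg in enumerate(segments):
        if seg in VOWELS:
            labels.append("V")
        elif seg in GLIDES:
            nxt = segments[k + 1] if k + 1 < len(segments) else None
            prv = segments[k - 1] if k > 0 else None
            if nxt in VOWELS:
                labels.append("C")  # onset glide, e.g. /wa/
            elif prv in VOWELS:
                labels.append("V")  # offglide, e.g. /aw/
            else:
                labels.append("C")
        else:
            labels.append("C")
    return labels


print(classify(["k", "w", "a"]))  # ['C', 'C', 'V']
print(classify(["a", "w"]))       # ['V', 'V']
```

Checking the following segment first means that an intervocalic glide, as in /awa/, is assigned to the onset of the second syllable, which is one common syllabification choice rather than something the text specifies.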

We then measured the durations of vocalic and consonantal intervals, on the assumption that infants distinguish only vowels (‘energy’) from consonants (‘obstacle’). A vocalic interval is a run of consecutive vowels and a consonantal interval a run of consecutive consonants, regardless of word boundaries. For example, the phrase ‘il mio amico’ is segmented as follows: /i/ /lm/ /ioa/ /m/ /i/ /c/ /o/.
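Merging segments into intervals amounts to grouping consecutive segments that share a consonant/vowel label and summing their durations. The sketch below assumes each segment already carries a measured duration in milliseconds and, for the demonstration only, uses the five orthographic vowels to decide C or V; the function `intervals` and its inputs are hypothetical.

```python
from itertools import groupby

# Demonstration only: orthographic vowels stand in for vocalic segments.
VOWELS = set("aeiou")


def intervals(segments, durations_ms=None):
    """Merge runs of consonants or vowels into (label, segments, total_ms) tuples."""
    if durations_ms is None:
        durations_ms = [0] * len(segments)
    if len(durations_ms) != len(segments):
        raise ValueError("need one duration per segment")
    labelled = [("V" if s in VOWELS else "C", s, d) for s, d in zip(segments, durations_ms)]
    result = []
    # groupby starts a new group wherever the C/V label changes
    for label, run in groupby(labelled, key=lambda item: item[0]):
        run = list(run)
        text = "".join(seg for _, seg, _ in run)
        total = sum(d for _, _, d in run)
        result.append((label, text, total))
    return result


# 'il mio amico' with spaces removed: word boundaries are ignored
print([text for _, text, _ in intervals(list("ilmioamico"))])
# hypothetical durations (ms) for /l/, /m/, /a/
print(intervals(["l", "m", "a"], [50, 60, 100]))
```

The first call yields the intervals i, lm, ioa, m, i, c, o, matching the ‘il mio amico’ example; the second shows durations of adjacent consonants being summed into one consonantal interval (110 ms).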

The resulting durations were fed into the MBROLA software, which synthesizes speech by diphone concatenation, using a French diphone database. French diphones were chosen to keep the discrimination tasks neutral with respect to the languages tested. To remove phonetic-phonological cues, all sentences were transformed into “sasasa” form, in which every consonant is replaced by /s/ and every vowel by /a/. All “sasasa” sentences were then synthesized as “flat sasasa”, with a constant fundamental frequency of 230 Hz, to eliminate intonation. In this way, we created stimuli that preserve only the syllabic rhythm of the original sentences (see Appendix).
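The sasasa substitution and flat-pitch synthesis can be sketched as a generator of MBROLA `.pho` input, where each line gives a phoneme, its duration in milliseconds, and optional (position %, F0 Hz) pitch targets, `_` denotes silence and `;` starts a comment. This is an illustration under assumptions: one /s/ per consonantal interval and one /a/ per vocalic interval, 100-ms silences at the edges, a voice using the SAMPA symbols `s` and `a`, and the function name `flat_sasasa_pho` are ours, not the study's.

```python
def flat_sasasa_pho(intervals_ms, f0_hz=230, edge_pause_ms=100):
    """Build MBROLA .pho text from (label, duration_ms) C/V intervals.

    Each consonantal interval becomes one /s/ and each vocalic interval one /a/,
    keeping its duration. Every segment gets the same F0 target at 0% and 100%
    of its duration, so the pitch contour is flat.
    """
    lines = ["; flat sasasa (sketch)", f"_ {edge_pause_ms}"]
    for label, dur in intervals_ms:
        phone = "a" if label == "V" else "s"
        lines.append(f"{phone} {round(dur)} 0 {f0_hz} 100 {f0_hz}")
    lines.append(f"_ {edge_pause_ms}")
    return "\n".join(lines) + "\n"


# hypothetical intervals for the start of a sentence
pho = flat_sasasa_pho([("V", 80), ("C", 110), ("V", 190)])
print(pho)
```

The resulting text could then be written to a file and passed to the `mbrola` command line with a French voice, e.g. `mbrola fr1/fr1 sentence.pho sentence.wav`; the voice name and database path are assumptions that depend on the local installation.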

Procedure

The AAX paradigm developed by Ramus et al. (2003) was used. For each language pair (Korean vs. English, Korean vs. Italian, Korean vs. Japanese), subjects completed 20 oddball trials in the AAX format. In each trial, the first two sentences (AA), from the same language, served as context, and the third sentence (X) was in either the same language or the other language of the pair. After hearing each AAX sequence, subjects answered “yes” or “no” to the question “Is the third sentence (X) in the same language as the previous two (AA)?”

For each language pair, the context sentences (AA) were taken, in random order, from the first two speakers of the context language (speakers 1 and 2 in the Appendix), and the test sentence (X) was taken from one of the last two speakers (speakers 3 and 4), chosen at random. Each language pair had 20 trials, and within each AAX trial the three sentences were separated by 500-ms pauses.
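The trial structure can be sketched as follows. Several details are not specified in the text and are assumptions here: the context language is drawn at random from the pair, half of the 20 trials are “same” and half “different”, and sentence numbers are drawn at random from 1 to 5. The speaker constraints (context from speakers 1 and 2 in random order, test from speaker 3 or 4) and the 500-ms pauses follow the description of the procedure. The function names are hypothetical.

```python
import random

PAUSE_MS = 500  # silence between A, A and X


def make_aax_trials(lang_a, lang_b, n_trials=20, sentences_per_speaker=5, seed=None):
    """Build AAX trials for one language pair as (language, speaker, sentence) triples.

    Assumed, not from the text: random context language, half 'same' trials,
    random sentence numbers.
    """
    rng = random.Random(seed)
    same_flags = [True] * (n_trials // 2) + [False] * (n_trials - n_trials // 2)
    rng.shuffle(same_flags)
    trials = []
    for same in same_flags:
        context = rng.choice([lang_a, lang_b])
        other = lang_b if context == lang_a else lang_a
        speakers = [1, 2]
        rng.shuffle(speakers)  # speakers 1 and 2 in random order
        aa = [(context, spk, rng.randrange(1, sentences_per_speaker + 1)) for spk in speakers]
        x_lang = context if same else other
        x = (x_lang, rng.choice([3, 4]), rng.randrange(1, sentences_per_speaker + 1))
        trials.append({"stimuli": aa + [x], "same": same})
    return trials


def trial_timeline(durations_ms):
    """Onset time (ms) of each sentence in one trial, given the sentence durations."""
    onsets, t = [], 0
    for d in durations_ms:
        onsets.append(t)
        t += d + PAUSE_MS
    return onsets


print(make_aax_trials("Korean", "Japanese", seed=1)[0])
print(trial_timeline([3000, 3100, 2900]))  # [0, 3500, 7100]
```

Balancing “same” and “different” trials keeps the hit and false alarm rates estimated from equal numbers of trials.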

Participants

Forty adult subjects (20 Koreans who did not know Italian and 20 Italians who did not know Korean) participated in each experiment.

Results

Signal detection theory was applied to assess discrimination between languages. Responses were converted into hit rates (the proportion of same-language trials correctly answered “yes”) and false alarm rates (the proportion of different-language trials incorrectly answered “yes”). Hit and false alarm rates were then converted into A′ discrimination scores, which range from 0 to 1, with chance level at 0.5.

A′ scores were calculated according to the following formula:

$$ \begin{aligned} H \geq F \to A' & = \frac{1}{2} + \frac{(H - F)(1 + H - F)}{4H(1 - F)} \\ H < F \to A' & = \frac{1}{2} - \frac{(F - H)(1 + F - H)}{4F(1 - H)} \\ \end{aligned} $$

Here, H is the hit rate and F is the false alarm rate.
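For concreteness, here is a minimal Python sketch of the A′ computation. It uses the usual two-branch form, in which the H < F branch subtracts from 1/2 so that below-chance responding gives A′ < 0.5, and it treats “yes, same language” as the signal response: H is the proportion of “yes” answers on same-language trials and F the proportion of “yes” answers on different-language trials. The function names and the per-subject response format are hypothetical.

```python
def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity index A' from hit rate H and false alarm rate F."""
    h, f = hit_rate, fa_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    if h == f:
        return 0.5  # no discrimination; also avoids 0/0 when H = F = 0 or H = F = 1
    if h > f:
        return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    return 0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))


def rates(responses):
    """Hit and false alarm rates from (x_was_same, answered_yes) pairs for one subject."""
    same = [yes for was_same, yes in responses if was_same]
    diff = [yes for was_same, yes in responses if not was_same]
    return sum(same) / len(same), sum(diff) / len(diff)


# hypothetical subject: 8/10 "yes" on same trials, 2/10 "yes" on different trials
resp = [(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 2 + [(False, False)] * 8
h, f = rates(resp)
print(h, f, a_prime(h, f))  # A' is about 0.875
```

Swapping H and F mirrors the score around 0.5 (for example, H = 0.2 and F = 0.8 give about 0.125), which is why a pair that listeners systematically confuse can score slightly below chance.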

The results of the language discrimination experiments are as follows:

Flat sasasa

| Language pair | A′ | SD | P* |
|---|---|---|---|
| Korean–Italian | 0.70 | 0.155 | <0.001 |
| Korean–English | 0.79 | 0.234 | <0.001 |
| Korean–Japanese | 0.47 | 0.065 | 0.398 |

*P values were obtained from two-tailed one-sample t tests with a test value of 0.5.
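The significance test in the table footnote can be sketched with the Python standard library, assuming one A′ score per subject. The two-tailed p value is obtained by integrating the Student t density with Simpson's rule; in practice `scipy.stats.ttest_1samp(scores, popmean=0.5)` performs the same test. The input scores below are hypothetical; the study's per-subject data are not reproduced here.

```python
import math
import statistics


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be even).
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    inner = sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area = (pdf(0.0) + pdf(abs(t)) + inner) * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def one_sample_t(scores, mu0=0.5):
    """Two-tailed one-sample t test of H0: mean(scores) == mu0 (chance level for A')."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)           # sample SD, n - 1 denominator
    t = (mean - mu0) / (sd / math.sqrt(n))  # undefined if all scores are identical
    df = n - 1
    return t, df, two_tailed_p(t, df)


# hypothetical per-subject A' scores
print(one_sample_t([0.62, 0.75, 0.81, 0.58, 0.70, 0.77]))
```

A small p value means that mean A′ differs reliably from 0.5; a large one means only that chance-level discrimination cannot be ruled out.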

The A′ scores in the first two cases show that rhythmic differences between Korean and Italian, a typical syllable-timed language, and between Korean and English, a typical stress-timed language, are perceived significantly above chance. In contrast, the A′ score for Korean and Japanese, a typical mora-timed language, is close to chance level, indicating that Korean belongs to the same rhythmic category as Japanese.

The standard deviations indicate how homogeneous the responses were within each language pair: responses were more homogeneous for Korean–Japanese (SD = 0.065) than for Korean–Italian (SD = 0.155) and Korean–English (SD = 0.234).

The null hypothesis that discrimination is at chance level (A′ = 0.5) is rejected in the first two cases (P < 0.001), whereas in the third case it cannot be rejected (P = 0.398).

Thus, subjects discriminated Korean from English and Korean from Italian above chance level, but failed to discriminate Korean from Japanese. These results indicate that Korean is neither a stress-timed language like English nor a syllable-timed language like Italian, but a mora-timed language like Japanese.

Discussion

In this study, we investigated the rhythm of Korean and found its rhythmic typology to be mora-timed, like that of Japanese, supporting the existence of the traditionally proposed rhythm categories. We also found that adults, like infants, can categorize languages by listening to stimuli that contain only syllabic rhythm. Korean was compared with English (stress-timed), Italian (syllable-timed), and Japanese (mora-timed): listeners perceived rhythmic differences between English and Korean and between Italian and Korean, whereas the absence of any perceived difference between Japanese and Korean indicates that the two share the same rhythm typology (mora-timed). The present method may also be useful for categorizing other languages whose rhythm typology has not yet been analyzed. Such knowledge is important, because previous studies have suggested that the human cognitive system categorizes languages into a limited number of classes on the basis of rhythm perception. Finally, speech rate was not evenly controlled and phoneme durations were not measured with full accuracy in our experiment; future research should address these limitations.