
1 Introduction

It is generally believed that words do not exist in isolation in the language user’s mind but are interconnected by meaning, forming a network with certain structural characteristics. Each word occupies a different position in the network, and words differ in how many other words they connect to and in how they connect. According to the principles of semantics, the semantic relations among lexical items are usually divided into the following types: synonymy, antonymy, hypernymy and hyponymy, whole-part, individual-collection, and component-total. However, Feng [1] pointed out that this classification excludes the collocation relationship, which is closely related to the semantic relationships. He argued that collocation characteristics are effective components of lexical meaning and that the collocation relationship is an indispensable part of the lexical semantic network. Previous studies that used word association tests to study the representation of the mental lexicon basically took the same view, regardless of whether they adopted the phonetic, horizontal-combination and vertical-aggregation triad, the phonetic-semantic-syntactic triad, or the triad of form-based, meaning-based and position-based responses. These studies either gave collocations and semantics equal weight or placed horizontal combination and vertical aggregation within semantic responses [2,3,4]. Based on the theory of connectionism, Xing [5] concluded that the vocabulary knowledge of Chinese as a Second Language (CSL) learners includes pronunciation, word form and meaning, and their mutual connections, among which meaning knowledge is the core. Collocation knowledge, in turn, is the core content of word relationship knowledge within meaning knowledge. On these grounds, collocation knowledge is an important part of the lexical semantic network. Therefore, we explore the development of the CSL productive semantic network at the level of lexical collocation.

Regarding adjective-noun collocations, research on the language system itself is quite abundant. There are, for example, studies on the classification and grammatical function of adjectives in adjective-noun collocations [6, 7]; studies on the omission of “的” in adjective-noun collocations and the factors that restrict it [8,9,10]; studies on the convertibility of adjective-noun collocations between attributive-headword structures and subject-predicate structures [11,12,13,14]; studies on the rules of mutual selection between adjectives and nouns in adjective-noun collocations [15,16,17]; and studies on adjective-noun collocations from the perspective of computational linguistics [18,19,20]. The use of adjectives and nouns and the acquisition of adjective-noun collocation knowledge are difficult but key aspects of CSL learning. However, in the field of CSL teaching and acquisition, research on adjective-noun collocations started relatively late. There are, for instance, corpus-based studies of international students’ acquisition of “的” in adjective-noun collocations [10, 21, 22]; an experimental study of international students’ acquisition of awareness of the adjective “向” [23]; and comparative studies limited to the acquisition of a certain type of adjectives and nouns by international students [24, 25].

Both the methods and the results of the above research have been a valuable source of inspiration for us. At the same time, we find that some problems still deserve further discussion. For example, do collocations of adjectives and nouns develop as Chinese proficiency improves? If so, what levels of development are involved?

Using an essay task titled “我的家乡” (My Hometown), we compared the written output of international students at different proficiency levels with the written output of native speakers. The collected compositions were segmented and tagged, and all adjective-noun collocations were counted. The high-frequency adjective was “大” (big), the high-frequency noun was “城市” (city), and “大城市” (big city) was a high-frequency collocation shared by CSL learners, with “大” and “城市” as core words. Based on this observation, we investigated whether the selection and collocation of adjectives and nouns by CSL learners develop as their Chinese language proficiency improves.

2 Research Design

2.1 Research Questions

In the composition “我的家乡” (My Hometown), do CSL learners at different levels share the same collocations of the high-frequency adjective “大” and the high-frequency noun “城市”? If not, what changes as the level goes up? How do CSL learners differ from native speakers in terms of adjective-noun collocations?

2.2 Research Methods

We took the common high-frequency adjective “大” and the common high-frequency noun “城市” in the composition “我的家乡” as the core vocabulary and exhaustively listed all adjective-noun collocations involving “大” and “城市”. For the adjective “大”, we looked at the set of nouns with which it collocated; for the noun “城市”, we looked at the set of adjectives with which it collocated. We then compared the similarities and differences between the sets produced by CSL learners and those produced by native speakers. The development of international students’ semantic collocation knowledge was observed through their semantic selection and use of nouns and adjectives and through the characteristics of the adjective-noun collocations in their output.

2.3 Research Objects

In the experimental group, the intermediate and advanced level students were all from universities in Beijing. There were 22 students at intermediate level, of whom 13 were from Asia (namely Korea, Japan, Thailand, Vietnam and Mongolia) and 9 were from other countries (namely France, Italy, Chile, Russia and Pakistan). There were 26 students at advanced level, of whom 14 were from Asia (namely Korea, Japan, Indonesia, Thailand and Mongolia) and 12 were from other countries (namely Germany, France, Poland, Belgium, Romania, Tunisia and Algeria). The reference group consisted of 20 Chinese students, all from universities in Beijing and all holding a bachelor’s or postgraduate degree.

2.4 Implementation Process

Because the two groups of subjects differed in Chinese proficiency and writing ability, the writing test was designed as a narrative about a common emotion and common experiences: “我的家乡” (My Hometown). The task was a composition on an assigned topic, and the test time was 50 minutes. Participants were required to write about 400 words, and no auxiliary materials such as dictionaries were allowed. The test requirements for the reference group were the same as those for the experimental group.

2.5 Data Collection

A total of 68 compositions were collected. The 22 intermediate students produced a total of 7,066 characters, the 26 advanced students a total of 11,206 characters, and the 20 native speakers a total of 9,058 characters. The original corpus collected in this study was handwritten on squared paper. To facilitate the analysis, the materials were transcribed into electronic documents and then annotated. The numbers of characters and words, as well as their frequencies, were then examined. First, the word segmentation and part-of-speech tagging tool “Corpus Word Parser 3.0” was used for preliminary segmentation and counting, while adjective-noun collocations were tagged and counted manually. On this basis, manual inspection, re-classification and statistical calculation were carried out. The tagging was done manually by two authors and cross-checked.

2.6 Measurement Indicators and Statistical Tools

  (1) Type: refers to the number of different words in the corpus.

  (2) Token: refers to the number of times a certain type of word is used in the corpus.

  (3) Lexical diversity (U): refers to the diversity of a certain type of word in the corpus, which can be calculated by the following formula:

    $$U = \frac{\left( \log \text{Tokens} \right)^{2}}{\log \text{Tokens} - \log \text{Types}}$$
    (1)

  (4) Semantic similarity (S): refers to the semantic similarity between two words. This paper uses a programming method to calculate semantic similarity based on the “同义词词林” (‘The dictionary of synonyms’) (Second Edition) [26]. The specific principle is as follows:

    The “同义词词林” (‘The dictionary of synonyms’) contains more than 640,000 entries. Each entry points to a semantic code consisting of five levels and one mark-bit. For example, “城市” points to “Cb25A01=”, where “C” is the first level, “b” is the second level, “25” is the third level, “A” is the fourth level, “01” is the fifth level, and “=” is the mark-bit. Mark-bits are mainly used to distinguish regular synonyms, related words, and independent words with neither synonyms nor related words, which are represented by the characters “=”, “#” and “@” respectively. We assign a weight to each level of the “同义词词林” (‘The dictionary of synonyms’), namely 1.2, 1.2, 1.0, 1.0, 0.8 and 0.4, whose sum is 5.6. Then, when calculating the semantic similarity, we first obtain the corresponding codes of the two words, and then we compare the codes one by one to see whether each level is equal. The weights of all equal levels are added up and divided by the sum to get the semantic similarity.

  (5) Collocation compactness (MI): refers to the degree of closeness of two words. In this paper, the mutual information value is used to represent the collocation compactness.

Mutual information is a concept from information theory that measures the statistical association between two random variables, and it is widely used in computational linguistics. In this paper we use it to measure the collocation strength of two words. The formula is as follows:

$$MI\left( X,Y \right) = \log_{2} \frac{P\left( X,Y \right)}{P\left( X \right)P\left( Y \right)}$$
(2)

P(X,Y) is the co-occurrence frequency of words X and Y, P(X) is the frequency of word X, and P(Y) is the frequency of word Y.
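
The three computed indicators of this section (U, S and MI) can be sketched in a few lines of Python. This is only an illustrative reading of the formulas and the weighting description above, not the program actually used in the study. Several points are assumptions: the logarithm in U is taken to base 10 (the text does not state the base), the probabilities in MI are estimated as relative frequencies over a common total count, and the function names and the second semantic code used in the usage note are hypothetical.

```python
import math

# Weights for the five code levels plus the mark-bit, as listed in the text (sum 5.6).
LEVEL_WEIGHTS = (1.2, 1.2, 1.0, 1.0, 0.8, 0.4)


def lexical_diversity(tokens: int, types: int) -> float:
    """U = (log Tokens)^2 / (log Tokens - log Types); base 10 is an assumption."""
    if tokens <= 0 or types <= 0 or types >= tokens:
        raise ValueError("U needs 0 < types < tokens")
    log_tokens = math.log10(tokens)
    return log_tokens ** 2 / (log_tokens - math.log10(types))


def split_code(code: str) -> tuple:
    """Split an 8-character code such as 'Cb25A01=' into five levels and the mark-bit."""
    if len(code) != 8:
        raise ValueError("expected an 8-character semantic code")
    return (code[0], code[1], code[2:4], code[4], code[5:7], code[7])


def semantic_similarity(code_a: str, code_b: str) -> float:
    """Sum the weights of all fields that are equal and divide by the total weight.

    Fields are compared one by one, following the description in the text.
    """
    fields = zip(LEVEL_WEIGHTS, split_code(code_a), split_code(code_b))
    matched = sum(weight for weight, a, b in fields if a == b)
    return matched / sum(LEVEL_WEIGHTS)


def mutual_information(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """MI(X, Y) = log2(P(X, Y) / (P(X) * P(Y))).

    Probabilities are estimated as counts divided by a common total (an assumption).
    """
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))
```

With these definitions, `lexical_diversity(100, 10)` gives 4.0, `mutual_information(8, 16, 32, 1024)` gives 4.0, and comparing “Cb25A01=” with a made-up code “Cb25B02=” gives 3.8 / 5.6, because the first, second and third levels and the mark-bit match.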

3 Statistical Results

3.1 Output Diversity of Adjectives and Nouns

First, we investigated the overall distribution of adjectives and nouns in the corpus. The statistical results showed that the diversity of adjectives produced by CSL learners increased with Chinese proficiency. Although it still differed considerably from the diversity of adjectives produced by native speakers, it was gradually approaching the target language level. The diversity of nouns produced by CSL learners, by contrast, did not improve noticeably with Chinese proficiency, and a large gap remained between learners and native speakers. See Table 1 and Fig. 1.

Table 1. Summary table of types, tokens and diversity of adjectives and nouns.
Fig. 1. Diversity line chart of productive adjectives and nouns.

3.2 The Lexical Semantic Adjective-Noun Set Composed of “大” and “城市”

Starting from the two high-frequency words “大” and “城市”, we extracted the adjective-noun collocations containing these two core words. For example, the pattern “(很/特/最/不) 大 (的) + N” yielded the set N, and the pattern “(很/特/最) A (的) + 城市” yielded the set A. The words in each set were labeled with semantic categories according to the semantic classification of “同义词词林” (‘The dictionary of synonyms’) (Second Edition). Then we used the MI formula to calculate the corresponding collocation compactness. See Table 2 and Table 3.

Table 2. The set of nouns N modified by “大” (For the semantic categories in the noun set, please refer to the word classification of “同义词词林” (‘The dictionary of synonyms’) (Second Edition) edited by Mei Jiaju et al. We made some adjustments for the words not included in the book).
Table 3. The set of adjectives A that qualifies “城市” (The adjectives in this paper are counted according to their broad meanings, including distinguishing words that express the attributes or distinguishing features of things).
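
The two extraction patterns described above can be illustrated with a minimal Python sketch over segmented, part-of-speech-tagged text. It is an assumption-laden illustration, not the procedure used in the study, where collocations were tagged manually. The input format (a list of `(word, tag)` pairs), the tag labels `"a"` for adjectives and `"n"` for nouns, and the function names are all hypothetical. Because the degree adverbs before the adjective are optional in both patterns, the sketch does not need to check them.

```python
from collections import Counter


def nouns_after_adjective(tagged, adjective="大"):
    """Collect N in the pattern 'adjective (的) + N'.

    `tagged` is a list of (word, tag) pairs; the tag "n" for nouns is an assumption.
    """
    found = Counter()
    for i, (word, _tag) in enumerate(tagged):
        if word != adjective:
            continue
        j = i + 1
        # Skip the optional particle 的 between adjective and noun.
        if j < len(tagged) and tagged[j][0] == "的":
            j += 1
        if j < len(tagged) and tagged[j][1] == "n":
            found[tagged[j][0]] += 1
    return found


def adjectives_before_noun(tagged, noun="城市"):
    """Collect A in the pattern 'A (的) + noun'.

    The tag "a" for adjectives is an assumption.
    """
    found = Counter()
    for i, (word, _tag) in enumerate(tagged):
        if word != noun:
            continue
        j = i - 1
        # Skip the optional particle 的 between adjective and noun.
        if j >= 0 and tagged[j][0] == "的":
            j -= 1
        if j >= 0 and tagged[j][1] == "a":
            found[tagged[j][0]] += 1
    return found


# A made-up tagged sentence for illustration only.
sample = [
    ("我", "r"), ("的", "u"), ("家乡", "n"), ("是", "v"),
    ("很", "d"), ("大", "a"), ("的", "u"), ("城市", "n"),
    ("，", "w"), ("有", "v"), ("大", "a"), ("公园", "n"),
]
noun_set = nouns_after_adjective(sample)
adjective_set = adjectives_before_noun(sample)
```

A segmenter may also output a collocation such as “大城市” as a single token, which a token-level pattern like this one would miss; that is one reason manual checking remains necessary.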

Taking “大” as the core word and looking at the nouns it collocated with, Table 2 shows one similarity between CSL learners and native speakers: the collocation compactness of [space] and [architecture] nouns with “大” was relatively low, as with “城市” (city) and “公园” (park). One difference was that, for native speakers, the collocation compactness of [human general term] nouns with “大” was also low, as with “人” (person). Taking “城市” as the core word and looking at the adjectives it collocated with, Table 3 shows that intermediate CSL learners produced [nature] adjectives that had low collocation compactness with “城市”, such as “热” (hot). Advanced CSL learners produced [representation], [shape], and [nature] adjectives that had low collocation compactness with “城市”, such as “美丽” (beautiful), “大” (big) and “热” (hot). Native speakers produced [human general term] nouns with “大” that also had low collocation compactness, such as “人” (person).

3.3 Adjective-Noun Collocations Formed by “大” and “城市”

From the above, the two sets A and N shared many semantic categories across the output of intermediate CSL learners, advanced CSL learners and native speakers. We used semantic similarity to roughly estimate the semantic differences among the three groups. In the production of nouns, the average semantic similarity between intermediate level CSL learners and native speakers was higher; in the production of adjectives, the average semantic similarity between advanced level CSL learners and native speakers was higher, as shown in Table 4:

Table 4. Summary table of types, tokens and diversity of adjectives and nouns.

Due to limited space, we drew semantic collocation diagrams only for the “城市”-based adjective-noun collocations listed in Table 3 and for the hyponyms related to “城市”, as shown in Figs. 2, 3 and 4:

Fig. 2. Adjective-noun collocation network diagram of “城市” produced by intermediate level CSL learners.

Fig. 3. Adjective-noun collocation diagram of “城市” produced by advanced level CSL learners.

Fig. 4. Adjective-noun collocation diagram of “城市” produced by native speakers.

With the improvement of Chinese proficiency, CSL learners produced increasingly varied adjective-noun collocations involving “城市”. Not only did the semantic categories of adjectives become more diverse, but the number of words within the same semantic category also increased; in the category [shape], for example, advanced level CSL learners produced “大”, “小” (small) and “小小的” (little). The use of hyponyms of “城市” also increased, but collocations of hyponyms with adjectives were still relatively few, which remained a difference from the way native speakers produced adjective-noun collocations involving “城市”.

4 General Discussion

4.1 Developmental Characteristics of CSL Learners Regarding Productive Adjectives and Nouns

Generally speaking, the adjectives produced by CSL learners were relatively diverse and covered many semantic types. With the improvement of Chinese proficiency, the richness of these adjectives increased, both in the number of semantic categories and in the number of items within them. For example, there were concrete adjectives, such as those in the categories [shape] and [representation], as well as abstract adjectives, such as those in the categories [circumstance], [nature], [state of affairs], [performance] and [psychological activity], that modify and define “城市”. At the intermediate level, CSL learners had already mastered many of the semantic categories of adjectives that modify “城市”. As they continued to learn, the number of words under each semantic category increased. For example, under the semantic category [representation], intermediate level CSL learners produced “美丽”, whereas advanced level CSL learners produced “美丽”, “干净” (clean), “秀丽” (pretty), “古老” (ancient) and other words. On the whole, CSL learners tended to use adjectives in the semantic categories [shape] and [representation], and the collocation compactness between these adjectives and the nouns was not high.

The nouns produced by CSL learners were not as rich as those produced by native speakers, and the diversity of noun output did not change noticeably as Chinese language proficiency improved. This was mainly because native speakers, when expressing a noun concept, made heavy use of the hypernyms and hyponyms of that concept; that is, they varied the ways in which they expressed the same concept. For example, to express the concept of “城市”, they used the combination “[feature] + [hypernym]” to form hyponyms, synonyms and related words, such as “小城” (small town), “县城” (county town), “教育名城” (famous city for education) and “肥皂之乡” (hometown of soap). Compared with native speakers, CSL learners tended to rely heavily on the superordinate term when expressing a noun concept. For example, CSL learners used “城市” directly and then selected a modifier from the adjective set to go with it. This can also explain why, in the narrative composition “我的家乡” (My Hometown), the word “城市” was a high-frequency noun common to CSL learners but not a high-frequency noun for native speakers. With the improvement of Chinese proficiency, the number of noun hyponyms available to CSL learners increased to some extent, mainly in the form “[description] + [hypernym]”. For example, in addition to the [space] noun “城市”, intermediate level CSL learners also used its hyponyms “古城” (ancient city), “普吉市” (Phuket City), and “铃鹿市” (Suzuka City). Advanced level CSL learners behaved similarly: in addition to “城市”, they used the hyponyms “古城”, “广岛市” (Hiroshima City), “光阳市” (Gwangyang City), “姬路城” (Himeji Castle) and “镰仓市” (Kamakura City), as well as hyponyms such as “都市” (metropolis), “乡镇” (township), and “小镇” (small town). In short, however, the gain that came with improved Chinese proficiency was limited to producing more noun hyponyms; collocations of hyponyms with adjectives were still limited.

4.2 The Development of the Semantic Network of CSL Learners When Using Adjective-Noun Collocations – from “Many to One”, “One to Many” to “Many to Many”

The output of adjective-noun collocations can be described as a two-way selection process. The speaker must select the appropriate noun form from the set of noun concepts and, according to the semantic characteristics of the noun, select from the adjective set the appropriate adjective form that represents those characteristics. For example, when choosing noun concepts, CSL learners tended to use the hypernym N1 and to choose a suitable adjective A1 from the adjective set. When expressing other similar noun concepts, CSL learners again chose a form from the noun set. Because the number of words they had learned was limited, they tended to select the hypernym N1 again and then to select the adjective A2 from the adjective set, and so on. Noun N1 may thus have many compatible adjectives A1, A2, A3 … An, forming “many to one” adjective-noun semantic collocations of the form “A (的) + N”. With “城市” in this paper, for example, there were “many to one” collocations such as “大城市” (big city), “美丽的城市” (beautiful city), “热闹的城市” (busy city) and “干净的城市” (clean city). In another situation, CSL learners tended to use adjectives of the [shape] or [representation] categories. After N1 was selected, they chose an adjective A1 with a [shape] or [representation] meaning from the adjective set. When they wanted to express a second noun concept N2, the selected adjective was still A1, and so on. N1, N2, N3 … Nn might therefore all select the adjective A1, forming “one to many” adjective-noun semantic collocations of the form “A (的) + N”. In this paper, for example, “one to many” collocations such as “大城市”, “大公园” (big park), “大医院” (big hospital), “大超市” (big supermarket) and “大珍珠” (big pearl) were formed. In the output of native speakers, the choice of nouns included not only the hypernym N1 but also the hyponyms or related words N2, N3 … Nn. The adjectives corresponding to the semantic features included A1 as well as A2, A3 … An, which represent other semantic features. Therefore, “many to many” adjective-noun collocations such as “喧嚣的城市” (noisy city), “繁华的城市” (prosperous city), “安逸的小城” (easy town) and “独特的县城” (unique county town) were formed. We believe that the semantic network of adjective-noun collocations in CSL production develops gradually from “many to one” and “one to many” to “many to many”, as shown in Fig. 5:

Fig. 5. The development process of the semantic network of CSL learners’ output of adjective-noun collocations.
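
The three network shapes discussed in this section can be made concrete with a small Python sketch that labels a set of adjective-noun pairs by how many distinct adjectives and nouns it contains. This is a deliberately simplified reading of “many to one”, “one to many” and “many to many”, not a procedure taken from the study; the function name and the “one to one” fallback label are hypothetical, and the example pairs are the collocations quoted in the text above.

```python
def collocation_pattern(pairs):
    """Label a set of (adjective, noun) pairs by the shape of its network.

    Simplifying assumption: the shape is decided only by the number of
    distinct adjectives and distinct nouns in the set.
    """
    if not pairs:
        raise ValueError("at least one collocation is required")
    adjectives = {adjective for adjective, _noun in pairs}
    nouns = {noun for _adjective, noun in pairs}
    if len(adjectives) > 1 and len(nouns) > 1:
        return "many to many"
    if len(adjectives) > 1:
        return "many to one"   # many adjectives attach to one noun
    if len(nouns) > 1:
        return "one to many"   # one adjective attaches to many nouns
    return "one to one"


# Example collocations quoted in the text.
city_pairs = [("大", "城市"), ("美丽", "城市"), ("热闹", "城市"), ("干净", "城市")]
big_pairs = [("大", "城市"), ("大", "公园"), ("大", "医院"), ("大", "超市"), ("大", "珍珠")]
native_pairs = [("喧嚣", "城市"), ("繁华", "城市"), ("安逸", "小城"), ("独特", "县城")]

labels = [collocation_pattern(p) for p in (city_pairs, big_pairs, native_pairs)]
```

Here `labels` is `["many to one", "one to many", "many to many"]`, matching the three stages described above.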

5 Conclusion

Word collocation knowledge is an important part of vocabulary knowledge, and acquiring it is an important part of acquiring the vocabulary of the target language. It is also a difficulty that CSL learners encounter when learning target language vocabulary [5]. Based on written narratives on the topic “我的家乡” (My Hometown), this paper compared the use of the high-frequency adjective “大”, the high-frequency noun “城市” and their related collocations by international students of different Chinese proficiency levels. We found that, as Chinese proficiency improved, the richness of the adjectives produced by CSL learners increased, both in the number of semantic types and in the number of words of each semantic type. However, there was no obvious trend in the richness of nouns, which may be related to the fact that CSL learners relied more on noun hypernyms. The adjective-noun collocations of CSL learners mainly combined noun hypernyms with adjectives, and the collocation compactness of adjectives of the [shape], [representation] and [nature] categories was relatively weak. With the improvement of Chinese proficiency, the number of noun hyponyms used by CSL learners increased, but collocations of hyponyms with adjectives were still limited. In short, the semantic network of adjective-noun collocations produced by CSL learners developed slowly from “many to one” and “one to many” to “many to many”.

The research in this paper also offers some insights for CSL teaching. Teaching through the collocation semantic network can be an effective way to expand students’ vocabulary; for example, when teaching the word “城市” (city), its hyponyms “小城” (small town), “古城” (ancient city) and “县城” (county town) could be taught together. From the perspective of collocations, teachers can choose adjectives that have a high degree of collocation compactness with the noun, such as “古老” (ancient) and “熟悉” (familiar), and teach adjective-noun collocations according to the students’ proficiency level. This is likely to help CSL learners use the target language accurately and authentically.