1 Introduction

Sentence classification is the task of assigning sentences to predefined categories and has been widely explored over the past decades. It requires modeling, representing, and mining a degree of semantic comprehension, based mainly on the structure or sentiment of sentences. This task is important for many practical applications, such as product recommendation [5], public opinion detection [24], and human-machine interaction [3].

Recently, deep learning has achieved state-of-the-art results across a range of Computer Vision (CV) [15], Speech Recognition [7], and Natural Language Processing (NLP) [11] tasks. In particular, Convolutional Neural Networks (CNNs) have achieved great success in sentence modeling. However, training deep models requires a great diversity of data so that more discriminative patterns can be mined for better prediction. Most existing work on sentence classification focuses on learning a better representation for a sentence given limited training data (i.e., the source language), resorting to sophisticated network architectures or learning paradigms such as attention models [31], multi-task learning [20], and adversarial training [19]. Inspired by recent advances in Machine Translation (MT) [30], we instead perform input data augmentation by exploiting multilingual data (i.e., the target language) generated by machine translation for sentence classification tasks. The generated data serve as auxiliary information and provide additional knowledge for learning a robust sentence representation. To exploit such multilingual data effectively, we further propose a novel deep consensus learning framework that mines their language-share and language-specific knowledge for sentence classification. It is worth noting that, since the machine translation model is pre-trained off-the-shelf with good generalization ability, we do not introduce any additional language data beyond what existing methods use in either the training or the testing phase.

Our main contributions are two-fold: 1) We are the first to propose utilizing multilingual data augmentation to assist sentence classification, which provides beneficial auxiliary knowledge for sentence modeling; 2) A novel deep consensus learning framework is constructed to fuse multilingual data and learn their language-share and language-specific knowledge for sentence classification. In this work, we use English as the source language and Chinese/Dutch as the target language, obtained from an English-Chinese/Dutch translator. The experimental results show that our model achieves very promising performance on several sentence classification tasks.

2 Related Work

2.1 Sentence Classification

Sentence classification is a well-studied research area in NLP, and various approaches have been proposed over the last few decades [6, 29]. Among them, Deep Neural Network (DNN) based models have shown very good results on several NLP tasks, and such methods have become increasingly popular for sentence classification. Various neural networks have been proposed to learn better sentence representations for classification. An influential one is the work of [13], where a simple Convolutional Neural Network (CNN) with a single convolution layer was used for feature extraction. Following this work, Zhang et al. [36] used CNNs for text classification with character-level features provided by a fully connected DNN. Liu et al. [20] used a multi-task learning framework to learn multiple related tasks together for sentence classification. Based on Recurrent Neural Networks (RNNs), they utilized three different mechanisms of sharing information to model text; in practice, they used a Long Short-Term Memory network (LSTM) to address the issue of learning long-term dependencies. Lai et al. [16] proposed a Recurrent Convolutional Neural Network (RCNN) model for text classification, which applied a recurrent structure to capture contextual information and employed a max-pooling layer to capture the key components in texts. Jiang et al. [10] proposed a text classification model based on a deep belief network and softmax regression: the deep belief network solves the sparse high-dimensional matrix computation problem of text data, and softmax regression then classifies the text. Yang et al. [31] used a Hierarchical Attention Network (HAN) for document classification, where a hierarchical structure mirrors the hierarchical structure of documents and two levels of attention mechanisms are applied at the word and sentence levels.

Another line of work on sentence classification uses more effective learning paradigms. Yogatama et al. [33] combined Generative Adversarial Networks (GANs) with RNNs for text classification. Billal et al. [1] solved the problem of multi-label text classification in a semi-supervised manner. Liu et al. [19] proposed a multi-task adversarial representation learning method for text classification. Zhang et al. [35] attempted to learn structured representations of text via deep reinforcement learning: they learned sentence representations by discovering optimized structures automatically, and presented two models, Information Distilled LSTM (ID-LSTM) and Hierarchically Structured LSTM (HS-LSTM), to build structured representations.

However, these methods do not take into account the auxiliary language information corresponding to the source language. Such an auxiliary language can provide additional knowledge for learning a more accurate sentence representation.

2.2 Deep Consensus Learning

Existing sentence classification works [1, 10, 13, 16, 33, 35, 36] mainly focus on feature representation or on learning a structured representation [35]. Deep learning based sentence classification models have obtained impressive performance, largely due to the powerful automatic learning and representation capacities of deep models, which benefit from big labelled training data and the establishment of large-scale sentence/document datasets [1, 33, 35]. However, all existing methods usually consider only one type of language information through a standard single-language process. Such methods not only ignore the potentially useful information of other languages, but also lose the opportunity to mine the correlated, complementary advantages across different languages. A related model is [20], which used synthetic source sentences to improve the performance of Neural Machine Translation (NMT). While sharing the high-level spirit of multilingual feature learning, the proposed consensus learning model has three distinguishing characteristics. (1) Beyond fusion by language concatenation, our model uniquely considers synergistic cross-language interaction learning and regularization by consensus propagation, which aims to overcome the challenge of learning discrepancy in multilingual feature optimization. (2) Instead of the traditional single-loss design, our model deploys a multi-loss concurrent supervision mechanism, which enforces and improves the model's ability to learn language-specific features. (3) Through NMT, we can eliminate some ambiguous words and highlight some key words.

3 Methodology

We aim to learn a deep feature representation model for sentence classification based on language-specific inputs, without any specific feature transformation. Figure 1 depicts our proposed framework, which consists of two stages: the first stage performs multilingual data augmentation with an off-the-shelf machine translator, and the second feeds the source language data and the generated target language data to our deep consensus learning model for sentence classification.

Fig. 1. The framework of our proposed model for sentence classification.

3.1 Multilingual Data Augmentation

Data augmentation is an important technique in machine learning that allows building better models. It has been successfully used for many tasks in CV and NLP, such as image recognition [15] and MT [35]. In MT, back-translation is a common data augmentation method [25, 39], which allows monolingual training data to be incorporated. Data augmentation is especially useful when the existing data are insufficient to learn a discriminative representation for a specific task.

In sentence classification, given an input sentence in one language, we perform data augmentation by translating the sentence into another language using existing machine translation methods. We call the input language the source language and the translated language the target language. This is motivated by the recent great advances in NMT [30]. Given an input sentence in the source language, we simply call the Google Translation API to obtain the translated data in the target language. Compared to other state-of-the-art NMT models, the Google translator offers both effectiveness and efficiency in real application scenarios. Since the target language is used only for multilingual data augmentation and its specific type is not important to the proposed model, we choose Chinese and Dutch respectively as target languages, while the source language depends on the language of the input sentence.
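As a minimal sketch of this augmentation step, the `translate` helper below is a hypothetical stand-in for the Google Translation API call (the paper does not give client code), and `augment_corpus` is an illustrative name:

```python
# Hypothetical stand-in for an off-the-shelf MT service such as the
# Google Translation API; the paper does not specify client code.
def translate(sentence: str, src: str, dest: str) -> str:
    raise NotImplementedError("plug in a machine translation service here")

def augment_corpus(sentences, src="en", dest="zh"):
    """Pair every source-language sentence with its target-language translation."""
    return [(s, translate(s, src=src, dest=dest)) for s in sentences]

# e.g. pairs = augment_corpus(["the movie was great"], src="en", dest="zh")
```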

3.2 Deep Consensus Learning Model

Learning a consensus classification model that combines several sources of beneficial information into one final prediction can lead to more accurate results [2]. We therefore use data in two languages, \(\left\{ S_1, S_2, S_3, \cdots , S_{N-1}, S_N\right\} \) and \(\left\{ T_1, T_2, T_3, \cdots , T_{N-1}, T_N\right\} \), to perform consensus learning for sentence classification. As shown in Fig. 1, our model has three parts: (1) two branches of language-specific subnetworks that learn the most discriminative features for each language; (2) one fusion branch responsible for learning the language-share representation through the optimal integration of the two kinds of language-specific knowledge; and (3) consensus propagation for feature regularization and learning optimization. The architecture components are described in detail below.

Language-Specific Network. We utilize the TextCNN architecture [13] for each language-specific branch, which has proven very effective for sentence classification. TextCNN can be divided into two stages: convolution layers for feature learning, followed by fully connected layers for classification. Given the training labels of the input sentences, the Softmax classification loss is used to optimize category discrimination. Formally, given a corpus of source-language sentences \(\left\{ S_1, S_2, S_3, \cdots , S_{N-1}, S_N\right\} \), the training loss on a batch of n sentences is computed as:

$$\begin{aligned} L_{S\_brch}= -\frac{1}{n}\sum _{i=1}^n\log {\left( \frac{\exp {\left( w_{y_i}^TS_i\right) }}{\sum _{k=1}^c\exp {\left( w_k^TS_i\right) }}\right) } \end{aligned}$$
(1)

where c is the number of sentence categories; \(y_i\) denotes the category label of sentence \(S_i\); and \(w_k\) is the prediction function parameter for category k. The training loss for the target language branch, \(L_{T\_brch}\), is computed in the same manner. Meanwhile, since the source language and the target language belong to different language spaces, the two language-specific branches are trained with the same architecture but different parameters.
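To make the branch design concrete, the following is a minimal PyTorch sketch of one language-specific TextCNN branch; the embedding size, filter widths, and filter counts are illustrative assumptions, not the paper's exact hyperparameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageSpecificBranch(nn.Module):
    """One TextCNN branch [13]: convolution layers for feature learning,
    then a fully connected classifier. Hyperparameters are assumptions."""
    def __init__(self, vocab_size, embed_dim=300, num_filters=100,
                 filter_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in filter_sizes)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-over-time pooling for each filter width, then concatenation;
        # `feats` is the vector later fed to the fusion branch.
        feats = torch.cat(
            [F.relu(conv(x)).max(dim=2).values for conv in self.convs], dim=1)
        return feats, self.fc(feats)            # features and class logits

# The branch loss of Eq. (1) is then the standard cross entropy:
# loss_s_brch = F.cross_entropy(logits, labels)
```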

Language-Share Network. We perform language-share feature learning on top of the two language-specific branches by fusing their features. For design simplicity and cost efficiency, we fuse the feature vectors from the concatenation layer before dropout in TextCNN through the operation Concat\(\rightarrow\)FC\(\rightarrow\)Dropout\(\rightarrow\)FC\(\rightarrow\)Softmax. This produces a category prediction score for the input pair (a sentence in the source language and its translation in the target language). We similarly utilize a Softmax classification loss \(L_{ST}\) for the language-share classification learning, as in the language-specific branches.
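A minimal sketch of this fusion head follows; the hidden width, dropout rate, and the ReLU nonlinearity between the two FC layers are assumptions on top of the Concat\(\rightarrow\)FC\(\rightarrow\)Dropout\(\rightarrow\)FC\(\rightarrow\)Softmax design stated above:

```python
import torch
import torch.nn as nn

class LanguageShareHead(nn.Module):
    """Fuses the two branches' concatenation-layer features:
    Concat -> FC -> Dropout -> FC (softmax is applied in the loss)."""
    def __init__(self, feat_dim, num_classes, hidden=256, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),                  # nonlinearity assumed, not stated
            nn.Dropout(p_drop),
            nn.Linear(hidden, num_classes))

    def forward(self, feat_src, feat_tgt):
        # Category prediction score for the (source, translated) sentence pair.
        return self.net(torch.cat([feat_src, feat_tgt], dim=1))

# p_ST = softmax of these logits; L_ST is again a softmax cross-entropy loss.
```

Here `feat_dim` would be the per-branch feature size (num_filters × number of filter widths in the branch sketch above).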

Consensus Propagation. Inspired by the teacher-student learning approach, we propose to regularize the language-specific learning with consensus feedback from the language-share network. More specifically, we utilize the consensus probability \(P_{ST} = \left[ p_{ST}^1, p_{ST}^2, \cdots , p_{ST}^{c-1}, p_{ST}^c \right]\) from the language-share network as the teacher signal (a "soft label", versus the ground-truth one-hot "hard label") to guide the learning of all language-specific branches (students) concurrently through an additional regularization, formulated in a cross-entropy manner as:

$$\begin{aligned} \mathcal {H}_S=-\frac{1}{c}\sum _{i=1}^c\left( p_{ST}^i\ln {\left( p_S^i\right) }+\left( 1-p_{ST}^i\right) \ln {\left( 1-p_S^i\right) }\right) \end{aligned}$$
(2)

where \(P_S=[p_S^1,p_S^2,p_S^3,\cdots ,p_S^{c-1},p_S^c]\) denotes the probability predictions over all c sentence classes by the source language branch. The final loss function for the language-specific network is then re-defined by adding this regularization term to Eq. (1):

$$\begin{aligned} L_S= L_{S\_brch}+ \lambda \mathcal {H}_S \end{aligned}$$
(3)

where \(\lambda \) controls the importance tradeoff between the two terms. The regularization term \(\mathcal {H}_T\) and the loss \(L_T\) for the target language branch are computed in the same way.
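The consensus terms of Eqs. (2) and (3) can be sketched as follows; detaching the teacher probabilities and the value \(\lambda = 0.5\) are our assumptions, since the paper only describes \(\lambda\) as a tradeoff weight:

```python
import torch
import torch.nn.functional as F

def consensus_regularizer(p_st, p_branch, eps=1e-8):
    """Eq. (2): per-class binary cross entropy between the language-share
    (teacher) probabilities p_st and a branch's (student) probabilities."""
    c = p_st.size(1)
    h = -(p_st * torch.log(p_branch + eps)
          + (1.0 - p_st) * torch.log(1.0 - p_branch + eps)).sum(dim=1) / c
    return h.mean()

def branch_loss(branch_logits, p_st, labels, lam=0.5):
    """Eq. (3): branch softmax loss plus the lambda-weighted consensus term."""
    p_branch = F.softmax(branch_logits, dim=1)
    # Detaching p_st treats the teacher signal as fixed for this term
    # (an implementation assumption, following teacher-student practice).
    return F.cross_entropy(branch_logits, labels) \
        + lam * consensus_regularizer(p_st.detach(), p_branch)
```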

The training of our proposed model proceeds in two stages. First, we train each language-specific network separately, terminating with an early stopping strategy. Afterwards, the language-share network and the consensus propagation loss are introduced: we use the loss defined in Eq. (3) together with \(L_{ST}\) to train the language-specific networks and the language-share network simultaneously. At test time, given an input sentence and its translated sentence, the final prediction is obtained by averaging the three prediction scores from the two language-specific networks and the language-share network.
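Combining the sketches above, test-time prediction could look like the following, under the assumption that the averaged "prediction scores" are softmax probabilities:

```python
import torch.nn.functional as F

def predict(branch_s, branch_t, share_head, tokens_s, tokens_t):
    """Average the two language-specific and one language-share predictions."""
    feat_s, logits_s = branch_s(tokens_s)
    feat_t, logits_t = branch_t(tokens_t)
    logits_st = share_head(feat_s, feat_t)
    probs = (F.softmax(logits_s, dim=1) + F.softmax(logits_t, dim=1)
             + F.softmax(logits_st, dim=1)) / 3.0
    return probs.argmax(dim=1)
```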

4 Experiment and Analysis

In this section, we investigate the empirical performance of our proposed architecture on five benchmark datasets for sentence classification.

4.1 Datasets and Experimental Setup

The sentence classification datasets include:

(1) MR: This dataset includes movie reviews with one sentence per review, where the classification involves detecting positive/negative reviews [23].

(2) CR: This dataset contains annotated customer reviews of 5 products, and the target is to predict positive/negative reviews [8].

(3) Subj: This is a subjectivity dataset, which includes subjective or objective sentiments [22].

(4) TREC: This dataset focuses on the question classification task, involving 6 question types [18].

(5) SST-1: This dataset is the Stanford Sentiment Treebank, an extension of MR, which contains training/development/testing splits and fine-grained labels (very positive, positive, neutral, negative, very negative) [27].

Similar to [13], the initialized word vectors for the source language are obtained from the publicly available word2vec vectors trained on 100 billion words from Google News. For the target language of Chinese, we retrain the word2vec models on the Chinese Wikipedia Corpus; for the target language of Dutch, we retrain them on the Dutch Wikipedia Corpus. In our experiments, we choose the CNN-multichannel variant of TextCNN because of its better performance.

4.2 Ablation Study

We first compare our proposed model with several baseline models for sentence classification. Here, S+T indicates that the model's input contains both the source language and the target language, and T(*) indicates the type of target language: T(CH) means the target language is Chinese, and T(DU) means it is Dutch. Figures 2 and 3 show the comparison of classification accuracy on the five benchmark datasets. CNN(S) denotes the CNN-multichannel variant of TextCNN, which uses only the source language data (English) for training and testing. CNN(T) is a TextCNN model retrained on the translated target language data of Chinese (CH)/Dutch (DU), with all other settings the same as CNN(S). Ours(S+T(*)) denotes our model combining multilingual data augmentation with deep consensus learning. Ours(S+T(*)) performs much better than the baselines, which demonstrates the effectiveness of our framework. Evidently, multilingual data augmentation provides beneficial additional discrimination for learning a robust sentence representation for classification. It is worth noting that CNN(T) is even better than CNN(S) on MR, which indicates that existing machine translation methods not only keep the discriminative semantics of the source language, but can also create useful discrimination in the target language space.

Fig. 2. The comparison results with existing baseline models based on English\(\rightarrow\)Chinese MT.

Fig. 3. The comparison results with existing baseline models based on English\(\rightarrow\)Dutch MT.

Similar to TextCNN, we also evaluate several variants of our model to demonstrate its effectiveness. When a large supervised training set is lacking, word vectors obtained from unsupervised neural language models are commonly used for initialization to improve performance. We therefore validate the model with various word vector initialization methods.

The different word vector initialization methods include:

(1) Rand: All words are randomly initialized and can be trained during training.

(2) Static: All words of the input language are initialized with pre-trained vectors from the corresponding language's word2vec model and are kept static during training.

(3) Non-static: The same initialization as Static, but the pre-trained vectors can be fine-tuned during training.

(4) Multichannel: The model contains two types of word vectors, treated as different channels: one can be fine-tuned during training, while the other is kept static. Both are initialized with the same word embeddings from word2vec (see the sketch below).
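As a sketch of variant (4), both channels can be built from the same pre-trained matrix, freezing one and fine-tuning the other; the stacked-channel tensor layout is an assumption:

```python
import torch
import torch.nn as nn

class MultichannelEmbedding(nn.Module):
    """Two embedding channels from the same word2vec matrix: one static
    (frozen), one non-static (fine-tuned during training)."""
    def __init__(self, pretrained):  # pretrained: FloatTensor (vocab, dim)
        super().__init__()
        self.static = nn.Embedding.from_pretrained(pretrained.clone(), freeze=True)
        self.tuned = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

    def forward(self, tokens):       # -> (batch, 2, seq_len, dim)
        return torch.stack([self.static(tokens), self.tuned(tokens)], dim=1)
```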

Table 1 shows the experimental results of different model variants based on English\(\rightarrow\)Chinese MT. Compared to the source language S, the classification accuracy of the target language T(CH) improves on some datasets and decreases on others, showing a strong dataset dependency. Since the proposed S+T(CH) model with Multichannel obtains the best results, we choose the Multichannel variant for our final model. Similarly, Table 2 shows the experimental results of different model variants based on English\(\rightarrow\)Dutch MT. Together, the results in Tables 1 and 2 validate our consensus learning method.

Table 1. The experimental results of different model variants based on English\(\rightarrow\)Chinese MT.
Table 2. The experimental results of different model variants based on English\(\rightarrow\)Dutch MT.
Table 3. The comparison results between the state-of-the-art approaches and ours.

4.3 Comparison with Existing Approaches

To further exhibit the effectiveness of our model, we compare our approach with several state-of-the-art approaches, including recent LSTM-based and CNN-based models. As shown in Table 3, our approach achieves very promising results compared to these methods. Performance is measured by classification accuracy. We roughly divide the existing approaches into four categories. The first category comprises RNN-based models: Standard-RNN refers to the Standard Recursive Neural Network [27], MV-RNN is the Matrix-Vector Recursive Neural Network [26], RNTN denotes the Recursive Neural Tensor Network [27], and DRNN represents the Deep Recursive Neural Network [9]. The second category comprises LSTM-based models: bi-LSTM stands for Bidirectional LSTM [28], SA-LSTM means Sequence Autoencoder LSTM [4], Tree-LSTM is Tree-Structured LSTM [28], and Standard-LSTM represents the Standard LSTM Network [28]. CNN-based models form the third category: DCNN denotes the Dynamic Convolutional Neural Network [12], CNN-Multichannel is the Convolutional Neural Network with Multichannel [13], MVCNN refers to the Multichannel Variable-Size Convolution Neural Network [32], Dep-CNN denotes the Dependency-based Convolutional Neural Network [21], MGNC-CNN stands for the Multi-Group Norm Constraint CNN [38], and DSCNN represents the Dependency Sensitive Convolutional Neural Network [34]. The fourth category is based on other methods: Combine-skip refers to the skip-thought model with the concatenation of the vectors from uni-skip and bi-skip [14], CFSF indicates initializing Convolutional Filters with Semantic Features [17], and GWS denotes exploiting domain knowledge via Grouped Weight Sharing [37].

On MR in particular, our S+T(CH) model achieves the best performance by a margin of nearly \(5\%\). This improvement demonstrates that our multilingual data augmentation and consensus learning make great contributions to such sentence classification tasks. Through multilingual data augmentation, important words are retained, and the NMT system can map ambiguous words in the source language to different word units in the target language, achieving word disambiguation. Essentially, our method enables CNNs to obtain better discrimination and generalization abilities. To further demonstrate the superiority of our proposed model, we also use English as the source language and Dutch as the target language to evaluate the S+T(DU) model. On the four benchmark datasets of MR, CR, Subj, and TREC, our S+T(CH) and S+T(DU) models both achieve the best results to date.

5 Conclusion and Future Work

In this paper, multilingual data augmentation is introduced to further improve sentence classification. A novel deep consensus learning model is established to fuse multilingual data and learn their language-share and language-specific knowledge. The experimental results demonstrate the effectiveness of our proposed framework. In addition, our method requires no external data compared to existing methods, which makes it very practical, with good generalization ability, in real application scenarios. In the future, we will explore the performance of the model on larger sentence/document datasets, and the linguistic features of different languages will also be considered when selecting the target language.