Abstract
Text classification is an essential task in natural language processing. With the development of deep learning technology, deep learning methods have become the mainstream approach to text classification. However, these methods typically require large amounts of data, and collecting and annotating datasets is cumbersome and expensive. This paper presents a data augmentation method that simulates the generation of acronyms, quickly expanding text classification datasets. We evaluate our method on three classical datasets and compare it with two classical text data augmentation methods. The results show that our method effectively improves text classification performance. Moreover, a similarity comparison against the original sentences shows that our method changes sentence semantics very little and is more robust than the baseline methods.
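The abstract does not spell out the augmentation procedure, but the core idea of acronym-style augmentation can be sketched as replacing a short run of consecutive words with the acronym formed from their initial letters. The span length, replacement probability, and sampling strategy below are illustrative assumptions, not the authors' exact procedure:

```python
import random

def acronymize(sentence: str, min_len: int = 2, max_len: int = 4, p: float = 0.5) -> str:
    """Replace one random run of consecutive words with its acronym.

    Hypothetical sketch of acronym-based augmentation: picks a span of
    `min_len`..`max_len` consecutive words and substitutes the acronym
    built from their first letters. With probability 1 - p the sentence
    is returned unchanged.
    """
    words = sentence.split()
    if len(words) < min_len or random.random() > p:
        return sentence
    span = random.randint(min_len, min(max_len, len(words)))
    start = random.randint(0, len(words) - span)
    acronym = "".join(w[0].upper() for w in words[start:start + span])
    return " ".join(words[:start] + [acronym] + words[start + span:])

# Example: "natural language processing" may become "NLP"
augmented = acronymize("natural language processing is widely studied", p=1.0)
```

Because only a few surface tokens change while the rest of the sentence is kept verbatim, such a transformation tends to preserve sentence semantics, which is consistent with the similarity results reported in the abstract.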
Copyright information
© 2022 Chinese Institute of Command and Control
About this paper
Cite this paper
Ou, L., Chen, H., Luo, X., Li, X., Chen, S. (2022). ADA: An Acronym-Based Data Augmentation Method for Low-Resource Text Classification. In: Proceedings of 2022 10th China Conference on Command and Control. C2 2022. Lecture Notes in Electrical Engineering, vol 949. Springer, Singapore. https://doi.org/10.1007/978-981-19-6052-9_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6051-2
Online ISBN: 978-981-19-6052-9
eBook Packages: Intelligent Technologies and Robotics (R0)