ADA: An Acronym-Based Data Augmentation Method for Low-Resource Text Classification

  • Conference paper
  • In: Proceedings of 2022 10th China Conference on Command and Control (C2 2022)
  • Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 949)

Abstract

Text classification is an essential part of natural language processing. With the development of deep learning technology, deep learning methods have become the mainstream approach to text classification. However, these methods often require large amounts of data, and collecting and annotating datasets is cumbersome and expensive. This paper presents a data augmentation method that simulates the generation of acronyms, allowing text classification datasets to be expanded quickly. We evaluate our method on three classical datasets and compare it with two classical text data augmentation methods. The results show that our method effectively improves text classification performance. Moreover, by measuring similarity with the original sentences, we find that our method changes sentence semantics very little and is more robust than the baseline methods.
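The abstract describes augmentation by simulating acronym generation but does not spell out the procedure on this page. Purely as an illustration of the general idea, one plausible form is to pick a contiguous span of words in a sentence and replace it with the acronym built from the words' initial letters. The function names (`acronymize`, `ada_augment`) and the span-selection strategy below are assumptions for this sketch, not the paper's actual algorithm:

```python
import random

def acronymize(words, start, length):
    """Replace words[start:start+length] with the acronym of their initials."""
    span = words[start:start + length]
    acronym = "".join(w[0].upper() for w in span)
    return words[:start] + [acronym] + words[start + length:]

def ada_augment(sentence, span_len=3, seed=None):
    """Pick a random contiguous span of `span_len` words and abbreviate it.

    Returns the sentence unchanged when it is shorter than the span.
    """
    rng = random.Random(seed)
    words = sentence.split()
    if len(words) < span_len:
        return sentence
    start = rng.randrange(len(words) - span_len + 1)
    return " ".join(acronymize(words, start, span_len))

# Deterministic example: the span "natural language processing" collapses to "NLP".
print(acronymize("natural language processing".split(), 0, 3))  # ['NLP']
```

Because only a few words are replaced by a token derived from them, the augmented sentence stays close to the original in meaning, which is consistent with the abstract's claim that semantics change very little.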



Author information

Corresponding author: Xueshan Luo.

Copyright information

© 2022 Chinese Institute of Command and Control

About this paper

Cite this paper

Ou, L., Chen, H., Luo, X., Li, X., Chen, S. (2022). ADA: An Acronym-Based Data Augmentation Method for Low-Resource Text Classification. In: Proceedings of 2022 10th China Conference on Command and Control. C2 2022. Lecture Notes in Electrical Engineering, vol 949. Springer, Singapore. https://doi.org/10.1007/978-981-19-6052-9_35
