Meta Embeddings for LinCE Dataset

Teja, T. Ravi; Shilpa, S.; Joseph, Neetha

doi:10.1007/978-981-19-7874-6_26

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 587)

Abstract

This work addresses language identification in code-mixed social media text using Multilingual Meta Embeddings (MME), an effective method for learning multilingual representations. In code-mixing, languages are mixed at a sentence boundary, within a sentence, or within a word. This paper proposes an MME-driven language identification mechanism for code-mixed text. The study focuses on comparing different classifiers on Hindi-English code-mixed text data obtained from the LinCE benchmark corpus. LinCE is a centralized benchmark for linguistic code-switching evaluation that integrates ten corpora from four different code-switched language pairs with four tasks. Each instance in the dataset is a code-mixed sentence, and each token in the sentence is associated with a language label. We experimented with several classifiers, namely Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Bidirectional Gated Recurrent Unit (BiGRU), and observed that BiLSTM outperformed the others. A multilingual meta embedding technique was empirically evaluated for language identification.
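The abstract does not spell out how the meta embedding is formed, so the following is only a minimal sketch of one common attention-based formulation: each source embedding of a token is projected to a shared size, scored, and the projections are averaged with softmax weights. All names (`meta_embed`, `projections`, `scorer`), the dimensions, and the random weights are hypothetical and chosen for illustration; in a trained model these parameters would be learned, and the combined vector would feed a sequence classifier such as a BiLSTM.

```python
import math
import random

def linear(weights, bias, vector):
    """Apply an affine map: weights (rows x cols) times vector, plus bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def meta_embed(source_vectors, projections, scorer):
    """Combine one token's vectors from several embedding sources.

    source_vectors: one vector per source (dimensions may differ)
    projections:    one (weights, bias) pair per source, mapping to a shared size
    scorer:         vector of the shared size used to score each projection
    """
    # Project every source vector into the shared space.
    projected = [linear(w, b, v) for (w, b), v in zip(projections, source_vectors)]
    # Score each projection with a dot product, then normalise across sources.
    scores = [sum(s * p for s, p in zip(scorer, vec)) for vec in projected]
    attention = softmax(scores)
    # Attention-weighted sum of the projected vectors.
    combined = [sum(a * vec[d] for a, vec in zip(attention, projected))
                for d in range(len(scorer))]
    return combined, attention

rng = random.Random(0)
shared_dim = 4
source_dims = [3, 5]  # hypothetical sizes, e.g. a Hindi and an English embedding
projections = [(random_matrix(shared_dim, d, rng), [0.0] * shared_dim)
               for d in source_dims]
scorer = [rng.uniform(-0.1, 0.1) for _ in range(shared_dim)]
token_vectors = [[rng.uniform(-1, 1) for _ in range(d)] for d in source_dims]

combined, attention = meta_embed(token_vectors, projections, scorer)
```

Here `combined` has the shared dimension regardless of the source sizes, and `attention` holds one weight per embedding source, which is what lets a model lean on different sources for different tokens.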

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India
T. Ravi Teja, S. Shilpa & Neetha Joseph

Corresponding author

Correspondence to T. Ravi Teja.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
Subarna Shakya
Automation and Applied Informatics, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas
Go Perception Laboratory, Cornell University, Ithaca, NY, USA
Wang Haoxiang

Cite this paper

Teja, T.R., Shilpa, S., Joseph, N. (2023). Meta Embeddings for LinCE Dataset. In: Shakya, S., Balas, V.E., Haoxiang, W. (eds) Proceedings of Third International Conference on Sustainable Expert Systems. Lecture Notes in Networks and Systems, vol 587. Springer, Singapore. https://doi.org/10.1007/978-981-19-7874-6_26

DOI: https://doi.org/10.1007/978-981-19-7874-6_26
Published: 23 February 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7873-9
Online ISBN: 978-981-19-7874-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
