Intrinsic Evaluation of Lithuanian Word Embeddings Using WordNet

Conference paper

In: Artificial Intelligence and Algorithms in Intelligent Systems (CSOC 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 764)

Abstract

Neural network-based word embeddings, which outperform traditional approaches in various Natural Language Processing tasks, have recently attracted a lot of interest. Despite this, Lithuanian word embeddings had never been obtained and evaluated before. Here we used a Lithuanian corpus of ~234 thousand running words and produced several word embedding models, varying the architecture (continuous bag-of-words or skip-gram), the training algorithm (softmax or negative sampling), and the number of dimensions (100, 300, 500, and 1,000). The word embeddings were evaluated using the Lithuanian WordNet as the resource for synonym search. We determined the superiority of the continuous bag-of-words architecture over skip-gram, whereas the training algorithm and the dimensionality showed no significant impact on the results. The best results were achieved with the continuous bag-of-words architecture, negative sampling, and 1,000 dimensions.
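
As a concrete illustration of the model grid described in the abstract, below is a minimal sketch in Python using the gensim implementation of word2vec; it is not necessarily the toolkit the authors used, and the corpus file name, the synonym lookup, and the top-n hit criterion are illustrative assumptions.

    # A minimal sketch of the model grid from the abstract, using gensim's
    # word2vec. Not the authors' code: the corpus file, preprocessing, and
    # the synonym-hit criterion are illustrative assumptions.
    from gensim.models import Word2Vec

    # Hypothetical corpus file: one whitespace-tokenised sentence per line.
    with open("lithuanian_corpus.txt", encoding="utf-8") as f:
        sentences = [line.split() for line in f]

    models = {}
    for sg in (0, 1):                  # 0 = continuous bag-of-words, 1 = skip-gram
        for hs in (0, 1):              # 1 = (hierarchical) softmax, 0 = negative sampling
            for dim in (100, 300, 500, 1000):
                models[(sg, hs, dim)] = Word2Vec(
                    sentences,
                    vector_size=dim,
                    sg=sg,
                    hs=hs,
                    negative=0 if hs else 5,  # draw negatives only when hs == 0
                )

    def synonym_hit(model, word, wordnet_synonyms, topn=10):
        """True if any WordNet synonym of `word` appears among its top-n
        nearest neighbours by cosine similarity (an assumed hit criterion)."""
        if word not in model.wv:
            return False
        neighbours = {w for w, _ in model.wv.most_similar(word, topn=topn)}
        return bool(neighbours & set(wordnet_synonyms))

Accuracy over a WordNet-derived test set would then be the fraction of words for which synonym_hit returns True; whether the paper scores hits this way or by rank is not specified by the abstract alone.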


Notes

  1. Described in detail at https://code.google.com/archive/p/word2vec/.

  2. The Google word embeddings model can be downloaded from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit.

  3. Downloaded from https://dumps.wikimedia.org/ltwiktionary/.

  4. The corpus STENOGRAMOS_INDV can be downloaded from http://dangus.vdu.lt/~jkd/eng/?page_id=16.

  5. The whole Corpus of the Contemporary Lithuanian Language is at http://clarin.vdu.lt:8080/xmlui/handle/20.500.11821/16.

  6. The corpus of fiction texts GROŽINĖ_INDV can be downloaded from http://dangus.vdu.lt/~jkd/eng/?page_id=16.

  7. Literary works downloaded from http://ebiblioteka.mkp.emokykla.lt/.

  8. These embeddings were downloaded from https://fasttext.cc/docs/en/pretrained-vectors.html.

  9. Downloaded from http://korpus.juls.savba.sk/ltskwn_en.html.


Author information

Correspondence to Robertas Damaševičius.


Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Kapočiūtė-Dzikienė, J., Damaševičius, R. (2019). Intrinsic Evaluation of Lithuanian Word Embeddings Using WordNet. In: Silhavy, R. (ed.) Artificial Intelligence and Algorithms in Intelligent Systems. CSOC 2018. Advances in Intelligent Systems and Computing, vol 764. Springer, Cham. https://doi.org/10.1007/978-3-319-91189-2_39

