Multi-lingual Author Profiling: Predicting Gender and Age from Tweets!

Rahman, Md. Ataur; Akter, Yeasmin Ara

doi:10.1007/978-3-030-51859-2_46


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1200)

Included in the following conference series:

International Conference on Image Processing and Capsule Networks


Abstract

This article describes how we built a multi-lingual classification system for author profiling. We used Twitter corpora in English, Dutch, Italian and Spanish to build separate models, each incorporating an SVM classifier, that predict the gender and age of an author. We evaluated each model using 3-fold cross-validation on the training dataset for the corresponding language. Under this cross-validation scheme, the highest average accuracy for gender classification was 81.3% (Spanish), and the highest accuracy for age classification was 70.3% (English). For the other languages, the results were between 64% and 76%.
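The evaluation setup in the abstract (an SVM classifier scored with 3-fold cross-validation on the training data) can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the authors' pipeline: the abstract does not state which features were used, so the character n-gram TF-IDF features, the linear kernel, the function name `cross_validated_accuracy`, and the toy texts and labels are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: one string per author with a made-up label.
# A real experiment would load one training corpus per language.
texts = [
    "watching the match tonight with the lads",
    "new gaming rig finally arrived today",
    "fixing my bike before the weekend ride",
    "the game last night was unreal",
    "trying out a new barbecue recipe",
    "my fantasy league team is falling apart",
    "coffee with my sister then yoga class",
    "loving this new book club pick",
    "baking cupcakes for the school fair",
    "cannot wait for the concert with the girls",
    "found the cutest dress on sale today",
    "long walk in the park with my mum",
]
labels = ["A"] * 6 + ["B"] * 6  # placeholder class names


def cross_validated_accuracy(texts, labels, folds=3):
    """Return per-fold accuracies for a TF-IDF + linear SVM pipeline."""
    model = make_pipeline(
        # Assumed features: character n-grams within word boundaries.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LinearSVC(random_state=0),
    )
    # For a classifier and an integer cv, scikit-learn uses stratified folds.
    return cross_val_score(model, texts, labels, cv=folds, scoring="accuracy")


scores = cross_validated_accuracy(texts, labels)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

Running the same function once per language and per target (gender, age) would yield one average accuracy for each combination, which is the kind of figure the abstract reports.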


Notes

1. https://pypi.python.org/pypi/stop-words
2. scikit-learn: http://scikit-learn.org/


Author information

Authors and Affiliations

Premier University, Chittagong, Bangladesh
Md. Ataur Rahman
East Delta University, Chittagong, Bangladesh
Yeasmin Ara Akter


Corresponding author

Correspondence to Yeasmin Ara Akter.

Editor information

Editors and Affiliations

Department of Electrical Engineering, Dayeh University, Changhua, Taiwan
Joy Iong-Zong Chen
Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial, Departamento de Engenharia Mecânica, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
João Manuel R. S. Tavares
Department of Electronics and Computer Engineering, Tribhuvan University, Lalitpur, Nepal
Subarna Shakya
College of Engineering, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
Abdullah M. Iliyasu


About this paper

Cite this paper

Rahman, M.A., Akter, Y.A. (2021). Multi-lingual Author Profiling: Predicting Gender and Age from Tweets!. In: Chen, J.IZ., Tavares, J.M.R.S., Shakya, S., Iliyasu, A.M. (eds) Image Processing and Capsule Networks. ICIPCN 2020. Advances in Intelligent Systems and Computing, vol 1200. Springer, Cham. https://doi.org/10.1007/978-3-030-51859-2_46


DOI: https://doi.org/10.1007/978-3-030-51859-2_46
Published: 24 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51858-5
Online ISBN: 978-3-030-51859-2
eBook Packages: Intelligent Technologies and Robotics (R0)
