Abstract
Age is one of the most important attributes in a user’s profile. Age detection has many applications, such as personalized search, targeted advertising, and recommendation. Existing research has uncovered, to some extent, the relationship between language use and social identity in Western languages. However, the age detection problem for Chinese users has so far been unexplored. Because of cultural and societal differences, some well-known features for English may not apply to Chinese users. For example, while the frequency of capitalized letters has proved to be a good feature in English, Chinese text has no such pattern. Moreover, Chinese has its own characteristics, such as rich emoticons, complex syntax, and unique lexical structures. Hence, age detection for Chinese users is a new and significant challenge.
In this paper, we present an age detection study on a corpus of microblogs from 3200 users of Sina Weibo. We construct three types of Chinese language patterns (stylistic, lexical, and syntactic features) and investigate their effects on age prediction. We find a number of interesting language patterns: (1) there is significant topic divergence among Chinese users in different age groups, (2) young people are open to new slang from the Internet or foreign languages and adopt it readily, and (3) young adults exhibit syntactic structures distinct from those of all other groups. Our best result reaches an accuracy of 88% when classifying users into four age groups.
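To make the feature types named in the abstract more concrete, the following is a minimal sketch of how stylistic and lexical features might be extracted from a single microblog post. It is not the authors' implementation: the function names, the bracketed-emoticon pattern, the punctuation set, and the choice of character bigrams are all illustrative assumptions. Syntactic features are omitted because they would require a Chinese parser.

```python
import re
from collections import Counter

# Assumption: emoticons appear in the post text as short bracketed names,
# e.g. "[哈哈]". This pattern is illustrative, not taken from the paper.
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")

# Assumption: a small, hand-picked set of Chinese and ASCII punctuation marks.
PUNCTUATION = set("，。！？、；：,.!?;:~…")


def stylistic_features(post):
    """Hypothetical stylistic features: emoticon count, punctuation ratio, length."""
    emoticons = EMOTICON_RE.findall(post)
    text = EMOTICON_RE.sub("", post)  # measure length without emoticon markup
    n_chars = len(text)
    n_punct = sum(1 for ch in text if ch in PUNCTUATION)
    return {
        "emoticon_count": len(emoticons),
        "punct_ratio": n_punct / n_chars if n_chars else 0.0,
        "length": n_chars,
    }


def lexical_features(post, n=2):
    """Hypothetical lexical features: character n-gram counts.

    Character n-grams avoid the need for word segmentation. Punctuation and
    whitespace are dropped first, so n-grams may span the removed characters.
    """
    text = EMOTICON_RE.sub("", post)
    chars = [ch for ch in text if not ch.isspace() and ch not in PUNCTUATION]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


if __name__ == "__main__":
    example = "今天好开心[哈哈]！！"
    print(stylistic_features(example))
    print(lexical_features(example))
```

In a full pipeline, per-post features like these would be aggregated per user and passed to a classifier over the four age groups. The abstract does not state which classifier or aggregation was used, so that step is left out here.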
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, L., Qian, T., Wang, F., You, Z., Peng, Q., Zhong, M. (2015). Age Detection for Chinese Users in Weibo. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_7
DOI: https://doi.org/10.1007/978-3-319-21042-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21041-4
Online ISBN: 978-3-319-21042-1
eBook Packages: Computer Science, Computer Science (R0)