Personality Facets Recognition from Text

dos Santos, Wesley Ramos; Paraboni, Ivandré

doi:10.1007/978-3-030-28577-7_15


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11696)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages


Abstract

Fundamental Big Five personality traits (e.g., Extraversion) and their facets (e.g., Activity) are known to correlate with a broad range of linguistic features and, accordingly, the recognition of personality traits from text is a well-known Natural Language Processing task. Labelling text data with facets information, however, may require the use of lengthy personality inventories, and perhaps for that reason existing computational models of this kind are usually limited to the recognition of the fundamental traits. Based on these observations, this paper investigates the issue of personality facets recognition from text labelled only with information available from a shorter personality inventory. In doing so, we provide a low-cost model for the recognition of certain personality facets, and present reference results for further studies in this field.

1 Introduction

The Big Five personality model [4] comprises five fundamental categories of personality (Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to experience), which are further divided into dozens of more specific facets. For instance, the Neuroticism category includes facets such as Anxiety and Depression. Big Five categories are strongly correlated with (and possibly defined by) language use and, as a result, the recognition of an individual’s personality traits from text is a well-established task in the Natural Language Processing (NLP) field [14].

Models for the recognition of personality traits from text are usually based on supervised machine learning methods that take as input a text corpus labelled with personality scores. These scores, in turn, are computed from a range of personality inventories (or questionnaires) such as the BFI-44 inventory [7]. The BFI-44 is a relatively short, 44-item multiple-choice inventory comprising short statements such as ‘I see myself as someone who is depressed, blue’. Items are answered on a scale from one (disagree strongly) to five (agree strongly).

Knowing the five fundamental categories of personality of an individual may be sufficient for a number of practical applications. For others, however, a more detailed assessment of personality facets may be called for. Assessing personality facets usually involves the use of a more extensive personality inventory, such as the 240-item NEO-PI-R [8]. From a computational perspective, however, large or complex inventories of this kind may be impractical, which may explain why studies on personality recognition from text [9, 11, 14, 17] are usually limited to the five main personality categories obtainable from short inventories such as the BFI-44.

Despite these difficulties, a compromise between convenience (as in the BFI-44) and expressiveness (as in NEO-PI-R) may still be possible. In particular, we notice that the work in [18] provided evidence that, although most facets cannot be explicitly captured by the BFI-44, a small subset of 10 facets (two from each of the main Big Five factors) is inferable from this short scale. Thus, it may be possible to obtain at least some of the facet labels available from NEO-PI-R at a much lower cost.

Based on these observations, the actual NLP question to be investigated in this paper is whether the 10 additional facets proposed in [18] may be automatically recognised from text labelled with BFI-44 information only. To this end, we developed a series of binary classifiers for Big Five facet recognition from a labelled corpus of Brazilian Facebook status updates, and we present reference results for further studies in this field. To the best of our knowledge, our work is the first attempt to learn personality facets in this way, and it is most likely the first of its kind to be devoted to the Brazilian Portuguese language.

2 Related Work

We are not aware of any large-scale work on Big Five facet recognition from text, but there is a wide range of studies focused on the more general task of recognising its main five personality categories. Given that the applicable methods are presumably similar, in what follows we briefly review a number of instances of the latter.

The work in [9] presents a comprehensive view of the personality recognition task from multiple computational perspectives (i.e., as classification, regression and ranking tasks), by comparing the use of written essays and speech corpora as input data, and by comparing the use of self-reported Big Five scores and those produced by specialists, among other issues. The study makes extensive use of psycholinguistic features provided by the LIWC [12] and MRC [3] databases, and results suggest that the combination of ranking algorithms, speech as input data, and personality reports produced by specialists works best.

In contrast to the use of psycholinguistics-motivated features in [9] and others, the work in [11] makes use of n-gram models to classify extremes of personality using both Naive Bayes and SVM models. Evaluation based on a corpus of personal blogs achieves a maximum accuracy of 65%.

In the context of the PAN-CLEF shared task series [14], a number of supervised models of personality recognition have been developed based on Twitter data labelled with personality scores obtained from a 10-item Big Five inventory. These include the overall winner of the competition [1], which combines second-order attributes with an LSA text representation; the work in [5], which makes use of character and POS n-gram models; and the work in [19], which makes use of TF-IDF counts and stylistic features. For details, we refer to [14].

3 Personality Facet Recognition

The present study aims to compare a number of models of personality facet recognition from text. More specifically, we consider the set of 10 personality facets that, according to the method discussed in [18], may be inferred from the BFI-44 inventory [7]: Assertiveness and Activity facets (under the main Extraversion category), Altruism and Compliance (under Agreeableness), Order and Self-discipline (under Conscientiousness), Anxiety and Depression (under Neuroticism), and Aesthetics and Ideas (under Openness to experience).

The method proposed in [18] consists of a series of theoretically-motivated calculations (in addition to those already performed to obtain the basic Big Five personality scores) over the set of 44 responses provided by the BFI-44 inventory. Thus, provided that the full set of BFI-44 responses about an individual is known, computing these 10 additional facet scores is straightforward.

For instance, according to [18], the Activity facet of the Big Five Extraversion category is defined as the simple average of two of the BFI-44 scores from which the main Extraversion score is obtained in the first place. In the present work, these facet scores are therefore taken as given, and we do not discuss the underlying method to obtain them. For details, see [18].
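As an illustration of the kind of calculation described above, the following minimal sketch computes a facet score as the simple average of the inventory items assigned to it. The item numbers in `FACET_ITEMS` are placeholders chosen for this example only, and the function name `facet_score` is hypothetical; the actual item assignments, including any reverse-scored items, are those given in [18].

```python
from statistics import mean

# Placeholder mapping from facet name to BFI-44 item numbers.
# The real assignments are defined in [18]; these are illustrative only.
FACET_ITEMS = {
    "Activity": [11, 16],
}


def facet_score(responses, facet, facet_items=FACET_ITEMS):
    """Average the responses to the items assigned to one facet.

    `responses` maps an item number to the respondent's answer, assumed
    to be already reverse-scored where the inventory requires it.
    """
    return mean(responses[item] for item in facet_items[facet])


# Toy respondent: only the two items needed for this facet are filled in.
answers = {11: 4, 16: 5}
activity = facet_score(answers, "Activity")
```

Keeping the item mapping in a plain dictionary means that the remaining facets could be added without touching the scoring function.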

Following existing work on Big Five personality recognition for the English language and others [9, 11], personality facet recognition is presently regarded as a set of independent binary classification tasks. To this end, a document is to be labelled as a positive instance of a given facet if the corresponding author shows an above-average score for that facet when considering the entire set of authors in the domain. Since personality facets are, by definition, independent from each other [4], each document is to be assigned ten individual labels corresponding to each facet, which are to be classified one at a time.
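The labelling scheme above can be sketched as follows, assuming that "above-average" means strictly greater than the mean facet score over all authors, and that every document inherits the label of its author. The author identifiers, scores and the function name `binarise` are made up for illustration.

```python
from statistics import mean


def binarise(scores_by_author):
    """Label an author 1 if their facet score exceeds the population mean."""
    threshold = mean(scores_by_author.values())
    return {
        author: int(score > threshold)
        for author, score in scores_by_author.items()
    }


# Toy scores for one facet and four hypothetical authors (mean = 3.25).
activity_scores = {"a1": 2.0, "a2": 3.5, "a3": 4.5, "a4": 3.0}
author_labels = binarise(activity_scores)

# Each document receives the label of the author who wrote it.
documents = [("a1", "text one"), ("a3", "text two"), ("a3", "text three")]
doc_labels = [author_labels[author] for author, _ in documents]
```

Repeating this step once per facet yields the ten independent binary labels attached to each document.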

4 Experiment

4.1 Overview

We devised an experiment to compare three binary classifiers for personality facet recognition from text:

BoW: bag-of-words features from the 3000 most frequent words in the corpus
skip: average word vectors obtained from a skip-gram-1000 model
cbow: average word vectors obtained from a cbow-1000 model

The BoW model is built using Naive Bayes classification. Both the skip and cbow models are built using logistic regression over pre-trained word embeddings computed from a 150-million Brazilian Twitter corpus using word2vec [10] with window size = 5 and min_count = 10. In addition to these three classifiers, we also consider a simple Majority class baseline system for illustration purposes.
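The two kinds of document representation used by these models can be sketched in plain Python as below. This is an illustration under simplifying assumptions rather than the pipeline actually used: documents are assumed to be already tokenised, the embedding table is a toy dictionary instead of a trained word2vec model, and the helper names are hypothetical. The resulting vectors would then be passed to a Naive Bayes or logistic regression learner.

```python
from collections import Counter


def build_vocabulary(tokenised_docs, size=3000):
    """Keep the `size` most frequent words in the corpus."""
    counts = Counter(tok for doc in tokenised_docs for tok in doc)
    return [word for word, _ in counts.most_common(size)]


def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def mean_vector(tokens, embeddings, dim):
    """Average the vectors of the tokens found in the embedding table."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known word: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


# Toy corpus and toy 2-dimensional embeddings (illustrative values only).
docs = [["bom", "dia", "bom"], ["dia", "feliz"]]
toy_embeddings = {"bom": [1.0, 0.0], "dia": [0.0, 1.0]}

vocab = build_vocabulary(docs, size=2)
bow = bow_vector(docs[0], vocab)
avg = mean_vector(docs[0], toy_embeddings, dim=2)
```

Words missing from the embedding table are simply skipped here; other fallbacks are possible, and the text does not say which one was adopted.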

4.2 Data

We use the 2.2 million-word b5-post corpus of Brazilian Facebook text [13], comprising 194k status updates written by 1019 users, each of whom also filled in a self-reported BFI-44 [7] inventory. The b5-post corpus has previously been taken as the input to a number of author profiling tasks [6], including personality recognition [17].

The text portion of the corpus was subject to basic spell checking and term substitution (e.g., laugh expressions such as ‘haha’ were replaced by a common $LAUGH$ symbol). From the corpus inventories, 10 additional personality facets were inferred according to the method in [18]. This information constitutes the set of ten class labels for each document as discussed in the previous section.
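The term substitution step can be illustrated with a regular expression. The pattern below is an assumption made for this sketch (repeated ‘ha’, ‘he’ or ‘hi’ syllables, plus runs of ‘k’, a common way of writing laughter in Brazilian Portuguese); the exact rules applied to the corpus are not specified in the text.

```python
import re

# Assumed laugh pattern: two or more "ha"/"he"/"hi" syllables (optionally
# ending in "h"), or three or more "k" characters, as a whole word.
LAUGH = re.compile(r"\b(?:(?:ha|he|hi){2,}h?|k{3,})\b", re.IGNORECASE)


def normalise(text):
    """Replace laugh expressions with the common $LAUGH$ symbol."""
    return LAUGH.sub("$LAUGH$", text)


example = normalise("hahaha que dia")
```

Anchoring the pattern on word boundaries keeps ordinary words that merely contain these syllables unchanged.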

4.3 Procedure

All models were built using 10-fold cross-validation over the entire b5-post dataset. However, since we now intend to learn ten (facet) classes rather than only five (main categories), and since some facets may be considerably sparser than others (e.g., the Depression facet of Neuroticism may be naturally less common than, say, Self-consciousness), data imbalance is a major concern for our work. To alleviate this, we resort to SMOTE minority over-sampling [2] with k = 5 neighbours.
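The idea behind SMOTE [2] is to create synthetic minority-class examples by interpolating between a minority sample and one of its k nearest minority neighbours. The following is a simplified sketch of that idea in plain Python, not the implementation used in the experiments; the function name `smote`, the `seed` parameter and the toy points are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic points from a list of minority vectors.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (squared Euclidean
    distance). Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=6, k=2)
```

In a cross-validation setting, over-sampling of this kind is normally applied to the training folds only, so that no synthetic example derived from test data reaches the learner; the text does not state how this was handled.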

5 Results

Table 1 shows reference results for the majority class baseline, and for the three models of interest. The first column represents mean F1 scores over the ten classification tasks, followed by the number of times (wins) in which each model was the overall winner, and the mean F1 measure for each individual class.

Table 1. 10-fold cross validation mean F1 scores.

Although all models present a considerable improvement over our admittedly simple baseline, the distinction among them is narrow, particularly between BoW and skip. A slight advantage of the cbow model over the others is however noticeable in the number of classes (wins) for which cbow was the overall winner (7 out of 10 classification tasks).
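The summary figures discussed here (mean F1 per model and the number of wins) can be derived from per-task scores as in the sketch below. It assumes that F1 is computed for the positive class and that a win goes to the single best-scoring model on a task; the model names, task names and score values are toy placeholders and are not the results of Table 1.

```python
def f1(gold, predicted, positive=1):
    """F1 score of the positive class for two equal-length label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarise(scores):
    """Turn {model: {task: F1}} into (mean F1 per model, wins per model)."""
    tasks = next(iter(scores.values())).keys()
    means = {m: sum(s.values()) / len(s) for m, s in scores.items()}
    wins = {m: 0 for m in scores}
    for task in tasks:
        best = max(scores, key=lambda m: scores[m][task])
        wins[best] += 1
    return means, wins


# Toy predictions for one task, then toy per-task scores for two models.
gold = [1, 1, 0, 0]
score = f1(gold, [1, 1, 1, 0])
toy_scores = {
    "model_a": {"task_1": 0.5, "task_2": 0.7},
    "model_b": {"task_1": 0.6, "task_2": 0.6},
}
means, wins = summarise(toy_scores)
```

In this toy case both models have the same mean F1 and one win each, which shows why the two statistics are reported side by side.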

As is usually the case in personality classification, some personality traits tend to be more evident from text than others. In the present setting, we notice that Compliance and Depression recognition were the most challenging tasks. However, it remains unclear whether these facets are less explicit in language use in general, or simply less explicit in our Facebook domain.

Finally, we notice that the present results are generally similar to those observed in Big Five personality classification in English [9] and other languages, and also along the lines of previous studies on the recognition of the main Big Five categories from the b5-post corpus [15, 16].

6 Final Remarks

This paper presented a number of models of Big Five facet recognition from a Brazilian Portuguese Facebook corpus and corresponding BFI-44 information. Our study suggests that, not unlike basic Big Five categories, the ten facets proposed in [18] may be recognised from text with reasonable accuracy if compared to a simple baseline system. In other words, our experiments suggest that we may in principle develop supervised models of personality recognition at a level of abstraction more specific than those obtainable from existing work, and without resorting to larger or more complex inventories to provide the required text labels.

The current work provides only initial reference results for further studies in this field, and a number of possible improvements are left as future work. In particular, we envisage the use of larger word embedding models and alternative learning architectures for this task, and further evaluation work by directly comparing our results against text labelled with actual facet information.

References

1. Álvarez-Carmona, M., López-Monroy, A., Montes-y-Gómez, M., Villaseñor-Pineda, L., Escalante, H.: INAOE’s participation at PAN’15: author profiling task. In: CLEF 2015 (2015)
2. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16(1), 321–357 (2002)
3. Coltheart, M.: The MRC psycholinguistic database. Q. J. Exp. Psychol. Sect. A: Hum. Exp. Psychol. 33(4), 497–505 (1981)
4. Goldberg, L.R.: An alternative description of personality: the Big-Five factor structure. J. Pers. Soc. Psychol. 59, 1216–1229 (1990)
5. González-Gallardo, C., et al.: Tweets classification using corpus dependent tags, character and POS N-grams. In: CLEF 2015 (2015)
6. Hsieh, F.C., Dias, R.F.S., Paraboni, I.: Author profiling from Facebook corpora. In: 11th International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki, Japan, pp. 2566–2570. ELRA (2018)
7. John, O.P., Naumann, L.P., Soto, C.J.: Paradigm Shift to the Integrative Big-Five Trait Taxonomy: History, Measurement, and Conceptual Issues, pp. 114–158. Guilford Press, New York (2008)
8. Costa Jr., P.T., McCrae, R.R.: Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI): Professional Manual. Psychological Assessment Resources, Odessa (1992)
9. Mairesse, F., Walker, M., Mehl, M., Moore, R.: Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res. (JAIR) 30, 457–500 (2007)
10. Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of NAACL-HLT-2013, Atlanta, USA, pp. 746–751. Association for Computational Linguistics (2013)
11. Nowson, S., Oberlander, J.: Identifying more bloggers: towards large scale personality classification of personal weblogs. In: Proceedings of the International Conference on Weblogs and Social Media, Boulder, Colorado, USA (2007)
12. Pennebaker, J.W., Francis, M.E., Booth, R.J.: Linguistic Inquiry and Word Count: LIWC. Lawrence Erlbaum, Mahwah (2001)
13. Ramos, R.M.S., Neto, G.B.S., da Silva, B.B.C., Monteiro, D.S., Paraboni, I., Dias, R.F.S.: Building a corpus for personality-dependent natural language understanding and generation. In: 11th International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki, Japan, pp. 1138–1145. ELRA (2018)
14. Rangel, F., Celli, F., Rosso, P., Potthast, M., Stein, B., Daelemans, W.: Overview of the 3rd author profiling task at PAN 2015. In: CLEF 2015 Evaluation Labs and Workshop, Toulouse, France (2015). CEUR-WS.org
15. dos Santos, V.G., Paraboni, I., da Silva, B.B.C.: Big five personality recognition from multiple text genres. In: Ekštein, K., Matoušek, V. (eds.) TSD 2017. LNCS (LNAI), vol. 10415, pp. 29–37. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64206-2_4
16. da Silva, B.B.C., Paraboni, I.: Learning personality traits from Facebook text. IEEE Latin Am. Trans. 16(4), 1256–1262 (2018). https://doi.org/10.1109/TLA.2018.8362165
17. da Silva, B.B.C., Paraboni, I.: Personality recognition from Facebook text. In: Villavicencio, A., et al. (eds.) PROPOR 2018. LNCS (LNAI), vol. 11122, pp. 107–114. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99722-3_11
18. Soto, C.J., John, O.P.: Ten facet scales for the Big Five Inventory: convergence with NEO PI-R facets, self-peer agreement, and discriminant validity. J. Res. Pers. 43(1), 84–90 (2009). https://doi.org/10.1016/j.jrp.2008.10.002
19. Șulea, O.M., Dichiu, D.: Automatic profiling of twitter users based on their tweets. In: CLEF 2015 (2015)

Acknowledgements

This work was supported by FAPESP grants 2017/06828-1 and 2016/14223-0.

Author information

Authors and Affiliations

School of Arts, Sciences and Humanities, University of São Paulo, Av. Arlindo Bettio, 1000, São Paulo, Brazil
Wesley Ramos dos Santos & Ivandré Paraboni

Corresponding author

Correspondence to Ivandré Paraboni.

Editor information

Editors and Affiliations

Universita della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
Zurich University of Applied Sciences, Winterthur, Switzerland
Martin Braschler
University of Neuchâtel, Neuchâtel, Switzerland
Jacques Savoy
Technische Universität Wien, Vienna, Austria
Andreas Rauber
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Santiago de Compostela, Santiago de Compostela, Spain
David E. Losada
Swiss Alliance for Data-Intensive Services, Thun, Switzerland
Gundula Heinatz Bürki
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

About this paper

Cite this paper

dos Santos, W.R., Paraboni, I. (2019). Personality Facets Recognition from Text. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_15

DOI: https://doi.org/10.1007/978-3-030-28577-7_15
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer Science, Computer Science (R0)
