1 Introduction

Personality is the set of stable tendencies and characteristics that determine the similarities and differences in individuals' psychological behaviors (thoughts, emotions, and actions). In other words, personality not only shapes an individual's behavioral patterns, thoughts, and interpersonal communication but also has a great impact on various aspects of life, such as happiness, preferences, and physical and mental health [1,2,3]. The evolution of personality theories, together with progress in measurement methods and statistical analysis, has led to the emergence of one of the most influential contemporary personality theories, known as the Five Factor Model (FFM) or Big Five, which contains five primary traits and has been regarded by many psychologists in recent years as a popular and powerful approach for studying personality [4, 5]. According to the FFM, personality consists of five main dimensions: neuroticism (NEU), extraversion (EXT), openness to experience (OPN), agreeableness (AGR), and conscientiousness (CON). The study of personality is not only essential for psychology and personality recognition but can also benefit various applications, such as cognitive science [6], social network analysis [7], sentiment analysis [8,9,10], recommender systems [11], and deception detection [12]. In this regard, personality computing is now known as the field that combines psychology and machine learning into computational models for understanding personality. Nevertheless, building a general system for this purpose is very challenging because human behavior is hard to delineate across diverse situations. However, considering the importance of automatic personality recognition, numerous studies focusing on context-specific problems have been conducted in this field over the past few years [13, 14].

Meanwhile, with the development of online social networks, several studies have focused on predicting personality from the text people generate on social media. Earlier methods commonly relied on questionnaire investigations or expert reviews, which were not only costly, time-consuming, and less practical but also highly dependent on experts; they used human-designed statistical features to perform recognition and ignored the valuable information contained in texts. In contrast, the newer studies treat text as the most direct way of expressing thought and emotion: it contains rich self-disclosed personal information that is highly correlated with people's personality, and interpreting it can provide valuable insight into users' behavior and feelings [15, 16].

Taking the significance of textual data into account, a small number of studies have focused on using text generated by people to predict their personalities [17]. Machine learning based methods have been utilized for this purpose, but their results were not satisfactory: most of them relied on statistical or hand-crafted linguistic features and were unable to exploit the rich user-generated textual information or extract features from it automatically, even though words and text are the most valuable evidence for determining emotion and personality [18, 19].

With their rapid development, deep neural networks have demonstrated remarkable performance in various Natural Language Processing (NLP) tasks, including opinion mining and sentiment analysis [8, 9]. Personality recognition is very similar to these NLP applications, since both focus on mining users' attributes from texts. Accordingly, employing the powerful text modeling techniques that have proven effective in the NLP domain is the most intuitive and straightforward idea for improving the performance of personality recognition [15, 20]. However, capturing potential and efficient features from the text that relate closely to one's personality is still challenging.

With these limitations and the potential of deep learning in mind, we propose a deep learning based method for personality recognition that makes use of both a Convolutional Neural Network (CNN) and the AdaBoost algorithm [21]. CNNs have been successfully utilized for various NLP tasks, and extracting local features is their main strength. A CNN generally applies filters of various lengths to generate feature maps. Because different filter sizes produce different kinds of N-grams, each of which helps interpret and parse sentences differently, the features obtained from different filters can each play an important role in predicting personality. To this end, we combine the CNN with the AdaBoost algorithm to investigate the possibility of leveraging the contribution of different filter lengths and grasp their potential for personality recognition by combining classifiers with their respective filter sizes. The reason for choosing AdaBoost is that it is a meta-algorithm that can be used in conjunction with other learning algorithms to improve classification accuracy. In this algorithm, the classification at each new stage is adjusted in favor of samples incorrectly classified in previous stages, and the process is repeated until the classification error is minimized.

To prove the efficiency of our proposed method, we examined it on two heterogeneous datasets, namely the Stream-of-consciousness essays and the YouTube personality dataset, and obtained better results than other methods, demonstrating that our proposed method not only captures more advanced structural features but is also more effective at detecting users' personality traits. In summary, the contributions of this paper are as follows:

  1. We design a new structure based on the integration of CNN and AdaBoost for predicting personality from texts, in which filters of different lengths and different weight matrices are used in the convolutional layer to extract features. To the best of our knowledge, this is the first study to investigate the combination of CNN and AdaBoost for the task of personality recognition.

  2. We consider different variations of the proposed method, derived from various vector representation models, in our implementation to demonstrate the effect of vector representation on personality recognition.

  3. Unlike other existing studies, we conduct our experiments on two different datasets to clearly demonstrate the versatility and generalization of our method; based on the empirical results, the proposed method achieves higher efficiency than both machine learning and deep learning based methods.

The remainder of this paper is organized as follows: related studies, with a focus on deep learning based methods, are presented in Sect. 2. Section 3 details the proposed method. Experimental details and the obtained results are extensively reported in Sect. 4. The conclusion and possible future directions are given in Sect. 5.

2 Related Work

Along with the explosive popularity of social media, various studies have been conducted on personality recognition. Personality recognition methods can generally be divided into two major categories: psychology-based and artificial intelligence-based methods [15, 22]. From the perspective of psychology, personality theorists have developed unique methods for assessing individuals' personalities; by applying these methods, they obtained valuable information and based their formulations on it. Artificial intelligence-based methods, on the other hand, combine psychology and machine learning into computational models for understanding personality. Given that the focus of this paper is on automatic personality recognition from texts, especially people's opinions about various topics, the present study falls into the group of artificial intelligence-based methods. These methods are further divided into machine learning and deep learning based methods; more details about them and the related studies are reported in the following.

2.1 Machine Learning based Methods

With the rapid development of the Internet and social media, numerous studies have been conducted on personality recognition from text. Golbeck et al. [23] used the M5' rule learner and Gaussian processes to predict personality based on Big Five scores; they utilized 167 Facebook users' personal information, activity, preferences, and language usage to extract 77 features and perform classification. Following a similar line of research, Golbeck et al. [24] utilized 297 Twitter users' information and applied a similar approach to predict personality. The relation between personality and various kinds of users was analyzed by Quercia et al. [25], who used the M5' algorithm to predict 335 Twitter users' Big Five attributes from their numbers of followers, followings, and listed counts. Alam et al. [26] utilized bag-of-words methods along with unigrams as features for personality recognition; they applied various techniques, such as Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), and Bayesian Logistic Regression (BLR), to predict Big Five attributes on the myPersonality corpus. Skowron et al. [27] collected text, images, and users' metadata from Instagram and Facebook and applied various machine learning techniques to predict personality, concluding that joint analysis could enhance performance. Li et al. [28] proposed a semi-supervised method that utilized over 547 active Chinese users of Sina Weibo to predict personality.

Furthermore, Bai et al. [29] utilized the information of 209 users of RenRen (a Chinese social network) to predict their personality; they analyzed various attributes, including usage statistics, emotional state, and demographic information, and then applied the C4.5 decision tree to perform classification. Peng et al. [30] utilized SVM to predict the personality of 222 Chinese Facebook users based on the texts they generated about various topics. Argamon et al. [31] used word categories and the relative frequency of function words as SVM inputs to discriminate between students at the opposite extremes of neuroticism and extraversion. The efficiency of various textual features extracted from psycholinguistic dictionaries or psychologically oriented text analysis tools was explored by Mairesse et al. [32]. N-gram frequency was another feature commonly used as the input of SVM or Naïve Bayes classifiers for classifying low- and high-scoring bloggers on the Big Five personality attributes.

Clearly, a large number of studies have relied on traditional machine learning methods for personality recognition, and the majority of them were highly dependent on handcrafted features such as online activities, profile information, or manually extracted text features. In other words, machine learning based methods required an expert to extract features and were unable to make use of the rich features existing in the text.

2.2 Deep Learning based Methods

With the rapid growth of deep learning, deep neural networks have obtained remarkable results in various NLP tasks. Because personality recognition from text is very similar to other NLP tasks such as text classification or sentiment analysis, deep neural networks have also found their way into personality recognition. Jiango et al. [33] utilized deep learning methods to predict the personality of Facebook users; they applied a fully connected network, a Convolutional Neural Network (CNN), and a Recurrent Neural Network (RNN) in their experiments and demonstrated the superiority of deep learning based methods for personality recognition over other existing methods. In similar research, Tandra et al. [34] utilized a multilayer perceptron, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and 1D-CNN networks to predict Facebook users' personality according to the Big Five attributes. Similarly, Majumder et al. [5] employed a CNN to extract deep semantic features and predict personality from them. Xue et al. [4] proposed the AttCNN model to extract deep semantic features from users' posts, concatenated them with statistical linguistic features, and fed the result to a regression algorithm to predict the Big Five personality attributes. It is worth mentioning that although deep neural networks have rarely been employed for personality recognition, they have obtained considerable results and are still in the early stages of their development.

3 Proposed Methodology

Language is the most reliable way for people to state their opinions and internal feelings in an understandable way, so it can be considered valuable knowledge for psychologists to interpret people's feelings and predict their personalities. In other words, since text reflects various aspects of its author, efficiently modeling the text generated by authors can improve the performance of personality recognition [35]. Motivated by this intuition, we use a reinforced convolutional neural network architecture with various filters to perform classification. Unlike previous CNN-based personality recognition methods, which combined the outputs of different filters into a unified vector that was then fed to a fully connected network to predict the personality [4, 5], in our proposed method the features obtained from each filter size of the convolutional neural network are fed to a separate pooling layer and classifier. Once the initial results are obtained by each classifier, the AdaBoost algorithm [21] is used to produce the overall classification result. The reason for choosing AdaBoost is that it is a meta-algorithm that can be used in conjunction with other learning algorithms to improve performance. In this algorithm, the weight of each training sample is adjusted in favor of samples incorrectly classified by the other classifiers. In other words, AdaBoost is able to combine weak classifiers into a strong classifier because it learns the classification error of each weak classifier and adjusts that classifier's weight in the final classification accordingly.

The proposed method consists of five steps and combines a convolutional neural network with the AdaBoost algorithm [21]. Filters of various sizes scan the input sentence and extract valuable low-level features from the input text. Each convolutional branch has its own convolutional, pooling, and classification layers, and classification is performed separately in each branch. Finally, the AdaBoost aggregation algorithm creates a robust classifier based on the different weights of the individual classifiers, and the personality type is estimated from its output.

In general, the idea of this paper rests on the hypothesis that using various filters in a convolutional neural network leads to the generation of different features, each of which may have a different effect on the final classification. In classical convolutional neural networks, these features are merged after the pooling operation and classification is performed on the merged features, which can suppress the impact of features that contain valuable information. We therefore feed the features obtained from each filter to a separate pooling and classification layer; the classification results are then combined using the AdaBoost algorithm to obtain the final prediction. The schematic structure of the proposed method is illustrated in Fig. 1.

Fig. 1 Integration of convolutional neural network with AdaBoost for personality recognition

3.1 Representation Layer (Word Matrix Formation)

Data representation refers to language modeling techniques in natural language processing that aim to map words from a very large space to a continuous vector space of much smaller dimension. In other words, in order to apply a deep learning method to text classification, the words must be transformed into dense vectors that capture their syntactic, semantic, and morphological information. To this end, the Skip-Gram model [36], a shallow two-layer neural network that learns the vector representation of a word from its context, is used in the first layer of our proposed method. The diagram of the Skip-Gram model is depicted in Fig. 2; its objective is to find word representations that are useful for predicting the surrounding words in a sentence. Let \({x}_{1},{x}_{2}\cdots {x}_{n}\) be a sequence of training words; Skip-Gram aims to maximize the average log probability (Eq. 1).

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{-c\le j\le c,j\ne 0}\mathrm{log}p\left({x}_{i+j}|{x}_{i}\right)$$
(1)

where \(c\) is the training context size; a larger value of \(c\) yields more training samples and higher accuracy. The value of \(p\left({x}_{i+j}|{x}_{i}\right)\) is obtained using the softmax function. Finally, given the word embedding size \(d\), each word is encoded as a \(d\)-dimensional vector, and a sentence of \(n\) words is stacked into the sentence matrix \(A\in {\mathcal{R}}^{n\times d}\).
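
For concreteness, the following minimal sketch shows how such Skip-Gram embeddings could be trained with the gensim library. The use of gensim is our assumption for illustration (the paper only specifies the Skip-Gram model itself); the hyper-parameter values are taken from Sect. 4.4, and the toy corpus is hypothetical.

```python
# Minimal Skip-Gram training sketch (gensim is an assumption, not the authors' code).
from gensim.models import Word2Vec

# Hypothetical tokenized corpus; in the paper, all dataset documents are used.
corpus = [["i", "love", "long", "walks"], ["he", "rarely", "talks"]]

model = Word2Vec(
    sentences=corpus,
    vector_size=150,  # word vector dimension d (Sect. 4.4)
    window=5,         # training context size c (Sect. 4.4)
    sg=1,             # sg=1 selects the Skip-Gram architecture
    alpha=0.025,      # initial learning rate (Sect. 4.4)
    min_count=1,
)
vector = model.wv["walks"]  # a 150-dimensional word embedding
```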

Fig. 2 Skip-Gram model structure

3.2 Convolutional Layer

The objective of the convolutional layer is to extract local features while retaining the sequential information of the input text. To this end, the sentence matrix \(A\in {\mathcal{R}}^{n\times d}\) is fed to the convolutional layer to produce new features. Since the sequential structure of a sentence has an important effect on its meaning, it is sensible to choose the filter width equal to the dimensionality of the word vectors \(\left(d\right)\); therefore only the height of the filters \((h)\), known as the region size, is varied.

Considering \(A\in {\mathcal{R}}^{n\times d}\) as the sentence matrix, a convolution filter \(H\in {\mathcal{R}}^{h\times d}\) is applied to each submatrix \(A\left[i:j\right]\) of \(A\) to produce a new feature. As the convolution operation is applied repeatedly over \(A\), the output sequence \(O\in {\mathcal{R}}^{n-h+1}\) is obtained (Eq. 2).

$${O}_{i}=H \cdot A[i:i+h-1]$$
(2)

Here \(i=1,\dots ,n-h+1\) and ⋅ denotes the inner product between the convolution filter and the input submatrix. A bias term \(b\in \mathcal{R}\) and an activation function \(f\) are then applied to each \({O}_{i}\), producing the feature map \(C\in {\mathcal{R}}^{n-h+1}\) (Eq. 3).

$${C}_{i}=f({O}_{i}+b)$$
(3)
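
A minimal NumPy sketch of Eqs. 2–3 may make the sliding-filter operation concrete. It is an illustration under our own conventions, not the authors' implementation; ReLU is used as the activation following Sect. 4.4, and the random matrices are stand-ins for real embeddings and learned filters.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def convolve(A, H, b=0.0, f=relu):
    """Slide a filter H (h x d) over the sentence matrix A (n x d),
    computing the inner product at each window (Eq. 2) and applying
    the bias and activation (Eq. 3)."""
    n, h = A.shape[0], H.shape[0]
    return np.array([f(np.sum(A[i:i + h] * H) + b) for i in range(n - h + 1)])

A = np.random.randn(20, 150)  # sentence of n=20 words, embedding size d=150
H = np.random.randn(3, 150)   # filter with region size h=3
C = convolve(A, H)            # feature map of length n-h+1 = 18
```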

3.3 Pooling Layer

Since feature maps of different lengths are generated by the different filter sizes, a pooling function is required to induce fixed-size vectors. Various strategies, such as average pooling, minimum pooling, and maximum pooling, can be used for this purpose; the idea behind them is to capture the most important feature of each feature map while reducing dimensionality. Maximum pooling is used in our proposed method (Eq. 4).

$${c}_{max}=\mathrm{max}\left\{C\right\}=\mathrm{max}\left\{{c}_{1},\dots ,{c}_{n-h+1}\right\}$$
(4)

It is worth mentioning that the pooling layer makes the proposed method aware of sentence structure and distributes personality-related information throughout the sentence representation. The pooling layer also allows us to work with sentences of variable length, since the number of resulting features is aligned with the number of filters rather than the sentence length. Moreover, it reduces the size of the feature maps and the subsequent computations. The features obtained from the pooling layer are then processed by a nonlinear function before classification.
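
As a sketch, max-over-time pooling (Eq. 4) can be expressed as follows; note how the size of the pooled vector depends only on the number of filters, not on the sentence length. The feature maps here are random stand-ins.

```python
import numpy as np

def max_over_time(feature_maps):
    """Reduce each variable-length feature map to its single largest
    value (Eq. 4), yielding one fixed-size feature per filter."""
    return np.array([fm.max() for fm in feature_maps])

# Feature maps from three filters (h = 3, 4, 5) applied to a 20-word sentence
maps = [np.random.randn(20 - h + 1) for h in (3, 4, 5)]
pooled = max_over_time(maps)  # shape (3,), independent of sentence length
```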

3.4 Regularization Layer and SoftMax

To mitigate overfitting, one of the most important weaknesses of neural networks, dropout is used as a regularization technique in our proposed method. Under this technique, the values of some features are set to zero: if \({C}_{max}=\{{c}_{max}^{1},{c}_{max}^{2},\dots ,{c}_{max}^{m}\}\) are the features obtained from the previous layer (\(m\) is the number of filters in the convolutional layer), some of them are randomly set to zero before the SoftMax layer. The dropout rate is a hyper-parameter that is specified along with the other training settings. The classification result is the SoftMax output after the regularization layer, which uses the regularized features as the input of the SoftMax layer to calculate the probability distribution over the personality classes of the Big Five attributes (Eq. 5).

$$P\left(y=j|x\right)={\mathrm{softmax}}_{j}\left({x}^{T}w+b\right)=\frac{{e}^{{x}^{T}{w}_{j}+{b}_{j}}}{\sum_{k=1}^{K}{e}^{{x}^{T}{w}_{k}+{b}_{k}}}$$
(5)

where \({w}_{k}\) and \({b}_{k}\) are the weight vector and bias term of class \(k\), and \(K\) is the number of output classes.
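
The following sketch combines the dropout and softmax steps. It is an illustration only: the simple zeroing form of dropout follows the description in the text, the rate comes from Sect. 4.4, and the weights and a binary (positive/negative trait) classifier head are hypothetical stand-ins matching the per-trait setup of Sect. 4.4.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(c, rate):
    """Randomly zero a fraction 'rate' of the pooled features
    (training-time only, as described in the text)."""
    mask = rng.random(c.shape) >= rate
    return c * mask

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    return np.exp(z) / np.exp(z).sum()   # Eq. 5

c = np.random.randn(150)                 # m pooled features (one per filter)
W = np.random.randn(150, 2)              # hypothetical binary trait classifier
b = np.zeros(2)
p = softmax(dropout(c, 0.05) @ W + b)    # P(trait negative/positive | text)
```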

3.5 AdaBoost Training Layer and Prediction Integration

AdaBoost is an algorithm that integrates weak classifiers into a strong classifier. Accordingly, we employ it in our proposed method to find appropriate weights for the classifiers attached to the different N-gram filters. To this end, the statistics of the weak classifiers' results on the training samples are collected, and the weights of the training samples and classifiers are then adjusted to obtain the final strong classifier. Backpropagation is used to train the networks ahead of the AdaBoost integration part. The training process of AdaBoost can be stated as follows (a minimal sketch follows this list):

  1. Initialize a uniform distribution \({\mathcal{D}}^{1}\) over all training samples, where \({\mathcal{D}}_{i}^{t}\) denotes the weight of the ith training sample at epoch \(t\) (Eq. 6).

    $${\mathcal{D}}_{i}^{1}=\frac{1}{\#training\_samples}$$
    (6)
  2. In each training epoch \(t\), while backpropagation trains the three neural networks consecutively, the following process is performed for all classifiers:

    a. Estimating weak classifier statistics: after training the classifiers and predicting the output labels, the classification statistics over the samples are saved, and the error of weak classifier \({G}_{m}(x)\) is then calculated (Eq. 7).

      $${e}_{m}^{t}=\sum_{i}{\mathcal{D}}_{i}^{t}1({G}_{m}\left(x\right)\ne y\left(x\right))$$
      (7)
    b. Adjusting weights: once a weak classifier is trained, its classification error is used to modify the distribution over the training set. The classifier weight and the sample distribution are then updated.

      • Calculating classifier weights (Eq. 8):

        $$a\left(m\right)=\frac{1}{2}ln\frac{1-{e}_{m}^{t}}{{e}_{m}^{t}}$$
        (8)
      • Adjusting the distribution (Eq. 9), where \({Z}^{t}\) is a normalization factor that keeps \({\mathcal{D}}^{t+1}\) a valid distribution:

        $${\mathcal{D}}_{i}^{t+1}=\frac{{\mathcal{D}}_{i}^{t}\mathrm{exp}(-a\left(m\right)y\left(x\right){G}_{m}\left(x\right))}{{Z}^{t}}$$
        (9)
  3. Improved validation: once the training process is finished, an element-wise multiplication of weights and outputs is performed to obtain the final predicted personality class. The learned weights \(a\) are used to perform the improved validation using Eq. 10, where \(i\) is the classifier index, \(l(i)\) is the output label of classifier \(i\), and \(a(i)\) is its ensemble weight.

    $$L\left(s\right)=\sum_{i}a\left(i\right)*l(i)$$
    (10)
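
To make the procedure concrete, the minimal NumPy sketch below implements Eqs. 6–9 and the weighted vote of Eq. 10. It is an illustration under standard AdaBoost conventions rather than the authors' exact code: labels and predictions are encoded in {−1, +1}, the per-filter weak classifier outputs are hypothetical stand-ins for the three CNN branches, and the distribution is renormalized by its sum (the factor \(Z^{t}\) above).

```python
import numpy as np

def adaboost_round(D, preds, labels):
    """One AdaBoost round for a single weak classifier (Eqs. 7-9)."""
    e = np.sum(D[preds != labels])          # weighted error, Eq. 7
    a = 0.5 * np.log((1.0 - e) / e)         # classifier weight, Eq. 8
    D = D * np.exp(-a * labels * preds)     # re-weight samples, Eq. 9
    return D / D.sum(), a                   # renormalize (the Z^t factor)

n = 8
labels = np.array([1, 1, -1, 1, -1, -1, 1, -1])
# Hypothetical outputs of the three per-filter CNN classifiers:
weak_outputs = [
    np.array([1, 1, -1, -1, -1, -1, 1, -1]),
    np.array([1, -1, -1, 1, -1, 1, 1, -1]),
    np.array([1, 1, 1, 1, -1, -1, 1, 1]),
]

D = np.full(n, 1.0 / n)                     # uniform initialization, Eq. 6
alphas = []
for preds in weak_outputs:
    D, a = adaboost_round(D, preds, labels)
    alphas.append(a)

# Final decision: sign of the alpha-weighted vote over classifiers (Eq. 10)
final = np.sign(sum(a * p for a, p in zip(alphas, weak_outputs)))
```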

4 Experiments

The experiments carried out to prove the efficiency of the proposed method, together with the obtained results, are explained in detail in this section. Two datasets were leveraged in our experiments; they are introduced in the following.

4.1 Dataset

In order to prove the versatility and generalization of our proposed method, two heterogeneous personality recognition datasets were used in our experiments.

  1. Stream-of-consciousness essays.

Essays is a large stream-of-consciousness dataset collected by James Pennebaker and Laura King from text generated by 2467 users between 1997 and 2004 and labeled with the Big Five personality traits: neuroticism (NEU), extraversion (EXT), openness to experience (OPN), agreeableness (AGR), and conscientiousness (CON). The dataset thus includes a label for each essay indicating the personality of its author, making it suitable for supervised learning. It is worth mentioning that the texts of this dataset were generated by students of the American Psychological Association.

  2. YouTube personality dataset.

This dataset comes from about 400 YouTube vloggers' webcam videos and contains speech transcriptions, gender, and behavioral features extracted manually from the videos. In contrast to the first dataset, it contains shorter texts, and the personality impressions (labels) were determined from annotators' rating impressions after watching each vlog.

It is worth mentioning that we examined our proposed method on two datasets to test whether it can handle various cases. The summary statistics of the two datasets are given in Table 1. The reasons behind choosing these two datasets are as follows:

Table 1 Summary statistics of Essay and YouTube datasets
  • As shown in Fig. 3, the YouTube dataset documents are shorter than those of the Essay dataset, which allows us to verify whether our proposed method is effective for both long and short texts.

  • The Essay dataset labels come from the authors' own questionnaires, which can be considered autognosis, while the YouTube dataset labels come from volunteers watching the bloggers' videos, which can be treated as outer perception. Employing both datasets therefore helps us show that our proposed method is valid in either case, regardless of whether the labels are generated by the authors themselves or by other people.

Fig. 3 Sentence and word count cumulative distribution of Essay and YouTube datasets

4.2 Evaluation Metrics

Evaluation metrics quantify the performance of a method, and an important aspect of a metric is its ability to discriminate among methods' results. A deep learning based method is generally evaluated by comparing the actual labels of samples with the predicted ones, and the best evaluation metric is highly dependent on the task. The standard metric of accuracy (Eq. 11) is used in our experiments, where TP, TN, FP, and FN respectively denote true positives, true negatives, false positives, and false negatives.

$$\mathrm{Accuracy}= \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}+\mathrm{TN}}$$
(11)

We also employed the Mean Absolute Error (MAE) as an evaluation metric, which is widely used in automatic personality recognition research to measure the difference between the predicted score and the score observed with the Big Five Inventory. It is calculated using Eq. 12, where \(n\) is the number of unseen instances, \({y}_{{S}_{{x}_{i}}}^{*}\) is the predicted score for a trait, and \({y}_{{S}_{{x}_{i}}}\) is the observed one. Since MAE is an error measure, lower values are better.

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{y}_{{S}_{{x}_{i}}}^{*}-{y}_{{S}_{{x}_{i}}}\right|$$
(12)
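
Both metrics are straightforward to compute; a minimal sketch (the toy labels and scores are hypothetical):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + FP + FN + TN), Eq. 11, for binary labels."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def mae(scores_true, scores_pred):
    """Mean absolute error between observed and predicted trait scores, Eq. 12."""
    return np.mean(np.abs(np.asarray(scores_true) - np.asarray(scores_pred)))

print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))    # 0.5
print(mae([3.2, 4.1, 2.8], [3.0, 4.5, 2.8]))   # ~0.2
```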

4.3 Experiment Description

To provide a comprehensive picture of the efficiency of the proposed method, various experiments were conducted with several variations of the method, introduced in the following (a sketch of how these input variants could be built follows the list):

  • CNN-AdaBoost-Rand: randomly initialized word vectors are used as the input of the proposed method.

  • CNN-AdaBoost-Static: pre-trained word vectors obtained from the Skip-Gram model are used as the input, and their weights are not updated during training.

  • CNN-AdaBoost-Non-Static: pre-trained word vectors obtained from the Skip-Gram model are used as the input, and their weights are updated during training.

  • CNN-AdaBoost-2channel: a combination of randomly initialized word vectors and pre-trained Skip-Gram word vectors is used as the input of the proposed method.
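
The sketch below shows one way the four input variants could be constructed. PyTorch is our assumption here (the paper only states Python), and the vocabulary size, random pre-trained matrix, and token tensor are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

vocab_size, d = 10000, 150
pretrained = torch.randn(vocab_size, d)  # stand-in for the Skip-Gram vectors

# CNN-AdaBoost-Rand: randomly initialized, trainable embeddings
rand_emb = nn.Embedding(vocab_size, d)

# CNN-AdaBoost-Static: pre-trained vectors, frozen during training
static_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)

# CNN-AdaBoost-Non-Static: pre-trained vectors, fine-tuned during training
non_static_emb = nn.Embedding.from_pretrained(pretrained, freeze=False)

# CNN-AdaBoost-2channel: both embeddings stacked as a two-channel input
tokens = torch.randint(0, vocab_size, (1, 20))  # a 20-word sentence
two_channel = torch.stack([rand_emb(tokens), static_emb(tokens)], dim=1)
# shape: (batch=1, channels=2, words=20, dims=150)
```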

4.4 Model Configuration and Hyper-Parameters

Deep neural networks require a large number of training samples to be trained accurately, so a typical processor cannot be expected to perform this operation and a sufficiently powerful processor is needed. All implementations in this paper were run on a system with an Intel Xeon E5-2620 2.0 GHz processor and 8 GB of RAM, using Python as the programming language in a Linux environment.

The implementation started with preprocessing the input data. The text was split into a sequence of sentences at period and question mark characters, and the sentences were then split into words. All letters were reduced to lowercase, and all characters other than ASCII letters, exclamation marks, digits, and quotation marks were removed. Because some documents in the Essays dataset did not include periods, which yielded absurdly long sentences, sentences longer than 150 words were split into pieces of 20 words (except the last piece, which could be shorter).
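
A minimal sketch of this preprocessing pipeline, under our reading of the rules above (the regular expressions and helper name are our own, not the authors'):

```python
import re

def preprocess(doc, max_len=150, chunk=20):
    """Sentence-split on periods/question marks, lowercase, keep only
    ASCII letters, digits, '!' and quotes, and re-split overly long
    sentences into 20-word pieces (Sect. 4.4)."""
    out = []
    for s in re.split(r"[.?]", doc):
        s = re.sub(r"[^A-Za-z0-9!'\" ]", " ", s).lower()
        words = s.split()
        if not words:
            continue
        if len(words) > max_len:
            out.extend(words[i:i + chunk] for i in range(0, len(words), chunk))
        else:
            out.append(words)
    return out

print(preprocess("I love walks. Do you?"))  # [['i','love','walks'], ['do','you']]
```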

Next, in order to use words as the input of the proposed method, they were converted to vectors. We trained the Skip-Gram model on all available documents with a window size of 5 and a word vector dimension of 150. A learning rate of 0.025 was employed to update the word vectors and minimize the loss function.

The word vectors were then fed to the convolutional layer, where filter sizes of 3, 4, and 5 were selected and 150 filters were used. The rectified linear unit (ReLU) was utilized as the activation function to introduce nonlinearity. It is worth mentioning that we experimented with various filter sizes, and the best results were obtained with the mentioned values.

The proposed neural network was trained on the training data, and the output classes were determined by the Softmax function. To evaluate the output on the experimental data, a cost function was set up and the ADADELTA update rule was employed for stochastic gradient descent with a learning rate of 0.01, a mini-batch size of 25, and a dropout rate of 0.05; 60 epochs were used for training. The hyper-parameter values used for training the proposed method are summarized in Table 2.
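
A hedged sketch of this training configuration follows. PyTorch is an assumption, and the linear model and random tensors are stand-ins for one CNN branch and its pooled features; only the optimizer settings come from the configuration above.

```python
import torch

# Stand-ins: 'model' replaces one CNN branch, and the tensors replace the
# real pooled features; only the optimizer settings come from Table 2.
model = torch.nn.Linear(150, 2)
x_all = torch.randn(100, 150)
y_all = torch.randint(0, 2, (100,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x_all, y_all),
    batch_size=25, shuffle=True)                  # mini-batch size 25

optimizer = torch.optim.Adadelta(model.parameters(), lr=0.01)  # ADADELTA, lr 0.01
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(60):                           # 60 training epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```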

Table 2 Configuration of the proposed method hyper-parameters

It is worth mentioning that, because the goal of this paper is to predict personality based on the Big Five traits, which comprise five dimensions, we built five different neural networks with the same structure introduced above, one for each personality trait. Each network acts as a binary classifier predicting whether the corresponding trait is positive or negative. Moreover, all training and testing experiments were carried out with fivefold cross-validation: the whole dataset was randomly divided into five chunks, three of which were used as the training set while the other two served as the validation and test sets. The average accuracy of each variation of the proposed method over the five folds is reported in the results section.
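
This split is slightly unusual (three training chunks, one validation chunk, one test chunk). A sketch of one way to realize it, assuming the validation and test chunks rotate with the folds (the paper does not state the rotation explicitly):

```python
import numpy as np

def five_fold_splits(n_samples, seed=0):
    """Fivefold protocol of Sect. 4.4: shuffle, cut into 5 chunks, use
    3 for training, 1 for validation, and 1 for testing in each fold."""
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(n_samples), 5)
    for k in range(5):
        val, test = chunks[k], chunks[(k + 1) % 5]
        train = np.concatenate([chunks[j] for j in range(5)
                                if j not in (k, (k + 1) % 5)])
        yield train, val, test

for train, val, test in five_fold_splits(2467):  # 2467 Essay authors
    pass  # train one binary network per trait; average test accuracy over folds
```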

4.5 Results and Analysis

As previously mentioned, we used two different datasets in our experiments; the average test results over fivefold cross-validation obtained by the different variations of the proposed method, in comparison with other existing methods for personality recognition on the Essay and YouTube datasets, are reported below. Since the Essay dataset has been employed in other studies, we compared our results with those of existing methods (taken from their original papers); the results are provided in Table 3.

Table 3 Accuracy comparison of automatic classification of texts in the Essay dataset based on the Big Five dimensions of personality

On the Essay dataset, all variations of the proposed method outperform both the machine learning and the deep learning methods, which can be attributed to the AdaBoost algorithm: it separates the features obtained by parsing the documents with different filters, boosts the classifier performance on these representations, and modulates them to obtain higher overall performance. Notably, although the AdaBoost algorithm is sensitive to noisy data and outliers, it is superior to most learning algorithms with respect to overfitting.

Among all variations of the proposed method, CNN-AdaBoost-Rand, which used randomly initialized word vectors as input, has the lowest accuracy, while the variations that employed pre-trained word vectors performed better, owing to the rich vector representation provided by the Skip-Gram model. After CNN-AdaBoost-Rand, CNN-AdaBoost-Static has the lowest accuracy, indicating that updating the weights during training enhances performance. Overall, CNN-AdaBoost-2channel achieved the highest performance for personality recognition, with accuracies of 61.25, 61.93, 59.02, 60.16, and 64.63% for extraversion, neuroticism, agreeableness, openness to experience, and conscientiousness, respectively.

The YouTube dataset, by contrast, has rarely been used in experiments, so we chose several well-known models as baselines to prove the efficiency of our proposed method; they are listed in Table 4. The first is the most basic text classification method, a Bayes classifier based on TF-IDF features. We also compared our method with 2-CNN (a two-dimensional convolutional neural network), 3-CNN (a three-dimensional convolutional neural network), 1-LSTM (a single LSTM), 2-LSTM (a bidirectional long short-term memory network), and 2-CLSTM (a bidirectional LSTM concatenated with a CNN).

Table 4 Accuracy comparison of automatic classification of texts in the YouTube dataset based on the Big Five dimensions of personality

As is clear, all variations of the proposed method achieve higher accuracy than the other methods, with CNN-AdaBoost-2channel again performing best, with accuracies of about 62.11, 62.43, 60.23, 61.08, and 65.19% for extraversion, neuroticism, agreeableness, openness to experience, and conscientiousness, respectively. This indicates that our proposed method not only works well on shorter sentences but can also better detect personality traits as perceived by other people rather than by the author himself.

The experimental results also demonstrate that vector representation has a great impact on performance, and that updating word vectors during training, regardless of whether the vectors were pre-trained, leads to better results.

To better demonstrate the performance of the variations of the proposed model, we also measured the MAE values, reported in Table 5. As can be seen, CNN-AdaBoost-2channel has the lowest MAE, which confirms that the proposed model not only has the highest classification accuracy but also the lowest mean absolute error.

Table 5 Average mean absolute error (MAE) obtained from different variations of the proposed model

Notably, since the training time of deep neural networks depends heavily on the hardware they are implemented on (modern GPUs, for instance, can significantly reduce training time), it is not a fair measure for comparing the efficiency of deep learning based methods and has rarely been explored as an evaluation metric. It is nonetheless worth noting that choosing an optimal deep learning architecture is not straightforward, since "optimal" is not well defined and there is always a tradeoff between model complexity (training and test speed) and performance.

4.6 Ablation Study

One of the downsides of convolutional neural networks is their many hyper-parameters, which must be precisely tuned to obtain an optimal model. Since the hyper-parameter values have a great impact on the performance of the proposed model, we performed an ablation study to investigate the influence of different hyper-parameters on the final classification accuracy. In each experiment we held all settings constant and varied only one factor to examine the sensitivity of the proposed model. We report the effect of the filter sizes, the number of filters, and the activation function on one variation of the proposed model (CNN-AdaBoost-2channel); the results are based on experiments on the Essay dataset.

  • To investigate the effect of the filter size, we used various filter sizes while keeping the other parameters constant. Based on the obtained results (Table 6), the filter size has a remarkable effect on the efficiency of the model, and the best classification result is obtained when the filter sizes are set to (3, 4, 5).

  • To explore the impact of the number of filters, we varied only the number of filters while keeping the other parameters constant. As illustrated in Table 7, the number of filters also has a great effect on the efficiency of the proposed model, and the highest classification accuracy was obtained with 150 filters.

  • To study the influence of the activation function, different activation functions, such as ReLU, Softmax, Tanh, SoftPlus, and linear, were used in our experiments. Based on the empirical results (Table 8), the ReLU function outperformed the other activation functions.

Table 6 The influence of the filter size on the performance of the proposed model (CNN-AdaBoost-2channel)
Table 7 The influence of the number of filters on the performance of the proposed model (CNN-AdaBoost-2channel)
Table 8 The influence of the activation function on the performance of the proposed model (CNN-AdaBoost-2channel)

5 Conclusion

Personality recognition is one of the most interesting and practical topics in both psychology and artificial intelligence. Recently, owing to the penetration of the Internet into society and the intersection of human relations and artificial intelligence, personality recognition has attracted considerable attention. Machine learning based methods have commonly been utilized for this purpose; however, they face some limitations and are highly dependent on human experts for extracting appropriate features. On the other hand, with the development of deep learning, a special type of machine learning, deep neural networks have produced significant results in natural language processing, and in particular in personality recognition, thanks to their remarkable capability to extract features automatically and their unique structure.

In this regard, convolutional neural networks can extract low-level features from text and have been used effectively for text classification. Despite their considerable performance, they still face some major problems. The prominent challenge is that the features, generally N-grams, obtained from different filter sizes can play different roles in the final decision; a 5-gram, for example, may extract more relevant information about the meaning of a sentence than a 4-gram. To fill this lacuna, we fed the features obtained from the different filters of the convolutional neural network to separate pooling and classification layers; once the initial results were obtained by each classifier, the AdaBoost algorithm was used to produce the overall classification result. The goal of the AdaBoost algorithm is to boost the performance of the classifiers: our proposed method combined several weak classifiers to obtain a suitable boundary for separating the data between classes, adjusting the weights in favor of samples that were incorrectly classified.

The performance of the proposed method was examined on two heterogeneous datasets, namely the Essay and YouTube datasets. Based on the empirical results, the AdaBoost algorithm efficiently increased the classification accuracy. The proposed method reached convergence after 60 epochs with accuracies of 61.25%, 61.93%, 59.02%, 60.16%, and 64.63% for extraversion, neuroticism, agreeableness, openness to experience, and conscientiousness, respectively, on the Essay dataset. The accuracies of the proposed method on the YouTube dataset were 62.11%, 62.43%, 60.23%, 61.08%, and 65.19% for the same traits, respectively.

Employing multimodal learning and combining photos, videos, and other shared content with text to predict individuals' personalities can be considered possible future work. Moreover, the proposed method of this paper can be utilized in other applications of cognitive science, including identifying levels of stress, anxiety, and depression.