1 Introduction

With the explosion of information on the Internet, it is hard to make decisions based on reviews, tweets, etc. People purchase products on the Internet and immediately express their opinions. These opinions have a significant effect on the financial performance of the companies involved. The main problem in this process is that the opinions are expressed in natural language. There exists a wide gap between opinions in natural language (i.e., unstructured data) and the structured data that most applications require [1].

More than 80% of stored knowledge exists as text, documents, video, and voice media, and in computer science terms these sources are unstructured. In knowledge extraction, the implicit meanings and concepts must be recognized before they can be searched. Idea mining in text is therefore limited by what humans can technically search for. Keywords are the keys used by search engines to find text data, and they are based on the facts presented, not the ideas; expressing ideas through keywords alone is not possible [2].

Sentiment classification (SC) is an appealing field in text mining. The opinions extracted from unstructured data on the Internet are classified as positive, negative, or neutral. In this context, classification is performed at three levels: document, sentence, and feature. According to Pang and Lee, the classes at these three levels are determined separately [3].

Pre-processing contributes greatly to SC. Most of the available studies focus on traditional text classification approaches, where a document is represented both as a bag-of-words (BoW) and through part-of-speech (POS) tagging [4, 5]. It has been shown that POS tags do not provide enough information for natural language processing (NLP) analyses; they add unnecessary complexity, whereas the words themselves are proper indicators for sentiment polarity detection [6]. BoW does not record the relationships among words; therefore, the researchers of this paper have added a multi-objective algorithm to extract better features.

The SC methods in practice are machine learning and lexicon-based [7,8,9,10]. Regarding SC, the machine learning method is adopted in most studies [5]. The advantage of this newly proposed method, as opposed to its counterparts, is its ability to identify non-emotional terms that carry sentiment in a sentence (e.g. rich in the sentence ‘This article is rich.’). The corpus-based [11] and dictionary-based [12,13,14,15] methods are applied in lexicon-based classification, where the routine is to first search for every single word of a sentence in the sentiment lexicon and then to extract the sentiment label of the word, provided that it is present in the lexicon. Assisting customers in choosing a product might be the most popular application of opinion mining. Before purchasing a product from e-shops, it is rational for a new customer to read the comments on the given product to become aware of its properties and compare it with similar products.

The main objective of this article is to propose a framework that decreases the dimensionality of the features in multi-class SC, in which both the multi-objective grey wolf (MOGW) algorithm and a neural network (NN) classifier are applied. The absence of a proper ‘decision-maker’ to control keyword extraction from the comments and to reduce the input data dimensions makes such a framework necessary. The algorithm reads the information from the database, which consists of training and testing sections, and sends it to the pre-processing unit, where decomposing the sentences into words allows the stop words to be deleted. For the most important selected keywords, term frequency (TF), inverse document frequency (IDF), and inverse class frequency (ICF) weights are extracted and applied by the feature extraction unit. The aim is to enhance classification performance through discrete grey wolf optimization rather than other metaheuristic algorithms [16, 17]. The worth of each element in the grey wolf algorithm is determined by applying the three weights and the two objectives outlined in this article. Better results can be obtained by applying two monitoring levels, that is, less domination and more crowding distance. In this framework, the accuracy evaluation parameter is applied in the design of the objective functions to find the best worth. These features form the input structure of the classification unit, where the data are classified and the final model is developed.

1.1 The innovative contributions

The innovations in this article consist of:

  • Combined nature of the framework.

  • A two-objective framework is developed for multi-class SC on movie reviews and Twitter.

  • A discrete multi-objective grey wolf algorithm that selects the most prominent features using two objectives: reducing the error of the Naïve Bayes (NB) classifier and of the K-Nearest Neighbour (KNN) algorithm.

  • The selected features and the input data dimensions are reduced by eliminating some of the tested features through the evolutionary MOGW optimization algorithm.

This article is organized as follows: The literature is reviewed in Sect. 2; the framework is proposed in Sect. 3; experiments and results are presented in Sect. 4; discussion is run in Sect. 5, and the article is concluded in Sect. 6.

2 Literature review

Many studies with different outlooks intend to improve classification performance on the known datasets. These studies vary based on the applied classifiers and the Internet forums studied. Some resources that provide insight into the state-of-the-art literature are listed in Table 1.

Table 1 A briefing of the studies run on SC

Manek et al. [18] presented a combined algorithm including Gini index-based feature selection and an SVM classifier to classify large movie review datasets; the obtained results indicate 92.8% accuracy and an efficient error reduction rate. Zhuang et al. [19] proposed a multi-knowledge-based approach for movie review mining where the focus is on reviews of a specific movie genre. The applied knowledge is a combination of integration-based WordNet, statistical analysis, and movie knowledge. Severyn et al. [20] proposed a method including the following three stages: (1) classifying the polarity, (2) testing the structure with the domain, and (3) focusing on the English and Italian languages. Moreover, the tree kernels method is applied for better feature extraction. The obtained results confirm the efficiency of this method when the review domain is more than 4%, applied even in the case of very low resources. Poria et al. [21] applied a 7-layer convolutional neural network to review all aspects of a text and analyse the sentiments, showing that combining speech patterns with the neural network yields better results, obtaining the highest precision of 92.7%. Chen and Qi [22] analysed user opinions with a supervised method called conditional random field (CRF), a conditional probability distribution on an undirected graph. The data necessary for their article were collected from Yahoo and Flickr. The two contributing factors consist of the product features and user comments. The experimental results indicate that three-fourths of the decisions adopted are influenced by other user opinions, and just one-fourth is influenced by product features. Accuracy results are reported for two purchased products. Chaovalit and Zhou [23] proposed an opinion mining method based on semantic orientation where compound words are applied and then graded. In their article, a comparison is made with machine learning approaches, indicating that the accuracy of machine learning is approximately 85.7%, noticeably better than the semantic orientation methods. Dave et al. [24] analysed the comments of Amazon and C/net users by running two different experiments on the available data; in the first, the count of positive comments in training is considered to be five times the count of negative comments, and in the second, the positives and negatives are assumed equal. Experimental results indicate that the more compound words are used in classification, the more accurate the results. In the articles discussing supervised models, the accuracy percentage is always higher than in the ones discussing opinion mining by semantic orientation. This difference is due to the incompatibility of the training data, which prevents their direct comparison. To test the efficiency of the semantic-oriented method against the machine learning methods, an experiment is run on the same data by implementing the SVM algorithm. Kumar and Jaiswal [25] applied binary grey wolf and moth flame optimization for feature selection to enhance the accuracy of SC. Five baseline classifiers, namely NB, SVM, KNN, multilayer perceptron, and decision tree, are applied on the extracted features. The study was run on tweets in SemEval 2016 and SemEval 2017, obtaining the highest accuracy of 76.5% for SVM with a binary grey wolf optimizer on the SemEval 2016 benchmark dataset. The grey wolf optimizer has also been applied to text clustering: Rashaideh et al. [26] applied the average distance of documents to the cluster centroid as an objective to optimize the distance between clusters of documents in a continuous manner. Evaluation of this method was based on six text document collections selected at random from available public datasets. Kumar and Khorwal [16] proposed a method for feature selection in SC where the Firefly algorithm and SVM are applied as the feature selector and classifier, respectively. Shang et al. [17] applied the particle swarm optimization (PSO) algorithm with three classifiers for feature extraction.

As observed in Table 1, only a few studies on SC provide high classification performance through the grey wolf algorithm on movie and Twitter datasets. The main difference between this study and similar ones is that this framework explores features through the discrete structure of a two-objective grey wolf algorithm based on three weighting mechanisms. Kumar and Jaiswal applied many classifiers in the classification stage [25]. The novelty of our research is applying a two-objective evolutionary algorithm capable of feature selection in a discrete manner; Kumar and Khorwal, and Shang et al. applied only one objective [16, 17]. The better results are obtained by applying two monitoring stages, that is, less domination and more crowding distance. This proposed framework provides better results in terms of accuracy, precision, recall, and f-measure on the three datasets.

3 Proposed framework

The scheme of this framework is shown in Fig. 1, where, first, user comments are put in the request queue and then loaded into the dataset. The stored data are split into training and testing datasets at 70% and 30%, respectively. A portion of the dataset space is allocated to the processed data at each stage. The main stage of this proposed framework is the sentiment processing engine, which consists of pre-processing, selection of important features through the MOGW optimization algorithm, and classification through the multi-layer NN.

Fig. 1
figure 1

The proposed framework

In the pre-processing stage, the user comments are extracted first, and next, Tokenization takes place to split an opinion into a list of sentences, thereafter, a sentence is decomposed into words. Words are first stemmed, then the ‘stop words’ are eliminated. The remaining words are simplified and, finally, the weight of words is calculated. The weighted words are stored in the processed data section of the dataset to be applied in the feature selection stage. In this stage, the processed data of the dataset are read, and the most important words are selected through the MOGW algorithm based on the two mentioned objectives then stored in the processed data. In the final stage, the model of classification is obtained, that is, a multi-layer NN is trained based on the processed data and the training data inputs. The obtained model is evaluated through the test data. The multi-layer NN setup and the final model produced through this proposed framework would be applied in classification. The results are obtained based on the final model and input data to be applied in knowledge management. The variables and notations applied in this article are in Table 2.

Table 2 Variables with their description

The pseudocode of the base structure for this proposed framework is expressed as:

figure a

3.1 Pre-processing stage

3.1.1 Opinions decomposed into sentences and words

Opinions consist of several sentences, so first the set S is formed, which contains the sentences; next, each sentence is analysed to form a set T, which contains the words of that sentence [33, 34]. Sentence tokenization is the process of decomposing an opinion into a list of sentences. When tokenizing a paragraph into sentences, each sentence is a token, and in a similar sense, a word is a token when tokenizing a sentence into words. The extracted sentences are tokenized into words, which are then used in the subsequent stages.

3.1.2 Deleting stop words

Performance speed is an important factor, so the ‘stop words’ must be ignored. Stop words (e.g., a, about, all, am, did, has, have) are common English words that contribute little to recognizing the importance of a text. Eliminating these words accelerates the method. The stop word list used in this proposed method contains 119 words.

3.1.3 Stemming the word

Word stemming allows the frequency of words to be recognized and registered accurately. Stemming converts words into their simplest variant, for example by omitting ed from past tenses, ing from present participles, and s and es from nouns. Running this process accurately improves the precision of recognizing similar words.
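For illustration, the pre-processing chain of Sects. 3.1.1–3.1.3 can be sketched as follows; the tiny stop-word list and the crude suffix-stripping stemmer are simplifications for readability, not the actual 119-word list or stemmer used by the framework.

```python
import re

STOP_WORDS = {"a", "about", "all", "am", "did", "has", "have", "is", "the", "this"}  # excerpt only

def stem(word):
    # Crude suffix stripping standing in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(opinion):
    sentences = re.split(r"[.!?]+\s*", opinion.strip())              # sentence tokenization
    words = [w.lower() for s in sentences for w in re.findall(r"[A-Za-z]+", s)]
    return [stem(w) for w in words if w not in STOP_WORDS]           # stop-word removal + stemming

print(preprocess("This article is rich. All readers liked the examples!"))
# ['article', 'rich', 'reader', 'lik', 'exampl']
```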

3.1.4 Extraction of the weight through the three mechanisms

The extraction process is an essential factor in this framework; TF, IDF, and ICF mechanisms are implemented and explained [10, 33, 34].

Term frequency: TF(i, d) counts the occurrences of term i in comment (document) d; frequently present terms therefore receive higher TF values.

Inverse document frequency: IDF down-weights words that appear in many comments; it is calculated through Eq. 1:

$${\text{IDF}}_{i} = \log \frac{{{\text{TD}}}}{{{\text{DF}}_{i} }}$$
(1)

where TD is the total count of comments and \({\text{DF}}_{i}\) is the count of comments containing word i.

Inverse class frequency: ICF down-weights words that appear in many classes; it is calculated through Eq. 2:

$${\text{ICF}}_{i} = \log \frac{{{\text{TC}}}}{{{\text{CF}}_{i} }}$$
(2)

where TC is the total count of classes and \({\text{CF}}_{i}\) is the count of classes containing word i.
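These three weights can be computed as in the following sketch, a direct transcription of the TF count, Eq. 1, and Eq. 2 on a toy corpus (the documents and labels are illustrative only):

```python
import math
from collections import Counter

def tf(term, doc):
    """TF(i, d): frequency of term i in document (comment) d."""
    return Counter(doc)[term]

def idf(term, docs):
    """Eq. 1: IDF_i = log(TD / DF_i), TD = number of comments,
    DF_i = number of comments containing term i."""
    df = sum(term in doc for doc in docs)
    return math.log(len(docs) / df) if df else 0.0

def icf(term, docs, labels):
    """Eq. 2: ICF_i = log(TC / CF_i), TC = number of classes,
    CF_i = number of classes whose comments contain term i."""
    classes = set(labels)
    cf = sum(any(term in d for d, l in zip(docs, labels) if l == c) for c in classes)
    return math.log(len(classes) / cf) if cf else 0.0

docs = [["great", "movie"], ["bad", "movie"], ["great", "plot"]]
labels = ["pos", "neg", "pos"]
print(tf("great", docs[0]), idf("great", docs), icf("great", docs, labels))
```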

3.2 Feature selection using MOGW algorithm

Extraction of the important features through the MOGW algorithm is based on the combination of two objectives, decreasing the error rates of the NB and KNN classifiers, while the NN is applied as the final classifier. The structure of the MOGW optimizer for selecting important features (words) based on the weight of words is listed in the following six stages:

  1. Generating the initial position of the wolves based on the total count of words

  2. Determining the worth of each wolf in the population based on the fitness function

  3. Classifying the wolves into four groups of alpha, beta, delta, and omega

  4. Movement of wolves toward the best wolf of the group and the alpha wolf

  5. Deleting the wolves of low worth

  6. Selecting the best wolf concerning dominance and crowding distance

The need-assessment structure of the MOGW Algorithm for feature selection is expressed through the following pseudocode:

First, the initial population and count of generations are determined; next, a random number between 1 and the count of the words is selected as the count of the initial population. The worth of each member of the population is determined based on the words, their weights, and the error rates of the KNN and NB classifiers (lines 2–6). Considering the gained worth and the secondary worth obtained by applying the crowding distance, the population is sorted, and the alpha, beta, delta, and omega populations are formed (lines 7–9). The main loop of the proposed MOGW algorithm is repeated for the given count of generations (line 10). A new population is formed based on the movement of the wolves toward the best wolf of their group and toward the alpha wolf, known as the new wolves (line 11); this is combined with the previous population (line 12). The worth of each member of this combined population is also determined based on the words, their weights, and the error rates of the KNN and NB classifiers (lines 13–15). As to the gained worth and the secondary worth obtained by applying the crowding distance, the population is sorted and classified (lines 16–18). Members are selected from the first rank of the newly classified population to form the next generation, up to the initial population count (line 19).

figure b
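Since the pseudocode figure cannot be reproduced here, the loop just described can be sketched as follows. The two-objective fitness below is only a toy stand-in for the KNN and NB error rates, the movement operator is simplified to copying single bits from the alpha wolf, and crowding-distance tie-breaking is omitted; the sketch shows the overall control flow, not the authors' implementation.

```python
import random

def toy_objectives(wolf):
    """Placeholder for the two real objectives (KNN error, NB error):
    penalize selecting too many words and deviating from a target ratio."""
    ratio = sum(wolf) / len(wolf)
    return (ratio, abs(ratio - 0.3))

def dominates(a, b):
    """True if objective vector a dominates b (lower error is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def rank(population):
    """Sort wolves by how many others dominate them (0 = first front)."""
    objs = [toy_objectives(w) for w in population]
    counts = [sum(dominates(o, objs[i]) for j, o in enumerate(objs) if j != i)
              for i in range(len(population))]
    order = sorted(range(len(population)), key=lambda i: counts[i])
    return [population[i] for i in order]

def mogw(n_words=15, pop_size=8, generations=30):
    population = [[random.randint(0, 1) for _ in range(n_words)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = rank(population)
        alpha = ranked[0]
        # Movement: each wolf copies one randomly chosen bit from the alpha wolf.
        moved = []
        for wolf in ranked:
            child = wolf[:]
            i = random.randrange(n_words)
            child[i] = alpha[i]
            moved.append(child)
        # Merge parents and moved wolves, keep the least-dominated half.
        population = rank(population + moved)[:pop_size]
    return rank(population)[0]          # best wolf = selected feature mask

print(mogw())
```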

3.2.1 Initial population

In this structure, each element (wolf) is an array of bits. The initial population is formed on a random basis, and each wolf is a string of 1 s and 0 s. If N is the word count, this array is defined as follows:

$$X = [{\text{bit}}_{1} ,{\text{bit}}_{2} , \ldots ,{\text{bit}}_{N} ]$$

If a value in the array is zero, the corresponding word is ignored; if it is 1, the word is selected, as seen in Table 3.

Table 3 The structure of each element of the initial population

A wolf is represented by several 1 s and 0 s. If the count of the candidate words in all comments is 15, the formed wolf has 15 bits: if a bit is 1, the word related to this index is selected and considered by the selection operator, and if it is 0, the word is ignored. Indices are selected randomly. When indices 3, 6, 8, and 14 are selected, the details of the formed wolf are as shown in Table 4.

Table 4 A sample of formed wolf structure for 15 words
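A minimal sketch of this bit-string encoding follows; the 0-based indexing is an implementation convenience, whereas the text counts positions from 1.

```python
import random

n_words = 15
# Random initial wolf: each bit is 1 (keep the word) or 0 (ignore it).
wolf = [random.randint(0, 1) for _ in range(n_words)]

# Wolf from the worked example, with indices 3, 6, 8, and 14 selected.
example = [1 if i in (3, 6, 8, 14) else 0 for i in range(n_words)]
selected_words = [i for i, bit in enumerate(example) if bit == 1]
print(example)          # [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
print(selected_words)   # [3, 6, 8, 14]
```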

3.2.2 Fitness function using two objectives

The suitability (or profitability) level of the wolves is determined through this function. Here, two objectives, each with its own worth, are considered for each element of the initial population.

3.2.2.1 The first objective: calculating the error through the KNN

The KNN algorithm is applied to calculate the value of the indices based on the data available in the database. KNN is a supervised learning algorithm that can estimate the density function of the training data distribution and classify the test data according to the training samples. KNN is one of the simplest and most common instance-based learning methods.

It is assumed that all samples constitute points in n-dimensional real space, that the neighbours are determined based on the standard Euclidean distance, and that K is the neighbours count. The Euclidean distance is the key factor in finding the neighbours. Each sample x is described by the feature vector of Eq. 3:

$$\langle a_{1} (x),a_{2} (x), \ldots ,a_{n} (x)\rangle$$
(3)

The Euclidean distance between the \(x_{i}\) and \(x_{j}\) samples is obtained through Eq. 4 [35].

$$d(x_{i} ,x_{j} ) = \sqrt {\sum\limits_{r = 1}^{n} {(a_{r} (x_{i} ) - a_{r} (x_{j} ))^{2} } }$$
(4)

A refined version of the algorithm assigns a weight to each of the K neighbourhood samples based on its distance to the test sample, usually in an inverse relation. With this assignment, all samples can be taken into account instead of only the K neighbour samples, albeit at a lower speed. For discrete (classification) targets Eq. 5 is applied, and for continuous targets Eq. 6 [36].

$$\hat{f}(x_{q} ) \leftarrow \mathop {\arg \max }\limits_{v \in V} \sum\limits_{i = 1}^{k} {w_{i} } \delta (v,f(x_{i} ))\quad \quad {\text{where}}\;w_{i} = \frac{1}{{d(x_{q} ,x_{i} )^{2} }}$$
(5)
$$\hat{f}(x_{q} ) \leftarrow \frac{{\sum\limits_{i = 1}^{k} {w_{i} \, f(x_{i} )} }}{{\sum\limits_{i = 1}^{k} {w_{i} } }}\quad \quad {\text{where}}\;w_{i} = \frac{1}{{d(x_{q} ,x_{i} )^{2} }}$$
(6)

Since all features are involved in calculating the distance, even unrelated features influence the result. This contrasts with the Decision Tree method, where only the related features are involved. Assume that every sample is described by twenty features, of which only two are enough for classification; samples that should be close may then lie far apart, so the distance applied in KNN becomes misleading. Applying more weight to the related features is a possible solution. This solution is similar to changing the scales of the axes, lengthening the axes of the related features and shortening those of the unrelated ones. Cross-validation on a set of training data is applied to determine the feature weights. The coefficients \(z_{1} , \ldots ,z_{n}\) are selected to be multiplied by the values along each axis so as to decrease the classification error on the remaining samples. The effect of one or more features is completely ignored when \(z_{j} = 0\).
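Under the stated assumptions (Euclidean distance of Eq. 4 and distance-weighted voting of Eq. 5), the first objective can be sketched as the leave-one-out error of a weighted KNN classifier; the feature vectors and labels are illustrative only.

```python
import math
from collections import defaultdict

def euclid(a, b):
    """Eq. 4: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, samples, labels, k=3):
    """Eq. 5: distance-weighted vote of the k nearest neighbours."""
    neighbours = sorted(zip(samples, labels), key=lambda s: euclid(query, s[0]))[:k]
    votes = defaultdict(float)
    for x, y in neighbours:
        d = euclid(query, x)
        votes[y] += 1.0 / (d ** 2) if d > 0 else float("inf")
    return max(votes, key=votes.get)

def knn_error(samples, labels, k=3):
    """First objective: fraction of samples misclassified when each is
    predicted from the remaining ones (leave-one-out)."""
    wrong = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        wrong += knn_predict(x, rest_x, rest_y, k) != y
    return wrong / len(samples)

X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = ["neg", "neg", "pos", "pos"]
print(knn_error(X, y, k=3))
```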

3.2.2.2 The second objective: calculating the error through the NB

Since the features are numerical, a Gaussian NB classifier is applied here, where Eq. 7 [37] calculates the probability that column K takes the value A, given that class C holds true:

$$P(K = A|C) = \frac{1}{{\sqrt {2\pi \sigma^{2}_{K = C} } }}{\text{e}}^{{ - \frac{{(A - \mu_{K = C} )^{2} }}{{2\sigma^{2}_{K = C} }}}}$$
(7)

where \(\mu_{K = C}\) is the mean of column K over the rows belonging to class C, and \(\sigma_{K = C}^{2}\) is the corresponding variance; no discretization of the numerical input is required.

The output of this stage is the error rate, from which it can be determined which wolves perform better. The dominant wolves with lower error rates are selected as the population of the next generation. The lower the obtained error, the more efficient the produced model. Equation 8 [37] computes this error as a regression-style loss:

$$S = \sum\limits_{i = 1}^{n} {|y_{i} - f(x_{i} )|}$$
(8)

where \(y_{i}\) is the real output of the main class and \(f(x_{i} )\) is the output computed by classifiers NB and KNN. At this stage, there exist two numbers (two objectives) for each wolf, which are applied in the next stage to determine the best wolf.
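Likewise, the second objective can be sketched as a Gaussian naive Bayes error count: Eq. 7 supplies the per-feature likelihood, the class with the largest product of likelihoods is predicted, and the misclassifications are summed in the spirit of Eq. 8. Class priors and numerical safeguards are simplifying assumptions.

```python
import math
from collections import defaultdict

def gaussian(a, mu, var):
    """Eq. 7: probability of value a in a column whose values,
    restricted to class C, have mean mu and variance var."""
    var = max(var, 1e-9)                      # guard against zero variance
    return math.exp(-((a - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def nb_fit(samples, labels):
    rows_by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        rows_by_class[y].append(x)
    model = {}
    for c, rows in rows_by_class.items():
        means = [sum(col) / len(col) for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / len(col)
                 for col, m in zip(zip(*rows), means)]
        model[c] = (means, varis)
    return model

def nb_predict(model, x):
    scores = {c: math.prod(gaussian(a, m, v) for a, m, v in zip(x, mu, var))
              for c, (mu, var) in model.items()}
    return max(scores, key=scores.get)

def nb_error(samples, labels):
    """Eq. 8 in its counting form: number of samples whose predicted
    class differs from the real class."""
    model = nb_fit(samples, labels)
    return sum(nb_predict(model, x) != y for x, y in zip(samples, labels))

X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
print(nb_error(X, y))
```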

3.2.3 Selecting the best wolves (first front)

One of the most important features of the MOGW algorithm is its ability to select the next generation by first determining the Pareto front, i.e., the set of non-dominated solutions. To understand the concept of domination, the following definitions are of concern [38, 39]:

  • Strict domination: Wolf \(P_{2}\) is strictly dominated by wolf \(P_{1}\) if \(P_{1} \prec P_{2}\) in all fitness functions; strict domination is expressed through Eq. 9.

    $$F_{i} (P_{1} ) \prec F_{i} (P_{2} )\quad \quad \forall i = 1 \ldots m$$
    (9)
  • Weak domination: Wolf \(P_{1}\) weakly dominates wolf \(P_{2}\) if \(P_{1}\) is not worse than \(P_{2}\) in any fitness function and is strictly better in at least one. The notation \(P_{1} \le P_{2}\) indicates weak domination, obtained through Eq. 10.

    $$\begin{gathered} F_{i} (P_{1} ) \preceq F_{i} (P_{2} )\quad \quad \quad \quad \forall i = 1 \ldots m \hfill \\ F_{i} (P_{1} ) \prec F_{i} (P_{2} )\quad \quad {\text{for}}\;{\text{at}}\;{\text{least}}\;{\text{one}}\;i \hfill \\ \end{gathered}$$
    (10)
  • Neutral: Wolf \(P_{1}\) is neutral to wolf \(P_{2}\) if \(P_{1} \prec P_{2}\) in some fitness functions and \(P_{2} \prec P_{1}\) in others. The notation \(P_{1} \sim P_{2}\) indicates neutrality.

In the MOGW algorithm structure, elements from the population should be selected for the next generation with either strict or weak domination on other elements. In this context, the strictly dominated elements are selected first, followed by the weak ones. This proposed method provides a matrix to save the output of fitness functions for each wolf; Table 5.

Table 5 The structure of fitness function output for wolves

After tabulating this table, a new square matrix called the ‘domination matrix’ is built, with counts of rows and columns equal to the count of the wolves. If wolf i dominates wolf j, in either a strict or weak sense, the cell at row i and column j is set to 1; otherwise, it is set to 0. The sum of each column is then calculated, and the wolves are sorted according to these sums in ascending order. The wolves with low sum values are the best nominations for the next-generation Pareto front. The wolves with an equal sum of domination are placed in the same group, and the decision for the next generation is drawn according to this domination count. Here, the structure's performance is reviewed in an example where wolves \(P_{1}\) to \(P_{6}\) are considered as the current generation. The details of the value saving matrix for the objective functions are in Table 6.

Table 6 The value saving matrix

Table 6 provides the objective values of the wolves. From the following calculations for wolf \(P_{1}\): \(P_{1}\) is neutral in relation to \(P_{2}\), while it strictly dominates \(P_{3}\), \(P_{4}\), \(P_{5}\), and \(P_{6}\); thus, in the first row, the entry for \(P_{1}\) is 0, for \(P_{2}\) is 0, and for \(P_{3}\) to \(P_{6}\) is 1.

$$\begin{aligned} & P_{1} \sim P_{2} \quad \to \quad \quad \quad 0.05 \succ 0.03\quad \quad \quad 0.01 \prec 0.02 \\ & P_{1} \prec P_{3} \quad \to \quad \quad \quad 0.05 \prec 0.08\quad \quad \quad 0.01 \prec 0.03 \\ & P_{1} \prec P_{4} \quad \to \quad \quad \quad 0.05 \prec 0.07\quad \quad \quad 0.01 \prec 0.06 \\ & P_{1} \prec P_{5} \quad \to \quad \quad \quad 0.05 \prec 0.21\quad \quad \quad 0.01 \prec 0.11 \\ & P_{1} \prec P_{6} \quad \to \quad \quad \quad 0.05 \prec 0.14\quad \quad \quad 0.01 \prec 0.12 \\ \end{aligned}$$

Consequently, the first row of the domination matrix at the first stage appears in Table 7.

Table 7 The sample' s domination matrix at the first stage

As to the following calculation: \(P_{2}\) is neutral in relation to \(P_{1}\), while it strictly dominates \(P_{3}\), \(P_{4}\), \(P_{5}\), and \(P_{6}\), thus, in the second row, domination for \(P_{1}\) is 0, \(P_{2}\) is 0, and for \(P_{3}\) to \(P_{6}\) is 1.

$$\begin{aligned} & P_{2} \prec P_{3} \quad \to \quad \quad \quad 0.03 \prec 0.08\quad \quad \quad 0.02 \prec 0.03 \\ & P_{2} \prec P_{4} \quad \to \quad \quad \quad 0.03 \prec 0.07\quad \quad \quad 0.02 \prec 0.06 \\ & P_{2} \prec P_{5} \quad \to \quad \quad \quad 0.03 \prec 0.21\quad \quad \quad 0.02 \prec 0.11 \\ & P_{2} \prec P_{6} \quad \to \quad \quad \quad 0.03 \prec 0.14\quad \quad \quad 0.02 \prec 0.12 \\ \end{aligned}$$

The details of the modified domination matrix at the second stage are in Table 8.

Table 8 The sample's domination matrix at the second stage

As to the following calculations: \(P_{3}\) is dominated by \(P_{1}\) and \(P_{2}\), is neutral in relation to \(P_{4}\), and strictly dominates \(P_{5}\) and \(P_{6}\). Thus, in the third row, the entry for \(P_{1}\) is 0, \(P_{2}\) is 0, \(P_{3}\) is 0, \(P_{4}\) is 0, and for \(P_{5}\) and \(P_{6}\) is 1.

$$\begin{aligned} & P_{3} \sim P_{4} \quad \to \quad \quad \quad 0.08 \succ 0.07\quad \quad \quad 0.03 \prec 0.06 \\ & P_{3} \prec P_{5} \quad \to \quad \quad \quad 0.08 \prec 0.21\quad \quad \quad 0.03 \prec 0.11 \\ & P_{3} \prec P_{6} \quad \to \quad \quad \quad 0.08 \prec 0.14\quad \quad \quad 0.03 \prec 0.12 \\ \end{aligned}$$

The details of the modified domination matrix at the third stage are in Table 9.

Table 9 The sample's domination matrix at the third stage

As to the following calculations: \(P_{4}\) is dominated by \(P_{1}\) and \(P_{2}\) and is neutral in relation to \(P_{3}\), while it strictly dominates \(P_{5}\) and \(P_{6}\); thus, in the fourth row, the entry for \(P_{1}\) is 0, \(P_{2}\) is 0, \(P_{3}\) is 0, \(P_{4}\) is 0, and for \(P_{5}\) and \(P_{6}\) is 1.

$$\begin{aligned} & P_{4} \prec P_{5} \quad \to \quad \quad 0.07 \prec 0.21\quad \quad 0.06 \prec 0.11 \\ & P_{4} \prec P_{6} \quad \to \quad \quad 0.07 \prec 0.14\quad \quad 0.06 \prec 0.12 \\ \end{aligned}$$

The details of the modified domination matrix at the fourth stage are in Table 10.

Table 10 The sample's domination matrix at the fourth stage

Considering the above calculations for \(P_{5}\) and \(P_{6}\), the details of final domination are in Table 11.

Table 11 The final sample's domination matrix

As observed in Table 11, three groups are generated and sorted based on the obtained counts in ascending order; this is called ‘non-dominated sorting’. Accordingly, the Pareto fronts are expressed as follows:

$$\begin{aligned} & F_{1} = \{ P_{1} ,P_{2} \} \\ & F_{2} = \{ P_{3} ,P_{4} \} \\ & F_{3} = \{ P_{5} ,P_{6} \} \\ \end{aligned}$$
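The worked example can be reproduced with a short sketch of this count-based non-dominated grouping; the objective pairs are the values used in the calculations above.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objectives):
    """Group solutions into fronts F1, F2, ... by the number of solutions
    that dominate them (the column sums of the domination matrix)."""
    counts = {name: sum(dominates(other, obj)
                        for other_name, other in objectives.items() if other_name != name)
              for name, obj in objectives.items()}
    fronts = {}
    for name, c in counts.items():
        fronts.setdefault(c, []).append(name)
    return [fronts[c] for c in sorted(fronts)]

wolves = {"P1": (0.05, 0.01), "P2": (0.03, 0.02), "P3": (0.08, 0.03),
          "P4": (0.07, 0.06), "P5": (0.21, 0.11), "P6": (0.14, 0.12)}
print(non_dominated_sort(wolves))
# [['P1', 'P2'], ['P3', 'P4'], ['P5', 'P6']]  -- the fronts F1, F2, F3 above
```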

3.2.4 Crowding distance

One of the main challenges in selecting the next-generation population of wolves is the probability of having wolves of the same group with equal ranking. In the mentioned example, if half of the population must be chosen for the next generation, then \(P_{1}\) and \(P_{2}\) will definitely be selected, while the next wolf must be chosen between \(P_{3}\) and \(P_{4}\). This is a difficult task because they are of equal ranking. For a front of only two points the crowding distance is meaningless, so a wolf is selected randomly between \(P_{3}\) and \(P_{4}\). The manner of converting the population into non-dominated groups is shown in Fig. 2.

Fig. 2
figure 2

Non-dominated sorting

As observed in this figure, the wolves qualified for next-generation nomination are separated by the dotted line. All wolves in group 3 have the same worth; therefore, a normal selection decision is impossible and the crowding distance must be applied. This concept is applied to keep the solutions evenly distributed over a region, so that more optimal solutions are obtained in the next generation. The points with higher crowding distance appear in the next generation, and the crowding distance is calculated separately for each group. The crowding distance for wolf \(p\) is calculated through Eq. 11 [40, 41]:

$${\text{CD}}(p) = \sum\limits_{k = 1}^{t} {\frac{{|f_{k} (p - 1) - f_{k} (p + 1)|}}{{\max (f_{k} ) - \min (f_{k} )}}}$$
(11)

where t is the fitness function count, \(\max (f_{k} )\) is the highest value of function \(f_{k}\), and \(\min (f_{k} )\) is the lowest value of function \(f_{k}\).

For each objective function k, the points with the maximum and minimum values have no neighbours on one side, so the infinite distance value is assigned to them. For the other points \((i = 2,\,3,\,\ldots,\,(n - 1))\), the same equation is applied and the crowding distances are summed around each point. The crowding distance for each member of a group is calculated separately; only the distances between the members of the same group are compared, and each group is sorted in descending order. The wolves with the highest crowding distance values are included in the next generation. The calculation pattern of the crowding distance is shown in Fig. 3.

Fig. 3
figure 3

Crowding distance calculation
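Equation 11, with the convention that the boundary solutions of every objective receive an infinite distance, can be sketched as follows; the three-point front is illustrative.

```python
def crowding_distance(front):
    """front: list of objective tuples belonging to one Pareto front.
    Returns one crowding distance per solution (Eq. 11); the extreme
    solutions of each objective are assigned an infinite distance."""
    n = len(front)
    distance = [0.0] * n
    n_objectives = len(front[0])
    for k in range(n_objectives):
        order = sorted(range(n), key=lambda i: front[i][k])
        f_min, f_max = front[order[0]][k], front[order[-1]][k]
        distance[order[0]] = distance[order[-1]] = float("inf")
        if f_max == f_min:
            continue
        for pos in range(1, n - 1):
            i = order[pos]
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            distance[i] += gap / (f_max - f_min)
    return distance

# Example: the second front of the worked example plus one midpoint.
print(crowding_distance([(0.08, 0.03), (0.07, 0.06), (0.075, 0.045)]))
# [inf, inf, 2.0] -- the interior point gets a finite, summed distance
```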

3.2.5 Alpha, beta, delta, and omega group formations

One of the most outstanding features of the grey wolf algorithm is its memetic property: each generation consists of four groups that are assessed separately. Elements of the groups tend to move toward the optimum based on the worth of the group and of the alpha. To improve this property of the algorithm, a discrete structure is applied for the movement of the wolves. This stage requires that the population of wolves be sorted in descending order of worth. There exist n wolves in the population, where each wolf (element) is assigned an array structure with a length equal to the count of the extracted words. The population is divided into four (i.e., each group consists of \(n/4\) wolves). Following this process, the best member of each group is called \(X - B\).

3.2.6 The movement of the weak wolves toward the best wolf

At this stage in the evolutionary process, the location of a wolf is modified to approach the best wolf. If a wolf with less worth moves toward the location of the best wolf but no new solution is obtained, this movement is inefficient in practice. Consequently, a discrete structure is applied for this movement. In each group, the movement is oriented toward the optimal wolf. Considering that \(X - B\) (the best wolf) and \(X - P\) (any member of the group) are of an array structure, this movement is subject to the following procedure:

A random value between 0 and the count of words (the upper bound UB) is assigned to \(S - Max\). If the ith index of \(X - B\) > the ith index of \(X - P\), a positive mutation occurs and the ith step element is obtained through Eq. 12:

$${\text{Step}}_{i} = \min ({\text{int}} [{\text{rand}}(1,X\_B_{i} )],S\_Max)$$
(12)

If the ith index of \(X - B\) < the ith index of \(X - P\), the opposite holds and the step is obtained through Eq. 13:

$${\text{Step}}_{i} = \max ( - 1 \times {\text{int}} [{\text{rand}}(1,X\_P_{i} - X\_B_{i} )], - S\_{\text{Max}})$$
(13)

To calculate any one of the new \(X - P\) indices, Eq. 14 is applied.

$$X\_P_{i} = (X\_P_{i} + {\text{Step}}_{i} )$$
(14)
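A literal transcription of Eqs. 12–14 is sketched below, assuming that rand(a, b) draws a uniform value between a and b and that no step is taken when the two indices are equal (a case the equations leave open). For bit-valued wolves this simply snaps the weak wolf's indices to those of the best wolf.

```python
import random

S_MAX = 5   # illustrative cap on the step size (the UB-based S-Max of the text)

def discrete_step(x_p_i, x_b_i, s_max=S_MAX):
    """One index of the discrete movement of Eqs. 12-14."""
    if x_b_i > x_p_i:                                         # Eq. 12
        step = min(int(random.uniform(1, x_b_i)), s_max)
    elif x_b_i < x_p_i:                                       # Eq. 13
        step = max(-int(random.uniform(1, x_p_i - x_b_i)), -s_max)
    else:
        step = 0
    return x_p_i + step                                       # Eq. 14

best_wolf = [1, 0, 1, 1, 0]
weak_wolf = [0, 1, 1, 0, 0]
print([discrete_step(p, b) for p, b in zip(weak_wolf, best_wolf)])
# [1, 0, 1, 1, 0] -- for bit arrays the weak wolf moves onto the best wolf
```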

3.2.7 Wolves reassessment

The new wolves are generated according to their mutation and all new wolves should be reassessed to determine their worth.

3.2.8 Selecting the best wolf

Termination of the multi-objective grey wolf algorithm depends on the count of generations: after the final generation, the wolf with the highest worth is selected and sent to the classification unit as the best set of selected words.

3.3 Classification through NN

The NN is the final classifier in this framework, where its input and output layers, hidden layers, and activation functions are the most important factors [36]. For example, if three features are extracted in the previous stage and the classification structure has two classes, then the NN structure is illustrated in Fig. 4.

Fig. 4
figure 4

The NN structure consisting of 3 inputs and two classes

Note that these stages and classes are subject to the best wolf features.

  • Features mapping as the input layer

The best element (wolf) of the feature selection stage is obtained by the grey wolf algorithm and determines the NN input. This means that the neuron count of this layer is equal to the count of features extracted by the best wolf in the feature selection stage.

  • The middle layer

The neurons count in this layer is equal to the neurons count of the input layer, plus one.

  • Mapping of the class count as the output layer

Here, the classification structure is two-class or three-class, so the output layer has 2 or 3 neurons, depending on whether the data of the proposed method have two or three classes. Each neuron takes the value 0 or 1: the output of the first neuron is 1 and the second 0 if the NN recognizes the features as belonging to class 1 (a small sketch of this structure follows below).
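The structure of Fig. 4 can be sketched with a plain forward pass (3 inputs, 3 + 1 hidden neurons, 2 output neurons); the random weights, sigmoid activation, and winner-takes-all reading of the outputs are illustrative assumptions rather than the trained model.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

n_in, n_hidden, n_out = 3, 3 + 1, 2        # inputs = selected features, hidden = inputs + 1
random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

features = [0.42, 0.10, 0.77]              # weighted values of the selected words
hidden = layer(features, w1, b1)
output = layer(hidden, w2, b2)
predicted_class = output.index(max(output))   # neuron with output closest to 1 wins
print(output, predicted_class)
```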

4 Experiments and results

MATLAB S/W is applied to evaluate the efficiency of this proposed framework. The K-fold technique with tenfold cross-validation is applied to improve the accuracy of the evaluation.

4.1 Test environment and evaluation parameters

Windows 10 is the test environment, and the specifications are tabulated in Table 12.

Table 12 Testing environment specifications

The four important metrics applied in classification evaluation in this article are accuracy, precision, recall, and f-measure. Accuracy is the number of correctly classified instances relative to the total number of classified instances. The precision and recall are calculated separately for all classes and then averaged. The f-measure is the weighted harmonic mean of precision and recall. In addition, the Mean Square Error (MSE), Sum of Squares for Error (SSE), and Determination Coefficient (\(R^{2}\)) error indices are reported. In \(R^{2}\), y is the actual value, and \(\overline{{y_{i} }}\) is the predicted value. All evaluation metrics are tabulated in Table 13 [37].

Table 13 Evaluation parameters
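The metrics of Table 13 can be computed as in the following sketch; precision and recall are macro-averaged over the classes as described above, and the conventional definition of the determination coefficient is assumed.

```python
def macro_metrics(y_true, y_pred):
    classes = sorted(set(y_true))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f_measure

def error_indices(y_true, y_pred):
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = sse / len(y_true)
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - sse / ss_tot if ss_tot else 0.0
    return sse, mse, r2

print(macro_metrics(["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "pos"]))
print(error_indices([1, 0, 2, 1], [1, 0, 1, 1]))
```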

4.2 The used datasets

The following three datasets are applied to evaluate the performance of this proposed framework; see Table 14.

  1. Cornell movie review [27]: the polarity movie dataset (PMD) of 1000 positive and 1000 negative reviews extracted from IMDB, one of the most widely used datasets in sentiment analysis.

  2. User comments on other users' tweets on Twitter: the Sentiment140 dataset, obtained from 127,000 comments by 73,100 users. Here, the comments are grouped as positive, negative, and neutral. This text corpus structure is known as TS3.

  3. User comments on other users' tweets on Twitter: the Sentiment140-2 dataset, obtained from 74,000 comments by 31,820 users. The comments are grouped as positive and negative, and this text corpus structure is known as TS2. The data are provided from Stanford University data [42].

Table 14 A sample of the structure of two records in datasets

4.3 The comparison

The closest methods to this framework are FFSVM [16] and FS-BPSO [17], in which the Firefly and PSO metaheuristic algorithms are applied, respectively. The reason for choosing these methods is their close correspondence with this framework. The feature selection stage of each of the two methods is explained separately. The main parameters of the implemented algorithms are in Table 15, and the total feature counts of the best elements of the three algorithms are tabulated in Table 16.

Table 15 Main parameters of implemented algorithms
Table 16 The total features' count of the best elements

FFSVM: The mutual information criterion is applied to extract features, and feature selection is accomplished through the discrete Firefly algorithm. Each element of the initial population is generated as a binary array of bits with a length equal to the feature count, and the population size is defined. The initial positions are generated randomly as numbers within 0–1. For each element in the population, the fitness function is calculated based on the accuracy of the binary SVM classifier. The fireflies of less worth move toward fireflies of more worth. For updating the position, the researchers converted the optimization to a discrete form, and the new positions are calculated based on this change. The best firefly is selected based on the fitness function. Finally, the best position is saved, and the other positions of less worth are deleted. This process terminates when the obtained generation equals the maximum generation [16].

FS-BPSO: The mutual information criterion is applied to evaluate features, and feature selection for binary SC is accomplished through a binary PSO algorithm. The initial population is generated randomly as feature vectors for the velocity and position of the particles; each array, with a size equal to the feature count, is filled with 0 or 1. The researchers proposed an update equation to evaluate both position and velocity. They applied a fitness sum for the particles, which are divided into two groups, and the group with the higher sum is selected. A mutation rate is applied to assure convergence. The solutions are evaluated at the new positions of the particles, and this process is repeated until the maximum generation is met [17].

In this study, the objective is feature selection that reduces the count of features and, in turn, increases the classification accuracy in multi-class SC. The initial population is generated randomly as arrays of 0 s and 1 s with a length equal to the count of important features. To decrease the dimension, half of the population is selected randomly, and only bits with a value of 1 are of concern. For each element of the initial population in the MOGW algorithm, the TF, IDF, and ICF weights are involved. The KNN and NB classifiers are applied to classify the elements based on these weights, and because the framework is supervised, the obtained classification error rate is calculated through the regression function. Thus, for each wolf (element) in the population, there exist two worth values (two objectives). The best wolf is selected based on non-dominated sorting and the sum of the crowding distances; when the wolves cannot dominate one another, or have the same level of domination, the crowding distance is applied. The equations for the movement of the wolf positions are updated based on the discrete structure, and after several generations, the wolf that dominates the other wolves (the one with the lowest domination count, or the largest crowding distance at equal domination worth) is extracted as the best feature set.

4.4 Evaluation of the MOWGOKB framework

This framework is compared to the FFSVM [16] and FS-BPSO [17] methods, which are implemented on the three datasets. The x-axis illustrates the compared models in tenfold cross-validation, whereas the y-axis shows the evaluation parameters (i.e., accuracy, precision, and recall values); Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13.

Fig. 5
figure 5

The precision value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the PMD dataset

Fig. 6
figure 6

The accuracy value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the PMD dataset

Fig. 7
figure 7

The recall value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the PMD dataset

Fig. 8
figure 8

The precision value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the TS2 dataset

Fig. 9
figure 9

The accuracy value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the TS2 dataset

Fig. 10
figure 10

The recall value of MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the TS2 dataset

Fig. 11
figure 11

The precision value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the TS3 dataset

Fig. 12
figure 12

The accuracy value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the TS3 dataset

Fig. 13
figure 13

The recall value of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods on the TS3 dataset

4.4.1 The PMD dataset

The precision of this framework is compared with the two methods in Fig. 5, where its higher precision is evident. The comparison of accuracy is shown in Fig. 6, where its higher accuracy is evident; accuracy is the most outstanding index for every method. Similarly, the recall of MOWGOKB is compared with the two methods in Fig. 7, where it is observed that this framework provides higher recall; the more precise the feature extraction, the more precise the classification. The highest obtained precision, accuracy, and recall values of this framework are 95.76%, 95.21%, and 95.99%, respectively. The results improve on FS-BPSO by approximately 4% and on FFSVM by approximately 2%. The outperformance of this framework is evident when comparing the three methods. The details of this comparison in terms of error indices on the PMD dataset are in Table 17.

Table 17 Error Indices comparison is made based on the PMD dataset

4.4.2 The TS2 dataset

The precision is compared with the two methods in Fig. 8, where it is revealed that this framework is more precise than the other two; the same for accuracy is shown in Fig. 9, where its higher accuracy is evident. Accuracy is an outstanding index in any method. Similarly, the recall of MOWGOKB is compared with the two methods in Fig. 10, where it is observed that this framework provides higher recall; the more precise the feature extraction, the more precise the classification. The highest obtained precision, accuracy, and recall values of this framework are 95.72%, 95.75%, and 95.93%, respectively. The results improve on FS-BPSO by approximately 3% and on FFSVM by approximately 1%. The outperformance of this framework is evident when comparing the three methods. The details of the MOWGOKB framework error indices compared with the two methods above on the TS2 dataset are in Table 18.

Table 18 Error indices comparison is made based on the TS2 dataset

4.4.3 The TS3 dataset

The precision compared with the two methods is shown in Fig. 11, where the higher precision is evident. The comparison of accuracy is shown in Fig. 12, where its higher accuracy is evident; the accuracy (recognition percentage) is an outstanding index in any method. Similarly, the recall of MOWGOKB is compared with the two methods in Fig. 13, where it is observed that this framework provides higher recall; the more precise the feature extraction, the more precise the classification. The highest obtained precision, accuracy, and recall values of this framework are 94.98%, 94.39%, and 94.77%, respectively. The results improve on FS-BPSO by approximately 3% and on FFSVM by approximately 1%, so comparing the three methods, the outperformance of this framework is evident. The details of the MOWGOKB framework error indices compared with the two methods above on the TS3 dataset are in Table 19.

Table 19 Error indices comparison is made based on the TS3 dataset

According to Tables 17, 18, 19, for the three datasets, the error indices of the MOWGOKB framework are better than those of its counterparts. The precision, accuracy, and f-measure obtained by the three works, in tenfold cross-validation on all datasets, are compared in Table 20 and, as observed, the MOWGOKB results are better than those of its counterparts.

Table 20 The details of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods in tenfold cross-validation on three datasets

5 Discussion

As observed in Figs. 14, 15, 16, the evaluation parameters of the mean performance for the compared models are represented on the x-axis, and the obtained values are represented on the y-axis. The results of different evaluation parameters (i.e., the average accuracy, precision, recall, and f-measure) of the MOWGOKB, in comparison with other models for all datasets, are shown in the same figures.

Fig. 14
figure 14

Mean performance of the MOWGOKB framework compared to the FFSVM, and FS-BPSO methods for the PMD dataset in terms of the accuracy, precision, recall, and f-measure

Fig. 15
figure 15

Mean performance of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods for the TS2 dataset in terms of the accuracy, precision, recall, and f-measure

Fig. 16
figure 16

Mean performance of the MOWGOKB framework compared to the FFSVM and FS-BPSO methods for the TS3 dataset in terms of the accuracy, precision, recall, and f-measure

Overall performances on the PMD dataset are shown in Fig. 14, where the MOWGOKB framework presents average accuracy and f-measure of 94.62% and 94.85%, respectively. The FFSVM model, applying the Firefly algorithm, presents average accuracy and f-measure of 92.77% and 92.91%, respectively. The FS-BPSO method obtained average accuracy and f-measure values of 91.24% and 91.15%, respectively. The FFSVM model performs better than FS-BPSO but does not exceed this framework. This framework provides accuracy approximately 3% higher than FS-BPSO and 2% higher than FFSVM. It is revealed that this framework outperforms the compared metaheuristic algorithms on the PMD dataset.

Overall performances on the TS2 dataset are shown in Fig. 15, where the average accuracy and f-measure values of the MOWGOKB framework are 94.89% and 95.08%, respectively. The average accuracy and f-measure of the FFSVM method are 93.07% and 93.28%, respectively, while the FS-BPSO method, in which the binary PSO algorithm is applied, obtained average accuracy and f-measure values of 91.25% and 91.58%, respectively.

The FFSVM model provides better performance than FS-BPSO on the TS2 dataset but does not exceed this framework. The average f-measure obtained through the MOWGOKB framework is approximately 2% higher than that of the FFSVM method, indicating its outperformance against its counterparts on the TS2 dataset. The average accuracy is improved by approximately 4% over FS-BPSO and 2% over FFSVM.

Overall performances on the TS3 dataset are shown in Fig. 16, where the obtained average accuracy and f-measure of the MOWGOKB framework are 93.88% and 92.16%, respectively. The average accuracy and f-measure of the FFSVM method are 92.26% and 92.35%, respectively, while the FS-BPSO method obtained average accuracy and f-measure values of 90.75% and 90.57%, respectively.

The FFSVM model provides better performance than FS-BPSO on the TS3 dataset but does not exceed this framework. The average accuracy is improved by approximately 4% over FS-BPSO and 2% over FFSVM.

The advantage of this framework on the three datasets is evident in the provided figures, which reveal that the MOWGOKB framework yields better average accuracy, precision, recall, and f-measure for multi-class SC by applying an appropriate feature selection and decreased dimension. It is observed that feature selection through a discrete grey wolf algorithm yields better results than the Firefly and binary PSO algorithms applied in FFSVM and FS-BPSO, respectively. In contrast to these two methods, in this framework the TF, IDF, and ICF weights are applied in pre-processing. The findings, compared against those of the FFSVM and FS-BPSO methods, indicate that combining the discrete MOGW algorithm with the two objectives and the applied weighting mechanisms is better than the Firefly algorithm applied in FFSVM and the binary PSO algorithm applied in FS-BPSO. The findings also show that more parameters are considered here in the feature selection stage than in the other methods; hence, it is more successful than its counterparts. The error rates calculated through the two objectives determine the worth used in selecting the best wolf (element). The reason for the better results is in applying the two monitoring levels of less domination and more crowding distance.

In brief, this framework provides a better feature selection than its two counterparts do and can improve the performance of multi-class SC. It is deduced that the MOWGOKB framework provides the highest performance for all datasets, while the FFSVM model provides the second best.

6 Conclusion

People purchase products on the Internet and express their opinions about them every second. These opinions affect the financial performance of the related outlets. With the explosion of information on the Internet, it is difficult for ordinary people to make decisions about products. SC is a significant field in sentiment analysis that can assist electronic customer relations. The rapid growth in electronic text documentation makes such analysis important for information retrieval. Keywords are useful tools for searching a high volume of text documentation in a short time, and their extraction is the focus of many researchers.

A new framework called MOWGOKB is introduced for multi-class SC based on a discrete MOGW algorithm with the two objectives of decreasing the errors of the KNN and NB classifiers; the NN is applied as the final classifier. Here, user comments are first tokenized into sentences. Each sentence is decomposed into words, the words are stemmed, and the stop words are eliminated. The most important words (features) are selected through the MOGW algorithm based on the two NB and KNN error reduction objectives, and the final classification is made through the NN. For evaluation, the three PMD, TS2, and TS3 datasets are applied, and the MOWGOKB framework is compared with the FFSVM and FS-BPSO methods. The obtained results indicate that this proposed framework outperforms its counterparts on the movie dataset, with 95.76% precision, 95.21% accuracy, 95.99% recall, and 95.15% f-measure, which, compared to the other methods, represents about a 4% improvement. The obtained results also indicate that this framework outperforms its counterparts on the Twitter dataset, with 95.72% precision, 95.75% accuracy, 95.93% recall, and 95.82% f-measure, again representing about a 4% improvement over the other methods. As future work, a combination of a multi-objective algorithm with sequential pattern mining can be considered.