
1 Introduction

Blogs, forums, and Internet communities allow users to share their opinions on any issue: for example, to express dissatisfaction with a purchased product, discuss current events, or voice political views [5]. This form of user data analysis is required for recommender systems and personalization. "Everyone cares about what other people think." An opinion is a persuasion or judgement about anything that has a substantial impact on a person's decision-making process. The practice of determining people's feelings or views is known as sentiment analysis or opinion mining. Circumstances affect people's thoughts, feelings, and sentiments. For example, "The film was filled with fun elements" expresses a positive opinion about a movie, while "he made fun of Ram's appearance" expresses a negative opinion [2, 6].

Sentiment analysis is a natural language understanding task that attempts to categorize texts based on the sentiments they express about a given topic [1, 4]. It is a technique for determining a speaker's or writer's attitude towards a topic or the overall impression of a text. Because of its many potential uses, such as automatic product review classification, sentiment classification is performed at three levels: document, sentence, and feature. At the document level, the entire document is considered and judged favourable or unfavourable. At the sentence level, each sentence is evaluated separately to determine its polarity. At the feature level, sentiment may be categorized with respect to the different features of an object. Aspect-level sentiment categorization demands a more comprehensive analysis because most features are expressed implicitly.

1.1 Motivation

Prior sentiment analysis research produced classification models tailored to a single domain. Therefore, a multidomain sentiment analysis is proposed in order to train the classifier across many domains simultaneously [9, 10]. The main contributions of the proposed work are as follows:

  • We employ TF-IDF feature extraction with the Chi-square \(\chi^2\) method to select the best features.

  • Sentiment classification based on the selected features is implemented on a multidomain dataset.

  • To increase classification performance, the classifiers are combined with bagging.

  • Sentiment classification using different regression analysis methods is also employed in this work.

  • Finally, comparisons are made based on the evaluation measures.

Section 2 discusses the existing work, Sect. 3 presents the implementation of the proposed method, Sect. 4 contains the results and discussion, and Sect. 5 gives the conclusion and future ideas.

2 Related Work

The authors [20] attempt to address the issue by developing a sentiment-aware dictionary utilizing data from several domains. Using this dictionary, they classified the target domain's unlabeled reviews. The task was performed in Hindi with a 76% accuracy rate. As an approach to cross-domain sentiment classification, the authors [3] propose a sentiment-sensitive distributional thesaurus, where sentiment sensitivity is achieved by including sentiment labels in the context vector at the document level. Using a benchmark data set comprising Amazon user reviews for a variety of product categories, the proposed approach outperforms alternative baselines and achieves results equivalent to previously reported cross-domain sentiment classification algorithms.

The researchers [21] combined sentiment data from four different sources. Sentiment lexicons are the first source, as they provide the sentiment polarity of widely used sentiment terms. Sentiment classifiers from diverse source domains are the second source. The third source is domain-specific sentiment correlations between words, and the target domain's tagged data is the fourth. They offer a unified architecture for gathering all four forms of sentiment data and training a domain-specific sentiment classifier in the target domain.

The authors [13] suggested a fuzzy technique to model the polarity learnt from one or more training sets. This newly acquired knowledge is combined with additional conceptual knowledge collected from two widely known sentiment analysis resources, SenticNet and the General Inquirer vocabulary. The suggested strategy yielded the best results.

Researchers [11] developed a multidomain sentiment classification technique that reduced domain dependence whilst increasing overall performance. The suggested approach combines several classifiers: domain classifiers are trained individually on domain-specific data and then merged to produce the desired output.

The authors [8] investigated the capability of four different machine learning classification algorithms with frequently used feature selection methods, using three well-known datasets to evaluate the proposed approaches: IMDb movie reviews, electronics reviews, and kitchen item reviews. The first step is to choose feature subsets with one of three available feature selection methods. After that, set-theoretic operations such as union and intersection are applied to obtain the top-ranking features. The combined approach achieved the greatest accuracy of 92.31% with the SMO classifier.

The authors [12] investigated the performance of several feature selection techniques for sentiment analysis. A feature extraction method called Term Frequency-Inverse Document Frequency is utilized to create the feature vocabulary, and a variety of feature selection techniques are employed to choose the best set of word vectors. Machine learning classifiers are then trained on the selected attributes. To improve sentiment analysis performance, the classifiers use bagging and random subspace methods. They showed that feature selection strategies combined with ensemble classifiers outperform neural networks with far less training time and fewer parameters, removing the requirement for hyper-parameter tuning.

3 Proposed Approach

Sentiment classification proceeds in the following steps (a minimal end-to-end sketch follows the list):

Step 1: In this work, a multidomain product review dataset (Books, DVD, Electronics, Kitchen and Housewares) is collected and used for this task.

Step 2: Preprocessing is required to eliminate noisy, inconsistent, and incomplete data through tokenization, stop-word removal, and stemming.

Step 3: Feature extraction and selection: First, the TF-IDF and Bag of Words (BoW) methods are used to generate a feature vector for each document; under BoW, a term receives a score of 1 if it is present and 0 otherwise. The Chi-square feature selection approach is then used to pick distinct feature subsets.

Step 4: Classification: Support Vector Machines with RBF and linear kernels (SVM-RBF, SVM-Linear), Multinomial Naive Bayes (MNB), Decision Tree (DT), and Logistic Regression (LR) are trained on the selected features.

Step 5: Bagging: Finally, an ensemble process is used to increase the classification accuracy.
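
The following sketch chains Steps 2-5 with scikit-learn on toy reviews; the documents, the number of selected features k, and the bagged Decision Tree configuration are illustrative assumptions, not the exact setup used in the experiments.

```python
# A minimal end-to-end sketch of Steps 2-5 using scikit-learn.
# Toy reviews and parameter values are illustrative assumptions.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

reviews = [
    "great read, loved every chapter",   # books, positive
    "dull plot and a waste of money",    # books, negative
    "crisp screen, excellent battery",   # electronics, positive
    "battery died within a week",        # electronics, negative
    "sharp knife with a sturdy handle",  # kitchen, positive
    "the handle snapped on first use",   # kitchen, negative
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # Steps 2-3
    ("chi2", SelectKBest(chi2, k=8)),                  # Step 3
    ("bagged_dt", BaggingClassifier(DecisionTreeClassifier(),
                                    n_estimators=10,
                                    random_state=0)),  # Steps 4-5
])
pipeline.fit(reviews, labels)
print(pipeline.predict(["excellent sturdy screen"]))
```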

3.1 Methodology

The proposed architecture is depicted in Fig. 1, with further information on each preliminary function provided in the subsections that follow. The sentiment categorization task in this study is performed on multidomain product review data. From each domain in the dataset, 1000 positive and 1000 negative tagged reviews [14, 15] are chosen for implementation. Table 1 summarizes the statistics for this data set.

Table 1 Multidomain sentiment dataset

3.1.1 Preprocessing Task

Tokenization: The tokenizer separates a review into distinct tokens such as words, numbers, and special characters, making it ready for further processing. Stop-word removal: To improve the effectiveness of the feature selection strategy, this stage entails deleting commonly used stop words such as prepositions, along with unnecessary words, special characters, ASCII codes, new lines, and excessive white space. Stemming: This step entails transforming each token into its stem, or root, form.
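
A minimal sketch of these three steps with NLTK is shown below; the tokenizer, stop-word list, and stemmer used here (Punkt, NLTK's English list, Porter) are illustrative choices rather than the paper's confirmed configuration.

```python
# Illustrative preprocessing sketch with NLTK (tool choices are assumptions).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(review):
    # Remove special characters, stray ASCII codes, and extra whitespace.
    cleaned = re.sub(r"[^a-z\s]", " ", review.lower())
    # Tokenization: split the review into individual word tokens.
    tokens = nltk.word_tokenize(cleaned)
    # Stop-word removal and stemming: keep content words in root form.
    return [STEMMER.stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The film was filled with fun elements!"))
# -> ['film', 'fill', 'fun', 'element']
```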

3.1.2 Feature Extraction and Selection

Feature extraction and selection play a vital role in increasing the accuracy of the sentiment categorization process. TF-IDF and BoW are used to obtain the features. TF-IDF is a well-known technique for converting text into a feature vocabulary. TF-IDF values are calculated using Eqs. (1), (2) and (3) [7, 16].

$$\text{TF}(t) = \frac{\text{Number of times term } t \text{ appears in a document}}{\text{Total number of terms in the document}}$$
(1)
$$\text{IDF}(t) = \log_e \frac{\text{Total number of documents}}{\text{Number of documents containing term } t}$$
(2)

To find the TF-IDF score:

$${\text{TF-IDF}}\left( w \right) = {\text{TF}}\left( w \right) \times {\text{IDF}}\left( w \right)$$
(3)

BoW translates text input into numeric vectors by counting term frequencies over each document. By ignoring word order and focusing on word frequency, it generates a feature vocabulary across all documents. Selecting the appropriate features from the feature lexicon is an important task [13]. The Chi-square \(\chi^2\) statistic is a standard statistical test for detecting the association between a term and the linked class; the null hypothesis is that there is no association between the feature and the class. The \(\chi^2\) value is calculated using Eq. (4) [6, 17]:

$$\chi^2 = \sum \frac{\left( \text{Observed value} - \text{Expected value} \right)^2}{\text{Expected value}}$$
(4)
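
A minimal sketch of this stage with scikit-learn is given below; the toy documents and the value of k are illustrative. `TfidfVectorizer` implements Eqs. (1)-(3) (up to smoothing details), and `SelectKBest(chi2, ...)` applies the \(\chi^2\) test of Eq. (4).

```python
# Sketch: TF-IDF extraction (Eqs. 1-3) followed by chi-square feature
# selection (Eq. 4); documents and k are illustrative toy values.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["great battery life", "poor battery and dull screen",
        "sturdy kitchen knife", "blunt knife, poor handle"]
labels = [1, 0, 1, 0]

tfidf = TfidfVectorizer()        # builds the feature vocabulary
X = tfidf.fit_transform(docs)    # TF-IDF weighted document vectors

# Keep the k terms most associated with the class labels under chi^2.
selector = SelectKBest(chi2, k=5)
X_top = selector.fit_transform(X, labels)
print(selector.get_feature_names_out(tfidf.get_feature_names_out()))
```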

3.2 Classification

3.2.1 Multinomial Naive Bayes (MNB)

Multinomial Naive Bayes is a popular method for classifying documents based on statistical analysis of their contents. It classifies a document by assessing the probability that the document belongs to a given class (topic). The vectors \(\theta_y = \left( \theta_{y1}, \ldots, \theta_{yn} \right)\) are the distribution parameters, where \(\theta_{yi}\) is the probability \(P\left( x_i | y \right)\) of feature i appearing in class y [14]. The parameter \(\theta_y\) is estimated by Eq. (5):

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$
(5)

where \(N_{yi} = \sum_{x \in T} x_i\) is the number of times feature i appears in class y in the training set T, \(N_y = \sum_{i = 1}^n N_{yi}\) is the total count of all features for class y, and \(\alpha \ge 0\) is the smoothing parameter.
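
A worked instance of Eq. (5) is sketched below with hypothetical counts and Laplace smoothing (\(\alpha = 1\)); the numbers are not taken from the paper's dataset.

```python
# Worked instance of Eq. (5) with hypothetical counts and alpha = 1.
import numpy as np

alpha = 1.0
N_yi = np.array([3, 0, 5])        # feature counts within class y
N_y, n = N_yi.sum(), N_yi.size    # total class count, vocabulary size
theta_hat = (N_yi + alpha) / (N_y + alpha * n)
print(theta_hat)                   # [0.3636 0.0909 0.5455], sums to 1
```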

3.2.2 Support Vector Machine (SVM)

SVM maps the data points into a higher-dimensional space, allowing them to be separated linearly, by determining the optimal hyperplane that divides the groups of data. The main aim is to maximize the margin between each data group and the hyperplane. The kernel function specifies the hyperplane to be used: the linear kernel is applied if the data are linearly separable, and the Radial Basis Function (RBF) kernel is applied to non-linear data. The training data are labelled \(\left( x_i, y_i \right)\), i = 1, 2, ..., l, where \(x_i \in R^n\) and \(y_i \in \left\{ 1, -1 \right\}\). The SVM optimization problem is given in Eq. (6):

$$\begin{aligned} & \min_{w,b,\zeta} \frac{1}{2} w^T w + C \sum_{i = 1}^{n} \zeta_i \\ & \text{subject to}\; y_i \left( w^T \phi \left( x_i \right) + b \right) \ge 1 - \zeta_i ,\; \zeta_i \ge 0,\; i = 1, 2, \ldots, n \end{aligned}$$
(6)
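
The kernel choice can be sketched with scikit-learn's SVC, which solves the soft-margin problem of Eq. (6); the toy points and C value below are illustrative.

```python
# Linear vs. RBF kernel choice with scikit-learn's SVC, which solves the
# soft-margin problem in Eq. (6); data and C = 1.0 are illustrative.
from sklearn.svm import SVC

X = [[0.0, 1.0], [0.1, 0.8], [1.0, 0.0], [0.9, 0.2]]
y = [1, 1, -1, -1]

svm_linear = SVC(kernel="linear", C=1.0).fit(X, y)  # linearly separable data
svm_rbf = SVC(kernel="rbf", C=1.0).fit(X, y)        # non-linear boundaries
print(svm_rbf.predict([[0.2, 0.9]]))                # close to the +1 group
```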

3.2.3 Logistic Regression (LR)

The logistic regression model is a supervised classifier that can also handle multiclass problems. The logistic function determines the relation between the two class labels: if the predicted likelihood is greater than 0.5, label "1" is assigned; otherwise, label "0" is assigned [16, 17]. It operates by minimizing a loss function to determine the optimal set of weight parameters.
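
The 0.5 threshold described above can be sketched with scikit-learn's LogisticRegression; the toy feature vectors (counts of positive and negative cue words) are assumptions.

```python
# The 0.5-probability decision rule with scikit-learn's LogisticRegression;
# features (counts of positive/negative cue words) are illustrative.
from sklearn.linear_model import LogisticRegression

X = [[3, 0], [0, 3], [2, 1], [1, 2]]
y = [1, 0, 1, 0]

lr = LogisticRegression().fit(X, y)
p = lr.predict_proba([[2, 0]])[0, 1]   # P(label = 1 | x)
print(p, "->", 1 if p > 0.5 else 0)    # label "1" when p > 0.5
```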

3.2.4 Decision Tree (DT)

The Decision Tree is a binary tree with conditions at the internal nodes and class labels at the leaf nodes. The attribute selection step uses the information gain or Gini index method to determine the value of each attribute. The attribute with the highest information gain is chosen as the split node, since it provides the most information [18, 19]. This process is repeated until the leaf nodes are reached.
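
Both attribute-selection criteria named above are exposed by scikit-learn's DecisionTreeClassifier, as this illustrative sketch shows; the toy data are assumptions.

```python
# The two split criteria mentioned above (Gini index and information
# gain/entropy) in scikit-learn's DecisionTreeClassifier; toy data.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 1], [1, 0], [1, 1], [0, 0]]
y = [1, 0, 1, 0]           # label equals the second feature

dt_gini = DecisionTreeClassifier(criterion="gini").fit(X, y)
dt_gain = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(dt_gain.predict([[0, 1]]))  # -> [1]: the tree splits on feature 2
```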

3.3 Bagging Ensemble Techniques

Bagging is a bootstrap ensemble that uses sampling with replacement to construct subsets of the original data. A base classifier is trained on each data subset, and the individual predictions are combined to obtain the final prediction [12]. By training many weak classifiers on subsets of the original data, bagging enhances classifier performance.
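
A minimal bagging sketch with scikit-learn is shown below; the synthetic data, base estimator, and number of estimators are illustrative assumptions.

```python
# Bagging sketch: 10 logistic-regression base classifiers, each trained on
# a bootstrap sample drawn with replacement; predictions are majority-voted.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

single = LogisticRegression(max_iter=1000)
bagged = BaggingClassifier(LogisticRegression(max_iter=1000),
                           n_estimators=10, bootstrap=True, random_state=0)

print("single:", cross_val_score(single, X, y, cv=5).mean())
print("bagged:", cross_val_score(bagged, X, y, cv=5).mean())
```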

4 Experiments

The various classification algorithms are examined in this section. The experiments employ the Multidomain Product Review Dataset [15]. TF-IDF and BoW models are used for feature extraction from the review dataset, and Chi-square feature selection is used to choose the top features. Logistic Regression (LR), Support Vector Machines with RBF and linear kernels (tuned via grid search), Decision Tree (DT), Multinomial Naive Bayes (MNB), and Random Forest (RF) are trained on the selected features. These classifiers are subsequently trained using bagging techniques.

Accuracy and F-score are used as performance measures for the classifiers mentioned above. Equation (7) gives the accuracy metric, defined as the ratio of correctly predicted instances to the total number of predictions. The model is perfect if the F-score equals 1. The F-score is calculated using Eqs. (8), (9) and (10).

$${\text{Accuracy}} = \frac{{{\text{Number}}\,{\text{of}}\,{\text{Correctly}}\,{\text{Predicted}}}}{{{\text{Total}}\,{\text{Number}}\,{\text{of}}\,{\text{Predicted}}}}$$
(7)
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
(8)

where TP is True Positive and FP is False Positive.

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
(9)

where TP is True Positive, and FN is False Negative.

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
(10)
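
These measures correspond directly to scikit-learn's metric functions, as the sketch below illustrates with hypothetical label vectors.

```python
# Eqs. (7)-(10) via scikit-learn's metric functions; labels are hypothetical.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]   # TP = 3, FP = 0, FN = 1, TN = 2

print("accuracy :", accuracy_score(y_true, y_pred))   # Eq. (7): 5/6
print("precision:", precision_score(y_true, y_pred))  # Eq. (8): 3/3
print("recall   :", recall_score(y_true, y_pred))     # Eq. (9): 3/4
print("F1       :", f1_score(y_true, y_pred))         # Eq. (10): 0.857
```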

5 Results and Discussion

The experimental results show how each classifier responds to feature selection. The efficacy of the proposed technique is shown in Tables 2 and 3, while Tables 4 and 5 report the F1-scores of the classifiers.

Table 2 Base classifiers accuracy
Table 3 Ensemble classifiers accuracy
Table 4 F1-score of base classifier
Table 5 F1-score of ensemble classifiers

The above tables show that the LR technique provides better accuracy than the other base classifiers. With bagging, LR classifiers trained on BoW features with Chi-square feature selection greatly outperformed the other base classifiers, as shown in Table 3. On the multidomain datasets, the SVM (RBF) and DT classifiers achieved the same results with TF-IDF feature extraction and Chi-square feature selection, regardless of the classification method applied.

6 Conclusion

Sentiment analysis on a multidomain dataset is performed using base classifiers and ensemble classifiers. According to the experimental results, the LR algorithm combined with the TF-IDF and BoW models and the Chi-square feature selection technique outperformed the other classifiers, and the accuracy of the weak classifiers is also improved by using ensemble classifiers. In future work, the performance of neural network methods on multidomain datasets will be studied, and other ensemble methods will be investigated and their performance compared.