1 Introduction

Online shopping is growing rapidly within e-commerce, where customer experience is largely formed after the purchase of a product of interest. Non-rational factors play an important role in the development and activity of social movements through online media, and they have a major impact on viral spreading [1, 2]. Multi-analysis approaches have mainly concentrated on concerns in online social websites that depend on the emotion conveyed by messages [3, 4]. Low cognitive effort favours heuristics, and easily accessible information plays a crucial role in online purchase decisions and, consequently, in sales [5, 6]. On social websites, sentiment analysis is very useful for shedding light on the role of emotion both offline and online [7,8,9,10]. Investigation of online user reviews shows that they increase product awareness [11,12,13,14]. Analysis of social web text can identify false sentiment patterns that are irrelevant to the topic, and it is extensively applied in a variety of dissimilar social web contexts [15,16,17,18,19]. Nowadays, information sharing has become a trend among business partnerships as a mutually beneficial way to increase productivity. The main purpose of this research study is to propose a proper keyword extraction methodology and a classification approach for classifying customer opinions into neutral, negative, and positive classes using the amazon customer review dataset.

In this research, sentiment analysis was performed on a well-known dataset, the amazon customer review dataset. After the collection of input data, pre-processing was carried out by applying lemmatization, review spam detection, and removal of stop words and URLs. Lemmatization converts the words of a sentence into their dictionary form in order to extract the proper lemma. In addition, review spam detection identifies customers’ untruthful opinions, such as positive and negative spam reviews. The pre-processed data were then used to extract keywords by applying an effective topic modelling approach, LDA. Next, modified PFCM was used to cluster the extracted keywords on the basis of amazon products. A quantum-inspired methodology was incorporated into the conventional PFCM to obtain the correct cluster number (three). The quantum-inspired methodology operates on the smallest unit of information representation, the quantum bit (qubit). The output of modified PFCM was given as input to the SMA-CNN classifier to classify customer opinions on amazon products as neutral, positive, or negative. Here, the Adam optimization algorithm was applied to optimize the moment estimates used to update the weights of the CNN–long short-term memory (LSTM) classifier, which helps to reach the best accuracy and to minimize the estimated error. Finally, the performance of the proposed system was compared with that of existing systems in terms of recall, f-measure, precision, classification accuracy, and area under curve (AUC).

This research paper is organized as follows. Several recent papers on sentiment analysis are reviewed in Sect. 2. The problem statement regarding the existing methods is given in Sect. 3. A detailed explanation of the proposed system is provided in Sect. 4. Section 5 illustrates the quantitative and comparative analyses of the proposed system. The conclusion is given in Sect. 6.

2 Literature review

Researchers have developed numerous methodologies for the different stages of sentiment analysis. In this section, a brief review of some important contributions in the existing literature is given in Table 1.

Table 1 Review of existing literatures

3 Problem definition and solution

This section describes a problem statement in sentiment analysis and details how the proposed system solves it.

  • Expert knowledge is required to select a suitable keyword modelling approach

After pre-processing the collected data, keyword extraction is carried out to find the optimal keywords in the huge data. In sentiment analysis, finding keywords in a huge database is one of the emerging concerns for researchers. High-dimensional data increase the system complexity: as the volume of the data space grows, the collected data become sparse, which leads to the “curse of dimensionality” issue. In sentiment analysis, LDA is a well-known topic modelling approach designed to find keywords in large datasets. Moreover, LDA is a well-defined generative model that can be easily extended into more complicated approaches. Solution: To further improve the performance of keyword extraction, a new clustering algorithm (modified PFCM) is combined with LDA for finding the optimal keywords. In this research study, after implementing the LDA algorithm, the extracted keywords are optimized by using modified PFCM. In PFCM, a quantum-inspired method is included for identifying the similarity between objects, which effectively reduces the computational complexity of the system.

  • Expert knowledge is required to select an appropriate classifier

After extracting the keywords from the pre-processed data, classification is carried out to classify customer opinions on amazon products. In sentiment analysis, binary classifiers such as the support vector machine are well known and are designed for two-class problems. The success of binary classifiers depends on a decision boundary that delivers good generalization performance [28,29,30]. The two major problems of binary classifiers are that they are ineffective on high-dimensional data and applicable only to two-class classification. To address these issues, multiclass classification approaches have been developed. Solution: In this research work, a new classifier, SMA-CNN, is implemented for multiclass classification. The SMA-CNN classifier effectively diminishes the size of the resulting dual problem by developing a relaxed classification error bound. In addition, the adopted classification approach speeds up the training process while maintaining competitive classification accuracy.

4 Proposed system

In recent years, sentiment analysis has gained much attention among researchers, and it plays a major role in several applications such as healthcare, marketing, the retail industry, and education. This research study tackles sentiment polarity categorization, which is one of the major issues of sentiment analysis. The input data utilized in this research are online product reviews collected from amazon.com. The proposed system consists of four major phases: data collection, pre-processing, keyword extraction, and classification. Figure 1 shows the workflow of the proposed system, and a detailed explanation of each phase is given below.

Fig. 1
figure 1

Work flow of proposed system

4.1 Data collection

At first, the input data are collected from the amazon customer review dataset, which comprises customer reviews from the amazon website. The dataset spans 18 years and includes approximately 35 million reviews up to March 2013. The reviews comprise product ratings, user information, product information, and a plain-text review. Table 2 describes the data characteristics of the amazon customer review dataset.

Table 2 Data statistics

4.2 Data pre-processing

After the collection of data from the amazon customer review dataset, data pre-processing is carried out to enhance the quality of the collected data. Generally, the raw data contain noise in the form of stop words and URLs, which are removed from the collected data. In addition, a lemmatization technique and review spam detection are applied to further enhance the data quality.

  • Review spam detection The main task of review spam detection is to identify the customers’ untruthful opinions like positive spam reviews and negative spam reviews.

  • Lemmatization It transforms the words of a sentence into dictionary form. In order to extract the proper lemma, it is essential to analyse each morphological word [31, 32]. An example of lemmatization is denoted in Table 3.

    Table 3 A sample example of lemmatization
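To make the lemmatization step concrete, the following is a minimal sketch assuming a toy lookup table and a few suffix rules; the entries and rules are purely illustrative, whereas a real system would consult a full morphological dictionary (e.g. a WordNet-based lemmatizer).

```python
# A toy lemmatizer: irregular forms by lookup, regular forms by suffix
# stripping. All entries and rules here are illustrative assumptions.
IRREGULAR = {"better": "good", "went": "go", "mice": "mouse"}

def lemmatize(word: str) -> str:
    """Return a dictionary-form lemma for a single lowercase word."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies") and len(word) > 4:          # studies -> study
        return word[:-3] + "y"
    if word.endswith("ing") and len(word) > 5:          # playing -> play
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss"):  # cars -> car
        return word[:-1]
    return word

print([lemmatize(w) for w in ["studies", "better", "cars", "playing"]])
```

This mirrors the idea of Table 3 (morphological analysis followed by lemma extraction) without reproducing its exact entries.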

4.3 Keyword extraction

After pre-processing the collected data, keyword extraction is performed using LDA [33, 34]. It is a probabilistic topic methodology in which each document is represented as a random mixture of latent topics. In LDA, each latent topic is considered a distribution over a fixed set of words and is utilized for identifying the primary latent topic structure on the basis of the observed data. Generally, the words of every document are generated in a two-phase mechanism. In the first phase, a random distribution over topics is selected for every document. A word is a discrete item from a vocabulary indexed \( \{ 1, \ldots ,V\} \), a sequence of N words is denoted as \( w\; = \;(w_{1} ,w_{2} , \ldots ,w_{N} ) \), and a collection of M documents is denoted as \( D\; = \;(w_{1} ,w_{2} , \ldots ,w_{M} ) \). LDA is a three-level representation, where the parameters \( \mu \) and \( \pi \) are sampled once during corpus generation.

For each document, the document-level topic proportions are drawn. Besides, the word-level values are drawn for every word in the document. The joint distribution of these random variables defines the generative mechanism of LDA. The probability density function of a k-dimensional Dirichlet random variable is given in Eq. (1). The joint distribution of a topic mixture and the probability of a corpus are estimated using Eqs. (2) and (3).

$$ p\left( {\aleph |\pi } \right) = \frac{{\varGamma \left( {\mathop \sum \nolimits_{i = 1}^{k} \pi_{i} } \right)}}{{\mathop \prod \nolimits_{i = 1}^{k} \varGamma \left( {\pi_{i} } \right)}}\aleph_{1}^{{\pi_{1} - 1}} \ldots \ldots \aleph_{k}^{{\pi_{k} - 1}} $$
(1)
$$ p\left( {\aleph ,x,y |\pi ,\mu } \right) = p\left( {\aleph |\pi } \right)\mathop \prod \limits_{n = 1}^{N} p\left( {x_{n} |\aleph } \right)p\left( {y_{n} |x_{n} ,\mu } \right) $$
(2)
$$ p\left( {D |\pi ,\mu } \right) = \;\mathop \prod \limits_{d = 1}^{M} \smallint p\left( {\aleph_{d} |\pi } \right) \times \left( {\mathop \prod \limits_{n = 1}^{{N_{d} }} \mathop \sum \limits_{{x_{dn} }} p\left( {x_{dn} |\aleph_{d} } \right)p\left( {y_{dn} |x_{dn} ,\mu } \right)} \right)d\aleph_{d} $$
(3)

where N denotes the number of words, \( \aleph \) the document-level topic proportions, \( \mu \) the topics, x the per-word topic assignment, y the observed word, M the number of documents, and \( \pi \) the Dirichlet parameter.

In the LDA method, estimating the posterior distribution of the hidden variables in a document is an essential task, and its exact computation is intractable. Therefore, LDA is combined with approximation algorithms such as Markov chain Monte Carlo, variational approximation, Laplace approximation, and Gibbs sampling, which are widely used for keyword extraction. The positive and negative keywords are extracted with individual weight values, and the extracted keywords are stored in a dictionary. In the testing phase, the test data are matched against the dictionary to obtain the negative and positive weight values. After obtaining these weight values, the clustering process is carried out using the modified PFCM algorithm.

4.4 Modified possibilistic fuzzy c-means

Clustering is a task that accurately identifies hidden groups in a set of objects. It is an unsupervised approach, so clustering does not require prior knowledge of outputs or inputs. In this research study, the PFCM approach is utilized for clustering via membership grades. Generally, PFCM clustering considers every object a member of each cluster with a variable degree of membership. In modified PFCM clustering, a quantum-inspired method is included to find the similarity between objects, which effectively lessens the computational complexity of the system. In this study, the quantum-inspired method plays an essential role in obtaining the correct clusters; here, the optimal cluster number is three. The quantum-inspired method operates on the smallest unit of information representation, the quantum bit (qubit) [35, 36]. A classical bit stores either “1” or “0” at a time, whereas a single qubit can represent several pieces of information simultaneously with associated probabilities. A qubit is a probabilistic linear superposition of the “0” and “1” states, as stated in Eq. (4).

$$ Q\; = \;\alpha \left| 0 \right\rangle \; + \;\beta \left| 1 \right\rangle $$
(4)

where \( \alpha \) and \( \beta \) are complex amplitudes associated with the two states, state “0” and state “1”. Thus, \( \left| \alpha \right|^{2} \) and \( \left| \beta \right|^{2} \) denote the probabilities of the qubit being in state “0” and state “1”, respectively, as described in Eq. (5).

$$ \left| \alpha \right|^{2} + \left| \beta \right|^{2} = 1;\quad 0 \le \left| \alpha \right|^{2} \le 1,\quad 0 \le \left| \beta \right|^{2} \le \;1 $$
(5)

As stated in Eq. (5), a qubit is a linear superposition of the two states “0” and “1”. For instance, one- and two-qubit systems operate on two and four basis states, respectively; in general, an n-qubit system operates on \( 2^{n} \) values. A quantum individual therefore consists of a string of q quantum bits. Consider the example of two quantum bits represented in Eqs. (6) and (7).

$$ Q = \left[ {\begin{array}{*{20}c} {1/\sqrt 2 } &amp; {1/\sqrt 2 } \\ {1/\sqrt 2 } &amp; {1/\sqrt 2 } \\ \end{array} } \right] $$
(6)
$$ Q = \frac{1}{\sqrt 2 } \times \frac{1}{\sqrt 2 }\left| {00} \right\rangle + \frac{1}{\sqrt 2 } \times \frac{1}{\sqrt 2 }\left| {01} \right\rangle + \frac{1}{\sqrt 2 } \times \frac{1}{\sqrt 2 }\left| {10} \right\rangle + \frac{1}{\sqrt 2 } \times \frac{1}{\sqrt 2 }\left| {11} \right\rangle $$
(7)
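The qubit relations above can be checked numerically; the following sketch (assuming numpy) forms the single-qubit amplitudes of Eq. (4), verifies the normalization of Eq. (5), and builds the four equally weighted two-qubit basis states of Eq. (7).

```python
# Numerical check of Eqs. (4), (5), and (7) for a uniform superposition.
import numpy as np

alpha = beta = 1 / np.sqrt(2)            # equal-superposition amplitudes
p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
print(p0 + p1)                           # probabilities sum to 1 (Eq. (5))

# Two-qubit register: joint amplitudes are products of per-qubit amplitudes,
# so each of |00>, |01>, |10>, |11> has probability 1/4 (Eq. (7)).
joint = np.kron(np.array([alpha, beta]), np.array([alpha, beta]))
probs = np.abs(joint) ** 2
print(probs)
```

The Kronecker product is the standard way to compose per-qubit amplitude vectors into the joint state of a register.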

After identifying the exact clusters, the quantum bit representation is used to achieve global optimization in PFCM. PFCM relies on the minimization of an objective function, which is mathematically represented in Eqs. (8)–(10).

$$ J_{\text{PFCM}} \left( {U,T,V} \right) = \mathop \sum \limits_{i = 1}^{C} \mathop \sum \limits_{j = 1}^{n} (u_{ij}^{m} + t_{ij}^{n} )d^{2} (x_{j} ,v_{i} ) $$
(8)

where

$$ \sum\limits_{i = 1}^{c} {\mu_{ij} = 1,\quad \forall j \in \left\{ {1, \ldots ,n} \right\}} $$
(9)
$$ \sum\nolimits_{j = 1}^{n} {t_{ij} = 1,\quad \forall i \in \left\{ {1, \ldots ,c} \right\}} $$
(10)

where T denotes the typicality matrix, \( J_{\text{PFCM}} \) the objective function, U the partition matrix, and V the vector of cluster centres. In this work, the objective function is evaluated using the number of cluster centres and the degrees of membership, as mathematically denoted in Eqs. (11)–(13).

$$ \mu_{ij} = \left[ {\mathop \sum \limits_{k = 1}^{c} \left( {\frac{{d(x_{j} ,v_{i} )}}{{d(x_{j} ,v_{k} )}}} \right)^{{\frac{2}{m - 1}}} } \right]^{ - 1} ,\quad 1 \le i \le c,\quad 1 \le j \le n $$
(11)
$$ t_{ij} = \left[ {\mathop \sum \limits_{k = 1}^{n} \left( {\frac{{d(x_{j} ,v_{i} )}}{{d(x_{k} ,v_{i} )}}} \right)^{{\frac{2}{n - 1}}} } \right]^{ - 1} ,\quad 1 \le i \le c,\quad 1 \le j \le n $$
(12)
$$ v_{i} = \frac{{\mathop \sum \nolimits_{k = 1}^{n} \left( {u_{ik}^{m} + t_{ik}^{n} } \right)x_{k} }}{{\mathop \sum \nolimits_{k = 1}^{n} \left( {u_{ik}^{m} + t_{ik}^{n} } \right)}}, \quad 1 \le i \le c $$
(13)

where n denotes the number of data points, whose coordinates (xj, vi) are utilized for calculating the distance between the cluster centres and the dataset, and c denotes the number of cluster centres. The modified PFCM clustering algorithm produces memberships and possibilities together with the usual cluster centres and prototypes of each cluster [37]. Here, the choice of the objective function is the crucial aspect in enhancing the performance of the clustering method, and in this paper the clustering performance is evaluated based on the objective function. In order to develop an effective objective function, the following requirements are considered:

  • The distance between the data points assigned to the same cluster needs to be minimized.

  • The separation between different clusters needs to be maximized.

In addition, the objective function of the modified PFCM clustering algorithm is enhanced by using prototype-driven learning of the parameter \( \alpha \). The learning of \( \alpha \) depends on the separation between the clusters and is updated at each iteration. The parameter \( \alpha \) is mathematically stated in Eq. (14).

$$ \alpha \; = \;\exp \left( { - \min_{i \ne k} \frac{{\parallel v_{i} - v_{k} \parallel^{2} }}{\beta }} \right) $$
(14)

where \( \beta \) is signified as the sample variance that is denoted in Eq. (15).

$$ \beta \; = \;\frac{{\sum\nolimits_{j = 1}^{n} {\parallel x_{j} \; - \;\bar{x}\parallel^{2} } }}{n} $$
(15)

where \( \bar{x}\; = \;\frac{{\sum\nolimits_{j = 1}^{n} {x_{j} } }}{n} \)

Then, a weighting parameter is introduced for calculating the value of \( \alpha \). Every point of the dataset carries a weight with respect to every cluster. The weighting parameter yields a better classification result, especially in the case of noisy data. The formula of the weighting parameter is given in Eq. (16).

$$ w_{ji} \; = \;\exp \left( { - \;\frac{{\parallel x_{j} - v_{i} \parallel }}{{\left( {\sum\nolimits_{j = 1}^{n} {\parallel x_{j} - \bar{v}} \parallel^{2} } \right) \times c/n}}} \right) $$
(16)

where \( w_{ji} \) denotes the weight function of point j with respect to class i. The working procedure of modified PFCM is shown in Fig. 2 and explained below.

Fig. 2
figure 2

Working procedure of modified PFCM

  • Initialization Initially, the number of clusters is set on the basis of the quantum-inspired evolutionary algorithm.

  • Estimation of similarity distance After setting the number of clusters, the distance between the data points and the centroids is evaluated for every segment.

  • Estimation of typicality matrix After calculating the distance matrix, the typicality matrices obtained from the modified PFCM clustering algorithm are evaluated.

  • Estimation of membership matrix The membership matrix Mik is estimated by evaluating the membership values of the data points obtained from the modified PFCM.

  • Update centroid After the clusters are generated, the centroids are updated. Once the keywords have been extracted, data classification is carried out using the SMA-CNN classifier.
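A condensed numpy sketch of one modified-PFCM update following Eqs. (11)–(13) is given below; the data, initial centroids, and fuzzifier exponents are illustrative assumptions, and the quantum-inspired cluster-number search is not reproduced here.

```python
# One modified-PFCM update step: memberships (Eq. (11)), typicalities
# normalized per cluster to match the row-sum constraint of Eq. (10), and
# centroid update (Eq. (13)). Data and centres are illustrative.
import numpy as np

def pfcm_step(X, V, m=2.0, eta=2.0, eps=1e-9):
    """One update of memberships U, typicalities T, and centroids V."""
    # squared distances d^2(x_j, v_i), shape (c, n)
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + eps

    # Eq. (11): fuzzy memberships; each column sums to 1 over the c clusters
    U = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1))).sum(1)

    # Typicalities: each cluster's row sums to 1 over the n data points
    T = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (eta - 1))).sum(2)

    # Eq. (13): centroid update from combined memberships and typicalities
    W = U ** m + T ** eta
    return U, T, (W @ X) / W.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
V0 = np.array([[0.0, 0.0], [5.0, 5.0]])   # two assumed cluster centres
U, T, V1 = pfcm_step(X, V0)
print(V1.round(2))                        # centres move toward the two groups
```

Iterating this step until the centroids stop moving reproduces the loop sketched in Fig. 2, with the initial number of clusters supplied by the quantum-inspired search.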

4.5 Classification of data using SMA-CNN classifier

Generally, SMA-CNN is a multi-layer feed-forward network designed to recognize features in sentiment data. The proposed SMA-CNN classifier contains seven layers: three convolutional layers, two LSTM layers, and two dense layers. The neurons in the SMA-CNN classifier each consider a small portion of the data, called sub-data [38]. The respective sub-data are then used for feature extraction; for instance, a feature may be a vertical line, an arch, or a circle. The features are captured by the respective feature maps of the network, and a combination of features is utilized to classify the data. In addition, multiple different feature maps are used to make the network more robust. In this research, the Adam optimization algorithm is used to optimize the moment estimates of the gradients that move feature values from one layer to another. As a result, unwanted convolutions in the convolutional layer are avoided. Major advantages of the Adam optimization algorithm are that it works effectively even with little tuning of hyper-parameters and that it has relatively low memory requirements [43]. Figure 3 represents the general architecture of the SMA-CNN classifier.

Fig. 3
figure 3

General architecture of SMA-CNN classifier
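The Adam update used to optimize the network weights can be sketched as follows; the hyper-parameters are Adam's common defaults rather than values tuned for SMA-CNN, and the one-dimensional objective is purely illustrative.

```python
# A minimal Adam sketch: first- and second-moment estimates of the gradient
# with bias correction, applied to a toy 1-D objective.
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (uncentred variance)
    m_hat = m / (1 - b1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t)
print(round(w, 2))
```

The low memory requirement noted above is visible here: only the two scalars m and v are kept per weight.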

The convolutional layer is the primary layer of the CNN classifier and extracts local information from the data. Moreover, the convolutional operation enhances the input features and reduces noise interference. The mapping operation of the convolution process is mathematically expressed in Eq. (17).

$$ x_{j}^{l} \; = \;f_{c} \left( {\sum\limits_{{i \in M_{j} }} {x_{i}^{l - 1} \times k_{i,j}^{l} \; + \;\theta_{j}^{l} } } \right) $$
(17)

where \( x_{j}^{l} \) denotes the jth mapping set of convolutional layer l, \( x_{i}^{l - 1} \) the ith feature set in the (l − 1)th convolutional layer, and \( k_{i,j}^{l} \) the convolutional kernel between the ith feature set and the jth mapping set in convolutional layer l. The variable \( \theta_{j}^{l} \) represents the bias and fc the activation function. The next step is the pooling process, which reduces the risk of over-fitting during training. The pooling process is mathematically denoted in Eq. (18).

$$ x_{j}^{l} \; = \;f_{p} \left( {\beta_{j}^{l} {\text{down}}\left( {x_{i}^{l - 1} } \right)\; + \;\theta_{j}^{l} } \right) $$
(18)

where \( {\text{down}}( \cdot ) \) represents the downsampling operation from layer (l − 1) to layer l, \( \theta_{j}^{l} \) and \( \beta_{j}^{l} \) denote the additive and multiplicative biases, and \( f_{p} ( \cdot ) \) represents the activation function. Generally, the pooling process is divided into two types: average pooling and maximum pooling. The features of the final pooling layer are arranged into a rasterization layer, which is further connected to the fully connected layer. The output of node j is mathematically stated in Eq. (19).

$$ h_{j} \; = \;f_{h} \left( {\sum\limits_{i = 0}^{n - 1} {w_{i,j} x_{i} \; - \;\theta_{j} } } \right) $$
(19)

where \( w_{i,j} \) denotes the connection weight of input vector \( x_{i} \), \( \theta_{j} \) the node threshold, and \( f_{h} ( \cdot ) \) the activation function. The next layer is the LSTM layer, which helps to capture sequential patterns by taking prior data into account. This layer takes the output vectors of the pooling layer as inputs. The LSTM layer has a number of cells or units, the input of every cell is the output from the pooling layer, and the final output of this layer has the same number of units.
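The convolution and pooling operations of Eqs. (17) and (18) can be illustrated numerically; the input, kernel, bias, and pooling window below are illustrative choices, with ReLU standing in for \( f_{c} \) and max pooling for down(·).

```python
# Worked 1-D example of Eq. (17) (convolution + activation) and Eq. (18)
# (pooling) for a single feature map; all values are illustrative.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0])   # input x^{l-1}
k = np.array([1.0, 0.0, -1.0])                            # kernel k^{l}
theta = 0.1                                               # bias theta^{l}

# Eq. (17): valid convolution followed by the activation f_c (here ReLU)
conv = np.array([np.dot(x[i:i + 3], k) for i in range(len(x) - 2)]) + theta
feat = np.maximum(conv, 0.0)

# Eq. (18): max pooling with window 2 as the down(.) operator
pooled = feat[: len(feat) // 2 * 2].reshape(-1, 2).max(axis=1)
print(feat)    # activated feature map
print(pooled)  # downsampled map, half the length
```

Note how the pooling halves the feature map's length while retaining the strongest responses, which is what limits over-fitting as described above.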

If the LSTM layer deals with a multiclass problem, the softmax classifier is utilized in the fully connected layer. The loss function of the softmax classifier is denoted in Eq. (20).

$$ J(\theta )\; = \; - \frac{1}{m}\left[ {\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{k} {l\left\{ {y^{(i)} \; = \;j} \right\}\log \frac{{e^{{\theta_{j}^{l} }} }}{{\sum\nolimits_{k} {e^{{\theta_{k}^{l} }} } }}} } } \right] $$
(20)

where \( e^{{\theta_{j}^{l} }} \) represents the input of the jth neuron in layer l, \( \sum\nolimits_{k} {e^{{\theta_{k}^{l} }} } \) the sum over the inputs of all neurons, \( \frac{{e^{{\theta_{j}^{l} }} }}{{\mathop \sum \nolimits_{k} e^{{\theta_{k}^{l} }} }} \) the output of the jth neuron, e the exponential constant, and \( l(.) \) the indicator function. If the expression in the braces is true, the indicator function returns one; otherwise, it returns zero. Then, regularization terms are added to \( J\left( \theta \right) \) to prevent the model from falling into a local optimum. The loss function of the softmax classifier \( J\left( \theta \right) \) after adding the regularization term is mathematically stated in Eq. (21).

$$ J\left( \theta \right) = - \frac{1}{m}\left[ {\mathop \sum \limits_{i = 1}^{m} \mathop \sum \limits_{j = 1}^{k} l\left\{ {y^{\left( i \right)} = j} \right\}log\frac{{e^{{\theta_{j}^{l} }} }}{{\mathop \sum \nolimits_{k} e^{{\theta_{k}^{l} }} }}} \right] + \frac{\rho }{2}\mathop \sum \limits_{i = 1}^{k} \mathop \sum \limits_{j = 0}^{n} \theta_{ij}^{2} $$
(21)

where \( \frac{\rho }{2}\mathop \sum \nolimits_{i = 1}^{k} \mathop \sum \nolimits_{j = 0}^{n} \theta_{ij}^{2} \) is the weighted regularization term that helps to stabilize excessive parameters during training. In the proposed classifier, each layer uses a rectified linear unit (ReLU) activation function to activate its neurons. The ReLU activation function makes the proposed classifier (SMA-CNN) more robust, because it effectively mitigates the exploding and vanishing gradient problems. During training, the network first calculates the prediction error and then estimates the gradients used to update each weight, so that a smaller error is achieved in the next iteration. Finally, the results are obtained in three forms: positive, negative, and neutral. The pseudo-code of the proposed SMA-CNN is given below.
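The regularized softmax loss of Eq. (21) can be evaluated directly; the logits, labels, weight matrix, and \( \rho \) below are illustrative values, not quantities from the paper's experiments.

```python
# Numerical illustration of Eq. (21): mean cross-entropy over m samples plus
# the (rho/2) * sum(theta^2) regularization term. All inputs are illustrative.
import numpy as np

def softmax_loss(logits, labels, theta, rho=0.01):
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # the indicator l{y^(i)=j} picks out the true-class probability
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    return ce + 0.5 * rho * (theta ** 2).sum()

logits = np.array([[2.0, 0.5, 0.1],    # scores for negative/neutral/positive
                   [0.2, 1.5, 0.3]])
labels = np.array([0, 1])              # true classes
theta = np.ones((3, 4))                # illustrative weight matrix
print(round(softmax_loss(logits, labels, theta), 4))
```

Setting theta to zeros isolates the cross-entropy part of Eq. (20), which makes the contribution of the regularization term easy to verify.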

4.5.1 Pseudo-code of SMA-CNN classifier

figure a

5 Experimental result and discussion

This section details the experimental results and discussion of the proposed system, and explains the performance metrics, experimental setup, and the quantitative and comparative analyses. The proposed system was implemented in Python on a machine with 4 GB RAM, a 1 TB hard disk, and a 3.0 GHz Intel i5 processor. The performance of the proposed system was compared with that of other classification methods and existing research papers on the amazon customer review dataset in order to assess its efficiency, evaluated in terms of recall, classification accuracy, precision, f-measure, and AUC.

5.1 Performance measure

Performance measurement is the procedure of collecting, reporting, and analysing information about the performance of a group or individual. The mathematical equations for accuracy, f-measure, precision, and recall are given in Eqs. (22)–(25).

$$ {\text{Accuracy}} = \frac{{{\text{TN}} + {\text{TP}}}}{{{\text{TP}} + {\text{TN}} + {\text{FN}} + {\text{FP}}}} \times 100 $$
(22)
$$ F - {\text{measure}} = \frac{{ 2 {\text{TP}}}}{{\left( {2{\text{TP}} + {\text{FP}} + {\text{FN}}} \right)}} \times 100 $$
(23)
$$ {\text{Precision}} = \frac{\text{TP}}{{\left( {{\text{FP}} + {\text{TP}}} \right)}} \times 100 $$
(24)
$$ {\text{Recall}} = \frac{\text{TP}}{{\left( {{\text{FN}} + {\text{TP}}} \right)}} \times 100 $$
(25)

where TP is signified as true positive, TN is indicated as true negative, FP is specified as false positive, and FN is indicated as false negative.
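The metric definitions of Eqs. (22)–(25) translate directly into code; the confusion-matrix counts below are illustrative, not the paper's measured results.

```python
# Eqs. (22)-(25) as code, with the percentage scaling kept as written.
def metrics(tp, tn, fp, fn):
    accuracy = (tn + tp) / (tp + tn + fn + fp) * 100          # Eq. (22)
    f_measure = 2 * tp / (2 * tp + fp + fn) * 100             # Eq. (23)
    precision = tp / (fp + tp) * 100                          # Eq. (24)
    recall = tp / (fn + tp) * 100                             # Eq. (25)
    return accuracy, f_measure, precision, recall

# Illustrative confusion-matrix counts
acc, f1, prec, rec = metrics(tp=80, tn=90, fp=10, fn=20)
print(acc, f1, prec, rec)
```

Note that the f-measure is the harmonic mean of precision and recall, so it penalizes a classifier that trades one off against the other.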

5.2 Quantitative analysis

The amazon customer review dataset is used for evaluating the performance of the proposed system and of other existing classification approaches such as random forest, decision tree, and Naive Bayes. In this research study, the collected data are classified into three classes: positive, negative, and neutral. In Tables 4, 5, 6, and 7, the proposed system and the existing classification approaches are evaluated in terms of accuracy, recall, precision, f-measure, and AUC. The performance evaluation is validated with 80% of the data for training and 20% for testing. Among the 2,441,053 amazon products, eight product categories are considered for experimental investigation: amazon instant video, books, electronics, home and kitchen, movie review, media, kindle, and camera.

Table 4 Performance analysis of proposed approach with dissimilar classifiers in light of recall, precision, and f-measure
Table 5 Performance analysis of proposed approach with dissimilar classifiers in light of AUC and accuracy
Table 6 Performance analysis of proposed approach with dissimilar classifiers by means of recall, precision, and f-measure
Table 7 Performance analysis of proposed approach with dissimilar classifiers by means of AUC and accuracy

Tables 4 and 5 show the performance investigation of the proposed and existing classification methods for four amazon products: amazon instant video, books, electronics, and home and kitchen. The average classification accuracy of the proposed classifier (SMA-CNN with FCM) is 90.32%, whereas the existing classification approaches (random forest, decision tree, and Naive Bayes) achieved 72.30%, 71.61%, and 82.78% classification accuracy, respectively. Similarly, the average classification accuracy of the proposed classifier (SMA-CNN with modified KFCM) is 92.85%, whereas the existing approaches achieved 75.44%, 73.85%, and 85.76%, respectively. Correspondingly, the average recall, precision, f-measure, and area under curve of the proposed classifier are better than those of the existing classifiers with FCM, because the proposed system effectively captures the linear and nonlinear properties of the collected data and significantly preserves the quantitative relationship between the high- and low-level features. The graphical comparison of the proposed approach with dissimilar classifiers for the amazon products amazon instant video, books, electronics, and home and kitchen is represented in Figs. 4 and 5.

Fig. 4
figure 4

Graphical evaluation of proposed approach with dissimilar classifiers by means of recall, precision, and f-measure

Fig. 5
figure 5

Graphical evaluation of proposed approach with dissimilar classifiers by means of AUC and accuracy

In addition, the comparative study of the proposed and existing classification methods with FCM and modified KFCM is carried out for another four amazon products: movie review, media, kindle, and camera. Here, the performance evaluation is validated with 80% of the data for training and 20% for testing. Inspecting Tables 6 and 7, the proposed classifier (SMA-CNN) outperformed the traditional classification methods (random forest, decision tree, and Naïve Bayes) and the existing clustering algorithm (FCM), with an average classification accuracy of 92.8%. In addition, the existing classifiers achieved lower recall, precision, f-measure, accuracy, and AUC than the proposed classifier (SMA-CNN). In this research study, the computational time varies with the data and the number of features in each review created by the user and product, so a standard time scale cannot be stated for any text analysis mechanism. The graphical comparison of the proposed and existing classifiers with modified KFCM for the amazon products movie review, media, kindle, and camera is shown in Figs. 6 and 7.

Fig. 6
figure 6

Graphical evaluation of proposed approach with dissimilar classifiers by means of recall, precision, and f-measure

Fig. 7
figure 7

Graphical evaluation of proposed approach with dissimilar classifiers by means of AUC and accuracy

5.3 Comparative analysis

A comparative study of the performance of existing works and the proposed work is given in Table 8. Han et al. [39] developed a new sentiment classification approach based on SentiWordNet (SWN), which was utilized as the experimental sentiment lexicon, and then analysed the review data of four amazon products collected from the amazon customer review dataset. The experimental results showed that the bias processing strategy reduced the polarity bias rate and improved the performance of lexicon-based sentiment analysis. The developed algorithm achieved 69.79% accuracy for the DVD product, 68.72% for electronics, 68.17% for books, and 71.41% for kitchen products. Additionally, Liu et al. [40] evaluated the scalability of the Naive Bayes classifier on the amazon customer review dataset. In their paper, the Naive Bayes classifier was used to achieve fine-grained control of the analysis process, and it achieved 82% classification accuracy on movie reviews.

Table 8 Comparative analysis of proposed and existing papers

Rain [41] developed an effective algorithm for sentiment analysis. Initially, features were extracted from the collected data using sentence length, bag of words, part-of-speech tags, spell checking, negation handling, and collocations. The extracted features were classified using Naive Bayes and decision list classifiers to tag given reviews as positive or negative. Their algorithm achieved 84% accuracy for books, 84% for the kindle product, and 79.93% for media. In addition, Ghose and Ipeirotis [42] examined the relative positions of three broad feature categories, namely “review readability”, “review subjectivity”, and “review-related” features, and found that the three feature sets yield a performance statistically equivalent to using all available features. Initially, econometric, text mining, and predictive modelling methods were integrated towards a more complete analysis of the information contained in user-generated online reviews for valuing their economic impact. Finally, a random forest was used to classify customer reviews into three classes: neutral, negative, and positive. Their system achieved 78.79% accuracy for the DVD product, 87.57% for audio and video products, and 87.68% for digital cameras. Compared to these existing papers, the proposed system achieved better performance, almost 6–20% higher.

6 Conclusion

In this research study, a new supervised system was developed to classify customer opinions on amazon products. The main aim of this work is to develop a proper keyword extraction method and classification approach for classifying customer opinions into neutral, negative, and positive classes using the amazon customer review dataset. In this scenario, a keyword extraction method (LDA) along with modified PFCM is used for selecting the appropriate keywords, and the obtained keywords are classified using the SMA-CNN classifier. The development of an automated system for analysing customer opinion on amazon products has numerous advantages, such as the ability to handle multiple customers, effectiveness in agent monitoring, and tracking of overall customer satisfaction. Compared to the existing papers, the proposed system delivered an effective performance in both the quantitative and the comparative analysis. From the experimental analysis, the proposed system achieved around 92.83% average classification accuracy, whereas the existing methodologies attained limited accuracy on the amazon customer review dataset. In future work, an effective unsupervised system will be developed in order to further improve the classification accuracy of sentiment analysis.