1 Introduction

Personal evaluation of OM is a difficult task, because the process involves a large number of opinions that directly reflect every single person's perception of the issue [1]. For convenience, OM methods have been categorized into Attribute-enabled and Sentiment-enabled strategies. The primary goal is to extract suitable and better opinions by restricting the keywords or the attributes and by eliminating wrong or misleading ideas [13]. High precision but poor recall is the most significant disadvantage of this method, and it arises due to Out-of-Vocabulary attributes.

The various methods similar to the above-discussed techniques used for OM and ML depend on the Lexicon (a text formed by a collection of words without considering the particular relations among those words [7]). Opinion lexicons are resources that associate words with sentiments. The major disadvantage of this method is that a few words are sensed as both positive and negative depending on their usage. The ML classification algorithm is trained with an efficient corpus and annotations for recognizing the sense of a word as positive or negative, depending on the situation. Any ML-based classification mechanism requires both training and testing datasets [18]. The training dataset is used by the automatic classifier to learn the distinguishing properties of the documents, and the test dataset is used for measuring the accuracy of the automated classifier. Many ML techniques have been used for differentiating the reviews. Some of the prominent mechanisms used for text classification include Support Vector Machines (SVM), Natural Language (NL) processing tools, K-Nearest Neighbors, ID3, the N-Gram model, C5, the Winnow classifier, and the Centroid classifier. For most natural language implementations, the Naïve Bayes classifier has been treated as the best choice, as it achieves a good estimation of the parameters with less training data. When it comes to the fastest application, the AdaBoost classifier should be considered. The significant advantage of this classifier is that it can be used in combination with other learning algorithms, it does not require any prior information regarding the weak learners, it works on any data such as numbers and text, and it can be further extended to learning problems beyond binary classification.

SentiWordNet is a lexical tool based on the English WordNet that integrates sentiment information for each synset. Three numerical scores are calculated for each synset S: Pos(S), Neg(S) and Obj(S), describing the positivity, negativity, and objectivity of the synset. Scores from SentiWordNet, which is essentially an English WordNet with synset-linked polarity scores, offer weights to the features (i.e. the words of a document), leading to better classification of the sentiments in the documents. SentiWordNet semi-automatically offers term-level information on opinion polarities by deriving this information from the WordNet database of English words and relations. Positive and negative scores ranging from 0 to 1 are provided in SentiWordNet for each WordNet term, revealing its polarity. Higher scores indicate terms with a heavy opinion bias, while lower scores indicate a less subjective term.
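As a brief illustration (not taken from the paper), the synset-level scores described above can be read through NLTK's SentiWordNet interface; the word "happy" is an arbitrary example:

```python
import nltk
from nltk.corpus import sentiwordnet as swn

nltk.download("sentiwordnet", quiet=True)
nltk.download("wordnet", quiet=True)

# Each WordNet synset of "happy" carries Pos(S), Neg(S) and Obj(S) scores in [0, 1]
# that sum to 1 for that synset.
for s in swn.senti_synsets("happy", "a"):   # "a" restricts the lookup to adjective synsets
    print(s.synset.name(), s.pos_score(), s.neg_score(), s.obj_score())
```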

The FLR classifier has been used for incremental learning and has been found to be capable of tuning the generalization beyond a hyperbox, of granular computing, of dealing with the loss of information, and of implementation beyond R^N for a common lattice data domain. Thus, the three classification techniques used for OM, namely Naïve Bayes, FLR, and AdaBoost, are investigated in this paper.

2 Literature review

2.1 Opinion mining

The system proposed by Jeyapriya & Selvi (2015) [14] is based on a phrase-level examination of the client reviews. Phrase-level OM, also known as aspect-based OM, is used for extracting the most relevant aspects of an item and for predicting the aspect orientation from the item reviews. For mining customer product reviews and deciding whether a review is positive or negative, the proposed system implemented an aspect extraction technique using frequent item-set mining. The sentiment orientation of each of the aspects in the customer reviews was determined with the help of supervised learning algorithms.

Mukhtar et al. (2018) [22] proposed and defined a new type of tree called the opinion tree. A flexible OM model was created based on the opinion tree, in which coarse-grained, medium-grained and fine-grained (three different granularities) OM is realized in a unified flexible model. The flexible OM procedure is designed for public opinions on the internet. Finally, an experiment on how to build the opinion tree was completed, and the overall opinions on a hot topic on the internet were constructed in the form of a tree.

Chinsha & Joseph (2015) [8] focused on aspect-level OM; they proposed a new approach based on syntactic dependency, the aggregate score of opinion words, SentiWordNet, and aspect tables. Restaurant reviews were considered in their work; they were obtained from the web and labeled manually. On the annotated test set, the new approach achieved an accuracy of 78.04%. The method was contrasted with one that used a Part-of-Speech tagger for the extraction of features; results showed that the new method provided 6% more precision on the annotated test set than the existing one.

2.2 Web based queries

Pappas et al. (2012) [23] proposed an agent-focused crawling system for the retrieval of subject- and genre-based web documents. A set of focused crawler agents explores specific web paths on parallel topics with the help of dynamic seed Uniform Resource Identifiers (URIs); these belong to certain web genres and are collected from web search engines, starting from a simple topic query. The agents use an internal framework for scoring unvisited web pages with subject- and genre-relevance ratings. The authors conducted an experimental study testing the behavior of the agents for various subject queries, and further demonstrated the advantages and capabilities of the system.

The functions of a framework designed for the conduct analysis of e-commerce customers, defined by Dziczkowski et al. (2013) [10], allowed user identification and customer behavior extraction for communicating with the customers of a website. General Web Usage Mining approaches were presented, with proposals for extending the database with query information from the e-commerce site. The system analyzed and measured opinions using an approach based on natural language linguistics and statistical treatments. Three different methods were used for identifying opinions from the client forum, and two new methods applied linguistic information that depends on the emotions and opinions of the clients mentioned in the forum comments.

Abu-Salih et al. (2018) [2] proposed a new OM-based approach in which customers quantitatively evaluated the indexes of various websites with the help of reviews. The authors used the Mutual Reinforcement Approach (MRA) for improving the accuracy of the mining results. Unlike most OM approaches, which focus on explicit factor-opinion pairs, MRA effectively mined the implicit pairs with respect to pre-constructed associations. The study and assessments thus performed showed that the results of the proposed approach were consistent and reliable in comparison with the traditional approaches.

2.3 Survey on opinion mining of blogs

Kao & Lin (2010) [15] developed a sentiment analysis system suitable for Chinese reviews that extracted user-interested features and detected semantically oriented opinions depending on certain features and opinions in a particular category. The authors presented integrated results to users. Experiments showed that the dependency between features and opinions was effectively calculated by the system. Examination of the analyzed sentiments confirmed the applicability of the proposed process.

Alaoui et al. (2018) [3] discussed the topic of opinion question answering, i.e. answering opinion questions about goods using the opinions of reviewers. The authors' approach, called Aspect-based Opinion Question Answering (AQA), supports answering opinion-based questions while addressing the shortcomings of the existing techniques. In the pre-processing phase, AQA adopts an OM technique for identifying and estimating the target aspects. Target aspects are characteristics or components on which a review focuses. The authors conducted experiments on a real-life dataset, Epinions.com, which demonstrated AQA's improved efficacy with respect to the accuracy of the retrieved answers.

Li et al. (2018) [17] demonstrated an OM system which extracted opinions and views from consumers and customers and analyzed them to provide a concrete picture of market flow with validated statistical data. To provide these features, the program used OM-based grouping, clustering, and lingual awareness.

2.4 Survey on feature selection methods in opinion mining

All feature reduction methods improved the classifier's performance, as demonstrated by Alsaffar & Omar (2014) [4]. The Support Vector Machine (SVM) approach ensured the highest accuracy in feature selection in comparison with other classification approaches, such as Principal Component Analysis (PCA) and the CHI-square test, in classifying Malay sentiments. With feature selection, SVM recorded an experimental accuracy of 87%.

OM on Thai restaurant reviews using K-Means clustering and Markov Random Field (MRF) feature selection, proposed by Claypo & Jaiyen (2015) [9], started with pre-processing of the text for breaking the reviews into words and removing stop words, followed by text transformation for creating keywords and generating input vectors. MRF feature selection was performed to select relevant features from the many extracted features. K-Means was then employed to cluster the reviews into positive and negative groups. Results showed that MRF feature selection effectively reduced the features while decreasing the computational time.

Baccianella et al. (2014) [6] introduced six novel feature selection methods explicitly designed for ordinal classification. They were tested on two product review data sets against three approaches from the literature, using two support vector regression learning algorithms. Results showed that all six metrics outperformed the three baseline techniques on both data sets and both learning algorithms, and were more stable than the others by an order of magnitude.

2.5 Survey on semantic based feature selection in opinion mining

Mazzonello et al. (2013) [20] focused on the application and comparison of three classification techniques over a text corpus composed of reviews of commercial products, in order to detect opinions about them. The domain is "perfumes", and the user opinions in the corpus were written in Italian. The new approach was data-driven: a Term Frequency/Inverse Document Frequency (TF-IDF) based term selection procedure was adopted to produce efficient computations, improve classification outcomes, and manage issues related to the specific classification procedures.

Weichselbraun et al. (2014) [27] proposed a new approach for contextualizing and enriching broad semantic knowledge bases for OM based on Internet intelligence systems and high-throughput big data applications. The approach draws on traditional emotion lexicons and multidimensional affective resources such as SenticNet. A quantitative assessment showed major improvements when an enriched version of SenticNet was used for polarity classification. Crowdsourced gold standard data, together with a qualitative evaluation, shed light on the strengths and weaknesses of the concepts and the enrichment process.

Peñalver-Martinez et al. (2014) [24] proposed an innovative OM methodology using a new Semantic Web-guided solution to enhance the results obtained with traditional natural language processing techniques and sentiment analysis processes, aimed at (1) enhancing feature-based OM using ontologies at the feature selection stage, and (2) providing a new vector analysis method for sentiment analysis. Compared to other traditional methods, the technique, implemented and tested in a real-world movie review scenario, yielded very positive results. Table 1 lists the various opinion mining tools used at different levels.

Table 1 Opinion mining at different levels

Various tools implemented for extracting the opinions from user-created information have been explained below [5].

  • The Review Seer tool automates the work performed by opinion aggregation sites. A Naïve Bayes classifier picks out both the positive and the negative views and provides a score for the extracted features.

  • Web Fountain uses a beginning-definite Base Noun Phrase heuristic strategy for obtaining the required features and assists in the construction of a primary interface for the web.

  • The Red Opal tool supports users in determining a product's opinion orientations according to its features. Results are displayed using a web interface.

  • The Opinion Observer is a mining system used for distinguishing the opinions gathered from the Internet for the user; a WordNet exploring method is used for assigning prior polarity.

3 Methodology

3.1 Proposed work

The purpose of feature selection is to reduce the number of features by deleting those which are irrelevant while maintaining or enhancing the classification accuracy. Many search algorithms used for feature selection are available in the literature. A semantic-based feature selection for OM is proposed in this research. The objective of this research work is to evaluate the efficiency of various types of classifiers in classifying movie reviews and web-based medical query posts.

3.2 Methodology

In this work, the publicly accessible IMDb dataset is used for categorizing a review as either positive or negative. Noise is removed using stop-word removal and stemming procedures. Features are selected with the help of the Inverse Document Frequency mechanism, and the reviews are classified with the Naïve Bayes, AdaBoost, and Fuzzy Lattice Reasoning (FLR) classifiers, respectively.
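A minimal sketch of this pipeline, assuming scikit-learn and NLTK as the tooling, is given below; the split ratio, the number of boosting rounds, and the helper names are illustrative assumptions, and FLR (not available in scikit-learn) is sketched separately in Section 3.3.3:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

nltk.download("stopwords", quiet=True)
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # noise removal: drop stop words, then stem the remaining tokens
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in stop_words]
    return " ".join(stemmer.stem(t) for t in tokens)

def run_classifiers(reviews, labels):
    # reviews: list of raw review strings; labels: 1 = positive, 0 = negative
    docs = [preprocess(r) for r in reviews]
    X_train, X_test, y_train, y_test = train_test_split(
        docs, labels, test_size=0.2, random_state=42)
    vectorizer = TfidfVectorizer()                     # IDF-based feature weighting
    Xtr = vectorizer.fit_transform(X_train)
    Xte = vectorizer.transform(X_test)
    for name, clf in [("Naive Bayes", MultinomialNB()),
                      ("AdaBoost", AdaBoostClassifier(n_estimators=500))]:
        clf.fit(Xtr, y_train)
        print(name, "accuracy:", accuracy_score(y_test, clf.predict(Xte)))
```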

3.2.1 IMDb dataset and medical service dataset

The movie reviews data set comprises 2000 movie reviews [25]: 1000 positive and 1000 negative reviews evaluated by the classification algorithms. The previous version of the data set had used 700 positive and 700 negative reviews [11]. Reports gathered from the Internet Movie Database (IMDb) archives were segregated into positive and negative classes automatically derived from the ratings, in which stars represent the rank of a movie; the resulting dataset contains only such reviews. The entire search process uses subsets of 200 positive and 200 negative opinions.

IMDb.com, the Internet Movie Database, collects movie data obtained from fans and studios. It claims to be the most significant web movie database and is maintained by Amazon. Additional data on IMDb.com is found online, including a description of the data collection process. IMDb makes raw data available as text files, with each file's format possessing minute differences. A data file covering the entire volume of data is created by writing a Ruby script that extracts and preserves the required information in a database. This information, exported to CSV, can then be imported by several programs. Though IMDb data is available for download and personal use, it is appropriately protected by copyright laws [19]. Though we have transformed the IMDb data to the RDF format, we have failed to acquire the required permissions for publishing the same, and hence our implementation contains no information from IMDb, even though external links to IMDb pages have been included based on the requirements.

For the medical query dataset, data collected from sites like the Mayo Clinic Question and Answer (Q&A) section, medical weblogs, and reviews for a medical service dataset have been considered. It contains medical weblog posts on topics of medicine and health care, differentiated into blogs entered by health care professionals and by patients, respectively. Data has been collected from the Mayo Clinic, a nonprofit medical organization. One thousand six hundred data items, consisting of an equal number of valid and informative queries, have been obtained from the website and the various links, including the Question and Answer sections, drug reviews, and studies on diseases. An example of the educational class of data used in this work, collected from the Mayo Clinic website, is presented below for perusal.

Endometrial cancer develops in a woman's uterus, a hollow, pear-shaped pelvic organ where fetal development occurs. This cancer begins in the layer of cells that forms the lining (endometrium) of the uterus, and it is thus also referred to as "uterine cancer." Other forms of cancer can also originate in the uterus, but these are observed in a much smaller number of cases. Endometrial cancer can generally be recognized at an early phase, as it causes abnormal vaginal bleeding. If endometrial cancer is identified at an early stage, surgery can be performed to cure it.

3.2.2 Pre-processing and feature extraction

Stemming

Word stemming is treated as a pseudo-linguistic procedure that eliminates suffixes to reduce words to a word stem. For example, 'classifier', 'classified', and 'classifying' are all reduced to the word stem 'classify'. The dimensionality of the feature space is controlled by mapping morphologically identical words to their word stems [21]. A widely used stemming algorithm is the suffix stripper developed by Porter. The Arabic language has two different morphological analysis techniques, namely the stemming and the light-stemming techniques. While stemming reduces a word to its stem, light-stemming deletes common affixes from a word.
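A small illustration using NLTK's Porter stemmer is given below; note that the actual Porter stem of these three words is "classifi" rather than the dictionary form "classify":

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for w in ["classifier", "classified", "classifying"]:
    # all three words map to the same stem, which keeps the feature space small
    print(w, "->", stemmer.stem(w))
```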

Stop words

Stop words are a set of terms or words that carry no useful information. When not removed, stop words hamper the identification of critical concepts and words in textual sources because of their high frequency of occurrence [26]. In computing, stop words are the words that are filtered out before processing natural language (NL) data (text). There is no single exact stop-word list used by all tools, and some tools even avoid removing stop words in order to support phrase searches. Any group of words can be chosen as stop words for a given purpose. For search engines, these are frequently used words such as "is", "at", "which", and "on".
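A minimal sketch of stop-word removal using NLTK's English stop-word list (one of many possible lists, as noted above):

```python
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))

sentence = "The critics have been less than kind to Sample People"
# keep only the tokens that are not in the stop-word list
filtered = [w for w in sentence.lower().split() if w not in stop_words]
print(filtered)   # e.g. ['critics', 'less', 'kind', 'sample', 'people']
```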

Term frequency

TF-IDF is a numerical statistic that reflects a word's significance in a document within a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The TF-IDF value increases proportionally to the number of times a word appears in a document and is offset by the frequency of the word across the corpus, which controls for words that are common in all documents.

Simple word frequency can be replaced by a weighted frequency before calculating the cosine and other statistics. The weighted frequency statistic is the Term Frequency-Inverse Document Frequency (TF-IDF), which calculates a weight for a term that suitably reflects its significance. TF-IDF is a standard metric in text categorization tasks, although its use in sentiment analysis is limited and it is rarely used as a unigram feature weight [28]. TF-IDF combines term frequencies and inverse document frequencies. The former is obtained by counting the number of times a term occurs in a document, and the latter by dividing the total number of documents by the number of documents containing the word. When these values are multiplied, words appearing frequently in only a few documents receive the highest scores, and terms appearing frequently across all records receive the lowest scores, thus permitting the identification of terms that are important in a document.

TF*IDF stems from the heuristic intuition that a query term occurring in many documents is not a good discriminator and should therefore be given less weight than one occurring in few documents [26]. Eq. (1) gives the classical TF*IDF term-weighting formula:

$$ wt_{i,j}= tfr_{i,j}\,\log \left(\frac{N}{df_i}\right) $$
(1)

where $wt_{i,j}$ corresponds to the weight of term i in document j, N is the total number of documents in the collection, $tfr_{i,j}$ is the frequency of term i in document j, and $df_i$ is the document frequency of term i in the collection.

Consider the positive movie review: "The critics have been less than kind to 'Sample People' - so I had expectations that the film would be somewhat of a dud when I saw it. Many of the criticisms of the film are correct; it's a little derivative and quite a messy movie - but that's part of its charm. It's quite brave for an Australian film - it's noisy, colorful and never dull. It contains strong performances from Nathan Page, Ben Mendelsohn, Kylie Minogue, and David Field; and a brilliant soundtrack of Australian artists covering classic Australian songs. The film's production design is excellent - it looks like a Gregg Araki film, and the editing and cinematography are relentlessly brash. It's imperative that people go and support films such as this - a low budget Aussie indie pic - because lack of support from critics and lack of distribution and publicity means that it remains unseen by the young adult demographic it is intended for. 'Sample People' (brilliant title) is as good as any film I have seen this year."

After stop words, the terms eliminated are:

The, to, so, i, of, a, it, and, for, an, is, go, as, by.

After stemming, the output is given by:

“Critic has been less than kind sample people had to expect that film will be somewhat dud when saw mani critic correct little derive quit Messi but part charm brave Australian noisy colour never bore contain strong perform from Nathan page ben mendelsohn kyli minogu david field brilliant soundtrack artist cover classic song product design look like gregg araki edit cinematographi relentlessli brash imper support film such the low budget aussi indi pic because lack distribute public mean remain unseen young adult demographics intend title good ani seen year”

Table 2 shows the various term frequencies commonly used:

Table 2 Commonly used Term Frequencies
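As a small, self-contained illustration of Eq. (1), the following sketch computes TF*IDF weights over a tiny hand-made corpus; the example documents are illustrative and not taken from the datasets used in this paper:

```python
import math
from collections import Counter

def tfidf_weights(documents):
    # documents: list of token lists; returns one {term: weight} dict per document
    N = len(documents)
    df = Counter()                          # df_i: number of documents containing term i
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)                   # tfr_{i,j}: frequency of term i in document j
        weights.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return weights

docs = [["brave", "australian", "film", "noisy"],
        ["australian", "film", "dull"],
        ["brilliant", "soundtrack", "australian"]]
print(tfidf_weights(docs)[0])               # "australian" gets weight 0: it appears in all docs
```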

3.3 Classifiers

In this section, Naïve Bayes (a popular probability-based classifier), AdaBoost (an ensemble-based classifier), and FLR (a fuzzy-based classifier) are described in detail.

3.3.1 Naïve Bayes

The Naïve Bayes classifier is based on Bayes' theorem; it is a probabilistic classifier that relies on strong (naïve) independence assumptions between the features [12, 28]. It is therefore also described as an independent feature model. The Naïve Bayes classifier assumes that the features of a class are unrelated to any other features. The classifier's probability model is a conditional model over a class variable C1 with a limited number of outcomes or classes, conditional on several feature variables Fe1 through Fen:

$$ p\left(C1 \mid Fe_1,\dots ,Fe_n\right) $$
(2)

The problem is that when the number of features is large, or when a feature can take a large number of values, basing such a model on probability tables is infeasible. Applying Bayes' theorem, as in Eq. (3):

$$ p\left(C1 \mid Fe_1,\dots ,Fe_n\right)=\frac{p(C1)\,p\left(Fe_1,\dots ,Fe_n \mid C1\right)}{p\left(Fe_1,\dots ,Fe_n\right)} $$
(3)

Pseudocode of the Naive Bayes classifier.

figure a
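Since the original pseudocode figure is not reproduced here, the following is a hedged sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing that follows Eq. (3); the class and variable names are illustrative:

```python
import math
from collections import Counter

class NaiveBayesText:
    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of class names
        self.classes = set(labels)
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        vocab = set()
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc)
            vocab.update(doc)
        self.vocab_size = len(vocab)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}

    def predict(self, doc):
        # log p(C1) + sum_k log p(Fe_k | C1); the evidence term of Eq. (3) is omitted
        # because it is the same for every class.
        scores = {}
        for c in self.classes:
            score = math.log(self.priors[c])
            for w in doc:
                score += math.log((self.word_counts[c][w] + 1) /
                                  (self.totals[c] + self.vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)

clf = NaiveBayesText()
clf.fit([["great", "film"], ["dull", "messy", "film"]], ["pos", "neg"])
print(clf.predict(["great", "messy"]))
```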

3.3.2 AdaBoost

This classifier combines a sequence of weighted weak classifiers, each preferably trained on different aspects of the data, to generate an aggregated classifier that, with high probability, produces a better result than any of its constituents [29]. The algorithm's necessary steps are:

figure c

This algorithm is an iterative process that combines many weak classifiers to approximate the Bayes classifier Cl*(x). AdaBoost begins with an unweighted training sample and builds a classifier; the weight of a training data point is increased whenever it is misclassified. A second classifier is then built with the updated weights and is therefore not the same as the first. Boosting the weights of the misclassified training data thus proceeds iteratively, typically generating 500 to 1000 classifiers, each of which is assigned an appropriate vote.
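The following is a minimal sketch of this reweighting loop (discrete AdaBoost with decision stumps as weak learners); in a paper-style setup the loop would run for 500 to 1000 rounds rather than the small default shown here, and the helper names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost; y must be encoded as -1 / +1."""
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start from equal (unweighted) sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)
        if err >= 0.5:                       # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        w = w * np.exp(-alpha * y * pred)    # misclassified points get larger weights
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # aggregated (weighted-vote) classifier Cl*(x)
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```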

3.3.3 Fuzzy lattice reasoning (FLR) classifier

This classifier induces rules from the corresponding training data by continuously increasing a rule's diagonal size until it reaches the maximum threshold Dcrit. FLR is regarded as a leader-follower classifier [16] that learns rapidly with a single pass over the training data; the order of data presentation plays a prominent role. This classifier can begin learning without the help of a priori knowledge, although in later stages an initial rule set can be provided. Knowledge not provided in advance can remain unknown and be identified on-line during the learning procedure. At the same time, this classifier can train on additional data without any impact on the previously learned statistics; the process is performed by supplying the trained FLR classifier with the latest dataset, which improves the existing rules or generates new rules. The classifier has a single tuning parameter, the maximum threshold size Dcrit, obtained by balancing the corresponding granularity of learning.

The various characteristics of FLR are explained below:

  • In the assimilation condition, rule generation may become active directly, with the replacement of a hyperbox AJ by the larger hyperbox Ai ∨ AJ.

  • A rule Al ➔ Cl, l = 1, ..., L specifies a fuzzy set k(x ≤ Al) over a family of hyperboxes, in which the hyperbox Al represents the central point.

  • FLR supports two different types of semantics: Occam razor semantics, as discussed in the earlier phases, and non-numeric data such as graphs retained as a constituent lattice.

  • This classifier handles missing data in the respective constituent lattice Li by replacing the missing datum with an appropriate lattice interval [a, b].

FLR Training Algorithm

figure b
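Because the training figure is not reproduced here, the following is a heavily simplified reconstruction of FLR training and classification, under the assumptions of data scaled to [0, 1]^N, a positive valuation v(x) = x with complement theta(x) = 1 - x, and the join of two hyperboxes taken as their smallest enclosing hyperbox; Dcrit is the maximum-diagonal threshold named in the text, and everything else is an illustrative sketch rather than the authors' exact algorithm:

```python
import numpy as np

def join(box_a, box_b):
    # smallest hyperbox [lo, hi] containing both boxes
    return np.minimum(box_a[0], box_b[0]), np.maximum(box_a[1], box_b[1])

def valuation(box):
    # positive valuation of an interval [lo, hi] under v(x) = x, theta(x) = 1 - x
    lo, hi = box
    return np.sum((1.0 - lo) + hi)

def inclusion(x_box, rule_box):
    # fuzzy degree of inclusion k(x <= A): equals 1 when x lies inside the rule's hyperbox
    return valuation(rule_box) / valuation(join(x_box, rule_box))

def diagonal(box):
    lo, hi = box
    return np.sum(hi - lo) / len(lo)

def flr_train(X, y, d_crit=0.3):
    # X: array of shape (n_samples, n_features) scaled to [0, 1]; y: class labels
    rules = []                                   # list of (hyperbox, class label) pairs
    for x, label in zip(X, y):
        x_box = (x.copy(), x.copy())             # a point is a trivial hyperbox
        candidates = sorted(range(len(rules)),
                            key=lambda i: inclusion(x_box, rules[i][0]),
                            reverse=True)
        for i in candidates:
            box, cls = rules[i]
            merged = join(x_box, box)
            if cls == label and diagonal(merged) <= d_crit:
                rules[i] = (merged, cls)         # assimilation: enlarge the hyperbox
                break
        else:
            rules.append((x_box, label))         # no rule absorbed x: create a new rule
    return rules

def flr_classify(rules, x):
    x_box = (x, x)
    return max(rules, key=lambda r: inclusion(x_box, r[0]))[1]
```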

4 Results and discussion

Various experiments were performed to classify the sentiments using the medical services data set and the IMDb data set. Both data sets have positive and negative instances. While implementing the classification procedure, accuracy is treated as the primary performance metric for analyzing the positive and negative opinions (Tables 3 and 4).

Table 3 Summary of Results using IMDB dataset
Table 4 Summary of Results using Medical Service Dataset

Fig. 1 shows the results of classification on the IMDb data set using several metrics, namely accuracy, precision, recall, and F-measure. All of the performance metrics cover both the positive and the negative opinions. All three algorithms were analyzed on the data set, and the results show that the AdaBoost algorithm performed better than the other two algorithms. Figure 2 shows the same result for the classification of the Medical services data set.
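For reference, the reported metrics are computed from the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) as in the following sketch:

```python
def metrics(tp, fp, tn, fn):
    # standard definitions of the four metrics reported in Figs. 1 and 2
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure
```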

Fig. 1
figure 1

Implementation of classification in IMDb data set

Fig. 2
figure 2

Implementation of classification in Medical services data set

Electronic media are widely used to obtain information and advice on medical matters. Internet health information ranges from personal experiences of medical conditions and patient discussion groups to peer-reviewed journal articles and tools supporting clinical decision making. A study on how American consumers search for health-related information shows that the Web is a highly utilized resource for information about health. However, it is difficult to locate the best source of knowledge for a particular information need, as relevant information is hidden in web pages or in social media data such as blogs and Q&A portals.

This research work suggests a set of semantically related features for OM where the analysis focuses on the expressed sentiment. By extracting and classifying features from reviews, the sentiment is classified as positive or negative; opinions in film reviews are analyzed and classified accordingly. Features are extracted from the reviews via inverse document frequency, and the reviews are classified using the Naive Bayes, AdaBoost, and FLR classifiers. Results showed that the Naïve Bayes achieved the best rating. To strengthen the classification, more supervised learning-based research needs to be undertaken.

5 Conclusion

Sentiment classification is a binary classification procedure that makes use of structured reviews for testing and training, for identifying suitable features, and for scoring the methodologies with the help of information retrieval mechanisms for recognizing the review status as positive or negative. In this work, the publicly accessible IMDb movie review dataset has been investigated, together with medical query reviews obtained from various websites on the internet, for determining the user opinions. The feature vectors generated from the reviews were trained with the help of three classifiers, namely Naïve Bayes, FLR, and AdaBoost. A classification accuracy of about 82% was achieved for the movie review dataset and about 85% for the medical query dataset. As far as practical applications are concerned, an accuracy of 85% appears to be insufficient. Further investigation and study are required to enhance the accuracy and performance of the classification mechanisms.