1 Introduction

User-generated media such as discussion forums, blogs, online reviews and other web content are of great interest in opinion mining today. Researchers are trying to analyze the power of social media in data science. Data from social networking media is highly unstructured text, and data scientists are seeking to develop data services and tools that structure it and analyze the information hidden in it. This need has driven current academic research to focus on language processing and sentiment analysis [7]. Ambiguous content and highly context-dependent data make knowledge discovery from social media challenging. One of the most active forums in India, the ‘HP Inc. Forum’, has more than 50 thousand daily users submitting posts online, and Sina Microblog in China has more than 500 million subscribers [19]. Various machine learning techniques are used to extract useful knowledge for web forum content mining and web opinion clustering [21].

2 Opinion mining and sentiment analysis approaches

Opinion mining and sentiment analysis approaches can be classified into three levels of extraction: aspect (or feature) level, sentence level and document level. Two categories of techniques are used: (i) machine learning based techniques and (ii) lexicon based techniques. Machine learning based techniques are mainly applied at the aspect and sentence levels of feature extraction. The features they use include uni-grams, bi-grams, n-grams, POS tags and bag-of-words. SVM, Naïve Bayes and Maximum Entropy are three machine learning methods used at the aspect and sentence levels. Lexicon based or corpus based techniques use decision trees, SMO, k-NN, HMM, CRF and SDC based methodologies for sentiment classification [2, 46, 9]. The following are the approaches for opinion classification and sentiment extraction from opinionated web text.

2.1 Linear discriminant analysis (LDA)

Li et al. [11] presented a method for reducing the dimensionality of the feature matrix, similar to Principal Component Analysis (PCA). LDA searches the data for directions that have large variance and projects the data onto them, discarding the remaining directions. Fisher LDA maximizes the criterion J(w), the ratio of between-class scatter to within-class scatter along the projection direction w:

$$J\left( w \right) = \frac{w^{\text{T}} S_{B}\, w}{w^{\text{T}} S_{W}\, w}$$
(1)

where S_B is the ‘between-class scatter matrix’ and S_W is the ‘within-class scatter matrix’. If x̄ is the overall mean of the data cases and U_c is the mean of class c, then the scatter matrices are defined as:

$$S_{B} = \sum_{c} \left( U_{c} - \bar{x} \right)\left( U_{c} - \bar{x} \right)^{\text{T}}$$
(2)
$$S_{W} = \sum_{c} \sum_{i \in c} \left( x_{i} - U_{c} \right)\left( x_{i} - U_{c} \right)^{\text{T}}$$
(3)

Data from online reviews is broken into tokens, which may correspond to nouns, verbs, adverbs or adjectives. The sentiment or opinion value of each token is predefined according to whether it is used in a positive or a negative sense. For all positively opinionated reviews, the opinion values x_i are collected into a scatter matrix for LDA. The two equations above define these matrices.
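As an illustration of Eqs. (2) and (3), the following minimal sketch computes the two scatter matrices and evaluates the Fisher criterion for a candidate direction. The two-dimensional opinion vectors and the class labels are hypothetical toy values chosen for this example, not data from [11], and the helper names are assumptions.

```python
def mean(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def add(m1, m2):
    """Element-wise sum of two matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def sub(u, v):
    """Element-wise difference of two vectors."""
    return [a - b for a, b in zip(u, v)]

def scatter_matrices(classes):
    """classes maps a label to its feature vectors; returns (S_B, S_W)."""
    all_rows = [x for rows in classes.values() for x in rows]
    d = len(all_rows[0])
    x_bar = mean(all_rows)                      # overall mean
    s_b = [[0.0] * d for _ in range(d)]
    s_w = [[0.0] * d for _ in range(d)]
    for rows in classes.values():
        u_c = mean(rows)                        # class mean U_c
        diff = sub(u_c, x_bar)
        s_b = add(s_b, outer(diff, diff))       # Eq. (2)
        for x in rows:
            diff = sub(x, u_c)
            s_w = add(s_w, outer(diff, diff))   # Eq. (3)
    return s_b, s_w

def quadratic_form(w, m):
    """Compute w^T m w."""
    n = len(w)
    return sum(w[i] * m[i][j] * w[j] for i in range(n) for j in range(n))

def fisher_criterion(w, s_b, s_w):
    """Ratio of between-class to within-class scatter along w."""
    return quadratic_form(w, s_b) / quadratic_form(w, s_w)

# Hypothetical opinion vectors for two review classes.
reviews = {
    "positive": [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1]],
    "negative": [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]],
}
S_B, S_W = scatter_matrices(reviews)
separating = fisher_criterion([1.0, -1.0], S_B, S_W)
non_separating = fisher_criterion([1.0, 1.0], S_B, S_W)
```

On this toy data the direction (1, -1), which runs between the two class means, scores far higher than (1, 1), which is the behaviour the criterion is designed to reward.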

2.2 Support vector machines (SVM)

Bhadane et al. [3] presented a new approach to sentiment analysis and opinion mining, published on ScienceDirect in 2015, using SVM with a precision of 0.78. SVM is a supervised learning technique based on decision planes that define decision boundaries. These decision boundaries separate objects according to their class membership. The sentiment of a sentence or a tweet taken from Twitter is classified as positive, negative or neutral by the trained SVM. There are four SVM models: two for classification and two for regression. Training these SVM models minimizes the error function:

$$E = \frac{1}{2} W^{\text{T}} W + C \sum_{i} \beta_{i}$$
(4)

where C is a constant, the β_i are the parameters that handle non-separable input data, and W is the vector of coefficients. SVM uses a kernel function to map the input feature space into a new space where the classes are linearly separable. A polynomial kernel is used, and the behaviour of the classifier depends largely on the cache size, exponent, tolerance and numFolds settings. Vectors are built from nouns, verbs, adverbs and adjectives, with coefficients taken from the predefined values of their sentiments. Values less than 0.5 indicate negative sentiment, values greater than 0.5 indicate positive sentiment, and a value of exactly 0.5 denotes the neutral class.
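To make the error function concrete, the sketch below evaluates the standard soft-margin form, one half of the squared norm of W plus C times the sum of the β_i, for a fixed linear model. Taking each β_i as the hinge slack max(0, 1 - y_i(W·x_i + b)) is the usual convention and is an assumption here, as are the bias term b, the toy samples and the function name.

```python
def soft_margin_objective(w, b, samples, c):
    """Return (E, slacks) for a linear SVM with weights w and bias b.

    samples is a list of (x, y) pairs with y in {-1, +1}.
    The slack of each sample is assumed to be the hinge loss.
    """
    regulariser = 0.5 * sum(wi * wi for wi in w)
    slacks = []
    for x, y in samples:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    return regulariser + c * sum(slacks), slacks

# Hypothetical model and samples: the third sample lies on the wrong side.
samples = [([1.0, 0.0], +1), ([0.0, 0.0], -1), ([0.25, 0.0], +1)]
error, slacks = soft_margin_objective([2.0, 0.0], -1.0, samples, c=1.0)
```

Only the misclassified third sample contributes slack, so a larger C penalises that single violation more heavily relative to the size of W.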

2.3 Back propagation neural network

Vindhoni and Chandrasekaran [18] from the Dept. of Computer Science and Engg., Anamalai University, performed sentiment classification of online reviews using a back propagation network (BPN). BPN is an adaptive learning technique capable of classifying sentiments from social data. For each training pattern, the inputs are applied to the network. Neurons in the first layer are first assigned weights from a defined range. At the hidden layer, various combinations of bi-grams and tri-grams appearing in the sentences are considered. As these varying combinations are applied, the weights are adjusted on the way to the output layer, with the error signals from the error function used to compute the weight adjustments. The number of neurons in the three layers varies according to the number of nouns, verbs and adjectives present in the sentences.

2.4 Probabilistic neural networks (PNN)

Savchenko [15] used PNN for the recognition of discrete patterns from sets. In a PNN, a Gaussian kernel function is used in the hidden layer of the network. The third layer averages the hidden-layer outputs for each review class. In the fourth layer, the final class membership is found by selecting the class label with the largest value.
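The layer structure described above can be sketched in a few lines. This is a minimal illustration rather than the implementation in [15]: the smoothing width sigma, the two-dimensional feature vectors and the function name are assumptions made for the example.

```python
import math

def pnn_classify(x, training, sigma=0.5):
    """Classify x with a minimal probabilistic neural network.

    training maps each class label to its stored pattern vectors.
    """
    scores = {}
    for label, patterns in training.items():
        # Hidden (pattern) layer: one Gaussian kernel per stored pattern.
        activations = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2))
            for p in patterns
        ]
        # Third (summation) layer: average activation per class.
        scores[label] = sum(activations) / len(activations)
    # Fourth (output) layer: pick the class with the largest value.
    return max(scores, key=scores.get), scores

# Hypothetical review feature vectors.
training = {
    "positive": [[0.9, 0.8], [0.8, 0.9], [0.7, 0.8]],
    "negative": [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2]],
}
label, scores = pnn_classify([0.85, 0.9], training)
```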

2.5 Homogeneous ensemble neural network (HENN)

Su et al. [17] introduced ensemble learning for sentiment classification in 2013. They used Chinese lexical semantics to combine the predictions of multiple base models. The ensemble mixes base models obtained by calling the base classifier on re-sampled training data. Iteratively drawing sub-samples of the training data and then combining the resulting predictions by majority vote produces the ensemble's classification.
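The re-sampling and voting steps can be sketched as follows. The base learner here, a nearest-centroid rule on a single sentiment score, is a stand-in chosen for brevity and is not the base model of [17]; the training scores, the number of models and the random seed are likewise assumptions.

```python
import random
from collections import Counter

def train_centroids(sample):
    """Base model (an assumption): the mean score of each class in the sample."""
    sums, counts = Counter(), Counter()
    for score, label in sample:
        sums[label] += score
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}

def predict_centroid(model, score):
    """Predict the class whose centroid is closest to the score."""
    return min(model, key=lambda label: abs(model[label] - score))

def majority_vote(predictions):
    """Return the most frequent prediction."""
    return Counter(predictions).most_common(1)[0][0]

def bagging_predict(training, score, n_models=15, seed=0):
    """Train n_models base models on bootstrap samples and combine their votes."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: draw len(training) items with replacement.
        sample = [rng.choice(training) for _ in training]
        votes.append(predict_centroid(train_centroids(sample), score))
    return majority_vote(votes)

# Hypothetical (sentiment score, label) pairs.
training = [
    (0.9, "positive"), (0.8, "positive"), (0.85, "positive"), (0.7, "positive"),
    (0.1, "negative"), (0.2, "negative"), (0.15, "negative"), (0.3, "negative"),
]
prediction = bagging_predict(training, 0.75)
```

Because every base model is the same kind of learner trained on a different sub-sample, the ensemble is homogeneous in the sense used in this section.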

2.6 Gaussian mixture model (GMM)

Abdel Fatteh [1] proposed “Multiple Classifiers for sentiment Analysis” in the Elsevier journal Neurocomputing in 2015. GMMs are used for clustering data by allocating query data points to multivariate normal components. Assigning each data point to exactly one cluster is termed hard clustering. The strength of GMM clustering is that it also supports soft clustering, in which each data point is assigned a score for every cluster.
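The difference between soft and hard clustering can be shown with a small sketch. For brevity it uses one-dimensional Gaussian components with fixed, hypothetical parameters; a real GMM would use multivariate components fitted to data, typically with expectation-maximization, and the function names are assumptions.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def responsibilities(x, components):
    """Soft clustering: one score per component, summing to 1.

    components is a list of (weight, mean, std) tuples.
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]

def hard_assignment(x, components):
    """Hard clustering: index of the component with the highest score."""
    scores = responsibilities(x, components)
    return scores.index(max(scores))

# Hypothetical mixture: a "negative" and a "positive" sentiment component.
mixture = [(0.5, 0.2, 0.1), (0.5, 0.8, 0.1)]
soft = responsibilities(0.45, mixture)
hard = hard_assignment(0.45, mixture)
```

A point such as 0.45 lies between the two components, so the soft scores keep the information that it belongs mostly, but not entirely, to the first one.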

2.7 Naïve Bayes (Bayesian networks) maximum entropy

In the Journal of Theoretical and Applied Information Technology, J. Jotheeswaran and Dr. Y. S. Kumarswamy presented a paper in 2013 on opinion mining using an NB classifier and a data set taken from Manhattan Hierarchical Cluster. The NB classifier model is a directed acyclic graph whose nodes represent variables and whose edges represent conditional dependencies. In text classification for sentiment analysis, a Bayesian network (BN) is found to be very expensive.
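Naïve Bayes is the simplest such graph: the class node is the only parent of every token node, so the tokens are treated as conditionally independent given the class. A minimal sketch of a multinomial Naïve Bayes classifier follows; the add-one (Laplace) smoothing, the tiny training set and the function names are assumptions for illustration and are not taken from the cited paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_docs):
    """Count documents per class and tokens per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)           # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue                                # skip unseen tokens
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical tokenized reviews.
docs = [
    (["good", "great", "phone"], "positive"),
    (["great", "battery"], "positive"),
    (["bad", "screen"], "negative"),
    (["poor", "bad", "battery"], "negative"),
]
model = train_nb(docs)
prediction = predict_nb(["great", "phone"], model)
```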

2.8 Hidden Markov model (HMM)

L. R. Rabiner presented a tutorial on HMMs and their applications in speech recognition in the Proceedings of the IEEE (1989). HMM is a classification technique used for assigning the right label to each element of a sequence of nodes, whether the sequence comes from biology or from linguistics. An HMM associates different lexemes into a chain of nodes. While processing such a chain of nodes taken from online reviews, the model moves from one state to another, and the path between states is recorded. Depending on the overall sentiment of the sentence or group of statements taken from a review, the HMM produces different chains, known as Markov chains, and further classification can be done with a Naïve Bayes classifier.
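One common way to recover the path between states is Viterbi decoding, which finds the most probable state sequence for an observed token sequence. The sketch below assumes two hidden sentiment states and hypothetical start, transition and emission probabilities; none of these values come from the cited tutorial.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for the observations."""
    # best[t][s]: probability of the best path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the recorded pointers back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical model parameters.
states = ["positive", "negative"]
start_p = {"positive": 0.5, "negative": 0.5}
trans_p = {
    "positive": {"positive": 0.8, "negative": 0.2},
    "negative": {"positive": 0.2, "negative": 0.8},
}
emit_p = {
    "positive": {"good": 0.9, "bad": 0.1},
    "negative": {"good": 0.1, "bad": 0.9},
}
path = viterbi(["good", "good", "bad"], states, start_p, trans_p, emit_p)
```

For the token sequence good, good, bad the decoded path is positive, positive, negative: the model stays in the positive state until the evidence for a switch outweighs the cost of the transition.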

2.9 Decision trees

Jaskarn and Shveta presented “Analysis and identification of Human Emotions through Data Mining”, published in IJCA in 2012. A decision tree is a hierarchical classifier that decomposes the training data space, using the value of a chosen attribute to divide the data. In text mining, the data items or phrases are divided recursively so that the final leaf nodes contain the tokens for classification.

2.10 Sequential minimal optimization (SMO)

Vivek et al. gave a survey of various classification techniques in IJCA in December 2015. G. Geetika and Y. Divakar presented a paper on sentiment analysis of Twitter data using SMO at an IEEE International Conference in 2014. SMO is used to optimize the classification process when training SVMs. It iteratively breaks larger sentences into smaller phrases and then tokens, and classifies these tokens according to the boundary value analysis applied by the SVM.

2.11 K-nearest neighbour classifier (KNN)

Pak and Paroubek [13], in a paper titled “Twitter as a corpus for sentiment analysis and opinion mining” in IJLRC in May 2010, discussed KNN while comparing SVM with other classifiers. KNN uses three types of distance function, namely Euclidean, Manhattan and Minkowski, to measure the gap between two terms under classification. A case is classified by a majority vote of its neighbours: it is allocated to the class most common among its K nearest neighbours, as identified by one of the above distances.
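The three distance functions and the voting rule can be sketched as follows. Manhattan and Euclidean distance are the Minkowski distance with exponent 1 and 2 respectively; the feature vectors, labels and function names are hypothetical and not taken from [13].

```python
from collections import Counter

def minkowski(a, b, p):
    """Minkowski distance of order p between two vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def manhattan(a, b):
    """Minkowski distance with p = 1."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Minkowski distance with p = 2."""
    return minkowski(a, b, 2)

def knn_classify(query, training, k=3, distance=euclidean):
    """Assign the class most common among the k nearest training items."""
    nearest = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (feature vector, label) pairs.
training = [
    ([0.9, 0.8], "positive"), ([0.8, 0.9], "positive"), ([0.7, 0.7], "positive"),
    ([0.1, 0.2], "negative"), ([0.2, 0.1], "negative"), ([0.3, 0.3], "negative"),
]
label = knn_classify([0.75, 0.8], training, k=3)
```

Passing `distance=manhattan` to `knn_classify` switches the metric without changing the voting logic.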

2.12 Jaccard similarity

Mrunmayee et al. provided a sentiment analysis tool using Jaccard and cosine similarity. In this classification technique, the sentence is first tokenized and the root of each word is found after removing unwanted nouns and verbs. In text mining, keywords are then extracted from the sentences. The term frequency of each keyword in a document is computed in the next step. The similarity between two sets of terms A and B can be found using Jaccard’s relation:

$${\text{J}}\left( {{\text{A}},{\text{B}}} \right) = \left| {{\text{A}} \cap {\text{B}}} \right|/\left| {{\text{A}} \cup {\text{B}}} \right|$$
(5)
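Eq. (5) translates directly into set operations. In this minimal sketch the inputs are token lists split on whitespace, which stands in for the tokenization and stemming steps described above; returning 0.0 for two empty inputs is an assumption, since the ratio is undefined in that case.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of tokens."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty inputs are treated as dissimilar
    return len(set_a & set_b) / len(union)

# Hypothetical review sentences.
first = "the battery is good".split()
second = "the screen is good".split()
similarity = jaccard(first, second)
```

The two sentences share three of five distinct tokens, so the similarity is 3/5.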

2.13 Lexicon based opinion classifier (LEX)

T. Christopher and KG. Nanda presented a survey discussing combined classification approaches that pair LEX and KNN with ME (maximum entropy). In a LEX based classifier, polarity prediction and product features are identified through entity ranking of lexemes. T. Christopher observed that the accuracy of LEX alone was 50.08%, while a mix of ME and KNN improved the accuracy to 80.21%.

2.14 Conditional random fields (CRF)

Hu and Liu [8] proposed a method for aspect extraction from online reviews that can be used in sentiment analysis. The first step is to mark up the words of the text with “The Stanford Log-Linear Parts of Speech Tagger”. In the second step, the nouns are extracted from the tagged corpus. In the third step, the “Porter Stemming Algorithm” is used to remove words with lesser influence on polarity, i.e. positive or negative conditions are looked up manually from the reduced data.

2.15 Scalable distance clustering (SDC)

SDC is a distance based algorithm proposed by Christopher C and Tobun Dorbin in November 2011, which stresses that the required density of words must be accumulated in the initial clusters [17]. Words from the text are clustered with the initial densities defined. The distances are then modified as part of a noise filtering process that runs along with cluster iteration, thereby growing the clusters further.

3 Review of different sentiment analysis approaches

The most important requirement in opinion mining and sentiment analysis is the correct identification of the positive and negative words that convey the real sentiment of the author of the text. The advantage of machine learning based approaches over lexicon based approaches is that the former can attain a desired level of accuracy by training the network on a bag of words [12], while in the latter sentiment extraction is complex and slow because of the large and growing size of the corpus and the diversity of linguistic terms. The tables below analyze various opinion mining and sentiment analysis techniques on the basis of their features, advantages and limitations (Tables 1 and 2).

Table 1 Review of sentiment analysis approaches

4 Recent papers exploring sentiment analysis

Most business organizations today believe that their success lies in the satisfaction of their customers, and a plethora of product and service reviews is available on the web. Business organizations therefore encourage young researchers and academicians to perform sentiment analysis and opinion extraction on their web text. Here are some recent research papers exploring new insights into web text for opinion mining and sentiment analysis.

Table 2 Review of recent papers on sentiment analysis

5 Conclusion

Several opinion mining techniques are used to evaluate the real sentiment of user-generated data on social media. Dictionary based or corpus based techniques are more accurate in mining opinionated texts, while machine learning techniques have yet to improve their error rates. LDA, SVM, GMM, HMM, Jaccard’s similarity and K-NN are the approaches that come closest to the real picture. These techniques are continuously applied to the analysis of online data, and the open question is how far they can satisfy the thirst for data from social media. Support vector machines assess the positive and negative aspects of posted online data using training algorithms. Sentiments depend on the range of values of features such as bi-grams and tri-grams, on their polarities and on their combinations. Their effects are slow and iterative in nature. In neural networks, a kernel function applied in the hidden layer computes the membership of a class label. Bayesian networks model the conditional dependencies between the nodes and edges of an acyclic graph, which helps in extracting data at the context level. For sentiment analysis of sentences as well as paragraphs, the Hidden Markov model is applied. Optimization over sentences and words leads to faster learning, which improves the accuracy of results on social media data. Tokenization of data at the word root level helps to identify its positive and negative aspects. All of these approaches aim to reduce the errors in opinion mining and sentiment analysis and so achieve more accurate results for social media. All in all, this paper focuses on the various sentiment analysis techniques for extracting structured data from unstructured web text.