1 Introduction

Sentiment analysis uses Natural Language Processing (NLP) techniques to extract the opinions and sentiments of users about the products or services they use. This helps to understand the popularity and utilization of a product or service and the changes needed to improve its quality [1, 2]. Table 1 shows the types of sentiment analysis used.

Table 1 Types of sentiment analysis

Based on the methodology used, sentiment analysis approaches are classified as:

  1. Machine learning based sentiment analysis

  2. Knowledge based sentiment analysis

  3. Hybrid sentiment analysis

The machine learning approach is a statistical approach in which surface features such as word frequency and bag of words are used for sentiment prediction. Algorithms such as Support Vector Machine (SVM), Naïve Bayes, LSTM, CNN, and combined CNN-LSTM are used to extract features and classify the text into the defined polarity classes [12,13,14].
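As a minimal sketch of the bag-of-words surface feature mentioned above, the following counts word occurrences over a small vocabulary. The two English sentences and the function name `bag_of_words` are illustrative assumptions, not part of the reported experiments.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Return word counts for `sentence`, ordered by `vocabulary`."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


# Illustrative sentences (assumed, not taken from the paper's corpus).
documents = ["the movie was awesome", "the movie was not good"]

# The vocabulary is the sorted set of all words seen in the documents.
vocabulary = sorted({word for doc in documents for word in doc.split()})

# One fixed-length count vector per document; these vectors would be the
# input features of a classifier such as SVM or Naive Bayes.
vectors = [bag_of_words(doc, vocabulary) for doc in documents]
```

Each sentence becomes a fixed-length vector regardless of its word order, which is why such features are called surface features.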

The knowledge (dictionary) based approach uses lexicon-based semantic information (e.g., SentiWordNet, WordNet) and the grammatical dependencies of words to predict a sentiment [4].

The hybrid technique combines machine learning algorithms with knowledge based tools for polarity identification.

LSTM, a deep learning method, remembers long-distance relationships between words, which helps in deciding the sentiment of a sentence. This paper presents a study of sentiment analysis of Marathi e-news using the LSTM method. The rest of the paper is organized as follows: Sect. 2 reviews the current state of the art in sentiment analysis. Section 3 explains the LSTM technique for sentiment analysis. The results and conclusion of the research are in Sect. 4 and Sect. 5, respectively.

2 Literature survey

Classification of reviews based on the sentiment orientation of the text was the first work found in the literature [15]. The similarity between a pair of words is measured using the Point-wise Mutual Information (PMI)-Information Retrieval (IR) algorithm. The algorithm first extracts the adjective and adverb phrases present in the sentence. The semantic orientation of each phrase is computed, and the review is then classified based on the average semantic orientation of its phrases.
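The PMI-IR scoring described above can be sketched as follows. The sketch assumes the common formulation in which the semantic orientation of a phrase is its PMI with a positive reference word minus its PMI with a negative reference word, estimated from search hit counts. The hit counts, the smoothing constant and the function names are hypothetical and only illustrate the arithmetic.

```python
import math


def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg, smooth=0.01):
    """PMI(phrase, positive word) - PMI(phrase, negative word), in bits.

    near_pos / near_neg: co-occurrence counts of the phrase with the
    positive / negative reference word; hits_pos / hits_neg: counts of the
    reference words alone. `smooth` avoids division by zero (assumed value).
    """
    return math.log2(
        ((near_pos + smooth) * hits_neg) / ((near_neg + smooth) * hits_pos)
    )


def classify_review(phrase_counts, hits_pos, hits_neg):
    """Classify a review by the average semantic orientation of its phrases."""
    scores = [
        semantic_orientation(near_pos, near_neg, hits_pos, hits_neg)
        for near_pos, near_neg in phrase_counts
    ]
    average = sum(scores) / len(scores)
    label = "positive" if average > 0 else "negative"
    return label, average


# Hypothetical counts for two extracted phrases of one review.
label, average = classify_review([(120, 10), (80, 20)], hits_pos=1000, hits_neg=1000)
```

A phrase that co-occurs equally often with both reference words gets an orientation of zero, so only phrases with a clear bias move the review average.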

The next notable work, in 2004, was feature based opinion summarization [16]. It identifies the features of the product and counts the number of positive and negative opinions for each feature. Frequently occurring phrases are identified using the Apriori algorithm.

[8] performs sentiment classification of Bengali and Hindi tweets using a Support Vector Machine (SVM). The research considers unigrams and bigrams to compute word overlap. It also uses a distributional thesaurus, which automatically compares words for similarity and groups them. SentiWordNet is used to count sentiments: a word found as positive in SentiWordNet is counted as a positive sentiment (marked with the numeric value 1), and a word that is negative in SentiWordNet is counted as a negative sentiment.
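The lexicon counting step described above can be sketched as a simple lookup. The toy lexicon below is a hypothetical stand-in for SentiWordNet, and encoding negative words as -1 is an assumption made for illustration (the text only states that positive words are marked with 1).

```python
def count_sentiment_words(tokens, lexicon):
    """Count tokens marked positive (1) and negative (-1) in `lexicon`.

    Tokens missing from the lexicon are ignored.
    """
    positive = sum(1 for token in tokens if lexicon.get(token) == 1)
    negative = sum(1 for token in tokens if lexicon.get(token) == -1)
    return positive, negative


# Hypothetical lexicon: 1 = positive, -1 = negative (assumed encoding).
toy_lexicon = {"good": 1, "happy": 1, "bad": -1, "sad": -1}

tokens = "good food but bad service".split()
positive_count, negative_count = count_sentiment_words(tokens, toy_lexicon)
```

The two counts can then be used as features alongside the n-gram features for the classifier.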

[17] used Arabic health care data for sentiment analysis. Arabic is rich in morphology but has limited tools available for preprocessing. A combined CNN and LSTM model is used in the research. The convolution layer uses filters to detect multiple features and represents the sequence of vectors as feature maps. The extracted feature vector is given to the LSTM model for sentiment analysis. By considering the previous data, the LSTM layers capture the new sequence data. The recorded accuracy of the model is 94.24%.

Another approach to sentiment classification mentioned in Table 1 is aspect based sentiment analysis (ABSA). In ABSA, aspect terms (ATs) are the features, attributes or categories such as food, electronics, marketing, travel, movie etc. Here the text is annotated per AT. Two approaches are used in ABSA: (1) assign a polarity to each AT, and (2) take the polarity of the sentence as the count of the polarities of the ATs present in it. However, a sentence containing ATs of opposite polarity is a challenge [4, 7, 18].

The use of a bilingual dictionary is another approach found in the literature for low resource languages like Telugu, Hindi, Bengali (English–Indian languages) [3]. Translation software is used to translate the low resource language text (e.g., Hindi) into high resource language text (e.g., English) [5].

[19] considered three languages, Hindi, Bengali and Tamil, for opinion mining. Each tweet is human annotated. Two-class classification (positive, negative) and three-class classification (positive, negative and neutral) were performed on the tweet data. For classification, features such as word n-grams, SentiWordNet features, and surface features (such as the number of hashtags, number of @ symbols, number of characters, number of words in the tweet etc.) are derived manually. Various classifiers such as Naïve Bayes, logistic regression, decision tree, random forest and SVM are used to classify the tweets into two or three classes.

The survey shows that sentiment analysis for low resource languages is a challenge due to the unavailability of complete resources such as a parser, tagger, WordNet etc. Knowledge bases such as WordNet, SentiWordNet and bilingual dictionaries need continuous improvement, updating and maintenance. The idea was therefore to perform sentiment analysis by training a deep learning model that is independent of NLP tools and knowledge bases.

The next section explains the reason for selecting the LSTM model for sentiment detection in Marathi news text.

3 Methodology

To understand the LSTM, we need to understand the Recurrent Neural Network (RNN). A traditional neural network does not have the ability to retain previous events. Figure 1 shows a simple perceptron model with a feedback loop, which retains information by feeding it back repeatedly. An RNN is a collection of such perceptrons, which allows the network to keep information recurring.

Fig. 1 Single perceptron with feedback loop

Figure 2 shows the RNN model, where every node passes information to the next node. This helps in predicting the probability of occurrence of the next event. Learning is carried out using backpropagation through time and updating the weight values.

Fig. 2 Simple RNN

$${W}_{\mathrm{new}}={W}_{\mathrm{old}}-\mathrm{learning\ rate} * \mathrm{Gradient}$$

The gradient value computed here is the derivative of the prediction error with respect to the weight. The RNN model uses the sigmoid activation function, which is stated as,

$$f\left(x\right)=\frac{1}{1+{e}^{-x}}$$
(1)

The derivative of the sigmoid function lies between 0 and 0.25. The gradient is computed using the chain rule of differentiation. As the number of layers in the RNN increases, at some layer the new weight and the old weight become nearly the same (because of the long series of multiplications of derivatives), so no significant learning takes place. This is called the vanishing gradient problem.
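The effect can be illustrated numerically with a minimal sketch. It uses the sigmoid of Eq. (1) and the weight update rule given earlier. The depth of 20 steps, the learning rate and the function names are assumptions chosen only for illustration.

```python
import math


def sigmoid(x):
    """Sigmoid activation, Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid; its maximum is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient(pre_activations):
    """Product of sigmoid derivatives along a chain of layers (chain rule)."""
    gradient = 1.0
    for x in pre_activations:
        gradient *= sigmoid_derivative(x)
    return gradient


def update_weight(w_old, learning_rate, gradient):
    """W_new = W_old - learning rate * gradient."""
    return w_old - learning_rate * gradient


# Best case: every derivative equals its maximum of 0.25 (assumed depth 20).
gradient = chained_gradient([0.0] * 20)
w_new = update_weight(0.5, 0.1, gradient)
```

Even in this best case the chained gradient is 0.25 raised to the power 20, which is below 1e-12, so `w_new` is practically equal to the old weight of 0.5.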

The problem of long-term memory is overcome in LSTM. Figure 3 shows the LSTM node, which consists of four neural layers: (1) the forget gate \(f_t\), (2) the input gate \(i_t\), (3) the output gate \(O_t\) and (4) the current cell state \(C_t\). The cell state equation for the LSTM node is given as:

Fig. 3 The LSTM Node

$${C}_{t}={C}_{t-1} * {f}_{t} + {\tilde{C}}_{t} * {i}_{t}$$
(2)

The gradient of the LSTM cell state is in additive form, which alleviates the vanishing gradient problem [20]. Hence LSTM is used for word sequences and language modeling [21, 22].

As described above, the LSTM node has four neural layers. As shown in Fig. 4, a sentence ("the Movie was awesome") is pre-processed and tokenized, and each token in the sentence is represented as a unique integer value. The resulting integer sequences are of uneven length and hence are padded with zeros. The padded sequence is then passed to an embedding layer; this process is called word embedding. Word embedding is important as it reduces the dimension of the word representation. Such an embedded vector, \(x_t\), is considered for further analysis.
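The tokenization and zero-padding steps can be sketched as below. The second sentence, the helper names and the choice of padding on the left are assumptions for illustration; index 0 is reserved for padding.

```python
def build_vocabulary(sentences):
    """Map each distinct token to a unique integer id, starting at 1."""
    vocabulary = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary) + 1  # 0 is kept for padding
    return vocabulary


def encode(sentence, vocabulary):
    """Replace each known token by its integer id."""
    return [vocabulary[t] for t in sentence.lower().split() if t in vocabulary]


def pad(sequence, length):
    """Left-pad with zeros, or keep the last `length` ids, to a fixed length."""
    if len(sequence) >= length:
        return sequence[len(sequence) - length:]
    return [0] * (length - len(sequence)) + sequence


sentences = ["the Movie was awesome", "awesome movie"]  # second one is assumed
vocabulary = build_vocabulary(sentences)
max_length = max(len(s.split()) for s in sentences)
padded = [pad(encode(s, vocabulary), max_length) for s in sentences]
```

After this step every sentence is an integer sequence of the same length, which is the form an embedding layer expects as input.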

Fig. 4 LSTM layer

Every node in the LSTM receives the cell state \(C_{t-1}\) and the output vector \(h_{t-1}\) from the previous LSTM node, together with the new word vector \(x_t\). The forget gate is used to decide whether information is to be forgotten or kept in the cell state.

$${f}_{t}=\sigma ({W}_{f} ({x}_{t},{h}_{t-1})+{b}_{f})$$
(3)

σ represents the sigmoid function, which produces an output in the range 0 to 1, where 1 means keep the information and 0 means forget it. The next layer is the input gate \(i_t\), which decides which values are to be considered next, and \(\tilde{C}_{t}\) is the vector of candidate values to be added to the current state.

$${i}_{t}=\sigma ({W}_{i} ({x}_{t},{h}_{t-1})+{b}_{i})$$
(4)
$$ \tilde{C}_{t} = tanh(W_{c} (x_{t} ,h_{{t - 1}} ) + b_{c} ) $$
(5)

The output of the LSTM node, \(h_t\), shown in Eqs. (6) and (7), is the product of the output of the sigmoid layer \(O_t\) and the tanh of the cell state \(C_t\).

$${O}_{t}=\sigma ({W}_{o} ({x}_{t},{h}_{t-1})+{b}_{o})$$
(6)
$${h}_{t}={O}_{t} * tanh({C}_{t})$$
(7)
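Equations (2) to (7) can be put together in a minimal sketch of one LSTM time step. For readability the sketch assumes scalar inputs and states (input size and hidden size of 1), so each weight matrix reduces to a pair of numbers. The parameter layout and the function name `lstm_step` are illustrative assumptions, not the implementation used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for scalar x_t, h_{t-1}, C_{t-1}.

    params[name] = (w_x, w_h, b) for name in "f", "i", "c", "o", so that
    W(x_t, h_{t-1}) + b becomes w_x * x_t + w_h * h_prev + b.
    """
    def affine(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(affine("f"))            # forget gate, Eq. (3)
    i_t = sigmoid(affine("i"))            # input gate, Eq. (4)
    c_tilde = math.tanh(affine("c"))      # candidate state, Eq. (5)
    c_t = c_prev * f_t + c_tilde * i_t    # cell state, Eq. (2)
    o_t = sigmoid(affine("o"))            # output gate, Eq. (6)
    h_t = o_t * math.tanh(c_t)            # node output, Eq. (7)
    return h_t, c_t


# With all parameters zero every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
```

The cell state update is a sum of two products, which is the additive form referred to in the discussion of the vanishing gradient.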

A sequential neural network model is used, in which the sequence data flows from one layer to the next in order. Every hidden layer neuron uses the tanh activation function. The output layer of the model uses the softmax activation function. Softmax is a generalization of the logistic function that produces outputs in the range [0,1] which sum to 1, and hence it is suitable for multiclass classification [23].
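A minimal sketch of the softmax function is given below. Subtracting the maximum score before exponentiating is a common numerical safeguard and does not change the result. The example scores are assumed values.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    exponentials = [math.exp(s - highest) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]


# Assumed raw scores for the classes (positive, negative).
probabilities = softmax([2.0, 0.5])
predicted_class = probabilities.index(max(probabilities))
```

The predicted class is the index with the highest probability, here index 0.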

4 Result analysis

A bidirectional LSTM deep learning model is set up as discussed in Sect. 3. For this pilot experiment, 132 Marathi e-news sentences were collected from online news websites such as Loksatta, Maharashtra Times, Lokmat, Sakal etc. Punctuation marks, special symbols and English characters are removed from the sentences as part of data pre-processing. With the help of two native speakers of Marathi, the sentences were annotated with the sentiments positive, negative and neutral. Inter-rater agreement between the annotators was computed using the kappa statistic and found to be satisfactory. Sentences with neutral sentiment (11 sentences) were eliminated from the experiment. The remaining 121 sentences became the input set to the embedding layer. The LSTM model is trained with 80% of the data.
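The inter-rater agreement computation can be sketched as follows. The text does not state which kappa variant was used, so the sketch assumes Cohen's kappa for two annotators. The label lists are hypothetical and are not the annotations of this study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one and the same label only
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four sentences by two annotators.
annotator_1 = ["positive", "positive", "negative", "negative"]
annotator_2 = ["positive", "negative", "negative", "negative"]
kappa = cohens_kappa(annotator_1, annotator_2)  # 0.5
```

In this toy case the annotators agree on three of four sentences (0.75) while chance agreement is 0.5, giving a kappa of 0.5.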

Standard evaluation metrics, namely precision, recall and F-score, are used to measure the performance of the model. For the evaluation, the remaining 20% of the data is used as test data. The precision, recall and F-score for the test sample were recorded as 0.55, 1.00 and 0.72, respectively. To check the performance of the model, we compare our results with those on the IMDB dataset provided by Keras and with other researchers' work, as listed in Table 2.
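For reference, the three measures can be computed from predicted and true labels as in the sketch below. The label lists are hypothetical and do not reproduce the test set of this study. Treating "positive" as the class of interest is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Precision, recall and F-score for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical test labels: one negative sentence is predicted as positive.
y_true = ["positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "positive", "negative"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this toy case no positive sentence is missed, so recall is 1.0, while the single false positive lowers precision to 2/3 and the F-score to 0.8.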

Table 2 Result analysis

It has been observed that the LSTM model for the IMDB movie review dataset in English shows better results than the Marathi news sentiment analysis. The reason may be that the training data set used is small compared to the English IMDB movie dataset. It is also observed that the proposed LSTM model performs better than the SVM algorithm for sentiment classification of Indian tweets in Hindi and Bengali [8], which gives accuracies of 0.42 and 0.48.

The proposed system is comparable with the Telugu news sentiment analysis, which has an accuracy of 73% [3]. The Telugu e-news system used Telugu SentiWordNet. In the knowledge based approach, the result of the analysis depends purely on the quality of the lexical resources used for annotating the sentiments. A system for a Vietnamese corpus [22] performs far better than the proposed system. Its use of CNN-LSTM captures both global and local dependencies of the sentiment, which improves the performance of the model.

Table 2 also shows that machine learning algorithms such as Naïve Bayes (NB) and Logistic Regression (LR), used along with lexical resources (WordNet), produce better results.

5 Conclusion

To conclude, the performance of the proposed system is comparable with the other Indian language sentiment analysis systems shown in Table 2. Performance could be better if the input sentences were diverse and rich in vocabulary. The model should be trained with sentences whose correct sentiment is challenging for non-expert annotators to predict. The machine learning approach is a more independent approach to sentiment analysis, as the underlying dependency on updating and maintaining lexical resources is eliminated.

Future research should work on improving the accuracy of the model, which could then be used as a recommender tool for suggesting positive e-news.