1 Introduction

Social media have become an effective way for people to collect various kinds of information [1], as well as a platform for those who want to share their opinions without hesitation [2]. One such platform is Twitter, which has millions of active users. Reportedly, at least 500,000 new tweets are posted on Twitter every day [3]. Twitter has become a widely used social site where the public share their views, thoughts and opinions on topics of interest, and it has also become a platform for brand marketing [4].

Social media platforms, especially Twitter, serve as a place to assess the opportunity for a new product or business, and also to promote or analyse existing products in order to enhance a business [5, 6]. As more and more people connect to the Internet, any product marketed there can reach a great number of people. These platforms also generate vast amounts of data from a huge number of users around the globe across various domains. Data collected by these platforms are also said to form a better sample than data collected by other means, as they cover every domain, all geographical locations and all age groups. Sentiment-rich data in the form of tweets, blogs, comments, reviews, posts, etc. can be analysed to improve products and services, which in turn supports the growth of the business [7].

Sentiment analysis examines the emotions or sentiment behind blogs, posts, reviews or comments. It typically helps users by gathering the available information about a particular product and indicating whether it is worth buying. The analysis is an extensive process that uses pre-processing [8] and text analytics to determine the sentiment of tweets [9]. Natural language processing plays a major role in sentiment analysis [10].

Feature-level sentiment analysis is a branch of this analysis that provides fine-grained sentiment analysis of specific opinion targets and has a wide range of applications in e-business.

Sentiment analysis is also known as opinion mining [11]. It is an effective tool that is used in many business fields and on social media. When a company wants to sell a product on a website, it is very important for it to know how customers react to the product in order to make further progress. Sentiment analysis is one of the most common ways of analysing an incoming message and telling whether the underlying sentiment is positive, negative or neutral [12]. Sentiment interpretation is one of the toughest challenges in natural language processing, and even people sometimes fail to analyse emotions properly (Fig. 1).

Fig. 1
Sentiment analysis architecture (flow chart)

(a) Sentiment analysis has real-life applications such as reputation management and brand sentiment analysis. Essentially, applying sentiment analysis gives flexibility of analysis and insight into how the organisation and its products are regarded.

(b) Sentiment analysis helps to deal with huge volumes of data. Users can easily find out the emotional tone of any review or comment.

(c) Sentiment analysis helps to mine data and extract the emotions that underlie social media conversations.

(d) Sentiment analysis helps to distinguish positive comments from negative ones by analysing all the data.

Section 2 presents the literature survey, Sect. 3 describes the methodology, Sect. 4 presents the results and discussion, and Sect. 5 gives the conclusion along with future directions.

2 Literature Survey

Over the past few years, the opinions of users and domain experts have increasingly been relied upon for making everyday decisions. As business changes, companies have to respond more effectively to customers' needs, e.g. which brand is more popular and effective for a certain product, whether the market value of the product is high, and whether the current series is good or not. According to Appel et al. [13] and Stephen et al. [14], opinion mining, also called sentiment analysis, plays an important role in this process. It is the study of emotions such as sentiments and expressions, which are basic subjects in natural language processing (Liu [15], Pak and Paroubek [16], Vinodhini and Chandrasekaran [17], Maks and Vossen [18]). Several techniques are applied to extract emotions from unstructured data. Sentiment analysis deals with the identification and classification of the opinions or feelings present in the source text. According to Zvarevashe and Olugbara [19] and Saberi and Saad [20], social media generate an enormous amount of sentiment-rich data in the form of tweets, status updates, reviews, blog posts, etc. Sentiment analysis of this user-generated data is extremely helpful in understanding the opinion of the crowd (Feldman [21], Madhoushi et al. [22]). It offers a way to understand the underlying sentiment that users want to express when commenting on a blog (Neethu and Rajasree [23]). In essence, the computer is made capable of understanding the content of a text, including the sentiment within it. This helps to extract the details and observations found in the text and to categorise and organise the records themselves, according to Chowdhury [24] and Indurkhya and Damerau [25].

In the literature on sentiment analysis of Twitter data using machine learning approaches, Twitter sentiment analysis is considered more difficult than general sentiment analysis owing to the presence of slang words and misspellings (Patel et al. [26]); moreover, the maximum length of a tweet is 140 characters. A machine learning approach is often used to analyse the feelings expressed in text, according to Kharde and Sonawane [27].

3 Methodology

Here, we use a sequential model with an LSTM layer. Machine learning teaches computers to do what comes naturally to humans, namely to learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation.
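
To make the LSTM layer more concrete, the following is a minimal sketch of a single LSTM step for scalar inputs, written in plain Python. The parameter names and values are illustrative assumptions, not the trained weights of the model used in this work; a practical model would use vector-valued states and a deep learning library.

```python
import math

def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step for a scalar input x.

    params maps each gate name to hypothetical (w, u, b) scalars:
    w weights the input, u weights the previous hidden state, b is a bias.
    """
    def pre(gate):
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g            # new cell state
    h = o * math.tanh(c)              # new hidden state, fed back at the next step
    return h, c

def run_lstm(sequence, params):
    """Process a whole sequence, carrying the state from step to step."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h

# Hypothetical parameters chosen only for illustration.
PARAMS = {gate: (0.5, 0.5, 0.0)
          for gate in ("forget", "input", "output", "candidate")}
```

Because the hidden and cell states are fed back at every step, `run_lstm([1.0, -1.0], PARAMS)` and `run_lstm([-1.0, 1.0], PARAMS)` return different values even though both sequences contain the same numbers; the order of the tokens matters.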

3.1 Data/Text Cleaning

Our dataset contains several types of tweets, such as comments, product reviews, posts about a particular subject and live tweets on matches; it is therefore important to clean the dataset and each text so that they can be processed easily. Cleaning each text was thus our first step: unwanted symbols and tags are filtered out so that they need not be examined or processed later. First, we tokenized each sentence. Tokenization is the process of splitting a string of text into a list of tokens. Then we cleaned the text of various symbols and website markers such as @, https and .com. Next, we dealt with the “stopwords” in each text. In NLP, words with little or no meaning are referred to as stopwords; some commonly used stopwords are “the”, “a”, “an” and “in”. Search systems are typically programmed to ignore such words, both when indexing entries for searching and when retrieving them as the result of a query.
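
The cleaning steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the regular expressions, the function name `clean_tweet` and the four-word stopword list are assumptions made for the example, and the sketch strips URLs and mentions before tokenizing, which is a simplification of the order given in the text.

```python
import re

# Small illustrative stopword list (assumption); a real pipeline would use a fuller list.
STOPWORDS = {"the", "a", "an", "in"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links such as https://...
MENTION_RE = re.compile(r"@\w+")                # user mentions such as @name
TOKEN_RE = re.compile(r"[a-z0-9']+")            # runs of letters, digits, apostrophes

def clean_tweet(text, remove_stopwords=True):
    """Lowercase a tweet, strip links and mentions, and return a list of tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    if remove_stopwords:
        tokens = [tok for tok in tokens if tok not in STOPWORDS]
    return tokens
```

The `remove_stopwords` flag is included so that the same function can prepare the text both with and without stopwords.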

3.2 Split and Train

When working with datasets, a machine learning algorithm typically works in two stages, training and testing. The data are usually split in a train-test ratio of around 80:20. Here, we have around 1.6 million tweets, from which we took a sample of 500,000 (5 lakh) tweets. We train the model using data usually referred to as the training set or training data. Training data already contain the actual (ground truth) values, and the algorithm adjusts the values of its parameters to fit them while training the model. Of the 500,000 sampled tweets, 400,000 (4 lakh) are used for training the model and 100,000 (1 lakh) for testing.
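
As a rough sketch of the 80:20 split, assuming a simple random shuffle (the text does not state how the split was drawn), the hypothetical helper below divides a list of samples into training and test parts. The function name and the fixed seed are illustrative choices.

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle the samples reproducibly and return (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]
```

With 500,000 samples and `test_ratio=0.2`, this yields 400,000 training and 100,000 test items, matching the figures above.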

The trained model can now be applied to evaluate any text: the score produced by the model is compared with the sentiment threshold, and the text is classified as positive, negative or neutral. The model can also be connected to the back-end of a blogging site to analyse the sentiment of blogs automatically, without any manual interaction (Fig. 2).

Fig. 2
Workflow model: the dataset passes through pre-processing, Word2vec and the sequential model, after which the predict function classifies the text as negative or positive.
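
The threshold-based labelling of a model score can be illustrated with the short sketch below. It assumes that the model outputs a score between 0 and 1, and the cut-off values 0.4 and 0.7 are hypothetical; the text does not state the thresholds actually used.

```python
def label_sentiment(score, low=0.4, high=0.7):
    """Map a model score in [0, 1] to a sentiment label.

    low and high are hypothetical thresholds chosen for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score <= low:
        return "negative"
    if score >= high:
        return "positive"
    return "neutral"
```

For example, `label_sentiment(0.9)` returns `"positive"` and `label_sentiment(0.55)` returns `"neutral"`.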

3.3 Embedding Layer

Tweets are given as input to the embedding layer. Each tweet is split into tokens, and each token is transformed into a fixed-size vector, known as a word embedding.

The pre-trained word embedding Word2vec is used to generate the word vector for each token. Among the available pre-trained embeddings, several have been built from tweets using the Word2vec algorithm.

The output of the embedding layer consists of fixed-size word vectors, which are supplied to the sequential model as input for training.
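
The lookup performed by the embedding layer can be sketched as below. The tiny table stands in for pre-trained Word2vec vectors and its values are made up; the vector size, the maximum tweet length, and the use of zero vectors for unknown words and padding are all assumptions for the example.

```python
EMBEDDING_DIM = 4   # hypothetical vector size
MAX_LEN = 5         # hypothetical maximum number of tokens per tweet

# Toy table standing in for pre-trained Word2vec vectors (values are made up).
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0, 0.3],
    "bad": [-0.8, 0.2, 0.1, -0.4],
    "match": [0.0, 0.5, 0.7, 0.1],
}
ZERO = [0.0] * EMBEDDING_DIM

def embed_tweet(tokens, table=EMBEDDINGS, max_len=MAX_LEN):
    """Map tokens to vectors, truncating or zero-padding to max_len rows."""
    vectors = [list(table.get(tok, ZERO)) for tok in tokens[:max_len]]
    vectors += [list(ZERO) for _ in range(max_len - len(vectors))]
    return vectors
```

Every tweet is thereby turned into a matrix of `MAX_LEN` rows and `EMBEDDING_DIM` columns, which is the kind of fixed-size input a sequential model expects.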

4 Result and Discussion

It is very common to remove the stopwords (such as “did”, “doing”, “an”, “the”, “and”) in the pre-processing step as these words tend to have no or very little meaning.

Nevertheless, stopwords can sometimes have a strong influence on the sentiment of a text. To test this, we conducted an experiment to measure the impact of stopword removal on sentiment analysis. We compared the results of the sequential model with LSTM both with and without removing the stopwords, and calculated the accuracy in each case. The model trained with stopwords outperformed the model trained without stopwords. Hence, we conclude that removing stopwords should be avoided, especially for sentiment analysis. After applying the model to our tweet dataset, we obtained an accuracy of 79% to 81.5%.

To analyse the sentiment of tweets, we used LSTM as one of the layers of our sequential model, together with the embedding layer and a dropout layer to prevent overfitting. Sentiment analysis is one of the major uses of LSTM. The main difference between feedforward neural networks and LSTM is that an LSTM has feedback connections and can process entire sequences of data, which makes it highly effective. Sentiment analysis can be automated and can offer solutions to a large number of problems, so that decisions can be taken on the basis of a large amount of data rather than on plain intuition, which is not always correct. It can also be an integral part of market research and customer service. A company can see not only what people think of its own products or services but also what they think of its competitors'. With sentiment analysis, the overall experience of customers can be discovered easily.

Accuracy (without stopwords)    Accuracy (with stopwords)
79%                             81.5%
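
For reference, accuracy is the fraction of test items whose predicted label equals the true label. The sketch below uses a hypothetical helper and toy labels, not the data of this work. If the reported figures are measured on the 100,000-tweet test set (an assumption; the text does not say so explicitly), an accuracy of 81.5% would correspond to about 81,500 correctly classified tweets.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels for illustration only (1 = positive, 0 = negative).
toy_true = [1, 0, 1, 1]
toy_pred = [1, 0, 0, 1]
toy_accuracy = accuracy(toy_true, toy_pred)  # three of four predictions match
```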

5 Conclusion

In the present work, we took a dataset of tweets of various kinds covering quite different topics and built a model to analyse the sentiment of each tweet. The model can also be used to analyse other text and determine the sentiment behind it. The highest accuracy obtained was 81.5%. As a future direction, we will work with a larger dataset and try to optimise the hyperparameters.