
1 Introduction

Today, vast amounts of information, reviews and opinions are stored as raw data on social media websites and e-service platforms. In recent years, customers have increasingly preferred to buy products online, so a large volume of customer feedback is collected, which prospective buyers use to choose the right products. Analysing this feedback yields opinionated words that help e-commerce businesses recognize areas that need improvement. Such methods require the raw review data, and existing methods each focus on adverbs, nouns, verbs or adjectives. A recent study in sentiment analysis (SA) showed that combinations of adjectives and adverbs are stronger sentiment indicators than adjectives alone, but no research has examined all possible combinations of adverbs, adjectives and verbs. This paper presents a theoretical analysis of some well-known SA methods and proposals. Review data is a valuable source of knowledge for businesses seeking to understand feedback on their products or services. It is also beneficial for consumers, who can draw on the ratings and opinions extracted from it when dealing with companies. For instance, reviews of the hotels in a city help a consumer find a good place to stay there. Similarly, phone ratings and reviews help users decide whether a mobile phone is worth buying and worth its price [1, 2]. SA incorporates various algorithms for evaluating and making sense of such a corpus of data, using natural language processing (NLP) to extract specific knowledge from it [3, 4]. The key component of an NLP system takes consumer reviews as input, and a tokenizer then splits them into tokens. A token is a sequence of characters in the text that is grouped together as a useful semantic unit for processing.
Tokens include punctuation marks, icons, words and so on. To add a further level of quality segregation for a product, an expression can be converted into word-level tokens, from which rules are generated that provide word counts and even rankings. In this analysis, consumer reviews of smartphones were obtained from amazon.com to estimate product rankings from user feedback using SA. The remainder of this paper is organized as follows: Section 2 reviews related SA methods; Section 3 describes the proposed methodology, covering customer review data collection, data preprocessing, SA and review rating frequency; and Section 4 presents the conclusions.
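To make the tokenization and word-count steps concrete, the following is a minimal Python sketch. The regular expression is an assumption chosen for illustration, not the tokenizer used in this work: it separates words (keeping contractions whole), numbers, a few simple emoticons and single punctuation marks, and `word_counts` then tallies lower-cased word tokens with `collections.Counter`.

```python
import re
from collections import Counter

# Illustrative token pattern (an assumption, not the authors' tokenizer).
# Alternatives are tried left to right at each position.
TOKEN_PATTERN = re.compile(
    r"""
    [:;=][-']?[()DPp]            # simple emoticons, e.g. :) ;-( :D
    | [A-Za-z]+(?:'[A-Za-z]+)?   # words, keeping contractions such as isn't whole
    | \d+(?:\.\d+)?              # integers and decimals
    | [^\w\s]                    # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split raw review text into word, number, emoticon and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def word_counts(texts):
    """Count lower-cased alphabetic tokens across a collection of reviews."""
    counts = Counter()
    for text in texts:
        counts.update(tok.lower() for tok in tokenize(text) if tok[0].isalpha())
    return counts
```

For example, `tokenize("Great phone :) but battery isn't good!")` yields the tokens `Great`, `phone`, `:)`, `but`, `battery`, `isn't`, `good` and `!`; only the word tokens contribute to the counts.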

2 Literature Review

This section reviews prior research on SA. Relatively little work in text-based classification has addressed classifying sentences or words with respect to feedback ratings.

[5] applies SA to mobile phone reviews, learning from posts written by many users in order to classify smartphones. [6] collects and analyses online feedback on the quality of service of Tokopedia over several months of observation. The Naïve Bayes classifier is applied because of its high precision and its suitability for processing large volumes of data. The results showed that the reliability and personalization dimensions needed more attention because they attracted strongly negative sentiment. [7] proposes a continuous sentiment analysis (CSA) system for the repeated study of customer emotions, aimed at capturing the tone of each message. Sentiment analysis is a relatively recent technique that uses NLP to give meaning to the large amount of data available. [8] explores a novel approach that relies on social media comments about a specific topic. The proposed solution uses a list of known positive and negative words to construct the training data. Data is first obtained from websites such as Amazon, Flipkart and eBay; specific attributes are then extracted from the collected information and converted into vectors and value sets. The study proceeds step by step, interpreting the feedback through SA. [9] proposed a methodology using reviews from many customers who visit hotels, book rooms and order food, analysed with support vector machines (SVM), logistic regression and Naive Bayes. [10] proposes a machine learning (ML) model for SA and compares some popular ML approaches for sentiment classification. Classifier performance is measured in terms of precision.
[11] explores how text analysis methods can be used to investigate a series of Twitter posts based on their language patterns and message volumes. Their experiments show that current machine learning classifiers are more effective and achieve better precision. [12] gave a detailed description of the process of sentiment polarity categorization, with positive experimental results for both sentence-level and review-level categorization. [13] reports that a long short-term memory (LSTM) classifier gives the best results when classifying comments into positive and negative reviews using part-of-speech (POS)-tagged lexicon features. [14] shows that mathematical approaches are frequently combined with conventional linguistic rules and definitions. [15] studies emotions in the context of education and the gamification of learning. Naïve Bayes (NB) was the better classifier in terms of accuracy; in tests with 1000 students, the group that agreed with using gamification in learning showed better results than the group that disagreed, suggesting that gamification may improve students' learning evaluations. [16] introduces a multimodal sentiment prediction framework that interprets emotions expressed through images, text and audio and combines them to understand the collective emotions of students in a classroom. The system includes a recording device that captures the students' live video and audio streams during a lecture. [17] used a lexicon-based approach to implement SA on Twitter posts, proposing SentiCircles, a lexicon-based method that relies primarily on the contextual semantics of words to determine word-level sentiment.
The proposed method was evaluated on three datasets: Stanford Twitter Sentiment Gold Standard (STS-Gold), Obama-McCain Debate (OMD) and Health Care Reform (HCR). [18] proposed an SA method for Twitter data and describes several text-based SA techniques that use lexicon-based methods. They worked with three separate services: AlchemyAPI, OpenCalais and Zemanta. [19] describes a hybrid SA model that combines machine learning and lexicon-based techniques, identifying emotions and opinion polarity with an accuracy of 75%. [20] proposed an SA technique based on customer feedback in the restaurant domain, in which a priority-based algorithm builds the rule base for a classifier that predicts review polarity. The k-nearest neighbour (K-NN) classifier performs well as the number of instances increases. [21] notes that the Apriori algorithm is a basic association rule mining (ARM) algorithm for mining frequent itemsets and their associated rules, and uses an improved Apriori algorithm that prunes subsets and identifies the more frequent itemsets, resulting in a better smartphone range. [22] observes that AI research, together with its subfields of ML and deep learning, has reached an advanced level, and that applications built with minimal methods are moving towards concrete future business use.
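The Naive Bayes classifier applied in [6], [9] and [15] can be illustrated with a small from-scratch sketch. This is a generic multinomial Naive Bayes classifier with add-one (Laplace) smoothing, together with the precision measure used to evaluate classifiers in [10]; it is not the implementation of any cited work, and the function names and whitespace tokenization are simplifying assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Estimate class priors and per-class word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of words in that class
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest log posterior (add-one smoothing)."""
    priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision(y_true, y_pred, positive="pos"):
    """Fraction of predicted positives that are truly positive."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted:
        return 0.0
    return sum(t == positive for t in predicted) / len(predicted)
```

Trained on a handful of invented labelled reviews, such a model assigns a new review to whichever class gives the higher log posterior; working in log space avoids numerical underflow when many word probabilities are multiplied.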

The studies above identify the polarity of words using SA techniques and apply several NLP concepts to the linguistic analysis of tokenized sentences and words.

3 Research Methodology

Most businesses perform a ‘market basket analysis’ to evaluate user feedback on their mobile phones and the motives of buyers. For instance, a person may place a mobile phone with the best battery-consumption feature in the basket and later switch to one with a better front camera, setting battery life aside. Current techniques give no indication of why the customer switched from the battery-consumption feature to another feature such as the front camera. Existing techniques may not predict this exactly, whereas an implicit rule inference algorithm is used to identify the kind of phone a person purchased from the basket data alone. To evaluate both the explicit and implicit models, this work combines rule mining over smartphone feedback with SA. The aim is to develop a recommendation algorithm built on explicit and implicit analysis using association rules. This paper discusses the use of NLP for deep learning; the block diagram of the proposed methodology is shown in Fig. 1.
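The association-rule idea behind market basket analysis can be illustrated with a compact sketch of the Apriori algorithm cited in [21]. Treating each customer's basket of preferred phone features (for example battery, camera, display) as a transaction is an assumption made for illustration; the thresholds are hypothetical, and this is not the recommendation algorithm proposed in this work.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Support = fraction of baskets containing the candidate itemset.
        support = {c: sum(c <= b for b in baskets) / n for c in candidates}
        level = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Derive rules X -> Y with confidence = support(X u Y) / support(X)."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                conf = supp / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out
```

For instance, if 4 of 5 baskets contain battery and 3 contain both battery and camera, the rule battery → camera has support 0.6 and confidence 0.6 / 0.8 = 0.75.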

Fig. 1 Block diagram of proposed methodology

3.1 Input Dataset/Feedback

The dataset contains consumer ratings of smartphones obtained from amazon.com. Each buyer gives a rating on a scale from 1 to 5 and writes an individual opinion based on their overall experience with the product. The mean of all ratings is calculated to arrive at the final ranking. Other visitors may also mark a review as helpful (yes) or not (no), which adds value to both the review and the reviewer. In this study, we examined over 4000 user reviews of cell phones on amazon.com. The dataset, collected from ‘http://www.kaggle.com’, covers the amazon.com category ‘Cell Phone’, and its attributes are outlined in Tables 1 and 2.
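The per-product mean rating and the resulting ranking can be sketched as follows. The column names `product_id` and `rating` are hypothetical placeholders, since the exact attribute names in Tables 1 and 2 are not reproduced here, and the two-product sample is invented for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_ratings(csv_file):
    """Average the 1-5 star ratings per product.

    Assumes hypothetical CSV columns named product_id and rating.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        rating = float(row["rating"])
        if not 1 <= rating <= 5:  # skip values outside the star scale
            continue
        totals[row["product_id"]] += rating
        counts[row["product_id"]] += 1
    return {pid: totals[pid] / counts[pid] for pid in totals}


def rank_products(means):
    """Order product IDs from highest to lowest mean rating."""
    return sorted(means, key=means.get, reverse=True)


# Invented two-product sample standing in for the Kaggle CSV file.
sample = io.StringIO(
    "product_id,rating,review_text\n"
    "P1,5,Excellent battery\n"
    "P1,4,Good camera\n"
    "P2,2,Poor display\n"
    "P2,3,Average phone\n"
)
means = mean_ratings(sample)
print(means)                 # {'P1': 4.5, 'P2': 2.5}
print(rank_products(means))  # ['P1', 'P2']
```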

Table 1 Features involved in the data set
Table 2 Feedback and rating from online shopping customer based on product ID

3.2 Data Preprocessing: Word Tokenization and NLP

Once the review text is imported, each consumer's input is treated individually, tokenized and analysed with NLP to derive the relevant relationships. NLP is a field of computer science, increasingly combined with deep learning, that deals with the relationship between machine and human language. The large volume of text was processed using NLP-based predictive analysis, which includes stemming, chunking and stop-word removal. NLP helps generate emotion words by separating out word classes such as nouns, while paragraphs and sentences are tokenized and chunked to decide whether each sentence is positive or negative. NLP is also commonly used to translate from one language to another. Such processing reduces noise, producing more reliable results. Consumer reviews are fed into the NLP pipeline as input and split into tokens by a tokenizer: word tokenization turns a sequence of characters, including punctuation marks, icons, special characters and other elements, into separate word-level tokens. This study uses the Natural Language Toolkit (NLTK) with Python to help understand the structure of sentences as well as their context. In this approach, the consumer input in the review text attribute is converted from unstructured to structured data. To begin, the NLP tasks use part-of-speech information to identify the nouns, verbs, adjectives and root form of each word in every sentence of the review document. The proposed chunking NLP algorithm then distinguishes sentiment terms such as adverbs, nouns and adjectives, which are used as features that help achieve high accuracy on the review document.
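A minimal sketch of the chunking step is given below, assuming the review has already been tokenized and tagged with Penn Treebank part-of-speech tags (for example by a tagger such as NLTK's `pos_tag`, which requires its tagger model to be downloaded). The stop-word list and the adverb-adjective-noun pattern `RB* JJ+ NN*` are illustrative assumptions, not the authors' chunking rules; NLTK's `RegexpParser` could express the same grammar declaratively.

```python
# Tiny hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "of"}


def remove_stop_words(tagged):
    """Drop (word, tag) pairs whose word is a stop word."""
    return [(w, t) for w, t in tagged if w.lower() not in STOP_WORDS]


def chunk_sentiment_phrases(tagged):
    """Greedy chunker for the pattern RB* JJ+ NN* over Penn Treebank tags,
    e.g. really/RB good/JJ camera/NN -> ('really', 'good', 'camera')."""
    phrases, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1].startswith("RB"):  # optional adverbs
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("JJ"):  # required adjectives
            k += 1
        if k == j:  # no adjective, so this is not a sentiment chunk
            i += 1
            continue
        while k < n and tagged[k][1].startswith("NN"):  # optional nouns
            k += 1
        phrases.append(tuple(w for w, _ in tagged[i:k]))
        i = k
    return phrases
```

Requiring at least one adjective keeps bare nouns such as "battery life" out of the output, while adverbs such as "really" and "very" are kept as intensifiers of the adjective that follows them.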

Chunking NLP Algorithm for Extracting the Required Terms

3.3 Sentiment Analysis

In the proposed work, customer feedback received from the website is evaluated using SA. Before paying, a customer wants feedback about the company, but it is not possible to read all the comments that customers have posted on the website. Moreover, new information keeps arriving for every kind of product or feature analysis that companies carry out, so essential input provided by customers may be missed. Organizing the review ratings by frequency helps to resolve these challenges. Word counts are then calculated from all the extracted tokenized words for SA; these can be obtained by deep learning. The easiest way to interpret the reviews with SA and word counts is to derive a feedback rating, so that the rating reflects the reviews given by customers. Once the SA output is available, a consumer can reach a decision more quickly and with less effort than by reading all the feedback. The analysed terms are weighted using document frequency (DF) and inverse document frequency (IDF) to determine the term counts shown in Table 3.
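The word-count, DF and IDF quantities mentioned above can be computed with a short sketch. It uses the common definition IDF(t) = log(N / DF(t)), where N is the number of reviews and DF(t) is the number of reviews containing term t; the simple punctuation-stripping tokenizer is an assumption, and this is not necessarily the exact weighting used to produce Table 3.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:"


def tokenize(text):
    """Lower-case, split on whitespace and strip surrounding punctuation."""
    return [w.strip(PUNCTUATION).lower() for w in text.split() if w.strip(PUNCTUATION)]


def term_statistics(reviews):
    """Return raw word counts, document frequency (DF) and
    IDF(t) = log(N / DF(t)) for every term in the reviews."""
    docs = [tokenize(r) for r in reviews]
    n_docs = len(docs)
    counts = Counter(w for doc in docs for w in doc)   # total occurrences
    df = Counter(w for doc in docs for w in set(doc))  # reviews containing w
    idf = {w: math.log(n_docs / df[w]) for w in df}
    return counts, df, idf
```

A term that appears in every review receives an IDF of zero, so very common words contribute little weight, whereas terms concentrated in a few reviews are weighted more heavily.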

Table 3 Calculate word count

Algorithm for Sentiment Analysis

The word counts for a selected set of emotion words are vectorized and associated with each customer. Fig. 2 illustrates the positive and negative feedback based on the selected words [‘best’, ‘good’, ‘love’, ‘amazing’, ‘impressive’, ‘super’, ‘glad’, ‘fantastic’, ‘funny’, ‘wonderful’, ‘extraordinary’, ‘awesome’, ‘bad’, ‘boring’, ‘unhappy’, ‘never’, ‘upset’, ‘sad’, ‘terrible’, ‘disappointment’, ‘poor’, ‘confused’, ‘hard’ and ‘hate’].
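A minimal lexicon-based sketch of this step is shown below, using exactly the 24 words listed above and assuming that the first 12 are positive and the remaining 12 negative, the split implied by their meaning. It counts how often each word occurs, the kind of frequency plotted in Fig. 2, and assigns a simple polarity by comparing positive and negative hits; the scoring rule is an assumption, not the authors' algorithm.

```python
from collections import Counter

# Selected sentiment words from the text, split by assumed polarity.
POSITIVE = {"best", "good", "love", "amazing", "impressive", "super", "glad",
            "fantastic", "funny", "wonderful", "extraordinary", "awesome"}
NEGATIVE = {"bad", "boring", "unhappy", "never", "upset", "sad", "terrible",
            "disappointment", "poor", "confused", "hard", "hate"}


def words(text):
    """Lower-cased whitespace tokens with surrounding punctuation removed."""
    return [w.strip(".,!?;:").lower() for w in text.split()]


def sentiment_word_frequencies(reviews):
    """Count how often each selected sentiment word occurs across all reviews."""
    lexicon = POSITIVE | NEGATIVE
    return Counter(w for r in reviews for w in words(r) if w in lexicon)


def review_polarity(text):
    """Label a review by the difference between positive and negative hits."""
    toks = words(text)
    score = sum(w in POSITIVE for w in toks) - sum(w in NEGATIVE for w in toks)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A plain word-matching rule like this ignores negation ("not good") and intensifiers, which is one reason the chunked adverb-adjective phrases are useful as additional features.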

Fig. 2 Frequency of sentiment words

In this work, review ratings range from 1 to 5. The ranking ranges used are very positive, positive, neutral, poor and really bad; for example, a very positive review corresponds to 5 stars and a very negative one to 1 star, so every score maps onto a five-star grade. Fig. 3 shows the resulting average rating for each product_ID.
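The mapping from stars to the five ranking ranges, together with the frequency of each review rating, can be sketched as follows. Only the label names come from the text; the function names are hypothetical.

```python
from collections import Counter

# Five-point scale from the text: 5 stars = very positive ... 1 star = really bad.
STAR_LABELS = {5: "very positive", 4: "positive", 3: "neutral",
               2: "poor", 1: "really bad"}


def label_for(stars):
    """Map an integer star rating (1-5) to its ranking range."""
    if stars not in STAR_LABELS:
        raise ValueError(f"rating must be 1-5, got {stars}")
    return STAR_LABELS[stars]


def rating_frequency(ratings):
    """Frequency of each star value from 1 to 5, including zero counts."""
    counts = Counter(ratings)
    return {stars: counts.get(stars, 0) for stars in range(1, 6)}
```

Reporting zero counts explicitly keeps all five ranges present in the output, which is convenient when the frequencies are plotted side by side.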

Fig. 3 Average rating based on product ID

The 3D chart in Fig. 4 depicts both the explicit (overt) and implicit (tacit) relationships derived from consumer input, based on purchase rankings. It is built from the average number of items purchased by buyers, covering items bought together as well as items bought infrequently: the X axis represents the product ID, the Y axis the ranking (average rating) and the Z axis the number of sales.
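The coordinates behind a chart like Fig. 4 can be assembled with the sketch below. It assumes that reviews are available as (product ID, stars) pairs and that unit sales come from a separate source; the `sales` mapping and its values are hypothetical, since the dataset description does not state how sales counts were obtained.

```python
from collections import defaultdict


def plot_points(reviews, sales):
    """Build (product_id, average_rating, number_of_sales) triples.

    reviews: iterable of (product_id, stars) pairs.
    sales:   hypothetical mapping product_id -> units sold;
             products with no entry are given 0.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for pid, stars in reviews:
        totals[pid] += stars
        counts[pid] += 1
    # Sorted by product ID so the X axis has a stable order.
    return sorted((pid, totals[pid] / counts[pid], sales.get(pid, 0))
                  for pid in totals)
```

The resulting triples could then be drawn as a 3D scatter plot, for example with matplotlib's `fig.add_subplot(projection="3d")` followed by `ax.scatter(xs, ys, zs)`, mapping each product ID to an integer position on the X axis.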

Fig. 4 3D plot of the number of sales, average rating and product ID

This kind of analysis may help e-commerce businesses improve sales by identifying implicit products and providing offers for those particular products.

4 Conclusion

The most difficult task for users shopping online is choosing the best cell phone, because its features are difficult to evaluate. Consumer reviews and ratings can indicate phone quality to the customer, but rankings alone fall short of determining the quality of specific device features. Many businesses claim that their success depends solely on customer satisfaction, which encourages researchers to find better SA solutions. The aim of this work was therefore to use SA on customers' review texts to meet their needs. The SA relies on natural language processing (NLP) to tokenize text for word counts. The word counts of emotion terms, together with consumer ratings per product ID, are then used to decide which mobile phone is better. This approach can help boost sales by identifying indirect (implicit) products and targeting offers at them. In future work, the proposed system will be evaluated on training and test datasets for SA with various classification techniques to determine the accuracy of the trained model.