Abstract
In the field of natural language processing and text mining, sentiment analysis (SA) has received considerable attention from researchers across the globe. With the prevalence of Web 2.0, users have become more eager to share, promote, and express themselves, including the issues and challenges they encounter in their daily activities, through the Internet (social media, micro-blogs, e-commerce, etc.). Expressions and opinions are complex sequences of acts that convey a huge volume of data, which poses a challenge for computational researchers to decode. Over time, researchers from various segments of the public and private sectors have explored SA with the aim of understanding the behavioral perspective of various stakeholders in society. Although efforts to construct SA systems have been successful, challenges to their efficiency remain. This article presents an organized survey of SA (also known as opinion mining) along with its methodologies and algorithms. The survey classifies SA into categories based on levels, tasks, and sub-tasks, along with the various techniques used to perform them. The survey explicitly focuses on the different directions in which research has been explored in the area of cross-domain opinion classification. The article concludes with an exclusive and exhaustive analysis of opinion mining covering the approaches, datasets, languages, and applications used. The observations are expected to help researchers gain a better understanding of emerging trends and state-of-the-art methods for future exploration.
1 Introduction
At present, the Internet plays an important role in people’s lives. With the help of the latest technologies, people can access, share, and generate content over the Internet. Web 1.0 refers to the first generation of the World Wide Web, which consisted entirely of web pages connected by hyperlinks: people could explore a website and read the content of its pages, but could not write or add anything to them. Since 2004, Internet users have been moving from Web 1.0 to Web 2.0. Web 2.0 (the “Interactive Web” or “Social Web”) describes a new stage of web facilities, social websites, and applications with an increasing emphasis on user collaboration (user-generated content and the read–write web). People both consume and contribute information through sites such as YouTube, Flickr, Digg, and blogs. Web 3.0 (the “Web of meaning” or “Semantic Web”) refers to the third generation of the World Wide Web; it adds smart search and behavioral advertising to the features of Web 2.0. Web content may be unstructured, structured, or semi-structured, and is often misspelled and noisy, so Natural Language Processing techniques are required to analyze it.
Social networking sites play an important role in Internet activities, and both the number of Internet users and the amount of Internet content are increasing day by day. People are eager to share and express their feelings about issues and day-to-day activities on the Internet. Micro text (short text) is the biggest challenge in text analysis, and different approaches have been used for micro-text normalization (Satapathy et al. 2020; Cambria 2016). The explosive growth of online activities (conferencing, chatting, social media communication, ticket booking, surveillance, e-commerce, online transactions, micro-blogging, blogging, etc.) requires us to load, transform, extract, and analyze very large volumes of data, referred to as Big Data. This data can be analyzed in many real-life applications by combining data mining, text mining, web mining, and information retrieval techniques. Blogs, forums, e-commerce websites, other web resources, news reports, and social networks serve as platforms for expressing views, which can be used to observe or report the feelings of customers and the general public about public events, organizational plans, reputation monitoring, political activities, promotional campaigns, and product preferences (Ravi and Ravi 2015). Such a huge amount of raw data is difficult to analyze and requires dedicated methods to produce a comprehensive review summary. The world population and the number of Internet users continue to grow, and users pay increasing attention to Web 1.0 and Web 2.0 services, performing many activities online, as shown in Fig. 1 (Balqisnadiah 2016).
User-generated views are the main source of raw text. With the rapid growth of user-generated text on the Web, automatically mining valuable data from plentiful documents has attracted increasing research interest in many areas of Natural Language Processing (NLP) (Sun et al. 2017). Artificial intelligence is now used everywhere, for example in Amazon’s Alexa on phones and other devices, and machine learning methods are increasingly used within artificial intelligence. Many companies (Netflix, Google, data companies, etc.) combine artificial intelligence with large amounts of data. NLP focuses on processing human language (for example, on smartphones) to extract insights, assist with human-written text, and more. Every day, people say many words to one another that can be interpreted in countless ways, because the meaning of every word is context-dependent. NLP is used for many purposes, such as word suggestion, quick compilation of data, voice-to-text conversion (Google Assistant, the Alexa application), search engine optimization (SEO) applications, handwriting recognition (online and offline), speech recognition systems, and opinion mining. Sentiment analysis is defined as the computational study of a person’s thoughts, moods, reviews, feelings, emotions, events, appraisals, issues, attitudes, and topics. The following subsection reviews earlier surveys in the field of sentiment analysis.
1.1 Sentiment analysis: the earlier reviews
Pang and Lee (2008) reviewed and analyzed more than 300 research articles in the field of sentiment analysis, covering its major tasks (opinion summarization, opinion classification, sentiment mining, polarity determination, opinion spam detection, different levels of sentiment analysis, etc.), challenges, and applications. Later, Tang et al. (2009) highlighted several issues in sentiment analysis and opinion mining, such as sentiment extraction, document opinion classification, word opinion classification, and subjectivity classification. The authors also described approaches to subjectivity classification, such as a cut-based classifier, multiple Naïve Bayes classifiers, a Naïve Bayes classifier, and similarity-based methods.
O’Leary (2011) reviewed blog mining and outlined different kinds of blog search, forums, the sentiments to be analyzed, and their applications. Montoyo et al. (2012) described applications, achievements, and some open issues in sentiment mining and subjectivity classification. Tsytsarau and Palpanas (2012) focused on opinion spam detection, contradiction analysis, opinion aggregation, and opinion mining, and compared different opinion mining techniques and approaches applied to a common dataset.
Liu (2012) surveyed more than four hundred research articles in the area of opinion mining and sentiment classification. This survey covered NLP issues, sentiment analysis applications, sentiment lexicon and its issues, different levels of analysis, opinion summarization, cross-domain sentiment classification, cross-lingual sentiment classification, aspect-based sentiment analysis, sentence subjectivity classification, quality of reviews, sentiment lexicon generation, and some challenging issues in the field of opinion classification and sentiment analysis.
Feldman (2013) studied sentiment analysis and pointed out some specific difficulties: sentiment lexicon acquisition, comparative sentiment analysis, sentence-level sentiment classification, document-level sentiment classification, and aspect-based sentiment classification, as well as some open challenges such as sarcasm detection, automatic entity recognition, handling multiple entities in the same review, and compositional statements in sentiment analysis. Cambria et al. (2013) surveyed the complexities involved in opinion mining with respect to demand and future directions.
Medhat et al. (2014) then addressed the problem of sentiment analysis from the point of view of techniques rather than applications. The authors categorized the articles according to the techniques involved and classified the various sentiment analysis techniques with brief details of the algorithms. They described the available datasets and categorized them according to their applications. Finally, they discussed some sentiment analysis fields open for enhancement, such as transfer learning, building resources, and emotion detection, and briefly summarized fifty-four research articles, listing the task accomplished, type of language, data source, polarity, data scope, algorithm used, and domain.
Later, Ravi and Ravi (2015) reviewed about 251 research articles published between 2002 and 2015 and classified them based on opinion mining approaches, applications, and tasks. This survey covered different tasks of sentiment analysis and pointed out major issues in sentiment classification, review spam detection, usefulness measurement, subjectivity classification, and lexicon creation. Finally, it summarized thirty-two publicly available datasets and, in tabular form, one hundred sixty-one research articles, listing the concepts and techniques used, type of language, polarity, type of data, and dictionary.
Further, Hussein (2016) identified challenges related to the techniques and methods used in opinion mining. Based on two comparisons among forty-seven research articles, the study discussed the effects and importance of these challenges in opinion evaluation, and finally summarized the challenges of sentiment analysis and how accuracy can be improved based on previous work. Al-Moslmi et al. (2017) studied about 91 research articles published from 2010 to 2016 on cross-domain sentiment classification. This study focused on the techniques, algorithms, and approaches used in cross-domain opinion classification, highlighted some of its open issues, and summarized the methodologies and findings of twenty-eight research articles in the area. From this survey, it is observed that no perfect solution has yet been found for cross-domain opinion mining.
More recently, Sun et al. (2017) reviewed opinion mining using NLP techniques. First, the authors explained information fusion techniques for combining information from multiple sources to solve certain tasks, and presented some NLP techniques for text processing. Second, they introduced the approaches, methods, and resources of sentiment analysis for different levels and situations. The aim of opinion mining is to extract the sentiment orientation (positive or negative) at different levels of sentiment analysis (sentence level, document level, fine-grained level, word level, etc.) using supervised, unsupervised, and semi-supervised learning methods. Finally, they discussed some advanced topics (opinion spam detection, review usefulness measurement, opinion summarization, etc.), open problems (annotated corpora and cumulative errors from pre-processing), and challenges (deep learning for accuracy) in opinion mining. Most recently, Young et al. (2018) explained the latest trends of deep learning in NLP, compared various deep learning models, and discussed the past, present, and future of deep learning in NLP.
This survey differs from earlier review articles in several ways: (1) it categorizes existing studies based on the different tasks and levels of sentiment analysis; (2) it emphasizes cross-domain sentiment classification, one of the most challenging tasks in sentiment analysis; (3) it summarizes the different tasks of sentiment classification in terms of the approaches, techniques or methodologies, datasets, lexicons or corpora, and types of language used; (4) it provides a detailed list of publicly available toolkits for sentiment analysis tasks and the languages they support; (5) it summarizes a detailed list of available datasets, data sources, annotated corpora, and sentiment lexicons, along with the types of language used in sentiment analysis; (6) it classifies the baseline methods and research articles on cross-domain opinion classification in four aspects (approaches and methods used, datasets and languages used, the corpora or dictionaries used, and a detailed description of each article); (7) it discusses challenging issues, open problems, and future directions in sentiment analysis; and (8) it summarizes more than one hundred research articles on sentiment analysis in terms of techniques, methodologies, datasets, data sources, and type of language.
The main aim of this survey is to understand the different techniques, approaches, and datasets used in sentiment analysis to achieve accuracy. The rest of the article is organized as follows: Sect. 2 explains sentiment analysis and its different levels. Section 3 presents the different tasks and sub-tasks of sentiment analysis, a state-of-the-art discussion of opinion mining, and publicly available datasets, lexicons, and toolkits. Section 4 explains one of the most challenging tasks of sentiment classification, cross-domain opinion classification. Sections 5 and 6 cover the outcomes of the survey, the pros and cons of different baseline methods, and the challenges, open issues, and future directions in sentiment analysis. Section 7 concludes the survey.
2 Sentiment analysis
Sentiment analysis is the computational study of people’s views towards an entity or its aspects. Here, an entity can be an individual thing such as a topic, blog, or event. The term sentiment analysis was first introduced early this century and has since become an active field of research. Following the definition of Liu (2012), a sentiment is represented as a quintuple.
Definition of sentiment analysis: $(e_i, a_{ij}, s_{ijkl}, h_k, t_l)$, where $e_i$ represents the $i$th entity, $a_{ij}$ represents the $j$th aspect of the $i$th entity, $h_k$ represents the $k$th sentiment holder, $t_l$ represents the time when the sentiment is conveyed, and $s_{ijkl}$ represents the sentiment on aspect $a_{ij}$ of entity $e_i$ held by opinion holder $h_k$ at time $t_l$. The sentiment or opinion $s_{ijkl}$ is neutral, positive, or negative.
For example, in “The power backup of the power bank is excellent!”, “power bank” is the entity, “power backup” is the aspect, and the expressed sentiment is positive.
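To make the quintuple concrete, the following Python sketch encodes it as an immutable record and fills it in for the power-bank example. The class and field names (`Opinion`, `entity`, `aspect`, and so on) are illustrative choices of this sketch, not part of Liu's definition; the holder and time are left as `None` because the example sentence does not state them.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Polarity = Literal["positive", "negative", "neutral"]


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l) after Liu (2012).

    Field names are illustrative, not a standard API.
    """
    entity: str            # e_i: the target entity
    aspect: str            # a_ij: the aspect of e_i being evaluated
    sentiment: Polarity    # s_ijkl: polarity expressed on the aspect
    holder: Optional[str]  # h_k: who holds the opinion (None if not stated)
    time: Optional[str]    # t_l: when it was expressed (None if not stated)


# The power-bank example; holder and time are not given in the sentence.
example = Opinion(
    entity="power bank",
    aspect="power backup",
    sentiment="positive",
    holder=None,
    time=None,
)
print(example.entity, "|", example.aspect, "|", example.sentiment)
```

Freezing the record reflects that an extracted opinion is a fact about a text rather than mutable state; a real extractor would fill `holder` and `time` from review metadata when available.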
Sentiment analysis examines people’s moods, views, opinions, attitudes, feelings, and emotions towards entities (services, products, research, political issues, organizations, arbitrary issues, and other topics) and their aspects. Its aim is to find the polarity towards the entity, which can be positive, negative, or neutral.
Machine learning, lexicon-based, and hybrid approaches play an important role in obtaining polarity in sentiment analysis. Sentiment analysis can be used in applications such as movie sales prediction, market prediction, recommender systems, and customer satisfaction measurement, all of which share one goal: analyzing the opinions in people’s reviews. Owing to its rapid growth and popularity, e-commerce has become one of the prominent places where people express their opinions and where those opinions can be analyzed. Opinions matter from both the customers’ and the managers’ points of view, and many customers make decisions based on reviews available on the Internet. Sentiment analysis is a multifaceted problem rather than a single one: texts come from various sources in diverse formats, so several pre-processing steps are needed before analysis. Sentiment analysis supports tasks such as sentiment classification, spam detection, usefulness measurement, and subjectivity classification. Data acquisition and pre-processing are the most common sub-tasks required for text classification and sentiment analysis, as shown in Fig. 2. The next subsection explains the different levels of sentiment analysis, i.e., document, sentence, and aspect level.
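As a rough illustration of the pre-processing step, the sketch below lowercases a noisy social-media text, strips URLs and @-mentions, tokenizes it, and removes stop words. The stop-word list is a tiny hypothetical one; negation words such as "not" are deliberately kept out of it, since they matter for polarity. Real pipelines typically add steps such as spelling normalization, stemming, or lemmatization.

```python
import re
from typing import List

# Tiny, hypothetical stop-word list; real systems use a larger curated list.
# Negation words ("not", "no") are intentionally NOT included.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "it"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip URLs and @mentions, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions (common in tweets)
    tokens = re.findall(r"[a-z']+", text)      # alphabetic tokens, keep apostrophes
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("@shop The battery of this phone is GREAT! https://example.com"))
```

For the sample tweet this yields the tokens `battery`, `this`, `phone`, and `great`, which later stages (feature selection, classification) would consume.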
2.1 Levels of sentiment analysis
In general, sentiment analysis is carried out at three levels: document level, sentence level, and entity/aspect level. Some studies also describe user-level and concept-level sentiment analysis, as shown in Fig. 3. Concept-level sentiment analysis emphasizes the semantic analysis of text using semantic networks (Cambria 2013). User-level sentiment analysis analyzes the opinions held by individual users (what people think) (Tan et al. 2011). A brief explanation of these levels is presented below.
2.1.1 Document-level sentiment analysis
The main task of document-level sentiment analysis is to determine the overall opinion polarity of a whole document, such as a blog post, tweet, movie review, institute review, or product review. In terms of the quintuple defined above, its objective is to determine the third component, the sentiment. The generalized framework of document-level sentiment analysis is shown in Fig. 4, and research articles on document-level sentiment analysis are summarized in Table 1.
In document-level sentiment analysis, Moraes et al. (2013) compared machine learning approaches such as Support Vector Machines (SVM), Neural Networks, and Naïve Bayes on English product reviews. Du et al. (2014) applied a neural network and SVM to Chinese microblogs about the box office. Geva and Zahavi (2014) used a neural network, a decision tree, stepwise logistic regression, a genetic algorithm, and SVM on stock market data and news counts in English. Xia et al. (2011) used Maximum Entropy, Naïve Bayes, and SVM on English product reviews. Lin and He (2009) proposed a new unsupervised approach based on Latent Dirichlet Allocation (LDA), named the joint sentiment-topic model. Li et al. (2007) proposed a framework named Dependency-Sentiment-LDA, which uses a Markov model so that the sentiment of a word depends on that of the previous word. In document-level sentiment analysis, word-level sentiment information is consistent with the document label (Li et al. 2017), and the topics and sentiments of each document are determined simultaneously. Despite the research conducted on document-level sentiment analysis, areas such as opinion mining, usefulness measurement, and opinion spam detection remain challenging.
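Naïve Bayes appears above among the supervised methods applied to document-level polarity. The following is a minimal from-scratch sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing; it is not the configuration used in any of the cited studies, and the four training documents are hand-made toy data. Words not seen during training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_nb(docs: List[Tuple[List[str], str]]):
    """Train multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_docs = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    # log P(c): fraction of training documents in class c
    priors = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    likelihoods = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        # log P(w | c) with add-one smoothing over the vocabulary
        likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return priors, likelihoods, vocab


def classify_nb(model, tokens: List[str]) -> str:
    """Return the class with the highest posterior log-probability."""
    priors, likelihoods, vocab = model
    scores = {
        c: priors[c] + sum(likelihoods[c][w] for w in tokens if w in vocab)
        for c in priors
    }
    return max(scores, key=scores.get)


# Toy, hand-made training documents (illustrative only, not a real dataset).
train = [
    ("great phone excellent battery".split(), "positive"),
    ("excellent screen great value".split(), "positive"),
    ("poor battery terrible support".split(), "negative"),
    ("terrible screen poor value".split(), "negative"),
]
model = train_nb(train)
print(classify_nb(model, "excellent battery great screen".split()))
print(classify_nb(model, "poor support terrible battery".split()))
```

Working in log space avoids numerical underflow when documents are long; the same structure extends to any number of polarity classes.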
2.1.2 Sentence-level sentiment analysis
Sentence-level sentiment analysis is similar to document-level sentiment analysis, since a sentence can be viewed as a short document. Its objective is to classify the opinion expressed in each sentence. Before analyzing the polarity of a sentence, we first determine whether the sentence is objective or subjective; only if it is subjective do we determine its orientation or polarity (positive or negative). The process of sentence-level sentiment analysis is shown in Fig. 5, and research studies on sentence-level sentiment analysis are summarized in Table 2.
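The two-step process (subjectivity first, then polarity) can be sketched as follows. This is a deliberately crude, lexicon-based illustration under stated assumptions: the word lists are hypothetical, a sentence containing no opinion word is treated as objective in place of a trained subjectivity classifier, and the regular-expression sentence splitter is naive.

```python
import re
from typing import List, Tuple

# Hypothetical mini-lexicon; a real system would use a published opinion lexicon.
POSITIVE = {"good", "great", "excellent", "clear", "love"}
NEGATIVE = {"bad", "poor", "terrible", "blurry", "hate"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_sentence(sentence: str) -> str:
    tokens = re.findall(r"[a-z']+", sentence.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    # Step 1 (subjectivity): no opinion words -> treat the sentence as objective.
    if pos == 0 and neg == 0:
        return "objective"
    # Step 2 (polarity): compare positive and negative evidence.
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # equal evidence on both sides (mixed opinion)


def sentence_level(text: str) -> List[Tuple[str, str]]:
    return [(s, classify_sentence(s)) for s in split_sentences(text)]


for sent, label in sentence_level("This is a university. The campus is great! The food is terrible."):
    print(label, "|", sent)
```

Keeping the subjectivity decision separate from the polarity decision means either step can later be replaced by a trained model without touching the other.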
The applications of sentence-level SA in opinion mining include multi-lingual and cross-lingual analysis, review spam detection, and polarity detection.
2.1.3 Aspect-level sentiment analysis
Classifying text at the sentence level or the document level provides valuable information in many applications, but that information is sometimes insufficient for more advanced applications. To acquire it from opinionated text, we need to go to the aspect level, which has received great research interest. Aspect-level analysis includes several variations, such as word-level (also known as entity-, attitude-, or feature-level) and concept-level sentiment analysis. In terms of the quintuple, aspect-level sentiment analysis discovers the first three components (entity, aspect, sentiment) and categorizes the opinion with respect to particular aspects of entities. Its aim is to determine the particular targets (entities or aspects) and their corresponding polarities.
For example, “iPhone is made by Apple company. The iPhone’s picture quality is very clear”.
First, the comment is split into sentences: [‘iPhone is made by Apple company.’, ‘The iPhone’s picture quality is very clear.’].
Second, each sentence is split into words: [‘iPhone’, ‘is’, ‘made’, ‘by’, ‘Apple’, ‘company’, ‘.’] for the first sentence and [‘The’, ‘iPhone’s’, ‘picture’, ‘quality’, ‘is’, ‘very’, ‘clear’, ‘.’] for the second.
Next, each word is tagged with a part-of-speech (POS) tagger: [(‘iPhone’, ‘NN’), (‘is’, ‘VBZ’), (‘made’, ‘VBN’), (‘by’, ‘IN’), (‘Apple’, ‘NNP’), (‘company’, ‘NN’), (‘.’, ‘.’)] and [(‘The’, ‘DT’), (‘iPhone’s’, ‘JJ’), (‘picture’, ‘NN’), (‘quality’, ‘NN’), (‘is’, ‘VBZ’), (‘very’, ‘RB’), (‘clear’, ‘JJ’), (‘.’, ‘.’)].
The entity, aspect, and sentiment are then identified in both sentences. From the first sentence, the entity “iPhone”, the aspect “made by Apple company”, and the sentiment “neutral” are extracted; the sentiment is neutral because the sentence is objective and states a fact.
From the second sentence, the entity “iPhone” and the aspect “picture quality very clear” are extracted; this sentence expresses a sentiment because it is subjective and reflects an opinion about the picture quality of the iPhone.
Next, the sentiment of each aspect is assigned based on its polarity, using a dictionary-based, statistical, lexicon-based, or other approach.
Finally, the sentence is classified into a positive or negative class based on the sentiment assigned at the aspect level.
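The steps above can be sketched in Python starting from the POS-tagged output shown for the iPhone example, which is copied in as input so that no tagger is required. The linking rule (attach each opinion adjective to the nearest preceding run of common nouns) and the small opinion lexicon are assumptions made for illustration; they are not the method of any work cited here, and real systems use dependency parses or learned models.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical opinion lexicon mapping opinion words to polarity.
OPINION_LEXICON = {"clear": "positive", "excellent": "positive", "blurry": "negative"}
NOUN_TAGS = {"NN", "NNS"}


def extract_aspects(tagged: Tagged) -> List[Tuple[str, str]]:
    """Return (aspect, polarity) pairs: each opinion adjective is linked to the
    nearest run of consecutive common nouns that precedes it."""
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        polarity = OPINION_LEXICON.get(word.lower())
        if tag != "JJ" or polarity is None:
            continue
        j = i - 1
        # Skip back over non-noun tokens (e.g. 'is', 'very') to the nearest noun.
        while j >= 0 and tagged[j][1] not in NOUN_TAGS:
            j -= 1
        end = j
        # Extend backwards to cover a compound noun such as 'picture quality'.
        while j >= 0 and tagged[j][1] in NOUN_TAGS:
            j -= 1
        if end >= 0:
            aspect = " ".join(w for w, _ in tagged[j + 1:end + 1])
            pairs.append((aspect, polarity))
    return pairs


def sentence_polarity(tagged: Tagged) -> str:
    polarities = [p for _, p in extract_aspects(tagged)]
    if not polarities:
        return "neutral"  # no opinionated aspect found (e.g. an objective sentence)
    pos, neg = polarities.count("positive"), polarities.count("negative")
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"


# POS tags as given in the worked example above.
sentence1 = [("iPhone", "NN"), ("is", "VBZ"), ("made", "VBN"), ("by", "IN"),
             ("Apple", "NNP"), ("company", "NN"), (".", ".")]
sentence2 = [("The", "DT"), ("iPhone's", "JJ"), ("picture", "NN"), ("quality", "NN"),
             ("is", "VBZ"), ("very", "RB"), ("clear", "JJ"), (".", ".")]
print(extract_aspects(sentence2), sentence_polarity(sentence1), sentence_polarity(sentence2))
```

For the second sentence the sketch links "clear" to the compound aspect "picture quality" and labels it positive, while the objective first sentence yields no aspect and stays neutral.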
Figure 6 summarizes the process of aspect-level SA, and Table 3 lists current research on aspect-level sentiment analysis.
The lack of annotated corpora at the feature level and the complex ways in which sentiments are expressed are the main problems in aspect-level sentiment analysis (Ravi and Ravi 2015). Applications of aspect-level sentiment analysis in opinion mining include polarity determination, entity recognition, and feature extraction. The next section explains the different tasks, approaches, and methods of sentiment analysis, and how the different levels of sentiment analysis are used to achieve the objectives of these tasks.
3 Different tasks of sentiment analysis
In general, sentiment analysis is used to perform multiple tasks. This article covers the important tasks shown in Fig. 7: subjectivity analysis, spam review detection, degree of usefulness measurement, opinion summarization, aspect selection, sentiment lexicon creation, and opinion classification. Some tasks are further divided into sub-tasks; for example, opinion classification is divided into polarity extraction, cross-lingual and multi-lingual opinion analysis, and cross-domain opinion classification. This section discusses all the tasks and sub-tasks, together with the approaches, methods, or techniques applied in each, as illustrated in Fig. 7a, b. The applied methods are generally grouped into four approaches (lexicon-based, machine learning, deep learning, and hybrid), each of which is further divided into specific approaches. This study presents the literature on sentiment analysis tasks and the techniques applied to them so that new researchers can grasp the state of the art in the field.
3.1 Subjectivity analysis
Subjectivity analysis deals with the recognition of “personal states”, a term that encompasses speculations, evaluations, opinions, emotions, sentiments, and beliefs. Sentences fall into two categories, objective and subjective. An objective sentence contains factual information, such as “This is a university”, whereas a subjective sentence contains personal views, attitudes, feelings, or beliefs, such as “This is a good university”. Subjective expressions can take many forms, such as speculations, allegations, suspicions, desires, opinions, and beliefs. The process of determining whether a given sentence is subjective or objective is known as subjectivity analysis. It is an interesting task that bridges the gap between many applications and fields. Considerable research has been reported over the past decade, and improvements in sentence subjectivity detection are still appearing. Determining the subjectivity of a sentence is more complex than determining its orientation (Chaturvedi et al. 2018), and improvements in subjectivity analysis translate directly into improvements in polarity determination. Figure 8 shows the process of subjectivity analysis of a sentence, and Table 4 lists research work on subjectivity analysis.
The applications of subjectivity analysis in opinion mining are feature extraction, polarity determination, and sentence sentiment mining.
3.2 Spam review detection
With the increasing popularity of online reviews and e-commerce, interested parties sometimes hire people to write fake reviews in order to promote their products or interests. The interested party may be a manufacturer, dealer, service provider, political leader, market predictor, etc. In Web 2.0, anyone can post a response or express their feelings from anywhere in the world without disclosing their identity, and reviews are extremely valuable to customers. Fake reviews are also referred to as false reviews, fraudulent reviews, or opinion spam, and a person who writes them is called a spammer. For example, a spammer may write false opinions to promote a low-quality product to customers. Finding fake reviews, or opinion spam, is a very difficult task in opinion mining. The process of spam review detection is presented in Fig. 9, and related research is summarized in Table 5.
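One simple heuristic signal for opinion spam is the presence of duplicate or near-duplicate reviews. The sketch below flags review pairs whose sets of word bigrams ("shingles") have a Jaccard similarity at or above a threshold. The bigram size, the 0.5 threshold, and the sample reviews are assumptions for illustration; this covers only one signal and is not a complete spam detector.

```python
from itertools import combinations
from typing import List, Set, Tuple


def shingles(text: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Set of word n-grams (shingles) of a lowercased review."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(reviews: List[str], threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for review pairs whose shingle overlap >= threshold."""
    sets = [shingles(r) for r in reviews]
    flagged = []
    for i, j in combinations(range(len(reviews)), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            flagged.append((i, j, round(score, 2)))
    return flagged


# Hypothetical reviews: the first two are near-copies of each other.
reviews = [
    "best phone ever amazing battery must buy now",
    "best phone ever amazing battery must buy today",
    "the battery lasts a day but the camera is weak",
]
print(near_duplicates(reviews))
```

Comparing all pairs is quadratic in the number of reviews; at scale, techniques such as MinHash are commonly used to approximate Jaccard similarity efficiently.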
Opinion spam detection is one of the challenging tasks in sentiment analysis, and further work is needed to improve the accuracy of identifying spam reviews. In opinion mining, spam review detection is applied to determine the genuine polarity of products or other activities in e-commerce.
3.3 Opinion summarization
Opinion summarization can be viewed as multi-document summarization (Sun et al. 2017). It focuses on the opinionated parts of documents and the corresponding sentiments towards the entity; the subjective information in a sentence contains opinions, feelings, and beliefs. A single opinion is not sufficient for making a decision, so a large number of views about a particular thing is needed to analyze the opinion. As in traditional text summarization, the emphasis is on eliminating redundancy, but only the subjective part is mined. Figure 10 shows the process of opinion summarization, and Table 6 summarizes research work in this area.
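Because a single opinion is not enough, summarization aggregates many of them. A common output form is an aspect-based summary that counts positive and negative mentions per aspect. The sketch below assumes the (aspect, polarity) pairs have already been extracted from many reviews, for example by an aspect-level step; the pairs shown are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarize(opinions: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Aggregate (aspect, polarity) pairs from many reviews into per-aspect counts."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for aspect, polarity in opinions:
        summary[aspect][polarity] += 1
    return dict(summary)


# Hypothetical (aspect, polarity) pairs extracted from several reviews of one phone.
extracted = [
    ("battery", "positive"), ("battery", "positive"), ("battery", "negative"),
    ("camera", "negative"), ("camera", "negative"), ("screen", "positive"),
]
for aspect, counts in sorted(summarize(extracted).items()):
    # Counter returns 0 for missing keys, so absent polarities print as 0.
    print(f"{aspect}: +{counts['positive']} / -{counts['negative']}")
```

Repeated mentions collapse into a single count per aspect, which is one simple way of eliminating redundancy while keeping only the opinionated content.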
Opinion summarization is one of the challenging areas and requires attention to improve accuracy. Its applications include finding the overall polarity of documents and producing a summarized form of them.
3.4 Degree of usefulness measurement
The number of reviews on e-commerce sites and social issues is increasing day by day. People examine these reviews and make decisions based on review ratings. To promote their services and products, managers sometimes hire third parties to write fake reviews, which can increase the sales of some products. At present, spam review detection and degree of usefulness measurement receive considerable attention from researchers; both are sub-tasks of sentiment analysis. In spam detection, only the good reviews are considered and analyzed, because professionals write fake reviews very skillfully either to increase or to reduce the sales of a product. The aim of usefulness measurement is to rank reviews according to their degree of usefulness, which can be formulated as a regression problem with features such as review length, term frequency-inverse document frequency (TF-IDF) weighting scores, sentiment words, review rating scores, POS tags, timeliness of reviews, reviewers’ expertise, subjectivity of reviews, review style, and the social context of reviewers (Sun et al. 2017). The process of degree of usefulness measurement is explained in Fig. 11, and research on it is summarized in Table 7.
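To illustrate the regression framing, the sketch below computes three of the listed features (review length, proportion of sentiment words, and rating score) and ranks reviews by a linear score. The weights and the sentiment-word list are hypothetical placeholders; in practice the weights would be learned by regressing the features against observed helpfulness, such as the fraction of "helpful" votes a review received.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

SENTIMENT_WORDS = {"great", "poor", "excellent", "terrible", "good", "bad"}  # hypothetical


@dataclass
class Review:
    text: str
    rating: int  # star rating, 1-5


def features(review: Review) -> Dict[str, float]:
    words = review.text.lower().split()
    n = len(words)
    return {
        "log_length": math.log(1 + n),  # longer reviews tend to carry more detail
        "sentiment_ratio": sum(w in SENTIMENT_WORDS for w in words) / n if n else 0.0,
        "rating": float(review.rating),
    }


# Hypothetical weights; a real system would fit them by regression on
# labeled helpfulness scores rather than set them by hand.
WEIGHTS = {"log_length": 0.6, "sentiment_ratio": 0.3, "rating": 0.05}


def usefulness(review: Review) -> float:
    """Linear usefulness score: weighted sum of the review's features."""
    return sum(WEIGHTS[name] * value for name, value in features(review).items())


def rank(reviews: List[Review]) -> List[Review]:
    return sorted(reviews, key=usefulness, reverse=True)


reviews = [
    Review("great", 5),
    Review("battery lasts two days with heavy use but the camera is poor in low light", 3),
]
for r in rank(reviews):
    print(round(usefulness(r), 3), r.text)
```

With these placeholder weights the detailed three-star review ranks above the one-word five-star review, which matches the intuition that length and detail signal usefulness; learned weights could of course differ.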
Review usefulness is one of the challenging tasks to work on and still requires more attention to improve accuracy. Applications of review usefulness measurement in opinion mining include market prediction and box office prediction.
3.5 Sentiment lexicon creation
3.5.1 Opinion lexica and corpora creation
A lexicon is a vocabulary of opinion words with their corresponding strength values and polarities. Lexicon creation starts with primary words, called opinion seed words; the list is then extended with the synonyms and antonyms of the seed words using the WordNet dictionary (Ravi and Ravi 2015), and this process continues until no new words are added. Corpus creation likewise starts with seed sentiment words and searches a large corpus for additional sentiment words with context-specific orientations. There are three main approaches to compiling an opinion word list: the manual (brute-force) approach, the corpus-based approach, and the dictionary-based approach. Research work on sentiment lexicon creation is summarized in Table 8, and its process is shown in Fig. 12.
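The dictionary-based expansion loop described above can be sketched as a breadth-first search: synonyms inherit a word's polarity, antonyms receive the opposite polarity, and expansion stops when no new word is added. The small `SYNONYMS` and `ANTONYMS` tables are hypothetical stand-ins for WordNet relations (which could be obtained, for example, through NLTK's WordNet corpus reader); real expansion also needs safeguards against polarity drift across word senses.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy thesaurus standing in for WordNet synonym/antonym relations.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "great"],
    "great": ["excellent"],
    "bad": ["poor"],
    "poor": ["awful"],
}
ANTONYMS: Dict[str, List[str]] = {"good": ["bad"], "excellent": ["terrible"]}


def expand_lexicon(seeds: Dict[str, int]) -> Dict[str, int]:
    """Grow a polarity lexicon (+1 / -1) from seed words until no new word is added."""
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        # Synonyms keep the polarity (sign +1); antonyms flip it (sign -1).
        for relation, sign in ((SYNONYMS, 1), (ANTONYMS, -1)):
            for other in relation.get(word, []):
                if other not in lexicon:  # first assignment wins
                    lexicon[other] = lexicon[word] * sign
                    queue.append(other)
    return lexicon


print(expand_lexicon({"good": +1}))
```

Starting from the single seed "good", the loop reaches "excellent" through a chain of synonyms and "terrible" and "awful" through antonym links, illustrating how a small seed list grows into a larger lexicon.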
Researchers are still working on creating a general lexicon and corpus that work for all applications, such as social media, e-commerce, and blogs.
3.6 Aspect selection
3.6.1 Feature selection for opinion classification
Feature selection is one of the most important tasks in opinion classification: selecting and extracting text features is the first step in an opinion classification problem. Commonly used text features include negation, opinion words and phrases, part of speech (POS), and term presence and frequency. Feature selection methods are divided into two sub-categories: lexicon-based methods and statistical methods. Lexicon-based methods require human annotation; they start with a small set of ‘seed’ words and expand it into a large lexicon through synonyms and antonyms. Statistical methods, which work automatically, are the most frequently used; they treat a text document either as a string or as a group of words (bag of words, BOW). After the pre-processing steps, feature selection plays an important role in extracting good features. The most frequently used statistical methods for feature selection are chi-square (χ²), pointwise mutual information (PMI), principal component analysis (PCA), hidden Markov models (HMM), and latent Dirichlet allocation (LDA). Figure 13 explains the process of aspect selection, and Table 9 summarizes existing research on aspect selection for opinion classification.
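As an example of a statistical feature-selection method, the sketch below scores each term with the χ² statistic computed from a 2×2 contingency table of term presence versus class membership and keeps the top k terms. The four documents are toy data. Note that χ² is symmetric: a term strongly associated with the *other* class (here "poor") also scores high, which is usually desirable for a polarity classifier.

```python
from typing import List, Set, Tuple

Docs = List[Tuple[Set[str], str]]


def chi_square(docs: Docs, term: str, label: str) -> float:
    """Chi-square between presence of `term` and membership in class `label`.

    a: in class, has term      b: not in class, has term
    c: in class, lacks term    d: not in class, lacks term
    chi2 = N * (a*d - c*b)^2 / ((a+c) * (b+d) * (a+b) * (c+d))
    """
    a = b = c = d = 0
    for words, cls in docs:
        has_term, in_class = term in words, cls == label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs: Docs, label: str, k: int) -> List[str]:
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = set().union(*(words for words, _ in docs))
    scores = {t: chi_square(docs, t, label) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy documents represented as sets of terms (bag of words, presence only).
docs = [
    ({"great", "battery"}, "pos"),
    ({"great", "screen"}, "pos"),
    ({"poor", "battery"}, "neg"),
    ({"poor", "screen"}, "neg"),
]
print(select_features(docs, "pos", 2))
```

Here "battery" and "screen" appear equally in both classes and score zero, so only the polarity-bearing terms survive selection.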
Some challenging tasks in feature detection, such as irony detection, sparsity, and polysemy detection, still require more attention.
3.7 Opinion classification
The main objective of opinion classification is to determine the sentiment orientation of a given text, that is, its polarity: whether the text expresses a positive, negative, or neutral opinion towards the subject. The polarity classes can be binary (positive or negative), ternary (positive, negative, or neutral), or n-ary. Techniques for this task fall into three groups: machine learning, lexicon-based, and hybrid techniques. Machine learning techniques apply well-known machine learning algorithms and use linguistic features. For text classification, machine learning techniques are divided into supervised and unsupervised learning methods. Supervised approaches are used when a large amount of labeled training data is available, whereas unsupervised approaches are used when labeled training data are difficult to obtain. Frequently used machine learning methods include support vector machines, Naïve Bayes, maximum entropy, decision trees, neural networks, Bayesian networks, and rule-based classifiers. The lexicon-based approach depends on an opinion lexicon, that is, a collection of known, precompiled opinion terms, and analyzes a text by matching its words against this sentiment lexicon. There are two lexicon-based approaches: the dictionary-based approach and the corpus-based approach. The dictionary-based approach starts from seed words and looks up their synonyms and antonyms in a dictionary. The corpus-based approach starts with a seed list of sentiment words and uses statistical or semantic methods to find other sentiment words with context-specific orientations in a large corpus. The hybrid technique combines the machine learning and lexicon-based approaches to determine the polarity of a text (sentence, document, etc.) towards the subject.
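To make the supervised machine-learning route concrete, here is a minimal multinomial Naïve Bayes classifier, one of the methods listed above, written from scratch with add-one (Laplace) smoothing. The training data are hypothetical, and the sketch omits the feature engineering a real system would use.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing, stored in log space."""
    vocab = {w for doc in docs for w in doc}
    log_priors, log_likelihoods = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for doc in class_docs for w in doc)
        total = sum(counts.values())
        log_likelihoods[c] = {
            w: math.log((counts[w] + alpha) / (total + alpha * len(vocab))) for w in vocab
        }
    return log_priors, log_likelihoods


def predict_nb(model, doc):
    log_priors, log_likelihoods = model

    def log_score(c):
        # Unnormalized log posterior; words never seen in training are skipped.
        return log_priors[c] + sum(log_likelihoods[c][w] for w in doc if w in log_likelihoods[c])

    return max(log_priors, key=log_score)


train_docs = [["great", "movie"], ["loved", "it"], ["boring", "movie"], ["awful", "plot"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["great", "loved"]))  # pos
print(predict_nb(model, ["boring", "plot"]))  # neg
```

Working in log space avoids numeric underflow when many word probabilities are multiplied.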
Opinion classification is also used to determine polarity in multi-lingual, cross-lingual, and cross-domain settings. The process of opinion classification is explained in Fig. 14, and Table 10 summarizes the current state of the art, techniques, and common datasets used for polarity determination.
Researchers are still working to determine the polarity of texts, documents, and sentences using machine learning, lexicon-based, and hybrid approaches together with feature extraction techniques. The main challenge in polarity determination is achieving good accuracy.
3.7.1 Cross-lingual and multi-lingual opinion classification
Sentiment resources, that is, annotated corpora and sentiment lexicons, are crucial for determining opinions. However, most existing resources are in English. Opinions are expressed in many languages across the world, each with its own ways of conveying sentiment, which makes precise analysis of texts in languages such as Spanish, Arabic, and Japanese complex. Creating a sentiment lexicon or sentiment corpus for every language is very expensive (Dashtipour et al. 2016). In cross-lingual opinion mining, the model is trained on a dataset in one language (the source language, which has reliable resources, e.g., English) and tested on a dataset in another language (the target, or resource-poor, language), whereas in multi-lingual opinion analysis, mixed-language text is present (Singh and Sachan 2019). Cross-lingual and multi-lingual opinion analysis can be performed using two approaches: the lexicon-based approach and the corpus-based approach. The generalized process of multi-lingual opinion classification is described in Fig. 15, and Table 11 summarizes the current state of the art, techniques, and common datasets used for cross-lingual and multi-lingual opinion analysis.
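One simple form of the lexicon-based cross-lingual approach is to translate target-language text word by word into the source language and score it with a source-language lexicon. The sketch below assumes a hypothetical Spanish-to-English dictionary `ES_TO_EN` and a tiny English lexicon `EN_LEXICON`; it is an illustration, not a method from the cited works.

```python
# Hypothetical resources: an English (source-language) sentiment lexicon and
# a Spanish-to-English word dictionary acting as the bridge to the target language.
EN_LEXICON = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}
ES_TO_EN = {"bueno": "good", "excelente": "excellent", "malo": "bad",
            "terrible": "terrible", "servicio": "service"}


def translate_tokens(tokens, dictionary):
    """Word-by-word translation; words missing from the dictionary are kept as-is."""
    return [dictionary.get(t, t) for t in tokens]


def lexicon_polarity(tokens, lexicon):
    """Sum lexicon scores and map the total to a polarity label."""
    score = sum(lexicon.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def cross_lingual_polarity(text, dictionary, lexicon):
    tokens = text.lower().split()
    return lexicon_polarity(translate_tokens(tokens, dictionary), lexicon)


print(cross_lingual_polarity("Servicio excelente", ES_TO_EN, EN_LEXICON))  # positive
print(cross_lingual_polarity("servicio malo", ES_TO_EN, EN_LEXICON))       # negative
```

Word-by-word translation ignores word sense, negation, and idioms, which is why practical systems typically rely on machine translation or much larger bilingual resources.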
Cross-lingual and multi-lingual sentiment analysis requires more attention in aspects such as code-mixed text, phonetically typed words, and social media content, including code-mixed social media posts (Lo et al. 2017).
3.7.2 Cross-domain opinion classification
Cross-domain sentiment analysis is one of the most challenging and interesting problems in sentiment analysis. Opinions are expressed differently in different domains, so a word that expresses positive sentiment in one domain can express negative sentiment in another (“A Domain is a class that consists of different objects”). For example, consider the domains “Hotel” and “computer” and the word “Hot”. In the computer domain, in “The computer is too hot when it is working”, the word “hot” expresses negative sentiment. In the hotel domain, in “The shower has great hot water”, the word “hot” expresses positive sentiment. Because of such differences, the performance of a trained system can drop drastically when it is applied to a new domain. Creating a labeled corpus for every domain is not feasible, because it is a very time-consuming and costly process. Cross-domain sentiment analysis requires at least two domains: a source domain and a target domain. A training set is drawn from the source and target domains, the classifier is trained on this set, and the trained classifier is then tested on the target domain to measure its accuracy.
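The “hot” example can be reproduced with a toy word-scoring classifier trained only on the source (hotel) domain. Both the reviews and the scoring rule are hypothetical; the point is only that a word's learned polarity carries over unchanged to the target (computer) domain.

```python
from collections import defaultdict


def word_scores(labeled_reviews):
    """Score each word by (#positive - #negative) reviews that contain it."""
    scores = defaultdict(int)
    for text, label in labeled_reviews:  # label is +1 or -1
        for word in set(text.lower().split()):
            scores[word] += label
    return scores


def classify(text, scores):
    """Sum word scores; a total of zero or less is mapped to -1 (negative)."""
    total = sum(scores.get(word, 0) for word in text.lower().split())
    return 1 if total > 0 else -1


# Hypothetical source-domain (hotel) training reviews.
hotel_reviews = [
    ("great hot water in the shower", +1),
    ("hot breakfast and friendly staff", +1),
    ("dirty room and rude staff", -1),
]
scores = word_scores(hotel_reviews)
print(scores["hot"])                                                    # 2
print(classify("The computer is too hot when it is working", scores))  # 1
```

The classifier labels the overheating complaint as positive (+1), because “hot” only ever appeared in positive hotel reviews.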
Formally, let \(D_{Src}^{l}\) and \(D_{trg}^{l}\) denote the labeled data of the source and target domains, respectively. The labeled data for the source domain are defined as:
\[ D_{Src}^{l} = \{(O^{l}_{s1}, P_{1}), (O^{l}_{s2}, P_{2}), \ldots, (O^{l}_{sm}, P_{m})\} \]
where \(O^{l}_{s1}, O^{l}_{s2}, \ldots, O^{l}_{sm}\) are m sampled reviews from the source domain and \(P_{1}, P_{2}, \ldots, P_{m} \in \{+1, -1\}\) are their sentiment labels, where +1 and −1 denote positive and negative polarity, respectively.
The labeled data for the target domain, \(D_{trg}^{l}\), are defined as:
\[ D_{trg}^{l} = \{(O^{l}_{t1}, Q_{1}), (O^{l}_{t2}, Q_{2}), \ldots, (O^{l}_{tn}, Q_{n})\} \]
where \(O^{l}_{t1}, O^{l}_{t2}, \ldots, O^{l}_{tn}\) are n sampled reviews from the target domain and \(Q_{1}, Q_{2}, \ldots, Q_{n} \in \{+1, -1\}\) are their sentiment labels, where +1 and −1 denote positive and negative polarity, respectively. In addition to the labeled data, both domains also contain unlabeled data.
The unlabeled data for the source domain, \(D_{Src}^{U}\), are denoted as:
\[ D_{Src}^{U} = \{O^{U}_{s1}, O^{U}_{s2}, \ldots, O^{U}_{si}\} \]
where \(O^{U}_{s1} ,O^{U}_{s2} , \ldots ,O^{U}_{si}\) are i unlabeled sampled reviews.
The unlabeled data for the target domain, \(D_{trg}^{U}\), are denoted as:
\[ D_{trg}^{U} = \{O^{U}_{t1}, O^{U}_{t2}, \ldots, O^{U}_{tj}\} \]
where \(O^{U}_{t1}, O^{U}_{t2}, \ldots, O^{U}_{tj}\) are j unlabeled sampled reviews. The task of cross-domain opinion analysis is to train a classifier on the combination of labeled and unlabeled data {\(D_{Src}^{l}\), \(D_{trg}^{l}\), \(D_{Src}^{U}\), \(D_{trg}^{U}\)} available in the source and target domains.
3.7.3 Basic terminologies
Pre-processing Sentiment analysis requires several pre-processing steps to structure the text data and extract features. Data are collected from many sources, and the text must be pre-processed before use. Common pre-processing steps for text data include stop word removal, tokenization, parsing, part-of-speech (POS) tagging, stemming, word segmentation, and feature extraction. Some general pre-processing techniques are explained below.
Stop words contribute little to the analysis of text, so they are removed during pre-processing. Examples of stop words are “a”, “the”, “as” and “on”. Tokenization breaks a sentence into symbols, phrases, words, or other meaningful tokens, removing some punctuation. In English, words can largely be separated at spaces. Tokenization is one of the most fundamental techniques in natural language processing, and the resulting tokens can be used to find additional information such as named entities or opinion phrases. Many tools are available for tokenization, such as the OpenNLP Tokenizer and the Stanford Tokenizer.
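Stop-word removal and tokenization can be sketched with Python's standard `re` module. The stop-word list and the token pattern below are assumptions chosen for illustration; the OpenNLP and Stanford tokenizers handle many more cases (abbreviations, URLs, emoticons).

```python
import re

# Illustrative stop-word list (an assumption); toolkits ship longer lists.
STOP_WORDS = {"a", "an", "the", "as", "on", "is", "it", "of", "and"}

# Runs of letters/digits, optionally followed by an apostrophe suffix (isn't, it's).
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return word tokens, dropping punctuation."""
    return TOKEN_PATTERN.findall(text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("The battery isn't great, on the whole.")
print(tokens)                     # ['the', 'battery', "isn't", 'great', 'on', 'the', 'whole']
print(remove_stop_words(tokens))  # ['battery', "isn't", 'great', 'whole']
```

For sentiment analysis, the stop-word list must be chosen with care: negations such as “not” carry polarity and are usually kept.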
Parsing and part-of-speech tagging analyze the syntactic and lexical information of a text. POS tagging identifies the parts of speech in the text by assigning the corresponding POS tag to each word, and these tags are vital for natural language processing. POS tags such as noun, adjective, verb, and adverb, combinations of two consecutive words such as adverb-adjective and adverb-verb, n-grams, etc. are extracted using the parser. Parsing is an important phase because its output yields the sentiment words. Sentence parsing involves assigning POS tags to the given text.
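Extracting two-word patterns such as adverb-adjective and adverb-verb can be sketched on top of a tagger's output. The sketch assumes the text has already been tagged with Penn Treebank tags (RB* for adverbs, JJ* for adjectives, VB* for verbs); `extract_patterns` is a hypothetical helper, and the tags in the example were assigned by hand.

```python
def extract_patterns(tagged, patterns=(("RB", "JJ"), ("RB", "VB"))):
    """Collect consecutive word pairs whose tags start with a (first, second) pattern.

    `tagged` is a list of (word, tag) pairs, e.g. the output of a POS tagger;
    Penn Treebank tags are assumed.
    """
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if any(t1.startswith(p1) and t2.startswith(p2) for p1, p2 in patterns):
            phrases.append(f"{w1} {w2}")
    return phrases


tagged = [("the", "DT"), ("screen", "NN"), ("is", "VBZ"), ("very", "RB"), ("bright", "JJ"),
          ("and", "CC"), ("it", "PRP"), ("really", "RB"), ("works", "VBZ")]
print(extract_patterns(tagged))  # ['very bright', 'really works']
```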
Stemming reduces a word to its root form, so that variants of a word with different parts of speech map to the same stem. Because textual data are noisy and sparse, extensive feature extraction is often needed, which makes it one of the important steps in pre-processing.
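A minimal suffix-stripping stemmer illustrates the idea. The suffix list and the minimum stem length are assumptions; this is not the Porter algorithm or any toolkit's stemmer, which apply ordered rule sets with extra conditions.

```python
# Hypothetical, deliberately tiny suffix list, tried longest first.
# This is NOT the Porter stemmer.
SUFFIXES = sorted(("ational", "ness", "edly", "ing", "ed", "ly", "es", "s"), key=len, reverse=True)


def simple_stem(word, min_stem=3):
    """Strip the longest matching suffix if at least `min_stem` characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


print([simple_stem(w) for w in ["recommended", "screens", "quickly", "happiness", "is"]])
# ['recommend', 'screen', 'quick', 'happi', 'is']
```

Stems need not be dictionary words (“happiness” becomes “happi”); what matters is that related forms map to the same stem.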
Word segmentation is used for languages such as Japanese and Chinese, whose text has no explicit word boundary markers. It is a sequence labeling problem. Several tools and approaches are available for this task, such as the Stanford Segmenter, THULAC, ICTCLAS, conditional random fields (CRFs), maximum-entropy Markov models, and hidden Markov models.
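The tools above use statistical sequence models; a much simpler dictionary-based baseline, forward maximum matching, is sketched below instead to show what segmentation produces. The word lists are hypothetical, and greedy matching fails on ambiguous strings that CRF or HMM segmenters resolve from context.

```python
def forward_max_match(text, dictionary, max_len=5):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position.

    If no dictionary word matches, a single character is emitted as a token.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# "I like this phone" in Chinese, with a hypothetical word list.
print(forward_max_match("我喜欢这部手机", {"我", "喜欢", "这部", "手机"}))
# ['我', '喜欢', '这部', '手机']
print(forward_max_match("thephoneisgreat", {"the", "phone", "is", "great"}))
# ['the', 'phone', 'is', 'great']
```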
Some publicly available toolkits for pre-processing data are summarized in Table 12.
3.7.4 Languages and available datasets
Some well-known datasets, data sources, sentiment lexicons, and opinion corpora in sentiment analysis are listed in Table 13. These datasets are used for different sentiment analysis tasks. English is the most frequent language in these datasets, owing to the availability of resources. Research on non-English languages, such as Chinese, German, Hindi, Spanish, and French, is still ongoing, and creating lexicons, corpora, and other resources for these languages is very challenging.
4 Baseline methods and techniques for cross-domain opinion classification
This section outlines the baseline methods and techniques used for cross-domain opinion classification in the early days. Transfer learning and domain adaptation (or knowledge adaptation) play an important role in this field. In cross-domain opinion classification, a model is trained on the available labeled and unlabeled data from the source and target domains and then tested on a different domain to check whether it works properly. In early studies, most articles used the Amazon multi-domain dataset, which consists of four domains, Kitchen, Electronics, DVDs, and Books, with 89,478, 104,027, 179,879, and 188,050 features, respectively. The key techniques and approaches for cross-domain opinion classification are shown in Fig. 16.
4.1 Structural Correspondence Learning (SCL) Technique
The SCL technique was introduced by:
-
Blitzer et al. (2006)
-
Approach: Structural Correspondence Learning, Support vector machine, part of speech tagging and Amazon multi-domain datasets (English language)
-
Corpora: Kitchen, DVDs, electronics and Books
-
Explanation: To model correspondences among features from different domains, a structural correspondence learning algorithm was introduced. The key role of SCL is to identify correspondences between non-pivot features and pivot features. Pivot features are features that behave in the same way in both domains for discriminative learning. In their experiments, the authors used unlabeled data from the source and target domains and labeled data from the source domain. On these data, SCL outperformed the semi-supervised and supervised learning approaches. This work was extended by Blitzer et al. (2007).
-
-
Blitzer et al. (2007)
-
Approach: Structural Correspondence Learning with mutual information (unigram or bi-gram and domain label), Support vector machine, part of speech tagging and Amazon multi-domain datasets (English language)
-
Corpora: Kitchen, DVDs, electronics and Books
-
Explanation: Structural correspondence learning depends on the selection of pivot features; if the pivot features are poorly selected, the performance of the classifier is directly affected. To overcome this problem, the authors extended the algorithm to structural correspondence learning with mutual information (SCL-MI). SCL-MI is better suited to cross-domain opinion classification than SCL because it selects the top pivot features using the mutual information between a domain label and unigram or bigram features. To measure the loss caused by adapting from one domain to another, they evaluated the A-distance, which is computed from unlabeled data and measures the divergence between domains that affects classification accuracy. More recently, the concept of Blitzer et al. (2006) was borrowed by Yu and Jiang (2016).
-
-
Yu and Jiang (2016)
-
Approach: Neural network, Sentence Embeddings, Deep learning (Convolutional Neural Networks and Recurrent Neural Network), Movie dataset (Pang and Lee 2004) and Movie (Socher et al. 2013) datasets, Digital products (Camera, MP3) (Hu and Liu 2004), and Laptop and Restaurant SemEval (2015) (English language)
-
Corpora: Five benchmark product review datasets, word embeddings from word2vec
-
Explanation: For domain adaptation, they induced a sentence embedding based on two auxiliary tasks (Sequential-auxiliary and Joint-auxiliary). The experiments are performed on five benchmark datasets and the proposed joint method outperformed several baseline methods.
-
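The first SCL step, choosing pivot features, can be sketched as follows using a frequency criterion (features that occur often in both domains). The threshold `min_count` and the ranking rule are assumptions, and the documents are hypothetical. The later SCL steps (training a linear predictor for each pivot on unlabeled data from both domains and projecting features with an SVD of the predictor weights) and the mutual-information ranking of SCL-MI are omitted.

```python
from collections import Counter


def document_frequencies(docs):
    """Count the number of documents in which each feature occurs."""
    return Counter(f for doc in docs for f in set(doc))


def select_pivots(source_docs, target_docs, min_count=2, k=10):
    """Pick features frequent in both domains as pivot candidates.

    Features are ranked by the smaller of their two document frequencies,
    so a feature must be common in *both* domains to rank highly.
    """
    src = document_frequencies(source_docs)
    tgt = document_frequencies(target_docs)
    shared = [f for f in src if src[f] >= min_count and tgt[f] >= min_count]
    return sorted(shared, key=lambda f: (-min(src[f], tgt[f]), f))[:k]


books = [["great", "plot"], ["great", "read"], ["boring", "plot"],
         ["not", "worth", "it"], ["not", "great"]]
kitchen = [["great", "blender"], ["great", "knife"], ["not", "sharp"],
           ["not", "worth", "it"], ["leaks", "blender"]]
print(select_pivots(books, kitchen))  # ['great', 'not']
```

Domain-specific words such as “plot” and “blender” are frequent in only one domain and are therefore excluded.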
4.2 Spectral feature alignment (SFA) technique
The spectral feature alignment technique is introduced by:
-
Pan et al. (2010)
-
Approach: Spectral feature alignment and Amazon multi-domain datasets, Yelp and CitySearch websites (English language)
-
Corpora: Kitchen, DVDs, electronics and Books (Amazon multi-domain); video game, electronics and software (Amazon); hotel (Yelp and CitySearch)
-
Explanation: Proposed a new algorithm, spectral feature alignment (SFA), to bridge the gap between two different domains. SFA aligns domain-specific words from different domains into unified clusters, with domain-independent words acting as a bridge. Domain-specific words and domain-independent words are the two categories of words used in cross-domain sentiment data. The proposed framework consists of graph construction and the spectral feature alignment algorithm: a bipartite graph is built from the co-occurrence information between domain-independent and domain-specific words, and a spectral clustering algorithm is applied to this graph to co-align domain-independent and domain-specific words into unified word clusters, reducing the mismatch between the domain-specific words of the two domains. SFA outperformed existing approaches such as SCL. This work was later extended by Lin et al. (2014).
-
-
Lin et al. (2014)
-
Approach: Spectral feature alignment, Support vector machine, taxonomy-based regression model (TBRM) and cosine function and Amazon multi-domain datasets (English language)
-
Corpora: Kitchen, Electronics, DVDs and Books
-
Explanation: Introduced two approaches, a taxonomy-based regression model (TBRM) and a cosine function, to choose the most similar models for the target node. They also used a support vector machine classifier, a domain adaptation algorithm (spectral feature alignment), and a weight adjustment technique. The experimental results showed that the proposed approaches outperformed the baseline approaches. More recent work in cross-domain opinion classification:
-
-
Deshmukh and Tripathy (2018)
-
Approach: Modified Maximum Entropy for classification, bipartite graph clustering, and POS + unigram and Amazon Product Review (English language)
-
Corpora: Kitchen, Electronics, DVDs and Books
-
Explanation: Used a semi-supervised approach (bipartite graph clustering and a modified maximum entropy, in which the entropy is enhanced with a modified increment quantity) to classify and extract opinion or sentiment words from one domain (using a labeled lexicon from the source domain) and to analyze the opinion words of another domain (using labeled and unlabeled data from the target domain). The authors divided their methodology into two phases. In the first phase, the datasets are pre-processed (part-of-speech tagging with the Stanford parser); in the second phase, a classifier (modified maximum entropy, applied to the opinions and all words tagged in the POS step) and clustering (using a bipartite graph) are applied to the datasets. They used F-measure and accuracy to evaluate the algorithm. In four experiments on different product review domains (DVD, Book, Kitchen appliances and Electronics), the proposed approach performed better than the baseline methods, with accuracy between 70% and 88.35%.
-
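The graph-construction step of SFA can be sketched as follows. Assumptions: co-occurrence is counted per document, the set of domain-independent words is supplied by hand (SFA selects it automatically), and the reviews are hypothetical. The subsequent spectral clustering, which needs an eigendecomposition, is omitted.

```python
from collections import defaultdict


def build_bipartite_graph(docs, domain_independent):
    """Edge weights between domain-independent and domain-specific words.

    The weight of edge (i, s) is the number of documents in which the
    domain-independent word i and the domain-specific word s co-occur.
    """
    edges = defaultdict(int)
    for doc in docs:
        words = set(doc)
        independent = words & domain_independent
        specific = words - domain_independent
        for i in independent:
            for s in specific:
                edges[(i, s)] += 1
    return dict(edges)


# Hypothetical reviews pooled from the source and target domains.
reviews = [["great", "sharp"], ["great", "compact"], ["great", "sharp"],
           ["disappointing", "leaking"]]
graph = build_bipartite_graph(reviews, {"great", "disappointing"})
print(sorted(graph.items()))
# [(('disappointing', 'leaking'), 1), (('great', 'compact'), 1), (('great', 'sharp'), 2)]
```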
4.3 Joint sentiment-topic (JST) technique
-
He et al. (2011)
-
Approach: Modified Joint sentiment topic, Maximum Entropy from MALLET, Bag of Words and Amazon multi-domain datasets, MPQA subjectivity lexicon (English language)
-
Corpora: Kitchen, Electronics, DVDs and Books
-
Explanation: Introduced a modified joint sentiment-topic (JST) model that incorporates word polarity. The JST model is an extension of latent Dirichlet allocation (LDA) that extracts opinions and topics from text simultaneously. It is a probabilistic model based on polarity-bearing topics that augments the feature space, and its learning is based on prior information about domain-independent polarity words. With the JST approach, they performed polarity word extraction on the combined datasets and transfer learning (domain adaptation) on the Amazon multi-domain datasets. The experimental results showed that the proposed JST method outperforms structural correspondence learning on average and achieves results comparable to spectral feature alignment. The JST model was further improved by He et al. (2013).
-
-
He et al. (2013)
-
Approach: Dynamic Joint sentiment topic, expectation–maximization (EM), Part of speech tagging, Unigrams + phrases and Mozilla Add-ons web site, MPQA subjectivity lexicon (English language)
-
Corpora: Personas Plus, Fast Dial, Echofon for Twitter, Firefox Sync, Video DownloadHelper, and Adblock Plus
-
Explanation: To overcome limitations of the joint sentiment-topic model, such as its static word co-occurrence patterns and its difficulty fitting large-scale data, the authors proposed the dynamic joint sentiment-topic model (dJST). The proposed approach can detect and track current and recurrent interests as well as shifts in topics and sentiments. The dJST model is updated with newly arrived data through an online inference procedure based on the expectation–maximization (EM) algorithm. Topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. To model these dependencies, they used three approaches: a skip model, a sliding window, and a multiscale model. The experiments showed that both the skip model and the multiscale model perform better than the sliding window for sentiment classification.
-
4.4 Active learning and deep learning approach
Active learning selects the data from which the model learns, so that it can perform effectively with less training data. Active learning, a special case of semi-supervised machine learning, interactively queries an information source to obtain the desired outputs at new data points. In cross-domain settings, active learning uses source-domain information to acquire additional labeled target-domain data. Active learning involves three types of scenarios (pool-based sampling, stream-based selective sampling, and membership query synthesis) together with various query strategies. Deep learning, on the other hand, is used here as an unsupervised approach that learns good features from unlabeled data in order to obtain meaningful sentiments. In cross-domain opinion classification, very few research studies have used active and deep learning. Some of these studies are:
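Query-by-Committee picks the unlabeled examples on which a committee of classifiers disagrees most. The sketch below measures disagreement with vote entropy, one common QBC criterion; it is not necessarily the criterion used by Li et al. (2013) or Tsai et al. (2014), and the committee votes are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for one example (0 = full agreement)."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def select_queries(committee_votes, budget):
    """Indices of the `budget` unlabeled examples the committee disagrees on most."""
    order = sorted(range(len(committee_votes)),
                   key=lambda i: vote_entropy(committee_votes[i]), reverse=True)
    return order[:budget]


# Hypothetical votes from a committee of three classifiers on four unlabeled reviews.
pool_votes = [
    ["pos", "pos", "pos"],
    ["pos", "neg", "neg"],
    ["neg", "neg", "neg"],
    ["pos", "neg", "pos"],
]
print(select_queries(pool_votes, budget=2))  # [1, 3]
```

The selected reviews would then be labeled (by an annotator, in the active learning loop) and added to the training data.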
-
Li et al. (2013)
-
Approach: Query-by-Committee (QBC), label propagation (LP), Bag of words (Unigram and bigram), maximum entropy (ME), MALLET toolkit and Amazon multi-domain datasets (English language)
-
Corpora: Kitchen, DVDs, electronics and Books
-
Explanation: Introduced an active learning approach combined with Query-by-Committee (QBC) for sentiment classification and informative sample selection. Two classifiers (a source classifier trained on labeled source-domain data and a target classifier trained on labeled target-domain data) are trained while fully exploiting the unlabeled target-domain data through the proposed label propagation (LP) approach, and QBC is used to select informative samples. In the experiments, they considered four product domains: Kitchen, Electronics, DVDs and Books. The proposed approach outperformed the state of the art. The active learning approach was later also used by Tsai et al. (2014).
-
-
Tsai et al. (2014)
-
Approach: Query-by-Committee (QBC), Bag of words (Unigram and bigram) and Chinese language
-
Explanation: Used an active learning approach combined with Query-by-Committee (QBC) to identify opinion words.
-
-
Glorot et al. (2011)
-
Approach: Deep learning, Stacked Denoising Autoencoder (SDA) with rectifier units, linear SVM with squared hinge loss and Amazon multi-domain datasets (English language)
-
Corpora: Kitchen, DVDs, electronics and Books
-
Explanation: For domain adaptation in opinion classification, they used a deep learning approach based on stacked denoising autoencoders with sparse rectifier units, and trained a linear support vector machine on the transformed labeled data of the source domain. The experimental results show that the proposed approach outperforms the state of the art in comparisons with SCL, SFA, and MCT. A deep learning approach was later used by Nozza et al. (2016).
-
-
Nozza et al. (2016)
-
Approach: Deep learning, marginalized Stacked Denoising Autoencoder (mSDA), Ensemble learning Methods (Bagging, Boosting, Random SubSpace and Simple Voting) and Amazon multi-domain datasets (English Language)
-
Corpora: Kitchen, DVDs, electronics and Books
-
Explanation: Proposed a new framework based on deep learning and ensemble methods for domain adaptation. In this framework, deep learning is used to obtain high-level cross-domain features, and ensemble learning methods are used to minimize the cross-domain generalization error. The experiments were performed on the Amazon multi-domain datasets, and the proposed approach outperformed the current state-of-the-art approaches. Deep learning was further used for domain adaptation by Long et al. (2016).
-
-
Long et al. (2016)
-
Approach: Deep learning, Transfer Denoising Autoencoder (TDA), deep neural networks, multi-kernel maximum mean discrepancy (MK-MMD), TF-IDF and Amazon multi-domain datasets, Email Spam Filtering Dataset, Newsgroup Classification Dataset, Visual Object Recognition Dataset (English language)
-
Corpora: Kitchen, DVDs, electronics, books, Public and user email, Amazon, Webcam, DSLR and Caltech-256
-
Explanation: The proposed framework outperformed state of the art methods on different adaptation tasks such as visual object recognition, newsgroup content categorization, email spam filtering, and sentiment polarity prediction on multi-domain datasets.
-
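MMD-based methods such as that of Long et al. (2016) penalize the distance between source and target feature distributions. The sketch computes a biased estimate of the squared maximum mean discrepancy with a single Gaussian kernel, MMD² = mean k(s, s′) + mean k(t, t′) − 2 mean k(s, t). MK-MMD instead combines several kernels, and the feature vectors here are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two samples (single Gaussian kernel)."""
    def mean_kernel(xs, ys):
        return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))

    return mean_kernel(source, source) + mean_kernel(target, target) - 2 * mean_kernel(source, target)


# Hypothetical 2-D feature vectors for source reviews and two candidate target samples.
source = [[0.0, 0.0], [0.1, 0.0]]
near_target = [[0.0, 0.1], [0.1, 0.1]]
far_target = [[3.0, 3.0], [3.1, 3.0]]
print(mmd_squared(source, near_target) < mmd_squared(source, far_target))  # True
```

In a deep adaptation network, a term like this is added to the training loss so that the learned source and target representations are pulled closer together.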
4.5 Topic modeling
Topic modeling approaches, which build on latent semantic indexing and clustering techniques, are used to reduce the high-dimensional term-document matrix to a low-dimensional representation. Some research in this category is described below:
-
Wu and Tan (2011)
-
Approach: SentiRank algorithm, manifold-ranking algorithm, Bag of words, Chinese POS tool ICTCLAS and Chinese domain-specific datasets Book, Hotel and Notebook (Chinese language)
-
Corpora: Book, Hotel and Notebook
-
Explanation: To overcome the problem of domain adaptation in sentiment analysis, proposed a two-stage framework in which the first stage is “building a bridge” (applying the SentiRank algorithm) and the second stage is “following the structure” (employing the manifold-ranking process). In the first stage, they build a bridge by collecting some confidently labeled data from the target domain, reducing the gap between the source and target domains. In the second stage, they use the manifold-ranking algorithm and its scores to exploit the intrinsic structure collectively revealed by the target domain and to label the target-domain data. For the experiments, they used Chinese domain-specific datasets in the Book, Hotel, and Notebook domains, compared the proposed framework with baseline methods (Proto, transductive SVM (TSVM), the SentiRank algorithm, the expectation–maximization (EM) algorithm based on Proto, the EM algorithm based on SentiRank, and Manifold based on Proto), and obtained comparable results. Later work in topic modeling:
-
-
Roy et al. (2012)
-
Approach: Online Streaming Latent Dirichlet Allocation (OSLDA), Learning Transfer Graph Spectra, SocialTransfer: Transfer Learning from Social Stream, Bag of words, and Microblogs (English language)
-
Corpora: YouTube and NIST Twitter dataset
-
Explanation: Proposed a new framework for cross-domain opinion classification, named SocialTransfer, based on social streams (using Online Streaming Latent Dirichlet Allocation (OSLDA)). SocialTransfer acquires knowledge from cross-domain data and can be used in numerous multimedia applications. Their experiments used real-world large-scale datasets, 10.2 million tweets from the NIST Twitter dataset (source domain) and 5.7 million tweets from YouTube (target domain), and SocialTransfer significantly outperformed traditional learners. Yang et al. (2013) further explored identifying the properties and common structures shared, directly or indirectly, across different domains.
-
-
Yang et al. (2013)
-
Approach: Probabilistic Link-Bridged Topic (LBT) Model, expectation–maximization (EM), Probabilistic Latent Semantic Analysis (PLSA), Support vector machine and Global domain and scientific research papers (English language)
-
Corpora: Industry Sectors dataset (topics include the computer science research papers dataset, covering data structures, encryption and compression, networking, operating systems, machine learning, etc.)
-
Explanation: Proposed a new model, the Link-Bridged Topic (LBT) model, for cross-domain transfer learning. The model first uses an auxiliary link network to identify direct and indirect correlations, properties, and common structures among the documents. Second, LBT simultaneously combines the link structure and content information in a unified latent topic model. The aim of LBT is to bridge the gap across different domains. In their experiments, they considered two domains, scientific research paper datasets and web page datasets, and the proposed model significantly improved generalization performance. This line of topic-modeling work was indirectly extended by Zhao and Mao (2014).
-
-
Zhao and Mao (2014)
-
Approach: Supervised Adaptivetransfer Probabilistic Latent Semantic Analysis (SAtPLSA), Probabilistic Latent Semantic Analysis (PLSA), expectation–maximization (EM) and 20 Newsgroups and Reuters-21578 (English Language)
-
Corpora: 20 subcategories of newsgroups and Reuters-21578
-
Explanation: To overcome issues in knowledge transfer, such as only partially using the source domain’s labeled information and exploiting the source domain’s knowledge only in the later stages of training, introduced a new model, Supervised Adaptivetransfer Probabilistic Latent Semantic Analysis (SAtPLSA), an extension of probabilistic latent semantic analysis (PLSA). The model parameters are learned with the expectation–maximization (EM) algorithm. In their experiments, they used nine benchmark datasets (20 Newsgroups and Reuters-21578) and compared the proposed approach with five state-of-the-art domain adaptation approaches (Partially Supervised Cross-Collection LDA (PSCCLDA), Collaborative Dual-PLSA (CDPLSA), Topic-bridge PLSA (TPLSA), Spectral Feature Alignment (SFA), and Topic Correlation Analysis (TCA)) and two classical supervised learning methods (logistic regression (LR) and support vector machines (SVM)), obtaining effective results. Further work in topic modeling:
-
-
Zhou et al. (2015)
-
Approach: Topical correspondence transfer (TCT), Support Vector Machine (SVM), Bag of words (Unigram and bigram) and Amazon multi-domain datasets (English language)
-
Corpora: Kitchen, DVDs, electronics and Books
-
Explanation: Proposed a new algorithm, topical correspondence transfer (TCT), to bridge the gap between domains when labeled data are available only in the source domain. TCT assumes that there is a set of topics shared by the source and target domains as well as domain-specific topics. To reduce the gap between the domains through the shared topics, TCT maps domain-specific information from different domains into unified topics. The experiments were performed on the Amazon multi-domain datasets, and the results of TCT were compared with SCL, SFA, and NMTF for cross-domain opinion classification. TCT achieved significant improvements over these state-of-the-art methods. Later work in topic modeling:
-
-
Liang et al. (2016)
-
Approach: Latent sentiment factorization (LSF), A Library for Support Vector Machines (LIBSVM), Unigram and bigram, Word2Vec, probabilistic matrix factorization and Amazon multi-domain datasets (English Language)
-
Corpora: Kitchen, DVDs, electronics and Books
-
Explanation: Proposed a new algorithm based on probabilistic matrix factorization, latent sentiment factorization (LSF), to exploit the opinion associations of words more effectively and to bridge the gap between domains in cross-domain opinion classification. LSF first maps the documents and words of the source and target domains into a unified two-dimensional space based on the words shared by the domains, and then uses the sentiment polarities of labeled source-domain documents and prior opinion information about words to constrain the latent space. In their experiments on the Amazon multi-domain datasets, the proposed approach performed well compared with five baseline techniques: NoTransf, Upperbound, SCL, SFL, and TCT. Most recently, the work was extended by Wang et al. (2018a, b).
-
-
-
Approach: Sentiment Related Index (SRI), pointwise mutual information (PMI), Support Vector Machine (SVM), unigrams and bigrams, the SentiRelated algorithm and the RewData and DoubanData datasets (Chinese language)
-
Corpora: RewData (Computer, Hotel, Education), DoubanData (Books, Music, and Movie)Footnote 22 and the sentiment lexicons NTUSDFootnote 23 and HowNetFootnote 24
-
Explanation: To measure the correlation between different lexical features in a specific domain, the authors created a sentiment related index (SRI) and, based on SRI, proposed a new algorithm named SentiRelated. Using this approach, they bridged the gap between the source and target domains and validated it on two Chinese-language datasets (RewData and DoubanData). The experimental results showed that the SentiRelated algorithm performs well at predicting opinion polarity.
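This survey does not give the SRI formula, but the approach lists pointwise mutual information (PMI) among its components. As a minimal sketch of that building block only, assuming document-level co-occurrence counts, the PMI between a word and a sentiment label can be computed as follows; the function `pmi` and the toy documents are illustrative and are not the authors' SentiRelated algorithm.

```python
import math

def pmi(docs, word, label):
    """PMI(word, label) = log2(P(word, label) / (P(word) * P(label))).

    docs: list of (set_of_tokens, label) pairs. Probabilities are estimated
    from document counts (an illustrative assumption). Returns None when the
    word and the label never co-occur, since PMI is then undefined.
    """
    n = len(docs)
    n_word = sum(1 for tokens, _ in docs if word in tokens)
    n_label = sum(1 for _, y in docs if y == label)
    n_joint = sum(1 for tokens, y in docs if word in tokens and y == label)
    if n_joint == 0:
        return None
    return math.log2((n_joint / n) / ((n_word / n) * (n_label / n)))

# Toy, hand-made documents (not taken from RewData or DoubanData).
docs = [
    ({"great", "phone"}, "pos"),
    ({"great", "battery"}, "pos"),
    ({"poor", "battery"}, "neg"),
    ({"poor", "screen"}, "neg"),
]
print(pmi(docs, "great", "pos"))    # 1.0: "great" occurs only in positive documents
print(pmi(docs, "battery", "pos"))  # 0.0: "battery" is independent of the label
```

A positive value means the word co-occurs with the label more often than chance would predict, which is what makes PMI-style scores useful for finding domain-specific sentiment indicators.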
-
4.6 Thesaurus-based techniques
To transfer knowledge between domains, Bollegala et al. (2013) introduced a thesaurus-based approach to cross-domain sentiment classification.
-
Bollegala et al. (2013)
-
Approach: Sentiment Sensitive Thesaurus (SST), Query Expansion, Pointwise Mutual Information (PMI), L1-regularized logistic regression,Footnote 25 POS tagging + (unigrams and bigrams), rating information and the Amazon multi-domain dataset (English Language)
-
Corpora: Kitchen, DVDs, Electronics and Books
-
Explanation: Proposed a new approach for cross-domain sentiment classification named Sentiment Sensitive Thesaurus (SST). First, they used labeled data from the source domain and unlabeled data from both the source and target domains to create a sentiment sensitive distributional thesaurus. They then used this thesaurus to expand the feature vectors (analogous to query expansion) at both training and test time for an L1-regularized logistic regression binary classifier. The proposed approach outperformed numerous previous cross-domain approaches, baseline methods, and supervised and unsupervised domain adaptation approaches in both single-source and multi-source domain adaptation settings. Moreover, the authors compared the created thesaurus with SentiWordNet,Footnote 26 a lexical resource of word polarity, and showed that the thesaurus accurately captures words that express similar sentiments. Sanju and Mirnalinee (2014) later enhanced the work of Bollegala et al. (2013) by incorporating Wiktionary into the sentiment sensitive thesaurus (SST) to reduce the mismatch between domains. Later, Jimenez et al. (2016) suggested a framework for thesaurus-based cross-domain sentiment classification.
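The feature expansion step of SST can be sketched as follows. The sketch assumes a thesaurus has already been built that maps each feature to related features with a relatedness score, and it appends each feature's top-k neighbours to the document vector with weight count × score. The names `expand_features` and `k`, the toy thesaurus, and the per-feature additive weighting are illustrative assumptions; the original method ranks expansion candidates against the document as a whole, which this simplified sketch does not do.

```python
from collections import Counter

def expand_features(doc_features, thesaurus, k=2):
    """Return doc_features extended with related features from a thesaurus.

    doc_features: Counter mapping feature -> count in one document.
    thesaurus:    dict mapping feature -> list of (related_feature, score).
    For each feature in the document, its k highest-scoring neighbours are
    added with weight count * score (a simplified, assumed weighting).
    """
    expanded = Counter(doc_features)
    for feature, count in doc_features.items():
        neighbours = sorted(thesaurus.get(feature, []),
                            key=lambda pair: pair[1], reverse=True)
        for related, score in neighbours[:k]:
            expanded[related] += count * score
    return expanded

# Hypothetical thesaurus entries standing in for a learned SST.
thesaurus = {
    "excellent": [("great", 0.9), ("superb", 0.8), ("fine", 0.3)],
    "sturdy": [("durable", 0.7)],
}
doc = Counter({"excellent": 1, "sturdy": 2})
print(expand_features(doc, thesaurus))
```

Because the same expansion is applied to training and test documents, a target-domain word such as "durable" can receive weight even when only "sturdy" was seen with a label in the source domain.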
-
-
Jimenez et al. (2016)
-
Approach: Bootstrapping algorithm (BS), Term Frequency (TF), POS tagging, unigrams and bigrams, the Spanish MuchoCine corpus (MC) and the iSOL Spanish polarity lexicon (Spanish language)
-
Corpora: Movie
-
Explanation: For transfer learning of a polarity lexicon, the authors introduced two corpus-based techniques that are language independent and can be applied to any domain. The first, based on term frequency (TF), uses documents previously tagged with polarity and achieves very promising results. The second, based on a bootstrapping algorithm (BS), does not require an annotated corpus and improves on the baseline system. To benefit from the strengths of each, they combined both methods and obtained an improvement of 11.50% in accuracy. A more recent contribution to thesaurus-based cross-domain sentiment classification is that of Bollegala and Mu (2016).
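The survey describes the TF-based technique only at a high level. As a rough sketch, assuming a term's polarity in a new domain is estimated from how often it appears in positive versus negative tagged documents, a normalized frequency difference can serve as the score. The formula, the `min_count` and `threshold` cut-offs, the function name, and the toy English documents are all illustrative assumptions, not Jimenez et al.'s exact method.

```python
from collections import Counter

def tf_polarity_lexicon(tagged_docs, min_count=2, threshold=0.5):
    """Build a polarity lexicon from polarity-tagged documents.

    tagged_docs: list of (tokens, label) with label in {"pos", "neg"}.
    score(t) = (tf_pos(t) - tf_neg(t)) / (tf_pos(t) + tf_neg(t)), in [-1, 1].
    Terms seen fewer than min_count times, or with |score| < threshold,
    are left out (both cut-offs are illustrative assumptions).
    """
    tf = {"pos": Counter(), "neg": Counter()}
    for tokens, label in tagged_docs:
        tf[label].update(tokens)
    lexicon = {}
    for term in set(tf["pos"]) | set(tf["neg"]):
        p, n = tf["pos"][term], tf["neg"][term]
        if p + n < min_count:
            continue
        score = (p - n) / (p + n)
        if abs(score) >= threshold:
            lexicon[term] = "pos" if score > 0 else "neg"
    return lexicon

# Toy polarity-tagged movie reviews (hypothetical data).
docs = [
    (["great", "film", "great"], "pos"),
    (["great", "acting"], "pos"),
    (["boring", "film"], "neg"),
    (["boring", "plot", "film"], "neg"),
]
print(tf_polarity_lexicon(docs))
```

The threshold trades coverage for precision: "film" appears on both sides and is dropped at the default setting, but would enter the lexicon as negative with a looser threshold.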
-
-
Bollegala and Mu (2016)
-
Approach: Rule-based modeling, k-NN, Pointwise Mutual Information (PMI)-based pivot selection, a numerical Python library for decomposition,Footnote 27 L2-regularized logistic regression in scikit-learn,Footnote 28 POS tagging + (unigrams and bigrams), rating information and the Amazon multi-domain dataset (English language)
-
Corpora: Kitchen, DVDs, Electronics and Books
-
Explanation: Developed a technique for learning sentiment sensitive embeddings for cross-domain opinion classification that considers the following objectives, both in isolation and jointly: (a) the distributional properties of pivots, (b) the label constraints of source-domain documents, and (c) the geometric properties of unlabeled source- and target-domain documents. The experimental results showed that optimizing all three objectives jointly yields better performance than optimizing any single objective, including the best-performing one. This confirms the importance of learning domain-specific embeddings for cross-domain opinion classification.
-
4.7 Case-based reasoning (CBR) techniques
Case-based reasoning uses past experience, stored as previously solved cases, to predict the solutions of new problems.
-
Ohana et al. (2012)
-
Approach: Case-based reasoning (CBR), kNN, Euclidean distance, Stanford POS TaggerFootnote 29 and the General Inquirer (GI) lexicon, the Subjectivity Clues lexicon (Clues), SentiWordNet (SWN), the Moby lexicon and the MSOL lexicon (English Language)
-
Corpora: Hotel review dataset, IMDB film review dataset, Amazon product reviews (Electronics, Books, Music, and Apparel)
-
Explanation: Proposed a case-based reasoning approach for cross-domain sentiment classification. Each case description is a feature vector based on document data, and the case solution comprises all lexicons that made correct predictions for that document during training. They evaluated the approach on six domains: film reviews (IMDB), hotel reviews,Footnote 30 and Amazon product reviews (Electronics, Books, Music, and Apparel). The experimental results are comparable to the state of the art.
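The retrieve-and-reuse cycle described above can be sketched as follows, assuming each stored case pairs a document feature vector with the set of lexicons that classified that document correctly. A new document retrieves its k nearest cases by Euclidean distance and reuses the lexicon named most often in their solutions. The two-dimensional vectors, the vote-count reuse rule, `k`, and the function names are illustrative assumptions rather than Ohana et al.'s exact design.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend_lexicon(case_base, query, k=3):
    """Retrieve the k cases nearest to query and reuse their solutions.

    case_base: list of (feature_vector, set_of_lexicon_names) pairs.
    Returns the lexicon that appears most often among the retrieved
    solutions (ties are broken arbitrarily).
    """
    nearest = sorted(case_base, key=lambda case: euclidean(case[0], query))[:k]
    votes = Counter()
    for _, lexicons in nearest:
        votes.update(lexicons)
    return votes.most_common(1)[0][0]

# Hypothetical cases: toy 2-D vectors stand in for document statistics.
case_base = [
    ((0.10, 0.20), {"SWN", "GI"}),
    ((0.20, 0.10), {"SWN"}),
    ((0.15, 0.15), {"SWN", "Clues"}),
    ((0.90, 0.80), {"MSOL"}),
]
print(recommend_lexicon(case_base, (0.12, 0.18)))  # SWN
```

The chosen lexicon would then be used to classify the new document; in a full CBR cycle, the solved document can be retained as a new case.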
-
4.8 Graph-based techniques
Graph-based techniques represent data as a weighted graph in which nodes are data instances and weighted edges represent the relationships between those instances. The graph-based approach assumes that the data lie on a manifold whose structure reflects how instances behave and connect. If the data do not come with such a structure, a similarity function can be used to compute the weights between graph vertices; a good graph is one whose edge weights reflect the true similarity between instances. One of the most popular graph-based algorithms is label propagation, developed by Zhu and Ghahramani (2002), which learns from both labeled and unlabeled data and has been applied to sentiment classification. Graph-based techniques for cross-domain sentiment classification were introduced by Ponomareva and Thelwall (2012a, b).
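The idea of spreading labels over a similarity graph can be illustrated with a short sketch. It uses a simple iterative variant (the harmonic-function form of label propagation): each unlabeled node repeatedly takes the weighted average of its neighbours' positive-class scores while labeled nodes stay clamped. This is not necessarily the exact update of Zhu and Ghahramani (2002) or of the modified LP variants discussed below; the function name, the neutral 0.5 start, and the toy chain graph are assumptions.

```python
def label_propagation(weights, labels, n_iter=100, tol=1e-6):
    """Iterative label propagation on a weighted graph.

    weights: symmetric n x n list of lists; weights[i][j] >= 0 is the
             similarity between instances i and j (0 means no edge).
    labels:  list of length n with 1.0 (positive), 0.0 (negative) or
             None (unlabeled).
    Returns the positive-class score of every node.
    """
    n = len(labels)
    scores = [0.5 if y is None else y for y in labels]  # neutral start
    for _ in range(n_iter):
        new_scores = list(scores)
        for i in range(n):
            if labels[i] is not None:
                continue  # labeled nodes stay clamped to their labels
            total = sum(weights[i])
            if total > 0:
                new_scores[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores

# Toy chain graph: 0 (positive) - 2 - 3 - 1 (negative).
W = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
scores = label_propagation(W, [1.0, 0.0, None, None])
print([round(s, 3) for s in scores])  # [1.0, 0.0, 0.667, 0.333]
```

Node 2, which sits closer to the positive document, ends up with a score above 0.5, so unlabeled instances inherit the polarity of the labeled instances they are most strongly connected to.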
-
Ponomareva and Thelwall (2012a, b)
-
Approach: Optimisation problem (OPTIM), ranking algorithm (RANK), kNN, Graph-based approach, Support Vector Machines (SVMs) (LIBSVM library) and Amazon product review (English Language)
-
Corpora: Electronics, Books, DVDs, and Kitchen
-
Explanation: Compared the performance and effectiveness of two existing graph-based approaches, a ranking algorithm (RANK) and an optimisation problem (OPTIM), for cross-domain opinion classification. OPTIM formulates sentiment classification as an optimisation problem, while RANK uses a ranking procedure to assign sentiment scores. To measure document similarity, they analysed various sentiment similarity measures, both feature-based and lexicon-based. In their experiments on the Amazon multi-domain dataset, they compared the two graph-based approaches with each other and with state-of-the-art approaches (SCL and SFA) for cross-domain opinion classification; the graph-based approaches achieved comparable results. Ponomareva and Thelwall (2013) later extended this graph-based work.
-
-
Ponomareva and Thelwall (2013)
-
Approach: Graph-based approach, modified label propagation (LP), semi-supervised learning (SSL), Cross-domain learning (CDL), linear-kernel Support vector machine (LIBSVM library), Class Mass Normalisation (CMN) and Amazon product review (English Language)
-
Corpora: Electronics, Books, DVDs and Kitchen
-
Explanation: Proposed a modified graph-based label propagation (LP) approach based on semi-supervised learning (SSL) and cross-domain learning (CDL). In their experiments on the Amazon multi-domain dataset, the authors evaluated graph-based LP along with three of its variants (\(LP_{\alpha\beta}\), \(LP_{\gamma}^{n}\) and \(LP_{\gamma}\)) and their combination with class mass normalisation (CMN). In further graph-based work, Zhu et al. (2013) used emotion keywords to extract labeled data with high precision from the target domain and combined it with the labeled data of the source domain. They then performed cross-domain sentiment classification by applying the label propagation (LP) algorithm to the labeled and unlabeled data of the target domain. Evaluated on the Amazon multi-domain dataset, the proposed approach achieved better performance.
-
4.9 Domain similarity and complexity techniques
Domain similarity can be used in domain adaptation to select the source-domain features that are most similar to the target domain. Remus (2012) introduced a framework to measure domain similarity and the variance in complexity between domains.
-
Remus (2012)
-
Approach: Domain similarity (pair-wise Jensen–Shannon (JS) divergence, unigram distributions, Kullback–Leibler (KL) divergence), Domain Complexity, Instance selection (ranked instances), Support Vector Machines (SVMs) with a linear kernel, LIBSVM,Footnote 31 unigram and bigram features and the Multi-domain Sentiment Dataset v2.0Footnote 32 (English Language)
-
Corpora: Kitchen and housewares, health and personal care, electronics, books, music, apparel, DVD, toys and games, sports and outdoors and video
-
Explanation: To achieve high accuracy in cross-domain sentiment classification, the author tried to find features in the training data that are similar to those of the test domain. The study used domain similarity, domain complexity and instance selection in the proposed approach and achieved comparable results in domain adaptation. The experiments covered 10 different domains and used the rating information of reviews. The approach employed unigram distributions, the Jensen–Shannon (JS) divergence and a support vector machine with its cost parameter C. This work was further extended by Ponomareva and Thelwall (2012a, b).
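Remus (2012) measures domain similarity with the Jensen–Shannon divergence between the unigram distributions of two domains. A minimal sketch of that computation follows; it assumes a shared vocabulary built from both domains and base-2 logarithms, so the divergence ranges from 0 (identical distributions) to 1 (disjoint distributions). The function names and toy documents are illustrative, and how the divergence is then turned into a similarity score or used for instance selection is not shown.

```python
import math
from collections import Counter

def unigram_distribution(docs, vocab):
    """Relative frequency of every vocabulary word in a list of tokenized docs."""
    counts = Counter(token for doc in docs for token in doc if token in vocab)
    total = sum(counts.values())
    return {word: counts[word] / total for word in vocab}

def kl_divergence(p, q):
    """KL(p || q) in bits; words with p(w) = 0 contribute nothing."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if disjoint."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

source_docs = [["great", "plot"], ["boring", "plot"]]      # toy Books-like domain
target_docs = [["great", "battery"], ["poor", "battery"]]  # toy Electronics-like domain
vocab = {t for doc in source_docs + target_docs for t in doc}
p = unigram_distribution(source_docs, vocab)
q = unigram_distribution(target_docs, vocab)
print(js_divergence(p, q))  # 0.75
```

The mixture distribution m is never zero where p or q is non-zero, which is why JS divergence stays finite on disjoint vocabularies where the plain KL divergence would not.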
-
-
Ponomareva and Thelwall (2012a, b)
-
Approach: Domain similarity, Domain Complexity, parts-of-speech (POS), unigram distribution, linear regression, Support Vector Machines (SVMs) and Amazon Multi-domain Sentiment Dataset (English Language)
-
Corpora: Kitchen, Electronics, books, and DVD
-
Explanation: The authors used domain similarity (divergence) and domain complexity (domain self-similarity) to predict the performance loss of a cross-domain classifier; their predictions had an average error of 1.5% and a maximum error of 3.4%.
-
4.10 Feature-based techniques
To improve the performance of cross-domain sentiment classification, feature-based techniques are used.
-
Xia et al. (2013)
-
Approach: Feature ensemble plus sample selection (SS-FE), PCA-based sample selection (PCA-SS), Labeling adaptation, Instance adaptation (sample selection bias), part-of-speech (POS), Naive Bayes (NB) and Amazon Multi-domain Sentiment Dataset (English Language)
-
Corpora: Kitchen, Electronics, books, and DVD
-
Explanation: Introduced a joint approach named feature ensemble plus sample selection (SS-FE) that addresses both instance adaptation and labeling adaptation. The feature ensemble (FE) model learns a new labeling function by re-weighting features, and sample selection uses principal component analysis as an aid to FE for instance adaptation. Experiments on the Amazon multi-domain dataset indicated the effectiveness of SS-FE for both instance adaptation and labeling adaptation. Further work:
-
-
Tsakalidis et al. (2014)
-
Approach: Text-Based Representation (TBR), Feature-Based Representation (FBR), Lexicon-Based Representation (LBR), Combined Representation (CR), parts-of-speech (POS) tagging using the Stanford POS Tagger + TBR, Ensemble Classifier (hybrid classifier (HC) and lexicon-based classifier (LC)), n-grams, TF-IDF and Twitter test datasets (English Language)
-
Corpora: Stanford Twitter Dataset Test Set (STS), Obama Healthcare Reform (HCR), and Obama-McCain Debate (OMD)
-
Explanation: Introduced an ensemble classifier that is trained on one domain and adapts to the test domain without requiring additional ground truth from it before classifying a document. To address the domain dependence problem in cross-domain sentiment classification, the ensemble algorithm was applied to Twitter datasets, and the results were comparable to state-of-the-art approaches.
-
-
Zhang et al. (2015)
-
Approach: Transferring the polarity of features (TPF), Kullback–Leibler (KL) divergence, Cosine function, Linear classifier, Support vector machine (SVM), Rule-based classifier, bag-of-words, Co-occurrence matrix and Amazon Multi-domain Sentiment Dataset (English Language)
-
Corpora: Kitchen, Electronics, books, and DVD
-
Explanation: To address two issues in cross-domain sentiment classification, polarity divergence and feature divergence, the authors proposed a new approach named Transferring the Polarity of Features (TPF). TPF selects high-polarity, domain-independent features from the source and target domains and clusters them. These independent features then act as a bridge between the source and target domains through which the polarity of features is transferred. Experiments on the Amazon multi-domain dataset showed the effectiveness of this approach.
-
4.11 Distance-based technique
Distance-based techniques classify a new review according to its distance from the reviews in a labeled training corpus.
-
Bisio et al. (2013)
-
Approach: k-Nearest Neighbor (k-NN), bag-of-words, distance metric, distance-based predictive model, WordNet, Amazon Multi-domain Sentiment Dataset and TripAdvisor (English Language)
-
Corpora: Kitchen, Electronics, Books, DVD, and Hotel
-
Explanation: Applied a distance-based predictive model to opinion classification across heterogeneous domains. The framework consists of three steps. First, a distance metric and a training corpus are defined. Second, for a new review, the distance metric is used to identify the most similar reviews in the training corpus. Finally, the unlabeled review is tagged according to a majority-rule strategy over those reviews. The authors used two publicly available review datasets, the Amazon multi-domain dataset and hotel reviews from TripAdvisor, and evaluated performance in two different experiments.
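The three-step framework above can be sketched as a k-nearest-neighbour classifier with a majority vote. The survey does not specify which distance metric Bisio et al. used, so the cosine distance over bag-of-words counts, the value of `k`, the function names, and the toy hotel reviews are assumptions for illustration.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse bag-of-words dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def knn_majority(training, review, k=3, distance=cosine_distance):
    """Tag a review with the majority label of its k nearest training reviews."""
    # Step 1: the training corpus and the distance metric are given as arguments.
    # Step 2: rank the training reviews by their distance to the new review.
    nearest = sorted(training, key=lambda item: distance(item[0], review))[:k]
    # Step 3: majority rule over the labels of the k nearest reviews.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy labeled hotel reviews as bag-of-words counts (hypothetical data).
training = [
    ({"great": 1, "room": 1}, "pos"),
    ({"great": 1, "staff": 1}, "pos"),
    ({"dirty": 1, "room": 1}, "neg"),
    ({"rude": 1, "staff": 1}, "neg"),
    ({"great": 2, "view": 1}, "pos"),
]
print(knn_majority(training, {"great": 1, "view": 1}))             # pos
print(knn_majority(training, {"rude": 1, "room": 1, "dirty": 1}))  # neg
```

Because the metric is passed in as a parameter, the same procedure can be rerun with a different distance (for example one enriched with WordNet relations) without changing the voting step.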
-
4.12 Meta-classifier technique
The knowledge-enhanced meta-classifier technique combines the outputs of several base classifiers through meta-learning.
-
Franco-Salvador et al. (2015)
-
Approach: Knowledge-enhanced meta-learning (KE-Meta), Meta-learning (stacked generalization), bag-of-words classifier, word n-grams classifier (TF-IDF weighting and SVM classifier), lexical resource-based classifiers (SentiWordNet), vocabulary expansion-based classifier, Word Sense Disambiguation (WSD), BabelNet multilingual semantic network,Footnote 33 part-of-speech tagging and Amazon Multi-domain Sentiment Dataset (English Language)
-
Corpora: Kitchen, Electronics, Books, DVD
-
Explanation: For single- and cross-domain sentiment classification, the authors introduced a new approach named knowledge-enhanced meta-learning (KE-Meta) that combines different classifiers: a bag-of-words classifier, a word n-grams classifier, lexical resource-based classifiers, a vocabulary expansion-based classifier, and a Word Sense Disambiguation (WSD)-based classifier. To generate features for vocabulary expansion and WSD, they used the BabelNet multilingual semantic network. BabelNet (Navigli and Ponzetto 2012) is a multilingual encyclopedic dictionary and semantic network whose structure is similar to that of WordNet. In experiments on reviews from the Amazon multi-domain dataset, the proposed approach performed on par with or better than the state of the art in both single- and cross-domain polarity classification.
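KE-Meta combines its base classifiers through stacked generalization. The sketch below illustrates only the meta-level step, assuming each base classifier has already produced a polarity score per document and using a plain logistic-regression meta-learner trained by batch gradient descent. The survey does not state which meta-learner KE-Meta uses, so this choice, the function names, and the toy scores are illustrative. In proper stacking, the base-level scores used to train the meta-learner should come from held-out folds, so that the meta-learner does not learn to trust overfitted base classifiers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_learner(meta_features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression meta-classifier on base-classifier scores.

    meta_features: list of rows; row[j] is the score that base classifier j
                   (e.g. bag-of-words, n-gram, lexicon-based) gave a document.
    labels:        1 for positive, 0 for negative.
    Returns (weights, bias) learned by batch gradient descent on log-loss.
    """
    n, d = len(labels), len(meta_features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(meta_features, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def meta_predict(w, b, x):
    """Final polarity (1 = positive, 0 = negative) from base-classifier scores."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical scores in [-1, 1] from three base classifiers; by construction
# of this toy data, the third classifier carries no useful signal.
meta_X = [
    [0.9, 0.8, -0.5], [0.7, 0.6, 0.4], [0.8, 0.5, 0.1],
    [-0.9, -0.8, -0.5], [-0.7, -0.6, 0.4], [-0.8, -0.5, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_meta_learner(meta_X, labels)
print(meta_predict(w, b, [0.6, 0.7, -0.9]))  # 1
```

The learned weights express how much the meta-level trusts each base classifier, which is how a stacked ensemble can outperform any single member.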
-
The next section discusses the findings of this systematic survey.
5 Discussion
This study explains the tasks and sub-tasks of sentiment analysis together with possible techniques and approaches. The most common approaches to sentiment analysis tasks are machine learning, lexicon-based, and hybrid approaches. In this survey, the techniques, approaches, datasets, available tools, languages, etc. used for sentiment analysis tasks are presented in tabular form for better visualization and clearer understanding. The study shows that machine learning is the most popular approach among researchers in sentiment analysis, followed by lexicon-based, hybrid, deep learning, and other approaches, as represented in Fig. 17a. Among individual techniques, the support vector machine is the most popular, followed by dictionary-based techniques, Naïve Bayes, Maximum Entropy, Neural Networks, Decision Trees, and other techniques, as shown in Fig. 17b. This study focused on and explained in detail one sub-task of sentiment analysis, cross-domain opinion classification. Cross-domain opinion classification is a challenging research area, and the survey shows that no perfect solution is available yet. Table 14 shows the pros and cons of the baseline techniques in cross-domain opinion classification.
Fig. 18 shows the number of research articles considered from notable digital libraries and databases. The publicly available datasets and toolkits for sentiment analysis, along with resource links, are also described in this survey.
Table 15 briefly summarizes more than a hundred research articles in terms of their sentiment analysis attributes: algorithms/techniques, data scope and source, and language.
6 Challenges and future directions
We identified various challenges and future directions for sentiment analysis from the literature; some of them are as follows.
-
Availability of annotated corpora: Because annotated corpora are often unavailable, unsupervised and semi-supervised techniques have been adopted. However, achieving higher performance requires publicly available annotated corpora.
-
Pre-processing: In opinion mining systems, the data acquired from various sources are often unstructured, misspelled, and noisy. Pre-processing includes parsing, POS tagging, word segmentation, tokenization, etc., which take time to convert the data into structured form. Hence there is scope for developing tools or methods that automate the pre-processing stage.
-
Optimization: Unlike traditional feature extraction methods (typically designed for images), novel methods that optimally associate text features with their sentiment should be explored to improve optimization during processing.
-
Micro-blog opinions are short and often involve sarcasm and irony, which are language and culture dependent. Few studies have analyzed such statements, which calls for approaches grounded in cognitive intelligence. The same holds for phonetic multi-lingual sentiment classification.
-
Associated areas such as review spam detection, review usefulness, and opinion summarization also significantly influence the overall sentiment of a given opinion and hence require attention to improve the efficiency of opinion mining.
-
Cross-domain sentiment analysis is another niche area that has the potential to associate sentiment more strongly by understanding the various behavioral perspectives of users.
-
As this review indicates, research on cross-domain sentiment analysis is still very limited.
-
Little literature exists on social media code-mixed texts, and no research was found on code-mixed texts that are not based on English. This indicates a pressing need for research in this area.
-
Since creating a corpus is costly, transfer learning is another methodology that can be used for sentiment classification.
-
-
Deep learning is another methodology, often used by industrial researchers for its computational advantages, that can overcome the shortcomings of traditional lexicon-based, feature extraction, and classification methods.
7 Conclusion
Opinion analysis is one of the most interesting and active research areas in text mining and natural language processing. This article covers research articles, papers, and reports on sentiment analysis published in journals, conferences, and magazines over more than a decade. It covers the different levels of sentiment analysis and its tasks, such as subjectivity classification, degree of usefulness measurement, opinion summarization, sentiment lexicon creation, aspect selection, spam review detection, and opinion classification. Opinion classification is further divided into sub-tasks such as cross-domain, cross-lingual, and multi-lingual opinion classification, polarity extraction, etc. All tasks and sub-tasks of sentiment analysis are reviewed in terms of the approaches, techniques, and methodologies employed, the datasets exploited, the lexicons and corpora used, and the languages studied. Beyond neural networks, maximum entropy, Naïve Bayes, support vector machines, and lexicon-based approaches, this survey also covered other intelligent techniques used for specific tasks, for example conditional random fields for better feature extraction and rule mining techniques for finding common opinion words. This literature survey emphasizes the methodologies and approaches, publicly available pre-processing toolkits, review datasets and sources, sentiment lexicons, and languages used by the authors, for better visualization of the field of sentiment analysis. Along with sentiment analysis in general, we have discussed cross-domain opinion classification in detail, as it is one of the most challenging and interesting research areas.
We summarized more than a hundred research articles, along with open issues and challenges in sentiment analysis, which we hope will help new researchers. The survey shows that no perfect solution yet exists for cross-domain opinion classification.
Notes
References
Abbasi A, Chen H, Salem A (2008) Sentiment analysis in multiple languages: feature selection for opinion classification in web forums. ACM Trans Inf Syst 26(3):1–34. https://doi.org/10.1145/1361684.1361685
Abbasi A, France S, Zhang Z, Chen H (2011) Selecting attributes for sentiment classification using feature relation networks. IEEE Trans Knowl Data Eng 23(3):447–462
Abdelwahab O, Elmaghraby AS (2018) Deep learning based vs Markov chain based text generation for cross domain adaptation for sentiment classification. In: Proceedings of the IEEE international conference on information reuse and integration (IRI), pp 252–255. https://doi.org/10.1109/iri.2018.00046
Abdi A, Shamsuddin SM, Hasan S, Piran J (2019) Deep learning-based sentiment classification of evaluative text based on multi-feature fusion. Inf Process Manag 56(4):1245–1259. https://doi.org/10.1016/j.ipm.2019.02.018
Abdul-Mageed M, Diab M, Kübler S (2013) SAMAR: subjectivity and sentiment analysis for Arabic social media. Comput Speech Lang 28(1):20–37. https://doi.org/10.1016/j.csl.2013.03.001
Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau R (2011) Sentiment analysis of Twitter data. In: Proceedings of the workshop on languages in social media, pp 30–38
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases, pp 487–499. https://doi.org/10.1007/BF02948845
Algur SP, Patil AP, Hiremath PS, Shivashankar S (2010) Conceptual level similarity measure based review spam detection. In: Proceedings of the IEEE international conference on signal and image processing (ICSIP), pp 416–423
Al-Moslmi T, Omar N, Abdullah S, Albared M (2017) Approaches to cross-domain sentiment analysis: a systematic literature review. IEEE Access 5:16173–16192. https://doi.org/10.1109/ACCESS.2017.2690342
Aloufi S, Saddik AE (2013) Sentiment identification in football-specific tweets. IEEE Access 6:78609–78621. https://doi.org/10.1109/ACCESS.2018.2885117
Apache OpenNLP. https://opennlp.apache.org/. Accessed 7 May 2019
Araque O, Corcuera-Platas I, Sánchez-Rada JF, Iglesias CA (2017) Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Syst Appl 77:236–246. https://doi.org/10.1016/j.eswa.2017.02.002
Baccianella S, Esuli A, Sebastiani F (2008) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of the seventh international conference on language resources and evaluation, pp 2200–2204
Bagheri A, Saraee M, Jong FD (2013) Care more about customers: unsupervised domain-independent aspect detection for sentiment analysis of customer reviews. Knowl-Based Syst 52:201–213. https://doi.org/10.1016/j.knosys.2013.08.011
Bai X (2011) Predicting consumer sentiments from online text. Decis Support Syst 50(4):732–742. https://doi.org/10.1016/j.dss.2010.08.024
Balahur A, Hermida JM, Montoyo A (2012a) Building and exploiting EmotiNet: a knowledge base for emotion detection based on the appraisal theory model. IEEE Trans Affect Comput 3(1):88–101
Balahur A, Hermida JM, Montoyo A (2012b) Detecting implicit expressions of emotion in text: a comparative analysis. Decis Support Syst 53(4):742–753. https://doi.org/10.1016/j.dss.2012.05.024
Balqisnadiah (2016) Web 1.0 and Web 2.0 image—Google Search, Web content. https://www.google.com/search?q=Web+1.0+and+Web+2.0+image&rlz=1C1CHBD_enIN807IN807&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj6pYeG3d7iAhVEb30KHfxyDQEQ_AUIECgB&biw=1366&bih=657#imgrc=_. Accessed 10 June 2019
Banea C, Mihalcea R, Wiebe J (2008) Multilingual subjectivity analysis using machine translation. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 127–135
Banea C, Mihalcea R, Wiebe J (2013) Sense-level subjectivity in a multilingual setting. Comput Speech Lang 28(1):7–19. https://doi.org/10.1016/j.csl.2013.03.002
Banerjee S, Chua AYK (2014) Applauses in hotel reviews: genuine or deceptive? In: Proceedings of the science and information conference, pp 938–942
Basari ASH, Hussin B, Ananta IGP, Zeniarja J (2013) Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization. Procedia Eng 53:453–462. https://doi.org/10.1016/j.proeng.2013.02.059
Bell D, Koulouri T, Lauria S, Macredie RD, Sutton J (2014) Microblogging as a mechanism for human–robot interaction. Knowl-Based Syst 69:64–77. https://doi.org/10.1016/j.knosys.2014.05.009
Benamara F, Cesarano C, Picariello A, Reforgiato D, Subrahmanian V (2007) Sentiment analysis: adjectives and adverbs are better than adjectives alone. In: Proceedings of the international conference on weblogs and social media (ICWSM 2007), pp 203–206
Bird S, Klein E, Loper E (2009) Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media Inc, Newton
Bisio F, Gastaldo P, Peretti C, Zunino R, Cambria E (2013) Data intensive review mining for sentiment classification across heterogeneous domains. In: Proceedings of the IEEE/ACM international conference on advances in social networks analysis and mining, pp 1061–1067
Blitzer J, McDonald R, Pereira F (2006) Domain adaptation with structural correspondence learning. In: Proceedings of the 2006 conference on empirical methods in natural language processing, pp 120–128
Blitzer J, Dredze M, Pereira F (2007) Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics, pp 440–447
Boiy E, Moens M-F (2009) A machine learning approach to sentiment analysis in multilingual Web texts. Inf Retr 12(5):526–558. https://doi.org/10.1007/s10791-008-9070-z
Bollegala D, Mu T (2016) Cross-domain sentiment classification using sentiment sensitive embeddings. IEEE Trans Knowl Data Eng 28(2):398–410
Bollegala D, Weir D, Carroll J (2013) Cross-domain sentiment classification using a sentiment sensitive thesaurus. IEEE Trans Knowl Data Eng 25(8):1719–1731
Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock market. J Comput Sci 2(1):1–8. https://doi.org/10.1016/j.jocs.2010.12.007
Bosco C, Patti V, Bolioli A (2015) Developing corpora for sentiment analysis: the case of irony and senti-TUT. In: Proceedings of the international joint conference on artificial intelligence, pp 4158–4162
Bravo-Marquez F, Mendoza M, Poblete B (2014) Meta-level sentiment models for big social data analysis. Knowl-Based Syst 69:86–99
Brody S, Elhadad N (2010) An unsupervised aspect-sentiment model for online reviews. In: Proceedings of the human language technologies: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics, pp 804–812
Cambria E (2013) An introduction to concept-level sentiment analysis. In: Proceedings of the Mexican international conference on artificial intelligence. Springer, Berlin, pp 478–483. https://doi.org/10.1007/978-3-642-45111-9_41
Cambria E (2016) Affective computing and sentiment analysis. IEEE Intell Syst 31(2):102–107. https://doi.org/10.1109/MIS.2016.31
Cambria E, Speer R, Havasi C, Hussain A (2010) SenticNet: a publicly available semantic resource for opinion mining. In: Proceedings of the AAAI fall symposium: common-sense knowledge, pp 14–18
Cambria E, Havasi C, Hussain A (2012) SenticNet 2: a semantic and affective resource for opinion mining and sentiment analysis. In: Proceedings of the twenty-fifth international Florida artificial intelligence research society conference, pp 202–207
Cambria E, Schuller B, Xia Y, Havasi C (2013) New avenues in opinion mining and sentiment analysis. IEEE Intell Syst 28(2):15–21
Cambria E, Olsher D, Rajagopal D (2014) SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In: Proceedings of the twenty-eighth AAAI conference on artificial intelligence, pp 1515–1521
Cambria E, Gastaldo P, Bisio F, Zunino R (2015) An ELM-based model for affective analogical reasoning. Neurocomputing 149:443–455. https://doi.org/10.1016/j.neucom.2014.01.064
Cambria E, Poria S, Bajpai R, Schuller B (2016) SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives. In: Proceedings of the 26th international conference on computational linguistics (COLING 2016), pp 2666–2677
Cambria E, Poria S, Hazarika D, Kwok K (2018) SenticNet 5: discovering conceptual primitives for sentiment analysis by means of context embeddings. In: Proceedings of the 32nd AAAI conference on artificial intelligence, pp 1795–1802
Camp MVD, Bosch AVD (2012) The socialist network. Decis Support Syst 53(4):761–769. https://doi.org/10.1016/j.dss.2012.05.031
Carenini G, Ng R, Pauls A (2006) Multi-document summarization of evaluative text. In: Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics, pp 305–312
Chakraborty R, Bhavsar M, Dandapat SK, Chandra J (2019) Tweet summarization of news articles: an objective ordering-based perspective. IEEE Trans Comput Soc Syst 6(4):761–777. https://doi.org/10.1109/TCSS.2019.2926144
Chan SWK, Chong MWC (2017) Sentiment analysis in financial texts. Decis Support Syst 94:53–64. https://doi.org/10.1016/j.dss.2016.10.006
Chaturvedi I, Cambria E, Welsch RE, Herrera F (2018) Distinguishing between facts and opinions for sentiment analysis: survey and challenges. Inf Fusion 44:65–77. https://doi.org/10.1016/j.inffus.2017.12.006
Che W, Li Z, Liu T (2010) LTP: a Chinese language technology platform. In: Proceedings of the 23rd international conference on computational linguistics: demonstrations, pp 13–16
Chen CC, Tseng Y (2011) Quality evaluation of product reviews using an information quality framework. Decis Support Syst 50(4):755–768. https://doi.org/10.1016/j.dss.2010.08.023
Chen W, Lin S, Huang S, Chung Y, Chen K (2010) E-HowNet and automatic construction of a lexical ontology. In: Proceedings of the 23rd international conference on computational linguistics: demonstrations, pp 45–48
Chen L, Liu C, Chiu H (2011) A neural network based approach for sentiment classification in the blogosphere. J Inform 5(2):313–322. https://doi.org/10.1016/j.joi.2011.01.003
Chen L, Qi L, Wang F (2012) Comparison of feature-level learning methods for mining online consumer reviews. Expert Syst Appl 39(10):9588–9601. https://doi.org/10.1016/j.eswa.2012.02.158
Chen F, Ji R, Su J, Cao D, Gao Y (2018) Predicting microblog sentiments via weakly supervised multimodal deep learning. IEEE Trans Multimed 20(4):997–1007. https://doi.org/10.1109/TMM.2017.2757769
Cho H, Kim S, Lee J, Lee J (2014) Data-driven integration of multiple sentiment dictionaries for lexicon-based sentiment classification of product reviews. Knowl-Based Syst 71:61–71. https://doi.org/10.1016/j.knosys.2014.06.001
Costa H, Merschmann LHC, Barth F, Benevenuto F (2014) Pollution, bad-mouthing, and local marketing: the underground of location-based social networks. Inf Sci 279:123–137. https://doi.org/10.1016/j.ins.2014.03.108
Coussement K, Poel DVD (2009) Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers. Expert Syst Appl 36(3):6127–6134. https://doi.org/10.1016/j.eswa.2008.07.021
Cruz FL, Troyano JA, Enríquez F, Ortega FJ, Vallejo CG (2010) A knowledge-rich approach to feature-based opinion extraction from product reviews. In: Proceedings of the 2nd international workshop on Search and mining user-generated contents, pp 13–20
Cruz FL, Troyano JA, Enríquez F, Ortega FJ, Vallejo CG (2013) ‘Long autonomy or long delay?’ The importance of domain in opinion mining. Expert Syst Appl 40:3174–3184. https://doi.org/10.1016/j.eswa.2012.12.031
Dang Y, Zhang Y, Chen H (2010) A lexicon-enhanced method for sentiment classification: an experiment on online product reviews. IEEE Intell Syst 25(4):46–53
Dasgupta S, Ng V (2009) Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, pp 701–709
Dashtipour K, Poria S, Hussain A, Cambria E, Hawalah AYA, Gelbukh A, Zhou Q (2016) Multilingual sentiment analysis: state of the art and independent comparison of techniques. Cogn Comput 8(4):757–771. https://doi.org/10.1007/s12559-016-9415-7
Demirtas E (2013) Cross-lingual sentiment analysis with machine translation, utility of training corpora and sentiment lexica. Master dissertation, University of Technology
Deng Z, Luo K, Yu H (2014) A study of supervised term weighting scheme for sentiment analysis. Expert Syst Appl 41(7):3506–3513. https://doi.org/10.1016/j.eswa.2013.10.056
Derczynski L, Ritter A, Clark S, Bontcheva K (2013) Twitter part-of-speech tagging for all: overcoming sparse and noisy data. In: Proceedings of the international conference recent advances in natural language processing, pp 198–206
Deshmukh JS, Tripathy AK (2018) Entropy based classifier for cross-domain opinion mining. Appl Comput Inform 14(1):55–64. https://doi.org/10.1016/j.aci.2017.03.001
Desmet B, Hoste V (2013) Emotion detection in suicide notes. Expert Syst Appl 40(16):6351–6358. https://doi.org/10.1016/j.eswa.2013.05.050
Dey A, Jenamani M, Thakkar JJ (2018) Senti-N-Gram: an n-gram lexicon for sentiment analysis. Expert Syst Appl 103:92–105. https://doi.org/10.1016/j.eswa.2018.03.004
Ding X, Liu B, Zhang L (2009) Entity discovery and assignment for opinion mining applications. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1125–1134
Du J, Xu H, Huang X (2014) Box office prediction based on microblog. Expert Syst Appl 41(4):1680–1689. https://doi.org/10.1016/j.eswa.2013.08.065
Duh K, Fujino A, Nagata M (2011) Is machine translation ripe for cross-lingual sentiment classification? In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: short papers, pp 429–433
Duric A, Song F (2012) Feature selection for sentiment analysis based on content and syntax models. Decis Support Syst 53(4):704–711. https://doi.org/10.1016/j.dss.2012.05.023
Eirinaki M, Pisal S, Singh J (2012) Feature-based opinion mining and ranking. J Comput Syst Sci 78(4):1175–1184. https://doi.org/10.1016/j.jcss.2011.10.007
Fan T, Chang C (2011) Blogger-centric contextual advertising. Expert Syst Appl 38:2010–2012. https://doi.org/10.1016/j.eswa.2010.07.105
Fang Y, Tan H, Zhang J (2018) Multi-strategy sentiment analysis of consumer reviews based on semantic fuzziness. IEEE Access 6:20625–20631. https://doi.org/10.1109/ACCESS.2018.2820025
Farra N, Challita E, Assi RA, Hajj H (2010) Sentence-level and document-level sentiment mining for Arabic texts. In: Proceedings of the IEEE international conference on data mining workshops (IEEE Computer Society), pp 1114–1119. https://doi.org/10.1109/ICDMW.2010.95
Feizollah A, Ainin S, Anuar NB, Abdullah ANB, Hazim M (2019) Halal products on Twitter: data extraction and sentiment analysis using stack of deep learning algorithms. IEEE Access 7:83354–83362. https://doi.org/10.1109/ACCESS.2019.2923275
Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89
Franco-Salvador M, Cruz FL, Troyano JA, Rosso P (2015) Cross-domain polarity classification using a knowledge-enhanced meta-classifier. Knowl-Based Syst 86:46–56. https://doi.org/10.1016/j.knosys.2015.05.020
Fu X, Yang J, Li J, Fang M, Wang H (2018) Lexicon-enhanced LSTM with attention for general sentiment analysis. IEEE Access Spec Sect Artif Intell Cogn Comput Commun Netw 6:71884–71891. https://doi.org/10.1109/ACCESS.2018.2878425
Fu X, Zhang S, Chen J, Ouyang T, Wu J (2019) A sentiment-aware trading volume prediction model for P2P market using LSTM. IEEE Access 7:81934–81944. https://doi.org/10.1109/ACCESS.2019.2923637
Fusilier DH, Montes-y-Gómez M, Rosso P, Cabrera RG (2015) Detecting positive and negative deceptive opinions using PU-learning. Inf Process Manag 51(4):433–443. https://doi.org/10.1016/j.ipm.2014.11.001
García-Moya L, Anaya-Sánchez H, Berlanga-Llavori R (2013) Retrieving product features and opinions from customer reviews. IEEE Intell Syst 3:19–27
Gerani S, Mehdad Y, Carenini G, Ng RT, Nejat B (2014) Abstractive summarization of product reviews using discourse structure. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1602–1613
Geva T, Zahavi J (2014) Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news. Decis Support Syst 57:212–223. https://doi.org/10.1016/j.dss.2013.09.013
Ghiassi M, Skinner J, Zimbra D (2013) Twitter brand sentiment analysis: a hybrid system using n-gram analysis and dynamic artificial neural network. Expert Syst Appl 40(16):6266–6282. https://doi.org/10.1016/j.eswa.2013.05.057
Ghose A, Ipeirotis PG (2011) Estimating the helpfulness and economic impact of product reviews: mining text and reviewer characteristics. IEEE Trans Knowl Data Eng 23(10):1498–1512
Ghulam H, Zeng F, Li W, Xiao Y (2019) Deep learning-based sentiment analysis for roman Urdu text. Procedia Comput Sci 147:131–135. https://doi.org/10.1016/j.procs.2019.01.202
Gimpel K et al (2011) Part-of-speech tagging for Twitter: annotation, features, and experiments. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: short papers, pp 42–47
Gindl S, Weichselbraun A, Scharl A (2013) Rule-based opinion target and aspect extraction to acquire affective knowledge. In: Proceedings of the 22nd international conference on World Wide Web (IW3C2), pp 557–563
Glorot X, Bordes A, Bengio Y (2011) Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Proceedings of the 28th international conference on machine learning, pp 513–520
Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. CS224N project report, Stanford University 1(12), pp 1–6
Gruber TR (1995) Toward principles for the design of ontologies used for knowledge sharing. Int J Hum Comput Stud 43:907–928
Gui L, Xu R, Lu Q, Xu J, Xu J, Liu B, Wang X (2014) Cross-lingual opinion analysis via negative transfer detection. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (short papers), pp 860–865
Hai Z, Chang K, Kim J, Yang CC (2014) Identifying features in opinion mining via intrinsic and extrinsic domain relevance. IEEE Trans Knowl Data Eng 26(3):623–634
Harakawa R, Ogawa T, Haseyama M (2017) Extracting hierarchical structure of web video groups based on sentiment-aware signed network analysis. IEEE Access 5:16963–16973. https://doi.org/10.1109/ACCESS.2017.2741098
Hassan A, Radev D (2010) Identifying text polarity using random walks. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, pp 395–403
He Y, Lin C, Alani H (2011) Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies, pp 123–131
He Y, Lin C, Gao W, Wong KF (2013) Dynamic joint sentiment-topic model. ACM Trans Intell Syst Technol 5(1):1–21. https://doi.org/10.1145/2542182.2542188
Hiroshi K, Tetsuya N, Hideo W (2004) Deeper sentiment analysis using machine translation technology. In: Proceedings of the 20th international conference on computational linguistics (COLING’04), pp 494–500
Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 168–177
Hu N, Bose I, Gao Y, Liu L (2011a) Manipulation in digital word-of-mouth: a reality check for book reviews. Decis Support Syst 50(3):627–635. https://doi.org/10.1016/j.dss.2010.08.013
Hu N, Liu L, Sambamurthy V (2011b) Fraud detection in online consumer reviews. Decis Support Syst 50(3):614–626. https://doi.org/10.1016/j.dss.2010.08.012
Hu Y, Chen Y, Chou H (2017) Opinion mining from online hotel reviews—a text summarization approach. Inf Process Manag 53:436–449
Huang AH, Yen DC (2013) Predicting the helpfulness of online reviews—a replication. Int J Hum-Comput Interact 29:129–138. https://doi.org/10.1080/10447318.2012.694791
Hung C, Lin H (2013) Using objective words in SentiWordNet to improve word-of-mouth sentiment classification. IEEE Intell Syst 2:47–54
Hussein DMEDM (2016) A survey on sentiment analysis challenges. J King Saud Univ Eng Sci 30(4):330–338. https://doi.org/10.1016/j.jksues.2016.04.002
Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent Twitter sentiment classification. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 151–160
Jimenez SM, Martin-Valdivia MT, Molina-Gonzalez MD, Urena-Lopez LA (2016) Domain adaptation of polarity lexicon combining term frequency and bootstrapping. In: Proceedings of the 7th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp 137–146
Jindal N, Liu B (2008) Opinion spam and analysis. In: Proceedings of the 1st international conference on web search and data mining, pp 219–230
Jo Y, Oh A (2011) Aspect and sentiment unification model for online review analysis. In: Proceedings of the fourth ACM international conference on Web search and data mining. ACM, pp 815–824
Jung JJ (2012) Online named entity recognition method for micro texts in social networking services: a case study of Twitter. Expert Syst Appl 39(9):8066–8070. https://doi.org/10.1016/j.eswa.2012.01.136
Justo R, Corcoran T, Lukin SM, Walker M, Torres MI (2014) Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web. Knowl-Based Syst 69:124–133. https://doi.org/10.1016/j.knosys.2014.05.021
Kanayama H, Nasukawa T (2014) Fully automatic lexicon expansion for domain-oriented sentiment analysis. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP 2006) Association for Computational Linguistics, pp 355–363. https://doi.org/10.3115/1610075.1610125
Kang D, Park Y (2014) Review-based measurement of customer satisfaction in mobile service: sentiment analysis and VIKOR approach. Expert Syst Appl 41(4):1041–1050. https://doi.org/10.1016/j.eswa.2013.07.101
Kang H, Yoo SJ, Han D (2012) Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Syst Appl 39(5):6000–6010. https://doi.org/10.1016/j.eswa.2011.11.107
Kennedy A, Inkpen D (2006) Sentiment classification of movie reviews using contextual valence shifters. Comput Intell 22:100–125
Kevin Atkinson (2006) GNU Aspell 0.60.4. http://aspell.net/. Accessed 5 May 2019
Khan FH, Bashir S, Qamar U (2014) TOM: Twitter opinion mining framework using hybrid classification scheme. Decis Support Syst 57:245–257. https://doi.org/10.1016/j.dss.2013.09.004
Kim S, Hovy E (2004) Determining the sentiment of opinions. In: Proceedings of the 20th international conference on computational linguistics, pp 1367–1373
Kim S, Zhang J, Chen Z, Oh A, Liu S (2013) A hierarchical aspect-sentiment model for online reviews. In: Proceedings of the twenty-seventh AAAI conference on artificial intelligence, pp 526–533
Kong L, Schneider N, Swayamdipta S, Bhatia A, Dyer C, Smith NA (2014) A dependency parser for Tweets. In: Proceedings of the conference on empirical methods in natural language processing, pp 1001–1012
Kontopoulos E, Berberidis C, Dergiades T, Bassiliades N (2013) Ontology-based sentiment analysis of Twitter posts. Expert Syst Appl 40(10):4065–4074. https://doi.org/10.1016/j.eswa.2013.01.001
Kouloumpis E, Wilson T, Moore J (2011) Twitter sentiment analysis: the good the bad and the OMG! In: Proceedings of the fifth international AAAI conference on weblogs and social media, pp 538–541
Krishnamoorthy S (2015) Linguistic features for review helpfulness prediction. Expert Syst Appl 42(7):3751–3759. https://doi.org/10.1016/j.eswa.2014.12.044
Ku L, Chen H (2007) Mining opinions from the Web: beyond relevance retrieval. J Am Soc Inform Sci Technol 58(12):1838–1850. https://doi.org/10.1002/asi
Lambert P (2015) Aspect-level cross-lingual sentiment classification with constrained SMT. In: Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing, pp 781–787
Lane PCR, Clarke D, Hender P (2012) On developing robust models for favourability analysis: model choice, feature sets and imbalanced data. Decis Support Syst 53(4):712–718. https://doi.org/10.1016/j.dss.2012.05.028
Lang K (1995) NewsWeeder: learning to filter Netnews. In: Proceedings of the twelfth international conference on machine learning. Morgan Kaufmann Publishers, pp 331–339. https://doi.org/10.1016/B978-1-55860-377-6.50048-7
Lau RYK, Li C, Liao SSY (2014) Social analytics: learning fuzzy product ontologies for aspect-oriented sentiment analysis. Decis Support Syst 65:80–94. https://doi.org/10.1016/j.dss.2014.05.005
Lazaridou A, Titov I, Sporleder C (2013) A Bayesian model for joint unsupervised induction of sentiment, aspect and discourse representations. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics, pp 1630–1639
Lee S, Choeh JY (2014) Predicting the helpfulness of online reviews using multilayer perceptron neural networks. Expert Syst Appl 41(6):3041–3046. https://doi.org/10.1016/j.eswa.2013.10.034
Lee P, Hu Y, Lu K (2018) Assessing the helpfulness of online hotel reviews: a classification-based approach. Telemat Inform 35:436–445. https://doi.org/10.1016/j.tele.2018.01.001
Lerman K, Blair-Goldensohn S, McDonald R (2009) Sentiment summarization: evaluating and learning user preferences. In: Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics, pp 514–522
Li Y, Li T (2013) Deriving market intelligence from microblogs. Decis Support Syst 55(1):206–217. https://doi.org/10.1016/j.dss.2013.01.023
Li Y, Shiu Y (2012) A diffusion mechanism for social advertising over microblogs. Decis Support Syst 54(1):9–22. https://doi.org/10.1016/j.dss.2012.02.012
Li ST, Tsai FC (2013) A fuzzy conceptualization model for text mining with application in opinion polarity classification. Knowl-Based Syst 39:23–33. https://doi.org/10.1016/j.knosys.2012.10.005
Li N, Wu DD (2010) Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decis Support Syst 48(2):354–368. https://doi.org/10.1016/j.dss.2009.09.003
Li W, Xu H (2013) Text-based emotion classification using emotion cause extraction. Expert Syst Appl 41:1742–1749. https://doi.org/10.1016/j.eswa.2013.08.073
Li F, Huang M, Zhu X (2007) Sentiment analysis with global topics and local dependency. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence, pp 1371–1376
Li F, Huang M, Yang Y, Zhu X (2011) Learning to identify review spam. In: Proceedings of the twenty-second international joint conference on artificial intelligence, pp 2488–2493
Li S, Guan Z, Tang L-Y, Chen Z (2012) Exploiting consumer reviews for product feature ranking. J Comput Sci Technol 27(3):635–649. https://doi.org/10.1007/s11390-012-1250-z
Li S, Xue Y, Wang Z, Zhou G (2013) Active learning for cross-domain sentiment classification. In: Proceedings of the twenty-third international joint conference on artificial intelligence, pp 2127–2133
Li X, Xie H, Chen L, Wang J, Deng X (2014) News impact on stock price return via sentiment analysis. Knowl-Based Syst 69:14–23. https://doi.org/10.1016/j.knosys.2014.04.022
Li H, Chen Z, Mukherjee A, Liu B, Shao J (2015a) Analyzing and detecting opinion spam on a large-scale dataset via temporal and spatial patterns. In: Proceedings of the ninth international AAAI conference on web and social media, pp 634–637
Li S, Zhou L, Li Y (2015b) Improving aspect extraction by augmenting a frequency-based method with web-based similarity measures. Inf Process Manag 51(1):58–67. https://doi.org/10.1016/j.ipm.2014.08.005
Li Y, Pan Q, Yang T, Wang S, Tang J, Cambria E (2017) Learning word representations for sentiment analysis. Cogn Comput 9(6):843–851. https://doi.org/10.1007/s12559-017-9492-2
Liang J, Zhang K, Zhou X, Hu Y, Tan J, Bai S (2016) Leveraging latent sentiment constraint in probabilistic matrix factorization for cross-domain sentiment classification. Procedia Comput Sci 80:366–375. https://doi.org/10.1016/j.procs.2016.05.353
Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM conference on information and knowledge management, pp 375–384
Lin C, He Y, Everson R, Ruger S (2012) Weakly supervised joint sentiment-topic detection from text. IEEE Trans Knowl Data Eng 24(6):1134–1145
Lin C, Lee Y, Yu C, Chen H (2014) Exploring ensemble of models in taxonomy-based cross-domain sentiment classification. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management—CIKM’14, pp 1279–1288
Liu B (2012) Sentiment analysis and opinion mining. Morgan and Claypool publishers
Liu L, Nie X, Wang H (2012) Toward a fuzzy domain sentiment ontology tree for sentiment analysis. In: Proceedings of the 5th international congress on image and signal processing (CISP 2012), pp 1620–1624
Liu H, He J, Wang T, Song W, Du X (2013a) Combining user preferences and user opinions for accurate recommendation. Electron Commer Res Appl 12(1):14–23. https://doi.org/10.1016/j.elerap.2012.05.002
Liu Y, Jin J, Ji P, Harding JA, Fung RYK (2013b) Identifying helpful online reviews: a product designer’s perspective. Comput Aided Des 45(2):180–194. https://doi.org/10.1016/j.cad.2012.07.008
Lo SL, Cambria E, Chiong R, Cornforth D (2017) Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artif Intell Rev 48(4):499–527. https://doi.org/10.1007/s10462-016-9508-4
Long M, Wang J, Cao Y, Sun J, Yu PS (2016) Deep learning of transferable representation for scalable domain adaptation. IEEE Trans Knowl Data Eng 28(8):2027–2040. https://doi.org/10.1109/TKDE.2016.2554549
Lu Y, Kong X, Quan X, Liu W, Xu Y (2010) Exploring the sentiment strength of user reviews. In: Proceedings of the international conference on Web-age information management (WAIM 2010), pp 471–482
Lubis FF, Rosmansyah Y, Supangkat SH (2017) Improving course review helpfulness prediction through sentiment analysis. In: Proceedings of the international conference on ICT for smart society (ICISS), pp 1–5. https://doi.org/10.1109/ICTSS.2017.8288877
Ma Y, Peng H, Cambria E (2018) Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In: Proceedings of the 32nd AAAI conference on artificial intelligence AAAI 2018, pp 5876–5883
Maas AL et al (2014) Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 142–150
Majumder N, Poria S, Peng H, Chhaya N, Cambria E, Gelbukh A (2019) Sentiment and sarcasm classification with multitask learning. IEEE Intell Syst 34(3):38–43
Maks I, Vossen P (2012) A lexicon model for deep sentiment analysis and opinion mining applications. Decis Support Syst 53(4):680–688. https://doi.org/10.1016/j.dss.2012.05.025
Manning CD, Surdeanu M, Bauer J, Finkel J, Bethard SJ, McClosky D (2014) The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics: system demonstrations, pp 55–60
Manshu T, Bing W (2019) Adding prior knowledge in hierarchical attention neural network for cross-domain sentiment classification. IEEE Access 7:32578–32588. https://doi.org/10.1109/ACCESS.2019.2901929
Marcacini RM, Rossi RG, Matsuno IP, Rezende SO (2018) Cross-domain aspect extraction for sentiment analysis: a transductive learning approach. Decis Support Syst 114:70–80. https://doi.org/10.1016/j.dss.2018.08.009
Martín-Valdivia M-T, Martínez-Cámara E, Perea-Ortega JM, Ureña-López LA (2013) Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Syst Appl 40:3934–3942. https://doi.org/10.1016/j.eswa.2012.12.084
McAuley J, Leskovec J (2013) Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM conference on recommender systems, pp 165–172. https://doi.org/10.1145/2507157.2507163
McAuley J, Pandey R, Leskovec J (2015) Inferring networks of substitutable and complementary products. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794. https://doi.org/10.1145/2783258.2783381
McDonald R, Hannan K, Neylon T, Wells M, Reynar J (2007) Structured models for fine-to-coarse sentiment analysis. In: Proceedings of the 45th annual meeting of the Association of Computational Linguistics, pp 432–439
Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5(4):1093–1113
Miao Q, Li Q, Dai R (2009) AMAZING: a sentiment mining and retrieval system. Expert Syst Appl 36(3):7192–7198. https://doi.org/10.1016/j.eswa.2008.09.035
Mihalcea R, Banea C, Wiebe J (2007) Learning multilingual subjective language via cross-lingual projections. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics, pp 976–983
Min H, Park JC (2012) Identifying helpful reviews based on customer’s mentions about experiences. Expert Syst Appl 39(15):11830–11838. https://doi.org/10.1016/j.eswa.2012.01.116
Moghaddam S, Ester M (2013) The FLDA model for aspect-based opinion mining: addressing the cold start problem. In: Proceedings of the international World Wide Web conferences steering committee, pp 909–918
Moghaddam S, Jamali M, Ester M (2012) ETF: extended tensor factorization model for personalizing prediction of review helpfulness. In: Proceedings of the 5th ACM international conference on web search and data mining, pp 163–172
Mohammad SM (2012) From once upon a time to happily ever after: tracking emotions in mail and books. Decis Support Syst 53(4):730–741. https://doi.org/10.1016/j.dss.2012.05.030
Molina-González MD, Martínez-Cámara E, Martín-Valdivia M-T, Perea-Ortega JM (2013) Semantic orientation for polarity classification in Spanish reviews. Expert Syst Appl 40(18):7250–7257. https://doi.org/10.1016/j.eswa.2013.06.076
Montejo-Ráez A, Díaz-Galiano MC, Ureña-López LA (2014) Crowd explicit sentiment analysis. Knowl-Based Syst 69:134–139. https://doi.org/10.1016/j.knosys.2014.05.007
Montoyo A, Martínez-Barco P, Balahur A (2012) Subjectivity and sentiment analysis: an overview of the current state of the area and envisaged developments. Decis Support Syst 53(4):675–679. https://doi.org/10.1016/j.dss.2012.05.022
Moraes R, Valiati JF, Neto WPG (2013) Document-level sentiment classification: an empirical comparison between SVM and ANN. Expert Syst Appl 40(2):621–633. https://doi.org/10.1016/j.eswa.2012.07.059
Moreo A, Romero M, Castro JL, Zurita JM (2012) Lexicon-based comments-oriented news sentiment analyzer system. Expert Syst Appl 39(10):9166–9180. https://doi.org/10.1016/j.eswa.2012.02.057
Mostafa MM (2013) More than words: social networks text mining for consumer brand sentiments. Expert Syst Appl 40(10):4241–4251. https://doi.org/10.1016/j.eswa.2013.01.019
Mudambi SM, Schuff D (2010) What makes a helpful online review? A study of customer reviews on Amazon.com. MIS Q 34(1):185–200. https://doi.org/10.2307/20721420
Mukherjee S, Joshi S (2013) Sentiment aggregation using ConceptNet ontology. In: Proceedings of the sixth international joint conference on natural language processing, pp 570–578
Mukherjee S, Joshi S (2014) Author-specific sentiment aggregation for polarity prediction of reviews. In: Proceedings of the 9th edition of the language resources and evaluation conference (LREC 2014), pp 3092–3099
Mukherjee A, Liu B, Glance N (2012) Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st international conference on World Wide Web (IW3C2), pp 191–200
Mukherjee A, Kumar A, Liu B, Wang J, Hsu M, Castellanos M (2013) Spotting opinion spammers using behavioral footprints. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 632–640
Mullen T, Collier N (2004) Sentiment analysis using support vector machines with diverse information sources. In: Proceedings of the 9th conference on empirical methods in natural language processing (EMNLP-04), pp 412–418
Nakayama Y, Fujii A (2015) Extracting condition-opinion relations toward fine-grained opinion mining. In: Proceedings of the conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 622–631
Narayanan R, Liu B, Choudhary A (2009) Sentiment analysis of conditional sentences. In: Proceedings of the conference on empirical methods in natural language processing, pp 180–189
Nassirtoussi AK, Aghabozorgi S, Wah TY, Ngo DCL (2015) Text mining of news-headlines for FOREX market prediction: a multi-layer dimension reduction algorithm with semantics and sentiment. Expert Syst Appl 42:306–324
Nasukawa T, Yi J (2003) Sentiment analysis capturing favorability using natural language processing. In: Proceedings of the 2nd international conference on knowledge capture. ACM, pp 70–77. https://doi.org/10.1145/945645.945658
Navigli R, Ponzetto SP (2012) BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif Intell 193:217–250. https://doi.org/10.1016/j.artint.2012.07.001
Neri F, Aliprandi C, Capeci F, Cuadros M, By T (2012) Sentiment analysis on social media. In: Proceedings of the 2012 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM 2012), pp 951–958. https://doi.org/10.1109/ASONAM.2012.164
Neviarouskaya A, Prendinger H, Ishizuka M (2011) SentiFul: a lexicon for sentiment analysis. IEEE Trans Affect Comput 2(1):22–36
Ngo-Ye TL, Sinha AP (2014) The influence of reviewer engagement characteristics on online review helpfulness: a text regression model. Decis Support Syst 61(1):47–58. https://doi.org/10.1016/j.dss.2014.01.011
Nguyen HT, Le Nguyen M (2018) Multilingual opinion mining on YouTube—a convolutional N-gram BiLSTM word embedding. Inf Process Manag 54:451–462. https://doi.org/10.1016/j.ipm.2018.02.001
Nielsen FA (2011) A new ANEW: evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903
Nishikawa H, Hasegawa T, Matsuo Y, Kikui G (2010) Opinion summarization with integer linear programming formulation for sentence extraction and ordering. In: Proceedings of the 23rd international conference on computational linguistics, pp 910–918
Nozza D, Fersini E, Messina E (2016) Deep learning and ensemble methods for domain adaptation. In: Proceedings of the IEEE 28th international conference on tools with artificial intelligence (ICTAI), pp 184–189. https://doi.org/10.1109/ICTAI.2016.0037
O’Connor B, Krieger M, Ahn D (2010) TweetMotif: exploratory search and topic summarization for Twitter. In: Proceedings of the fourth international AAAI conference on weblogs and social media, pp 384–385
O’Leary DE (2011) Blog mining-review and extensions: “from each according to his opinion”. Decis Support Syst 51(4):821–830. https://doi.org/10.1016/j.dss.2011.01.016
Ohana B, Delany SJ, Tierney B (2012) A case-based approach to cross-domain sentiment classification. In: Proceedings of the international conference on case-based reasoning, pp 284–296
Ortigosa A, Martín JM, Carro RM (2013) Sentiment analysis in Facebook and its application to e-learning. Comput Hum Behav 31:527–541. https://doi.org/10.1016/j.chb.2013.05.024
Ott M, Choi Y, Cardie C, Hancock JT (2011) Finding deceptive opinion spam by any stretch of the imagination. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 309–319
Ott M, Cardie C, Hancock JT (2013) Negative deceptive opinion spam. In: Proceedings of the NAACL-HLT. Association for Computational Linguistics, pp 497–501
Pan SJ, Ni X, Sun J, Yang Q, Chen Z (2010) Cross-domain sentiment classification via spectral feature alignment. In: Proceedings of the 19th international conference on World Wide Web—WWW’10, pp 751–760
Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting on Association for Computational Linguistics, pp 271–278
Pang B, Lee L (2005) Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. arXiv preprint arXiv:cs/0506075
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135
Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on Empirical methods in natural language processing, vol 10, pp 79–86
Patodkar VN, Sheikh IR (2016) Twitter as a corpus for sentiment analysis and opinion mining. Int J Adv Res Comput Commun Eng 5(12):320–322. https://doi.org/10.17148/IJARCCE.2016.51274
Peñalver-Martinez I et al (2014) Feature-based opinion mining through ontologies. Expert Syst Appl 41(13):5995–6008. https://doi.org/10.1016/j.eswa.2014.03.022
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. University of Texas at Austin, Austin
Pessutto LRC, Vargas DS, Moreira VP (2019) Multilingual aspect clustering for sentiment analysis. Knowl-Based Syst 192:105339. https://doi.org/10.1016/j.knosys.2019.105339
Ponomareva N, Thelwall M (2012) Biographies or blenders: which resource is best for cross-domain sentiment analysis? In: Proceedings of the international conference on intelligent text processing and computational linguistics, pp 488–499
Ponomareva N, Thelwall M (2012) Do neighbours help? An exploration of graph-based algorithms for cross-domain sentiment classification. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 655–665
Ponomareva N, Thelwall M (2013) Semi-supervised vs. cross-domain graphs for sentiment analysis. In: Proceedings of recent advances in natural language processing, pp 571–578
Popescu A, Etzioni O (2005) Extracting product features and opinions from reviews. In: Proceedings of human language technology conference and conference on empirical methods in natural language processing (HLT/EMNLP), pp 339–346
Popescu O, Strapparava C (2014) Time corpora: epochs, opinions and changes. Knowl-Based Syst 69:3–13. https://doi.org/10.1016/j.knosys.2014.04.029
Poria S, Gelbukh A, Hussain A, Howard N, Das D, Bandyopadhyay S (2013) Enhanced SenticNet with affective labels for concept-based opinion mining. IEEE Intell Syst 28(2):31–38
Poria S, Cambria E, Winterstein G, Huang G (2014a) Sentic patterns: dependency-based rules for concept-level sentiment analysis. Knowl-Based Syst 69(1):45–63. https://doi.org/10.1016/j.knosys.2014.05.005
Poria S, Gelbukh A, Cambria E, Hussain A, Huang G (2014b) EmoSenticSpace: a novel framework for affective common-sense reasoning. Knowl-Based Syst 69:108–123. https://doi.org/10.1016/j.knosys.2014.06.011
Poria S, Cambria E, Gelbukh A (2016a) Aspect extraction for opinion mining with a deep convolutional neural network. Knowl-Based Syst 108:42–49. https://doi.org/10.1016/j.knosys.2016.06.009
Poria S, Cambria E, Hazarika D, Vij P (2016) A deeper look into sarcastic tweets using deep convolutional neural networks. In: Proceedings of the 26th international conference on computational linguistics (COLING 2016), pp 1601–1612
Porter MF (2001) Snowball: a language for stemming algorithms. http://snowball.tartarus.org/texts/introduction.html. Accessed 5 May 2019
Prabowo R, Thelwall M (2009) Sentiment analysis: a combined approach. J Informetr 3:143–157. https://doi.org/10.1016/j.joi.2009.01.003
Ptáček T, Habernal I, Hong J (2014) Sarcasm detection on Czech and English Twitter. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, pp 213–223
Purnawirawan N, Pelsmacker PD, Dens N (2012) Balance and sequence in online reviews: how perceived usefulness affects attitudes and intentions. J Interact Mark 26(4):244–255. https://doi.org/10.1016/j.intmar.2012.04.002
Qiu G, Liu B, Bu J, Chen C (2009) Expanding domain sentiment lexicon through double propagation. In: Proceedings of the 21st international joint conference on artificial intelligence, pp 1199–1204
Qiu G, He X, Zhang F, Shi Y, Bu J, Chen C (2010) DASA: dissatisfaction-oriented advertising based on sentiment analysis. Expert Syst Appl 37(9):6182–6191. https://doi.org/10.1016/j.eswa.2010.02.109
Qiu L, Rui H, Whinston A (2013a) Social network-embedded prediction markets: the effects of information acquisition and communication on predictions. Decis Support Syst 55(4):978–987. https://doi.org/10.1016/j.dss.2013.01.007
Qiu X, Zhang Q, Huang X (2013) FudanNLP: a toolkit for Chinese natural language processing. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics, pp 49–54
Quan C, Ren F (2014) Unsupervised product feature extraction for feature-oriented opinion determination. Inf Sci 272:16–28. https://doi.org/10.1016/j.ins.2014.02.063
Rabelo JCB, Prudêncio RBC, Barros FA (2012) Using link structure to infer opinions in social networks. In: Proceedings of the IEEE international conference on systems, man, and cybernetics (SMC), pp 681–685
Racherla P, Friske W (2012) Perceived ‘usefulness’ of online consumer reviews: an exploratory investigation across three services categories. Electron Commer Res Appl 11(6):548–559. https://doi.org/10.1016/j.elerap.2012.06.003
Radev DR et al (2003) Evaluation challenges in large-scale document summarization. In: Proceedings of the 41st annual meeting on Association for Computational Linguistics, pp 375–382
Rastogi A, Mehrotra M (2018) Impact of behavioral and textual features on opinion spam detection. In: Proceedings of the second international conference on intelligent computing and control systems (ICICCS 2018) IEEE, pp 852–857
Ravi K, Ravi V (2015) A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl-Based Syst 89:14–46. https://doi.org/10.1016/j.knosys.2015.06.015
Rehurek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks, pp 45–50
Remus R (2012) Domain adaptation using domain similarity- and domain complexity-based instance selection for cross-domain sentiment analysis. In: Proceedings of the IEEE 12th international conference on data mining workshops. IEEE Computer Society, pp 717–723. https://doi.org/10.1109/ICDMW.2012.46
Reyes A, Rosso P (2012) Making objective decisions from subjective data: detecting irony in customer reviews. Decis Support Syst 53(4):754–760. https://doi.org/10.1016/j.dss.2012.05.027
Rida-E-Fatima S et al (2019) A multi-layer dual attention deep learning model with refined word embeddings for aspect-based sentiment analysis. IEEE Access 7:114795–114807. https://doi.org/10.1109/ACCESS.2019.2927281
Rill S, Reinel D, Scheidt J, Zicari RV (2014) PoliTwi: early detection of emerging political topics on twitter and the impact on concept-level sentiment analysis. Knowl-Based Syst 69:24–33. https://doi.org/10.1016/j.knosys.2014.05.008
Roy SD, Mei T, Zeng W, Li S (2012) SocialTransfer: cross-domain transfer learning from social streams for media applications. In: Proceedings of the 20th ACM international conference on multimedia, pp 649–658
Rui H, Liu Y, Whinston A (2013) Whose and what chatter matters? The effect of tweets on movie sales. Decis Support Syst 55(4):863–870. https://doi.org/10.1016/j.dss.2012.12.022
Saeed RMK, Rady S, Gharib TF (2019) An ensemble approach for spam detection in Arabic opinion texts. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2019.10.002
Saleh MR, Martín-Valdivia MT, Montejo-Ráez A, Ureña-López LA (2011) Experiments with SVM to classify opinions in different domains. Expert Syst Appl 38(12):14799–14804. https://doi.org/10.1016/j.eswa.2011.05.070
Sanju P, Mirnalinee TT (2014) Construction of enhanced sentiment sensitive thesaurus for cross domain sentiment classification using Wiktionary. In: Proceedings of the third international conference on soft computing for problem solving, pp 195–206. https://doi.org/10.1007/978-81-322-1768-8
Satapathy R, Guerreiro C, Chaturvedi I, Cambria E (2017) Phonetic-based microtext normalization for Twitter sentiment analysis. In: Proceedings of the IEEE international conference on data mining workshops (ICDMW), pp 407–413. https://doi.org/10.1109/ICDMW.2017.59
Satapathy R, Li Y, Cavallari S, Cambria E (2019) Seq2Seq deep learning models for microtext normalization. In: Proceedings of the international joint conference on neural networks, pp 1–8. https://doi.org/10.1109/IJCNN.2019.8851895
Satapathy R, Singh A, Cambria E (2019) PhonSenticNet: a cognitive approach to microtext normalization for concept-level sentiment analysis. In: Proceedings of the international conference on computational data and social networks, pp 177–188. https://doi.org/10.1007/978-3-030-34980-6_20
Satapathy R, Cambria E, Nanetti A, Hussain A (2020) A review of shorthand systems: from brachygraphy to microtext and beyond. Cogn Comput
Seki Y, Kando N, Aono M (2009) Multilingual opinion holder identification using author and authority viewpoints. Inf Process Manag 45(2):189–199. https://doi.org/10.1016/j.ipm.2008.11.004
Shuang K, Guo H, Zhang Z, Loo J (2018) A sentiment information collector–extractor architecture based neural network for sentiment analysis. Inf Sci 467:549–558. https://doi.org/10.1016/j.ins.2018.08.026
Sindhu I, Daudpota SM, Badar K, Bakhtyar M, Baber J, Nurunnabi M (2019) Aspect-based opinion mining on student’s feedback for faculty teaching performance evaluation. IEEE Access 7:108729–108741. https://doi.org/10.1109/ACCESS.2019.2928872
Sindhwani V, Melville P (2008) Document-word co-regularization for semi-supervised sentiment analysis. In: Proceedings of the eighth IEEE international conference on data mining, pp 1025–1030. https://doi.org/10.1109/ICDM.2008.113
Singh SK, Sachan MK (2019) SentiVerb system: classification of social media text using sentiment analysis. Multimed Tools Appl 78(22):32109–32136
Sobkowicz P, Kaschesky M, Bouchard G (2012) Opinion mining in social media: modeling, simulating, and forecasting political opinions in the web. Gov Inf Q 29(4):470–479. https://doi.org/10.1016/j.giq.2012.06.005
Socher R et al (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642
Spina D, Gonzalo J, Amigó E (2013) Discovering filter keywords for company name disambiguation in twitter. Expert Syst Appl 40(12):4986–5003. https://doi.org/10.1016/j.eswa.2013.03.001
Stanik C, Haering M, Maalej W (2019) Classifying multilingual user feedback using traditional machine learning and deep learning. In: Proceedings of the IEEE 27th international requirements engineering conference workshops (REW 2019). IEEE, pp 220–226. https://doi.org/10.1109/REW.2019.00046
Stone PJ, Dunphy DC, Smith MS (1966) The general inquirer: a computer approach to content analysis
Sun S, Luo C, Chen J (2017) A review of natural language processing techniques for opinion mining systems. Inf Fusion 36:10–25. https://doi.org/10.1016/j.inffus.2016.10.004
Taboada M, Grieve J (2004) Analyzing appraisal automatically classifying sentiment. In: Proceedings of the AAAI spring symposium on exploring attitude and affect in text, Stanford, pp 158–161
Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307
Täckström O, McDonald R (2008) Semi-supervised latent variable models for sentence-level sentiment analysis. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies, pp 569–574
Taddy M (2013) Measuring political sentiment on Twitter: factor optimal design for multinomial inverse regression. Technometrics 55(4):37–41. https://doi.org/10.1080/00401706.2013.778791
Tan S, Cheng X, Wang Y, Xu H (2009) Adapting Naive Bayes to domain adaptation for sentiment analysis. In: Advances in information retrieval: proceedings of the European conference on information retrieval (ECIR), pp 337–349
Tan C, Lee L, Tang J, Jiang L, Zhou M, Li P (2011) User-level sentiment analysis incorporating social networks. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD-11), pp 1397–1405
Tan LK, Na J, Theng Y-L, Chang K (2012) Phrase-level sentiment polarity classification using rule-based typed dependencies and additional complex phrases consideration. J Comput Sci Technol 27(3):650–666. https://doi.org/10.1007/s11390-012-1251-y
Tang H, Tan S, Cheng X (2009) A survey on sentiment detection of reviews. Expert Syst Appl 36(7):10760–10773. https://doi.org/10.1016/j.eswa.2009.02.063
Tang D, Wei F, Qin B, Zhou M, Liu T (2014) Building large-scale Twitter-specific sentiment lexicon: a representation learning approach. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, pp 172–182
Tang D, Qin B, Wei F, Dong L, Liu T, Zhou M (2015) A joint segmentation and classification framework for sentence level sentiment classification. IEEE/ACM Trans Audio Speech Lang Process 23(11):1750–1761
Tartir S, Abdul-Nabi I (2017) Semantic sentiment analysis in Arabic social media. J King Saud Univ Comput Inf Sci 29(2):229–233. https://doi.org/10.1016/j.jksuci.2016.11.011
Thelwall M, Buckley K (2013) Topic-based sentiment analysis for the social web: the role of mood and issue-related words. J Am Soc Inform Sci Technol 64(8):1608–1617. https://doi.org/10.1002/asi.22872
Thelwall M, Buckley K, Paltoglou G, Cai D (2010) Sentiment strength detection in short informal text. J Am Soc Inform Sci Technol 61(12):2544–2558
Thelwall M, Buckley K, Paltoglou G (2011) Sentiment in Twitter events. J Am Soc Inform Sci Technol 62(2):406–418. https://doi.org/10.1002/asi.21462
Thelwall M, Buckley K, Paltoglou G (2012) Sentiment strength detection for the social web. J Am Soc Inform Sci Technol 63(1):163–173
Thet TT, Na J, Khoo CSG (2010) Aspect-based sentiment analysis of movie reviews on discussion boards. J Inf Sci 36(5):823–848. https://doi.org/10.1177/0165551510388123
Toutanova K, Klein D, Manning CD, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the conference of the North American chapter of the Association for Computational Linguistics on human language technology, vol 1. Association for Computational Linguistics, pp 173–180
Trainor KJ, Andzulis J, Rapp A, Agnihotri R (2013) Social media technology usage and customer relationship performance: a capabilities-based examination of social CRM. J Bus Res 67(6):1201–1208. https://doi.org/10.1016/j.jbusres.2013.05.002
Tsai AC, Wu C, Tsai RT, Hsu JY (2013) Building a concept-level sentiment dictionary based on commonsense knowledge. IEEE Intell Syst 28(2):22–30
Tsai Y-L, Tsai RT-H, Chueh C-H, Chang S-C (2014) Cross-domain opinion word identification with query-by-committee active learning. In: Proceedings of the international conference on technologies and applications of artificial intelligence. Springer, Cham, pp 334–343. https://doi.org/10.1007/978-3-319-13987-6_31
Tsakalidis A, Papadopoulos S, Kompatsiaris I (2014) An ensemble model for cross-domain polarity classification on Twitter. In: Proceedings of the international conference on web information systems engineering, pp 168–177
Tsytsarau M, Palpanas T (2012) Survey on mining subjective data on the web. Data Min Knowl Disc 24(3):478–514. https://doi.org/10.1007/s10618-011-0238-6
Turney PD (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL), pp 417–424
Velikovich L, Blair-Goldensohn S, Hannan K, McDonald R (2010) The viability of web-derived polarity lexicons. In: Proceedings of the human language technologies: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics, pp 777–785
Vilares D, Peng H, Satapathy R, Cambria E (2018) BabelSenticNet: a commonsense reasoning framework for multilingual sentiment analysis. In: Proceedings of the 2018 IEEE symposium series on computational intelligence (SSCI 2018), pp 1292–1298. https://doi.org/10.1109/SSCI.2018.8628718
Vinodhini G, Chandrasekaran RM (2014) Opinion mining using principal component analysis based ensemble model for e-commerce application. CSI Trans ICT 2(3):169–179. https://doi.org/10.1007/s40012-014-0055-3
Virmani D, Arora P, Kulkarni PS (2017) Cross domain analyzer to acquire review proficiency in big data. ICT Express 3(3):128–131. https://doi.org/10.1016/j.icte.2017.04.004
Walker MA, Anand P, Tree JEF, Abbott R, King J (2012) A corpus for research on deliberation and debate. In: Proceedings of the 8th international conference on language resources and evaluation (LREC-2012), pp 812–817
Wan X (2008) Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 553–561
Wang J, Lee C (2011) Unsupervised opinion phrase extraction and rating in Chinese blog posts. In: Proceedings of the IEEE international conference on privacy, security, risk, and trust, and IEEE international conference on social computing, pp 820–823
Wang S, Manning CD (2012) Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp 90–94
Wang H, Lu Y, Zhai C (2010) Latent aspect rating analysis on review text data: a rating regression approach. In: Proceedings of the 16th ACM SIGKDD conference on knowledge discovery and data mining (KDD’2010), pp 783–792
Wang G, Xie S, Liu B, Yu PS (2011) Review graph based online store review spammer detection. In: Proceedings of the 11th IEEE international conference on data mining. IEEE Computer Society, pp 1242–1247. https://doi.org/10.1109/ICDM.2011.124
Wang S, Li D, Song X, Wei Y, Li H (2011b) A feature selection method based on improved Fisher’s discriminant ratio for text sentiment classification. Expert Syst Appl 38(7):8696–8702. https://doi.org/10.1016/j.eswa.2011.01.077
Wang G, Sun J, Ma J, Xu K, Gu J (2013a) Sentiment classification: the contribution of ensemble learning. Decis Support Syst 57:77–93. https://doi.org/10.1016/j.dss.2013.08.002
Wang H, Yin P, Zheng L, Liu JNK (2013b) Sentiment classification of online reviews: using sentence-based language model. J Exp Theor Artif Intell 26(1):13–31. https://doi.org/10.1080/0952813X.2013.782352
Wang T et al (2014) Product aspect extraction supervised with online domain knowledge. Knowl-Based Syst 71:86–100. https://doi.org/10.1016/j.knosys.2014.05.018
Wang L, Liu K, Cao Z, Zhao J, Melo GD (2015) Sentiment-aspect extraction based on restricted Boltzmann machines. In: Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing, pp 616–625
Wang J, Peng B, Zhang X (2018a) Using a stacked residual LSTM model for sentiment intensity prediction. Neurocomputing 322:93–101. https://doi.org/10.1016/j.neucom.2018.09.049
Wang L, Niu J, Song H, Atiquzzaman M (2018b) SentiRelated: a cross-domain sentiment classification algorithm for short texts through sentiment related index. J Netw Comput Appl 101:111–119
Wei B, Pal C (2010) Cross lingual adaptation: an experiment on sentiment classifications. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, pp 258–262
Weichselbraun A, Gindl S, Scharl A (2014) Enriching semantic knowledge bases for opinion mining in big data applications. Knowl-Based Syst 69:78–85. https://doi.org/10.1016/j.knosys.2014.04.039
Whissell CM (1989) The dictionary of affect in language. In: The measurement of emotions, Academic Press, pp 113–131
Whitelaw C, Garg N, Argamon S (2005) Using appraisal groups for sentiment analysis. In: Proceedings of the 14th ACM international conference on information and knowledge management. ACM, pp 625–631
Wiebe J, Wilson T, Cardie C (2005) Annotating expressions of opinions and emotions in language. Lang Resour Eval 39:165–210. https://doi.org/10.1007/s10579-005-7880-9
Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the human language technology conference and conference on empirical methods in natural language processing (HLT/EMNLP), pp 347–354
Wu Q, Tan S (2011) A two-stage framework for cross-domain sentiment classification. Expert Syst Appl 38(11):14269–14275. https://doi.org/10.1016/j.eswa.2011.04.240
Wu C, Tsai RT (2014) Using relation selection to improve value propagation in a ConceptNet-based sentiment dictionary. Knowl-Based Syst 69:100–107. https://doi.org/10.1016/j.knosys.2014.04.043
Wu P, Li X, Shen S, He D (2019a) Social media opinion summarization using emotion cognition and convolutional neural networks. Int J Inf Manag 51:101978. https://doi.org/10.1016/j.ijinfomgt.2019.07.004
Wu S, Wu F, Chang Y, Wu C, Huang Y (2019b) Automatic construction of target-specific sentiment lexicon. Expert Syst Appl 116:285–298. https://doi.org/10.1016/j.eswa.2018.09.024
Xia R, Zong C, Li S (2011) Ensemble of feature sets and classification algorithms for sentiment classification. Inf Sci 181(6):1138–1152. https://doi.org/10.1016/j.ins.2010.11.023
Xia R, Zong C, Hu X, Cambria E (2013) Feature ensemble plus sample selection: domain adaptation for sentiment classification. IEEE Intell Syst 28(3):10–18
Xie J, Chen B, Gu X, Liang F, Xu X (2019) Self-attention-based BiLSTM model for short text fine-grained sentiment classification. IEEE Access 7:180558–180570. https://doi.org/10.1109/ACCESS.2019.2957510
Xu K, Liao SS, Li J, Song Y (2011) Mining comparative opinions from customer reviews for competitive intelligence. Decis Support Syst 50(4):743–754. https://doi.org/10.1016/j.dss.2010.08.021
Xu H, Zhang F, Wang W (2015) Implicit feature identification in Chinese reviews using explicit topic mining model. Knowl-Based Syst 76:166–175. https://doi.org/10.1016/j.knosys.2014.12.012
Xuan HNT, Le AC, Nguyen LM (2012) Linguistic features for subjectivity classification. In: Proceedings of the international conference on asian language processing (IALP), pp 17–20. https://doi.org/10.1109/IALP.2012.47
Xueke X, Xueqi C, Songbo T, Yue L, Huawei S (2013) Aspect-level opinion mining of online customer reviews. China Commun 10(3):25–41
Yan Z, Xing M, Zhang D, Ma B (2015) EXPRS: an extended PageRank method for product feature extraction from online consumer reviews. Inf Manag 52(7):850–858. https://doi.org/10.1016/j.im.2015.02.002
Yang B, Cardie C (2014) Context-aware learning for sentence-level sentiment analysis with posterior regularization. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 325–335
Yang P, Gao W, Tan Q, Wong K (2013) A link-bridged topic model for cross-domain document classification. Inf Process Manag 49(6):1181–1193. https://doi.org/10.1016/j.ipm.2013.05.002
Yessenalina A, Yue Y, Cardie C (2010) Multi-level structured models for document-level sentiment classification. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 1046–1056
Young T, Hazarika D, Poria S, Cambria E (2018) Recent trends in deep learning based natural language processing. IEEE Comput Intell Mag 13(3):55–75. https://doi.org/10.1109/MCI.2018.2840738
Yu H, Hatzivassiloglou V (2003) Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. In: Proceedings of the conference on empirical methods in natural language processing, pp 129–136
Yu J, Jiang J (2016) Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification. In: Proceedings of the conference on empirical methods in natural language processing, pp 236–246
Yu X, Liu Y, Huang JX (2012) Mining online reviews for predicting sales performance: a case study in the movie domain. IEEE Trans Knowl Data Eng 24(4):720–734. https://doi.org/10.1109/TKDE.2010.269
Yu L, Wu J, Chang P, Chu H (2013a) Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news. Knowl-Based Syst 41:89–97. https://doi.org/10.1016/j.knosys.2013.01.001
Yu Y, Duan W, Cao Q (2013b) The impact of social and conventional media on firm equity value: a sentiment analysis approach. Decis Support Syst 55(4):919–926. https://doi.org/10.1016/j.dss.2012.12.028
Zhai Z, Liu B, Xu H, Jia P (2011) Clustering product features for opinion mining. In: Proceedings of the 4th ACM international conference on web search and data mining, pp 347–354
Zhai Z, Xu H, Kang B, Jia P (2011b) Exploiting effective features for Chinese sentiment classification. Expert Syst Appl 38(8):9139–9146. https://doi.org/10.1016/j.eswa.2011.01.047
Zhai Z, Liu B, Wang J, Xu H, Jia P (2012) Product feature grouping for opinion mining. IEEE Intell Syst 27(4):37–44
Zhan J, Loh HT, Liu Y (2009) Gather customer concerns from online product reviews—a text summarization approach. Expert Syst Appl 36(2):2107–2115. https://doi.org/10.1016/j.eswa.2007.12.039
Zhang Z (2008) Weighing stars: aggregating online product reviews for intelligent e-commerce applications. IEEE Intell Syst 23(5):42–49
Zhang Z, Ye Q, Zhang Z, Li Y (2011) Sentiment classification of Internet restaurant reviews written in Cantonese. Expert Syst Appl 38(6):7674–7682. https://doi.org/10.1016/j.eswa.2010.12.147
Zhang K, Xie Y, Yang Y, Sun A, Liu H, Choudhary A (2014) Incorporating conditional random fields and active learning to improve sentiment identification. Neural Netw 58:60–67. https://doi.org/10.1016/j.neunet.2014.04.005
Zhang Y, Hu X, Li P, Li L, Wu X (2015) Cross-domain sentiment classification-feature divergence, polarity divergence or both? Pattern Recogn Lett 65:44–50. https://doi.org/10.1016/j.patrec.2015.07.006
Zhang R, Wang Z, Yin K, Huang Z (2019) Emotional text generation based on cross-domain sentiment transfer. IEEE Access 7:100081–100089
Zhao R, Mao K (2014) Supervised adaptive-transfer PLSA for cross-domain text classification. In: Proceedings of the IEEE international conference on data mining workshop, pp 259–266. https://doi.org/10.1109/ICDMW.2014.163
Zhao W, Guan Z, Chen L, He X, Cai D, Wang B, Wang Q (2018) Weakly-supervised deep embedding for product review sentiment analysis. IEEE Trans Knowl Data Eng 30(1):185–197. https://doi.org/10.1109/TKDE.2017.2756658
Zhao W, Peng H, Eger S, Cambria E, Yang M (2019) Towards scalable and reliable capsule networks for challenging NLP applications. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 1549–1559. https://doi.org/10.18653/v1/P19-1150
Zheng X, Lin Z, Wang X, Lin K, Song M (2014) Incorporating appraisal expression patterns into topic modeling for aspect and sentiment word identification. Knowl-Based Syst 61:29–47
Zhou L, Chaovalit P (2008) Ontology-supported polarity mining. J Am Soc Inform Sci Technol 59(1):98–110. https://doi.org/10.1002/asi.20735
Zhou H, Song F (2012) Aspect-level sentiment analysis based on a generalized probabilistic topic and syntax model. In: Proceedings of the twenty-eighth international Florida artificial intelligence research society conference, pp 241–244
Zhou G, Zhou Y, Guo X, Tu X, He T (2015) Cross-domain sentiment classification via topical correspondence transfer. Neurocomputing 159:298–305. https://doi.org/10.1016/j.neucom.2014.12.006
Zhu Z, Dai D, Ding Y, Qian J, Li S (2013) Employing emotion keywords to improve cross-domain sentiment classification. In: Proceedings of the workshop on Chinese lexical semantics, pp 64–71
Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation
Zhu J, Wang Q (2015) NiuParser: a Chinese syntactic and semantic parsing toolkit. In: Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing: system demonstrations, pp 145–150
Zhu J, Wang H, Zhu M, Tsou BK, Ma M (2011) Aspect-based opinion polling from customer reviews. IEEE Trans Affect Comput 2(1):37–49
About this article
Cite this article
Singh, R.K., Sachan, M.K. & Patel, R.B. 360 degree view of cross-domain opinion classification: a survey. Artif Intell Rev 54, 1385–1506 (2021). https://doi.org/10.1007/s10462-020-09884-9