360 degree view of cross-domain opinion classification: a survey

Singh, Rahul Kumar; Sachan, Manoj Kumar; Patel, R. B.

doi:10.1007/s10462-020-09884-9

Published: 06 August 2020

Volume 54, pages 1385–1506, (2021)

Artificial Intelligence Review

Abstract

In the field of natural language processing and text mining, sentiment analysis (SA) has received considerable attention from researchers across the globe. With the prevalence of Web 2.0, users have become more active in sharing, promoting, and expressing themselves, along with any issues or challenges they encounter in their daily activities, through the Internet (social media, micro-blogs, e-commerce, etc.). Expressions and opinions are complex sequences of acts that convey a huge volume of data, which poses a challenge for computational researchers to decode. Over time, researchers from various segments of the public and private sectors have explored SA with the aim of understanding the behavioral perspective of various stakeholders in society. Although efforts to build SA systems have been successful, challenges related to efficiency still prevail. This article presents an organized survey of SA (also known as opinion mining) along with its methodologies and algorithms. The survey classifies SA into categories based on levels, tasks, and sub-tasks, along with the various techniques used to perform them. The survey explicitly focuses on the different directions in which research has been explored in the area of cross-domain opinion classification. The article concludes by presenting an exclusive and exhaustive analysis of opinion mining, covering the approaches, datasets, languages, and applications used. The observations made are expected to help researchers gain a greater understanding of emerging trends and state-of-the-art methods for future exploration.

1 Introduction

At present, the Internet plays an important role in people’s lives. With the help of the latest technologies, people can access, share, and generate content over the Internet. Web 1.0 refers to the first generation of the World Wide Web, which consisted entirely of web pages connected by hyperlinks; people could explore a website and read the content of its pages but could not write or add anything to them. Since 2004, Internet users have been moving from Web 1.0 to Web 2.0. Web 2.0 (the “Interactive Web” or “Social Web”) describes a new stage of web services, social websites, and applications with an increasing emphasis on user collaboration (user-generated content and the read–write web). People consume as well as contribute information through sites such as YouTube, Flickr, Digg, and blogs. Web 3.0 (the “web of meaning” or “Semantic Web”) refers to the third generation of the World Wide Web; it adds smart search and behavioral advertising to the features of Web 2.0. Web content may be unstructured, structured, or semi-structured, and is often misspelled and noisy, so Natural Language Processing techniques are required to analyze it.

Social networking sites play an important role in Internet activities. The number of Internet users and the amount of Internet content are increasing day by day. People are eager to share and express their feelings about any issue or day-to-day activity on the Internet. Micro text (short text) is the biggest challenge in text analysis, and different approaches have been used for micro-text normalization (Satapathy et al. 2020; Cambria 2016). The explosive growth of online activities (conferencing, chatting, social media communication, ticket booking, surveillance, e-commerce, online transactions, micro-blogging, blogging, etc.) requires us to load, transform, extract, and analyse very large volumes of data, referred to as Big Data. This data can be analysed in several real-life applications by combining data mining, text mining, web mining, and information retrieval techniques. Blogs, forums, e-commerce websites, other web resources, news reports, and social networks serve as platforms for expressing views, which can be used to observe or report the feelings of customers and the general public about public events, organisational plans, reputation monitoring, political activities, promotional campaigns, and product preferences (Ravi and Ravi 2015). Such a huge amount of raw data is hard to analyse and requires suitable methods to produce a comprehensive review summary. Both the world population and the number of Internet users continue to grow. People are giving more attention to Web 1.0 and Web 2.0, and Internet users perform many activities, as shown in Fig. 1 (Balqisnadiah 2016).

User-generated views are the main source of raw text. With the rapid growth of user-generated text on the web, automatically mining valuable information from abundant documents has attracted increasing research interest in numerous areas of Natural Language Processing (NLP) (Sun et al. 2017). Artificial intelligence is now used everywhere, for example in Amazon’s Alexa, from phones to other devices. Machine learning methodologies are increasingly used in artificial intelligence, and many companies (Netflix, Google, data companies, etc.) combine artificial intelligence with large amounts of data. NLP focuses on processing human language in order to extract insight, assist with human-written text, and more. In everyday speech, people say words that can be interpreted in countless ways because every word is context-dependent. NLP is used for many purposes, such as word suggestion, quick compilation of data, voice-to-text conversion (Google Assistant, the Alexa application), search engine optimization (SEO) applications, handwriting recognition (online and offline), speech recognition systems, and opinion mining. Sentiment analysis is the computational study of a person’s thoughts, moods, reviews, feelings, emotions, appraisals, and attitudes toward events, issues, and topics. The following text explains the earlier reviews in the field of sentiment analysis.

1.1 Sentiment analysis: the earlier reviews

In the field of sentiment analysis, Pang and Lee (2008) reviewed and analyzed more than 300 research articles, covering the major tasks (opinion summarization, opinion classification, sentiment mining, polarity determination, opinion spam detection, different levels of sentiment analysis, etc.), challenges, and applications of opinion mining. Later, Tang et al. (2009) highlighted several issues in sentiment analysis and opinion mining, such as sentiment extraction, document opinion classification, word opinion classification, and subjectivity classification. The authors also described several approaches to subjectivity classification, such as a cut-based classifier, a Naïve Bayes classifier, multiple Naïve Bayes classifiers, and similarity-based methods.

O’Leary (2011) reviewed blog mining and outlined different kinds of blog search, forums, the sentiments to be analysed, and their applications. Montoyo et al. (2012) discussed applications, achievements, and some open issues in sentiment mining and subjectivity classification. Tsytsarau and Palpanas (2012) focused on opinion spam detection, contradiction analysis, opinion aggregation, and opinion mining, and compared different opinion mining techniques and approaches applied to a common dataset.

Liu (2012) surveyed more than four hundred research articles in the area of opinion mining and sentiment classification. This survey covered NLP issues, sentiment analysis applications, sentiment lexicon and its issues, different levels of analysis, opinion summarization, cross-domain sentiment classification, cross-lingual sentiment classification, aspect-based sentiment analysis, sentence subjectivity classification, quality of reviews, sentiment lexicon generation, and some challenging issues in the field of opinion classification and sentiment analysis.

Feldman (2013) studied sentiment analysis and pointed out some specific difficulties: sentiment lexicon acquisition, comparative sentiment analysis, sentence-level sentiment classification, document-level sentiment classification, and aspect-based sentiment classification, as well as some open challenges such as sarcasm detection, automatic entity recognition, handling multiple entities in the same review, and compositional statements. The survey by Cambria et al. (2013) focused on the complexities involved in opinion mining with respect to demand and future directions.

Medhat et al. (2014) then examined the problem of sentiment analysis from the point of view of techniques rather than applications. The authors categorized the articles according to the techniques involved and classified the various sentiment analysis techniques with brief details of their algorithms. They described the available datasets and categorized them according to their applications. They then discussed some sentiment analysis fields open to enhancement, such as transfer learning, resource building, and emotion detection. Finally, they summarized fifty-four research articles, listing the task accomplished, language, data source, polarity, data scope, algorithm used, and domain.

Later, Ravi and Ravi (2015) reviewed about 251 research articles published between 2002 and 2015 and organized the survey by opinion mining approaches, applications, and tasks. The review covered different tasks of sentiment analysis and pointed out major issues in sentiment classification, review spam detection, measurement of the degree of usefulness, subjectivity classification, and lexicon creation. Finally, it summarized thirty-two publicly available datasets and one hundred sixty-one research articles in tabular form, listing the concepts and techniques used, language, polarity, type of data, and dictionary.

Further, Hussein (2016) identified challenges related to the techniques and methods used in opinion mining. Based on two comparisons among forty-seven research articles, the study discussed the effects and importance of opinion mining challenges in opinion evaluation, and finally summarized the sentiment challenges and how accuracy could be improved based on previous work. After that, Al-Moslmi et al. (2017) studied about 91 research articles published from 2010 to 2016 on cross-domain sentiment classification. This study focused on the techniques, algorithms, and approaches used in cross-domain opinion classification, highlighted some open issues, and summarized the methodologies and findings of twenty-eight research articles in the area. It is observed from this analysis that no perfect solution has yet been found for cross-domain opinion mining.

Recently, Sun et al. (2017) reviewed opinion mining using NLP techniques. First, the authors explained information fusion techniques for combining information from multiple sources to solve specific tasks, and presented some NLP techniques for text processing. Second, they introduced the approaches, methods, and resources of sentiment analysis for different levels and situations. The aim of opinion mining is to extract sentiment orientation (positive or negative) at different levels of analysis (sentence level, document level, fine-grained level, word level, etc.) using supervised, unsupervised, and semi-supervised learning methods. Finally, they discussed some advanced topics (opinion spam detection, review usefulness measurement, opinion summarization, etc.), some open problems (annotated corpora and cumulative errors from pre-processing), and some challenges (deep learning for accuracy) in opinion mining. Most recently, Young et al. (2018) explained the latest trends of deep learning in NLP, compared various deep learning models, and discussed the past, present, and future of deep learning in NLP.

This literature survey differs from earlier review articles in several ways: (1) it categorizes existing studies by the different tasks and levels of sentiment analysis; (2) it emphasizes cross-domain sentiment classification, one of the most challenging tasks in sentiment analysis; (3) it summarizes the different tasks of sentiment classification in several aspects (approaches, techniques or methodologies, datasets, lexicons or corpora, and languages used); (4) it provides a detailed list of publicly available toolkits for sentiment analysis tasks and the languages they support; (5) it summarizes the available datasets, data sources, annotated corpora, and sentiment lexicons, along with the languages used in sentiment analysis; (6) it classifies the baseline methods and research articles on cross-domain opinion classification in four aspects (approaches and methods used, datasets and languages used, names of the corpora or dictionaries used, and a detailed description of each research article); (7) it discusses challenging issues, open problems, and future directions in sentiment analysis; and (8) it summarizes more than one hundred research articles on sentiment analysis in terms of techniques, methodologies, datasets, data sources, and language.

The main aim of this survey is to understand the different techniques, approaches, and datasets used in sentiment analysis to achieve high accuracy. The rest of the article is organized as follows: Sect. 2 explains sentiment analysis and its different levels. Section 3 presents the different tasks and sub-tasks of sentiment analysis, a state-of-the-art discussion of opinion mining, and publicly available datasets, lexicons, and toolkits. Section 4 explains one of the most challenging tasks of sentiment classification, cross-domain opinion classification. Sections 5 and 6 cover the outcomes of the survey, the pros and cons of different baseline methods, challenges, open issues, and future directions in sentiment analysis. Section 7 concludes the survey.

2 Sentiment analysis

Sentiment analysis is the computational study of people’s views about an entity or its aspects. Here, an entity can be an individual item such as a topic, a blog, or an event. The term sentiment analysis was first introduced early this century and has since become an active research field. Following the definition of sentiment by Liu (2012), a sentiment is represented as a quintuple.

Definition of sentiment analysis: $(e_i, a_{ij}, s_{ijkl}, h_k, t_l)$, where $e_i$ is the $i$th entity, $a_{ij}$ is the $j$th aspect of entity $e_i$, $h_k$ is the $k$th sentiment holder, $t_l$ is the time at which the sentiment is expressed, and $s_{ijkl}$ is the sentiment on aspect $a_{ij}$ of entity $e_i$ expressed by opinion holder $h_k$ at time $t_l$. The sentiment or opinion $s_{ijkl}$ is positive, negative, or neutral.

For example, in “The power backup of a power bank is excellent!”, “power bank” is the entity, “power backup” is the aspect, and the sentiment expressed is positive.
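To make the quintuple concrete, the following Python sketch stores one opinion as an immutable record. The class name `Opinion` and its field names are illustrative choices, not part of Liu's (2012) definition; the holder and time fields are left empty because the example sentence does not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l); names are illustrative."""
    entity: str            # e_i: the target entity
    aspect: str            # a_ij: the aspect of the entity being evaluated
    sentiment: str         # s_ijkl: "positive", "negative" or "neutral"
    holder: Optional[str]  # h_k: who expressed the opinion (None if not stated)
    time: Optional[str]    # t_l: when it was expressed (None if not stated)

    def __post_init__(self):
        # Restrict the sentiment to the three polarity values in the definition.
        if self.sentiment not in {"positive", "negative", "neutral"}:
            raise ValueError(f"unsupported sentiment: {self.sentiment!r}")


# The power-bank example: holder and time are not given in the sentence.
example = Opinion(entity="power bank", aspect="power backup",
                  sentiment="positive", holder=None, time=None)
print(example)
```

A list of such records can then be grouped by entity or aspect to aggregate the opinions expressed about each one.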

Sentiment analysis examines people’s moods, views, opinions, attitudes, feelings, and emotions toward entities (services, products, research, political issues, organizations, other issues, and topics of any kind) and their aspects. Its aim is to determine polarity, which can be positive, negative, or neutral toward the entity.
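As a minimal illustration of how a lexicon-based method can map text to one of these three polarity values, the sketch below sums word scores from a tiny, hypothetical lexicon and thresholds the total at zero. The lexicon entries and weights are invented for the example; practical systems rely on much larger lexicons and handle negation, intensifiers, and context, all of which this sketch ignores.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score (invented for illustration).
LEXICON = {"excellent": 2, "great": 2, "good": 1,
           "bad": -1, "poor": -1, "terrible": -2}


def polarity(text, lexicon=LEXICON):
    """Return 'positive', 'negative' or 'neutral' from summed word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The power backup of a power bank is excellent!"))  # positive
print(polarity("The charging cable is bad."))                      # negative
print(polarity("The power bank weighs 300 grams."))                # neutral
```

Because it ignores negation, this sketch would label “not good” as positive; cases like this are one reason machine learning and hybrid approaches are also used.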

Machine learning, lexicon-based, and hybrid approaches play an important role in determining polarity. Sentiment analysis can be used in applications such as movie sales prediction, market prediction, recommender systems, and customer satisfaction measurement, all sharing one goal: analyzing the opinions in people’s reviews. Owing to the rapid growth of and interest in e-commerce, e-commerce sites have become one of the prominent places where people express their opinions and where those opinions can be analyzed. Opinions matter from both the customer’s and the manager’s point of view, and many customers make decisions based on reviews available on the Internet. Sentiment analysis is a multifaceted problem rather than a single problem: texts come from various sources in diverse formats, and various pre-processing steps are needed before sentiment analysis can be performed. Sentiment analysis supports tasks such as sentiment classification, spam detection, usefulness measurement, and subjectivity classification. Data acquisition and pre-processing are the most common subtasks required for text classification and sentiment analysis, as illustrated in Fig. 2. The next subsection explains the different levels of sentiment analysis, i.e., document, sentence, and aspect level.
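The exact steps shown in Fig. 2 may differ; the following is a sketch of a few common pre-processing steps, assuming short, noisy social-media text as input: lowercasing, removing URLs and @-mentions, squeezing repeated characters (a simple form of micro-text normalization), tokenizing, and dropping stop words. The function name `preprocess` and the small stop-word list are hypothetical.

```python
import re

# Hypothetical, deliberately small stop-word list used only for this illustration.
STOPWORDS = {"a", "an", "the", "of", "is", "are", "and", "or", "to", "in", "it"}


def preprocess(text):
    """Normalize a short, noisy text and return a list of content tokens."""
    text = text.lower()                               # case folding
    text = re.sub(r"https?://\S+", " ", text)         # drop URLs
    text = re.sub(r"@\w+", " ", text)                 # drop @-mentions
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)       # squeeze 3+ repeats: "soooo" -> "soo"
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)  # alphabetic words and contractions
    return [t for t in tokens if t not in STOPWORDS]  # remove stop words


print(preprocess("The battery is soooo good!!! http://x.co @shop"))
# ['battery', 'soo', 'good']
```

The order matters: URLs and mentions are removed before tokenizing so that their fragments do not leak into the token list.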

2.1 Levels of sentiment analysis

In general, sentiment analysis is performed at three levels: document level, sentence level, and entity/aspect level. Some studies also describe user-level and concept-level sentiment analysis, as shown in Fig. 3. Concept-level sentiment analysis emphasizes semantic analysis of text using semantic networks (Cambria 2013). User-level sentiment analysis analyzes the opinions expressed in individual texts (what people think) (Tan et al. 2011). A brief explanation of these levels follows.

2.1.1 Document-level sentiment analysis

The main task of document-level sentiment analysis is to determine the overall opinion polarity of a document, such as a blog post, tweet, movie review, institute review, product review, or any other text about an issue. In terms of the quintuple definition of sentiment analysis, its objective is to determine the third element, the sentiment $s_{ijkl}$. The generalized framework of document-level sentiment analysis is shown in Fig. 4. Research articles on document-level sentiment analysis are summarized below in Table 1.

Table 1 Document-level sentiment analysis literature compilation
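As an illustration of document-level classification, which assigns one polarity to a whole document, the sketch below implements multinomial Naive Bayes with add-one smoothing from scratch. It is one common supervised baseline, not the method of any particular study in Table 1; the four training documents are invented toy data, and the class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log P(class) + sum over tokens of log P(word | class)
            score = math.log(n_label / n_docs)
            denom = self.total_words[label] + len(self.vocab)
            for word in tokenize(doc):
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data: each whole document carries one polarity label.
train_docs = [
    "great movie, excellent acting and a wonderful story",
    "I loved this product, it works great",
    "terrible plot and awful acting",
    "the product broke after a day, very poor quality",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict("excellent story and great acting"))  # positive
print(model.predict("awful quality, the product broke"))  # negative
```

Each document receives exactly one label, which reflects the document-level assumption that the whole text expresses a single overall opinion; a review that praises one aspect and criticizes another is still forced into one class.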
