1 Introduction

Emotions are essentially human feelings and thoughts. To understand what people need or think about a specific topic, we need to study their emotions. With the rapid growth of social network platforms and the tremendous increase in users, people have started voicing their opinions and emotions through online posts, tweets, remarks, feedback, and reviews, filling these platforms with massive amounts of opinionated content. Opinion mining has a large body of research, with many focused and specialized areas under investigation, whereas emotion detection is still in its early stages and needs more study. Emotion detection is notable for intersecting with several fields, including neuroscience, cognitive science, and psychology, and it has recently gained traction in computer science. Systems that detect emotions from social networks have several applications. In customer service, for example, emotion detection may help marketers learn whether their consumers are satisfied and which aspects of their services could be enhanced or altered to build strong relationships with those consumers [58].

Additionally, because what other people think can greatly influence consumers’ purchase decisions [23], businesses use this information in areas such as marketing and customer support. Intelligent tutoring systems in e-learning applications may select educational materials according to the user’s emotional and mental state. The ability to discern emotions also opens up new ways of accessing information, such as allowing users to filter search results by emotion. The output of an emotion detection system can also serve as input to other systems. Rangel et al. [110], for example, used the emotions detected in a text to profile its author, determining the writer’s age and gender. Finally, psychologists may infer patients’ emotions and forecast their mental condition accordingly. They can identify whether a patient is depressed or stressed [39], or even contemplating suicide over a longer period, which is incredibly valuable because the patient can then be directed to counseling services [80].

With the rapid rise of Web 2.0 technology, individuals can now access several media through which they may express themselves and their feelings, which has given the field a new dimension. Text, facial expressions, photographs, speech, paintings, music, and other forms of media have all been studied to detect emotions. Facial expressions and recorded speech, in particular, provide the most significant cues and have been researched extensively. However, emotion detection from social network content is still immature and needs more effort. In this work, we focus on social network content that includes text, image, voice, and video media, since social networks may convey emotions in different ways (text, images, emoticons, videos, and voice). Therefore, we do not cover studies that rely solely on facial or auditory channels and focus only on human vocal and bodily expressions and gestures.

Some surveys on emotion detection exist, such as those by Kao et al. [68] and Hirat et al. [62], but most work on emotion detection focuses on a single data type, such as text or speech. This motivated us to cover the methods and resources developed for emotion detection and prediction on social networks, to better understand research in this field. This study aims to synthesize the findings to date and to identify challenges and opportunities for future research. Specifically, it reviews studies published from 2010 to December 2020 on predicting emotions in online social network posts to answer the following research questions (RQs):

  • RQ1: Which emotion categories have been frequently used in the domain of emotion detection in social network data?

  • RQ2: What is the impact of studying user emotions on social networks?

  • RQ3: What methodologies and datasets have been used to predict and detect emotions in online interactions?

  • RQ4: How has emotion prediction been used in a real context?

Given the growing interest in emotion detection and prediction, we believe that studies on emotion detection on social networks need to be reviewed to highlight the challenges and opportunities for moving the field forward. A systematic literature review (SLR) of social network emotion detection will provide a clear and comprehensive overview of the available evidence. Furthermore, SLRs help identify gaps in the current understanding of a field, as well as methodological concerns in research projects, which can be used to improve future work. Finally, an SLR helps identify questions for which the available evidence already provides clear answers, removing the need for further investigation. This study provides a comprehensive picture of emergent topics, methodologies, and theories to offer guidelines for research and to enhance future studies in this area. It also contributes to social media research, with a focus on emotion detection and prediction, through a careful analysis of existing studies.

The remainder of the study is organized as follows: Section 2 summarizes related works and surveys. Section 3 provides background on the concepts underlying this area of research. Section 4 discusses the review method used in this systematic literature review (SLR). Section 5 provides a detailed review of emotion detection models for single- and multiple-modality emotion analyses. Section 6 examines the challenges, gaps, and limitations of the current literature on emotion analysis in social networking. Section 7 presents opportunities for emotion analysis research in social networking. Section 8 concludes the study.

2 Related works

Researchers have presented various surveys and reviews of emotion detection, covering approaches such as machine learning and deep learning or focusing on specific data types. In this section, we summarize some of the important review studies, which together cover the main aspects of the emotion detection task: challenges, the classification of existing emotion detection and prediction approaches, deep learning models for emotion detection, the classification of models used, and the handling of multimodality.

From a psychological perspective, Feldman Barrett et al. [15] provided an introduction highlighting the challenges of detecting emotions with current approaches. Thanapattheerakul et al. [130] provided an overview of emotion theories, ranging from Darwin to Russell, alongside the current state of research. They also described how emotions may be identified from brain and bodily signals, using functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and autonomic nervous system (ANS) responses.

In [8], Anagnostopoulos et al. reviewed emotion detection from technical and theoretical perspectives, listing the various techniques and methodologies used to recognize emotions in speech signals. They classified the surveyed approaches mainly by the features extracted and selected and by the classification methods used, and they discussed the databases available for experimentation and performance issues.

Subsequently, Seyeditabari et al. [118] reviewed the techniques, methodologies, and models used to identify emotion expressions in text. They discussed the open problems in detecting emotions in textual content and stressed the importance of attending to the linguistic intricacies of emotion expression. Other surveys on text-based emotion detection and analysis trace work on this subject over time, detailing existing approaches, methodologies, datasets, experiments, and results [114, 152]. Sailunaz et al. [113] provided a detailed study of emotion analysis from text, covering the evolution of emotions, emotion models, emotion detection methods, available datasets, and the limitations of emotion analysis.

Vogt et al. [137] provided an early literature review focused on speech emotion detection, and García-García et al. [53] compared commercial emotion detection applications that use facial expressions, speech, written text, bodily motion, and physiological signals, concluding that sensor fusion is the best option whenever it is feasible. As this overview shows, survey studies on emotion detection have focused primarily on facial and speech analysis [46, 83] and text-based emotion detection [32, 62], without considering whether the data used in the research were collected from social network platforms or from publicly available online datasets.

3 Background

The evolution of technology and communication platforms has simplified our daily lives and made it possible for service providers and marketers to predict our needs and address them in close to real time. To do so, however, machines must continuously gain a deeper understanding of human behavior [38]. One of the most interesting research directions in technology is the push toward artificial intelligence that can predict our emotions and opinions.

Opinion mining focuses on determining users’ attitudes toward certain events or topics. More broadly, it can encompass the analysis, detection, classification, and appraisal of users’ emotional states regarding various activities, problems, resources, and other targets. More specifically, this area seeks to harvest opinions, sentiments, and emotions from individuals’ behavior as expressed through their facial expressions, writing, speech, music, actions, and so on. Depending on the specific type of opinion mining, the field intersects with many others, including data analysis, deep learning (DL), natural language processing (NLP), and computer vision [85]. In this section, we review the related terms and concepts, starting with the original source of the data used in the studies (social networks) and ending with the main focus of our study (emotion).

3.1 Social networks

Online social networks can be defined as platforms on which persons, groups, and communities form aggregations of individual users [54]. According to Serrat et al. [117], social networks consist of nodes, such as individuals, communities, organizations, and associated systems, connected through one or more types of interdependency, such as mutual beliefs, goals, and ideas; social contacts; kinship; conflict; financial exchanges; trade; joint membership in organizations; and community involvement in events, among many other aspects of human relationships. Social networks offer users a great opportunity to express themselves. On most platforms, users post descriptive information about themselves, although some users and platforms emphasize anonymity. On all platforms, most users communicate with one another, although some do not [123]. Many platforms specialize in particular types of content that users can send and publish, so users choose suitable platforms for different types of communication and sharing. These platforms allow users to communicate with others formally or informally, find individuals, share common interests, post brief status updates, document thoughts, stories, and articles, and share photos and videos. Today, millions of people use social networking platforms to share their opinions, views, emotions, and information about their daily and social activities. People may write about anything, including commenting on products or on other users’ posts. Social network platforms can be described as interactive forums in which members of online communities interact with, influence, and communicate with others across the site [109]. Any discussion of social networks must also mention social network analysis (SNA), which emerged decades ago as an important research topic in sociology [40, 141].
Since the late 1990s, SNA has branched into various fields, and research has typically been performed under the umbrella of complex networks, a relatively new area in which networks built from a diverse array of data sources are analyzed in several contexts. SNA studies the structure of the relationships among social entities and the impact of these structures on other social phenomena. It is an interdisciplinary research area that investigates methodologies for representing, measuring, and analyzing social structures [26]. In general, SNA examines social structures through networks and graph theory [98]: nodes represent the individual actors, people, or objects in the network, and ties (also called edges or links) represent the relationships or interactions connecting them. Networks are often visualized as diagrams called sociograms, which allow networks to be evaluated qualitatively by varying how nodes and edges are drawn to reflect attributes of interest [57].
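To make the graph view concrete, the sketch below represents hypothetical user-to-user ties as an undirected graph and computes normalized degree centrality (a node’s number of ties divided by n - 1), one of the simplest SNA measures and a common first proxy for how connected an actor is. The user names and ties are invented for illustration, and the helpers `build_graph` and `degree_centrality` are our own names, not part of any cited work.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map (node -> set of neighbours) from (u, v) ties."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def degree_centrality(graph):
    """Normalized degree centrality: number of ties divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Hypothetical interaction ties (e.g., replies or mentions) between users.
ties = [("ana", "ben"), ("ana", "cai"), ("ana", "dev"), ("ben", "cai"), ("dev", "eli")]
g = build_graph(ties)
centrality = degree_centrality(g)
for user, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.2f}")
```

Here `ana` has three of four possible ties and scores 0.75. Libraries such as NetworkX provide this and many other centrality measures; the hand-written version only shows what the measure computes.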

These methods for analyzing social networks apply to a wide spectrum of domains, such as the analysis of concepts in mental models [14, 142] and the study of wars between nations [145]. SNA gives psychologists powerful tools to explain and model the relational context in which behaviors occur, along with the relational dimensions of those behaviors. Takahashi et al. [128], for instance, surveyed depressive tendencies among people who actively used social networks and used SNA to understand the respondents’ thoughts and behaviors. Importantly, SNA provides an objective basis for describing network structures and for testing hypotheses about them. For instance, SNA has been used to study how quickly a network of individuals can solve a problem [73]. In addition, SNA helps researchers understand how people’s actions affect, and are affected by, social structures. In recent years, SNA has also been used to study the relationships between individuals and organizations, as well as the dynamics, sentiment, and activities of other networked groups [4, 70]. SNA is extremely valuable for big data services, given the millions or billions of pieces of information, such as status updates, email exchanges, photographs, and videos, that are shared online every day. It offers a systematic approach for investigating the spread of such online data about people and their relationships to others. In these cases, SNA can be conducted using data mining techniques, and it is especially valuable for big data that cannot be handled with traditional techniques. SNA has been applied in fields ranging from health to finance, telecommunications, and security. More recently, SNA has been used to identify customer communities and influencers who attract others to join a community, thus gaining customers for the social network platform.

3.2 Opinion mining and sentiment analysis

Opinion mining and sentiment analysis are two research areas that are often confused. They have been described as “interdisciplinary areas situated between the fields of NLP, artificial intelligence, and text mining” [69]. The term opinion relates to a psychological attitude, and sentiment analysis and opinion mining are often used interchangeably. Some studies do not distinguish between the two fields [69, 132], while others draw this distinction [3, 30]. Shahheidari et al. [119] proposed adding emotion analysis as a third area alongside opinion and sentiment analysis. Liu [76] described opinions as central to all human activities and as key influences on human behavior. Opinion mining is the computational study of opinions, attitudes, and evaluations regarding an entity or its elements. In general, it aims to recognize current social patterns based on the opinions, alliances, moods, perceptions, and aspirations of stakeholder groups or the public. Correspondingly, Singh et al. [121] described sentiment analysis as an NLP task that identifies subjective content expressing feelings and sentiments and classifies it as positive, negative, or neutral. Some researchers have argued that opinion mining and sentiment analysis have quite different meanings [131]: opinion mining collects and analyzes what people think about an entity, while sentiment analysis identifies and analyzes the sentiments expressed in a text. The main task of sentiment analysis is sentiment classification, which can be performed with three families of automated approaches: supervised, unsupervised, and hybrid. Supervised methods usually train a sentiment classifier on a labeled corpus [138]. Unsupervised approaches rely on sentiment lexicons, such as dictionaries or corpora, and on syntactic structures [86]. Hybrid approaches combine the two [12].
The information generated by sentiment analysis can be either general or technical: general knowledge concerns the studied population, posts, comments, or similar elements, while technical knowledge concerns the techniques and methods used.
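The unsupervised, lexicon-based route can be sketched in a few lines: sum word-level polarity scores and map the total to positive, negative, or neutral. The six-word `LEXICON` and the function names are hypothetical placeholders; published sentiment lexicons are far larger and usually weight words by strength.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_polarity(text, lexicon=LEXICON):
    """Unsupervised, lexicon-based classification into positive/negative/neutral."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("I love this phone, the camera is great"))  # positive
print(lexicon_polarity("Awful service, I hate waiting"))           # negative
print(lexicon_polarity("The parcel arrived on Tuesday"))           # neutral
```

No labeled training data is needed, which is what makes the approach unsupervised; its quality depends entirely on the lexicon’s coverage.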

3.3 Text mining

Opinion, sentiment, and emotion analysis have become prominent in text mining because the vast amount of text generated online every day is accessible through numerous social sites, internet news streams, emails, and digital library partnerships. The primary difficulty today is mining and categorizing texts culled from such enormous amounts of data. Text mining is the process of converting unstructured text data into meaningful and actionable information. It is typically automated, using NLP to extract useful insights from unstructured text, and it can automatically classify texts by sentiment, subject, and purpose. Over time, text classification has passed through many stages and found applications in diverse areas, including those discussed in this section. The main text mining methodologies are described in the following subsections.
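As a minimal illustration of turning unstructured text into structured data, the sketch below converts a few invented posts into bag-of-words count vectors over a shared vocabulary. The regular-expression tokenizer is a deliberate simplification assumed for brevity; real pipelines typically add steps such as stop-word removal or stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Map raw documents to a shared sorted vocabulary and per-document count vectors."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors


posts = ["Great game tonight!", "What a game, what a night", "Tonight was great"]
vocab, vectors = bag_of_words(posts)
print(vocab)
for vector in vectors:
    print(vector)
```

Each post becomes a fixed-length numeric vector, which is the kind of structured input that the rule-based, machine learning, and DL methods below can consume.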

3.3.1 Rule-based

These methods are based on linguistic rules, that is, human-crafted associations between a particular linguistic pattern and a tag. Rules usually refer to syntactic, morphological, and lexical patterns, and they may also involve semantic or phonological aspects. In opinion and sentiment analysis, rule-based algorithms build on studies by language experts whose output is a set of rules (also known as a lexicon) that identifies a word’s sentiment or opinion along with a corresponding intensity measure.
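A toy version of the rule-based idea follows: a hand-crafted lexicon assigns each word an emotion tag and an intensity, and two simple rules adjust the result. A negator immediately before an emotion word cancels it, and an intensifier immediately before it scales its intensity. The lexicon entries, intensity values, and rules are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical hand-crafted lexicon: word -> (emotion tag, intensity in [0, 1]).
EMOTION_LEXICON = {
    "thrilled": ("joy", 0.9), "glad": ("joy", 0.5),
    "furious": ("anger", 0.9), "annoyed": ("anger", 0.4),
    "terrified": ("fear", 0.9), "worried": ("fear", 0.5),
}
NEGATORS = {"not", "never", "no"}       # rule 1: cancels the emotion word right after it
INTENSIFIERS = {"very": 1.5, "so": 1.3}  # rule 2: scales the emotion word right after it


def tag_emotions(text):
    """Return {emotion: total intensity, capped at 1.0} from lexicon lookups plus two rules."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {}
    for i, token in enumerate(tokens):
        if token not in EMOTION_LEXICON:
            continue
        emotion, intensity = EMOTION_LEXICON[token]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NEGATORS:
            continue  # negated emotion words are dropped in this sketch
        intensity *= INTENSIFIERS.get(previous, 1.0)
        scores[emotion] = min(1.0, scores.get(emotion, 0.0) + intensity)
    return scores


print(tag_emotions("I am so glad, but very worried about tomorrow"))
print(tag_emotions("I was not annoyed at all"))
```

Because each rule inspects only the immediately preceding token, a phrase such as "not very glad" still yields joy; covering such cases is where rule sets grow large and expert effort accumulates.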

3.3.2 Machine learning

Rather than relying on manually designed rules, machine learning text classification learns to classify texts from past observations. Machine learning algorithms learn the associations between pieces of text and their classes from prelabeled training examples, in which each input is paired with a specific output (class). Common techniques include naïve Bayes (NB), a probabilistic classifier, and k-nearest neighbor (KNN), a non-parametric method [75], as well as support vector machines (SVMs).
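To show what learning from prelabeled examples looks like, here is a minimal multinomial naïve Bayes classifier with add-one (Laplace) smoothing, written from scratch. The six training posts and their emotion labels are invented, and the class and method names are our own; libraries such as scikit-learn offer tested implementations (for example, `MultinomialNB`).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesEmotion:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # used for the class priors
        self.word_counts = defaultdict(Counter)  # label -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def _log_joint(self, tokens, label):
        """log P(label) + sum of log P(token | label) over tokens seen in training."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for token in tokens:
            if token in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_joint(tokens, label))


train = [
    ("I am so happy today", "joy"),
    ("what a wonderful happy surprise", "joy"),
    ("this makes me so angry", "anger"),
    ("I hate this, absolutely furious", "anger"),
    ("I miss you and feel sad", "sadness"),
    ("such a sad and lonely night", "sadness"),
]
model = NaiveBayesEmotion().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("happy news today"))        # joy
print(model.predict("so angry and furious"))    # anger
print(model.predict("feeling sad and lonely"))  # sadness
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and smoothing keeps a single unseen word-class pair from zeroing out a class.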

3.3.3 Deep learning (DL)

DL is an evolving branch of machine learning based on artificial neural networks. It learns representations and abstractions of data automatically, in a supervised or unsupervised manner, through a hierarchy of processing layers [17, 74]. DL typically relies on large amounts of unlabeled data to extract complex representations automatically. Much recent progress in artificial intelligence, whose general objective is to observe, analyze, learn, and make decisions, especially for extremely complex problems, has been driven by DL. These complex challenges have been the main inspiration for DL algorithms, which aim to mimic the hierarchical learning approach of the human brain. When extracting useful information from complex structures and relationships in the input corpus, models such as decision trees, SVMs, and case-based reasoning may fall short. Conversely, DL architectures can generalize in non-local, global ways, learning patterns and associations across the broader data set rather than only among immediate neighbors [74]. In fact, DL is a significant step toward artificial intelligence: it not only provides complex data representations suited to artificial intelligence tasks, but also makes machines independent of human knowledge, which is an ultimate goal of artificial intelligence. It extracts representations directly from unlabeled data without human intervention. Recently, DL approaches have outperformed earlier machine learning algorithms on tasks such as image classification, NLP, and face recognition. The success of these DL algorithms is rooted in their capacity to model complex and nonlinear relationships within data [74].
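The point about nonlinear relationships can be made concrete with XOR, the classic function that no single linear unit (a weighted sum followed by a threshold) can represent. The sketch below uses a two-layer network with ReLU hidden units whose weights are set by hand rather than learned, purely to show how an intermediate layer of features makes the mapping expressible; a real DL model would learn such weights from data.

```python
def relu(x):
    """Rectified linear unit: the nonlinearity applied by each hidden unit."""
    return max(0.0, x)


def two_layer_xor(x1, x2):
    """Two-layer ReLU network with hand-set (not learned) weights that computes XOR."""
    h1 = relu(x1 + x2)        # hidden unit 1: active when at least one input is 1
    h2 = relu(x1 + x2 - 1.0)  # hidden unit 2: active only when both inputs are 1
    return h1 - 2.0 * h2      # output layer: linear combination of the hidden features


for a in (0, 1):
    for b in (0, 1):
        print(a, b, two_layer_xor(a, b))
```

The hidden layer re-represents the inputs so that a simple linear readout suffices, which is the same principle, at a vastly larger scale, behind the learned representations described above.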
Humans have an innate capacity to transfer knowledge across tasks: what we learn while performing one task can be applied to solve related tasks. In machines, transfer learning attempts a similar transfer of knowledge from a source domain to a target domain by relaxing the assumption that training and test data are drawn from the same distribution. This can substantially benefit domains whose development has been hindered by inadequate training data. The technique has long been used in computer vision, where researchers take models trained to learn features from the massive ImageNet dataset and train them further on different tasks with smaller datasets. In NLP, the corresponding role is played by language models, which assign probability distributions to word sequences that match the distribution of a language; examples include embeddings from language models (ELMo) [102], bidirectional encoder representations from transformers (BERT) [44], and the generative pretrained transformer (GPT) [108]. Such models are pretrained on a general language modeling task and then fine-tuned for text classification or other tasks. In principle, this works well because the model can reuse the knowledge of language semantics gained during pretraining.
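The idea of a language model as a probability distribution over word sequences can be illustrated with the simplest case, a bigram model with add-one smoothing, which scores a sentence as the product of P(w_i | w_{i-1}). This is only a sketch on an invented three-sentence corpus: ELMo, BERT, and GPT are large neural models, and BERT in particular is trained with a masked-word objective rather than left-to-right prediction. Words absent from training receive no special handling here.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts, padding sentences with <s> and </s>."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab = {word for pair in bigram_counts for word in pair}
    return context_counts, bigram_counts, vocab


def sentence_log_prob(sentence, context_counts, bigram_counts, vocab):
    """log P(w_1..w_n) = sum over i of log P(w_i | w_{i-1}), with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    V = len(vocab)
    return sum(
        math.log((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + V))
        for prev, cur in zip(tokens[:-1], tokens[1:])
    )


corpus = ["I feel happy today", "I feel sad today", "I feel happy"]  # invented examples
lm = train_bigram_lm(corpus)
print(sentence_log_prob("i feel happy", *lm))  # fluent word order
print(sentence_log_prob("happy feel i", *lm))  # scrambled order scores lower
```

Pretraining neural language models on a far richer version of this objective is what gives them the knowledge of language that fine-tuning later reuses.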

3.4 Emotion

Emotion refers to a mental state, such as joy, anger, or sadness, that is grounded in the nervous system. Previous studies have described emotions as episodes triggered by stimuli. More specifically, emotion was defined as “an episode of interrelated synchronized changes in the state of all or most of the five organismic subsystems in response to the evaluation of an external or internal stimulus event as relevant to major concerns of the organism” [116]. From a psychological viewpoint, emotions are a critical part of our daily lives: for instance, we feel annoyed if somebody writes unfair comments about us, happy when we buy new things, or fearful when we read a horror story.

We therefore cannot escape our emotions; they typically occur because we consciously or unconsciously evaluate an event as relevant to an important goal. The core of any emotion is a readiness to act, and emotions prioritize the things we perceive as important. All humans express multidimensional emotions in typical ways, including through writing, speaking, body language, facial expressions, and gestures. Previous studies have shown that emotions develop over time and vary across situations. Human emotions can be categorized by type, strength, and many other factors, all of which can be observed and incorporated into models of emotion. Some emotion models define emotions according to their ranks, degrees, or dimensions, while others define them by intensity, duration, synchronicity, behavioral impact, and more [74]. Based on the various theories of emotion, current models can be divided into two types: categorical and dimensional [105]. Dimensional models infer an emotion from a few dimensions or parameters, most commonly two or three: valence (positive or negative), arousal (degree of excitation), and dominance (level of control over the emotion) [112]. Categorical models, by contrast, define a list of discrete emotion categories. Table 1 summarizes the basic emotion modeling approaches widely used in the literature, which together express nearly every potential human emotion.
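The relationship between the dimensional and categorical views can be sketched by placing a few categorical emotions as points in valence-arousal-dominance (VAD) space and mapping any predicted VAD point to its nearest category. The coordinates below are rough, hypothetical placements chosen only to match the qualitative descriptions (for example, fear and anger share negative valence and high arousal but differ in dominance); they are not taken from any published resource.

```python
import math

# Rough, hypothetical VAD coordinates on a [-1, 1] scale (not from any published lexicon).
PROTOTYPES = {
    "joy": (0.8, 0.5, 0.4),
    "anger": (-0.6, 0.7, 0.5),
    "fear": (-0.6, 0.7, -0.6),
    "sadness": (-0.7, -0.5, -0.4),
    "calm": (0.5, -0.6, 0.2),
}


def nearest_emotion(vad):
    """Return the categorical label whose prototype is closest (Euclidean) to a VAD point."""
    return min(PROTOTYPES, key=lambda label: math.dist(vad, PROTOTYPES[label]))


# A hypothetical model output: negative valence, high arousal, low dominance.
print(nearest_emotion((-0.5, 0.6, -0.5)))  # fear
```

Note that the dominance axis is what separates fear from anger here; with only valence and arousal, the two prototypes would coincide.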

Table 1 Emotion modeling approaches

In this study, we cover the prediction of emotions in social network data using traditional and machine learning techniques, exploring their use in emotion detection and prediction and relating them specifically to social network data.

4 Review method and execution

The review method and research questions are described in Section 4.1, and statistics on the papers published in different journals and presented at conferences are given in Section 4.6. We conducted this study as an SLR, which follows a systematic, repeatable process to identify and synthesize the existing literature on a particular topic or research question [22].

4.1 Review method

The review aims to provide a better understanding and a detailed overview of the current state of research on emotion detection by identifying, critically evaluating, and integrating the findings of all relevant high-quality individual studies that address one or more of the research questions. It examines how much progress has been made in emotion detection and prediction in social networks, based on the results of current studies. We describe the relationships, discrepancies, and gaps among existing studies and argue for continued analysis and study in this field. This study also helps formulate general statements, develop theories, and describe future research directions [133]. An SLR suits this review because it can address much broader questions than single empirical studies and can therefore yield more important practical implications [16]. We followed the steps suggested by Brereton et al. [22] to perform this SLR, namely, identifying the review questions, formulating a review protocol, setting up inclusion and exclusion criteria, defining review selection procedures and strategies, assessing the quality of the reviewed studies, and extracting data and synthesizing evidence. The following review questions were identified:

  • RQ1: Which emotion categories have been frequently used in the domain of emotion detection in social network data?

Answering this question enables us to identify the most commonly used emotion model and the emotions it covers, which can reveal whether any specific emotion needs more study.

  • RQ2: What is the impact of studying user emotions in social networks?

Answering this question enables us to discover the different aspects of the impact of emotion analysis and detection in social networks, according to the covered literature.

  • RQ3: What methodologies and datasets have been used to predict and detect emotions in this area?

Answering this question will guide researchers toward the most promising methodologies and will help us discuss challenges and opportunities based on the current state of research.

  • RQ4: How has emotion prediction been used in a real context?

Answering this question will provide an overview of the use cases of emotion prediction on social networking sites and of the purposes for which the literature studies users’ emotional states.

Answering these questions requires reviewing the studies and approaches applied to emotion detection on social networks, identifying effective aspects, and presenting the approaches and methodologies currently in use, which may help improve and develop the field in the future.

While conducting our review, we encountered some problems in the search process. The main problem was some inconsistency in the terminology used to describe research on emotion detection on social networks; for example, some studies, particularly those that considered images and videos, used “recognition” instead of “detection” or “prediction.” We therefore adopted a methodology that follows the analytical steps of a systematic review in general but differs from other systematic reviews in how relevant literature is searched for and identified. The following sections describe our method in detail.

4.2 Review protocol

To help achieve comprehensive results, we defined the steps of this SLR in advance. The review protocol is key to conducting an SLR because it identifies the criteria used to pursue the review goals while also limiting the chances of bias [71]. According to Brereton et al. [22], the review plan entails the following steps: research setting, search strategy, review questions, review selection process, quality assessment, data extraction, and synthesis of the extracted data.

4.3 Inclusion and exclusion criteria

The primary purpose of the selection criteria is to ensure that all the studies selected for the SLR are significant and relevant. Data were collected from various document types, including journal articles, conference papers, book chapters, and workshop papers, that were published between 2010 and 2020, written in English, and available in one of the selected digital databases. In addition, a study had to relate to one or more of the research questions and focus on detecting and predicting emotions from social network data. We excluded articles whose content was not relevant to this study.
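The selection criteria can be expressed as a single predicate over bibliographic records, as in the sketch below. The `Record` fields, the document-type labels, the example titles, and the `relevant` flag (standing in for the manual judgment that a study addresses a research question and uses social network data) are all hypothetical; the database names follow Section 4.5.

```python
from dataclasses import dataclass

# Databases named in Section 4.5.
SELECTED_DBS = {"IEEE Xplore", "Springer", "ScienceDirect", "ACM Digital Library"}
# Hypothetical labels for the accepted document types.
ALLOWED_TYPES = {"journal", "conference", "book chapter", "workshop"}


@dataclass
class Record:
    title: str
    year: int
    language: str
    doc_type: str
    database: str
    relevant: bool  # manual judgment: addresses an RQ and uses social network data


def meets_criteria(rec: Record) -> bool:
    """Return True only if the record satisfies every inclusion criterion."""
    return (
        2010 <= rec.year <= 2020
        and rec.language == "English"
        and rec.doc_type in ALLOWED_TYPES
        and rec.database in SELECTED_DBS
        and rec.relevant
    )


candidates = [
    Record("Emotion detection in tweets", 2018, "English", "journal", "IEEE Xplore", True),
    Record("Emotion detection in tweets (preprint)", 2021, "English", "journal", "Springer", True),
    Record("Polarity of product reviews", 2015, "English", "conference", "ScienceDirect", False),
]
included = [r.title for r in candidates if meets_criteria(r)]
print(included)
```

The second record fails the date range and the third fails the relevance judgment, so only the first is kept. Encoding the criteria this explicitly is one way to make screening decisions reproducible.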

4.4 Search strategy

For this review, we searched specific databases using key terms, combining manual and automatic search efforts to obtain precise results. A manual search was conducted on the references of primary studies, while an automatic search was used to identify keywords and related terms for searching online scientific databases [22]. The initial search gave a broad view of early studies on the topic, to determine what evidence exists. The comprehensive search strategy involved the following steps:

  1. Determining this study’s questions.

  2. Identifying the online databases that would include most of the studies related to emotion detection and prediction in online social networks.

  3. Extracting key search terms from the research questions.

  4. Using Boolean operators to link words and obtain search strings.

Search strings were applied to academic database sources. They were defined using Boolean logic by identifying synonyms and alternative spellings for each query component and combining them with Boolean OR and Boolean AND. Many search strings were constructed from relevant terms based on our research topic, using the following keywords: “emotion detection,” “emotion prediction,” “emotion classification,” “emotion mining,” “emotion recognition,” and “online social networks.” To broaden the search, the search strings were matched against the titles, abstracts, and bodies of studies. All selected primary studies were screened against the inclusion criteria to determine whether they helped answer the research questions. Rayyan, EndNote, and Microsoft Excel were used to organize the studies and to identify duplicates for removal. Rayyan [99], a freely available tool designed to assist with performing SLRs, also helped in applying the inclusion and exclusion criteria.
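Search strings of this kind conventionally join synonyms for one concept with OR and join different concepts with AND, which can be sketched as follows. The grouping of the review’s keywords into an emotion-task group and a platform group is our assumption for illustration, and each digital library has its own query syntax, so the generated string would need adapting per database.

```python
def or_group(terms):
    """Quote each phrase and join synonyms/alternative spellings with OR."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Join the OR-groups with AND so that every concept must appear."""
    return " AND ".join(or_group(group) for group in groups)


# Assumed grouping of the review's keywords into two concepts.
emotion_terms = ["emotion detection", "emotion prediction", "emotion classification",
                 "emotion mining", "emotion recognition"]
platform_terms = ["online social networks"]

query = build_query(emotion_terms, platform_terms)
print(query)
```

Adding a synonym to a group widens the search (more OR alternatives), while adding a group narrows it (one more AND condition).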

4.5 Database sources

In our study, we used Google Scholar to search for keywords, and we also selected the databases considered most relevant and up to date. These databases index the most prominent peer-reviewed journals and the most reputable conferences in the field. The selected databases were IEEE Xplore, Springer, ScienceDirect, and the ACM Digital Library. Statistics on the articles retrieved from these databases are discussed in the next section.

4.6 Publication statistics

Published research and relevant literature were retrieved for the key phrases “emotion detection on the social network,” “predicting emotions in online social networks,” and “models of emotion detection in social networks.” After careful study of the collected literature, we retained only those articles related to emotion detection and prediction based on social networks that provide an acceptable level of concepts and information and help to answer the research questions. We excluded studies on sentiment detection because they focus only on positive and negative predictions. The phases of the selection process are shown in Fig. 1. Following the first two phases, 239 studies were collected. Of these 239 studies, 69 duplicates were removed using EndNote. From the remaining 170 studies, the inclusion and exclusion criteria were applied to exclude those not written in English or classified as sentiment-only, resulting in a further 30 studies being removed, leaving 104 studies for the SLR.
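EndNote’s duplicate detection is configurable and is not described in the review, so the following is only a sketch of the general idea: two records are treated as duplicates when their normalized titles and publication years match. The record fields `title` and `year` are hypothetical.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    # Fold accents, lowercase, and collapse punctuation so that formatting
    # differences between databases do not hide duplicates.
    folded = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", " ", folded.lower()).strip()


def remove_duplicates(records):
    """Keep the first record for each (normalized title, year) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_title(rec["title"]), rec.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```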

Fig. 1 Study selection process

4.7 Quality assessment

We examined the studies selected for review by applying a set of criteria that inform a decision about the primary studies’ interpretations and findings [22]. These criteria assess the extent to which the design and conduct of a study were likely to have avoided systematic errors and biases. To assess the quality of these studies, we applied the following six quality criteria:

  • QA1. Does this study address the prediction of emotions in online social networks?

  • QA2. Does the topic pertain to the fields of computer science, information systems, or other similar fields?

  • QA3. Does the study explain the context in which the research was conducted?

  • QA4. Is the research methodology described extensively?

  • QA5. Does the study clearly explain the methodology used for data collection?

  • QA6. Did the study correctly employ a data analysis approach?

4.8 Overview of publication sources

The primary studies were assessed against the inclusion and exclusion criteria, which required a qualitative analysis of their findings. The selected studies comprised 28 journal articles, 53 conference papers, and 11 workshop articles; conference papers predominated, as depicted in Fig. 2.

Fig. 2 Article types reviewed in the study

In the following section, we discuss the findings from the reviewed studies: their methodologies, the associated practices, and the datasets used. This consolidates the knowledge in the field, answers the research questions, and reveals the challenges and opportunities of the field.

5 Emotion detection

Emotion detection on different types of social network platforms is a recently investigated research topic. Some studies analyzed the different types of content posted by users of online social networking platforms to detect the specific emotions expressed in their posts. Others combined voice tone, speech, facial expressions, gestures, EEG signals, various biosignals, and texts to detect emotion in multimodal data. However, many studies of emotion classification and detection on social networks performed the prediction on a single type of data, which can hide much of the full range of emotions expressed in the data. Overall, various approaches have been applied to emotion detection on social networks, but only a limited number of them are based on DL [11]. Moreover, existing research employing neural networks for emotion detection has mainly used networks with a single hidden layer. Such networks attracted considerable interest owing to their fast training and ease of implementation, but researchers did not consider what other neural network models might add to emotion detection [42]. Existing studies have highlighted the complexity of understanding human emotions and the need to know the context of the interaction at the time the user expressed the emotion. Studies on emotion detection also commonly report the scarcity and limitations of datasets, and multimodal approaches have been found to outperform systems that rely on a single modality. Using a single type of data (image, text, video, or audio) to detect emotions in social networks may miss some of the emotions involved or fail to give a full picture of the data [115]. In the next section, we present the survey results for studies that used a single type of data (i.e., a single modality). In section 5.2, we present the results for studies that used multiple types of data, followed by benchmarking models for emotion detection in section 5.3. The frequently used datasets are described in section 5.4. Finally, statistics on the emotion models used in the studies are presented in section 5.5.

5.1 Single modality emotion analysis

Today, there is a rich body of research on sentiment analysis and opinion mining, and many focused and specialized areas have been investigated, but emotion detection and analysis are still in their infancy. Emotion detection has long been a topic of interest in disciplines such as neuroscience, cognitive science, and psychology, but only recently has it attracted attention in computer science. Systems that can detect emotions in text, or in any other form of data published on social networks, have numerous potential applications. Single modality emotion analysis can be categorized into text-based and image-based analysis models.

5.1.1 Text-based emotion analysis model

Current approaches to emotion detection in textual social network data can be divided into two main lines: lexicon-based methods [72] and machine learning-based methods [6]. Lexicon-based methods extract emotional keywords using a dictionary of emotion-bearing words. Studies in the literature have shown that a supervised machine learning approach can outperform lexicon-based approaches [127]. The machine learning approach [56] treats emotion detection as a prediction problem and applies classification or regression algorithms to infer emotions. In this study, all relevant research addressing aspects of the research questions was reviewed regardless of the methods and algorithms used; nonetheless, since 2010 there has been a noticeable tendency in studies of emotion analysis and detection on social network data to use artificial intelligence, machine learning, and DL techniques. In some of those studies, a lexicon was used to generate emotion labels, and graphical emoticons, punctuation, and expressions were used to perform multi-label emotion classification of tweets [153]. Syed and Afraz [127] presented a lexicon-based approach to extracting sentiment and emotion from tweets for digital marketing purposes. Using machine learning algorithms, Sailunaz et al. [113] detected and analyzed sentiment and emotion in Twitter posts, using a naïve Bayes (NB) classifier for supervised classification of emotions and sentiments, and used the results to generate recommendations. In addition, they merged text-based parameters with user-based parameters to detect influential users in emotion and sentiment networks. In another study, Wikarsa et al. [143] developed a text mining application to detect the emotions of Twitter users by extracting the emotional words in tweets, constructing an emotion classifier using NB, and then analyzing the measured text.
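The lexicon-based line can be illustrated with a minimal sketch: each token is looked up in a word-emotion dictionary and the hits per emotion are counted. The tiny `LEXICON` below is hypothetical; real studies use curated resources such as NRC EmoLex, and this sketch ignores negation and intensity.

```python
import re
from collections import Counter

# Toy word-emotion lexicon (hypothetical, for illustration only).
LEXICON = {
    "happy": {"joy"}, "love": {"joy"}, "great": {"joy"},
    "angry": {"anger"}, "hate": {"anger"},
    "scared": {"fear"}, "afraid": {"fear"},
    "sad": {"sadness"}, "cry": {"sadness"},
}


def lexicon_emotions(text: str) -> Counter:
    """Count lexicon hits per emotion category in a short text."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        for emotion in LEXICON.get(token, ()):
            counts[emotion] += 1
    return counts


def dominant_emotion(text: str):
    """Return the most frequent emotion, or None if no lexicon word occurs."""
    counts = lexicon_emotions(text)
    return counts.most_common(1)[0][0] if counts else None
```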

Hasan et al. [61] also investigated the major challenge of automated emotion detection. To address it, they proposed an approach to detecting emotion in streams of text messages using a dimensional model and soft classification, which measures the probability of assigning a message to each emotion class. Plaza del Arco et al. [104] studied different machine learning approaches to automatically recognize emotions in messages written on social networks. Sintsova et al. [122] presented a semi-supervised learning method for emotion recognition in tweets. Based on a general-purpose emotion lexicon, the researchers constructed a balanced weighted voting classifier to detect domain-specific emotional tweets. The classifier was evaluated on tweets about sports, and balanced weighted voting outperformed their NB-based baseline. Using DL and neural networks, Moers et al. [89] combined a neural network with a baseline emotion miner to predict the distribution of reactions to posts on public Facebook pages. Likewise, Stojanovski et al. [124] used a DL system with several word embeddings and additional classifiers to extract features from Twitter messages for sentiment analysis and emotion identification. Abdullah et al. [1] investigated word and document embeddings together with a set of semantic features, and applied deep neural network architectures combining a convolutional neural network (CNN) and long short-term memory (LSTM) with a fully connected neural network to improve emotion detection in Twitter data.
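To make soft classification concrete, the sketch below returns a probability for every emotion class rather than a single label. It uses multinomial naïve Bayes with add-one smoothing, the kind of NB classifier used as a baseline in several of the studies above; it is not Hasan et al.’s dimensional model or Sintsova et al.’s weighted voting classifier, and the training sentences are toy examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesEmotion:
    """Multinomial naive Bayes with add-one (Laplace) smoothing that returns a
    probability for every emotion class instead of a single hard label."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_proba(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior of the class
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Log-sum-exp normalization turns log scores into a distribution.
        top = max(log_scores.values())
        exps = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}
```

A hard label, when needed, is simply the class with the highest returned probability.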

Similarly, Huang et al. [63] constructed a character-aware convolutional recurrent network with self-attention to perceive different emotional intensities, capturing the emotional content of tweets and computing scores that quantify contributions to the overall emotion of the text. Some studies combined two approaches; Gaind et al. [50] proposed two methods of classifying social media texts into six emotion categories. The first extracted emotion from the texts using NLP features such as emoticons, parts of speech, negations, and grammatical analysis. The second was based on two machine learning algorithms: SVM and J48. In addition, they produced a large English bag of words conveying word emotions and their intensities.

Geetha et al. [55] analyzed the state of mind conveyed by emoticons and text in tweets. They developed a future prediction architecture based on efficient classification (FPAEC), which incorporates various classification algorithms, including Fisher’s linear discriminant classifier, artificial neural networks, SVMs, naïve Bayes, and balanced iterative reducing, as well as a hierarchical clustering algorithm. In addition, the researchers proposed a two-step method for aggregating classified data, in which clustering followed a preliminary classification stage. Meo et al. [87] investigated and compared various classifiers, including Bayesian, random forest, logistic regression, and SVM, on social network texts labeled according to Plutchik’s wheel of emotions. Loia et al. [77] developed a framework for extracting the emotions and feelings expressed in textual data, wherein emotions are represented by a polarity, either positive or negative. The emotions are based on Minsky’s conception of emotions, which comprises four affective dimensions: pleasantness, attention, sensitivity, and aptitude. Each dimension has six activation levels, called semantic levels, and each level represents an emotional state of mind that can be more or less intense, determined using fuzzy logic according to the position along the corresponding dimension.
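The idea of graded activation levels judged with fuzzy logic can be sketched with triangular membership functions. This is a generic illustration, not Loia et al.’s formulation: the assumption here is that intensity lies in [0, 1] and that the six levels are evenly spaced, so an intensity can belong partly to two neighboring levels.

```python
def triangular(x, left, peak, right):
    """Membership of x in a triangular fuzzy set defined by (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def level_memberships(intensity, n_levels=6):
    """Degree to which an intensity in [0, 1] belongs to each of n evenly
    spaced activation levels (hypothetical level layout)."""
    step = 1.0 / (n_levels - 1)
    peaks = [i * step for i in range(n_levels)]
    return [triangular(intensity, p - step, p, p + step) for p in peaks]
```

For example, `level_memberships(0.3)` assigns a membership of 0.5 to each of the second and third levels; with this layout the memberships always sum to 1.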

The reviewed studies examined emotion from many different viewpoints. Some focused on techniques and on how to improve the detection of emotion in social network users, while others considered customers and communities and how to understand their emotions about a specific event or product. Naik et al. [96] proposed a method to determine the community-specific emotional behavior of users related to a particular topic. In customer care, emotion detection and analysis can help marketers learn how satisfied their customers are and which aspects of their services can be improved or updated to build a good relationship with consumers. García-Crespo et al. [52] researched how the ongoing interaction between consumers and organizations in the social networking environment affects marketing and new product development, and developed a platform for customer social network analysis (SNA) based on semantics and emotion mining. The benefits are not limited to the customer side; marketers also look for bad reviews of their products and report the issues to their developers and designers so that they can meet public demand. Yassine and Hajj [154] focused on extracting the emotional content of online content and proposed a framework for characterizing emotional interactions in social networks. Choudhury et al. [35] identified and studied different moods frequently found in Twitter posts, used the dimensions of valence and activation to represent moods in the circumplex model, and studied the topology of this space in terms of mood usage, network structure, activity, and participation patterns. They also used Twitter data to extend the conceptualization of human mood to incorporate usage levels and linguistic diversity, and their results provided evidence connecting mood to social structure and interaction behavior. In a study of emotion contagion in social networks, Clos et al. [33] proposed an approach to detecting the emotional impact of news, using a dataset extracted from the Facebook pages of the New York Times, by mapping news items to an emotion space fed into a multilinear regression model. Some studies have investigated the dynamics of human emotion, including how certain emotions evolve, change over time, and affect the emotional states of friends. In particular, Zhang et al. [158] predicted users’ emotions in a social network based on a dynamic continuous factor graph model of the users’ historical emotion logs and their social network.
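In the circumplex model, a mood is a point in the plane spanned by valence (horizontal) and activation or arousal (vertical). The sketch below, which assumes both coordinates are scaled to [-1, 1], converts such a point into an angle, a distance from the neutral origin (read as intensity), and a coarse quadrant label. It illustrates the representation only, not Choudhury et al.’s analysis.

```python
import math


def circumplex_position(valence, arousal):
    """Return (angle in degrees, intensity, quadrant) for a point in the
    valence-arousal plane; both coordinates are assumed to lie in [-1, 1]."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    intensity = math.hypot(valence, arousal)  # distance from the neutral origin
    # Points lying exactly on an axis go to the first matching branch.
    if valence >= 0 and arousal >= 0:
        quadrant = "high arousal, positive valence"   # e.g. excited
    elif valence < 0 and arousal >= 0:
        quadrant = "high arousal, negative valence"   # e.g. tense, angry
    elif valence < 0:
        quadrant = "low arousal, negative valence"    # e.g. sad, depressed
    else:
        quadrant = "low arousal, positive valence"    # e.g. calm, relaxed
    return angle, intensity, quadrant
```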

Some studies have analyzed the use of the Like or reaction buttons on Facebook for personality prediction and for emotion recognition and classification. Raad et al. [107] proposed a framework for predicting the distribution of reactions to Facebook posts and classifying emotions. In event detection, Valkanas and Gunopulos [134] focused on automatically identifying events as they occur and built a system called TwInsight, which combines notions from emotion theories with spatiotemporal information and employs online event detection mechanisms to solve the problem at scale in a distributed fashion. Following that work, Valkanas et al. [135] presented a user interface prototype for TwInsight, in which the system applies emotion extraction techniques to microblogs and location extraction techniques to user profiles to map tweets to emotions. To determine the emotional impact of events, Bernabé-Moreno et al. [18] proposed a system that monitors the flow of social media interactions geo-located in specific cities and creates an emotional profile of a location based on its long-term emotional rating. In [82], the authors also used geospatial data, extracting the emotions of park visitors from tweets published at park locations. To detect emotions in socially affective situations, Balahur et al. [13] built a knowledge base of action chains called EmotiNet, which represents and stores affective reactions to real-life contexts and situations described in texts, and then proposed a method to detect emotions in texts based on EmotiNet. Alhamid et al. [5] proposed a model that automatically collects and analyzes trending social data, examining trending content, the overall attitude toward textual content, and the relationships among participants using an adaptive algorithm. Williams et al. [144] presented a preliminary study of the emotions present in software-relevant tweets, in which emotions and collective mood states are detected and then examined in relation to specific software-related events. In learning environments, Estrada et al. [48] developed classifiers using three techniques (machine learning, deep learning, and an evolutionary approach called EvoMSA) to classify learners’ opinions and emotions.
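Predicting a post’s reaction distribution, as in the work of Raad et al. and Moers et al., requires turning raw reaction counts into a probability distribution and comparing predictions with it. The sketch below assumes the five emotional reactions Facebook introduced in 2016 and excludes Like, which is often treated as non-specific (a design choice, not necessarily the studies’ setup); it scores a prediction with the Kullback-Leibler divergence.

```python
import math

# Assumed reaction set: Facebook's 2016 emotional reactions, Like excluded.
REACTIONS = ["love", "haha", "wow", "sad", "angry"]


def reaction_distribution(counts):
    """Turn raw reaction counts into a probability distribution."""
    total = sum(counts.get(r, 0) for r in REACTIONS)
    if total == 0:
        return {r: 1 / len(REACTIONS) for r in REACTIONS}  # no signal: uniform
    return {r: counts.get(r, 0) / total for r in REACTIONS}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far a predicted distribution q is from the observed p."""
    return sum(p[r] * math.log((p[r] + eps) / (q[r] + eps)) for r in REACTIONS if p[r] > 0)
```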

Recently, many studies have addressed the multi-label classification of emotions. Purver et al. [106] investigated using automatically labeled data to train supervised classifiers for multi-class emotion detection in Twitter messages without manual intervention, via distant supervision in which conventional markers of emotional content in the texts themselves serve as surrogates for explicit labels. Zhang et al. [157] addressed multiple emotion detection in online social networks from a user perspective; based on observations of an annotated Twitter dataset, they discovered emotion label correlations, social correlations, and temporal correlations in online social networks.
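In multi-label emotion classification, each post may carry several emotions at once. The sketch below shows a common binary-relevance encoding and a simple way to expose label correlations by counting how often pairs of emotions co-occur on the same post. It is an illustration on toy label sets, not Zhang et al.’s model.

```python
from collections import Counter
from itertools import combinations


def binarize(label_sets, classes):
    """One 0/1 indicator vector per post (binary relevance encoding)."""
    return [[int(c in labels) for c in classes] for labels in label_sets]


def label_cooccurrence(label_sets):
    """Count how often each unordered pair of emotions annotates the same post."""
    pairs = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            pairs[(a, b)] += 1
    return pairs
```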

Several methods have been used in emotion detection, including EEG to capture brain wave patterns. Almehmadi and Bourque [7] presented a model for determining a user’s emotional state through their social media postings. Both the emotional state and the geographic details were appended to the end of the message to provide additional meaning that may have been missing from the original tweet. This work also introduced the idea of using a brain–computer interface to detect human emotion. Another area of emotion analysis and classification is the identification of emotion intensity. Mashal et al. [84] proposed a model capable of extracting fine-grained emotion intensities. Dai et al. [37] investigated emotion recognition in vocal social media. They proposed a computational method that turns emotion recognition and affective computing on vocal social media into a PAD (pleasure-arousal-dominance) value estimation task, using 25 acoustic feature parameters extracted from speech signals and a trained LS-SVR model. Depression detection has also been investigated by analyzing emotions in tweets; some studies run emotion analysis on social data to detect depression. For instance, Deshpande et al. [43] crawled tweets containing keywords linked to depression as Twitter ground truth.
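LS-SVR (least squares support vector regression) replaces the SVR quadratic program with a single linear system, which makes it easy to sketch. The code below implements the standard LS-SVR dual with an RBF kernel and fits three outputs at once, as in PAD estimation. The 25-dimensional random features stand in for acoustic parameters, and the `gamma` and `sigma` values are arbitrary; none of this reproduces Dai et al.’s features or settings.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0) / (2 * sigma**2))


class LSSVR:
    """Least squares SVR: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""

    def __init__(self, gamma=10.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma

    def fit(self, X, Y):
        n = X.shape[0]
        Y = Y.reshape(n, -1)  # one column per target (e.g. P, A, D)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.sigma) + np.eye(n) / self.gamma
        rhs = np.vstack([np.zeros((1, Y.shape[1])), Y])
        sol = np.linalg.solve(A, rhs)  # all targets solved in one call
        self.b, self.alpha, self.X = sol[0], sol[1:], X
        return self

    def predict(self, X_new):
        return rbf_kernel(X_new, self.X, self.sigma) @ self.alpha + self.b


# Synthetic stand-in data: 80 clips x 25 features -> 3 PAD values in [-1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 25))
Y = np.tanh(X @ (rng.normal(size=(25, 3)) * 0.1))
model = LSSVR(gamma=100.0, sigma=5.0).fit(X, Y)
```

The first row of the system enforces that the dual coefficients sum to zero, which is what fixes the bias term `b`.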

5.1.2 Image-based emotion analysis model

To detect and analyze emotions in image posts, DL has been used to predict the depicted emotion [51]. Jindal and Singh built a framework that pretrained CNNs for object recognition on large-scale data collected from Flickr, to be used for further transfer learning [67]. They classified image emotions using Flickr images with three techniques: an SVM on the high-level features of VGG ImageNet; fine-tuning of pretrained models, including ResNet, Places205, and VGG16; and VGG ImageNet [51]. Psychological and behavioral studies have shown that human perception of emotion varies with user demographics. Hence, Cai et al. [27] inferred emotions from images by introducing group information and implemented a joint emotion inference model combining image content, user personalization, and group information. Their experiments showed that groups influence the emotions of their members and that accounting for this improves emotion inference from images. Beyond text data, images have also been considered when using machine learning and DL to detect emotion in social network data. Malighetti et al. [81] adopted an artificial intelligence-based approach, using the Emotion API of Microsoft Azure Cognitive Services for discrete emotion analysis, to detect emotions in Instagram images tagged with hashtags referring to body image–related components and to recognize the emotions most expressed in such images. Moreover, Nagarsekar et al. [95] used supervised learning with two different machine learning algorithms to classify tweets into six basic emotions; both classifiers were trained and tested on three datasets that differed in their emotion distribution. In urban studies, Ashkezari-Toussi et al. [10] focused on the emotional content of Flickr images for decision-making in urban areas and social studies. They selected facial images from Flickr and sorted them by age and gender. The emotions detected on the faces were extracted, and the emotional state of each city was mapped to show the spatial distribution of different emotions.
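Mapping an emotional state per city amounts to aggregating per-face emotion detections by location. The sketch below assumes the detections are already available as hypothetical `(city, emotion)` pairs, and computes each emotion’s share within each city so that cities with different numbers of photos can be compared.

```python
from collections import Counter, defaultdict


def emotion_share_by_city(detections):
    """detections: iterable of (city, emotion) pairs, e.g. one per detected face.
    Returns {city: {emotion: share of that city's detections}}."""
    per_city = defaultdict(Counter)
    for city, emotion in detections:
        per_city[city][emotion] += 1
    return {
        city: {e: n / sum(counts.values()) for e, n in counts.items()}
        for city, counts in per_city.items()
    }
```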

The reader’s perspective has received less attention in the literature; however, it has many applications, including helping writers anticipate how their work will affect their target audience and helping readers find documents matching their desired emotion. Rao et al. [111] addressed the detection of the emotions that social media content evokes in readers. Analysis from the reader’s perspective can be more meaningful in social media than classical emotion analysis performed from the writer’s perspective. Table 2 compiles some of the literature relevant to single modality emotion analysis. For each study, it lists the methodology, the data source (social network site used and description), the emotion modality or categories, and the performance measures (type and percentage) where the study reports them.

Table 2 Surveys of single modality emotion analysis

5.2 Multiple modality emotion analysis

Emotion detection and prediction research will benefit from the new possibilities offered by the multimedia data available online. Every day, many people use social networks to share their experiences and express their mood and state of mind. To date, most studies have relied only on textual content for emotion analysis, and only a few have considered hybrid models or multiple modality emotion analysis that combines sources of information, such as speech and text or image and text, to classify emotions. Yet multimedia content, including images and videos, has become prevalent in online social networks. Indeed, online social network providers compete by offering easier access to increasingly powerful and diverse services that include pictures. As the saying goes, a picture is worth a thousand words: people with different backgrounds can easily understand the main content of an image or video. Apart from the large amount of readily available visual content, today’s computational infrastructure is also much cheaper and more powerful, making the analysis of computationally intensive visual content feasible. In this era of big data, integrating visual content can provide more reliable or complementary online social signals [24, 101]. Moreover, images and text often come in pairs in online social networks, so the challenges of visual sentiment analysis can intuitively be alleviated by integrating textual knowledge. Silvia et al. [34] investigated whether combining visual and textual data permits the identification of the emotions elicited by an image. Zhao et al. [159] proposed predicting personalized emotion perceptions of images in the valence-arousal space using shared sparse regression as the learning model, jointly investigating different factors that may affect personalized image emotion perceptions, including visual content, social context, temporal evolution, and location influence.
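One simple way to integrate textual and visual evidence is late fusion: each modality produces its own emotion distribution, and the two are combined by a weighted average. The sketch below shows only this generic scheme; it is not the method of Silvia et al. or the shared sparse regression of Zhao et al., and the `text_weight` value is an assumption that would normally be tuned on validation data.

```python
def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Weighted average of per-modality emotion distributions (late fusion).
    Both inputs map emotion -> probability; missing emotions count as 0."""
    emotions = set(text_probs) | set(image_probs)
    fused = {
        e: text_weight * text_probs.get(e, 0.0) + (1 - text_weight) * image_probs.get(e, 0.0)
        for e in emotions
    }
    total = sum(fused.values())
    return {e: p / total for e, p in fused.items()} if total else fused
```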

In contrast to such dimensional approaches, categorical approaches map sentiment into one of several representative categories. One of the primary challenges in image emotion detection is that different viewers may have totally different emotional reactions to the same image, owing to many personal and situational factors, including cultural background, personality, and social context. Zhao et al. [160] proposed detecting personalized perceptions of image emotions by incorporating various factors (social context, temporal evolution, and location influence) alongside visual content, using rolling multitask hypergraph learning to combine these factors jointly.

Illendula and Sheth [65] investigated the effect of emojis and images using a BiLSTM model with an attention mechanism, fed with FastText embeddings, and then used EmojiNet together with features extracted from the images. They used the three modalities of text, emoji, and images to encode various emotional details. Xu et al. [149] introduced a social–emotional optimization algorithm (SEOA), which assumes that most individuals aim to raise their social status. They also extracted visual, textual, and audio information from videos and proposed a multimodal emotion classification framework to capture the emotions of users on social networks.
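The attention mechanism on top of a BiLSTM typically scores each time step’s hidden state and pools the sequence into one weighted vector. The sketch below shows a common additive-attention pooling step in NumPy with random parameters standing in for learned ones; it illustrates the general mechanism, not Illendula and Sheth’s exact architecture.

```python
import numpy as np


def attention_pool(H, w, b, u):
    """Additive attention over a sequence of hidden states.

    H: (T, d) hidden states, e.g. concatenated forward/backward BiLSTM outputs.
    w: (d, a), b: (a,), u: (a,) parameters (learned in practice, random here).
    Returns the attention weights (T,) and the pooled sentence vector (d,)."""
    scores = np.tanh(H @ w + b) @ u       # one relevance score per time step
    scores = scores - scores.max()        # stabilize the softmax numerically
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha, alpha @ H               # weighted sum of hidden states
```

The pooled vector would then feed a softmax layer over the emotion classes.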

Some studies have used YouTube data to detect emotions. Chen et al. [31] classified YouTube videos into six emotion categories through unsupervised and supervised learning methods, applying sparse ensemble learning to synthetic aperture radar (SAR) image classification. Xu et al. [147] proposed a heterogeneous knowledge transfer method for video emotion detection using YouTube video datasets. Deep convolutional network features and zero-shot emotion learning, which predicts emotions not seen in the training set, are used to recognize emotions in user-generated videos, and image transfer encoding (ITE) is proposed to encode the extracted features into video representations. Table 3 compiles some of the literature relevant to multiple modality emotion analysis. For each study, it lists the methodology, the data source (social network site used and description), the emotion modality or categories, and the performance measures (type and percentage) where the study reports them.
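Zero-shot emotion learning usually works by projecting visual features into a semantic space, such as word embeddings of the emotion names, where unseen emotions also have a representation. The sketch below shows only the prediction step by cosine similarity; the projection matrix and label vectors are hypothetical inputs, and this is a generic illustration rather than Xu et al.’s ITE pipeline.

```python
import numpy as np


def zero_shot_predict(visual_feat, projection, label_embeddings):
    """Project a visual feature into the word-embedding space and return the
    emotion label whose embedding is most similar (cosine similarity).

    visual_feat: (d_v,) feature from a video or image network.
    projection: (d_v, d_w) mapping assumed to be learned on *seen* classes.
    label_embeddings: {label: (d_w,) vector}; may include unseen labels."""
    z = np.asarray(visual_feat, dtype=float) @ projection
    z = z / np.linalg.norm(z)
    best_label, best_sim = None, -np.inf
    for label, vec in label_embeddings.items():
        vec = np.asarray(vec, dtype=float)
        sim = float(z @ (vec / np.linalg.norm(vec)))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```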

Table 3 Surveys of multiple modality emotion analysis

To identify benchmark models for emotion detection, the next section reviews state-of-the-art (SOTA) research.

5.3 Benchmarking models

With the evolution of machine learning techniques toward DL, some recent accurate models have relied on DL and transfer learning (TL). Applying the knowledge gained while solving one problem to a different but related problem can improve efficiency and yield more accurate detection results.

Table 4 lists some SOTA emotion detection models, together with the machine learning model used, the model type (text, image, or multiple modalities), the dataset, and the accuracy achieved.

Table 4 Emotion detection models

5.4 Datasets

The reviewed studies used various existing or customized datasets to detect emotions, depending on the researchers’ experiment types. Annotated datasets are labeled according to a specific emotion model and therefore cover only the emotions that exist in that model. A few emotion-labeled datasets are available, some contributed by researchers who built a data corpus for their own experiments. However, the number of datasets available for emotion analysis remains inadequate, and there is no generalized dataset that can be used with any emotion model. Researchers must either use the emotion model of an existing dataset or build a data corpus and annotate it manually with the desired emotion labels, which is time-consuming. Some previous studies therefore focused on leveraging social network data to collect large volumes of data and create large datasets labeled with emotion classes. Wang et al. [139] created a large emotion dataset of approximately 2.5 million tweets using the emotion-related hashtags in the tweets and then applied two separate machine learning algorithms to classify emotions. Mohammad [90] compiled a large corpus of tweets and their associated emotions using emotion-word hashtags, and created a word–emotion association lexicon from this Twitter emotion corpus. Overall, the experiments in that study show how the Twitter emotion corpus can improve the accuracy of emotion classification in different domains.

Abdul-Mageed and Ungar [2] employed a DL approach for emotion detection in text and built a large dataset for fine-grained emotions called EmoNet, which they claim is the largest dataset constructed from tweets. Marquez et al. [20] proposed a methodology for expanding the NRC word–emotion association lexicon (EmoLex) to the language used on Twitter by performing multi-label classification of words and comparing different word-level features extracted from unlabeled tweets, such as unigrams, Brown clusters, POS tags, and word2vec embeddings. Many studies used hashtags (#), particularly emotional hashtags, to annotate collected data and label datasets [59, 60]. In a different approach, Hussien et al. [64] proposed detecting emotions using training data annotated automatically based on emojis. According to their paper, training a classifier on tweets automatically annotated with emojis yields better emotion detection performance than training it on manually annotated tweets. Table 5 lists some of the most frequently used datasets in recent works, with a brief description of the amount of data in each.
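Hashtag- and emoji-based annotation can be sketched as a distant labeling function. The marker-to-emotion maps below are hypothetical, since each corpus defines its own lists. Removing the markers from the text is an assumed precaution so that a classifier trained on the output cannot simply memorize the label-bearing tokens.

```python
import re

# Hypothetical marker lists; published corpora define their own.
HASHTAG_LABELS = {"#happy": "joy", "#angry": "anger", "#scared": "fear", "#sad": "sadness"}
EMOJI_LABELS = {"\U0001F602": "joy", "\U0001F621": "anger", "\U0001F622": "sadness"}


def distant_label(tweet):
    """Return (cleaned_text, labels), or None if the tweet has no known marker."""
    labels = set()
    text = tweet
    for tag, emotion in HASHTAG_LABELS.items():
        # \b stops "#happy" from matching inside a longer tag such as "#happyday".
        pattern = re.compile(re.escape(tag) + r"\b", re.IGNORECASE)
        if pattern.search(text):
            labels.add(emotion)
            text = pattern.sub("", text)
    for emoji, emotion in EMOJI_LABELS.items():
        if emoji in text:
            labels.add(emotion)
            text = text.replace(emoji, "")
    if not labels:
        return None
    return " ".join(text.split()), labels
```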

Table 5 Emotion datasets

5.5 Emotion modeling

Emotion modeling is the foundation of any emotion detection and prediction system because it defines how emotions are represented. Categorical models assume that emotions exist as distinct states, so distinguishing between the various emotional conditions is essential. Any emotion detection task must therefore begin by defining the emotion model to be used. The studies in our review use various ways of representing emotions, so it is useful to highlight the most frequently used emotion models. Figure 3 shows the percentage share of use of each emotion model discussed in this paper.
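Because a dataset is annotated under one categorical model, moving its labels to another model can lose information. The sketch below uses Ekman’s six basic emotions and Plutchik’s eight primary emotions as two widely used examples (writing “joy” for Ekman’s happiness is a naming assumption) and reports which labels survive a change of model.

```python
# Ekman's six basic emotions ("joy" stands in for happiness) and Plutchik's
# eight primary emotions, as two example categorical models.
EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}
PLUTCHIK = EKMAN | {"anticipation", "trust"}


def relabel(labels, target_model):
    """Split labels into those the target model supports and those it lacks."""
    kept = [label for label in labels if label in target_model]
    dropped = [label for label in labels if label not in target_model]
    return kept, dropped
```

Moving a Plutchik-annotated corpus to Ekman’s model drops every “trust” and “anticipation” label, which is one concrete reason no single dataset serves every emotion model.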

Fig. 3 Emotion modeling: Percentage share of use of different models in the studied research papers

6 Challenges, gaps, and limitations of emotion analysis in social networks

Since around 2010, the field of emotion detection and sentiment analysis has seen positive growth and developing technologies. However, although it is one of the foundational research areas in machine learning, our review of the studies revealed issues and challenges that still need to be addressed, ranging from the nature of social network data to fake and spam emotions. Below, we discuss some of these challenges; reviewing them will help to develop the field of emotion detection on social networks.

6.1 Dynamic nature of data

In the field of emotion analysis, the most challenging issue is the dynamic nature of social network data, which complicates every stage of detecting users' emotions, from early data collection to preprocessing. People talk about their experiences and opinions every day, and new aspects, facts, and features arise almost daily. The growing volume of real-time data, its variable and dynamic nature, and the noise in social network data make it difficult to discover emotions relevant to specific topics, trends, and events. As a result, features or aspects that were important for classifying emotions can quickly become irrelevant, and if the dataset is not updated frequently, emerging features will be neglected, which could reduce precision. A further complication is separating unimportant features from important, recurring ones during training. A structured and efficient approach to this problem is needed across all areas of emotion detection and analysis in social network data.
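One simple way to notice that a feature set is going stale is to compare term usage between an older and a newer time window of posts. The sketch below flags terms that are frequent in the new window but rare or absent in the old one; the whitespace tokenizer and the `min_count` and `ratio` thresholds are illustrative assumptions rather than a method taken from the reviewed studies.

```python
from collections import Counter


def term_counts(posts: list[str]) -> Counter:
    """Count lowercase whitespace-separated tokens across a window of posts."""
    return Counter(token for post in posts for token in post.lower().split())


def emerging_terms(old_posts: list[str], new_posts: list[str],
                   min_count: int = 3, ratio: float = 3.0) -> list[str]:
    """Return terms whose count in the new window is at least `min_count` and at
    least `ratio` times their add-one-smoothed count in the old window."""
    old, new = term_counts(old_posts), term_counts(new_posts)
    return sorted(
        term
        for term, count in new.items()
        if count >= min_count and count >= ratio * (old[term] + 1)
    )
```

Terms flagged this way could be added to the feature set or used as a signal that the model should be retrained on refreshed data.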

6.2 Context-related challenges

Hashtags (#) are now ubiquitous in tweets, where they link a post to an ongoing or trending issue. They offer a way to address the first challenge, handling frequently changing and dynamic datasets, for example by retraining on a weekly basis with hashtags as important features; indeed, this is done in some cases. However, a different issue arises because many users employ hashtags sarcastically, so a hashtag does not necessarily make the text relevant to the corresponding topic. Sarcasm is the Achilles heel of machine-learning-based sentiment classification unless an advanced neural system is used to identify the mood while accounting for the possibility of sarcasm. In most observations, sarcasm degrades the performance of most probabilistic techniques.
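The weekly retraining idea can be sketched by grouping tweets by ISO calendar week and collecting their hashtags as candidate features for that week's model. The simplified hashtag regular expression and the `(date, text)` input format are assumptions; as noted above, hashtags used sarcastically would still pass through this step unfiltered.

```python
import re
from collections import defaultdict
from datetime import date

# Simplified hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def weekly_hashtags(tweets: list[tuple[date, str]]) -> dict[tuple[int, int], set[str]]:
    """Map each (ISO year, ISO week) to the lowercase hashtags used that week."""
    buckets: dict[tuple[int, int], set[str]] = defaultdict(set)
    for day, text in tweets:
        iso_year, iso_week, _ = day.isocalendar()
        buckets[(iso_year, iso_week)].update(tag.lower() for tag in HASHTAG.findall(text))
    return dict(buckets)
```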

6.3 Interpretation of sentiment analysis versus emotion analysis

Sentiment analysis has developed into emotion research, which offers finer granularity. Positive, negative, and neutral sentiments can be expressed through different emotions, such as joy and love for positive sentiment, anxiety and sadness for negative sentiment, and apathy for neutral sentiment. When searching for work on emotion detection and analysis on social networks, one finds many sentiment papers (considering only positive, negative, and neutral classes) presented under the banner of emotion detection. Sentiment analysis emerged first, and emotion analysis followed within the broader area of sentiment analysis and opinion mining. However, whether a paper uses an emotion model or only sentiment classes should be stated clearly. Doing so would delineate the area of emotion analysis and detection and make the field easier to analyze. This ambiguity affects the quality of emotion analysis and detection research, and it is vital to identify the literature that actually advances emotion models in the social network area, without falling into the trap of counting a paper as emotion research when its content is sentiment analysis.
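The granularity relation described above can be made concrete: an emotion label can always be collapsed to a sentiment polarity, but a polarity cannot be expanded back into a unique emotion. The mapping below uses only the examples given in this section; any wider mapping would be an assumption.

```python
# Emotion-to-polarity mapping built from the examples in this section.
EMOTION_TO_SENTIMENT = {
    "joy": "positive",
    "love": "positive",
    "anxiety": "negative",
    "sadness": "negative",
    "apathy": "neutral",
}


def to_sentiment(emotion: str) -> str:
    """Collapse a fine-grained emotion label to a coarse sentiment label."""
    return EMOTION_TO_SENTIMENT[emotion]


def candidate_emotions(sentiment: str) -> list[str]:
    """List every emotion consistent with a sentiment label; often more than one."""
    return sorted(e for e, s in EMOTION_TO_SENTIMENT.items() if s == sentiment)
```

A classifier whose outputs are only `positive`, `negative`, or `neutral` is therefore performing sentiment analysis, even if the paper describes it as emotion detection.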

6.4 Trustworthiness

Although not central to the scope of this study, another issue is biased or untrustworthy data derived from social media. Anyone can apply a sentiment classification or emotion detection system to a set of tweets on a particular trending topic, yet the accuracy, validity, and even truthfulness of user content are weak, however widespread and regular the use of social media might be. It is difficult to be certain that any given post on social media is accurate, valid, or even true. Therefore, emotion detection needs to address these concerns. Moreover, detecting and capturing the true emotions that a person feels when witnessing a specific activity or event may significantly help in assessing the authenticity and sincerity of that person's posts about it. It is not unusual for people to value tips and suggestions from individuals in their social media networks more than information obtained through conventional channels, such as advertisements, and social media suggestions influence people more often than those obtained through other channels, such as word of mouth or overheard conversations. If increased visibility and retention contribute to increased influence, then the accuracy, authenticity, and veracity of such posts are critical concerns. These concerns could be addressed by attaching detected emotion information to a social media message: we may assess a post's truthfulness by checking that the emotion detected at the time of the event or action is consistent with what the person describes. If a person says they enjoyed their dining experience, we can reasonably expect arousal or satisfaction to be the emotion they felt, whereas annoyance, indignation, and perhaps even boredom would mark a bad experience.
Capturing and attaching this information to social media posts can be useful, but emotion detection has often been regarded as a pseudo-science. Furthermore, when fetching a dataset from social media, one might assume that the dataset actually reflects the emotions of its authors or the mood of the public, and use this assumption to analyze and predict general opinion about issues under discussion as positive or negative or as falling under a particular emotion. However, this foundational assumption fails once we consider biased data scattered throughout the Internet, for instance by bots, which might shift the entire emotional polarity of a given dataset.
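The effect of bot-generated content on a dataset's aggregate emotion can be shown with a small calculation. The sketch below compares the emotion distribution of a set of `(account, emotion)` posts with and without accounts flagged as bots; it assumes the bot flags are already known, which in practice is itself a hard detection problem.

```python
from collections import Counter
from collections.abc import Container
from fractions import Fraction


def emotion_shares(posts: list[tuple[str, str]],
                   exclude: Container[str] = frozenset()) -> dict[str, Fraction]:
    """Return each emotion's exact share of posts, skipping posts by excluded accounts."""
    counts = Counter(emotion for account, emotion in posts if account not in exclude)
    total = sum(counts.values())
    return {emotion: Fraction(count, total) for emotion, count in counts.items()}
```

With three genuine posts (one angry, two joyful) and three angry posts from two bot accounts, anger accounts for two thirds of the full dataset, whereas joy accounts for two thirds once the bot accounts are excluded: a handful of bot posts reverses the apparent public mood.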

6.5 Language-oriented issues

Performing emotion detection and analysis on social network texts in languages other than English, such as Hindi, Arabic, and Chinese, is a crucial challenge. Emotion analysis is tied to the characteristics of each language, and only a few studies address languages other than English, which already has various corpora and lexical dictionaries available. It is nonetheless important to conduct emotion detection and analysis in non-English languages because a large percentage of people around the world are not native English speakers; for instance, millions of people in Middle Eastern countries speak Arabic as their native language, making emotion analysis on Arabic social networks crucial for political and economic reasons. Some studies have attempted to tackle language-related problems using a cross-language classifier, in which non-English text is automatically translated into English and then classified using English corpora and dictionaries. However, the accuracy of automatic translation remains limited.
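The cross-language strategy can be sketched as a two-stage pipeline: translate the non-English post into English, then score it with an English emotion lexicon. Both stages below are toy stand-ins: `translate` is any callable supplied by the caller (a real system would call a machine translation service), and `ENGLISH_LEXICON` is a hypothetical four-word lexicon, not an existing resource such as EmoLex. Because the classifier sees only the translation, any translation error propagates directly into the emotion label.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical English word-emotion lexicon (a tiny stand-in for a real one).
ENGLISH_LEXICON = {"happy": "joy", "love": "joy", "afraid": "fear", "angry": "anger"}


def classify_english(text: str) -> Optional[str]:
    """Return the emotion with the most lexicon hits, or None if no word matches."""
    hits = Counter(ENGLISH_LEXICON[w] for w in text.lower().split() if w in ENGLISH_LEXICON)
    return hits.most_common(1)[0][0] if hits else None


def cross_language_classify(text: str, translate: Callable[[str], str]) -> Optional[str]:
    """Translate a non-English post into English, then classify the translation."""
    return classify_english(translate(text))
```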

Because English and the non-English languages used on social network platforms have different characteristics, the corpora and lexica available for non-English languages are limited compared with the standard English corpora and dictionaries. Moreover, for non-English languages, task difficulty depends on each language's morphology and script. More research is needed to develop and build corpora and lexica for the non-English languages used on social network platforms.

6.6 Fake and spam emotion

Social network communities are characterized by user anonymity, which can be exploited to defraud other users. Companies and organizations may use emotion spammers to post false positive reviews that promote their goods or false negative opinions that discredit their rivals. The same applies to the political domain or any other area in which feelings about targeted events can affect a reader's assessment of those events. When reading such material, it is difficult to distinguish spam from genuine emotions. The challenge for emotion analysis is therefore to develop techniques and algorithms that identify and filter out false emotions in the collected datasets.
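As one illustrative filter, not one drawn from the reviewed studies, the sketch below flags posts whose normalized text is published by several distinct accounts, a pattern consistent with coordinated review spamming. The normalization rule and the `min_accounts` threshold are assumptions; real spammers paraphrase, so such a rule would catch only the crudest cases.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so trivial edits still match."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def suspected_spam(posts: list[tuple[str, str]], min_accounts: int = 3) -> set[str]:
    """Return normalized texts posted by at least `min_accounts` distinct accounts."""
    accounts_by_text: dict[str, set[str]] = defaultdict(set)
    for account, text in posts:
        accounts_by_text[normalize(text)].add(account)
    return {text for text, accounts in accounts_by_text.items() if len(accounts) >= min_accounts}
```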

7 Opportunities in the field of emotion analysis in social networks

DL approaches have particular potential for developing and improving emotion detection models, given the well-known success of artificial neural networks, DL architectures, and transfer and reinforcement learning in making machines intelligent. Various factors, such as decreasing computational costs, growing computing capacity and power, and the availability of cloud services and effectively unlimited storage at reasonable cost, have contributed significantly to this development. Used in joint learning methods, these factors are expected to reshape machines' capacity to comprehend and unravel the hidden emotions within the various data types of social networks. DL has achieved considerable success in many areas of text mining and image recognition, and it promises to be key to advancing emotion detection. However, to leverage these benefits for detecting and analyzing emotions in social networks, some challenges in applying DL must first be addressed; these are areas for future research:

1. Large data resources are often poorly annotated, which may hamper the performance of any DL model because these methods rely on large amounts of labeled data. Moreover, while reviewing multimedia emotion analysis, we noticed that no datasets of images with associated texts validated against emotion categories were available.

2. Building a DL model and training it from scratch requires a massive amount of data and powerful computing resources to achieve satisfactory results, which makes it difficult for researchers without such support to develop a model from scratch. Transfer learning, the general process of training a model on one problem or task and then reusing it for a second, related problem, can be used in those cases. However, transfer learning for emotion detection on social network data is still quite new.

3. Transformers [136] have yielded many SOTA results in NLP applications and tasks since their introduction in 2017, largely owing to their components, notably the self-attention mechanism. The transformer model, initially designed for machine translation, is now used for language modeling, which makes it applicable to other NLP tasks such as text classification, document summarization, and question answering. Recently, it has been applied to image recognition, video understanding, and multimodal tasks such as vision and language [79, 129]. Applying transformers to emotion detection on social network data is likely to improve efficiency and results. Moreover, the transformer design allows multiple modalities (e.g., images, videos, text, and speech) to be processed with similar processing blocks and scales well to very large networks and datasets.

4. The study of implicit expressions of emotion has great potential for further research: it has been only sparsely studied to date, despite its impact on machine learning and neural models.
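The self-attention mechanism at the core of the transformer [136] computes softmax(QK^T / sqrt(d_k)) V: each token's output is a weighted average of value vectors, with weights derived from query-key similarity. The dependency-free Python sketch below implements this single-head computation on small nested lists for illustration only; practical models use optimized tensor libraries, multiple heads, and learned projections for Q, K, and V, none of which are shown here.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: list[list[float]], k: list[list[float]],
              v: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    output = []
    for query in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d_k) for key in k]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output dimension at a time.
        output.append([sum(w * value[j] for w, value in zip(weights, v))
                       for j in range(len(v[0]))])
    return output
```

When all keys are identical, the weights are uniform and each output row is simply the mean of the value vectors; a key that closely matches the query dominates the weighted average.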

Automatic emotion detection and classification is important for online social network data because of the huge number of posts made daily on these platforms. The study of momentary emotion is significant in many fields, including monitoring patients' emotions in healthcare to detect depression and suicidal feelings. Despite this essential need for automatic emotion detection, only a few studies have considered this aspect. In addition, it is essential to consider all social network platforms: as we saw, most of the examined studies used Twitter data, and only a few explored Instagram or other platforms. Moreover, new platforms used by millions, such as Snapchat and TikTok, have emerged in the last few years, and it is important to pay attention to these trends. For detecting emotion in textual data, we must note that social network posts and text messages are usually written in a casual style and contain numerous slang words. While formal language has been widely studied in emotion analysis, casual language has received limited attention.
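Casual social network text usually needs normalization before lexicon lookup or feature extraction. The sketch below lowercases a post, shortens any character repeated three or more times to two (so "coooool" becomes "cool"), drops punctuation, and expands slang via a lookup table. The `SLANG` entries and the keep-two-characters rule are illustrative assumptions, not a standard resource, and dropping punctuation discards emotionally informative runs such as "!!!", which a real system might keep as a feature.

```python
import re

# Hypothetical slang dictionary; real systems use much larger curated resources.
SLANG = {"u": "you", "r": "are", "gr8": "great", "idk": "i do not know", "lol": "laughing out loud"}

# Any character repeated three or more times in a row.
ELONGATION = re.compile(r"(.)\1{2,}")


def normalize_post(text: str) -> str:
    """Lowercase, shorten elongated words, drop punctuation, and expand slang tokens."""
    text = ELONGATION.sub(r"\1\1", text.lower())
    tokens = re.findall(r"\w+", text)
    return " ".join(SLANG.get(token, token) for token in tokens)
```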

8 Discussion

The analysis of the literature above shows that many of the studies have depended on machine learning techniques, and a large share of them appear to have used Twitter as the data source for emotion analysis and detection. Moreover, research tends to focus on text-based emotion detection; methods based on images or multiple modalities are limited and require more study. Today, many social network platforms rely heavily on images and video alongside text. Since video combines sound or speech with image frames, multimedia-based methods need more study to understand users' feelings, especially when an image or video does not show a person's gestures and facial features. The approaches and models employed for emotion detection on social networks can generally be categorized into groups; Table 6 groups the studies by the approach their authors applied.

Table 6 Studies grouped based on the approaches used
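One common way to combine modalities, shown here only as an illustration, is late fusion: separate text and image classifiers each output a probability distribution over emotions, and the distributions are merged by a weighted average. The classifier outputs in the example and the default `text_weight` of 0.6 are made-up assumptions; the reviewed studies may use other fusion strategies.

```python
def late_fusion(text_probs: dict[str, float], image_probs: dict[str, float],
                text_weight: float = 0.6) -> dict[str, float]:
    """Weighted average of per-modality emotion distributions over the union of labels."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    labels = set(text_probs) | set(image_probs)
    return {
        label: text_weight * text_probs.get(label, 0.0)
        + (1.0 - text_weight) * image_probs.get(label, 0.0)
        for label in labels
    }


def predict(fused: dict[str, float]) -> str:
    """Return the emotion with the highest fused probability."""
    return max(fused, key=fused.get)
```

For example, if a caption is ambiguous (joy 0.5, sadness 0.5) but the attached image strongly suggests sadness (joy 0.1, sadness 0.9), the fused scores are 0.34 for joy and 0.66 for sadness, so the image resolves a case that a text-only method could not.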

Furthermore, we believe that most researchers have focused on the process itself rather than its purpose: on how to predict and detect emotions rather than on why these emotions need to be detected and what is gained from predicting social network users' emotions. Emotion detection from social networks can be useful in practically every area of daily life. It is valuable for monitoring the mental health of people with depression [39, 43], enhancing business tactics according to consumer demands [29], tracking public emotions during elections and making predictions based on them [9, 45], detecting potential criminals or terrorists by analyzing people's emotions after a terrorist attack or crime [114], identifying whether a news headline is safe or unsafe for placing advertisements [104], and building effective e-learning systems that improve student motivation [125]. The perceptions of theme park visitors were assessed by integrating geospatial and emotion analysis of social network messages, with expressions aggregated over unified areas [82]. Emotion contagion in Facebook posts was analyzed to detect people's emotions on rainy days and the effect of those emotions on their friends' emotions [36]. Moreover, detecting influential people in a social network is a lucrative research area for referral marketing, which aims to spread product information to as many nodes of the network as possible.

Furthermore, the authors of [96] determined the community-specific emotional behavior of users toward a particular topic, which can help in understanding emotions about a specific event or product. A few studies have considered the reader's perspective, which has many applications, including helping writers anticipate how their work will affect their target audience and helping readers find documents that match a desired emotion [111]. Overall, more effective models can be obtained, and the field improved, by considering emotions from all perspectives, those of both writers and readers. Finally, this section highlighted some potential applications and cases of predicting emotions in real contexts to answer RQ4. Promising applications await if future studies in the field apply emotion detection and prediction to real-life problems and trends.

9 Conclusion

People increasingly use social networking sites to share their feelings. They post about their everyday activities, events, successes, opinions, emotions, and many other aspects of their daily experiences. In response to the need to accurately detect emotions in user-generated content across different socioeconomic areas, researchers have developed models and systems that analyze and detect emotions using different approaches. In this study, we investigated emotion detection and analysis through a comprehensive literature review of this important topic.

In this paper, we considered three research questions. The first asked which emotion model is most frequently used: as shown in Fig. 3 and discussed in the review of the studies, 42% of the reviewed papers did not adopt a specific psychological emotion model, and 27% used the Ekman model. Regarding the second research question, the impact of emotion detection research, multiple perspectives are involved, from users' daily feedback in marketing to monitoring the mental health of psychiatric patients. Between these two extremes, customer feedback for improving product recommendations and the detection of depression and suicidal feelings, many other important fields can benefit from improved emotion analysis and detection techniques and approaches. To answer the third question, about the methodologies and datasets used in this field, Section 5 summarized the approaches and datasets used in the reviewed literature, and Tables 2 and 3 detailed the emotions covered in the reviewed studies. Moreover, Table 4 presents some state-of-the-art (SOTA) models of emotion detection. The opportunities in emotion detection and analysis are numerous, and in this study, we have tried to give a brief overview of the opportunities and areas that need more research to move this field forward.