1 Introduction and Background

The growth and advancement of information and communication systems have made news content easily available for consumption, especially through social media [1]. Although the development of the internet is a blessing to mankind, it also has certain negative effects. Unlike the traditional media (newspapers, TV, and radio), social media has ushered in a new trend in news known as “fake news”, in which malicious or misleading information spreads rapidly [2].

Although social media was created to enhance communication, it has almost replaced mainstream media. A vast majority of people no longer watch television or listen to the radio, and those who do often do so through social media. Fake news can be traced as far back as 1439, when the printing press was invented [3]; however, the discourse on fake news gained prominence during the 2016 US presidential election [4, 5]. With the growing popularity of social media, we are increasingly exposed to a plethora of fake news. Fake news has caused enormous damage to our society and has emerged as a potential threat not only to press freedom but to democracy as well [1, 3, 4]. There is still no clear or generally accepted definition of fake news [1, 3, 4]. Therefore, to establish what counts as fake news, one must first understand what authentic or real news is.

According to Jack Fuller (1996), as cited in [6], “News is a report of what a news organization has recently learned about matters of some significance or interest to the specific community that news organization serves” [6]. Gans [7] gave a precise and widely accepted definition of news, contending that news is “information which is transmitted from the source to recipients by journalists who are both - employees of bureaucratic, commercial organizations and also members of a professional group” [7]. This definition shows that news has an author, namely a journalist who delivers concrete news to an audience. It also gives an insight into why fake news spreads so fast: fake news has no accountable author. Journalists are licensed to report news [9] or work for a news organization, whereas those on social media work for themselves and may propagate fake news for financial gain, as in the case of the Macedonian teenagers. As revealed by [8], regardless of the potential benefit, the proliferation of fake news is further exacerbated by social media outlets.

To arrive at a working definition of fake news, we borrow the definition from [10], who state that “fake news is fabricated information that mimics news media content in form but not in organizational process or intent. Fake-news outlets, in turn, lack the news media’s editorial norms and processes for ensuring the accuracy and credibility of information”. Brummet and colleagues [8] coined the term “ideologically motivated fake news” to describe fake news that is not driven by financial benefit but is fabricated to promote particular principles and beliefs; this leads to the spreading of misinformation that is contrary to other people’s beliefs and principles [8].

Prior surveys of fake news detection strategies have been a useful guide to this study, given that fake news is currently a hot issue. The review in [11] focuses on social bot detection models on three social network platforms, namely Facebook, Twitter, and LinkedIn. It posits that some bots are benign in the sense that they respond to customers’ needs faster than real humans and can attend to many customers within a short period; weather updates and news pushing are also typical functions of social bots. However, social bots are nowadays also created for malicious purposes such as spreading false information. That survey differs from ours in that it focuses on social bot detection, and bots are just one tool used in spreading fake news, whereas our work focuses on detecting fake news irrespective of the tools used to spread it.

Zhou and Zafarani [4] surveyed fake news detection methods and opportunities. They classified fake news studies into four distinct categories: knowledge-based, style-based, user-based, and propagation-based. Our work is different in that we give an in-depth overview of various detection models and select only those models with a high accuracy rate, whereas the previous authors gave a general review. We further classify the different types of fake news and the motives behind them, which is an essential criterion in detecting fake news. The work of [12] takes a data mining perspective on fake news characterization and detection. For characterization, the authors distinguish fake news on traditional media from fake news on social media. The detection models are based on news content and social context and are presented in a narrative manner.

A study closer to ours is that of Klyuev [2], who gave an overview of various semantic approaches to filtering fake news. He focused on natural language processing (NLP) and text mining to verify authenticity, and on machine learning (ML) to detect social bots. His approach differs from ours in that he took a narrative approach to explaining how various detection methods work, without considering the different types of fake news and their motives. In contrast, we give a state-of-the-art account by detailing each detection model with a working example and comparing their success rates. Also related is the work of Oshikawa and Wang [13], which focuses on automatic methods for detecting fake news using NLP. Their survey covers only one form of detection method, i.e., NLP. In contrast, we give details of different types of detection models, including automatic and manual fact-checking as well as hybrid approaches.

The objective of this study is to gain insight into the various types of fake news as well as the methods of detecting them. We argue that fake news comes in different types with different motives, so no single method can detect all fake news, because those spreading it have different goals and objectives. The rest of the paper is arranged as follows: Sect. 2 describes how fake news proliferates on social media, Sect. 3 gives a detailed account of the various types of fake news, and Sect. 4 discusses the various detection models in detail with working examples. Section 5 discusses the open challenges, and Sect. 6 gives our concluding remarks.

2 How Fake News Proliferates on Social Media

The proliferation of fake news on social media has short-term as well as long-term implications for its consumers. It can result in a reluctance to engage in genuine news sharing and posting, for fear that the information is misleading. Fake news proliferates through social media in two major ways: disinformation and misinformation.

Misinformation refers to the sharing of fake news by people who do not know that it is fake, mostly because they see their friends or others sharing it [14]. The echo chamber effect contributes enormously to this. Social media platforms use algorithms that recommend news or information to a consumer based on the groups he or she belongs to, prior history, and circle of friends. When a friend views something, the same item is recommended to another friend, together with a notification that the content has been viewed or liked by his or her friends, which motivates that individual to share or like it too. The recommendation algorithm thus also acts as a motivating factor for consumers to share content even when they do not know its veracity.

People who hold the same beliefs or belong to the same political party will spread and share information that favors their political aspirations without proper verification. Cognitive theories [3] hold that human beings are generally not good at distinguishing what is fake from what is authentic, and posit that, because of this gullibility, they are prone to fake news. In [3], the author contends that people tend to believe something that conforms with their views (confirmation bias) and will share it without verification because it accords with their thinking, while distorting information that does not accord with their views even if it is factual.

Disinformation refers to the spreading of fake news by those who are aware that the information is fake and continue to spread it for political or financial gain. This is further exacerbated by the use of social bots and trolls, which are potential sources of fake news on social media. Social bots here are online algorithms that interact in a human-like way. Although social bots were initially created by some companies to respond to customers’ needs, some ill-minded individuals have used them to spread malicious and misleading information. A social bot can easily retweet and follow thousands of accounts on Twitter, as well as share posts on Facebook, within a short time. Dickerson et al. [15] used sentiment to detect bots on Twitter and found that humans express stronger sentiments than bots. Trolls, in contrast, are human-controlled accounts; many such accounts exist and are likewise meant to spread malicious and distorted information. Figure 1 shows a social bot account that runs automatically and spreads false and misleading information. Xiao and colleagues [16] used a cluster-based approach to detect trolls and malicious accounts on social media networks and were able to determine whether an account is a troll account or a legitimate one. A psychological study by [17] showed that attempts to correct fake news have often catalyzed its spread, especially in cases of ideological differences.

Fig. 1. Sample of a social bot account

3 Types of Fake News

In this section, we classify the different types of fake news. In detecting fake news, it is important to distinguish its various forms, which include clickbait, hoax, propaganda, satire and parody, and others, as shown in Fig. 2.

Fig. 2. Types of fake news

3.1 Clickbait

Clickbait is a fake story with an eye-catching headline aimed at enticing the reader to click on a link. Each click generates income for the owner of the link in the form of pay-per-click revenue [14, 18]. A study by [18] finds most clickbait headlines to be more enticing and appealing than normal news headlines. The authors define eight types of clickbait and contend that clickbait articles usually contain misleading, low-quality information in the form of gossip that is generally unrelated to the headline [18]. Clickbait has proven to be a very lucrative business, especially for the Macedonian teenagers [14]; the Macedonian city of Veles has been termed the fake news city, with fake news producers reportedly already preparing for the 2020 US presidential election [14].

3.2 Propaganda

Propaganda is also a form of fake news, and one with a long history. It dates back to wartime reporting, when journalists often reported false information to save the public from panic, especially during the First and Second World Wars. According to [9], propaganda refers “to news stories which are created by a political entity to influence public perceptions”. States are the main actors in propaganda, and recently it has taken a different turn, with politicians and media organs using it to support a certain position or view [14]. Propaganda-type fake news can be detected with manual fact-based detection models such as expert-based fact-checking.

3.3 Satire and Parody

Satire and parody are a widely accepted type of fake news, produced by fabricating a story or by exaggerating, in the form of comedy, the truth reported in mainstream media [8]. According to [9], satire is a form of fake news that employs a humorous style or exaggeration to present audiences with news updates. What distinguishes satire is that the authors or hosts present themselves as comedians or entertainers rather than as journalists informing the public. However, much of the audience believes the information passed on in this form, because the comedians usually take news from mainstream media and frame it to suit their program. Satirical and comic news shows such as The Jon Stewart Show and The Daily Show with Trevor Noah have gained prominence in recent years.

Although both satire and parody use comedy to pass on information in the form of entertainment, satire takes factual information and modifies or frames it to mean something else. In parody, by contrast, the entire story is fake, so that someone who is not familiar with the site is likely to believe the story. Good examples of parody sites are The Onion and The Daily Mash, which have often misinformed people because they fabricate eye-catching, human-interest stories.

3.4 Hoaxes

Hoaxes are intentionally fabricated reports that attempt to deceive the public or audiences [9, 19]. Because they are deliberate, they are crafted so well that the mainstream media at times report them, believing them to be true. Some authors refer to this type of fake news as large-scale fabrication and note that hoaxes have often caused serious material damage to their victims, who are usually public figures [19]. Tamman and colleagues [20] formulated a TextRank algorithm, based on the PageRank algorithm, to detect hoax news reported in the Indonesian language. Using cosine similarity to measure document similarity, the authors ranked the documents by similarity and then applied the TextRank algorithm. The result of the study was quite impressive, given that it was carried out on Indonesian-language text.
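
The ranking step described above can be illustrated with a minimal sketch. It is not the implementation of [20]: it assumes whitespace tokenization, raw bag-of-words vectors, a damping factor of 0.85, and four invented English headlines in place of Indonesian news text. It only shows how cosine similarity can define a document graph on which a PageRank-style score is computed.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(docs, damping=0.85, iterations=50):
    """Score documents with PageRank over their cosine-similarity graph."""
    vectors = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    # Edge weight between two different documents is their cosine similarity.
    weights = [[cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to w(j, i).
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


# Invented toy headlines (assumption, for illustration only).
docs = [
    "president signs new trade deal with neighbours",
    "president signs trade deal after long talks",
    "trade deal signed by president this week",
    "aliens kidnap president during trade talks",
]
scores = textrank(docs)
for doc, score in sorted(zip(docs, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```

Documents that share more vocabulary with the rest of the collection receive higher scores; how such scores are turned into a hoax decision is left open here, since the source does not specify it.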

3.5 Other (Name-Theft, Framing)

Name-theft refers to a fake news source that attempts to steal the identity of a genuine news provider in order to deceive the audience into believing that the information comes from a well-known source. This is usually done by creating a website that mimics an existing authentic news website. For instance, a producer of fake news may imitate credible news source websites (e.g., cnn1.net for cnn.com, or foxnewss.com for foxnews.com). The fake site usually also includes the genuine site’s logo, which easily deceives consumers into believing that the information comes from a site they already recognize as genuine.
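
As a rough illustration of how such lookalike domains might be flagged, the sketch below compares the first label of a domain with the names on a small allow-list using edit distance. The allow-list, the helper names, and the distance threshold of 1 are assumptions made for this example; they are not taken from any of the surveyed works.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def name_label(domain: str) -> str:
    """First label of the domain, e.g. 'cnn1' for 'cnn1.net' (simplification)."""
    return domain.lower().split(".")[0]


# Hypothetical allow-list of genuine news domains.
TRUSTED = ["cnn.com", "foxnews.com"]


def closest_trusted(domain, trusted=TRUSTED, max_distance=1):
    """Return the trusted domain that `domain` appears to imitate, else None."""
    domain = domain.lower()
    if domain in trusted:
        return None  # the genuine site itself
    best = min(trusted,
               key=lambda t: edit_distance(name_label(domain), name_label(t)))
    distance = edit_distance(name_label(domain), name_label(best))
    return best if distance <= max_distance else None


for candidate in ["cnn1.net", "foxnewss.com", "cnn.com", "example.org"]:
    print(candidate, "->", closest_trusted(candidate))
```

A real system would also need to handle subdomains, visually confusable characters, and logo reuse, none of which this sketch attempts.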

Framing is another form of fake news. It tries to deceive the reader by making some aspects of reality more visible while concealing the truth. People understand concepts according to the way they are presented, and consumers will perceive the same thing differently if it is framed in two different ways. Framing became more popular during the US presidential debates, when many media outlets gave misleading accounts of what a political aspirant actually said. For instance, suppose a leader X says “I will neutralize my opponent”, simply meaning that he will beat his opponent in a given election. The statement may then be framed as “leader X threatens to kill Y”, which gives a total misconception of the original meaning.

4 Fake News Detection Models

Because fake news develops rapidly and is complex to counter, some scholars suggest that artificial intelligence tools and machine learning techniques should be applied [1, 5]. In this section, we explain the various fake news detection models, citing working examples (Fig. 3).

Fig. 3. Fake news detection models

4.1 Expert Fact-Checking Approach

Professional fact-checkers are a small group of experts in various disciplines who are capable of verifying the veracity of news items and deciding whether the information is fake or authentic. The authors of [4] posit that the strength of expert-based fact-checking lies in the fact that the experts are few in number and thus easy to manage, and that they have a high accuracy rate. A study by [21] explains that expert fact-checking is a natural approach to verifying fake news, which uses “professional fact-checkers to determine which content is false, and then engaging in some combination of issuing corrections, tagging false content with warnings, and directly censoring false content e.g., by demoting its placement in ranking algorithms so that it is less likely to be seen by users”. However, expert fact-checking is slow, especially when a large volume of information must be verified, because the experts are few and the process is manual. During the 2016 US presidential election as well as the Brexit referendum, most expert fact-checkers could not keep up with the growing volume of fake news being circulated. Prominent fact-checking sites include Snopes, Hoax-Slayer, Full Fact, TruthOrFiction, The Washington Post Fact Checker, PolitiFact, and FactCheck, several of which focus mostly on American politics. Due to the limitations of expert-based fact-checking, the crowdsourced technique is seen as a good alternative.

4.2 Crowdsourced Approach

The crowdsourced or “wisdom of the crowds” approach is based on the premise that, no matter how smart someone is, the collective effort of individuals or groups surpasses any single individual’s intellectual capacity. Brabham [22] sees crowdsourcing as “an online, distributed problem-solving and production model that leverages the collective intelligence of online communities to serve specific organizational goals”. Gaining knowledge from different sources, such as through collective consensus, is an important element of the wisdom of the crowds approach [28]. The weaknesses of expert-based fact-checking have prompted many to turn to the “wisdom of the crowds” technique.

In [21], the authors used crowdsourced judgments of news sources on social media and found the crowd to be effective: in judging news source quality, laypeople produced ratings similar to those of professional fact-checkers. They took a set of 60 news websites in three groups: 20 renowned mainstream media websites (e.g., cnn.com, bbc.com, and foxnews.com), 22 websites that are hyperpartisan in their coverage and reporting of facts (e.g., breitbart.com, dailykos.com), and 18 websites that are well known for spreading fake news (e.g., thelastlineofdefense.org, now8news.com). Using a sample of n = 1,010 participants recruited from Amazon Mechanical Turk (AMT), they compared the crowd’s judgments with those of expert fact-checkers in a second survey and found the crowd’s judgments to be accurate. The study also identified a limitation of the “wisdom of the crowds” approach: because the crowd is made up of laypeople who have little knowledge of some news sites, sites with which they are unfamiliar are marked as untrusted. For instance, Huffington Post, AOL News, NY Post, Daily Mail, Fox News, and NY Daily News were rated as untrusted by the crowd, whereas expert fact-checkers labeled all of them as trusted. Fiskkit is an example of a crowdsourcing site.
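
To make the aggregation idea concrete, the following sketch averages layperson trust ratings per source and compares the resulting labels with expert labels. The 1 to 5 rating scale, the threshold of 3.0, the placeholder domain names, and all ratings are invented for illustration; they are not data from [21].

```python
from statistics import mean


def crowd_trust(ratings, threshold=3.0):
    """Map each source to (mean rating, trusted?) from ratings on a 1-5 scale."""
    result = {}
    for source, values in ratings.items():
        average = mean(values)
        result[source] = (average, average >= threshold)
    return result


def agreement(crowd, expert_labels):
    """Fraction of sources on which the crowd label matches the expert label."""
    matches = sum(crowd[s][1] == trusted for s, trusted in expert_labels.items())
    return matches / len(expert_labels)


# Hypothetical ratings from five laypeople per source.
ratings = {
    "mainstream.example": [5, 4, 4, 5, 3],
    "hyperpartisan.example": [2, 3, 1, 4, 2],
    "fake.example": [1, 1, 2, 1, 2],
    "unfamiliar.example": [2, 2, 3, 2, 1],  # low scores due to unfamiliarity
}
# Hypothetical expert labels (True = trusted).
experts = {
    "mainstream.example": True,
    "hyperpartisan.example": False,
    "fake.example": False,
    "unfamiliar.example": True,
}

crowd = crowd_trust(ratings)
print(crowd)
print("agreement with experts:", agreement(crowd, experts))
```

The last source mirrors the limitation noted above: a site the crowd does not recognize is rated as untrusted even though the experts trust it, which lowers the agreement score.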

4.3 Machine Learning Approach

An early machine learning (ML) method for detecting fake news was proposed by [1]. Because fake news is assumed to be created intentionally for political and financial benefit, it often has an opinionated and enticing headline; as such, the extraction of textual and linguistic features is necessary for ML. The authors of [1] used a Naive Bayes classifier on linguistic features, namely lexical features, including word count and word level, and syntactic features, which involve sentence-level characterization. They used a dataset from the BuzzFeed News aggregator, which contains data from Facebook posts and major political news agencies such as Politico, CNN, and ABC News. They divided the dataset into training, validation, and test sets and obtained 75% accuracy. Most AI tools for detecting fake news rely heavily on click-through rates (CTR): the position of a page in the stream rises as its CTR increases, and some fake news types, such as clickbait articles, usually have a high CTR because of their enticing and appealing nature. Consequently, such an approach cannot be used to detect fake news types such as clickbait.

Biyani and colleagues [18] proposed an ML model to detect clickbait using Gradient Boosted Decision Trees (GBDT); their model achieves strong classification performance and shows that informality is a crucial factor in the “baity” nature of web pages. Using data from the Yahoo news aggregator, they collected 1349 (training set) clickbait and 2724 (testing set) non-clickbait web pages. They employ the concepts of informality and forward reference. By comparing clickbait articles, they assert that most clickbait contains misleading information such as gossip and carries appealing headlines aimed at enticing the reader to click on the link. The landing page is usually of low quality. They contend that, because a news aggregator site such as Yahoo News aims to serve its users news articles via its homepage, the proliferation of low-quality clickbait articles increases user dissatisfaction and abandonment, which is bad for business; hence detecting and removing clickbait becomes necessary. This approach is not without limitations: fake news is a broad issue with several types, but this study focuses only on one type, i.e., clickbait. Clickbait can be detected in two ways: firstly, because the content differs from the headline, and secondly, because the content is of low quality.
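
The Naive Bayes idea mentioned above can be sketched in a few lines. This is a generic multinomial Naive Bayes with add-one smoothing over headline words, not the feature set or data of [1]; the class name, the four training headlines, and their labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data (assumption, for illustration only).
train_texts = [
    "you won't believe what this senator did",
    "shocking secret the government hides from you",
    "senate passes budget bill after debate",
    "government publishes annual budget report",
]
train_labels = ["fake", "fake", "real", "real"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("shocking secret about the senator"))
print(model.predict("senate debate on budget report"))
```

With so little data the classifier simply keys on vocabulary overlap, which also illustrates why a model trained on one writing style can be bypassed by fake news written in another.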

4.4 Natural Language Processing Technique

NLP works within automated deception detection, which involves lexical and semantic analysis together with regression, clustering, and classification techniques. In binary text classification, news is classified as real or not real; where a two-class decision is difficult, a third class such as partially real or partially fake may be added. A sentiment score is then calculated using a text vectorization algorithm and the Natural Language Toolkit (NLTK). Deception cues are identified in the text, extracted, and clustered [2]. Grammar and style detectors and syntactic analyzers such as the Stanford parser are reported by [2] to give accurate results. A study by [23] shows that truth verification with NLP achieved greater success than human verification on a sample of n = 90. The basic task is to identify verbal and lexical cues that reveal linguistic differences between when humans tell lies and when they tell the truth. For instance, cues associated with deception include total word count, the use of sense-based words, lower cognitive complexity, more negative emotion words, and extreme positive words.
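
A minimal sketch of lexical cue extraction is given below. The two word lists are tiny hypothetical lexicons chosen for this example, and the function only counts cues; it does not classify text and does not reproduce the tools reported in [2] or [23].

```python
import re

# Tiny hypothetical lexicons (assumptions, for illustration only).
NEGATIVE_WORDS = {"hate", "awful", "terrible", "worst"}
SENSE_WORDS = {"see", "saw", "hear", "heard", "feel", "felt", "touch"}


def deception_cues(text):
    """Count simple lexical cues: total words, negative words, sense words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "word_count": len(tokens),
        "negative_words": sum(t in NEGATIVE_WORDS for t in tokens),
        "sense_words": sum(t in SENSE_WORDS for t in tokens),
    }


statement = "I saw it and I heard it, it was the worst, most terrible thing"
print(deception_cues(statement))
```

Counts like these would normally be fed as features into a classifier or clustering step rather than interpreted on their own.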

4.5 Hybrid Technique

Hybrid detection techniques have emerged as an alternative to individual fake news detection methods; because of the complex and ambiguous nature of fake news, combining methods is imperative. According to Mahid and colleagues [24], the hybrid-based detection model involves “the fusion of techniques from the content-based model as well as social context-based techniques utilizing auxiliary information from different perspectives”. The failure of single models to detect fake news prompted scholars to look for alternative measures that detect it accurately. In this study, we discuss the hybrid expert-crowdsource and hybrid machine-crowdsource detection methods.

(a) Expert-Crowdsource Approach

The hybrid expert-crowdsource approach is a relatively new method that emerged from the weaknesses of the previous methods. It combines the two manual fact-checking systems, both of which apply human knowledge, as opposed to automatic fact-checking by machines. The key idea is that where experts fail, the crowd can complement them, and vice versa [24]. Recently, Facebook announced a combined expert-crowdsource approach to fighting the proliferation of fake news on its network. Expert-based fact-checking has often been accused of being politically biased, not independent, and very slow in detecting fake news [25]. Meanwhile, a study by [21] suggests that the crowd is limited in many areas because it is composed of laypeople, who give wrong predictions for content with which they are unfamiliar. Since the crowd is unbiased, acts independently, and is larger in number, it can easily work through a large volume of information; the aggregated crowd decision can then be sent to the experts, which should yield better results because experts are familiar with many areas.
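
One way to picture the hand-off from crowd to experts is the sketch below, which aggregates crowd votes per story by majority and orders the stories so that the most contested ones reach the experts first. The ordering rule, the function name, and the votes are assumptions made for this example; the source only states that the aggregated crowd decision is passed to experts.

```python
from collections import Counter


def expert_queue(crowd_votes):
    """Summarize crowd votes per item and sort by ascending agreement.

    Returns a list of (item, majority_label, agreement) tuples, so that the
    items on which the crowd disagrees most come first.
    """
    summary = []
    for item, votes in crowd_votes.items():
        label, count = Counter(votes).most_common(1)[0]
        summary.append((item, label, count / len(votes)))
    return sorted(summary, key=lambda entry: entry[2])


# Hypothetical crowd votes for three stories.
votes = {
    "story-1": ["fake"] * 9 + ["real"],
    "story-2": ["fake", "real", "fake", "real", "real"],
    "story-3": ["real"] * 5,
}

for item, label, agree in expert_queue(votes):
    print(f"{item}: crowd says {label} (agreement {agree:.0%})")
```

Experts then spend their limited time on stories such as story-2, where the crowd is split, while the crowd absorbs the bulk of the volume.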

(b) Human-Machine Approach

Most machine learning algorithms developed to detect fake news automatically have often failed. This is because not all news has the same writing pattern, and news covers several topics with distinct salient features. A study by [26] found that one limitation of automatic fake news detection is low accuracy: algorithms developed to detect fake news from news content are prone to low accuracy because much of the language used in writing fake news bypasses the detection process. The wisdom of the crowd, as already seen, is a sound approach but is slow, time-consuming, and lacking in expert knowledge, because crowds are usually composed of laypeople [21]. The combination of machine learning algorithms and the collective effort of humans has been shown to yield better results, especially in detecting fake news automated by social bots. One hybrid machine-crowdsource technique was proposed by [26], whose model uses a hybrid machine-crowd approach to detect fake news and satire. They use the Fake vs Satire dataset. Crowdsourcing was used to distinguish satire from fake news, which was difficult for the machine. By applying a combination of ML techniques, they obtained an overall accuracy of 87%.

The work of Wang [27] achieved a similar result: the author applied a hybrid crowdsource-machine technique to detect fake news on social media by framing a 6-way multi-class text classification problem, and designed a hybrid CNN that integrates metadata with text and obtained higher results [27]. Through crowdsourcing, the author gathered over 12,000 manually labeled short statements (the LIAR dataset) from the politifact.com API; such data is mostly used for fact-checking. With a randomly initialized matrix of embedding vectors to encode the metadata, the author employed five baselines: a majority baseline, logistic regression (LR), a support vector machine (SVM), a bi-directional long short-term memory (Bi-LSTM) network, and a CNN. The SVM and LR performed well on the classification problem compared with the other baselines, while the CNN gave an overall high accuracy.
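
Of the baselines listed above, the majority baseline is the simplest to show: it always predicts the most frequent training label and serves as the floor against which the other models are judged. The sketch below assumes the six truthfulness labels used by PolitiFact for the LIAR data and uses a handful of invented label sequences rather than the real dataset.

```python
from collections import Counter

# Six-way label set (assumed here to follow the PolitiFact ratings).
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]


def majority_baseline(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Invented label sequences (assumption, for illustration only).
train = ["half-true", "false", "half-true", "true",
         "mostly-true", "half-true", "pants-fire", "barely-true"]
test = ["half-true", "false", "half-true", "true"]

guess = majority_baseline(train)
score = accuracy([guess] * len(test), test)
print(guess, score)
```

Any learned model, whether LR, SVM, Bi-LSTM, or CNN, is only useful to the extent that it beats this number on held-out statements.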

5 Discussion and Challenges

The discourse on fake news detection models reveals that, based on the existing models, detecting fake news will remain a serious challenge. More sophisticated models are required. Manual fact-checking, which includes the use of experts as well as crowdsourced judgment to check the veracity of news content, has yielded some results. However, manual fact-checking still faces many limitations, such as labor and time, especially when faced with large volumes of information. Automatic fact-checking can deal with large volumes of data within a short time; however, it has many limitations, because most ML algorithms trained to detect fake news are based on particular lexical and textual content as well as style. The producers of fake news are also developing new techniques to bypass these algorithms, and hence manual fact-checking will always be required. Social media networks yield financial benefits not only to their creators but to their users as well. Consequently, the owners of these networks are often reluctant to flag and remove some items from their sites for fear of losing financial gains, and this is a challenge for many users. Facebook and YouTube have often come under strong criticism for allowing certain fake information on their platforms.

6 Conclusions and Future Works

The proliferation of fake news on social media has often made people reluctant to engage in genuine news and information sharing for fear that such information is false and misleading. The debate on fake news detection has been a challenging one due to the complex and dynamic nature of fake news. In this paper, we gave an overview of fake news detection models, taking into account the various types of fake news. Fake news has caused enormous damage not only to democracy but also to freedom of speech owing to its rapid spread on social media; hence detecting it has become imperative. We recommend that fake news be verified on the basis of its sources, authors, or publishers, since experts are able to distinguish genuine sources from fake ones.

Social bot and troll accounts have often acted as catalysts in generating and spreading fake news, which is a serious challenge. Hence, future work is required on social bot detection: the main problem is not fake news itself but rather the sharing and spreading of fake news, which causes more harm. The use of social bots makes fake news go viral and has further exacerbated its proliferation, as content is shared and liked automatically, making it difficult for experts to detect.