1 Introduction

The Internet has grown and developed constantly, in the past as well as nowadays, creating digital traces that can be collected and processed to define different individual schemes, which are in turn useful to discern both single- and group-related behaviors. We live at a time when huge quantities of information spread through networks at outstanding speed: data with these features have therefore been labeled as big data (Mayer-Schönberger and Cukier 2013; Erl et al. 2016). Some per-minute figures hint at the approximate scale of the phenomenon: millions of e-mails and Twitter and Facebook posts, thousands of Instagram photos and the like.

As a consequence, such massive volumes of data are becoming a basic feature of society and, at the same time, the ability to analyze, correlate and learn from them is becoming a useful way to compete, as well as to support growth, productivity and innovation in different fields (Lin et al. 2010; Bello-Orgaz et al. 2016). In this perspective, broadband Internet connections and social media play a fundamental role in strengthening the relationship between companies and their customers: they have the power to dramatically change marketing strategies (Erevelles et al. 2016) and political attitudes, and they enable successful predictions (Song et al. 2014). This is also shown by the large amount of investment that, in recent years, has moved toward social media-driven decision systems, to the detriment of traditional alternatives (Bowen and Bowen 2016; del Val et al. 2016).

Although big data can certainly be considered a blessing for decision-making, having big data does not automatically lead to better marketing, as they come with some important challenges and issues: the impossibility of using a single central unit and classical storage facilities, the need for real-time analytics, the correctness of the insights, privacy preservation and so on (Lesk 2013; Lu et al. 2015).

An important role in the context of social media analytics is played by machine learning and computational intelligence techniques. Indeed, text mining, user profiling and localization, sentiment analysis, social sensing and the like are just some of the means used to perform a deep analysis of the traces that people continuously leave on social media (Zeng et al. 2010; Liu 2012; Li et al. 2016). Recently, a number of contributions for exploiting machine learning and computational intelligence techniques for big data analysis have been discussed in the specialized literature (Wu et al. 2014; Jamshidi et al. 2015; Moreno et al. 2016; Jha et al. 2016).

In this paper, we provide some operative and technological advice on how to harness the wealth of data available on social media to enhance corporate marketing strategies. We first review the main features and issues related to social media and big data, and the current technologies to handle and analyze them. Then, we offer a glimpse of an operative method, inspired by the works of Berkhout et al. (2006, 2010), which we suggest for social big data analytics. In this way, we provide some hints on how to gather information from unstructured data on the Internet and from structured data already owned by enterprises, in order to reinforce a proactive and emotional approach to the (possibly real-time) management of customers’ needs. Finally, to highlight the issues that are still unsolved and shed light on new possible research paths, we briefly review recent research works and use cases according to a four-level framework: integration, development, brand reputation and customer relationship.

The rest of the paper is organized as follows: Sect. 2 summarizes the current main sources of big data, namely online social media, and provides some hints to understand the big data phenomenon, describing some intrinsic characteristics of social big data and briefly summarizing the main advantages and drawbacks they bring to society. In Sect. 3, we review some recent IT technologies that can be employed for handling and analyzing social big data, and we outline an operative methodology for making social big data analytics an effective marketing tool. Section 4 assesses and classifies use cases of marketing applications of social big data analytics, highlighting what has already been investigated and which issues remain open for future research. Finally, Sect. 5 concludes the article with some remarks and possible future developments and topics for better exploiting the social big data phenomenon.

2 Society 2.0 and social big data

Human beings have always tried to “datify” the world, but novel information technologies now make it much easier and faster to transform every human and social phenomenon into data; as stated in Mayer-Schönberger and Cukier (2013), the world is becoming more and more “datified.” Indeed, the twenty-first century is increasingly shaped by a generation that lives, entertains itself and studies (Ducange et al. 2017) on the Internet, leaving digital marks that are often processed to produce detailed descriptions of both individual and group behaviors. This may transform the perception of individual lives and of society as a whole.

Society 2.0 is a neologism for the interconnected world we live in, as evidenced by the flood of big data currently present on the Internet. It is the realization of past theories (Bell 2008) that bring about a new way of thinking and new economic and social dependencies. Society 2.0 is based on social media, i.e., web-based means of interaction through which people create, share and exchange various kinds of information, from pictures to texts and from music to videos, in a sort of virtual community. Some of the most popular are Facebook, Twitter, LinkedIn, YouTube, Instagram, Google+, Tumblr, Flickr, e-mails, forums, blogs and so on. Among these, a subset known as Online Social Networks (OSNs) is taking a major share of the big data scenario, mainly thanks to its ability to mirror existing real-life bonds, such as those with friends, relatives, colleagues and the like (Song and Phang 2016). The purpose of a social network may be multifaceted: interacting with friends and family, creating new business contacts, sharing photos and experiences, sharing emotions and feelings at any time and across distance, meeting new people, and so forth. In any case, OSNs are among the most widely used means of communication, as is clear from Fig. 1, courtesy of CLT (2014), which shows the amount of digital data generated, on average per minute, by social media on the Internet in 2014, 2013 and 2012.

Fig. 1

Social big data numbers in 1 min. Courtesy of CLT—Hong Kong (CLT 2014)

Besides hinting at the current scale of social big data, Fig. 1 also illustrates the steady and rapid growth of circulating digital data over the years. Another emerging idea is that the more we are able to share (data, information, etc.), the more we will be able to gain; this idea is by now widespread among the so-called millennials, i.e., those born in the ’80s and ’90s, and even more among the so-called digital natives, or generation Z, born from 2000 on. Finally, as reported in Hamed and Wu (2014), social big data are making the world and society smaller; this relates to the so-called “degrees of separation,” i.e., the number of intermediaries between two individuals in the real world, which Milgram showed in the ’60s to be between 4.4 and 5.7 (Milgram 1967). In the era of social media and social networks, thanks to social big data and connections, the degrees of separation drop to fewer than 4 (Hamed and Wu 2014).
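The degrees of separation between two users can be estimated as a shortest-path length in the friendship graph. The minimal Python sketch below, assuming a small hypothetical undirected graph stored as an adjacency dictionary, uses breadth-first search; it counts hops (edges) and derives the number of intermediaries, the quantity discussed above, as hops minus one. It only illustrates the concept and is not the method used by Milgram (1967) or Hamed and Wu (2014).

```python
from collections import deque


def hops_between(graph, source, target):
    """Shortest-path length (number of edges) between two users, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def intermediaries_between(graph, source, target):
    """Number of intermediaries on a shortest chain: hops minus one."""
    hops = hops_between(graph, source, target)
    return None if hops is None else max(hops - 1, 0)


# Hypothetical toy friendship graph (undirected, so each edge appears in both lists)
friends = {
    "ann": ["bob", "carl"],
    "bob": ["ann", "dora"],
    "carl": ["ann", "dora"],
    "dora": ["bob", "carl", "eve"],
    "eve": ["dora"],
}

print(hops_between(friends, "ann", "eve"))            # 3
print(intermediaries_between(friends, "ann", "eve"))  # 2
```

At the scale of real social networks, running an exact breadth-first search from every user becomes very expensive, which is why large-scale studies usually resort to sampling or approximation.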

2.1 The Vs of big data

In this subsection, we analyze the features of social big data, shedding light on their evolution and on the issues and challenges connected with them.

In 2001, Laney introduced three basic concepts (Laney 2001) to describe big data:

  • Volume refers to the huge quantity of data, but also to the fact that more than 90% of the world’s data were generated in the last few years (McCafferty 2014). The related issues concern how to efficiently acquire, store and control even zettabytes of information of any type that, in turn, must be organized, verified and analyzed.

  • Velocity refers both to the relentless speed with which data spread throughout networks and to the quickness needed to parse them in real time. This is a key feature of big data and marks the difference between them and a simple large data set. A possible issue concerns identifying a trend or an opportunity within a few minutes, before competitors do, in order to gain a competitive advantage, as analyzing data even two minutes later could be very risky in a competitive market.

  • Variety refers to the different types of data coming from different sources, such as sensors, social networks, and other devices and applications, but also to a different kind of richness that traditional data sets are not able to convey. The point here is that traditional technologies may be unable to deal efficiently and simultaneously with structured, semi-structured and unstructured data, so new specific solutions are needed.

As time went by, Laney’s three Vs proved inadequate to describe big data completely, and over the years new Vs were added to characterize them properly. Therefore, a shift has taken place from a three-V model to a five-V scheme (Brown and Harmon 2014) and, more recently, to seven- and nine-V patterns (Owais and Hussein 2016). The five-V model adds Virality and Vector to the framework. Vector refers to the possible directions and dimensions of the spreading of big data and to their inherent geolocalization and GPS information, whereas Virality concerns the percentage of data and information that reach certain nodes or users at exactly the right time.

The most recent models encompass nine Vs (Owais and Hussein 2016), and in the transition from the five-V model to the nine-V one, the concepts of Virality and Vector evolved. Virality can be incorporated into both the Veracity and the Value notions. Vector has been absorbed partially by the Visualization notion and partially by the Variability notion, given its spatial information. Moreover, Volatility and Validity have been introduced. In the following, we list the six additional Vs that accompany the traditional three, together with some related issues:

  • Veracity means truthfulness and accuracy. The related issues concern the quality and the trustworthiness of the insights used to build new added value. Quality may affect the support of decision processes, while trustworthiness is becoming more and more important in various IT fields (Pecori 2016), and the ranking and dependability of the retrieved data are becoming a fundamental prerequisite for devising new ideas and intuitions.

  • Variability concerns the changes that the connotation and significance of any given datum may undergo over time and across different contexts. The related issue is to frame the insights within a particular situation, in order to decode the exact sense of a word or sentence in specific circumstances.

  • Visualization concerns the hard process of making huge amounts of data understandable and interpretable, for better user experiences and services. Although perhaps not the most difficult task, it is surely among the most demanding and crucial ones.

  • Validity concerns the correct use of uncorrupted data. It is tightly bound to Veracity, but focuses on data integrity.

  • Volatility concerns the data retention policy, i.e., how long data should be stored to remain useful in the future.

  • Value is the most important gift social big data can offer to firms, businesses and customers. Their worth lies in how they are investigated and parsed, and in how they are turned into useful information, new knowledge, competitive advantage and return on investment.

3 Technologies and methodologies for exploiting social big data for marketing decisions

3.1 The main technologies to handle social big data

The intersection between big data and social media is reflected in the use, by social networks, of big data technologies in the design and development of their IT infrastructures. As a matter of fact, classical programming and storage paradigms cannot handle the huge amount of data coming from social media and social networks. For this reason, Google defined and patented the distributed MapReduce programming model (Dean and Ghemawat 2008). MapReduce relies on mapper and reducer functions, often complemented by an optional combiner. In short, input data are mapped to intermediate key-value pairs, which are optionally combined locally, grouped by key and finally reduced to a smaller set of output values (Dean and Ghemawat 2008). The MapReduce paradigm may be implemented within different programming frameworks, and Apache Hadoop (2015) can be considered the most important one. Hadoop provides both storage facilities, such as the Hadoop Distributed File System (HDFS), which spreads multiple copies of data chunks across different machines, and parallel processing capabilities based on the MapReduce paradigm. To overcome some of Hadoop’s drawbacks, the Apache Spark framework (Footnote 1) may be used, especially for iterative jobs. Spark is a cluster computing framework that allows a working set of data to be reused across multiple parallel operations. Compared with Hadoop, it can be up to 100 times faster for certain applications, because data can be kept in memory and queried repeatedly (Bello-Orgaz et al. 2016). Spark, however, relies on a cluster manager (its own standalone manager or an external one such as Hadoop YARN) and on a third-party distributed storage system, such as HDFS. Distributed data may be easily handled by MongoDB (Footnote 2), HyperDex (Footnote 3), DocumentDB (Footnote 4) and the like, which are document-based database systems storing data in a JSON-like format. Key-value NoSQL databases, designed for fast distributed queries, may also be successfully employed for managing social big data. Finally, in this context, graph-based databases, which can mirror the social relationships among users, may play a very important role. Among them, we recall Neo4j (Footnote 5), Virtuoso (Footnote 6), Stardog (Footnote 7), etc.
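To make the mapper, combiner and reducer roles concrete, the following single-machine Python sketch counts word occurrences in a few hypothetical posts. It only simulates the paradigm in memory (each input partition stands in for a data block processed by one map task on a different machine) and does not use the Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def mapper(post):
    # Emit an intermediate (word, 1) pair for every lowercase token of a post
    for word in post.lower().split():
        yield word, 1


def combiner(pairs):
    # Optional local pre-aggregation of the pairs produced by one map task
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return partial.items()


def reducer(key, values):
    # Reduce all the values collected for one key to a single output value
    return key, sum(values)


def map_reduce(partitions):
    # Map and combine each partition independently (in parallel on a real cluster)
    combined = [combiner(chain.from_iterable(mapper(p) for p in part)) for part in partitions]
    # Shuffle: group every intermediate value by its key
    groups = defaultdict(list)
    for key, value in chain.from_iterable(combined):
        groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


partitions = [
    ["great phone", "great camera"],
    ["bad battery", "great battery"],
]
print(map_reduce(partitions))
# {'great': 3, 'phone': 1, 'camera': 1, 'bad': 1, 'battery': 2}
```

The combiner reduces the volume of intermediate data sent over the network during the shuffle, which is why it pays off for associative operations such as summing counts.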

When knowledge elicitation can take several hours, Hadoop, Spark and NoSQL databases are certainly up to the task of mining useful knowledge from social big data. On the other hand, when dealing with data streams that are produced quickly, change very quickly and need real-time analysis, other technologies, such as Apache Storm (Footnote 8) or Apache Samza (Footnote 9), may be more suitable. Storm is a distributed real-time computation system for the quick analysis of big data streams. It is based on a master–slave approach and comprises both a complex event processor and a distributed computation framework. Samza is a framework that processes stream messages one at a time, as they arrive. Streams are divided into partitions, each of which is an ordered sequence of read-only messages.
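The partition-based, one-message-at-a-time model can be illustrated with a plain Python sketch; it does not use the Storm or Samza APIs. Messages are routed to partitions by a key (here a hypothetical user id) so that each partition stays ordered, and each partition keeps its own sliding-window count of posts mentioning a keyword. The partition count, window length, keyword and messages are assumptions for illustration.

```python
import zlib
from collections import deque

NUM_PARTITIONS = 2  # hypothetical partition count


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Route a message by key: the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class WindowedMentionCounter:
    """Counts messages mentioning `keyword` in the time window (now - window, now]."""

    def __init__(self, keyword, window):
        self.keyword = keyword.lower()
        self.window = window
        self.timestamps = deque()

    def process(self, timestamp, text):
        # Messages are handled one at a time, in arrival order (simple whitespace tokens)
        if self.keyword in text.lower().split():
            self.timestamps.append(timestamp)
        # Evict mentions that have fallen out of the sliding window
        while self.timestamps and self.timestamps[0] <= timestamp - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)


# Hypothetical stream of (user_id, timestamp in seconds, text)
stream = [
    ("u1", 0, "Big sale today"),
    ("u2", 10, "sale sale"),
    ("u1", 70, "no deals"),
    ("u1", 100, "sale ends"),
]

# Each partition keeps its own ordered sequence of messages and its own state
partitions = [[] for _ in range(NUM_PARTITIONS)]
for user, ts, text in stream:
    partitions[partition_for(user)].append((ts, text))

counters = [WindowedMentionCounter("sale", window=60) for _ in range(NUM_PARTITIONS)]
for pid, messages in enumerate(partitions):
    for ts, text in messages:
        print(f"partition {pid} t={ts}s mentions in window: {counters[pid].process(ts, text)}")
```

Keying by user keeps all messages of one user in a single ordered partition, so per-key state can be updated without coordination between partitions.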

In any case, the aforementioned technologies must be combined with machine learning, natural language processing, network analysis and statistics to extract interesting knowledge from social big data, an activity known as social media analytics. As a matter of fact, text analysis algorithms, sentiment analysis and classification models may be used to detect traffic events, opinion trends and customer attitudes from streams of Status Update Messages (SUMs), comments and so on (Zeng et al. 2010; Liu 2012; D’Andrea et al. 2015; Ibrahim and Landa-Silva 2016; Li et al. 2016). In this context, Natural Language Processing (NLP) plays a major role (Cambria and White 2014), since SUMs are typically made of text. Indeed, NLP approaches, usually based on artificial intelligence paradigms, help machines read and understand text by simulating human reasoning. Consequently, the different algorithmic steps, such as tokenization, stop-word filtering, stemming, stem filtering, information extraction and so on, should be applied carefully.
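The preprocessing steps listed above can be chained as in the following Python sketch. It is a deliberately simplified, standard-library-only illustration: the stop-word list is a tiny hypothetical sample, the suffix-stripping rule stands in for a real stemmer such as Porter’s, and the minimum stem length used for stem filtering (3 characters) is an assumption. Information extraction is not covered.

```python
import re

# Tiny hypothetical stop-word list; real pipelines use large, language-specific lists
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in", "on", "this", "it", "my"}
# Naive suffix stripping, standing in for a real stemmer (e.g., Porter's)
SUFFIXES = ("ing", "ed", "es", "s", "ly")


def tokenize(text):
    # Lowercase and keep alphanumeric runs, dropping punctuation and emoji
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    # Strip the first matching suffix, provided at least 3 characters remain
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def filter_stems(stems, min_length=3):
    # Stem filtering: discard stems too short to be informative
    return [s for s in stems if len(s) >= min_length]


def preprocess(text):
    return filter_stems([stem(t) for t in remove_stop_words(tokenize(text))])


print(preprocess("Loving the new camera, it is amazing!! Batteries drained quickly"))
# ['lov', 'new', 'camera', 'amaz', 'batteri', 'drain', 'quick']
```

The crude stems (e.g., "lov", "batteri") show why a proper, language-specific stemmer or lemmatizer is preferable in practice.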

In the field of machine learning, a breakthrough technology increasingly employed in the big data framework is surely deep learning (Chen and Lin 2014). This emerging paradigm consists of multiple layers of nonlinear projections that, trained through backpropagation, learn effective representations at increasing levels of abstraction. Powerful modern hardware, e.g., graphics processing units (GPUs), is well suited to managing the millions of free parameters and the parallel computations of complex architectures (Sani et al. 2016), such as Deep Belief Networks and Convolutional Neural Networks; nevertheless, better deep learning architectures still need to be devised to handle social big data efficiently.

The Apache Mahout framework (Footnote 10) and SAMOA (Footnote 11) are examples of tools offering specific data mining algorithms for static and streaming data, respectively. Apache Flink (Footnote 12), which includes a machine learning library, can handle both static big data and big data streams, and Spark has its own module for handling data streams and a library of machine learning algorithms, namely MLlib (Footnote 13). To the best of our knowledge, H2O (Footnote 14) is the most recent open-source framework providing a parallel processing engine, analytics, math and machine learning libraries, along with data preprocessing and evaluation tools.

3.2 An operative methodology for social big data in marketing

In order to carry out social sensing and big data analytics in a marketing-effective way, we need a model that takes into account human interactions in OSNs (Wang et al. 2013; Anastasi et al. 2013) and the data sources of this scenario (e.g., SUMs), i.e., people’s posts and interactions. The major issue here is to interpret the huge and dynamic amount of data automatically yet correctly, exploiting the techniques described in the previous subsection. In Fig. 2 we show a cyclic operative methodology for the exploitation of social big data, integrating marketing strategies and social big data techniques. The model follows a typical social media marketing approach and combines some interesting ideas from the cyclic innovation model of Berkhout et al. (2010). In particular, in this model, big data technologies mirror Berkhout’s technological research, whereas the final result reports resemble generic product creation. Indeed, the final products are useful insights that are fed back, as in many typical social media models, to adapt future marketing strategies.

Fig. 2

A cyclic process for social big data applied to marketing

Therefore, the results obtained at the end of the process are not set in stone, but can help consolidate and improve the strategic domain, the deployed technologies and the outcomes themselves in future cycles. As a matter of fact, social big data analysis, and the decision support that follows, is more effective if carried out continuously and adaptively, in order to monitor strategic aspects or to periodically verify relevant changes after activities and decisions have been enacted.

Fig. 3

Examples of types of reports. a Word cloud. b Topic trend, courtesy of Yan (2015)

The aforementioned operative methodology encompasses four main phases, which are the following:

  • definition of the strategic social media domain,

  • selection of the most effective big data technologies,

  • extraction and interpretation of knowledge,

  • elaboration of the result reports.

The first step is to identify the specific context from which we want to mine useful information, e.g., the social media and social networks we intend to analyze and their most important conversations. These, in turn, depend on the chosen topics, contexts, markets and stakeholders. At this stage, appropriate filters on the queries to the social media and social networks have to be defined. The relevant filters may regard geographic positions, the language of the conversations, time intervals, etc. Specific keywords regarding a particular company and its brands, products and competitors may be chosen to define appropriate queries on social media/networks.
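The filters described above can be sketched as a simple predicate applied to incoming posts. In this Python sketch the Post fields, the keyword set, the language codes and the bounding-box representation are hypothetical; real social media platforms expose their own query parameters, which are not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    text: str
    lang: str
    created_at: datetime
    lat: float
    lon: float


@dataclass
class DomainFilter:
    keywords: frozenset   # lowercase brand, product and competitor keywords
    languages: frozenset  # language codes, e.g. "en"
    start: datetime       # inclusive
    end: datetime         # exclusive
    bbox: tuple           # (min_lat, min_lon, max_lat, max_lon)

    def matches(self, post: Post) -> bool:
        min_lat, min_lon, max_lat, max_lon = self.bbox
        # Whitespace tokenization, stripping hashtag/mention signs and punctuation
        words = {w.strip("#@.,!?") for w in post.text.lower().split()}
        return (
            post.lang in self.languages
            and self.start <= post.created_at < self.end
            and min_lat <= post.lat <= max_lat
            and min_lon <= post.lon <= max_lon
            and not self.keywords.isdisjoint(words)
        )


domain = DomainFilter(
    keywords=frozenset({"acmephone", "rivalphone"}),  # hypothetical brands
    languages=frozenset({"en", "it"}),
    start=datetime(2016, 1, 1),
    end=datetime(2016, 2, 1),
    bbox=(36.0, 6.0, 47.5, 19.0),  # hypothetical bounding box
)

posts = [
    Post("Loving my new #AcmePhone!", "en", datetime(2016, 1, 15, 10), 43.7, 10.4),
    Post("AcmePhone battery is bad", "de", datetime(2016, 1, 15, 11), 43.7, 10.4),
    Post("acmephone sale", "en", datetime(2016, 3, 1), 43.7, 10.4),
    Post("nice weather today", "en", datetime(2016, 1, 10), 43.7, 10.4),
]
selected = [p.text for p in posts if domain.matches(p)]
print(selected)  # ['Loving my new #AcmePhone!']
```

Applying cheap filters such as language and time first is usually sensible, since they discard most of the stream before any costlier text analysis.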

As examined in the previous subsection, a number of technologies may be employed for data storage, management and information mining. Furthermore, many ready-to-use social media monitoring services and text analysis tools, such as Radian6 (Footnote 15), Atlas.ti (Footnote 16) and T-LAB (Footnote 17), are currently available for social big data analytics. Whenever we want to carry out a social media content analysis, we first have to carefully select the technologies and/or the ready-to-use monitoring services we intend to employ. Then, the social media analysis system must be set up according to the identified strategic social domain.

As stated before, the knowledge extraction and interpretation stage is strictly related to machine learning, especially to social sensing and sentiment analysis. Since such paradigms are based on passions and emotions rather than on prices and costs alone, they make it possible to obtain and interpret all the information needed to build strategic decisions. At this stage, the results should also be evaluated by a team of content analysts, who further assess the influence of posts, the consistency of the induced sentiments with the originally conveyed message, and the different types of users (Wang and Luo 2013).

However, the work of human analysts needs to be backed by result reports in order to effectively support decisions. Depending on the technologies and organizational needs, various types of reports can be exploited: word clouds, topic analysis and trend, influence viewer, river of news and share of conversation (Chi et al. 2015). A word cloud is a set of frequent words within a given strategic domain: the bigger the word, the more often it has been mined from social media/networks. Topic analysis consists of infographics representing, for each main topic, the distribution of the conversations in which it appears. This analysis can be performed according to different parameters, such as competitors, specific channels, languages, geographic provenance, sentiments and the like. The topic trend, on the other hand, refers to the temporal tendency of any given conversation with respect to a specific theme and/or a certain parameter. The influence viewer detects the most influential channels and users for certain topics. The river of news is simply a detailed list of conversations per topic, while the share of conversation is a percentage highlighting the popularity of a certain theme in the reference domain. Two examples of the aforementioned reports are depicted in Fig. 3.
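Several of these reports reduce to simple aggregations over the mined conversations. The following Python sketch computes the data behind a word cloud, a share of conversation and a topic trend. The keyword-matching rule, the definition of share of conversation as the percentage of conversations mentioning a topic, and the sample posts are assumptions for illustration; rendering is left to a charting tool, and the influence viewer and topic analysis are not covered.

```python
from collections import Counter, defaultdict


def word_cloud_weights(texts, stop_words=frozenset(), top_n=10):
    """Word cloud data: the most frequent words, whose counts would drive the font size."""
    counts = Counter(
        word for text in texts for word in text.lower().split() if word not in stop_words
    )
    return counts.most_common(top_n)


def share_of_conversation(conversations, topic_keywords):
    """Percentage of conversations mentioning each topic (one post may count for several)."""
    total = len(conversations)
    shares = {}
    for topic, keywords in topic_keywords.items():
        hits = sum(1 for c in conversations if keywords & set(c.lower().split()))
        shares[topic] = 100.0 * hits / total if total else 0.0
    return shares


def topic_trend(dated_conversations, keywords):
    """Topic trend: number of conversations per day that mention any of the keywords."""
    trend = defaultdict(int)
    for day, text in dated_conversations:
        if keywords & set(text.lower().split()):
            trend[day] += 1
    return dict(sorted(trend.items()))


posts = ["battery life is great", "great camera", "battery drains fast", "great price"]
print(word_cloud_weights(posts, stop_words={"is"}, top_n=2))
# [('great', 3), ('battery', 2)]
print(share_of_conversation(posts, {"battery": {"battery"}, "camera": {"camera", "photo"}}))
# {'battery': 50.0, 'camera': 25.0}
print(topic_trend([("2016-01-02", "battery ok"), ("2016-01-01", "battery drains"),
                   ("2016-01-02", "nice camera")], {"battery"}))
# {'2016-01-01': 1, '2016-01-02': 1}
```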

4 A review of social big data analytics in support of marketing decisions

In this section, we briefly analyze and review the current literature to frame various social big data use cases within four main options for effective marketing decisions. We aim to show the best ways to emphasize growth opportunities, while highlighting the main unresolved issues in the area. We employed a systematic approach, similar to science mapping analysis (Cobo et al. 2011), to identify the most relevant academic, peer-reviewed articles for the literature review. First, we selected some academic databases, including IEEE Xplore, Web of Science, Business Source Premier, ScienceDirect, Emerald and Wiley Online Library. Then, we searched these databases using specific keywords, such as “social media analytics,” “big data mining” and “social media marketing.” Finally, we selected the contributions concerning social big data analytics for marketing strategies, considering only primary studies published in English in journals, conference proceedings, books and white papers. Each candidate study went through the following three selection stages: (1) title assessment, discarding the work if its title did not show a well-grounded relationship to social big data analysis and marketing; (2) abstract reading, excluding the works not touching the core topic of the research; (3) retrieval of the full study and reading of its introduction and conclusions, discarding the contributions too similar to other, more relevant studies by the same authors. Moreover, to obtain a snapshot of the most recent trends, we kept only the surviving papers published in 2014, 2015 and 2016. After this phase, 52 studies remained, and we clustered the marketing actions fostered by social big data in the analyzed works as follows:

  1. Integrating traditional market research and strategies, so as to highlight trends, define new scenarios, analyze competitors and innovate products and services (Table 1).

  2. Developing advertisement campaigns, as well as communication and promotional activities, both online and off-line (Table 2).

  3. Analyzing the perception, and managing the reputation, of the brand, products and services offered by a certain firm (Table 3).

  4. Managing customer relationships, also known as social customer relationship management (Table 4).

These categories were chosen to frame the considered contributions as separately as possible, in order to derive from them the different open issues and research directions discussed in Sect. 4.5.

4.1 Integration of traditional market research and strategies

In this category, we can classify the work in Erevelles et al. (2016), which advocates that corporations must properly manage and adapt their physical, human and organizational resources to the big data paradigm. By “physical resources” we mean the software and platforms involved; human capital consists of the scientists and practitioners who know how to extract useful insights; and organizational resources specifically refer to the internal processes and procedures that turn the extracted information and insights into practical daily activities. These resources should be carefully re-engineered, enhancing the flow of information and continuously training personnel on the matter, as also stressed in Gastaldi (2014) and Bubanja (2016). The former encourages businesses to adopt social big data techniques, as well as mobile sensors, and to incorporate them into their economic models and strategies, since they may affect the longevity and success of the business itself; the latter highlights the importance of social media analysis for the development of South-East European companies.

The contribution in Heinrich et al. (2015) shows how social big data analytics foster dynamic capabilities, to find hitherto unmet needs and predict consumers’ behavior, as well as adaptive skills, to continuously measure key performance indicators and maintain a temporary competitive advantage over competitors. In this way, the value coming from big data can be deployed for both incremental and radical innovation: the former may enhance current, already tested marketing strategies, whereas the latter makes it possible to define new techniques, such as the anticipatory shipping strategy employed by Amazon (Banker 2014). Another interesting example of a successful integrated marketing campaign can be drawn from Bekmamedova and Shanks (2014), which describes the social media marketing strategy of a bank. In this contribution, the insights and actions coming from social big data are effectively embedded into the existing business processes and legacy decision-making routines of marketing managers and business analysts. This category also includes the studies in Dutta and Bose (2015) and Jiang and Chai (2016). In the latter, the authors study how a generic business can absorb the knowledge and value coming from social big data analytics, using genetic algorithms and back-propagation neural networks, in order to fully deploy and enhance a business model made up of nine building blocks. The former highlights the complexity of a social big data project and the need for a change of mindset in both the marketing heads and the employees of any given firm, since applying social big data analytics involves various phases of the production cycle, such as identifying the strategic groundwork, defining data mining strategies and implementing them.

In this perspective, Gilfoil et al. (2014) highlight an important question, i.e., how many resources should be reallocated from traditional marketing tools to new social media marketing campaigns. This question is also raised in Phan and Park (2014), where social big data are assessed in the framework of the luxury industry, and in Milolidakis et al. (2014), where the adoption of social big data is examined for mobile brands and its influence on both the internal and external organization of a company is pointed out. The issue of firm organization is also discussed in Leung et al. (2015), which presents an in-depth study of social big data applied to the tourism industry and to its online marketing strategy. Another viewpoint on the matter is provided by Xiang et al. (2015), who investigate, qualitatively and quantitatively, how the insights arising from social big data are implemented in the hospitality industry. This work uses both data coming from social media, such as comments and reviews, and traces of customer behaviors, such as online web visits and off-line attitudes. Another example may be drawn from Wang and Zhao (2016), where the assimilation of social big data into a Chinese manufacturing firm is analyzed through a two-layer framework.

Table 1 Summary of papers employing social big data as integration of traditional marketing

4.2 Fostering of commercials and promotion activities

Some significant examples in this category are those of Amazon, eBay and other similar Web sites, with their innovative selling platforms and recommender systems (Bello-Orgaz et al. 2016). Examples of such algorithms can also be found in Google AdWords (Geddes 2014) or IBM Business Analytics for Marketing (Schwarz et al. 2015), designed to help marketing leaders make “data-driven decisions.” Moreover, the work in Wamba et al. (2015) suggests ad hoc services for creating increasingly targeted advertisements and marketing campaigns, with the aim of both building business-to-customer relationships and reaching niche markets. This is also reaffirmed in Varlamis et al. (2015), where a framework is built to suggest to companies the aspects and weaknesses to improve through personalized or group advertisements.

Considering social networks, even though most users may perceive Facebook advertisements as unfruitful for purchasing or even annoying, they have been shown to increase the return on investment and the profits of web-based firms through an increase in the number of visitors (Tucker 2014; Hargittai 2015). Another recent work that uses Facebook to foster marketing campaigns is presented in Gull et al. (2014), where influential key users are detected by applying fuzzy C-means clustering to the social big data coming from users’ posts.
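Gull et al. (2014) rely on fuzzy C-means clustering. As an illustration of the clustering technique only (the features and pipeline of that work are not reproduced), the following pure-Python sketch implements standard fuzzy C-means with fuzzifier m = 2 on hypothetical per-user activity features. Unlike k-means, every user receives a degree of membership in each cluster, and users with a high membership in the high-engagement cluster could be treated as candidate key users.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy C-means: returns (centers, memberships), memberships[i][j] in [0, 1]."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centers are the means of all points weighted by membership ** m
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([sum(w * p[d] for w, p in zip(weights, points)) / wsum
                            for d in range(dim)])
        # Memberships are recomputed from the relative distances to the centers
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in dists:  # the point sits exactly on a center
                hit = dists.index(0.0)
                new_row = [1.0 if j == hit else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** exponent for k in range(c))
                           for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


# Hypothetical per-user features: (posts per week, average shares per post)
users = [(1, 0.5), (2, 1.0), (1.5, 0.8), (20, 30), (22, 28), (19, 35)]
centers, memberships = fuzzy_c_means(users, c=2)
influential = max(range(2), key=lambda j: centers[j][1])  # cluster with more shares
for features, row in zip(users, memberships):
    print(features, f"membership in influential cluster: {row[influential]:.2f}")
```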

Important issues and challenges in this category concern modeling the dynamics of viral marketing on social networks and analyzing in depth the feelings and passion-driven information directed toward certain brands (Hayes et al. 2016). In this perspective, the contribution in Brown and Harmon (2014) sheds light on the possibilities of combining big data with viral mobile messages for geosensing-enabled direct marketing services. This work proposes a platform that exploits, in real time, location-based information from mobile users and GIS (Geographic Information Systems) tracking technologies in order to identify virtual market zones. Users entering a certain geofence automatically trigger personalized promotion messages. A similar work evaluates the popularity of a product, namely the iPhone 6, in terms of the geo-spatial coordinates and gender of the users posting tweets (Hridoy et al. 2015). This makes it possible to assess short-term trends correctly and, consequently, to adjust advertising campaigns according to the advantages or drawbacks of the product considered.
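The geofence trigger described by Brown and Harmon (2014) can be illustrated with a minimal sketch, assuming circular geofences and a spherical-Earth (haversine) distance. The `Geofence` and `GeofenceTrigger` classes, the promotion text and the "fire only on entry" rule are hypothetical choices for illustration, not the platform proposed in that work.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


@dataclass(frozen=True)
class Geofence:
    name: str
    lat: float
    lon: float
    radius_m: float
    promo: str  # hypothetical personalized promotion text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


class GeofenceTrigger:
    """Returns a fence's promotion only when a user moves from outside to inside it."""

    def __init__(self, fences):
        self.fences = list(fences)
        self.inside = {}  # user id -> names of fences the user was last seen in

    def update(self, user_id, lat, lon):
        now = {f.name for f in self.fences
               if haversine_m(lat, lon, f.lat, f.lon) <= f.radius_m}
        entered = now - self.inside.get(user_id, set())
        self.inside[user_id] = now
        return [f.promo for f in self.fences if f.name in entered]


if __name__ == "__main__":
    store = Geofence("store", 45.0, 9.0, 200.0, "10% off in store today")
    trigger = GeofenceTrigger([store])
    print(trigger.update("u1", 45.0005, 9.0))  # about 56 m away: entry fires
    print(trigger.update("u1", 45.0006, 9.0))  # still inside: nothing new
```

Keeping the previous state per user means that a promotion is sent once on entry rather than on every location update while the user remains inside the zone.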

An innovative use of social big data in this category is presented in Bao et al. (2014), where the authors analyze techniques with different degrees of granularity for selecting appealing allies in a partner-marketing campaign on the basis of geographic, purchasing and brand information.

Specific uses of social big data in this group may also concern political marketing, for example driving online donations and voter mobilization, policy discussions and campaign advertising, as well as fostering transparency and participation (Wamba et al. 2015).

Another particular use of social big data in this class concerns LinkedIn (2014). Through this social network, firms are able to observe both their customers and their potential candidates, understanding what motivates job seekers professionally; i.e., they are able to create a general identikit of the typical millennial. The main potential of this new perspective, however, is the chance to perform more advanced recruitment.

An innovative deployment of social big data in this cluster can also be found in Davcheva (2014), where young sports talents are identified early for the sporting goods industry through social media analysis of their reputation and mentions on social networks. This task is achieved also thanks to instruments such as the aforementioned word cloud and buzz graph, as well as natural language processing tools, but the approach still has limitations: the relevant data sources may convey different information over time, and the identified talents may quickly turn into ordinary players.

Finally, an effective deployment of social media in this subdivision is analyzed in He et al. (2014) for the food industry. The work presents social media as a giant word-of-mouth, able to catalyze and distribute social big data virally, to establish competitive benchmarks and to monitor performance as well as possible weaknesses. The concept of virality is also studied in Ketelaar et al. (2016), where the authors test a conceptual framework to predict pass-on behavior for online content, and in Kafeza et al. (2016), wherein a pattern-based diffusion process is studied for historical retweeting behaviors and particular users are detected for fostering certain marketing campaigns.

4.3 Analyzing perception and reputation

Concerning the reputation of any given brand, the work in Sashittal et al. (2015) shows how Twitter can be an important means of spreading brands and their slogans in a sort of electronic word-of-mouth. A significant percentage of tweets concerns brands, and a subset of these also expresses sentiments toward such trademarks. These are extremely important pieces of information for corporations, whether they concern themselves or their competitors, especially because the real-time nature of this knowledge can generate potential competitive advantages. The work in Liu et al. (2014) focuses, in particular, on the time trends and frequency of tweets, considering how users’ retweeting behavior varies according to the temporal, social and topic contexts as a whole. The authors, even without applying any big data technology, succeed in proposing an effective predictive model for users’ retweets as well as a sort of time-aware recommendation method, able to suggest to companies which information to publish during a certain time interval.
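Counting brand mentions and the subset of them carrying sentiment, as in the analysis of Sashittal et al. (2015), can be sketched with a toy lexicon-based scorer. The lexicon, the tokenizer and the function name `brand_sentiment` are hypothetical; the works cited in this section rely on richer sentiment analysis and classification algorithms, and dependence on a lexicon is itself an open issue (Fiarni et al. 2016).

```python
import re
from collections import Counter

# Toy polarity lexicon (hypothetical); real systems use curated lexicons or trained models.
LEXICON = {"love": 1, "great": 1, "awesome": 1, "hate": -1, "broken": -1, "worst": -1}


def tokenize(text):
    """Lower-case word tokens, keeping hashtag and mention prefixes."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def brand_sentiment(tweets, brand):
    """Count tweets mentioning `brand` and split them by summed lexicon polarity."""
    targets = {brand.lower(), "#" + brand.lower(), "@" + brand.lower()}
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        if targets.isdisjoint(tokens):
            continue  # tweet does not mention the brand
        counts["mentions"] += 1
        score = sum(LEXICON.get(tok, 0) for tok in tokens)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "I love my new Acme phone",
        "Acme support is the worst",
        "Lunch was great",
        "Just bought an #acme charger",
    ]
    print(dict(brand_sentiment(sample, "Acme")))
    # prints {'mentions': 3, 'positive': 1, 'negative': 1, 'neutral': 1}
```

Only tweets that mention the brand are scored, which mirrors the two-stage picture above: brand-related tweets first, then the sentiment-bearing subset among them.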

The contribution in Leung et al. (2015) highlights that authenticity, genuineness and transparency are fundamental for appearing attractive to both new and past customers. Moreover, teamwork and passion create a more engaging environment that may help reduce consumers’ churn.

A particular case of brand reputation is that of politics and trust in government. In Anjaria and Guddeti (2014), a real-time assessment of tweets, through sentiment analysis and various classification algorithms, has been successfully deployed in two scenarios: the 2012 US Presidential Elections and the 2013 Karnataka Assembly Elections. The effectiveness of social big data in monitoring public opinion and forecasting political results also emerges in Song et al. (2014), where Twitter has been exploited to mine dynamic social trends and to highlight communications among people supporting the same political candidates, and in Calderon et al. (2015), where the authors try to find a relation between big data analytics from Twitter and citizens’ trust toward politicians and government in the aftermath of the 2014 FIFA World Cup in Brazil. An interesting analysis of electoral big data and their distributions can also be found in Wang and Wiebe (2014), where historical election data and potential policy gains are investigated through fuzzy methods in order to assess policy confirmations and general consensus.

Table 2 Summary of papers employing social big data to foster commercials

The work in Zhang (2016) also exploits sentiment analysis and users’ perceptions; in this case, the objective is to create an intelligent, emotion-aware recommender system able to rate reviews on the basis of particular emotional offsets. A further work based on sentiment analysis in this category is Lou et al. (2016), where the authors try to improve location-based recommender systems through users’ sentiments and opinions, expressed in microblogs, about certain points of interest. Considering the sentiment attributes of locations can improve the performance of travel agencies and Web sites, bringing them closer to what consumers really need and feel.

A different application of brand reputation through social big data can again be found in LinkedIn (2014). This social network provides useful social big data both to companies and to people looking for a job position, because enterprises have to promote their business brand, whereas job seekers want to advertise their personal brand.

Further examples in this category can be found in Bekmamedova and Shanks (2014), Kumar et al. (2016) and Oh et al. (2015). The first of these works analyzes the improvement of brand reputation in the financial sector, specifically how a bank tried to change consumers’ impression of it by exploiting new social media channels. The second work also considers banks, namely the enhancement of their web pages with a view to increasing or maintaining brand loyalty; this is accomplished through an eye-tracker analysis of Facebook pages and the use of heat maps. In the third work, the researchers assess trends and reputations of TV programs on social channels, correlating audience ratings during certain periods of time with the amount of social big data generated on particular social networks about the same program or TV series. The convergence between traditional TV indicators and social big data is also studied in Pensa et al. (2016), where the authors seek a much deeper real-time understanding of what viewers think about shows and the brands advertising on them, through a concept-level integration framework exploiting conceptual abstractions and ontological domain knowledge.
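The correlation step described for Oh et al. (2015) can be illustrated, under simplifying assumptions, with the Pearson coefficient between per-episode audience ratings and the number of related social posts. The function `pearson` and the sample figures are hypothetical; the original study may use a different statistic or different time windows.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    ratings = [5.1, 5.4, 6.0, 6.8, 7.2]    # hypothetical audience share (%) per episode
    posts = [1200, 1350, 1800, 2400, 2600]  # hypothetical social posts per episode
    print(round(pearson(ratings, posts), 3))  # prints 0.998
```

A high coefficient indicates only that the two series move together; it does not say whether social buzz drives ratings or the other way round.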

In Su and Chen (2016), the authors present a product improvement framework based on social media analytics, focusing especially on Twitter as a source of users’ opinions and sentiments in order to improve a company’s product, namely the Apple iPhone.

Finally, in this category we can also mention the use of social big data to find suitable influencers for a certain business or product (Lauherta-Otero and Cordero-Gutierrez 2016). For example, fashion influencers and fashion bloggers are an emerging phenomenon of recent years: They are capable of driving trends and purchases merely through well-chosen photos and comments about certain brands. Therefore, discovering the right people to increase the reputation of a certain product is likely to become one of the main uses of social big data analytics.

4.4 Managing customer relationship

The study in Sand et al. (2014) presents a recent project aimed at bringing together enterprises and consumers. The main objective is to increase corporations’ return on investment through accurate social media marketing campaigns. This is achieved through an involvement strategy targeted at consumers that takes advantage of gaming techniques; the users and their information then become the main product being sold. The proposed platform aims to be a concrete framework for social media marketing, capable of positioning firms and geofencing markets, but it adopts a holistic approach and faces important, non-trivial market barriers, such as penetrating the global audience.

A forward-looking idea for this category can also be found in Berger et al. (2014). Here, the authors envision consumers being paid, with incentives or real money, for participating in online communities, leaving comments and so on. Such ideas are already used by some web services that provide cashback, in actual money or through coupons and vouchers, for online purchases made through their web pages or for participating in surveys or pay-per-click campaigns. Examples of such sites are Beruby.com, i-say.com, Nielsen Online Rewards and the like. In He et al. (2014), it is highlighted that customer care, customer support and retention, as well as development and innovation driven by the consumers themselves, can be fostered through praise and complaints. Another important aspect is the creation of a friendly and happy atmosphere, by means of instruments such as greetings, promotions, deals, contests and games, which boosts the attraction of new customers and their loyalty to the company brand and mitigates certain levels of criticality (Fourati-Jamoussi 2015). Thanks to their social traces, customers can be placed at the center of the marketing strategy and become more powerful and participative in the company’s marketing decisions. This is mainly due to the public and transparent nature of social big data, compared with the private channels for complaints and praise that were available in the past. The central role of customers’ social big data is also highlighted in Vadivu and Neelamalar (2015), where an accurate analysis of users’ engagement is carried out on some Facebook pages, considering the number of likes, posts, shares and comments, the latter both as voluntary actions and as responses to the moderator’s posts. The study finds that the interactivity and quality of posts may be much more successful than the bare number of posts and fans.

Table 3 Summary of papers employing social big data used to analyze perception and reputation of brands

Finally, the work in Arora and Malik (2015) suggests employing social big data analytics to identify strategies for retaining customers before they decide to leave a company for a competitor. This fits perfectly with the logic of customer lifetime value, defined as the present value of all existing and future cash flows generated by a customer (Zhang et al. 2016). As a matter of fact, the cost of gaining a new client is significantly higher than that of preventing an existing one from churning in favor of competitors. In this perspective, the analysis of social big data may help corporations to be proactive and to provide offers and discounted prices to dissatisfied long-term customers. However, identifying the real cause of customers’ churn, such as inefficiencies or lost time, is not always easy. Indeed, understanding real motivations is a challenging task and may not always reveal users’ true intentions; thus, wrong predictions may still occur, and strategies to avoid them should be enacted. This difficulty is also evident in Coursaris et al. (2016), where the links between brand equity, consumers’ engagement and purchase intentions are thoroughly analyzed. The main findings of this work are the importance of creating a community connected to the brand and the central role of brand equity; both can be influenced by engaging social media content and can foster further involvement in social media activities. The same authors, in Coursaris and Osch (2016), further analyze the relationship between social big data content types, the benefits perceived by the consumers involved and their connection with loyalty and purchase intentions. The study, which considers both a luxury and a non-luxury French cosmetics brand, reveals that only social and hedonic benefits, deriving from dialog and community content, respectively, are useful predictors of future consumer expenditure, whereas informational benefits, coming from news content, are less influential.
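The customer lifetime value definition given above can be made concrete as a discounted sum, CLV = Σ_t CF_t · r^t / (1 + d)^t, where CF_t is the cash flow expected in period t, d a per-period discount rate and r a per-period retention probability. The retention factor and the function name `customer_lifetime_value` are assumptions added here to show why preventing churn matters; Zhang et al. (2016) may parameterize CLV differently.

```python
def customer_lifetime_value(cash_flows, discount_rate, retention=1.0):
    """Present value of a customer's expected per-period cash flows.

    cash_flows[t]: margin expected in period t (t = 0 is the current period).
    discount_rate: per-period discount rate d, e.g. 0.1 for 10%.
    retention: assumed per-period probability r that the customer does not churn.
    """
    if discount_rate <= -1.0:
        raise ValueError("discount_rate must be greater than -1")
    if not 0.0 <= retention <= 1.0:
        raise ValueError("retention must lie in [0, 1]")
    # Each period's cash flow is weighted by survival (r^t) and discounted by (1 + d)^t.
    return sum(cf * retention ** t / (1.0 + discount_rate) ** t
               for t, cf in enumerate(cash_flows))


if __name__ == "__main__":
    margins = [100.0] * 5  # hypothetical yearly margin of one customer
    loyal = customer_lifetime_value(margins, 0.1, retention=0.9)
    at_risk = customer_lifetime_value(margins, 0.1, retention=0.6)
    print(f"CLV loyal: {loyal:.2f}, at risk: {at_risk:.2f}")
```

Under these assumptions, the gap between the two values gives a rough ceiling on what a retention offer that raises r from 0.6 to 0.9 is worth for that customer.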

4.5 Discussion and future research

The following figures can be drawn from the recent literature we reviewed: 14 papers deal with the integration of social big data into traditional marketing techniques, 17 address the fostering of promotions through social big data, 16 try to exploit social big data from the perspective of brand reputation and, finally, 9 concern customer relationship. Only 4 of the 52 papers can be classified in more than one category (hence the category counts sum to 56); thus, the classification adopted is quite effective in differentiating the various points of view on marketing and social big data. As can be inferred from Fig. 4, the research output we analyzed decreases slightly, from 20 works in 2014 to 17 in 2016, but the maturity of the studies remains constant, as shown by the number of journal publications, about 7 in each year considered, while the number of conference contributions decreased slightly over the same period. Overall, scientific research in the field has reached a certain maturity, as journal contributions make up almost half of the analyzed papers; nevertheless, some interesting conference papers are still being published, since they cover ongoing developments by analyzing in depth some subtle details that have remained uncovered until now, such as which social big data contents are most engaging for consumers or the use of this pervasive information to recruit new effective testimonials.

Fig. 4 Distribution of the analyzed papers

Table 4 Summary of papers employing social big data used to foster customer relationship

In general, the four categories used to classify the recent literature yield different findings and open issues, summarized in the following.

As regards the integration of legacy marketing techniques, company cultures need to change toward the more dynamic and adaptive stream of social big data, in which corporations should immerse themselves. Ideally, this should happen in a collaborative and distributed way across different platforms, for example through system-of-systems service architectures as envisioned in Wong et al. (2014).

Furthermore, social big data must change marketing strategies without turning the whole organization upside down; the survey shows the importance of focusing on a few social marketing channels, since the point is not to please everybody, but rather the most valuable consumers.

Finally, the reliability of the results mainly depends on the maturity of the local market and its clients: What is interesting, as also pointed out by To and Lai (2015), is that social big data sources and content may vary between Western and Eastern markets, so different prerequisites must be considered in order to deploy their insights correctly.

Concerning the fostering of advertisement, what can be gleaned from this survey is that variables such as the penetration degree, the transmission rate and the emotive depth of marketing campaigns should be carefully considered, possibly in a unified framework. Moreover, the causes of emotional spikes during certain time intervals and the analysis of online social networks other than the traditional Facebook and Twitter are possible research gaps in this field that the authors encourage investigating further.

The suggestions coming from the third category concern the fact that the proposed social big data analysis techniques could also be deployed to investigate public opinion on hot social and political issues, in order to drive politicians and companies toward better decisions for the whole society, which in turn could boost brand and identity reputation. This could also be pursued by considering legacy TV channels as a tool complementary to modern social media networks, but this integration needs further research.

From the fourth category, we can conclude that penetrating the global audience is far from easy, even with the help of social big data, and that the bond with customers could be strengthened by involving them in social activities, such as polls and questionnaires promoted by the companies themselves. However, finding the real cause of customers’ churn, such as inefficiencies or lost time, is not always easy: Understanding the real motivations and the actual drivers of increased purchases is a challenging task and may not always reveal users’ true intentions; thus, wrong predictions may still occur, and strategies to avoid them should be enacted.

Overall, the analyses we carried out throughout this paper highlight the power of social big data to enable companies: (i) to garner suggestions and hints useful in defining new marketing strategies, (ii) to adapt existing strategies to a new context by undertaking corrective actions, (iii) to determine new or past scenarios in which to intervene in terms of communications, products, services and markets. What comes to light is that it is desirable to further extend the intersections among social media, big data and data analysis techniques, possibly by combining different social big data dimensions and sentiment analysis within the same marketing methods. This is motivated by the fact that mixed methods, which take advantage of contents, relations, experiences and activities, improve the confidence marketing managers have in their findings, and they can also yield valuable insights into the inadequacies, gaps and biases of traditional one-dimensional procedures (Behrendt et al. 2014).

Finally, some open issues to be investigated further, concerning all the categories considered, are the following: (i) whether social big data activities should be outsourced or conducted within the company itself, and in which case the return proves greater, (ii) the number of experts and scientists with a deep knowledge of social big data analysis techniques required for a successful marketing strategy, (iii) the need for a standardized assessment platform and the lack of an integrated theoretical framework for both social media analytics and big data technologies, (iv) the need to take particular care with privacy and security issues as well as an effective risk assessment (Tanimoto et al. 2016), (v) the growing need for real-time processing of social big data and (vi) the need for lexicon-independent sentiment analysis (Fiarni et al. 2016).

5 Conclusions and future perspectives

In this paper, we surveyed social big data, the current techniques to manage them and a variety of effective marketing deployments based on this huge amount of data. In the first part, we described in detail the numbers characterizing Society 2.0, the possible dimensions of social big data and the technologies currently employed to handle them. Moreover, we described an effective and operational methodology to foster marketing strategies by taking advantage of the valuable information sources that social big data represent. In the second part, we provided a systematic review of the recent literature on social big data and marketing, uncovering some findings and some open research issues.

What can be concluded is that social big data are no longer a niche topic, but rather a phenomenon capable of radically changing the world and of promoting a great variety of benefits, such as increased operating margins, growth in available jobs and working positions, growth in market figures, more timely decisions and time savings, the boost in digitally generated data, customer experience-driven promotions and the like.

After a broad-spectrum analysis of the main marketing possibilities of social big data in various fields and their classification, we must conclude that we are still far from a single centralized, holistic and rigorous platform, with a few buttons, through which a marketer can make relevant decisions about future market strategies. Indeed, aims and fields of application may differ greatly, and in each context many issues and problems still have to be overcome and investigated further.