1 Introduction

People around the world use Social Media (SM) to communicate, connect and interact with other users, sharing and propagating information at a great rate [1]. SM facilitate sharing information, ideas, interests and other forms of expression through virtual communities and networks [2]. There is a great variety of services offered having many common features [3]. SM are considered interactive Internet-based applications [4]. SM are full of user-generated data, such as posts, photos, videos and so on. They offer user accounts (profiles) on websites and mobile apps, facilitating the generation of web based social networks, connecting users or groups [5].

A Social Network (SN) is a social structure consisting of several actors/entities/groups of entities, that describe a variety of interactions among them. Studies like the one reported in [6] present taxonomies for SN, which describe the spectrum of attributes that relate to these systems. They provide a reference point for different system compositions, aiming at capturing their building blocks, whilst examining the architectural designs and business models they might pose.

SN offer different techniques for analyzing the structure of social atoms (entities), as well as a set of theories for understanding and recognizing patterns hidden in them [7]. Such patterns can be local or global, which can be further analyzed in order to mine special entities that might influence others or examine characteristics of parts or the whole network [8].

During the early years of SM networking, Social Media Platforms (SMP) had a clear vision statement. Nowadays, most SM provide services and functionalities under different names. SM users take advantage of services such as connecting, sharing, entertaining and monetizing, seeking to detect brand awareness indicators, usage for sales, feedback, opinions and more, before approaching specific target groups. Figure 1 shows the number of SM users worldwide since 2010, along with estimates up to 2021. Categorizing SMPs helps address appropriate groups and improves our understanding of SM, whilst getting better results from each platform/site. New opportunities arise for research and improvements based on new data at our disposal. Although SM networking is considered a new field of study, more and more researchers work on it, due to its wide user adoption [9].

Fig. 1 Number of SM users worldwide (2010–2021*) [9]

SM data types are highly dependent on typical user activities. Various characteristics and implications of SM often lead to confusion regarding data handling [10]. Therefore, our work aims to elaborate on Social Media Types (SMTs), updating current literature, as well as to introduce new perspectives on the multiple feature offerings of SMPs.

Referring to SMTs and networks, we survey and categorize the most common types and investigate an update to their current standardization. To achieve that, we extract from SMTs the features and services that we refer to as “Utilities”, and develop a methodology based on our initial hypothesis H0 (“standard SMTs can be narrowed down to a smaller number n”), which is later backed up by further elaboration on our SM feature dataset.

We report on SM evolution and on how a data-driven approach can be used to generate a new SMT taxonomy. This is significant because SM offer an increasingly wide variety of services, making it difficult to determine their core purpose and mission and, therefore, their type. This paper assesses SMT evolution, and presents and evaluates a novel hypothesis-based, data-driven methodology for analyzing SMPs and categorizing SMTs based on their services.

As a result of our first experiment (Experiment #1, detailed in Sect. 4.2) we propose five (5) SMTs, which we argue are better aligned with the current state of play in SM than categorizations proposing nine (9) [11] or seven (7) [2] SMTs. Yet, when comparing these early results with work proposing three (3) SMTs [4], we conclude that a tighter categorization scheme is needed.

Thus, we conduct further research, striving for better results. With Experiment #2 we came up with four (4) clusters which can be interpreted as four (4) SMTs. Finally, we present an insight into the merged version of the two (2) experiments, which proposes a new categorization that consists of three (3) SMTs, namely: Social networks, Entertainment networks, and Profiling networks, typically capturing emerging SMP services.

The remainder of this paper is structured as follows: Literature review (Sect. 2) presents the state of the art on SMTs. Methodology (Sect. 3) defines our problem, methods, dataset, observations and research process. Experiments (Sect. 4) presents experimental results, while Research summary (Sect. 5.1) discusses key findings, relating them to H0, and presents important extracts from our research. The rest of the Conclusions (Sect. 5.2 and Sect. 5.3) discusses results, assesses the importance of our work along with biases and threats to validity, and presents directions for future work.

2 Literature review

There are various approaches when dealing with a new taxonomy proposal. For example, Engelbrecht et al. categorize data-driven business models based on three points: the data source, the target audience and the technological effort [12]. Then, they propose eight (8) categories of business models. Our work aims to research categories of SM (SMTs), a rather untapped topic regarding SM.

Based on Social Theories, there is the Social Atom as an individual that interacts with the Social Molecule which is the community, constructing seven (7) probable building blocks (Identity, Conversations, Sharing, Presence, Relationships, Reputation, Groups) of SM [2]. A categorization of SM sites (and by extension SMTs) such as blogs, social media sites, and virtual game worlds can be found in [4]. The classification is based on purpose and functionality. Nine (9) types of Social Media are identified [11]:

  1. Online Social Networking: Web-based services that allow individuals and communities to connect with real world friends and acquaintances online. Users interact with each other through status updates, comments, media sharing and messages. Examples: Facebook, Myspace, LinkedIn.

  2. Blogging: Journal-like websites for users to contribute textual and multimedia content, arranged in reverse chronological order. Blogs are generally maintained by an individual or by a community. Examples: Huffington Post, Business Insider, Engadget, WordPress.com, Medium.

  3. Micro-blogging: Same as blogs, but with limited content. Examples: Twitter, Tumblr, Plurk.

  4. Wikis: Collaborative editing environment that allows multiple users to develop Web pages. Examples: Wikipedia, Wikitravel, Wikihow.

  5. Social news: Sharing and selection of news stories and articles by communities of users. Examples: Digg, Slashdot, Reddit, Quora.

  6. Social book-marking: Allows users to bookmark Web content for storage, organization, and sharing. Examples: Delicious, StumbleUpon.

  7. Media sharing: Sharing of media on the Web including video, audio, and photos. Examples: YouTube, Flickr, UstreamTV.

  8. Opinion, reviews and rating: The primary function of such sites is to collect and publish user-submitted content in the form of subjective commentary on existing products, services, entertainment, businesses and places. Examples: Epinions, Yelp, Cnet, Zomato, TripAdvisor.

  9. Answers: Platforms for users seeking advice, guidance or knowledge to ask questions. Other community users can answer these questions based on previous experiences, personal opinions or relevant research. Answers are generally judged using ratings and comments. Examples: Yahoo! answers, WikiAnswers.

3 Methodology

In this section we analyze our methodology, including the problem definition, our methods, the data set, some key research observations and the corresponding process.

3.1 Problem definition

The current standardization of SMT categories (like the ones presented in [2, 4, 11]) is becoming outdated, since SMTs develop rapidly on platforms that offer various services and multiple features, which we label as Utilities. Our aim is to introduce a new taxonomy that narrows down the current SMT standardization, since most modern SMPs tend to offer multiple Utilities in a single platform/product. Therefore, we investigate this issue, expecting to offer another option regarding SMTs. Our methodology takes into consideration our observations (Sect. 3.4) on a dataset that contains different SM alongside their official features. We perform two (2) experiments (reported in Sect. 4) involving association rule mining and clustering in order to develop a data-driven methodology that addresses our summarized research question: “Can the current state of the art on SMTs (Sect. 2) be updated by reducing the number of SMT standards; thus, better reflecting the current state of play?”

3.2 Methods

There are numerous data mining functions to choose from; two prominent ones are association rules and clustering, implemented by a variety of algorithms [13, 14]. We used RapidMiner [17] for experimentation, because it contains all the algorithms we want to utilize for our experiments. The following subsections contain a short introduction to unsupervised learning (like clustering) and association rule mining with brief descriptions of key algorithms, as well as details about the methods we employed for our experiments.

3.2.1 Association rule mining

Association rule mining [18] is a machine learning method for discovering relations between variables in large databases [19]. The intention here is to identify strong rules in databases using some measures of interest, like confidence and support [20]. There are exhaustive and heuristic association rule algorithms. Apriori [21] is a prominent algorithm for mining frequent itemsets for Boolean association rules, and FP-Growth [22] is detailed in this subsection. ARMICA [14] is a novel ARM method, based on the heuristic Imperialism Competitive Algorithm (ICA), that finds frequent itemsets and extracts rules from datasets whilst setting support automatically. In this paper we use two (2) measures in order to find interesting rules from the dataset: minimum support and confidence.

Let I = {i1, i2,…, in} be a set of n binary attributes called items. Let D = {t1, t2,…, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (itemsets) X and Y are called antecedent (left-hand-side or LHS) and consequent (right-hand-side or RHS) of the rule [23]. In order to select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence.

Definition of Support [24]

The support supp(X) of an itemset X is defined as the proportion of transactions in the dataset which contain the itemset.

Definition of Confidence [24]

Confidence can be interpreted as an estimate of P(Y | X), i.e. the probability of finding the RHS of the rule in transactions under the condition that these transactions also satisfy the LHS, or the measure that indicates how often the rule is true. The confidence of a rule is defined as:

$$ \text{conf}(X \Rightarrow Y) = \frac{\text{supp}(X \cup Y)}{\text{supp}(X)} \tag{1} $$
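The two definitions can be checked on a few hand-made transactions. The sketch below is only an illustration: the toy platforms and their Utility sets are hypothetical and are not taken from our dataset, and the function names `support` and `confidence` are our own.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Proportion of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """conf(X => Y) = supp(X u Y) / supp(X), following Eq. (1)."""
    supp_lhs = support(lhs, transactions)
    if supp_lhs == 0.0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(frozenset(lhs) | frozenset(rhs), transactions) / supp_lhs


# Hypothetical toy data: one transaction per platform, holding its Utilities.
toy_platforms = [
    frozenset({"Connecting", "Applications", "Multimedia"}),
    frozenset({"Connecting", "News"}),
    frozenset({"Connecting", "Multimedia"}),
    frozenset({"Entertainment"}),
]

print(support({"Connecting"}, toy_platforms))
print(confidence({"Applications"}, {"Connecting"}, toy_platforms))
```

On this toy data the rule Applications ⇒ Connecting holds in every transaction that contains Applications, so its confidence is 1 even though its support is only one transaction in four. This is the same pattern of high confidence with low support that we discuss in Sect. 4.1.2.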

FP-Growth [22] was used in Experiment#1 (Sect. 4.2). This algorithm counts occurrences of items in the dataset and appoints them to a header table. Then it builds the FP-tree structure (“a compact structure that stores quantitative information about frequent patterns in a database”) [25] by inserting instances. Items in each instance are sorted by descending order of their frequency in the dataset for faster tree processing. Then a threshold for coverage is applied and all items that do not meet the requirements are removed. Recursive processing of this compressed version of the dataset grows large itemsets directly, instead of generating candidate items and testing them against the entire database. After a few more steps [22] the recursive process is finalized and the largest sets of items with minimum coverage have been found, and association rule creation begins [26].
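To make the tree-building phase more concrete, the following sketch covers only the first two passes described above: counting items and inserting frequency-sorted transactions into an FP-tree with a header table. It does not include the recursive mining step, and it is not the RapidMiner implementation. The names `FPNode` and `build_fp_tree` and the absolute threshold `min_count` are assumptions made for illustration. The sketch filters infrequent items before insertion, which yields the same tree as pruning them afterwards.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


class FPNode:
    """One node of the FP-tree: an item, its count and links to parent/children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(transactions: Sequence[Sequence[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, List[FPNode]]]:
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    # Header table: for every frequent item, the tree nodes that carry it.
    header: Dict[str, List[FPNode]] = {item: [] for item in frequent}

    # Pass 2: insert each transaction, items sorted by descending frequency
    # (ties broken alphabetically so that the result is deterministic).
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header
```

Because transactions that share a frequent prefix share a path in the tree, the structure is usually much smaller than the original database, which is what allows the recursive step to grow itemsets without generating candidates.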

3.2.2 Clustering

Clustering is an unsupervised learning method, which creates groups from datasets that consist of objects or entities that are characterized by similar or identical attribute values, but are adequately different from entities that belong to other clusters [13]. For running a clustering algorithm, we need to specify the distance measure (e.g. Euclidean, Manhattan, Jaccard, Cosine distances) [27]. After that, clustering methods often continue with the process of object selection and a method for evaluating the results [28]. For evaluation we can use quality measures like cohesiveness (measure for object-to-object distance), separateness (measure for cluster-to-cluster distance) and silhouette index (mix of cohesiveness and separateness) [29].
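As a small illustration of how the choice of distance measure changes the notion of similarity, the sketch below computes three of the measures named above on binary vectors. The two vectors are hypothetical and only stand for the presence (1) or absence (0) of four Utilities on two platforms. The function names are our own.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - |intersection| / |union| over the positions set to 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical presence/absence vectors for two platforms.
platform_a = [1, 1, 0, 0]
platform_b = [1, 0, 1, 0]

print(euclidean(platform_a, platform_b))
print(manhattan(platform_a, platform_b))
print(jaccard_distance(platform_a, platform_b))
```

Note that the Jaccard distance ignores positions where both vectors are 0, whereas the Euclidean and Manhattan distances treat shared absences and shared presences alike.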

Clustering algorithms that we use in our experiments (specifically, Experiment#2, Sect. 4.3) are:

Density-based spatial clustering of applications with noise (DBSCAN) [30]

DBSCAN is density-based, meaning that given a set of points in some space, it groups together points that are closely packed, labeling as outliers the points that lie alone in low-density regions. It works in three (3) abstract steps [31]:

  1. Find the points in the ε (eps) neighborhood of every point and identify the core points, i.e. those with more than minPts neighbors.

  2. Find the connected components of core points on the neighbor graph, ignoring all non-core points.

  3. Assign every non-core point to a nearby cluster if the cluster is an ε (eps) neighbor; otherwise assign it to noise.
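The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the RapidMiner operator. It assumes the common convention that a point is a core point when its ε-neighborhood, counting the point itself, holds at least `min_pts` points. The function name `dbscan`, the `NOISE` label and the generic `dist` callback are our own choices. Steps 2 and 3 are interleaved: clusters are grown from core points only, and non-core points are attached when a core point reaches them.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")
NOISE = -1


def dbscan(points: Sequence[P], eps: float, min_pts: int,
           dist: Callable[[P, P], float]) -> List[int]:
    n = len(points)
    # Step 1: eps-neighborhoods (each point is its own neighbor) and core points.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        # Step 2: grow one connected component of core points.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    # Step 3: non-core points reached from a core point join
                    # its cluster, but the search does not expand from them.
                    labels[q] = cluster
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    # Points never reached from any core point keep the NOISE label.
    return labels


# Hypothetical one-dimensional example with two dense groups and one outlier.
example = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0, 20.0]
print(dbscan(example, eps=0.6, min_pts=2, dist=lambda a, b: abs(a - b)))
```

The number of clusters is an output of the procedure rather than a parameter, which is why we later use the cluster count produced by DBSCAN to set k for the other two algorithms.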

For the RapidMiner [17] implementation of this algorithm, we used epsilon = 1 (range: real, 0.0 to +∞; default: 1), which specifies the size of the neighborhood, and min points = 5 (range: integer, 1 to +∞; default: 5), which specifies the minimum number of points forming a cluster. As for measure types, there are four (4) options: Mixed Measures, Nominal Measures, Numerical Measures and Bregman Divergences. The last two (2) cannot be used since our dataset does not contain numerical attributes. Out of the remaining two (2) groups of measure types we chose Mixed Measures, and specifically the Mixed Euclidean Distance, for two (2) reasons: (a) Nominal Measures comprise Nominal Distance, Dice Similarity, Jaccard Similarity, Kulczynski Similarity, RogersTanimoto Similarity, RussellRao Similarity and Simple Matching Similarity, which all form two (2) clusters with no reasonable results, except for Nominal Distance, which produces exactly the same results as Mixed Euclidean Distance; and (b) according to RapidMiner user statistics, 79% of users utilize the Mixed Euclidean Distance measure, which in our case outperforms the rest of the measures.

k-Medoids

k-Medoids is a clustering algorithm related to k-means and the medoidshift algorithm [32]. Both k-means and k-Medoids partition the dataset and attempt to minimize the distance between the points assigned to a cluster and a point designated as the center of that cluster. Running this algorithm in RapidMiner, we used the following default parameter values: max runs = 10, max optimization steps = 100. We also tried other values, but they produced the same or poorer results. Regarding the measure type, we used Mixed Euclidean Distance, as we did with DBSCAN.
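The partitioning idea can be illustrated with a simple alternating scheme: assign every point to its nearest medoid, then move each medoid to the cluster member with the smallest total distance to the other members, and repeat until nothing changes. The sketch below is a generic illustration under these assumptions and is not the RapidMiner operator. The function name `k_medoids`, the `init` argument for fixing the starting medoids and the single-run design (no counterpart of max runs) are our own simplifications.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")


def k_medoids(points: Sequence[P], k: int, dist: Callable[[P, P], float],
              max_steps: int = 100,
              init: Optional[Sequence[int]] = None) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label of every point)."""
    n = len(points)
    # Assumption: start from the given indices, or simply the first k points.
    medoids = list(init) if init is not None else list(range(k))

    def assign(current: List[int]) -> List[int]:
        # Label each point with the position of its nearest medoid.
        return [min(range(k), key=lambda c: dist(points[i], points[current[c]]))
                for i in range(n)]

    for _ in range(max_steps):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest summed distance to the rest.
            updated.append(min(
                members,
                key=lambda m: sum(dist(points[m], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)


# Hypothetical one-dimensional example.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
print(k_medoids(values, k=2, dist=lambda a, b: abs(a - b), init=[0, 1]))
```

Unlike k-means, the cluster center is always an actual data point, so the scheme works with any distance measure, including measures defined on nominal attributes.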

Random-Clustering

Random Clustering [33] generates simple and uniform random partitions. It has a single parameter controlling the partition of a random permutation into its cycles. The limit distribution of the size index of the generated partition is the join of the independent Poisson distributions with means determined by the size and the parameter. As for RapidMiner’s parameters, in this algorithm the only one required is the number of clusters to be formed (more in Sect. 4.3).

3.3 Dataset

The dataset used for our methodology contains various SMPs; the choice is based on ranking by monthly active users, using the expanded and merged version of Table 2 and “Appendix A”. We consider a platform’s user penetration, as well as the variety of its official features, as the most important attributes when including a candidate platform in our methodology. The dataset is built and populated with data retrieved from the official sites of each of the 112 SMPs we review. Some platforms with smaller user penetration implement fewer features. Clearly the list is not exhaustive, given the volatile nature of SM popularity and feature base. We use data pre-processing techniques such as removing duplicates and missing values, or data transformation and reduction, as needed to normalize our research dataset (further explained in Observation#1 below).

Having presented the most common SMTs in Sect. 2, Table 1 summarizes the top fifteen (15) ranked SM information networks with regards to active users [34].

Table 1 SM ranking by active users

3.4 Observations

Table 1 shows the top fifteen (15) ranked sites, based on active users. The mapping of features to Utilities is described step-by-step by Observations #1–4 below. All in all, we examined each feature, and grouped these logically, according to their semantic meaning in context. Each group was then labelled by a term, signifying the corresponding utility.

Observation#1

We map platform features onto Utilities using common sense, semantics and denotation, forming “Appendix B”, in line with similar research [2, 4, 11]. This mapping is heuristic and not guaranteed to be optimal, but it is suitable for practically assigning each feature (described by a word or a sentence) to a Utility. For example, Facebook, LinkedIn and VK implement the “Messaging” feature, which can be grouped under the Utility we call “Connecting”.

The most representative official features for SMPs are shown in Table 2 (data retrieved from the official documentation for each platform [35–49]). Nowadays, the majority of SM support multimedia sharing, posting, hash-tagging and more, under different feature labels. We use an expanded form of the current standardized types, as used in [2, 4, 11], to assign relevant feature labels to conceptually compliant Utilities.

Table 2 Official features for the 15 top-ranked sites

Observation#2

We transform features so that each attribute in our dataset represents a semantically equivalent, specific real-world Utility. For example, the feature “Messaging” becomes “Connecting”, since users exchange text, voice and/or video, which is a means of establishing social connections. Likewise, the feature “Tags” becomes “Sharing”, the feature “wall” becomes “Profile”, etc.

Based on Observation#1 and Observation#2 we came up with fourteen (14) distinct Utilities (Connecting, Sharing, Multimedia, Privacy, News, Promoting, Voting, Publishing, Schedule, Profile, Applications, Professional, Opinions, Entertainment) that group up unique official SM features under a single conceptual label (Utility). “Appendix B” showcases the feature transformations for the complete dataset (112 SM sites).
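The mapping of Observation#1 and Observation#2 amounts to a lookup table from feature labels to Utilities, followed by a count per Utility. The sketch below shows the idea with only the handful of feature-to-Utility pairs that are mentioned in the text of this paper; the complete map is the one in “Appendix B”. The names `FEATURE_TO_UTILITY` and `utility_counts` are hypothetical, and the case-insensitive lookup is an assumption made for convenience.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Partial, illustrative map; only pairs named in the text are listed here.
FEATURE_TO_UTILITY = {
    "messaging": "Connecting",
    "fans": "Connecting",
    "groups": "Connecting",
    "live chat": "Connecting",
    "pokes": "Connecting",
    "gifts": "Connecting",
    "user groups": "Connecting",
    "tags": "Sharing",
    "post text": "Sharing",
    "wall": "Profile",
}


def utility_counts(features: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count Utility occurrences for one platform; collect unmapped features."""
    counts: Counter = Counter()
    unmapped: List[str] = []
    for feature in features:
        utility = FEATURE_TO_UTILITY.get(feature.strip().lower())
        if utility is None:
            unmapped.append(feature)  # needs a manual decision
        else:
            counts[utility] += 1
    return counts, unmapped


connecting_features = ["Fans", "Groups", "Live Chat", "Pokes", "Gifts",
                       "Messaging", "User Groups"]
print(utility_counts(connecting_features + ["wall", "Some Unlisted Feature"]))
```

Features that are missing from the map are returned separately instead of being guessed, which mirrors the manual, heuristic nature of the grouping.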

Observation#3

By using the map in “Appendix B” and grouping features under the Utility label, we observe that different SMPs utilize common Utility instances, as shown in Table 3.

Table 3 SMP grouping based on common Utility

Observation#4

By further examining Observation#3 and Table 3 we can infer that various hybrid SMTs can be formed, characterized by specific Utilities. For example, hybrid type#1 [Pinterest, Reddit, Facebook, Twitter] characterizes SMPs that offer News, Multimedia and Connecting capabilities, while hybrid type#2 [Instagram, LinkedIn] characterizes SMPs that offer Professional, Connecting and Applications capabilities.

3.5 Research process

Our research process can be divided into seven (7) steps. A brief description of the proposed steps follows: Step 1 entails data collection to form a dataset of features from 112 SMPs (Sect. 3.3). Step 2 combines pre-processing by data normalization, transformation and reduction along with missing values and duplicate removal (Sect. 3.4). In Step 3, we record observations and finalize the dataset based on SM utilities (Sect. 3.4). Step 4 defines the axioms to follow for enlisting and shifting between the proposed SMTs (Sect. 3.5). Step 5 involves experiments (Sect. 4) by using: (a) FP-Growth, an association rules algorithm in Experiment#1, and (b) three (3) Clustering algorithms (DBSCAN, k-Medoids, Random Clustering) in Experiment#2. Step 6 uses experimental results to propose a new SMTs taxonomy (Sect. 4.2). Finally, Step 7 examines whether the proposed taxonomy is viable by testing our hypothesis and comparing our results with related work (Sect. 5).

Since SMPs can form hybrid types based on their common Utilities, we extend our effort to introduce a new taxonomy. The process is a mixture of data-driven and hypothesis-based approaches that emphasizes the data-driven aspect, meaning that the feature dataset is decisive and acts as a validator for our initial hypothesis H0 when forming the proposed taxonomy.

In Sect. 3.4 we recorded our observations from the dataset we built for 112 SM. Table 4 shows the absolute count (c) of occurrences of each Utility, along with its proportion, i.e. c divided by the total number of Utility occurrences in our dataset.

Table 4 Fraction of each Utility in dataset

Appendix C shows the complete set of Utility occurrences for each SM whilst Table 5 summarizes the utilities of the top fifteen (15) SMPs. Using “Appendix C”, we extend our effort to support H0 with the inception of generalized axioms for enlisting and shifting between our Proposed Social Media Types (taxonomy) as follows:

Table 5 Top 15 SMPs with their Utilities
  • Axiom 1 (A1): Primary Utility (P) for each SM platform is its Utility with the highest count of occurrences, c.

  • Axiom 2 (A2): Secondary Utility (S) for each SM platform is its Utility with the second highest count of occurrences, c.

  • Axiom 3 (A3): Trivia Utility (T) for each SM platform is its Utility with the lowest count of occurrences, c.

  • Axiom 4 (A4): If there is a tie in calculating P among 2 or more Utilities in a SM entry, we consider (\( \sum \nolimits_{1}^{c} P \)) utilities.

  • Axiom 5 (A5): If there is a tie in calculating S among 2 or more Utilities in a SM entry, we consider (\( \sum \nolimits_{1}^{c} S \)) utilities.

  • Axiom 6 (A6): When none of A1–A5 apply, we categorize a platform by its official goals.
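A compact way to read the axioms is as a ranking over the Utility counts of one platform. The sketch below is one possible reading and rests on several assumptions of ours: ties (A4, A5) are represented by returning every tied Utility, a Trivia Utility is reported only when the lowest count differs from the Primary and Secondary counts, and the fallback of A6 is signalled by returning `None` so that the platform can be categorized manually by its official goals. The function name `classify_utilities` and the example counts are hypothetical.

```python
from typing import Dict, List, Optional


def classify_utilities(counts: Dict[str, int]) -> Optional[Dict[str, List[str]]]:
    """Return the Primary (P), Secondary (S) and Trivia (T) Utilities."""
    present = {u: c for u, c in counts.items() if c > 0}
    if not present:
        return None  # A6: no usable counts, fall back to the official goals

    # Distinct count values, highest first.
    levels = sorted(set(present.values()), reverse=True)

    def at(level: int) -> List[str]:
        return sorted(u for u, c in present.items() if c == level)

    primary = at(levels[0])                             # A1, with ties per A4
    secondary = at(levels[1]) if len(levels) > 1 else []  # A2, with ties per A5
    trivia = at(levels[-1]) if len(levels) > 2 else []    # A3
    return {"P": primary, "S": secondary, "T": trivia}


# Hypothetical counts for one platform.
print(classify_utilities(
    {"Connecting": 7, "Sharing": 3, "Multimedia": 3, "Privacy": 1}))
```

For these hypothetical counts the Primary Utility is Connecting, the Secondary position is shared by Multimedia and Sharing, and Privacy is the Trivia Utility.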

Based on axioms A1–A6 and our dataset observations in Sect. 3.4, each of the proposed SMTs is characterized by Primary, Secondary, and Trivia Utilities, as presented in “Appendix D”.

Some examples of applying the axioms to the SM with the most users are presented in Tables 6, 7 and 8. For further clarification of the mapping process, we note that “Appendix C” assigns the features to Utilities; thus Table 6 counts seven (7) occurrences of Connecting for Facebook, since seven (7) of its features (Fans, Groups, Live Chat, Pokes, Gifts, Messaging, User Groups) are grouped under the Utility Connecting (refer to Observation#1). In the same way, in Table 7 YouTube scores one (1) on Sharing, since the feature “Post Text” is semantically linked with the Utility “Sharing”.

Table 6 Facebook break-down of Utility occurrences
Table 7 YouTube break-down of Utility occurrences
Table 8 Instagram break-down of Utility occurrences

Having examined Appendices C and D, we extend our effort, trying to prove H0 by mining our dataset using RapidMiner (as stated in Sect. 3).

4 Experiments

We conducted two experiments using RapidMiner on our dataset. In the first experiment, we used FP-Growth, an exhaustive Association Rules Mining (ARM) algorithm, which produces the same results as Apriori, but is faster [50]. In the second experiment, we followed a progressive approach using three different heuristic clustering algorithms (DBSCAN, k-Medoids and Random Clustering), running twelve (12) experiments organized in four (4) steps, as explained later, because we needed to compare intermediate results at each step. Our experiments therefore deal not only with association rules, but also with clustering. We used a “learn-by-data” approach to reduce the possible number of clusters of SMTs: we first experimented with FP-Growth, but the results were not satisfactory, so we moved on to clustering algorithms, which seem to give better results than association rules. These experiments are detailed in the remainder of this section.

4.1 Biases

Before presenting our experiments, we should note biases in our methodology. These biases as well as assumptions motivate our future work reported in Sect. 5.

4.1.1 Dataset biases

As mentioned in Sect. 3.3, our data were gathered from the official SM descriptions. We recorded and processed their features to generate a dataset by grouping them according to our subjective comprehension, removing duplicates and missing values when necessary. The SM used were chosen taking into consideration user penetration and available features. Some SM implement fewer features than others (e.g. Facebook compared with Tinder), thus our analysis might be impaired by this disparity.

4.1.2 Biases in Experiment#1

We extracted frequent itemsets in order to produce generalized rules for forming new SMTs; the resulting rules have relatively high confidence, but rather low support. Ideally we were after strong rules (high confidence and support), but due to the nature of our dataset explained in Sect. 3.3 this was not possible to the extent we would have liked: we implement a simple grouping, and our results might be considered ambiguous due to the general subjectivity of grouping features, as we comprehend them, under a specific Utility. This perceived threat to validity was the primary reason for pursuing further experimental validation by clustering.

4.1.3 Biases in Experiment#2

The second experiment offers more positive results, since we further reduced the number of categories. In order to generate fewer clusters, we experimented with removing dominant utilities during our analysis. We assume that removing the three (3) most frequent utilities one by one, while presenting and analyzing the output in a sequential manner, will enhance results.

4.2 Experiment#1

We executed FP-Growth aiming to generate strong association rules for our Utility entries for each SM on our dataset. Figure 2 presents all the association rules when using min confidence = 100%, min items per itemset = 1, and max items per itemset = 3. 100% confidence guarantees that the rule is always true. Regarding the support level, we experimented with a variety of values based on the data of each experiment. We started with minimum support 2.7% and raised it up to 10%. We aimed at the greatest values possible (driven by data) both in confidence and support, in order to find strong rules [51].

Fig. 2 Association Rules from the dataset

We found that some utilities form strong rules with high values for support and 100% confidence. For example:

  (a) When an SM platform provides the Applications utility, it is sure to contain Connecting (support = 6.2%). This suggests that, based on our data, “Applications” and “Connecting” can be part of the same meta-utility, meaning that in essence “Applications” are never provided unless “Connecting” is.

  (b) In the same manner, when a platform provides the News utility, it is sure to contain Connecting (support = 5.4%).

  (c) When it provides the Multimedia and Privacy utilities, it is sure to contain Connecting (support = 5.4%).

  (d) When it provides the Multimedia and Applications utilities, it also contains Connecting (support = 4.5%).

  (e) When it provides the Multimedia and News utilities, it also contains Connecting (support = 3.6%).

When it provides the Professional and Applications utilities, it also contains Connecting (support = 3.6%), and so on. However, if we wanted to use the twenty-three (23) rules shown in Fig. 2 to formulate groups of utilities, we would observe that sixteen (16) rules are of the form X ⇒ Connecting. In other words, ten (10) utilities including Connecting would form one (1) big group, whilst the remaining four (4) utilities would be standalone, producing a taxonomy of five (5) new SMTs. The complete list of rules with confidence = 100% is shown in Fig. 2. For further reference, “Appendix E” displays all frequent itemsets with min. support = 2.7%, including the itemsets producing the rules presented in Fig. 2 with confidence = 100%.
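One way to turn a set of rules into groups of utilities is to treat every rule as a link between the utilities it mentions and to take the connected components. The sketch below illustrates this reading with a union-find structure. It is an assumption about how such a grouping could be automated, not a description of a RapidMiner feature, and it uses only the six rules quoted in the text rather than the twenty-three (23) rules of Fig. 2, so the groups it prints differ from the five (5) groups discussed above. The names `UTILITIES`, `QUOTED_RULES` and `group_utilities` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

UTILITIES = ["Connecting", "Sharing", "Multimedia", "Privacy", "News",
             "Promoting", "Voting", "Publishing", "Schedule", "Profile",
             "Applications", "Professional", "Opinions", "Entertainment"]

Rule = Tuple[Set[str], Set[str]]  # (antecedent, consequent)

# Only the rules quoted in the text; Fig. 2 holds the complete list.
QUOTED_RULES: List[Rule] = [
    ({"Applications"}, {"Connecting"}),
    ({"News"}, {"Connecting"}),
    ({"Multimedia", "Privacy"}, {"Connecting"}),
    ({"Multimedia", "Applications"}, {"Connecting"}),
    ({"Multimedia", "News"}, {"Connecting"}),
    ({"Professional", "Applications"}, {"Connecting"}),
]


def group_utilities(utilities: Iterable[str], rules: Iterable[Rule]) -> List[Set[str]]:
    """Connected components of the 'appears in the same rule' relation."""
    names = list(utilities)
    parent: Dict[str, str] = {u: u for u in names}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for lhs, rhs in rules:
        items = sorted(lhs) + sorted(rhs)
        for other in items[1:]:
            parent[find(other)] = find(items[0])  # merge the two components

    groups: Dict[str, Set[str]] = {}
    for u in names:
        groups.setdefault(find(u), set()).add(u)
    # Largest group first, then alphabetical, for a stable presentation.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


for group in group_utilities(UTILITIES, QUOTED_RULES):
    print(sorted(group))
```

With the six quoted rules alone, six utilities end up in the group around Connecting and the other eight stay standalone. Feeding in the complete rule list would be needed to reproduce the grouping reported in this section.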

At first, we experimented in order to create rules with min. confidence = 100%, yet they proved to be too strict, so we lowered our thresholds by including all results with confidence ≤ 100%, but with min support = 10%. Based on these frequent itemsets we perform a basic grouping, aiming to produce results that better back our stated hypothesis H0. Applying a threshold of 10% Support on “Appendix E” we observe that we can create eight groups of utilities as shown in Fig. 3.

Fig. 3 Venn Diagram for Support = 10%

Figure 3 implies that Connecting, Professional, Multimedia and Sharing belong to the same group while Entertainment, Profile, Publishing and Opinions form standalone groups as shown in Fig. 4.

Fig. 4 Venn Diagram with five (5) groups

Grouping our utilities based on this approach means that we do not take into consideration itemsets with lower support levels while it leads to the generation of one (1) big group and four (4) smaller ones.

Despite the positive results, association rules could be considered biased since some utilities appear more often than others in our dataset as shown in Table 5. To address that we conducted Experiment #2.

4.3 Experiment#2

We clustered our dataset in a sequential way by excluding one by one the top three (3) dominant utilities (Connecting, Multimedia, Professional). At this point we can generate taxonomies using clustering as shown in the Tree Diagram in Fig. 5.

Fig. 5 Tree diagram for k-Medoids results

We started our experimentation by executing clustering algorithms, aiming to generate groups that could help us form new SMTs. Table 9 lists results after running three (3) different clustering algorithms (DBSCAN, k-Medoids and Random Clustering) on our dataset, before removing the dominant utilities (Connecting, Multimedia, Professional). For DBSCAN we used the default parameters from RapidMiner, which are epsilon = 1 and min points = 5. DBSCAN does not need to be given the number of clusters; it automatically produced k = 6 clusters. For k-Medoids we used k = 6, max runs = 10 and max optimization steps = 100, and for Random Clustering, k = 6. Each of the algorithms produced six (6) clusters of variable composition. Given the lack of a ground truth and the unsupervised nature of clustering, these results cannot be meaningfully evaluated on a standalone basis.

Table 9 Clustering including dominant attributes

Next, we ran the three (3) algorithms, removing one by one the most dominant Utilities from our dataset. First, we executed our experiment with the same parameters having removed the top ranked of the biased Utilities: Connecting (Table 10). DBSCAN produced k = 5 clusters, an output that is closer to validating our hypothesis (H0). For our next experiments, we reduced k according to the number of clusters produced by DBSCAN, since it is an algorithm that determines the number of clusters by itself. We did so in order to compare the output of each run of the three (3) clustering algorithms. Our goal was to find the point at which two (2) or more algorithms produce the same number of clusters.

Table 10 Clustering without Connecting utility
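
The sequential exclusion of dominant utilities can be sketched as a small loop that produces one reduced dataset per step. This is only an illustration of the procedure, assuming each SM is stored as a set of utility names; the dictionary layout, the function names and the toy entries (SM-A, SM-B, SM-C) are hypothetical and not taken from our dataset.

```python
# Dominant utilities in the order in which they are excluded.
DOMINANT = ["Connecting", "Multimedia", "Professional"]


def exclude_utilities(dataset, excluded):
    """Return a copy of the dataset without the excluded utilities."""
    drop = set(excluded)
    return {sm: utilities - drop for sm, utilities in dataset.items()}


def sequential_exclusions(dataset, dominant=DOMINANT):
    """Yield (excluded_so_far, reduced_dataset) for each step.

    Step 0 excludes nothing, step 1 excludes the top-ranked utility,
    and so on until all dominant utilities are excluded.
    """
    for n in range(len(dominant) + 1):
        excluded = dominant[:n]
        yield excluded, exclude_utilities(dataset, excluded)


# Hypothetical toy dataset: SM name -> set of utilities it offers.
toy = {
    "SM-A": {"Connecting", "Multimedia", "Entertainment"},
    "SM-B": {"Connecting", "Professional", "Profile"},
    "SM-C": {"Multimedia", "Sharing"},
}

for excluded, reduced in sequential_exclusions(toy):
    # Each reduced dataset would be handed to the clustering algorithms.
    print(excluded, {sm: sorted(u) for sm, u in reduced.items()})
```

Each reduced dataset corresponds to one of the four runs: no exclusion, then one, two and finally all three dominant utilities excluded.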

Then, we experimented with the same parameters having removed the top two (2) ranked of the biased utilities: “Connecting” and “Multimedia” (Table 11). DBSCAN again produced k = 5 clusters.

Table 11 Clustering without Connecting and Multimedia Utility

Finally, we repeated the experiment with the same parameters after removing all dominant utilities (Connecting, Multimedia, Professional). This time, given that DBSCAN produced k = 4 clusters, we also used k = 4 for Random Clustering in order to compare the results for the same number of clusters. As shown in Table 12, DBSCAN reduces the number of clusters from six (6) to four (4). So does k-Medoids: for k = 6 it creates two (2) clusters (Cluster4 and Cluster5) that contain zero items each, and for k = 4 it simply swaps the items in Cluster3 with the ones in Cluster2.

Table 12 Clustering without all biased attributes
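
The empty-cluster behaviour noted above can be illustrated with a small sketch. The code below is a simplified, deterministic k-Medoids variant in plain Python, not RapidMiner's operator: the Hamming distance, the explicit `initial` medoid indices and the toy binary vectors are all assumptions made for illustration. It shows only one way in which a cluster can end up with zero items, namely when two starting medoids coincide, and it does not claim that this is what happened in our runs.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids(points, initial, max_steps=100):
    """Alternate between assigning points and re-picking medoids.

    `initial` holds the indices of the starting medoids (an assumption of
    this sketch; implementations usually choose them at random).
    Returns (assignment, medoids).
    """
    medoids = list(initial)
    k = len(medoids)
    assignment = []
    for _ in range(max_steps):
        # Assign each point to its nearest medoid; ties go to the lowest id.
        assignment = [
            min(range(k), key=lambda c: hamming(p, points[medoids[c]]))
            for p in points
        ]
        updated = []
        for c in range(k):
            members = [i for i, a in enumerate(assignment) if a == c]
            if not members:            # empty cluster: keep the old medoid
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all members of its cluster.
            updated.append(min(
                members,
                key=lambda i: sum(hamming(points[i], points[j]) for j in members),
            ))
        if updated == medoids:         # converged
            break
        medoids = updated
    return assignment, medoids


# Hypothetical toy vectors: (Entertainment, Sharing, Profile)
toy_points = [(1, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1), (0, 1, 1)]

labels, _ = k_medoids(toy_points, initial=[0, 3])         # distinct medoids
dup_labels, _ = k_medoids(toy_points, initial=[0, 1, 3])  # medoids 0 and 1 coincide
empty_clusters = [c for c in range(3) if c not in dup_labels]
```

In the second call the requested k is three, yet only two clusters receive items, so lowering k to the number of non-empty clusters leaves the grouping unchanged.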

After examining “Appendix F” we found that the generated clusters are formed based on the presence of specific utilities in each cluster. In particular, SM with the Entertainment Utility belong to Cluster0, SM with the Sharing Utility belong to Cluster1, and SM with the Profile Utility belong to Cluster2. All the remaining SM, which either have no Utility at all or have only Utilities other than Entertainment, Sharing and Profile, belong to Cluster3.

Table 13 shows a part of our results (see the complete cluster analysis in “Appendix F”) from the last step of the sequential execution of the clustering algorithms.

Table 13 Sample of taxonomies with k-Medoids (k = 4)
  • General Purpose Networks: SM which are mainly described by Connecting, Multimedia, Professional and Sharing Utilities as shown in Table 3 belong to this set.

  • Entertainment Networks: This set describes SM that have to do with Entertainment, such as gaming, shopping, sports, travel and movies.

  • Publishing Networks: This set contains SM with blogging, general form of publishing and microblogging being their main functionality.

  • Profiling Networks: This set comprises SM that offer functions promoting skills, goals, personal journals, etc.

  • Opinion Networks: The final set contains SM that mainly deal with recommendations, reviews, discussions, polls etc.

Experiment#2

We created a taxonomy for SMTs based on a set of generalized axioms produced after running Experiment#2:

  • Axiom 7: Any SM that provides at least the Entertainment Utility alone, or Entertainment along with Profile, or Entertainment along with Sharing, is assigned to Cluster0.

  • Axiom 8: Any SM that provides at least the Sharing Utility alone, or Sharing along with Profile, is assigned to Cluster1.

  • Axiom 9: Any SM that provides at least the Profile Utility alone is assigned to Cluster2.

  • Axiom 10: If none of axioms 7–9 above stands, the SM belongs to Cluster3.

This leads to the conclusion that we can propose a new Taxonomy for SMTs as follows:

Entertainment Networks

The first cluster yields results similar to those of Experiment#1, generating an SM category that describes SM related to general entertainment, gaming, shopping, sports, travel, movies, etc.

Sharing Content Networks

This cluster contains SM that support features that prompt content sharing, hashtags, quotes, location sharing, any kind of posts etc.

Profiling Networks

This cluster produces the same results as Experiment#1, forming a category that describes SM offering functions that promote skills, goals, personal journals, etc.

General Purpose Networks

The final cluster contains all the remaining SM that were not assigned to any of the above Networks (Entertainment, Sharing, Profiling).
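
Axioms 7–10 can be read as a precedence rule over three utilities, which the following sketch encodes. The function name and the toy utility sets are hypothetical. The sketch also assumes that an SM offering Entertainment, Sharing and Profile together falls under Axiom 7, a combination the axioms do not spell out explicitly.

```python
# Mapping from cluster labels to the proposed SMT names.
SMT_NAMES = {
    "Cluster0": "Entertainment Networks",
    "Cluster1": "Sharing Content Networks",
    "Cluster2": "Profiling Networks",
    "Cluster3": "General Purpose Networks",
}


def assign_cluster(utilities):
    """Assign an SM to a cluster from the utilities it offers.

    The checks are ordered, so Entertainment takes precedence over
    Sharing, and Sharing over Profile.
    """
    offered = set(utilities)
    if "Entertainment" in offered:   # Axiom 7
        return "Cluster0"
    if "Sharing" in offered:         # Axiom 8
        return "Cluster1"
    if "Profile" in offered:         # Axiom 9
        return "Cluster2"
    return "Cluster3"                # Axiom 10


# Hypothetical examples
print(SMT_NAMES[assign_cluster({"Entertainment", "Profile"})])
print(SMT_NAMES[assign_cluster({"Sharing", "Profile"})])
print(SMT_NAMES[assign_cluster({"News", "Voting"})])
```

Because the checks are ordered, every SM receives exactly one cluster, and an SM with no utilities at all falls through to Cluster3.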

Moving on to the evaluation of our two (2) experiments (Experiment#1, Experiment#2), we aimed to produce a methodology that reduces the number of SMTs. To the best of our knowledge, current literature proposes nine (9) SMTs [11] or seven (7) SMTs [2]. Comparing our two experiments, we noted that running clustering methods on our dataset gives a better output than association rules, since the number of formed clusters (taxonomies) was reduced from five (5) to four (4), moving closer to proving our initial hypothesis H0. In both experiments, however, we produce fewer SMTs than the nine (9) or seven (7) proposed in the literature.

By examining our results from Experiment#1 and Experiment#2 we provide an insight for a proposed new taxonomy on SMTs motivated and reasoned by our dataset observations and experiments:

Entertainment networks

This cluster of SM appears in both Experiments#1 and #2 and it consists of SM that have to do with general entertainment, such as games, sports, cinema, travel, and so on. By further analyzing our data we found that this SMT offers the following Utilities:

  • Primary Utility: Entertainment

  • Secondary: Connecting, Multimedia, Opinions

  • Trivia: Sharing, Privacy, News, Promoting, Voting, Publishing, Schedule, Profile, Applications, Professional

Profiling Networks

This cluster also appears in both Experiment#1 and #2, and forms an SMT describing SM that offer functions promoting skills, goals, personal journals, etc. By analyzing our data, we observed that such SM offer the following Utilities:

  • Primary Utility: Profiling

  • Secondary: Connecting, Multimedia, Professional, Opinions, Publishing, Privacy, Voting, Applications, Promoting

  • Trivia: Sharing, News, Schedule, Entertainment

Social Networks

This SMT is generated by merging General Purpose Networks as described by findings from Experiments#1 and 2. Such SM offer the following Utilities:

  • Primary Utility: Connecting, Multimedia, Professional, Sharing

  • Secondary: Publishing

  • Trivia: Privacy, News, Promoting, Voting, Schedule, Profile, Applications, Opinions, Entertainment

For all three (3) proposed SMTs, we labeled as secondary the Utilities that are found to be paired with the Primary Utility of each proposed SMT, without considering the support level of the association rule, and we labeled as trivia the ones that do not display any association rule at all (“Appendix E”). This proposed taxonomy verifies our initial hypothesis (H0). Table 14 summarizes our findings compared with the relevant literature. Source [11] essentially concludes with nine (9) SMTs, source [2] with seven (7) SMTs and source [4] with three (3), which are, however, not operationally representative of the current evolution of SM. By consolidating the results from Experiments#1 and #2 we arrive at an updated version of SMTs, as described in this section.

Table 14 Comparing our work with the literature
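
The secondary and trivia labelling described in this section can be sketched as a simple set computation. The rule representation (pairs of antecedent and consequent sets), the function name and the toy rules are assumptions for illustration and are not the rules of “Appendix E”. The sketch also takes trivia to mean every utility that is neither primary nor paired with a primary utility in any rule, which is one possible reading of the labelling.

```python
def label_utilities(primaries, all_utilities, rules):
    """Split the non-primary utilities into secondary and trivia.

    rules: iterable of (antecedent, consequent) pairs of utility sets.
    Support and confidence are ignored on purpose: any rule that
    involves a primary utility makes its other utilities secondary.
    """
    primaries = set(primaries)
    secondary = set()
    for antecedent, consequent in rules:
        items = set(antecedent) | set(consequent)
        if items & primaries:
            secondary |= items - primaries
    trivia = set(all_utilities) - primaries - secondary
    return secondary, trivia


# Hypothetical utilities and rules
utilities = {"Entertainment", "Connecting", "Multimedia", "Opinions", "News", "Schedule"}
rules = [
    ({"Entertainment"}, {"Connecting"}),
    ({"Multimedia", "Opinions"}, {"Entertainment"}),
]

secondary, trivia = label_utilities({"Entertainment"}, utilities, rules)
print(sorted(secondary))
print(sorted(trivia))
```

Passing several primaries, as needed for an SMT with more than one Primary Utility, works in the same way because `primaries` is treated as a set.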

5 Conclusions

5.1 Research summary

The literature review reveals that SMTs are in a stage of rapid evolution. SMPs integrate multiple user services; thus, we conclude that a variety of SMTs tend to offer conceptual Utilities instead of being “single minded”. This is due to the accelerated spread and absorption of various SM services. Users require easy-to-use, all-in-one platforms that satisfy their needs holistically [52, 53].

In this paper we research this issue, aiming to offer an alternative regarding SMTs. Our methodology is based on observations on a dataset that contains various SM along with their descriptions. We performed two (2) experiments using association rule mining and clustering algorithms in order to implement a data-driven approach that proves our initial hypothesis (H0) stating that current standardization on SMTs can be updated, thus reducing the number of SMTs.

Table 14 summarizes the outcomes of existing research on SMTs, as well as our work. Observing our results empirically, we can conclude that the first experiment (Experiment#1) produces five (5) SMTs, which is perceived to be better and more in sync with the current state of play in SM than categorizations proposing nine (9) [11] or seven (7) [2] SMTs. Yet, when comparing this early result with work proposing three (3) SMTs [4], despite the latter referring to a different time period (2010), we concluded that a tighter categorization scheme was needed. Thus, we conducted further research, striving for better results. With Experiment#2, we discovered four (4) clusters, i.e. four (4) SMTs, which seem more semantically appropriate and representative than the five (5) produced by Experiment#1. Finally, we presented an insight into the consolidated version of the two (2) experiments, as discussed in Sect. 4, typically capturing emerging SMP services.

5.2 Implications

As Valentini and Kruckeberg [54] stated: “Within this digital environment, it is extremely important to have a clear understanding of the meaning, use, and implication of new/digital and social media”. Along with the rise of the number of SM and their users, the ambiguity of their features rises, too. According to the same study it is vital to distinguish digital technologies from their social functionality and to understand the SM use in order to evaluate user behavior and attitudes. Our study can aid researchers, SM users and professionals by facilitating (a) SM Selection, (b) identification of new trends and (c) collaborations and acquisitions.

5.2.1 SM selection

Although users and professionals show a clear preference for certain SM [55], and the top-10 SM have 500+ million users each, there is still some confusion over their role. In this work we aimed at selecting the most popular and representative SM in terms of features, yet this selection is not exhaustive. The study in [56] demonstrated that teen SM users spend around seven (7) hours per day using screen media, three (3) of which are spent on social networking websites. According to [57], “Social media pose serious challenges for uses-and-gratifications research, such as the entangled use of contemporary media services”. Each SM does have detailed features and characteristics, although many of them overlap because they are similar. At the same time, there is a great number of volatile features, and there are dissimilarities that may not seem so distinct; yet they create a chaotic environment that can confuse users. Our proposed categorization of SM might help stakeholders select the SM that best meets their needs, since 50% of the respondents of Copp’s survey agree that the need to personalize content and experiences is a major challenge [58]. An appropriate SM selection can support and reinforce public communication activities and social connection.

5.2.2 Identification of new trends

Teague mentions that around half of business marketers are still making up social media plans on the fly without proper marketing strategies, whilst most of them (~65%) value likes, comments and shares as extremely important for their strategies [59]. According to [60, 61] the new trends in SM for 2019 are: (1) Rebuilding trust in SM platforms, (2) Storytelling, (3) Building a brand narrative, (4) Quality and creativity over quantity, (5) Put Community and Socialization back in SM, (6) Influencers continue to grow their communities, (7) Selfies, videos and branding (Live Videos, Vertical videos, Interactive videos, more smartphone-quality videos), (8) Earn, rebuild, or keep the trust of your followers, (9) Hyper-targeted personalization, and (10) Know your platforms. Our proposed conceptualization of hybrid SMTs can facilitate the identification of new trends in the future, since these SMTs incorporate the features and suggest more functional, well-structured and up-to-date SM that marketers and researchers could use.

5.2.3 Collaborations and acquisitions

Buyouts between SM platforms and applications occur constantly. For instance, even back in 2014, around 26 billion USD had been spent on the seven (7) most important buyouts in SM [51]: 1. Google buying YouTube for $1.65 billion, 2. Facebook buying Instagram for $1 billion, 3. Facebook buying WhatsApp for $19 billion, 4. Google buying Waze for $966 million, 5. Twitter buying Vine for $970 million, 6. Microsoft buying Yammer for $1.2 billion and 7. Yahoo buying Tumblr for $1.1 billion. Facebook alone has acquired around 80 other companies [62]. Finally, index.co has compiled the acquisitions in SM per year [63]. Table 15 depicts the number of acquisitions, the average per acquisition and the total cost of acquisitions per year.

Table 15 Number of acquisitions
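
As a quick arithmetic check of the "around 26 billion USD" figure quoted for the seven buyouts, the listed amounts can simply be summed. The snippet below uses only the prices stated above, expressed in millions of USD; the dictionary keys are informal labels chosen for this sketch.

```python
# Buyout prices as listed in the text, in millions of USD.
BUYOUTS_MILLION_USD = {
    "Google / YouTube": 1650,
    "Facebook / Instagram": 1000,
    "Facebook / WhatsApp": 19000,
    "Google / Waze": 966,
    "Twitter / Vine": 970,
    "Microsoft / Yammer": 1200,
    "Yahoo / Tumblr": 1100,
}

total_million = sum(BUYOUTS_MILLION_USD.values())
print(f"Total: {total_million / 1000:.3f} billion USD")  # Total: 25.886 billion USD
```

The sum is 25.886 billion USD, which rounds to the quoted 26 billion; the WhatsApp deal alone accounts for 19 billion of it.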

According to Table 15, more than 423 billion USD have been spent on approximately 700 acquisitions in SM. Therefore, we believe that this work, in which we documented features from more than 100 SM and classified and suggested new hybrid categories, can facilitate collaborations and acquisitions between SM. For instance, SM with complementary features can merge or collaborate. Similarly, a popular SM that lacks a specific feature can acquire an SM with this distinct feature, as in the case of Facebook and WhatsApp.

5.3 Future work

In Sect. 4.1 we presented biases in our methodology as well as assumptions that motivate future work. Therefore, we plan to elaborate more on SMTs by continuing to monitor their evolution. We are likely to observe more aggressive mergers of SMPs soon, forcing updates to our proposed taxonomy. Our next step is to improve our methodology to better handle our biases (Sect. 4.1), improving the quality of the research output by performing an empirical study on understanding the usage of each SM from the user perspective.

Furthermore, we aim to automate the methodology so that, even when new SM become popular, new features are added or biased data entries persist, the allocation of SM to an SMT is adjusted effectively. This way we should be able to track future changes in SM when new features are added. As mentioned in [12], SM are undergoing rapid evolution, growth and metamorphosis. Scientists around the world have started using online tools and various technologies dedicated to SM, but adoption and acceptance are still poor across the wider research community. Our work could help academics and practitioners keep track of the evolution of SMTs by providing a point of reference regarding the essence of SM usage: for example, which list of SM to refer to when researching market trends, which one for people’s discussions, which one for entertainment purposes, and so on.