1 Introduction

Music plays an important role in the daily lives of human beings. With the development of mobile devices and Internet technology, digital music market has been growing rapidly. Nowadays, online music services (e.g., Last.fmFootnote 1, SpotifyFootnote 2, PandoraFootnote 3 and GroovesharkFootnote 4) show exponential growth. At the same time, people can easily obtain hundreds of thousands of songs. However, with the increasing of the huge amount of online media content, people are facing a problem, that is how to search their desired songs from a large selection in a reasonable time. To solve this problem, music recommender systems (MRSs) have been emerging. Music recommender systems aim to help people discover new music which matches their tastes [17].

Although MRSs have been researched extensively, many new challenges in developing intelligent MRSs are posed by various music consumption paradigms [10]. The main reason is that for subjective and complicated objects (such as movies, music), it is quite difficult to describe the product characteristics customers desired such as different styles and genres, social and geographic factors that affect users predilection [13]. So far, some online applications also supply helpful tools with listeners who are able to easily manage and share their own collections, such as bookmarking music, artists and albums, creating playlists, and giving comments. As a reward, this is beneficial for MRSs to better model users’ listening preferences. Moreover, the data owned on these online applications demonstrate tremendous diversity, which offers new chances to improve the music recommendation (e.g., music, artist, and social tie) performance.

In previous literatures, most of research leveraged various digital footprints generated by interactions between people and online music service applications to produce music recommendation [8, 23, 34]. Also, most algorithms considered collaborative filtering and content-based model [17]. The former one focuses on song rating and the listening history of the users, and the latter one analyzes music features extracted from user generated content. Cheng et al. detected the influence of song play sequence on designing personalized music recommendation systems [36]. Moshfeghi et al. enhanced the performance of a model-based CF system by studying the influence of emotion and semantic spaces [27]. Schedl et al. depicted a listener’s music preference using three computational features: diversity, mainstreaminess, and novelty [33]. Schedl et al. denoted and merged geospatial information into music recommendation by considering a geospatial model and a cultural model [32]. Among existing works, social influence-aware music recommendation has not been explored in depth.

Social influence catches the methods in which people influence each others’ thoughts, interactions, and behaviors in a social network [44]. Social influence is usually mapped in changes in social behavior patterns in the social media. In [44], authors mentioned that social influence acts in a pervasive way to advertise the nature of the music that people listen to. We argue that when a user listens to a song due to the social influence, there is a possible side influence due to the music inference recommendations from online music applications. For instance, when Amy listens to a song “Hello” of the album “25” from Adele due to the recommendation from friends, she may also pick up the other songs of this album due to an online recommendation, which may in turn trigger additional listening of the album among her friends. To the best of our knowledge, this additional propagation which is triggered by the music recommendations has not been considered in existing research.

In this paper, we focus on the problem of social influence aware music recommendation, namely, exploiting social influence for music recommendation, which aims to return a list of songs for a target user. We believe that this is a natural and useful extension to the conventional music recommendation problem. However, it is challenging and crucial to predict which song a man will listen to from the sheer volume of music collection.

To exploit social influence for music recommendation, we propose an approach incorporating social influence. As [33] said, users with similar interests and with frequent correlate actions have a stronger influence on each other. In order to exploit social influence, we first construct a heterogeneous social network. Based on this network, we propose a novel meta path-based similarity measure called WPC and denoted the framework of similarity measure in a heterogeneous network. Then, we use the topological potential approach to mine social influence in heterogeneous networks. Second, to improve music recommendation by incorporating social influence, we present an approach based on social influence. Specifically, we leverage a factor graphic model to generate music recommendation by synthetically considering social influence, personal attributes, friendship strength, personal preferences, and spatial feature.

In our experiments on a real-world dataset, our proposed method significantly outperforms state-of-the-art algorithms. The major contributions of this paper are summarized in the following.

  • We focus on a new social influence aware music recommendation problem, which aims to recommend a specific list of songs for a user.

  • We show a novel meta path-based similarity measure to calculate the distance between two users in a heterogeneous social network.

  • We present the topological potential approach to mine social influence in heterogeneous networks.

  • We develop the music recommendation method that exploits social influence. Moreover, we fuse five vital features with a framework to make social influence aware music recommendation.

  • We evaluate the proposed music recommendation method by comprehensive experiments on one real world dataset collected from Last.fm. Experimental results show that our method outperforms the state-of-the-art methods in music recommendation.

The remaining of the paper is organized as follows. We first introduce the preliminaries on music recommendation and denote the task of music recommendation in Section 2. We next present a more effective model computing social influence between two users in Section 3. Then, we show an approach incorporating social influence to produce music recommendations in Section 4. We report our experiments and results in Section 5, discuss related work in Section 6, and conclude the study in Section 7.

2 Preliminaries and problem definition

In this section, we denote some important concepts and the research problem of this paper.

2.1 Notations definitions

Music recommendation aims to discover music pieces that the target user would probably listen to. In this work, we need to consider the following entities: a set of users U = {u1, u2,..., u|U|}, a set of music pieces (or songs) M = {m1, m2,..., m|M|}, a set of tags T = {t1, t2,..., t|T|}, a set of artists A = {a1, a2,..., a|A|}, a set of albums AL = {al1, al2,..., al|AL|}, a set of genres GE = {ge1, ge2,..., ge|GE|}, and a set of historical listening records \(R^{u} = \left \{{r^{u}_{1}},{r^{u}_{2}},...,r^{u}_{|R^{u}|} \right \}\) which belong to user u, where uU. An artist ali sings a song mi which is described by a genre gei and a set of tags \(T_{m_{i}} = \left \{{t_{m_{i}}}_{1},{t_{m_{i}}}_{2},...,{t_{m_{i}}}_{|T_{m_{i}}|} \right \}\) and belongs to an album ali. Table 1 summarizes the key symbols used in this paper.

Table 1 Symbols

2.2 Problem statement

In this work, we produce music recommendation by extracting heterogeneous relationships among different kinds of objects. For instance, by the Last.fm, users can add tags into a song and classify the songs into different categories. Users can give some comments on different music objects (e.g., music, artist, album, etc.). Specifically, it also provides an interface by which users can state their idea, e.g., dislike/like and express their own feeling about the music. Throught these information, we can extract users’ tastes about a candidate music piece according to different heterogeneous relationships. It offers us a method of extracting these relationships for music recommendation to construct a heterogeneous music network.

Definition 1

(Heterogeneous network) A heterogeneous network is denoted as a directed graph G = (V, E, W) with an entity type mapping function \( \phi : V \to \mathcal {A} \) and a link type mapping function \(\psi : E \to \mathcal {R} \), where each entity vV belongs to one particular entity type \(\phi (v) \subseteq \mathcal {A} \), each link eE belongs to a particular relation type \(\psi (e) \subseteq \mathcal {R} \) and W : ER+ is a weight mapping from an edge eE to a real number wR+. Notice that, when the types of entities \(|\mathcal {A}| > 1\) and also the types of relations \(|\mathcal {R}| > 1\), the network is called heterogeneous information network. An example is shown in Fig. 1.

Fig. 1
figure 1

Heterogeneous Network Schema. Here, instances of different objects are represented by different color nodes, links among different objects are represented by different line styles and different link weights are represented by different line widths

Definition 2

(Edge Weight) Edge weights on the heterogeneous graph are denoted as the transition probability from one vertex to another. Assumed a heterogeneous network, the total number of songs labeled by the tag we view N1, the total number of tags the song we view have N2, the total number of songs listened by users we view N3, the total number of song listened by user u we view N4, the total number of distinct playlists in which this song is shared N5, and the number of songs the playlist we view contains N6. Several types of edge weight are denoted in this work.

For edge Tag-Song, its weight can be defined as:

$$ W(t_{i},m_{j}) = \left\{\begin{array}{ll} tf(t_{i},m_{j})idf(t_{i}) = \frac{1}{N_{2}}\log\frac{N_{2}}{N_{1}} & \text{if }N_{1} > 1 \\ 1 & \text{if }N_{1} = 1 \end{array}\right. $$
(1)

where \(tf(t_{i},m_{j})= \frac {1}{N_{2}}\), which is the frequency of the tag ti in the song mj. \(idf(t_{i}) = \log \frac {N_{2}}{N_{1}}\), which measures the importance of tag ti for the song mj. For edge Album-Song, we give the similar definition.

For edge Song-Playlist, its weight can be defined as:

$$ W(p_{i},m_{j}) = \left\{\begin{array}{ll} tf(p_{i},m_{j})idf(m_{j}) = \frac{1}{N_{6}}\log\frac{N_{6}}{N_{5}} & \text{if }N_{5} > 1 \\ 1 & \text{if }N_{5} = 1 \end{array}\right. $$
(2)

where \(tf(p_{i},m_{j})= \frac {1}{N_{6}}\), which is the frequency of the song mj in the playlist pi. \(idf(m_{j}) = \log \frac {N_{6}}{N_{5}}\), which measures the importance of the song mj for the playlist pi.

For edge User-Song, its weight can be defined as:

$$ W(u_{i},m_{j}) = \left\{\begin{array}{ll} tf(u_{i},m_{j})idf(m_{j}) = \frac{1}{N_{3}}\log\frac{N_{4}}{N_{3}} & \text{if }N_{4} > 1 \\ 1 & \text{if }N_{4} = 1 \end{array}\right. $$
(3)

where \(tf(u_{i},m_{j})= \frac {1}{N_{3}}\), which is the frequency of the song mj played by user ui. \(idf(m_{j}) = \log \frac {N_{4}}{N_{3}}\), which measures the importance of the song mj for the user ui.

For edge Album-Artist, because an album can only belong to an unique artist, the weight can be defined as:

$$ W(a_{i},al_{j}) = 1 $$
(4)

For other edges like Artist-Song, Artist-Genre, Song-Album etc, we give similar definitions.

Definition 3

(Listening Action) We use a triple (ui, tm, m) to express that user ui listens to a song m at time tm. For the song m, we define all users’ listening actions as the listening history Y = {ui, tm, m}i, tm. Further we denote yi, tm as the action status of user ui at time tm for the given song m.

For simplicity, for the song m, we take into account the binary action, i.e, \(y_{i}^{tm} \in \left \{0, 1\right \}\), where \(y_{i}^{tm} = 1\) implies that user ui listened to a song m at time tm, and \(y_{i}^{tm} = 0\) expresses that the user did not listen to this song. We call users who executed a listening action as active users, otherwise inactive users. As one vital aim of this work is to understand how users’ listening behaviors affect (or are affected by) friends in their heterogeneous network, we further denote the notion of social influence.

Definition 4

(Social Influence) Social influence is usually regarded as the effect of latent predictions acquired on social network, which means that users are apt to follow their friends’ behaviors [37, 44]. The social influence propels users to take in behaviors shown by their neighbors. In other words, the stronger the relationship between two users is, the more effective the prediction is in affecting the user [42].

Further, we give a formal definition. Assume at time tm, for the song m, user ui has a set of active friends \(N_{m}^{tm} = \left \{u_{j} | u_{j} \in G \bigwedge y_{j}^{tm} = 1\right \}\) in the network G, we use a function, which quantifies the degree that user uj’s listening action at time tm(tm > tm) is affected by the active friends \(N_{m}^{tm}\), to denote social influence, i.e., \(Q (N_{m}^{tm}, G)\). Here, we only give a common definition of social influence, which can be instantiated in different methods.

Definition 5

(Direct Social Influence) Direct social influence refers to the influence between two users who are friendship on the heterogeneous network. That is the influence exists between directly connected users.

Definition 6

(Indirect Social Influence) Indirect social influence refers to the influence between two users who are not yet friendship on the heterogeneous network. That is the influence exists users with indirect relations.

Definition 7

(Temporal Social Influence) If user ui is affected by uj, then ui will tend to listen to a song in accordance with the recommendation obtained from uj. This indicates a relationship in which ui listens to the same song after uj shares her own listening action.

Problem definition

Given a music set M, and a target user u, our aim is to detect the probability of user u listening to music mM, defined as P(m|u, M), then return a top-k list of music pieces with the maximum probability for u.

3 Mining social influence in heterogeneous networks

In this section, we propose a new measurement to calculate social influence in heterogeneous social network. From the angle of network topology, we present that the locality of a node in the network reflects its position potential, named as topological potential, which characterizes its ability of influencing other nodes, and vice versa.

3.1 Topological potential

In 1837, M.Faraday proposed the classic concept of field. From that on, the field which is seen as the non-contacting interaction between particles of different granularity, from atom to universe, has accomplished excellent success. In physics, there exist many kinds of fields, such as gravitational field, electric field and magnetic field. All of these fields clarify the interaction law of particles. In terms of the field theory in physics, the potential in a conservative field is denoted as a position function, which is inversely proportional to the distance and is directly proportional to the magnitude of the particle mass. Inspired by the knowledge of physical fields, we fuse the field theory with network topological structure in order to depict the relationship among the nodes in the network and to reveal the general characteristic of the potential distribution.

Given a heterogeneous network G, for ∀uV, let φv(u) be the potential energy at any point v generated by u. Here, φv(u) must comply with the following rules: (a) φv(u) is a continuous, smooth, and finite function; (b) φv(u) is isotropic in nature; (c) φv(u) monotonically decreases in the distance ∥vu∥, where ∥vu∥ = 0, φv(u) reaches maximum value, and ∥vu∥→ 0, φv(u) → 0.

The modularity structure of the real-world network indicates that the interaction among nodes shows the localization properties. Assumed a node in the network as a potential source, it can influence other nodes along the paths, which connect each other. With the increasing of the topology distance, the influence of each node fast decreases. We denote the topological potential of a node in the form of Gaussian function. The potential of node viV in the network can be formalized as:

$$ \varphi(v_{i}) = \frac{1}{n} \sum\limits_{j = 1}^{n} \varphi(j \to i) = \frac{1}{n}\sum\limits_{j = 1}^{n}(m_{j} \times e^{-(\frac{d_{j \to i}}{\sigma})}) $$
(5)

where mj is the mass of vj, which describes activity of the node, dji defines the topological distance from node vj to node vi, σ denotes the influence factor, which is used to control the influence region of each node. If σ is too small, the interaction range is very limited. On the contrary, if σ is too large, there exists strong interaction between the nodes.

The topological potential can be considered as the position potential of each node in the topological structure. This indicates that the ability of each node that influences other nodes in the network, and vice versa.

3.2 Modeling social influence by topological potential

So far, most existing studies on social influence are centered on analyzing influence between users, such as pairwise influence and structure influence [44]. Generally, the pairwise influence is based on social relation and users’ interactions. Additionally, the influence may be direct social influence or indirect social influence. In [39], authors pointed out that structural diversity can be seen as a strong predictor of user on-site engagement. For the sake of simplicity, they first construct an indicator by considering the number of connected circles, then study the relevancy with the probability that the user will participate in some activities, and acquire a significant positive correlation.

In [25], they mentioned that users whose actions frequently correlate have a stronger influence on each other. In this work, we measure the correlation using the distance dji. In homogeneous networks, most of the existing traditional metrics are based on hops or shortest path length. However, in heterogeneous networks, because there exist multi-type objects and multi-type links, the neighbors of an object could pertain to different types, and the paths between two objects could show different relationships. Thus, we need to construct more sophisticated strategies to capture topological features in heterogeneous networks to differentiate paths with different semantics. In this work, we give the following measures to obtain the distance between users.

  • Path count (PC). Path count refers to the number of path instances p between two objects (ui and uj), defined as PC(ui, uj).

    $$ PC(u_{i},u_{j}) = |\left\{p:p \in \mathcal{P}\right\}| $$
    (6)

    Here, p starts with ui and ends with uj.

  • Random Walk (RW). Random walk is calculated by adopting the random walk measure along a meta path with a pre-defined length path, which is a natural generalization of ProFlow [24]. Random walk score can be denotes as \(RW(u_{i},u_{j}) = \frac {PC(u_{i},u_{j})}{PC(u_{i},\cdot )}\).

  • Weighted Path Count (WPC). Weighted Path count is to discount the total weight of the paths between two objects in the network,which can be denoted as \(WPC(u_{i},u_{j}) = \frac {PC_{w}(u_{i},u_{j}) + PC_{w}(u_{j},u_{i})}{PC_{w}(u_{i},\cdot ) + PC_{w}(\cdot ,u_{j})}\), where PCw(ui, uj) refers to the overall weight of the path instances between two objects (ui and uj), denoted as \(PC_{w}(u_{i},u_{j}) = {\sum }_{p \in P}W_{p}\), PCw(ui,⋅) defines the total weight of the paths starting with ui, and PCw(⋅, uj) defines the total weight of the paths ending with uj.

3.3 Optimizing the influence factor

From the definition of topological potential, there are three most important factors: the mass mj, the distance dji, and the influence factor σ. In this work, we assume the mass of nodes in the network are equal, and let them be 1. The distance has been obtained in the preceding section. The influence factor σ decides the influence range of the node, and different σ value indicates the different preference of the node’s importance. Therefore, how to choose the value of σ becomes an important issue in order to get the best result of topological potential which reflects the different position of the nodes in the topology network.

In this work, Shannon entropy principle is introduced in order to obtain the optimal choice of influence factor σ. Let φ(v1),..., φ(vn) be the topological potential of v1,...vn, respectively. The optimizing potential entropy function H can be defined as:

$$ min \; H = min (-\sum\limits_{i = 1}^{n} \frac{\varphi(v_{i})}{Z}log(\frac{\varphi(v_{i})}{Z})) $$
(7)

where σ ≥ 0, and \(z = {\sum }_{i = 1}^{n} \varphi _{i}\) is a normalization factor.

4 Listening action prediction

In this section, we first introduce the denoted features, and then describe our method that will be used to predict the listening action.

4.1 Feature definition

In order to predict listening actions, besides the social influence based features, we also consider other basic features that may influence the listening action. We denote four kinds of basic features, including personal attributes, friendship strength, personal preferences, and spatial feature.

Personal Attributes. In [9, 28], authors mentioned that users with similar demographics have more similar music preferences than users with different demographics. Further, they gave an example, that is users with the same age or gender have more similar music interests. We adopt six personal attributes, including age, gender, occupation, education, and the age of the account in order to model the influence of such specific factors for music recommendation.

Friendship Strength. Seo et al. pointed out that people’s relationship information plays an important role in many personalized recommender systems [35]. In this work, we calculate the friendship strength using their proposed method.

Personal Preferences. As [35] said, personal preferences, e.g., regarding favorite artists, can affect what people add to their playlists. We use the method in [35] to model a use’s preferences. The authors used a personalized scoring scheme to capture a use’s preferences. Two features can be denoted as follows: artists and genres.

Spatial Feature Schedl et al. ever mentioned that incorporating information about a listener’s position may help improve music recommendation [21]. Here, we adopted Gaussian mixture models in [21] to capture the spatial feature.

4.2 Factor graphic model based on social influence

In this work, we present a factor graphic model based on social influence (FGMSI) to predict the listening action. A factor graph for each song m is constructed. Each instance \(u_{i}\xrightarrow {\text {play}} m\) can be seen as a node in the factor graph and a label yi for each node is assigned, where yi = 1 implies that user ui listened to a song m and yi = 0 implies that ui did not listen to this song. Each node is related to an attribute vector \(\vec {x_{i}}\) and each dimension of \(\vec {x_{i}}\) is from social influence feature and basic features denoted in the last section. Two kinds of factors are defined: the first one is the attribute factor which refers to the posterior probability of the label yi given the attribute vector \(\vec {x_{i}}\); the second one is the correlation factor which defines the correlation between the relationships.

Further, we show the formal definition of the objective function and instantiate the feature definitions in order to implement the factor graphic model. Given a network G, a set of historical listening records \(R^{u} = \left \{{r^{u}_{1}},{r^{u}_{2}},...,r^{u}_{|R^{u}|} \right \}\), and the corresponding feature vector \(X^{u} = \left \{{x^{u}_{1}},{x^{u}_{2}},...,x^{u}_{|R^{u}|} \right \}\) with some known labeled relationships yi = 1(or yi = 0) and some unknown labeled relationships, our aim is to predict these unknown labeled relationships.

We first define the joint distribution over Y as:

$$ P(Y|X,G) = {\Pi} f(y_{i}, x_{i}, x_{j})g(y_{i}, G(y_{i})) $$
(8)

where f(yi, xi, xj) refers to the attribute factor and g(yi, G(yi)) refers to the correlation factor.

In this work, we adopt exponential-linear functions to denote these two kinds of factors. f(yi, xi, xj) can be defined as:

$$ f(y_{i}, x_{i}, x_{j}) = \frac{1}{Z_{\alpha}} exp\{\alpha^{\top} \phi(y_{i}, x_{i}, x_{j})\} $$
(9)

where Zα is a normalization factor, α is a weighting vector; ϕ is a vector of feature functions denoted between ui and uj with respect to the value of yi; xi and xj are attributes associated with ui and uj. Similarly, g(yi, G(yi)) can be denoted as:

$$ g(y_{i}, G(y_{i})) = \frac{1}{Z_{\beta}} exp\{\sum\limits_{y_{j} \in G(y_{i})}\beta^{\top} \psi(y_{i}, y_{j})\} $$
(10)

where ψ is a vector of indicator functions.

4.3 Model learning

The objective function can be denoted as Oα, β = logPα, β(Y |X, G). For learning the factor graphic model, we estimate a parameter configuration ζ = (α, β) for a set of historical listening records, which aims to maximize the log-likelihood objective function.

A gradient decent method is employed for model learning. We first go to show how to learn the parameter α. To be specific, we give the formulation of the gradient of α regarding the objective function:

$$ \frac{\partial O(\zeta)}{\partial \alpha}= \mathrm{E}[f(y_{i}, x_{i}, x_{j})] - \mathrm{E}_{P(y_{i}|X,G)}[f(y_{i}, x_{i}, x_{j})] $$
(11)

where E[f(yi, xi, xj)] is the expectation of the attribute factor function f(yi, xi, xj) for the given data distribution and \(\mathrm {E}_{P(y_{i}|X,G)}[f(y_{i}, x_{i}, x_{j})]\) refers to the expectation of f(yi, xi, xj) under the distribution P(yi|X, G). Similarly, we can obtain the parameter β.

What needs illustration is that the graphical structure in the FGMSI model can be arbitrary and may contain cycles, which makes it intractable to directly calculate two expectations of E[f(yi, xi, xj)] and \(\mathrm {E}_{P(y_{i}|X,G)}[f(y_{i}, x_{i}, x_{j})]\). By referencing to [38], we also use Loopy Belief Propagation to approximate the marginal distribution. With the marginal probabilities, we can get the gradient by summing over all instances. Specially, we need perform the LBP process twice, one time for estimating the marginal distribution over all instances, and another time for estimating the marginal distribution of unknown variables. Finally, with the gradient, we update each parameter with a learning rate ξ.

Based on the estimated parameters ζ, we can predict the label of unknown relationships by finding a label configuration which maximizes the objective function:

$$ Y^{\star} = argmax\;O(Y|X, G, \zeta) $$
(12)

To do this, we again utilize the Loopy Belief Propagation to calculate the marginal distribution of each node with unknown variable, and then infer the listening action as the label with largest marginal probability. Here, the number of attribute factors and correlation factors can be denoted as λ1 and λ2 in our FGMSI respectively. For each time of LBP, the time cost of propagation is O(λ1d(ϕ) + λ2d(ψ)), where d is the vector dimension. Assume that the learning model can be executed for n iterations, and LBP can be executed for n iterations in each time. Then, the time complexity can be estimated as O((λ1d(ϕ) + λ2d(ψ)) × n × n).

5 Experiments

In this section, we show various experiments to evaluate the efficiency and effectiveness of the proposed approach.

5.1 Data set

We collected metadata and user generated data from Last.fm via their public API, which is a popular social music website that supplies users with online listening and tagging services. In Last.fm, we can obtain the total number of listeners and playing counts for each song and the last 50 songs that a user has ’loved’. The collected dataset covers 102,112 playlists and 772,601 tags created by 216,420 active Last.fm users, who listened to 1,163,123 unique songs from 72,601 unique albums by 80,014 unique artists categorized into 322 sub-genres from 23 genres. Furthermore, information about features like location, tempo, comments, release year, or social tags are retrieved with the public APIs of Last.fm, theechonest.com, and musicbrainz.org.

For the used dataset, we only keep the playlists with no less than 10 songs, keep the tags with frequency of at least 5, exclude the users only listened to less than 10 songs, and exclude the songs which have been played by less than 10 users. The final attributes of the dataset are shown in Table 2.

Table 2 Properties of the dataset

5.2 Experimental setup

To evaluate the quality of the different methods, we carry out 5-fold cross validation experiments on the user level. For the experimental settings, we divide the users into two sets in terms of the following Pareto principle: we keep the full listening history of the 80% users and the half of listening history for the remaining 20% users as the training data, and the missing half of the remaining 20% users as the test data. Given a target user and her listening history, we attempt to recommend a list of new songs that she might be interested in. The whole process of our method is shown in Fig. 2

Fig. 2
figure 2

The Whole Process

5.2.1 Evaluation metrics

In music recommendation, users are more interested in results in top-ranked items [8]. In this work, we mainly concentrate on the evaluation of top results according to accuracy. We adopt three standard metrics to evaluate the recommendation performance, including precision at k (P@k), recall at k (R@k), and F-Score (F1). For P@k, we evaluate how many of top k songs suggested by the methods actually have links from the target user u. For R@k, we evaluate how many of top k songs suggested by the methods belong to the music set, which is formed by songs having links from the target user u. We divide the songs into two sets: the test set \(T_{u}^{\prime }\) and the top-k set \(R_{u}^{\prime }\). Songs that appear in both sets are members of the hit set. Precision and Recall are denoted as follows:

$$ P@k = \frac{size \;of \;hit \;set}{size \;of \;top \;k \;set} = \frac{|T_{u}^{\prime} \cap R_{u}^{\prime}|}{k} $$
(13)
$$ R@k = \frac{size \;of \;hit \;set}{size \;of \;top \;k \;set} = \frac{|T_{u}^{\prime} \cap R_{u}^{\prime}|}{|T_{u}^{\prime}|} $$
(14)

5.2.2 Benchmark methods

The following benchmark methods are implemented for comparisons.

User-based Collaborative Filtering (UCF). It assumes that similar preference are shared by similar users.

Personality-Based Music Recommender Systems (PBMRS). It classifies the influence of integrating the target user’s personality in music recommender systems [28].

Multi-modal Music Recommender System (MMR). It bridges users’ preferences with music effectively by incorporating social with collaborative information [41].

Music Play Sequence for Music Recommendation (MPSMR). It improves the performance of music recommendation by exploiting the music play sequence information in matrix factorization [10].

5.3 Experimental results

In this subsection, we show and analyze the experimental results. We first examine the optimal measure method and the influences of parameters in our approach, and then compare the approach with other baseline methods.

The optimal measure method

In Fig. 3a, we can see that the measure WPC outperforms all other measures, producing the best prediction performance in terms of precision. In Fig. 3b, according to recall, WPC also significantly outperforms the other two measures. The possible reason is that WPC considers the information about the path weight. In this work, we adopt WPC to calculate the distance between users to more accurately mine the appropriate tracks for each target user.

Fig. 3
figure 3

Average Accuracy over the Dataset for Different Measures

Influence of k

The parameter k controls the length of the top recommendation list used for evaluation. In other words, k can be seen as the size of the recommendation list shown to the user. Intuitively, k should not be too huge in order not to flood the user with information. Figures 4 and 5 illustrate recommendation performance when k is set to different values. From Fig. 4, the precision value decreases quickly when the values of k are greater than 30. From Fig. 5, our proposed method has maintained itself in a steady state when the values of k exceed 30. From these two figures, we can obviously observe that the best performance is obtained when k is 30.

Fig. 4
figure 4

Precision

Fig. 5
figure 5

Recall

Overall performance

Figure 4 compares the precision of the previously mentioned baseline approaches with our proposed method for different length of the top list. From Fig. 4, we can see that all methods show the same types of sensitivity. That is, with the increase of the length of the top recommendation list, precision of all algorithms is trending downward. Besides, we can obtain that, our proposed method is the best in terms of precision. Specially, our proposed method respectively increases Precision@30 by 19.3%, 11.7%, 9.1%, 5.1% compared with UCF, PBMRS, MMR, MPSMR. The possible reason for these is that our proposed FGMSI not only captures users’ basic information like social relationship, personal preference, but also considers social influence, in the prediction processing. In addition, we can notice that UCF obtains a relatively lower performance. The reason is that UCF only considers users’ similarity, and neglects other vital factors. The performance of PBMRS is similar to MMR, but they both are worse than MPSMR. The main reason is that MPSMR not only takes into account users’ preferences, which PBMRS and MMR use, but also extracts the song’s similarity. Because of the space limit, the similar analysis is shown in Figs. 5 and 6.

Fig. 6
figure 6

F1

Convergence analysis

We execute an experiment which is relevant to the influence of the number of iterations of the Loopy Belief Propagation (LBP). From Fig. 7, we find that the learning algorithm can converge when the number of iterations reaches 15. After around 11 iterations, the performance tends to be stable. This result implies that the learning algorithm is very efficient and has good performance of convergence.

Fig. 7
figure 7

Convergence analysis of the learning algorithm on Precision

Factor contribution analysis

We now analyze the ways in which different factors can help discover appropriate songs for each target user. In the FGMSI model, we consider five major factors: personal attributes (PA), friendship strength (FS), personal preference (PP), spatial feature (SF) and social influence (SI). Here, we verify the contribution of different factors denoted in our FGMSI model. Particularly, we use personal attributes as the basic features in the model and explore the contribution of the other four factors. We first remove four factors, namely, FS, PP, SF and SI, only keeping the basic information factor PA. We then add each of the four factors into the model and evaluate the performance improvement by each factor. Table 3 shows the results of factor analysis. We can obtain that almost all the factors are useful in generating music recommendation; however, their individual contributions are very different. Take precision as an example. FGMSI can, on average, increase by 7.3%, 8.1%, 2.3%, and 9.1% regarding FS, PP, SF and SI, respectively. We also observe that social influence is more important than other factors in generating music recommendation. This analysis indicates that our method works well when combining all features together.

Table 3 Factor contribution analysis

6 Related work

The aim of music recommendation is to help the user easily choose the favorite music pieces from a large music archive by associating the users’ social relationship, preferences with music and so on. To this end, there have already been a reasonable amount of researches on music recommendation [5, 6, 11, 12, 15, 22, 31, 41, 43]. We summarize the existing location recommendations into two categories: music recommendation and social influence.

6.1 Music Recommendation

There are different methods for music recommendations, including Collaborative Filtering (CF) approaches, Content based approaches, Context-Aware Music Recommendation, and Playlist Generation Techniques. However, to the best of our knowledge, how to apply the information of social influence between users to improve the recommendation performance has not been deeply explored.

Collaborative filtering method relies on user-music relationship, which has been proved to be one of the most effective recommendation approaches. In [26], authors predisposed the crawled data and observed the artists’ preference and proposed a ranking based algorithm combined with these preferences. Schedl et al. detected the usage of user’s demographic information in collaborative filtering [34]. In [1], Aizenberg et al. established similarities across users and items by analyzing listening patterns from users. In [36], Su et al. proposed a new method based on social and collaborative information that can bridge users’ preferences to music effectively. Qi et al. attempted to depict users’ preferences through the inferred user-to-tag ratings in order to improve user-based CF [30]. In [45], Zheng et al. put forward a Neighborhood-Integrated Matrix Factorization (NIMF) method for predictions of collaborative- and personalized-Web service QoS (Quality-of-Service) values.

Some researchers attempted to alleviate the cold start problem by incorporating content-based features. In [10], Cheng et al. explored the influence of music play sequence on developing effective personalized music recommender systems. Cheng et al. took into account the effect of venue types on users’ music preference and designed a venue-aware music recommender system [7]. In [9], a novel User-Information-Aware Music Interest Topic (UIA-MIT) model was proposed to find the potential music interest space of general users and capture the music preferences of users in different ages and genders. Authors leveraged a latent factor model for recommendation, and predicted the latent factors from music audio [13]. In [40], Wang et al. constructed a unified framework by using deep learning techniques which perform feature learning from audio signals and music recommendation. Guo et al. adopted learning-to-rank method to incorporate different features for music recommendation [17]. In [14], Ferwerda et al. modeled additional user features by embedding personality and emotional state to improve music recommendations.

A number of playlist generation techniques have been presented in the last decade. In [21], Jannach et al. proposed a novel algorithmic approach and optimization scheme to generate playlist. Shay et al. presented a model, which integrates per-artist parameters to obtain the distinct characteristics of an artist’s playlists, for playlist generation designed for Microsoft’s Groove music service [3]. Gillhofer et al. empirically classified the importance of personalization in choosing suitable songs [16].

Several researchers have previously studied the use of contextual information in various applications of music recommender systems. In [18], authors put forward a context-aware music recommender system based on contextual information by considering the most recent sequence of tracks liked by the user. In [32], Schedl et al. proposed a geospatial model that leverages GPS coordinates and a cultural model that uses semantic locations for the task of music recommendation. In [29], authors exploited the influence of users affecting the taste of friends for improving the quality of music recommendation. Marius et al. put forward a novel hybrid method to recommend music relevant with POIs [22].

6.2 Social influence

Considerable work has been proposed for studying the effects of social influence. In [20], authors designed an efficient algorithm for content-aware influence maximization leveraging bounded local arborescences to calculate influence propagation. Zhang et al. first embedded global and local influence nodes as regularization terms into a matrix factorization recommendation method, then fused social influence information into social recommendations [47]. Bilanakos et al. designed a modeling framework to study the optimal strategy followed by a monopolistic firm in order to process the information flow in a social network [4]. In [2], Althoff et al. analyzed how social networks affect user behavior in a physical activity tracking application. Hung et al. designed a novel model, Social Item Graph (SIG), that captures both influences in the form of hyperedges [19]. Zhang et al. carried out a sampling test to validate the existence of influence locality, and then formally denoted the influence locality function by observing social influence on retweet behaviors [46]. In this work, we also quantify social influence. However, we mainly focus on the problem of social influence aware music recommendations.

7 Conclusion

A large amount of users’ listening action data are left when they interact with social music streaming websites, which brings opportunities and challenges for researchers in the music recommendation. In this work, we propose a novel approach to improve music recommendations by integrating social influence. We start with a new approach exploring the distance between users. We then propose a new method that utilizes the topological potential to measure social influence. Lastly, we leverage the factor graphic model based on social influence to produce music recommendations. We conduct extensive experiments over one real-world music dataset. The experimental results show that the proposed method outperforms all the baseline methods.

In the future, we will study two directions of music recommendations to extend our method. First, we will continue to study how to take advantage of both emotion information and temporal influence, and detect novel usage of such information. Second, we will investigate more features to construct better and meaningful representations for enhancing music recommendations.