1 Introduction

Electronic commerce is an important topic in both academia and industry. As it becomes more and more popular, an increasingly large number of products are sold on electronic commerce platforms. Consequently, consumers on the platforms are facing the information overload problem, i.e., the difficulty of finding suitable products from too many of them. To overcome the information overload problem, recommendation methods have been developed to suggest suitable products for consumers [1,2,3]. In previous product recommendation methods, product-related content and consumer behaviors are frequently used to generate recommendations. Product-related content mainly refers to textual content, such as product descriptions and user-generated reviews. Consumer behaviors refer to various consumer actions performed on the electronic commerce platforms, such as clicking products, adding products to favorites, adding products to carts, and purchasing products. Consumers tend to view and compare multiple products before making a purchase. Therefore, the behaviors capture consumers’ interests in a timely manner and can be used to recommend products that meet their current needs. Besides, consumers’ propensity for interest drift further emphasizes the necessity of considering their behaviors in product recommendation.

Given this importance, some studies have used consumer behaviors for product recommendation but treated the behaviors equally [4, 5]. Different consumer behaviors were assumed to be equally important to product recommendation, and the composites of different behaviors were not explored. However, different behaviors, as well as their composites, indicate different consumer preferences. For example, purchasing behaviors should indicate stronger preferences for products than clicking behaviors. Treating heterogeneous consumer behaviors equally may lead to ineffective results. Therefore, there is a need to capture the heterogeneity of consumer behaviors and reveal their importance in product recommendation. To bridge the gap, this research proposes a heterogeneous network-based recommendation approach that not only ensures the recommendation performance but also identifies the roles of the behaviors and their composites in product recommendation.

The proposed approach aims to integrate heterogeneous behavioral relations between consumers and products to generate recommendations. Different types of entities (e.g., consumers and products) and consumer behaviors are collected to construct a heterogeneous network in which entities are denoted by different types of nodes and behaviors are represented by different types of edges. Meta paths that describe behavioral relations between consumers and products are used to calculate their similarities. Many meta paths exist in the heterogeneous network, and some of them contribute little to product recommendation. Incorporating such trivial meta paths brings noise to product recommendation and leads to high computational cost. Therefore, this research proposes a heuristic selection mechanism to select informative meta paths for product recommendation. In addition, the research uses a non-negative matrix factorization method to learn the weights of the selected meta paths and then makes personalized recommendations for consumers. To evaluate the proposed approach, experiments are conducted on a real-world data set obtained from Taobao, the largest electronic commerce platform in China. Four recommendation methods that can leverage network information are used as baseline methods. The experimental results show that the proposed approach not only identifies the importance of the consumer behaviors, but also outperforms the baseline methods in terms of precision, recall, and F-score.

The rest of the paper is organized as follows. Section 2 discusses previous work on product recommendation and studies that leverage heterogeneous networks for personalized recommendation. Section 3 introduces the details of the heterogeneous network-based approach. Section 4 describes the experiments used to evaluate the proposed approach. Section 5 presents and discusses the experimental results. Section 6 provides the conclusions of this research.

2 Related work

2.1 Product recommendation

Personalized recommendation methods have been used to solve the information overload problem in various domains, such as researcher recommendation [6, 7], music recommendation [8, 9], project recommendation [10, 11], and group recommendation [12, 13]. In the electronic commerce domain, product recommendation has also attracted much attention and many product recommendation methods have been proposed. Previous product recommendation methods mainly use product-related content and/or consumer behaviors to infer consumer preferences and thus generate recommendations.

Product-related content (e.g., product categories, product descriptions, and user-generated reviews) can be mined to discover representations of products or consumers’ preferences. For example, Pourgholamali et al. [14] used word embedding techniques to learn product and consumer representations from their related textual content and applied the learned representations in matrix factorization and link prediction frameworks for product recommendation. Zhang and Piramuthu [15] used a latent Dirichlet allocation model to discover hidden topics and topic proportions in user-generated reviews and used the discovered topics to find similar products for product recommendation. Similar to the previous literature, Zhang et al. [16] exploited product reviews to learn product representations and used the representations to identify complements and substitutes of products for better product recommendation performance. Product-related content has also been mined to discover consumers’ sentiments towards products. For example, Jing et al. [17] proposed an enhanced collaborative filtering method that employs sentiment assessment techniques to extract aspect-level sentiment scores from user-generated reviews and uses the scores to find similar consumers for product recommendation. Similarly, Sun et al. [18] proposed a multi-aspect user-interest model that mines consumers’ sentiments and interests from user-generated reviews for product recommendation.

On electronic commerce platforms, consumers can perform different behaviors, such as clicking products, adding products to favorites, and purchasing products. The consumer behaviors provide timely information for inferring consumers’ preferences and have been used for product recommendation. For example, Linden et al. [19] proposed an item-based collaborative filtering method based on customers’ purchasing behavior to recommend products. Jia et al. [20] proposed a purchase prediction model with features extracted from consumer behaviors. Wang et al. [5] constructed a product graph based on consumers’ clicking and viewing behaviors and proposed an embedding with side information method to represent products in a low-dimensional space. Then they used the low-dimensional representations to find similar products for recommendation. Although the consumer behaviors were used in previous studies, they were treated equally. However, different behaviors and the composites of the behaviors indicate different consumer preferences. To capture the heterogeneity of the behaviors for product recommendation, this research proposes a heterogeneous network-based product recommendation approach.

In addition, some studies focus on modelling sequential behaviors for product recommendation. Such studies aim at learning complex functions for predicting the probability that a consumer will select a product at the current timestamp based on the sequence of his/her behaviors at previous timestamps. For example, Hidasi et al. [21] used recurrent neural networks to model consumers’ sequential behaviors and a Bayesian personalized ranking model to generate recommendations. Wu et al. [22] modelled consumers’ behavior sequences as graph-structured data and used graph neural networks to capture the complex sequence information for personalized recommendation. In addition, Wei et al. [23] used a long short-term memory model and an attention-based model to represent consumers’ long-term and short-term behaviors for next click recommendation. Although these studies considered consumer behaviors for product recommendation, they modelled the sequence of the behaviors. Different from these studies, the current research uses the heterogeneous network to model the heterogeneity of the behaviors and proposes a heterogeneous network-based approach for product recommendation.

2.2 Use of heterogeneous network in personalized recommendation

A heterogeneous network is a network that consists of different types of nodes and edges. It has been used in the recommendation domain due to the flexibility of modelling data heterogeneity. Previous studies have developed different methods to exploit information in heterogeneous networks for personalized recommendation and they can be classified into two categories, namely embedding-based methods and path-based methods.

Embedding-based methods learn representations of nodes and edges in a heterogeneous network and then incorporate the learned representations into a recommendation framework. The representations can be learned by network embedding techniques, which embed the nodes and edges from a symbolic space into a continuous vector space. Embedding-based methods can combine different embedding techniques and recommendation frameworks. For example, Shi et al. [24] employed the DeepWalk technique to obtain user and item embeddings from a heterogeneous network and integrated the embeddings into the classic matrix factorization model for recommendation. Yu et al. [25] proposed a social friend recommendation method that uses the Skip-Gram technique to learn user embeddings and incorporates the embeddings into a Bayesian personalized ranking model for recommendation. Similarly, He et al. [26] proposed a heterogeneous network-based patent recommendation method that applies the Skip-Gram technique to learn user and patent embeddings and recommends patents based on the cosine similarity between the embeddings. Embedding-based methods have high flexibility in utilizing information in the heterogeneous network, but the uninterpretable nature of the learned representations makes recommendations less interpretable. This is worrying in electronic commerce because such opacity undermines consumers’ right to understand why certain products are recommended and may lead consumers to distrust the recommendations.

Path-based methods use explicit connections between nodes in a heterogeneous network to measure nodes’ relevance and leverage the relevance for personalized recommendation. For example, Deng and Ma [27] proposed a community recommendation method based on the heterogeneous network. They used the PathSim measure to calculate user-community similarities based on expert-selected meta paths and aggregated the similarities with heuristic weights for community recommendation. Hu et al. [28] used the PathSim measure to find similar users based on expert-selected meta paths and recommended items liked by the similar users. Similarly, Wang et al. [29] proposed a patent recommendation method that employs the AvgSim measure to obtain user-patent similarities based on expert-selected meta paths and uses the weighted sum of the similarities to generate recommendations. Compared to the embedding-based methods, the path-based methods use explicit relations to measure entity similarities and are able to explain why entities are similar. Consequently, the path-based methods make recommendations more interpretable than the embedding-based methods do. This advantage makes the path-based methods more suitable for product recommendation. However, previous path-based methods mainly rely on experts to select meta paths. This requires expert knowledge and is labor-intensive when the number of possible meta paths is large. Therefore, this research proposes a selection mechanism to select informative meta paths and uses a non-negative matrix factorization method to learn path weights for product recommendation.

3 The heterogeneous network-based approach

Figure 1 presents the framework of the heterogeneous network-based approach. The framework has three modules. The network construction module uses the heterogeneous network to model behavioral relations between consumers and products. The similarity calculation module selects informative meta paths that indicate consumers’ preferences for products. It then calculates the similarities between consumers and products based on the selected meta paths. The product recommendation module learns the importance of different meta paths and then aggregates consumer-product similarities based on the selected meta paths for making recommendations. The following subsections introduce the details of the three modules.

Fig. 1 Framework of the proposed approach

3.1 Network construction

Given a data set that contains heterogeneous data, this module defines a network schema to describe what kinds of data are contained in the data set. Then, the module constructs a heterogeneous network based on the network schema to model the heterogeneous data. Three concepts related to the heterogeneous network are defined and illustrated below.

3.1.1 Network schema

A network schema is defined as \(S=\left(\mathcal{A},\mathcal{R}\right)\) that comprises a set of entity types \(\mathcal{A}=\left\{A\right\}\) and a set of relation types \(\mathcal{R}=\left\{R\right\}\). It describes the types of entities and relations contained in a given data set. Figure 2 provides an example of the network schema for heterogeneous consumer behaviors in the electronic commerce context. The network schema comprises two types of entities (i.e., consumers and products) and four types of relations (i.e., consumers click products, consumers add products to favorites, consumers add products to carts, and consumers purchase products).

Fig. 2 An example of the network schema

3.1.2 Heterogeneous network

Following previous research [30], a heterogeneous network is defined as \(G=\left(V, E\right)\) with an entity type mapping function \(\phi : V\to \mathcal{A}\), a relation type mapping function \(\psi : E\to \mathcal{R}\), and the following condition: \(\left|\mathcal{A}\right|>1\) or \(\left|\mathcal{R}\right|>1\). In the definition, \(V\) is a set of nodes that represent entities, \(E\) is a set of edges that represent relations, \(\mathcal{A}\) is a set of entity types, \(\mathcal{R}\) is a set of relation types, and \(\left|\mathcal{A}\right|\) and \(\left|\mathcal{R}\right|\) are the numbers of entity types and relation types, respectively. The entity type mapping function is to map each entity to its entity type and the relation type mapping function is to map each relation to its relation type. The condition \(\left|\mathcal{A}\right|>1\) or \(\left|\mathcal{R}\right|>1\) is to make sure that the network contains at least two types of entities or relations. Otherwise, it is a homogeneous network. Figure 3 provides an example of the heterogeneous network with the network schema shown in Fig. 2. In the heterogeneous network, consumers and products are connected by different relations. The connections between a consumer and a product indicate his or her preference for the product and can be used to make recommendations. For example, \(C1\overset {Add\;to\;carts} \longleftrightarrow P3\overset {Add\;to\;carts} \longleftrightarrow C3\overset {Purchase} \longleftrightarrow P5\) is an indirect connection between C1 and P5, which indicates that consumer C1 may like product P5. This is because consumer C1 is similar to consumer C3 in terms of adding product P3 to carts and C3 has purchased P5. Such connections convey semantic meanings and can be described by meta paths, which are defined in the following subsection.

Fig. 3 An example of the heterogeneous network
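The definition above can be illustrated with a minimal sketch. The node and edge lists are hypothetical toy data (only the C1, P3, C3, and P5 connections come from the example in the text), and the function names `phi`, `psi`, and `is_heterogeneous` are illustrative choices rather than part of the original approach.

```python
# Hypothetical toy network. Each edge is (consumer, product, relation type),
# so the same consumer-product pair may be linked by several relation types.
node_type = {
    "C1": "Consumer", "C2": "Consumer", "C3": "Consumer",
    "P3": "Product", "P5": "Product",
}
edges = [
    ("C1", "P3", "Add to carts"),
    ("C3", "P3", "Add to carts"),
    ("C3", "P5", "Purchase"),
    ("C2", "P3", "Click"),
]


def phi(node):
    """Entity type mapping function: node -> entity type."""
    return node_type[node]


def psi(edge):
    """Relation type mapping function: edge -> relation type."""
    return edge[2]


def is_heterogeneous(nodes, edge_list):
    """Check the condition |A| > 1 or |R| > 1 from the definition."""
    entity_types = {phi(v) for v in nodes}
    relation_types = {psi(e) for e in edge_list}
    return len(entity_types) > 1 or len(relation_types) > 1
```

A network with only consumer nodes and no typed edges fails the condition and would be treated as homogeneous.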

3.1.3 Meta path

A meta path is defined as a sequence of relations between entity types and denoted as \(P={A}_{1}\overset {R_{1} } \longleftrightarrow {A}_{2}\overset {R_{2} } \longleftrightarrow \cdots \overset {R_{l} } \longleftrightarrow{A}_{l+1}\), where \({R}_{i}\left(i=1,2,\ldots ,l\right)\) is a relation, \({A}_{j}\left(j=1,2,\ldots ,l+1\right)\) is an entity type, and the number of relations defines the length of the meta path. The series of relations constitutes a composite relation \(R={R}_{1}\circ {R}_{2}\circ \cdots \circ {R}_{l}\) between \({A}_{1}\) and \({A}_{l+1}\). Therefore, meta paths describe composite relations between their starting and ending entity types, and convey different semantic meanings. For example, \(Consumer\overset {click} \longleftrightarrow Product\overset {click} \longleftrightarrow Consumer\) indicates the similarity between consumers in terms of clicking the same products and \({{Consumer}}\overset {click} \longleftrightarrow {{Product}}\overset {click} \longleftrightarrow {{Consumer}}\overset {{{Purchase}}} \longleftrightarrow {{Product}}\) indicates consumers’ preferences for products based on other similar consumers. In the current research, meta paths model the composite behavioral relations between consumers and products, and capture consumer preferences for product recommendation.
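As a rough illustration of how a meta path links consumers to products, the sketch below counts path instances by multiplying one adjacency matrix per relation. The data are hypothetical toy edges, and the helper names `adjacency` and `path_counts` are assumptions introduced for this example only.

```python
import numpy as np

consumers = ["C1", "C2", "C3"]
products = ["P1", "P2", "P3", "P4", "P5"]

# Hypothetical typed edges (consumer, relation, product).
edges = [
    ("C1", "cart", "P3"),
    ("C3", "cart", "P3"),
    ("C3", "purchase", "P5"),
    ("C2", "click", "P1"),
]


def adjacency(relation):
    """Consumer-by-product 0/1 matrix for one relation type."""
    A = np.zeros((len(consumers), len(products)), dtype=int)
    for c, r, p in edges:
        if r == relation:
            A[consumers.index(c), products.index(p)] = 1
    return A


def path_counts(r1, r2, r3="purchase"):
    """Count instances of Consumer -r1- Product -r2- Consumer -r3- Product."""
    return adjacency(r1) @ adjacency(r2).T @ adjacency(r3)


counts = path_counts("cart", "cart")
```

Here `counts[0, 4]` is nonzero because C1 reaches P5 through the shared cart product P3 and the purchase by C3, while C2 has no instance of this meta path.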

3.2 Similarity calculation

3.2.1 Meta path selection

A heterogeneous network can have many possible meta paths, but only some of them are informative to product recommendation. This research proposes a path selection mechanism to mine the informative meta paths. The selection mechanism first extracts all possible meta paths of a given path form and path length. Then, it selects meta paths of which the informativeness is above a given threshold. The path form describes the starting and ending entity types and involved relations of the meta paths. In this research, the informativeness of a meta path is the ratio of consumer-product pairs connected by the meta path to all possible consumer-product pairs. The more pairs the meta path connects, the more informative it is. For example, a meta path is not informative if it connects only 1 consumer-product pair out of 1000 possible consumer-product pairs, because it contributes no information to calculating the similarities of the other 999 consumer-product pairs.

Specifically, this research considers meta paths with the form of \(Consumer\overset {R_{1} } \longleftrightarrow \cdots {\text{ }}\overset {R_{{l - 1}} } \longleftrightarrow Consumer\overset {Purchase} \longleftrightarrow Product\) to measure the similarities between consumers and products. The path form indicates the similarities between consumers through composite behaviors and the last relation (i.e., Purchase) identifies the products liked by the similar consumers. The motivation resembles the underlying assumption of collaborative filtering, which states that consumers having similar opinions on some products tend to have similar opinions on the others [29]. This paper considers the purchased products as the ones that are known to be liked by the consumers because they have explicitly expressed their preferences. Given a network schema (e.g., Fig. 2) and the path form, the selection mechanism extracts all possible meta paths that start at the \(Consumer\) node and end at the \(Product\) node with the \(Purchase\) relation. The number of possible meta paths grows exponentially as path length increases. For example, 16 meta paths of length 3 and 256 meta paths of length 5 with the given form can be extracted from the network schema in Fig. 2. Considering all meta paths in the recommendation process is computationally expensive and prone to noise. For example, Sun et al. [31] used heterogeneous networks for top-k similarity search and demonstrated that long meta paths reduce performance and increase computational cost. Liu et al. [32] used heterogeneous networks to model user preferences for recommendation and stated that computational cost grows exponentially with the increase of path length. They also concluded that long meta paths introduce more noise than information and result in poor generalization performance. Therefore, the selection mechanism considers only meta paths of a given length (e.g., 3 or 5).
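The counts of 16 and 256 follow from the path form: every relation except the final Purchase can be any of the four relation types in Fig. 2. A minimal sketch of the enumeration is given below; the relation labels and the function name `enumerate_meta_paths` are illustrative assumptions.

```python
from itertools import product

# The four relation types of the network schema in Fig. 2 (labels are illustrative).
RELATIONS = ("click", "favorite", "cart", "purchase")


def enumerate_meta_paths(length):
    """List all meta paths Consumer -R1- ... -R(l-1)- Consumer -Purchase- Product.

    A path is returned as a tuple of relation labels; the last one is always
    "purchase". Paths alternate between consumers and products, so only odd
    lengths of at least 3 fit the given form.
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be an odd integer >= 3")
    free_relations = length - 1
    return [combo + ("purchase",) for combo in product(RELATIONS, repeat=free_relations)]


paths_len3 = enumerate_meta_paths(3)  # 4 ** 2 candidate paths
paths_len5 = enumerate_meta_paths(5)  # 4 ** 4 candidate paths
```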

The second step of the selection mechanism calculates the informativeness of the extracted meta paths and retains the ones with informativeness above a given threshold. The informativeness of meta path \({P}_{k}\) is calculated as follows:

$${I}_{{P_{k}}}^{{\prime}}=\frac{{N}_{{P_{k}}}}{{N}_{consumer}\times {N}_{product}}$$
(1)
$${I}_{{P_{k}}}=\frac{{I}_{{P_{k}}}^{{\prime }}}{{\sum }_{{P}_{k}\in M}{I}_{{P_{k}}}^{{\prime }}}$$
(2)

where \({I}_{{P_{k}}}^{{\prime}}\) is the informativeness of meta path \({P}_{k}\), \({I}_{{P_{k}}}\) is the normalized informativeness, \(M\) is the set of extracted meta paths, \({N}_{{P_{k}}}\) is the number of consumer-product pairs connected by \({P}_{k}\), \({N}_{consumer}\) is the number of consumers, \({N}_{product}\) is the number of products, and \({N}_{consumer}\times {N}_{product}\) is the total number of consumer-product pairs. The procedure of the path selection mechanism is presented in Fig. 4.

Fig. 4 The path selection procedure
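Equations (1) and (2) and the thresholding step can be sketched as follows. This is an illustration under stated assumptions, not a transcription of Fig. 4: the input is assumed to be one consumer-by-product matrix per meta path whose nonzero entries mark connected pairs, and the threshold is assumed to apply to the normalized informativeness. The function name `select_meta_paths` is hypothetical.

```python
import numpy as np


def select_meta_paths(connectivity, theta):
    """Compute informativeness per meta path and keep those above theta.

    connectivity: dict mapping a meta path id to a consumer-by-product array;
                  a nonzero entry means the pair is connected by that path.
    theta:        threshold, assumed here to apply to the normalized value.
    """
    # Eq. (1): share of consumer-product pairs connected by each meta path.
    raw = {k: np.count_nonzero(m) / np.asarray(m).size for k, m in connectivity.items()}
    # Eq. (2): normalize over the set M of extracted meta paths.
    total = sum(raw.values())
    normalized = {k: (v / total if total > 0 else 0.0) for k, v in raw.items()}
    selected = [k for k, v in normalized.items() if v > theta]
    return normalized, selected
```

For instance, with two meta paths that connect 3 of 4 pairs and 1 of 4 pairs, the normalized values are 0.75 and 0.25, and a threshold of 0.3 keeps only the first.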

3.2.2 Consumer-product similarity calculation based on each meta path

The similarities between consumers and products based on meta paths can be measured by path-based similarity measures. Previous research has proposed several path-based measures to calculate the similarity between two entities in a heterogeneous network, such as Path-constrained Random Walks (PCRW) [33], PathSim [31], HeteSim [34], and AvgSim [35]. PCRW measures the similarity between any entities via any meta path. However, it is an asymmetric measure, which means that similarities obtained along and against a meta-path are not equal. To keep the symmetry property of similarity measures, PathSim, HeteSim, and AvgSim were proposed. Among them, PathSim calculates the similarity only between the same types of entities via symmetric meta paths of even lengths. HeteSim measures the similarity between any entities via any meta path, but it needs extra work to convert the meta paths of odd length to the meta paths of even length. This requires high computational costs. As an enhanced version of HeteSim, AvgSim directly measures the similarity between any entities via any meta path of any length. Therefore, this research uses AvgSim to calculate the similarities between consumers and products.

Given a meta path \({P}_{k}\) with a composite relation \(R={R}_{1}\circ {R}_{2}\circ \cdots {\circ R}_{l}\), the similarity between consumer \({c}_{i}\) and product \({p}_{j}\) is calculated as follows:

$$Similarity\left({c}_{i},{p}_{j}|{P}_{k}\right)=\frac{1}{2}\left[RW\left({c}_{i},{p}_{j}|{P}_{k}\right)+RW\left({p}_{j},{c}_{i}|{P}_{k}^{-1}\right)\right]$$
(3)
$$RW\left({c}_{i},{p}_{j}|{R}_{1}\circ {R}_{2}\circ \cdots \circ {R}_{l}\right)=\left\{\begin{array}{ll}\frac{1}{\left|O\left({c}_{i}|{R}_{1}\right)\right|}{\sum }_{q=1}^{\left|O\left({c}_{i}|{R}_{1}\right)\right|}RW\left({O}_{q}\left({c}_{i}|{R}_{1}\right),{p}_{j}|{R}_{2}\circ \cdots \circ {R}_{l}\right), &\quad \text{if}\,\, \left|O\left({c}_{i}|{R}_{1}\right)\right|\ne 0\\ 0,&\quad \text{if}\,\, \left|O\left({c}_{i}|{R}_{1}\right)\right|=0\end{array}\right.$$
(4)

where \({c}_{i}\) is a target consumer, \({p}_{j}\) is a candidate product, \({P}_{k}^{-1}\) is the reversed meta path of \({P}_{k}\), \(RW\left({c}_{i},{p}_{j}|{P}_{k}\right)\) and \(RW\left({p}_{j},{c}_{i}|{P}_{k}^{-1}\right)\) measure the reaching probability between \({c}_{i}\) and \({p}_{j}\) by randomly walking along and against \({P}_{k}\), respectively. The reaching probability between \({c}_{i}\) and \({p}_{j}\) is 0 if \(\left|O\left({c}_{i}|{R}_{t}\right)\right|=0, \forall {R}_{t}\in R\) because \({c}_{i}\) cannot reach \({p}_{j}\). Besides, \(O\left({c}_{i}|{R}_{1}\right)\) is the set of neighbors of \({c}_{i}\) based on relation \({R}_{1}\), \({O}_{q}\left({c}_{i}|{R}_{1}\right)\) is the \(q\)-th neighbor of \({c}_{i}\) based on relation \({R}_{1}\), and \(\left|O\left({c}_{i}|{R}_{1}\right)\right|\) is the number of neighbors in \(O\left({c}_{i}|{R}_{1}\right)\). Equation (4) calculates the reaching probabilities for each out-neighbor of \({c}_{i}\) to \({p}_{j}\). It then adds up the probabilities and normalizes the total probability by the number of out-neighbors to obtain the final reaching probability from \({c}_{i}\) to \({p}_{j}\). In addition, \(RW\left(s, t\right)\) equals 1 if entity \(s\) and entity \(t\) are the same entity and 0 otherwise.
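For binary adjacency matrices, the recursion in Eq. (4) unrolls into a product of row-normalized adjacency matrices, one per relation, so Eq. (3) can be computed for all consumer-product pairs at once. The sketch below relies on that observation; the function names and the toy data are assumptions for illustration, and each step matrix is expected to have source entities as rows and target entities as columns.

```python
import numpy as np


def row_normalize(M):
    """Divide each row by its sum; rows with no neighbors stay all zero."""
    M = np.asarray(M, dtype=float)
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums > 0)


def reach(steps):
    """Reaching probabilities along a meta path given as a list of step matrices."""
    result = row_normalize(steps[0])
    for M in steps[1:]:
        result = result @ row_normalize(M)
    return result


def avgsim(steps):
    """Eq. (3): average of the walks along the path and against it."""
    forward = reach(steps)
    backward = reach([np.asarray(M).T for M in reversed(steps)])
    return 0.5 * (forward + backward.T)


# Toy data (hypothetical): 3 consumers, 5 products.
cart = np.zeros((3, 5)); cart[0, 2] = 1; cart[2, 2] = 1   # C1 and C3 add P3 to carts
purchase = np.zeros((3, 5)); purchase[2, 4] = 1           # C3 purchases P5

# Consumer -cart- Product -cart- Consumer -purchase- Product
similarity = avgsim([cart, cart.T, purchase])
```

In this toy case the walk from C1 to P5 and the walk from P5 back to C1 each have probability 0.5, so `similarity[0, 4]` is 0.5.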

3.3 Product recommendation

After calculating the similarities between consumers and products, the next step is to recommend products based on the selected meta paths. A recommendation score reflects the predicted preference of a consumer for a product. Given a consumer \({c}_{i}\) and a product \({p}_{j}\), the recommendation score is defined as follows:

$$RS\left({c}_{i},{p}_{j}\right)={\sum }_{{P}_{k}\in Path}{w}_{k}\cdot Similarity\left({c}_{i},{p}_{j}|{P}_{k}\right)$$
(5)

where \(Path\) is the set of selected meta paths, \({w}_{k}\) is the weight assigned to meta path \({P}_{k}\), \({w}_{k}\in \left[0,1\right]\), and \(RS\left({c}_{i},{p}_{j}\right)\in \left[0,1\right]\). For each target consumer, all candidate products are sorted based on their recommendation scores and products with the highest recommendation scores are recommended. The recommendation is based on a single meta path if \(Path\) only contains \({P}_{k}\); otherwise, it is based on multiple meta paths. When recommending products based on multiple meta paths, the weights assigned to them should capture their importance to product recommendation. The weight learning process is presented below.
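Equation (5) and the ranking step can be sketched as a weighted sum of similarity matrices followed by a per-consumer sort. The function name `recommend` and the optional `exclude` mask (for example, to skip products a consumer has already purchased) are assumptions added for illustration; the text does not state how candidate products are filtered.

```python
import numpy as np


def recommend(similarities, weights, n, exclude=None):
    """Return the top-n product indices per consumer and the score matrix.

    similarities: list of consumer-by-product similarity matrices, one per meta path.
    weights:      one weight per meta path, in the same order.
    exclude:      optional boolean mask of pairs that must not be recommended
                  (an assumption, not part of Eq. (5)).
    """
    scores = sum(w * np.asarray(S, dtype=float) for w, S in zip(weights, similarities))
    ranked = scores.copy()
    if exclude is not None:
        ranked[np.asarray(exclude, dtype=bool)] = -np.inf
    order = np.argsort(-ranked, axis=1, kind="stable")
    return order[:, :n], scores
```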

Let \(w\) denote the weight vector in which the element \({w}_{k}\) is the weight assigned to meta path \({P}_{k}\), \(RS\) the recommendation score matrix in which the entry \({RS}_{i,j}\) is the recommendation score between consumer \(i\) and product \(j\) calculated using Eq. (5), and \(Y\) the consumer-product purchasing matrix in which the entry \({y}_{i,j}=1\) if consumer \(i\) has purchased product \(j\) and \({y}_{i,j}=0\) otherwise. The weight vector should make the recommendation scores based on the selected meta paths as close as possible to the real consumer-product purchasing relationships. Therefore, the weights of the selected meta paths can be obtained by solving the following optimization problem:

$$\begin{array}{c}\underset{w}{\text{min}}L\left(w\right)=\frac{1}{2}{||Y-RS||}_{2}^{2}+\frac{\lambda }{2}{||w||}_{2}^{2}\\ s.t.\, w\ge 0 \end{array}$$
(6)

where \(RS={\sum }_{{P}_{k}\in Path}{w}_{k}\cdot {Similarity}_{{P_{k}}}\), \({Similarity}_{{P_{k}}}\) is the consumer-product similarity matrix calculated based on meta path \({P}_{k}\), \(\lambda\) is a regularization parameter used to prevent overfitting, \({||\cdot ||}_{2}\) is the Frobenius norm, and \({||\cdot ||}_{2}^{2}\) is the square of the Frobenius norm. The optimization problem is a non-negative quadratic programming problem and thus this research employs a gradient descent method [36] to solve the problem.

The weight learning procedure is presented in Fig. 5, where \(\frac{\partial L\left(w\right)}{\partial {w}_{k}}\) is the gradient of Eq. (6) with respect to \({w}_{k}\). In the procedure, the weights of all meta paths are normalized so that they sum to 1.

Fig. 5 The weight learning procedure
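One way to solve Eq. (6) is projected gradient descent: take a gradient step, clip negative weights to zero, and normalize at the end. The sketch below is an assumption-laden illustration and may differ from the procedure in Fig. 5; in particular, the default step size derived from the Gram matrix, the uniform initialization, and the iteration count are choices made here, not values from the paper.

```python
import numpy as np


def learn_weights(similarities, Y, lam=0.1, lr=None, iters=500):
    """Projected gradient descent for Eq. (6) with the constraint w >= 0.

    similarities: list of consumer-by-product similarity matrices (one per meta path).
    Y:            consumer-product purchasing matrix with 0/1 entries.
    """
    S = np.stack([np.asarray(s, dtype=float).ravel() for s in similarities])  # K x (C*P)
    y = np.asarray(Y, dtype=float).ravel()
    G = S @ S.T   # Gram matrix of the similarity matrices
    b = S @ y
    if lr is None:
        # Assumed step size: inverse of the largest curvature of the objective.
        lr = 1.0 / (np.linalg.norm(G, 2) + lam)
    w = np.full(len(similarities), 1.0 / len(similarities))
    for _ in range(iters):
        grad = G @ w - b + lam * w          # dL/dw for Eq. (6)
        w = np.maximum(w - lr * grad, 0.0)  # projection onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w    # normalize so the weights sum to 1
```

When `Y` is an exact non-negative combination of the similarity matrices and `lam` is 0, the learned weights recover that combination up to normalization.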

4 Experimental evaluation

To evaluate the effectiveness of the proposed approach, experiments are conducted based on real-world data. The following subsections introduce the data, evaluation metrics, and baseline methods used in the experiments.

4.1 Data description

The data used in the experiments are from a public data set (see Footnote 1), which was collected from Taobao, the largest electronic commerce platform in China. The original data set has more than 4 million products and about 1 million consumers who performed behaviors (i.e., clicking products, adding products to favorites, adding products to carts, and purchasing products) between November 25 and December 3, 2017. Without compromising the evaluation, 1000 consumers who have purchased at least 2 products are randomly selected from the original data set. Consumers who have purchased only one product are filtered out to increase data density because the original data set is very sparse. Then, the products that are related to at least two of the selected consumers are chosen. The products that are related to only one consumer are filtered out because they provide little information to the baseline methods as well as the proposed approach. The relations between the selected consumers and products are also included in the selected data set. Note that consumers must click a product before they can add it to favorites, add it to carts, or purchase it. Similarly, consumers are likely to add a product to carts before purchasing it. To differentiate between different types of relations, the endogenous duplicates among them are removed from the selected data set. Specifically, a click record is removed if the consumer also added the product to favorites, added it to carts, or purchased it. Similarly, an add-to-cart record is removed if the consumer purchased the product. The statistics of the selected data set are presented in Table 1.

Table 1 The statistics of the selected data set

The experimental procedure is described as follows. First, the purchasing records in the selected data set are randomly split into five parts of equal size. One of the five parts is used as a test set and the rest are used as a training set. Second, recommendation methods are trained on the training set and recommendation results are compared with the test set to evaluate the performance. Third, the experiments are repeated five times. Each time, a different part of the purchasing records is used as the test set so that all the purchasing information is utilized for evaluation. Fourth, the experimental results of the five runs are averaged to obtain the final results.
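The five-fold procedure can be sketched as below. The function name `five_fold_splits`, the random seed, and the representation of purchasing records as a plain list are assumptions for illustration; when the number of records is not divisible by five, the parts differ in size by at most one.

```python
import numpy as np


def five_fold_splits(records, k=5, seed=0):
    """Yield (train, test) lists so that each part serves as the test set once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(records))
    folds = np.array_split(shuffled, k)
    for i in range(k):
        test = [records[j] for j in folds[i]]
        train = [records[j] for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

The metrics obtained on the five test sets would then be averaged to give the final result.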

Based on the training data set and the selection procedure introduced in Sect. 3.2.1, the proposed approach selects 10 out of 16 possible meta paths of length 3 by setting the informativeness threshold \(\theta\) to 0.01. The 16 meta paths are introduced in Table 2.

Table 2 The 16 meta paths of length 3

4.2 Evaluation metrics

To evaluate the recommendation performance, three common metrics are used, i.e., precision, recall, and F-score [37]. Given a target consumer, precision measures the percentage of the recommended products that were purchased by the consumer. Recall is the ratio of correctly recommended products to the products that were purchased by the consumer. F-score is the harmonic mean of precision and recall. The three metrics are mathematically defined as follows:

$$precision@n=\frac{\left|RSet\cap TSet\right|}{\left|RSet\right|}$$
(7)
$$recall@n=\frac{\left|RSet\cap TSet\right|}{\left|TSet\right|}$$
(8)
$$F{-}score@n=\frac{{2\times {precision}\times {recall}}}{{precision+recall}}$$
(9)

where precision@n, recall@n, and F-score@n are the precision, recall, and F-score when recommending the top \(n\) products. \(RSet\) is the set of products recommended to the target consumer and \(TSet\) is the set of products purchased by the target consumer in the test set.
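Equations (7) to (9) for a single target consumer can be sketched as follows. The function name is hypothetical, and returning 0 when a denominator is empty is a convention assumed here, since the text does not cover that case.

```python
def precision_recall_fscore(recommended, purchased_in_test):
    """Compute precision@n, recall@n, and F-score@n for one consumer.

    recommended:       the top-n products recommended to the consumer (RSet).
    purchased_in_test: products the consumer purchased in the test set (TSet).
    """
    rset, tset = set(recommended), set(purchased_in_test)
    hits = len(rset & tset)
    precision = hits / len(rset) if rset else 0.0
    recall = hits / len(tset) if tset else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, fscore
```

For example, recommending four products of which two appear among three test purchases gives a precision of 0.5, a recall of 2/3, and an F-score of 4/7.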

4.3 Baseline methods

This study selects four state-of-the-art recommendation methods for comparison. The first is the matrix factorization method (MF) [38], which is one of the most popular recommendation methods and can leverage network information for recommendation. The MF method maps the network information into a low-dimensional latent space \({\mathbb{R}}^{f}\) in which each consumer and each product are modeled as vectors \({u}_{c}\) and \({v}_{p}\), respectively. The dot product of the two vectors indicates the interaction between consumer \(c\) and product \(p\). Let \({y}_{cp}\) be the actual interaction between consumer \(c\) and product \(p\), where \({y}_{cp}=1\) if consumer \(c\) has interacted with product \(p\) and \({y}_{cp}=0\) otherwise. Let \(I\) denote the set of consumer-product pairs for which \({y}_{cp}=1\). The consumer and product vectors can be learned by minimizing the regularized squared error on the set of consumer-product pairs with known interactions:

$$min\sum _{(c,p)\in I}{({y}_{cp}-{v}_{p}^{T}{u}_{c})}^{2}+\lambda ({||{v}_{p}||}^{2}+{||{u}_{c}||}^{2})$$
(10)

where \(\lambda\) is a constant that controls the extent of regularization to avoid overfitting. The optimization problem can be solved by gradient descent methods. After obtaining consumer vector \({u}_{c}\) and product vector \({v}_{p}\), the preference of consumer \(c\) for product \(p\) is estimated by the dot product \({v}_{p}^{T}{u}_{c}\). Given a target consumer, candidate products with the highest estimated preferences are recommended.
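The following is a minimal sketch of how Eq. (10) could be minimized with stochastic gradient descent. It is not the implementation used in the experiments: the function names, the latent dimension `f`, the learning rate `lr`, the number of epochs, and the random initialization are all assumed values chosen for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_mf(pairs, n_consumers, n_products, f=8, lam=0.05, lr=0.05,
             epochs=200, seed=0):
    """Learn consumer vectors u and product vectors v by SGD on Eq. (10).

    pairs: the set I of (consumer index, product index) with y_cp = 1.
    """
    rng = random.Random(seed)
    u = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_consumers)]
    v = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_products)]
    for _ in range(epochs):
        for c, p in pairs:
            err = 1.0 - dot(v[p], u[c])  # y_cp = 1 for every pair in I
            for k in range(f):
                uc, vp = u[c][k], v[p][k]
                # Gradient step; the constant factor 2 is absorbed into lr.
                u[c][k] += lr * (err * vp - lam * uc)
                v[p][k] += lr * (err * uc - lam * vp)
    return u, v

def recommend(u, v, c, candidates, n):
    """Rank candidate products for consumer c by the estimated preference v_p^T u_c."""
    return sorted(candidates, key=lambda p: dot(v[p], u[c]), reverse=True)[:n]
```

After training, `recommend(u, v, c, candidates, n)` returns the `n` candidate products with the highest estimated preferences for consumer `c`.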

The second is the random walk method (RW), which is a widely used network-based recommendation method [39]. Similar to the proposed approach, the RW method models data as a network in which nodes represent consumers and products and edges represent their interactions. Then, the RW method generates recommendations based on the transition probabilities of a random walk that goes from target consumers to candidate products. Let \(X\) denote the adjacency matrix of the network and \(D\) the diagonal matrix of the node degrees in the network. Then, the transition matrix of the random walk can be calculated as \(T={D}^{-1}X\). Given the transition matrix \(T\), the RW method calculates the transition probability matrix as follows:

$$M={T}_{\alpha }^{3}$$
(11)

where \({T}_{\alpha }\) is the matrix obtained by raising each entry of \(T\) to the power of \(\alpha\), and \({T}_{\alpha }^{3}\) is the third matrix power of \({T}_{\alpha }\). The parameter \(\alpha\) is empirically optimized to achieve the best recommendation performance. The entry \({M}_{c,p}\) in \(M\) is the transition probability from consumer \(c\) to product \(p\) in the network. Given a target consumer, candidate products with the highest transition probabilities are recommended.
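A small sketch of Eq. (11) is given below, using plain Python lists for clarity. The node ordering (consumers first, then products), the function names, and the handling of isolated nodes (rows of zeros) are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rw_scores(adj, alpha=1.0):
    """Return M = (T_alpha)^3, where T = D^{-1} X and T_alpha is T with each
    entry raised to the power alpha.

    adj: symmetric adjacency matrix X over all consumer and product nodes.
    """
    t = []
    for row in adj:
        deg = sum(row)
        # Assumption: an isolated node gets an all-zero transition row.
        t.append([x / deg if deg else 0.0 for x in row])
    # Keep zero entries at zero even when alpha is 0.
    t_alpha = [[x ** alpha if x > 0 else 0.0 for x in row] for row in t]
    return matmul(matmul(t_alpha, t_alpha), t_alpha)

# Toy network: nodes 0-1 are consumers c0, c1; nodes 2-4 are products p0, p1, p2.
# Edges: c0-p0, c0-p1, c1-p1, c1-p2.
adj = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
m = rw_scores(adj, alpha=1.0)
```

In this toy network with \(\alpha = 1\), the only three-step walk from c0 to p2 is c0, p1, c1, p2, so `m[0][4]` equals 1/2 × 1/2 × 1/2 = 0.125.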

The third is the heterogeneous network embedding based recommendation method (HERec), which leverages meta paths to infer node embeddings and fuses the embeddings into an extended MF model for preference prediction [24]. Similar to the proposed approach, this benchmark uses the heterogeneous network and meta paths for recommendation. Specifically, the benchmark uses the meta paths to generate node sequences from the heterogeneous network for node embedding. This step yields consumer embedding \({e}_{c}\) and product embedding \({e}_{p}\), which belong to the same embedding space \({\mathbb{R}}^{d}\). Then, the benchmark integrates the embeddings into the extended MF model to estimate consumer preferences:

$$\widehat{{r}_{cp}}={x}_{c}^{T}\cdot {y}_{p}+\alpha \cdot {e}_{c}^{T}\cdot {\gamma }_{p}+\beta \cdot {\gamma }_{c}^{T}\cdot {e}_{p}$$
(12)

where \(\widehat{{r}_{cp}}\) denotes the estimated preference of consumer \(c\) for product \(p\), \({x}_{c}\in {\mathbb{R}}^{D}\) and \({y}_{p}\in {\mathbb{R}}^{D}\) are the consumer’s and product’s latent factors obtained by factorizing the consumer-product purchasing matrix, \({\gamma }_{c}\) and \({\gamma }_{p}\) are consumer-specific and product-specific latent factors for pairing with the embeddings; and \(\alpha\) and \(\beta\) are parameters for combining different parts in the equation.
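To make the structure of Eq. (12) concrete, the sketch below evaluates the three terms for given vectors. It covers only the prediction step, not how HERec learns the embeddings or latent factors, and the function and argument names are illustrative assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def herec_score(x_c, y_p, e_c, e_p, gamma_c, gamma_p, alpha, beta):
    """Estimated preference of consumer c for product p following Eq. (12)."""
    mf_term = dot(x_c, y_p)              # latent factors from the purchasing matrix
    consumer_term = dot(e_c, gamma_p)    # consumer embedding paired with gamma_p
    product_term = dot(gamma_c, e_p)     # product embedding paired with gamma_c
    return mf_term + alpha * consumer_term + beta * product_term
```

Setting `alpha` and `beta` to 0 reduces the score to the plain MF estimate, which shows how the two parameters control the contribution of the meta path-based embeddings.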

The fourth is a variant of the proposed approach, namely the heterogeneous network-based approach with equal weights for each selected meta path (HN-AVG). This approach infers the preference of a consumer for a product by calculating their average similarity based on all the selected meta paths. This study selects this benchmark to evaluate the effectiveness of the weight learning mechanism in the proposed approach.
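The difference between HN-AVG and a weighted combination can be sketched as follows. The per-path similarity scores are taken as given inputs here, and the function name, the dictionary layout, and the optional `weights` argument are hypothetical; the sketch does not reproduce the similarity measure or the weight learning step of the proposed approach.

```python
def combine_path_scores(path_scores, weights=None):
    """Combine per-meta-path similarity scores for one target consumer.

    path_scores: one dict per selected meta path, mapping product -> similarity.
    weights: one weight per meta path; None means equal weights (HN-AVG).
    """
    k = len(path_scores)
    if weights is None:
        weights = [1.0 / k] * k  # HN-AVG: every selected meta path counts equally
    combined = {}
    for w, scores in zip(weights, path_scores):
        for product, s in scores.items():
            combined[product] = combined.get(product, 0.0) + w * s
    return combined
```

With equal weights, a product's combined score is simply the mean of its similarities over the selected meta paths; passing learned weights instead changes how much each meta path contributes.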

5 Results and discussion

5.1 Recommendation performance of different meta paths

Figures 6, 7, and 8 report the recommendation performance of the proposed approach based on different meta paths of length 3. The results show that the meta paths contribute differently to recommendation precision, recall, and F-score. For example, the figures show that \({P}_{1}\), \({P}_{13}\), and \({P}_{16}\) are the three best-performing meta paths while \({P}_{7}\), \({P}_{10}\), and \({P}_{12}\) are the three worst-performing meta paths in all cases. The results indicate that composite behaviors differ in their importance to product recommendation. Among the four consumer behaviors, clicking and purchasing behaviors lead to better recommendation performance than the other two behaviors. The results also confirm the need to treat meta paths differently when recommending products based on multiple meta paths.

Fig. 6 Recommendation precision based on meta paths of length 3

Fig. 7 Recommendation recall based on meta paths of length 3

Fig. 8 Recommendation F-score based on meta paths of length 3

In the three figures, \(All\) and \(Selected\) indicate the heterogeneous network-based approach based on all meta paths and the selected meta paths in Table 2, respectively. The recommendation results based on all meta paths and the selected meta paths are much better than the performance based on every single meta path. This shows the effectiveness of the heterogeneous network-based approach for leveraging different consumer behaviors for product recommendation. Besides, the recommendation results based on all meta paths and the selected meta paths are very close. The unselected meta paths barely bring new information to the product recommendation. The proposed selection mechanism can select informative meta paths since the selected meta paths have performance comparable to that of all meta paths.

5.2 Robustness check

To further confirm the effectiveness of the heterogeneous network-based approach, this research increases the path length from 3 to 5 and conducts experiments similar to those in the previous subsection. Following the selection procedure introduced in Sect. 3.2.1, 19 out of 256 meta paths of length 5 with the form of \(Consumer\overset {R_{1} } \longleftrightarrow \cdots \overset {R_{{l - 1}} } \longleftrightarrow Consumer\overset {Purchase} \longleftrightarrow Product\) are selected by setting the informativeness threshold \(\theta\) to 0.01. The 19 selected meta paths of length 5 are listed in Table 3.
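The counts of 16 and 256 candidate meta paths follow from the path form: the last relation is fixed to purchasing, and each of the remaining \(l-1\) relations can be any of the four consumer behaviors, giving \(4^{l-1}\) combinations. The sketch below enumerates them; the behavior labels and the function name are illustrative assumptions.

```python
from itertools import product

# Illustrative labels for the four consumer behaviors considered in the study.
BEHAVIORS = ("Click", "Favorite", "Cart", "Purchase")

def enumerate_meta_paths(length):
    """List the relation sequences of all candidate meta paths of a given length.

    A path alternates Consumer and Product nodes, starts at a Consumer, and
    ends with Consumer -Purchase- Product, so the length must be odd and >= 3.
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be an odd number >= 3")
    free_relations = length - 1  # R_1 ... R_{l-1}; the last relation is fixed
    return [relations + ("Purchase",)
            for relations in product(BEHAVIORS, repeat=free_relations)]
```

Calling `enumerate_meta_paths(3)` yields 16 relation sequences and `enumerate_meta_paths(5)` yields 256, matching the numbers of candidate meta paths reported for the two lengths.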

Table 3 The 19 selected meta paths of length 5

Figures 9, 10, and 11 present the recommendation results of the heterogeneous network-based approach based on different meta paths of length 5. The results are consistent with the results when path length is set to 3. First, the composite behaviors contribute differently to product recommendation. Among the 19 selected meta paths, \({P}_{1}^{{\prime}}\), \({P}_{4}^{{\prime}}\), and \({P}_{18}^{{\prime}}\) have the best performance while \({P}_{2}^{{\prime}}\), \({P}_{8}^{{\prime}}\), and \({P}_{15}^{{\prime}}\) have the worst performance in all cases. The results confirm that clicking and purchasing behaviors lead to better recommendation performance than the behaviors of adding products to favorites and carts. The results further confirm the need to consider the heterogeneity of consumer behaviors in product recommendation. In addition, recommendation performance based on multiple meta paths is much better than the performance based on any single meta path. The result validates the effectiveness of using the heterogeneous network to capture the heterogeneity of the consumer behaviors for product recommendation.

Fig. 9 Recommendation precision based on meta paths of length 5

Fig. 10 Recommendation recall based on meta paths of length 5

Fig. 11 Recommendation F-score based on meta paths of length 5

Second, the figures confirm the effectiveness of the proposed selection mechanism. Top30 in the figures denotes the heterogeneous network-based approach based on the top 30 informative meta paths. Not all meta paths of length 5 are considered in the experiment because there are too many of them. Surprisingly, the heterogeneous network-based approach produces almost the same results when using the 19 selected meta paths and the top 30 informative meta paths. This means that the 19 selected meta paths cover most of the useful information for the product recommendation and the extra 11 meta paths provide little new information.

Besides, the results show that the heterogeneous network-based approach has better recommendation performance when the path length is set to 5 than when it is set to 3. This is because more information is incorporated for recommendation when the path length is set to 5. On the one hand, long meta paths can capture long-distance relations between consumers and products that are ignored by short meta paths. On the other hand, a greater length produces a larger number of possible meta paths and thus a larger number of informative meta paths. However, meta paths longer than 5 are not considered further since this study does not aim at finding the optimal path length for product recommendation. Moreover, meta paths that are too long could bring noise to the recommendation and increase computational cost.

5.3 Recommendation performance of different approaches

To demonstrate the superiority of the proposed heterogeneous network-based approach, this study compares it with four baseline methods, i.e., the MF, RW, HERec, and HN-AVG methods introduced in Sect. 4.3. Path length is set to 5 since length 5 leads to better performance than length 3. Figures 12, 13, and 14 present the recommendation performance of the baseline approaches and the proposed approach based on the selected meta paths of length 5. HN in the figures represents the proposed heterogeneous network-based approach. The figures show that the HERec, HN-AVG, and HN methods have larger precision, recall, and F-score than the MF and RW methods. Compared to the latter two methods, the former three methods take advantage of heterogeneous consumer behaviors based on the heterogeneous network and the selected meta paths. This validates the effectiveness of using the heterogeneous network to model the heterogeneity of the consumer behaviors for product recommendation. The results also show that the HN method has better performance than the HERec method. HERec was originally designed for predicting explicit user preferences (e.g., product ratings from 1 to 5), whereas consumer preferences in the current context are implicit (i.e., purchasing a product or not). This difference may explain the poorer performance of HERec. The HN method also outperforms the HN-AVG method, which validates the effectiveness of using the non-negative matrix factorization method to learn path weights for product recommendation.

Fig. 12 Precision of product recommendation methods

Fig. 13 Recall of product recommendation methods

Fig. 14 F-score of product recommendation methods

5.4 Discussion

The absolute values of recommendation precision, recall, and F-score are not high for the proposed method and the baseline methods. This is due to the sparsity of the collected data set. There are 1000 consumers, 15,755 products, and only 4557 purchasing records in the data set. On average, each consumer has purchased fewer than 5 of the 15,755 products in the data set. Among the 4557 purchasing records, 20% are randomly selected as the test set. On average, each consumer therefore has fewer than 1 purchased product in the test set. It is difficult to suggest the right product for a consumer from such a large pool of candidate products. For example, because at most one of the five recommended products can be correct for a consumer with a single purchased product in the test set, precision@5 would not exceed 0.20 even if all the right products were recommended to the 1000 consumers. This is the main reason why recommendation metrics have low values in the current study.
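The arithmetic behind this argument can be checked with a few lines of code. The sketch assumes that precision@n is averaged over all 1000 consumers and that the test set holds exactly 20% of the purchasing records; the variable and function names are illustrative.

```python
CONSUMERS = 1000
PRODUCTS = 15755
PURCHASES = 4557
TEST_FRACTION = 0.2  # one of the five folds

avg_purchases = PURCHASES / CONSUMERS                    # about 4.6 per consumer
avg_test_items = PURCHASES * TEST_FRACTION / CONSUMERS   # about 0.91 per consumer
density = PURCHASES / (CONSUMERS * PRODUCTS)             # share of observed pairs

def mean_precision_ceiling(avg_relevant, n):
    """Upper bound on precision@n averaged over all consumers.

    A consumer cannot have more hits than purchased test products, so the
    mean number of hits is at most avg_relevant, spread over n slots.
    """
    return min(avg_relevant / n, 1.0)

ceiling_at_5 = mean_precision_ceiling(avg_test_items, 5)  # about 0.18
```

Under these assumptions, the ceiling on average precision@5 is about 0.18, which is consistent with the bound of 0.20 stated above.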

This study has a few practical implications. The results show that the heterogeneous network-based approach outperforms the baseline methods in terms of precision, recall, and F-score. From the perspective of consumers, a more effective recommendation can lower their search costs when finding suitable products on electronic commerce platforms. From the platforms’ perspective, a more effective recommendation can improve consumers’ loyalty and increase the platforms’ revenue. The proposed approach leverages meta paths to represent the information considered in product recommendation and uses path weights to determine the extent to which the information is important to the recommendation. Platform managers can leverage this mechanism to explain why certain products are recommended. The results also show that clicking behavior, purchasing behavior, and their combinations lead to better recommendation performance than other consumer behaviors and other combinations. Therefore, platform managers can manage these two consumer behaviors more carefully to further enhance recommendation performance. For example, the managers can remove unintentional clicking records that do not represent consumers’ true preferences. In addition, the managers can remove purchasing records that end with cancellations or refunds because these records indicate that consumers do not like the purchased products.

6 Conclusion

With the rapid growth of online products, product recommendation has become the cornerstone of electronic commerce platforms. Consumers on the platforms perform various behaviors, which indicate their interests in a timely manner and can be used to recommend products. Although some studies have used the consumer behaviors for product recommendation, the heterogeneity and the roles of the behaviors in product recommendation have seldom been explored. The consumer behaviors convey different information about consumers’ interests and treating them equally may lead to ineffective recommendation results. This research addresses a new problem: capturing the heterogeneity of the consumer behaviors for product recommendation and revealing the importance of different behaviors in the recommendation. To achieve this goal, this research proposes a heterogeneous network-based recommendation approach that integrates heterogeneous consumer behaviors in a systematic way and uses composite behaviors to infer consumer preferences for products. Experiments based on a real-world data set demonstrate that the proposed approach outperforms the baseline methods in terms of precision, recall, and F-score.

Two directions for future work are identified. Firstly, this research used only behavioral information for product recommendation and ignored the product-related content. Future research will use text analysis techniques to extract information (e.g., topics and sentiment) from product-related content and merge it with behavioral information in the heterogeneous network to enhance product recommendation. Secondly, the meta paths convey semantic information and can be used to explain why certain products are recommended. Future research will explore explainable product recommendation based on the heterogeneous network.