Keywords

1 Introduction

Market basket analysis aims to discover meaningful patterns from massive users’ transaction data [3]. It helps retailers analyze sale trends, optimize commodity placement, and comprehend users’ preferences. Especially with the prevalence of mobile applications and online e-commerce systems, market basket analysis becomes even more important in stimulating the consumptions and enlarging the selling profits, by providing the key technologies for personalized next-basket recommendation.

Generally, existing methods on basket recommendation can be summarized into two paradigms. One is the item-centric paradigm, which explores the sequential transaction data by predicting the next purchase based on the last actions. A major advantage of this model is its ability to capture sequential behavior for good recommendations, e.g. for a user who has recently bought a mobile phone, it may recommend accessories that other users have bought after buying that phone. A number of approaches have been proposed to mine the meaningful sequential patterns from users’ transactional data [11, 15]. However, the directly mined sequential patterns are usually very sparse and hard to cover many long-tailed items and users.

The other is the user-centric paradigm, with the key idea as that “one is likely to buy the items favored by similar users”. One of the most successful methods in this paradigm is the model based collaborative filtering [1, 6, 9, 17]. The typical way is to represent users’ historical purchase behaviors as a user-item matrix by discarding transaction information, and apply matrix factorization for recommendation. Obviously, these methods are good at modeling users’ general interests, but hard to capture the sequential purchase behaviors of users which is often crucial for the basket recommendation.

A better solution for basket recommendation, therefore, is to take both item-centric and user-centric paradigms into consideration. For this purpose, we propose a hybrid method in this paper, namely Co-Factorization model over Sequential and Historical purchase data (CFSH for short). Specifically, on one hand, we apply sequential pattern mining methods to the massive transaction data, and aggregate all the mined patterns to form a item-item matrix. On the other hand, we construct a user-item matrix based on users’ whole historical purchase data. These two matrices are then simultaneously factorized to learn the low-dimensional representations of both users and items. With the learned representations, we provide personalized basket recommendation based on both sequential behaviors and users’ general interests.

Compared with existing basket recommendation methods, our approach has the following advantages:

  • By mining and factorizing global sequential patterns, our approach can avoid the data sparseness problem in traditional item-centric methods;

  • By factorizing the item-item matrix and user-item matrix simultaneously, our approach can exploit both sequential and historical behaviors to learn better representations of both users and items;

  • By adopting a hybrid recommendation approach, our model enjoys the merits of both item-centric and user-centric paradigms, and thus achieves better performances on basket recommendation.

We conducted empirical experiments over three real-world purchase datasets: two from retailers and one from the online e-commerce website. Both the existing item-centric and user-centric methods have been taken into comparison. The results demonstrate that the proposed CFSH method can perform significantly better than the baseline methods on basket recommendation task.

2 Related Work

In this section, we briefly review the related work on basket recommendation from the following two aspects, i.e. item-centric models and user-centric models.

2.1 Item-Centric Models

The key idea lies in item-centric models is the phenomenon observed in users’ purchase behavior that “buying one item leads to buying another next”. Therefore, item-centric models, mostly relying on Markov chains, explore the sequential transaction data by predicting the next purchase based on the last actions. For example, in early work, different data mining methods including ApriorALL, SPADE have been designed for mining frequent sequential patterns among items. Zimdar et al. [2] investigate how to extract sequential patterns to learn the next state using probablistic decision-tree models. Mobasher et al. [16] study different sequential patterns for recommendation and find that contiguous sequential patterns are more suitable for sequential prediction task than general sequential patterns. However, the directly mined sequential patterns are usually too sparse with respect to the size of users and items. One way to tackle the data sparseness problem is to learn latent models over the sequential patterns.

2.2 User-Centric Models

User-centric models, in contrast, does not take sequential behavior into account but make recommendation based on users’ whole purchase history. The key idea is collaborative filtering (CF) which can be further categorized into memory-based CF and model-based CF [19]. The memory-based CF provides recommendations by finding k-nearest-neighbor of users or items based on certain similarity measure [15]. While the model-based CF tries to factorize the user-item correlation matrix for recommendation.

For example, Lee et al. [9] treat the market basket data as a binary user-item matrix, and apply a binary logistic regression model based on principal component analysis (PCA) for recommendation. Hu et al. [8] conduct the factorization on user-item pairs with least-square optimization and use pair confidence to control the importance of observations. Rendle et al. [18] propose a different optimization criterion, namely Bayesian personalized ranking, which directly optimizes for correctly ranking over item pairs instead of scoring single items. They apply this method to matrix factorization and adaptive KNN to show its effectiveness. General speaking, these models are good at capturing users’ general taste, but can hardly adapt its recommendations directly to users’ recent purchases without modeling sequential behavior.

3 Our Framework

In this paper, we propose to factorize both users’ sequential and historical purchase data for basket recommendation. By adopting such a hybrid recommendation method, we can enjoy the advantages of both item-centric and user-centric paradigms. In this section, we will present the proposed method, namely the Co-Factorization model over Sequential and Historical purchase data (CFSH for short) in detail. Specifically, we first introduce the notations used in this work. Then we describe how to factorize the sequential and historical purchase data for basket recommendation respectively. The hybrid model CFSH is then presented based on the above two recommendation paradigms. Finally, we present the optimization algorithm for the proposed hybrid recommendation model.

3.1 Notations

Let \(U=\{u_1,u_2,\ldots ,u_{|U|}\}\) be a set of users and \(I=\{i_1,i_2,\ldots , i_{|I|}\}\) a set of items, where |U| and |I| denote the total number of unique users and items respectively. For each user \(u\in U\), a purchase history \(T^{u}\) of his transactions is given: \(T^{u}:=(T^{u}_1,T^{u}_2,\ldots ,T^{u}_{t_{u}-1} )\), where \(T^u_t \subseteq I\), \(t\in [1,t_u-1]\). The purchase history of all users is denoted as \(T:=\{T^{u_1},T^{u_2},\ldots ,T^{u_{|U|}} \}\). More formally, let \(V^U=\{\varvec{v}^U_u\in R^n|u\in U\}\) denote all the user vectors and \(V^I=\{\varvec{v}^I_i\in R^n|i\in I\}\) all the item vectors. Given each user‘s purchase history, the task is to recommend items that user u would probably buy at the next (i.e. \(t_{u}\)-th) visit.

3.2 Factorizing Sequential Purchase Data

The item-centric models explore the sequential patterns in transaction data and provide basket recommendation based on users’ last actions. A severe problem of existing methods in this paradigm is the sparseness of sequential patterns, which is too sparse to generalize well for long-tail users and items. In our work, therefore, we propose to mine the sequential patterns in a global way, and factorize the mined patterns to learn better low dimensional representations of items for recommendation. Here we first give the definition of sequential patterns in transactional data.

Definition 1

Sequential Pattern. Given the transaction set \(T^{u}:=(T^{u}_1,\ldots ,T^{u}_{t_{u}-1} )\) of user u, Sequential Pattern is defined as a weighted pair of items \(<i_k,i_{k'},s^{u}_{k,k'}>\), where \(i_k\in T_p^{u}\), \(i_{k'}\in T_{q}^{u}\), \(p< q\), and \(s^{u}_{k,k'}\) denotes the support of the sequential pattern.

If we restrict that the patterns are mined from consecutive transactions of each user, then the obtained patterns are called as Contiguous Sequential Pattern (CSP for short) [4]. Otherwise, they are called as Non-Contiguous Sequential Patterns (NCSP for short). Existing work finds that CSPs are more suitable for sequential prediction task. Figure 1 shows an example, where six CSPs can be mined from a user with three transactions. The mined CSPs can be represented as a matrix \(S^{u}\), where the entry \(S^{u}_{k, k'}\) corresponds to the pattern \(<i_k,i_{k'},s^u_{k,k'}>\), and the question mark denotes a missing pattern. Obviously, the CSPs capture the local dependency between user’s purchase behaviors. For example, a user would probably buy a sim card in the next transaction if she bought a phone in the previous transaction. A higher support of the pattern indicates higher possibility that sequential behavior will take place.

Fig. 1.
figure 1

Contiguous sequential patterns mined from a single user.

We conduct the above mining procedure over each user’s transaction data and aggregate all the mined CSPs into one global matrix \(S=\sum _uS^u\). This matrix captures all the globally salient sequential patterns, and is more dense and more robust to noise than any single user’s matrix (i.e. \(S^u\)). We then factorize S to learn the low dimensional representations of items, by assuming that the support of the sequential pattern can be recovered by the representations of the two items within the pattern. The objective function of our matrix factorization is shown as follows:

$$\begin{aligned} \begin{aligned}&\displaystyle \mathop {\mathrm {min}}\{ \quad \Vert S-V^IV^{I^T}\Vert ^2+\lambda \Vert V^I\Vert ^2\}\\&\text{ s.t. }\quad \begin{array}{lc} V^I \ge 0\\ \end{array}. \end{aligned} \end{aligned}$$
(1)

where \(\lambda \) is the regularization coefficient.

With the learned low dimensional representations of items, we can then provide personalized basket recommendation given the user’s last transaction. Specifically, given the user u’s last transaction \(T^u_{t_u-1}\), we calculate the probability of item \(i_l\) to appear in the next transaction as the aggregated support for the item

$$\begin{aligned} \begin{aligned} P(i_l\in T^u_{t_u}|T^u_{t_u-1})\propto agg\_supp_u(i_l)=\sum _{i_k \in T^{u}_{t_u-1}} \varvec{v}^I_k\cdot \varvec{v}^I_l \end{aligned} \end{aligned}$$
(2)

We then sort the items according to the probability and obtain the top-K items for recommendation.

3.3 Factorizing Historical Purchase Data

For the user-centric paradigm, we follow the previous practice and apply model based collaborative filtering. Specifically, we first construct the user-item matrix H based on users’ whole historical purchase data, as shown in Fig. 2. Note here all the sequential information has been discard. The entry \(H_{u,k}\) denote the purchase count of item \(i_k\) for user u. The higher the count is, the more likely the user will buy the item.

Fig. 2.
figure 2

Customer-item matrix mined from transactions. Users’ purchase count to a certain item indicates users’ general interest to it. Elements with ? indicate unobserved purchase count.

We then factorize the user-item matrix H to learn the low dimensional representations of users and items with the following objective function.

$$\begin{aligned} \begin{aligned}&\displaystyle \mathop {\mathrm {min}} \{\parallel H-V^UV^{I^T}\parallel ^2 +\lambda _1\parallel V^U\parallel ^2+\lambda _2\parallel V^I\parallel ^2\}\\&\text{ s.t. }\quad \left\{ \begin{array}{lc} V^U \ge 0\\ V^I \ge 0\\ \end{array}\right. \end{aligned} \end{aligned}$$
(3)

where \(\lambda \) is the regularization coefficient With the learned representations of users and items, we calculate the probability of item \(i_l\) to appear in user u’s next transaction as the preference of the user on this item:

$$\begin{aligned} \begin{aligned} P(i_l\in T^u_{t_{u}}|u)\propto pref_{u}(i_l)=\varvec{v}^U_{u}\cdot \varvec{v}^I_{l} \end{aligned} \end{aligned}$$
(4)

We then sort the items according to the probability and obtain the top-K items for recommendation.

3.4 Hybrid Method

Now we have described how to provide personalized basket recommendation in two ways, i.e. one explores the correlation between items and the other relies on the correlation between users and items. The former can well capture the sequential behaviors of users while the latter can model users’ general interests. Our idea is then to combine the two ways so that we can enjoy the powers of both paradigms and meanwhile complement each other to achieve better performances.

Specifically, we propose to simultaneously factorize the item-item matrix and the user-item matrix, by sharing the same low dimensional representations of items. The objective function of our hybrid method is as follows:

$$\begin{aligned} \begin{aligned}&\displaystyle \mathop {\mathrm {min}} \{\alpha \parallel H-V^UV^{I^T}\parallel ^2 +(1-\alpha )\parallel S-V^IV^{I^T}\parallel ^2+ \lambda _1\parallel V^U\parallel ^2+\lambda _2\parallel V^I\parallel ^2\}\\&\text{ s.t. }\quad \left\{ \begin{array}{lc} V^U \ge 0\\ V^I \ge 0\\ \end{array}\right. \end{aligned} \end{aligned}$$
(5)

where \(\alpha \in [0,1]\) is the coefficient which balance the importance of the two paradigms, and \(\lambda _1\) and \(\lambda _2\) denote the regularization coefficients.

With the learned low dimensional representations of users and items, we provide personalized basket recommendation also in a hybrid way. Specifically, given user u’s last transaction \(T^u_{t_{u}-1}\), we calculate the probability of item \(i_l\) to appear in the next transaction as follows

$$\begin{aligned} \begin{aligned} P(i_l\in T^u_{t_u}|T^u_{t_u-1})=\alpha \sum _{i_k \in T^{u}_{t_u-1}} \varvec{v}^I_k\cdot \varvec{v}^I_l +(1-\alpha )\varvec{v}^U_{u}\cdot \varvec{v}^I_{l} \end{aligned} \end{aligned}$$
(6)

We then sort the items according to the probability and obtain the top-K items for recommendation.

3.5 Optimization of the Hybrid Objective

To optimize the hybrid objective mentioned above, we find there is no closed-form solution for Eq. 5. Therefore, we apply an alternative minimization algorithm to approximate the optimal result [5]. The basic idea of this algorithm is to optimize the loss function with respect to one parameter, with all the other parameters fixed. The algorithm keeps iterating until convergence or the maximum of iterations. First we fix \(V^I\), and calculate \(V^U\) to minimize Eq. 5. The updating algorithm is shown in Algorithm 1.

figure a

4 Experiment

4.1 Data Description

We evaluate different recommendation methods based on three real-world purchase datasets, i.e. two retail datasets namely Ta-Feng and BeiRen, and one e-commerce dataset namely T-Mall.

  • The Ta-FengFootnote 1 dataset is a public dataset released by RecSys conference, which covers items from food, office supplies to furniture.

  • The BeiRen dataset comes from BeiGuoRenBaiFootnote 2, a large retail enterprise in China, which records its supermarket purchase history during the period from Jan. 2013 to Sept. 2013.

  • The T-MallFootnote 3 dataset is a public online e-commerce dataset released by TaobaoFootnote 4, which records the online transactions in terms of brands.

We first conducted some pre-processes on these purchase datasets. For both Ta-Feng and BeiRen dataset, we remove all the items bought by less than 10 users and users that has bought in total less than 10 items. For the T-Mall dataset, which is relatively smaller, we remove all the items bought by less than 3 users and users that has bought in total less than 3 items. The statistics of the three datasets after pre-processing are shown in Table 1. Finally, we split all the datasets into two non overlapping set, one used for training and the other for testing. The testing set contains only the last transaction of each user, while all the remaining transactions are put into the training set.

Table 1. Statistics of the datasets used in our experiments.

4.2 Baseline Methods

We evaluate our hybrid model by comparing with several existing methods on basket recommendation:

  • TOP: The top popular items in training set are taken as recommendations for each user.

  • MC: An item-centric model which mines the global CSPs from purchase data, and predict the basket based on the last transaction of the user.

  • FMC: A factorized MC model, which factorizes the global CSP matrix to learn the low dimension representations of items, and predict the basket based on the last transaction of the user.

  • NMF: A state-of-the-art user-centric model based on collaborative filtering [13]. Here Nonnegative Matrix Factorization is applied over the user-item matrix, which is constructed from the transaction dataset by discarding the sequential information. Obviously, this model can be viewed as a sub-model of our approach, since it is exactly the same as the user-centric model described in Sect. 3.3. For implementation, we adopt the publicly available codes from NMF:DTU ToolboxFootnote 5.

For all the latent models, including NMF, FMC and CFSH, we run several times with random initialization by setting the dimensionality \(d\in \{50, 100, 150, 200\}\) on Ta-Feng and BeiRen datasets, and \(d\in \{10,15,20,25\}\) on T-Mall dataset. We set all the regularization parameters to 0.01, and parameter \(\alpha \) used in CFSH equals to 0.6. We repeat these experiments ten times, and compare the average performances of different methods in the following sections.

4.3 Evaluation Metrics

The performance is evaluated for each user u on the transaction \(T^u_{t_u}\) in the testing dataset. For each recommendation method, we generate a list of K items (K=5) for each user u. We use the following quality measures to evaluate the recommendation lists against the actual bought items.

  • F1-score: F1-score is the harmonic mean of precision and recall, which is a widely used measure in recommendation [7, 12, 14].

  • Hit-Ratio: Hit-Ratio is a All-but-One measure used in recommendation [10]. If there is at least one item in the test transaction also appears in the recommendation list, we call it a hit. Hit-Ratio focuses on the recall of a recommender system, i.e. how many people can obtain at least one correct recommendation.

4.4 Performance on Basket Prediction

In this section we compare our hybrid model to state-of-the-art methods in basket recommendation.

Fig. 3.
figure 3

Performance comparison on three datasets.

Figure 3 shows the results on Ta-Feng, BeiRen, and T-Mall respectively. We have the following observations from the results: (1) Overall, the Top method is the weakest. However, we find that the Top method outperforms MC on the T-Mall dataset. By analyzing the dataset, we found that the filling rate of the CSP matrix of the T-Mall dataset is around \(3.7\,\%\), which is much lower than that of the Ta-Feng dataset (\(11.8\,\%\)) and BeiRen dataset (\(15.2\,\%\)). It indicates that the CSPs mined from the T-mall dataset are too sparse for MC to generate reasonable recommendations for users. (2) By either factorizing global sequential patterns, FMC can obtain better results over the MC method. Specifically, we can see obvious improvement in terms of Hit-Ratio, which indicates the factorized methods can cover more users than the original MC method. The improvement becomes larger on the T-Mall dataset which is much more sparse when only considering directly mined CSPs. (3) By factorizing the historical purchase data, the NMF method also outperforms the MC method in most cases, and its performance is between the two factorized item-centric methods. (4) By combining both item-centric and user-centric paradigms, the proposed CFSH method performs best on all the three datasets. Take Ta-Feng dataset as an example, when compared with second best performed baseline method, the relative performance improvement by CFSH is around 16.2 %, 9.8 %, 21.1 % and 16.5 % in terms of Precision, Recall, F1-Score, and Hit-Ratio respectively. All the improvement are statistically significant (p-value < 0.01). The results demonstrate that by factorizing global sequential patterns and historical purchase data simultaneously, we can learn better representations of users and items and thus obtain improved performance on basket recommendation.

To further investigate the performance of different methods, we split the users into three groups (i.e. inactive, medium and active) based on their activeness and conducted the comparisons on different user groups. Take the Ta-Feng dataset as an example, a user is taken as inactive if there are less than 5 transactions in his/her purchase history, and active if there are more than 20 transactions in the purchase history. The remaining users are taken as medium. In this way, the proportions of inactive, medium and active are \(40.8\,\%\), \(54.5\,\%\), and \(4.7\,\%\) respectively. Here we only report the comparison results on Ta-Feng dataset with the dimension equals 50 due to the page limitation. In fact, similar conclusions can be drawn from other datasets. The results are shown in Table 2.

From the results we can see that, not surprisingly, the Top method is still the worst on all the groups. Furthermore, we find that MC, FMC works better than NMF on both inactive and medium users in terms of all the measures; While on active users, NMF can achieve better performance than MC, FMC. The results indicate that it is difficult for NMF to learn a good user representation with few transactions for recommendation. Finally, CFSH can achieve the best performances on all the groups in terms of all the measures. The results demonstrate that by combining both item-centric and user-centric paradigms, we can enjoy the merits of both methods and complement each other to achieve better performance.

Table 2. Performance comparison of different methods on Ta-Feng over different user groups with dimensionality set as 50.

5 Conclusions

In this paper, we proposed a new hybrid recommendation method, namely CFSH, for basket recommendation based on massive transactional data. The major purpose is to leverage the power of both item-centric and user-centric recommendation paradigms in capturing correlations between items and users for better recommendation. By conducting experiments on three real world purchase datasets, we demonstrated that our approach can produce significantly better prediction results than the state-of-the-art baseline methods.

In the future work, we would like to explore other context information, e.g. time and location, for better basket recommendation. Obviously, people’s shopping behavior may be largely affected by these factors. It would be interesting to investigate how these types of information can be integrated into our proposed hybrid method.