1 Introduction

With the wide adoption of mobile devices and continuous technological development, mobile apps have become part of our daily lives. In enterprise marketing, mobile marketing has become prevalent. As statistical data [1] shows, mobile marketing is a top priority for technical users because of its high ROI: 71% of salespeople believe that mobile marketing is core to their business; 58% of the surveyed companies have a specialized mobile marketing team, believing that a team of mobile marketing experts maximizes the return on marketing efforts; and 83% of B2B salespeople say that mobile apps are important to marketing and plan to spend more on developing their own marketing apps.

On one hand, people enjoy a colorful and convenient life with abundant items and information online; on the other hand, they feel lost in such an online world, finding it hard to locate the items they are really interested in quickly and effectively. Recommender systems are therefore proposed to predict users' preferences for items. They can be roughly divided into three classes: content-based [2, 3], collaborative filtering [4,5,6,7,8,9] and context-aware approaches [10,11,12,13,14,15]. Content-based filtering approaches generate recommendations by comparing item descriptions with users' profiles. Collaborative filtering approaches predict a user's preference from that of similar users by mining large numbers of historical records. Context-aware approaches consider the effect of context factors, such as the user's profile (gender, age, profession, etc.) and natural situation (location, time, etc.), on users' preferences and demands and on the selection and definition of neighbors.

Market researchers have long been aware of the correlation between temporal factors and users' behaviors and have begun to exploit it in recommender systems. Such correlation takes various forms. In this paper, in the context of marketing apps for maternal and child products, we mainly consider two kinds. First, fast-moving consumer goods, such as maternal vitamins, diapers, milk powder and body shampoo, are bought with a certain temporal frequency; we therefore aim to mine the periodic regularity with which a user purchases an item or a category of items, so that for items the user has bought before, we can predict when he will purchase them again. Second, the demands of the parental population evolve with a certain temporal regularity: as Figure 1 shows, a pregnant woman may buy folic acid and maternity dresses; after the baby is born, she may buy milk powder and diapers; later, as the baby grows up, she may buy toys and books. We therefore aim to mine such temporal evolution regularity from the purchases of several kinds of items by a category of similar users, estimate a user's current life stage, and then recommend new items appropriate for that life stage.

In this paper, we propose a recommender system that mines consuming behaviors with temporal evolution. We consider two kinds of temporal factors in consuming behavior: periodic purchase regularity and demand evolution based on life stage. For mining periodic purchase regularity, time series data is first generated by arranging the time intervals between every two adjacent records of a user purchasing an item or a kind of items in chronological order. A time window then cuts the time series into multiple feature vectors that form the training dataset. Next, a KNN [16] method reduces the training set by selecting the top k feature vectors most similar to the input feature vector. Finally, we predict the time interval before the user's next purchase with an SVR [17] model. For life stage based recommendation, the system first mines a mapping model from items to users' life stages based on a multiple-outlier detection approach. Next, based on this model, a user's current life stage is estimated from his recent behaviors by a dynamic weight allocation algorithm. Finally, new items are recommended to users based on the UBN (un-weighted bipartite networks) model [18] and a Bayesian model [19]. Extensive experiments are conducted on offline data in the mum-baby domain provided by the TIANCHI platform, and the experimental results show the effectiveness of our proposed recommender system.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the problem definition and an approach overview. Section 4 presents recommendation based on periodic purchase regularity. Section 5 illustrates life stage based recommendation. Section 6 analyzes the experimental results. Finally, Section 7 gives conclusions and future work.

2 Related work

This section presents a brief review of three topics relevant to our study: traditional collaborative filtering, recommendations based on sequential behaviors, and time series prediction.

2.1 Traditional collaborative filtering

Collaborative Filtering (CF) is currently the most popular information filtering technology in recommender systems; it generates personalized recommendations from users' historical data. In recent years, with the rapid growth of e-commerce, CF has become a research hotspot, and many large e-commerce sites use it as an essential tool in their recommender systems, for example recommending books at Amazon [20], news at Google [21], movies at Yahoo [22], and CDs at Netflix [23]. Traditional collaborative filtering methods analyze a great deal of information on users' behaviors, activities or preferences and predict what users will like based on their similarity to other users. Because collaborative filtering relies only on user-interaction information and requires neither item content nor user profiles, it is widely used in recommender systems and studied by many researchers. These methods filter or evaluate items through the opinions of other users [24], based on the hypothesis that a user will prefer the items that similar users preferred in the past [25].

In the literature, collaborative filtering methods are mainly classified into three types: model-based, memory-based and hybrid. First, model-based methods generate recommendations by training a model; algorithms of this type include matrix factorization [26, 27] and graph-based approaches [28]. Second, most memory-based methods first obtain a set of neighbor users for a particular user according to all the items the users previously rated; recommendations for that user are then generated from the items the neighbor users like. These are known as user-oriented memory-based methods. An analogous procedure that builds item similarity groups from purchasing history is known as item-oriented memory-based collaborative filtering [29]. Finally, hybrid methods combine two or more types of collaborative filtering techniques, which helps address the sparsity problem because external information can be used to make predictions for new users or new items. In 2004, Zhou et al. [30] proposed a hybrid collaborative filtering approach that exploits bulk taxonomic information designed for exact product classification to address the sparsity problem, based on the generation of profiles via inference of super-topic scores and topic diversification. In 2002, Schein et al. [31] proposed the aspect model latent variable method for cold-start recommendation, which combines both collaborative and content information in model fitting.

2.2 Recommendations based on sequential behaviors

In recent years, several studies have highlighted the effect of time in recommender systems [32,33,34,35,36,37,38,39]. In [32], Koren et al. take time information into account and propose the timeSVD++ algorithm to address the interest drift problem. In [33], Khoshneshin et al. dynamically assign users (and products) to different clusters based on evolutionary co-clustering in preparation for recommendation. In [34], it is shown that a user's historical ratings focus on only one or more aspects of user interest spanning a particular time period; based on this finding, Li et al. propose a cross-domain CF framework that can track a user's interest drift and recommend effectively. In [35], Ren et al. show that user preference patterns and preference dynamics have barely been exploited in existing recommender systems; they formalize user preferences as a sparse matrix and use subspaces to iteratively model personalized and global preference patterns. In [36], Xiang et al. propose a session-based temporal graph to simultaneously model users' long-term and short-term preferences over time. In [37], Rendle et al. present a tensor factorization model that brings matrix factorization and Markov chains together for next-basket recommendation. In [38], Wang et al. adapt the proportional hazards modeling approach from survival analysis and explicitly incorporate time to recommend a particular product at a particular time. In [39], Jiang et al. propose a new maximum entropy semi-Markov model to segment and label consumer life stages from purchasing data observed over time.

2.3 Time series prediction

Time series prediction is gaining popularity as a means of studying the regularity with which data changes over time. Generally, there are two modeling types for time series prediction: global modeling and local modeling. The former constructs a model independent of user queries, while the latter constructs a model for each different query. Both Wu et al. [40] and Martínez-Rego et al. [41] showed through multiple experiments that local modeling improves estimation performance for time series prediction. For many real-world applications of time series prediction, both Sapankevych et al. [42] and Thissen et al. [43] demonstrate, by comparing predictive ability, that the SVM outperforms the autoregressive moving average (ARMA) model and in most cases the best of several Elman neural networks. Huang et al. [44] proposed a k-nearest-neighbor-based least squares support vector machine (LS-SVM) framework: by selecting similar instances (i.e., nearest neighbors) in the training dataset for each testing instance, the complexity of training an LS-SVM regressor is reduced significantly and the prediction ability of the SVM is improved. In time series prediction, even if the one-step prediction model is very accurate, iterating one-step-ahead predictions accumulates errors, which results in poor prediction performance. Zhang et al. [45] deal with the iterated time series prediction problem by using multiple SVR models trained independently on the same training data with different targets; in other words, the n-th SVR model performs an n-step-ahead prediction.

Different from the above studies, this paper proposes a new e-commerce recommender system based on mining consuming behavior with temporal evolution, namely periodic purchase regularity and demand evolution based on life stage. We investigated life stage based recommendation in [46]; here we extend the correlation between temporal evolution factors and users' consuming behavior with one more kind, i.e., periodic purchase regularity, and use the proposed recommender system to recommend items to users.

3 Problem definition and approach overview

We present the problem definition and an overview of our proposed e-commerce recommender system in this section.

3.1 Problem definition

The notations used in the rest of this paper are:

  1. U: the set of users.

  2. I: the set of items.

  3. \( x_{u,i,t}=<u,i,t> \): a 3-tuple denoting a purchase behavior, where u is the user, i is the purchased item and t is the purchase time.

  4. \( {x}_{u,i}(n)={x}_{u,i,{t}_{n+1}}-{x}_{u,i,{t}_n}=<u,i,{t}_{n+1}>-<u,i,{t}_n> \): the time interval between two adjacent records of user u purchasing item i, measured in days (see the sketch after this list).

  5. \( s_{u,t} \): the life stage of user u at time t.

  6. Periodic purchase regularity: an obvious or latent habit with which a user repeatedly purchases an item.

  7. Life stages: a predefined set of phases in life, each spanning a period of time.
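As a concrete illustration of notations 3 and 4, the following minimal sketch derives the interval series from raw purchase records; the records and field names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records: (user, item, purchase date).
records = [
    ("u1", "i1", date(2015, 9, 18)),
    ("u1", "i1", date(2015, 11, 15)),
    ("u1", "i1", date(2016, 1, 18)),
]

def interval_series(records):
    """Group records by (user, item) and diff adjacent purchase dates,
    yielding the intervals x_{u,i}(n) in days."""
    by_pair = defaultdict(list)
    for user, item, day in records:
        by_pair[(user, item)].append(day)
    series = {}
    for pair, days in by_pair.items():
        days.sort()
        series[pair] = [(b - a).days for a, b in zip(days, days[1:])]
    return series

print(interval_series(records))  # {('u1', 'i1'): [58, 64]}
```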

The main problems addressed in this paper are:

  1. Given the intervals \( \{x_{u,i}(1), \dots, x_{u,i}(n)\} \) of user u purchasing item i, mine the periodic regularity of user u purchasing item i, so as to predict \( x_{u,i}(n+1) \), i.e., the time interval before the next purchase.

  2. Given all the records of user u buying various items at various times, predict the current life stage of user u from a set of \( s_{u,t} \), and then recommend to user u items appropriate for his current life stage.

3.2 Approach overview

Figure 2 shows the architecture of our proposed recommender system based on mining consuming behaviors with temporal evolution, comprising recommendation based on periodic regularity mining and recommendation based on the user's life stage.

Fig. 1 An example in the mum-baby domain

The solution to the first problem comprises three main steps. First, to obtain the training feature vectors and the input feature vector, we divide each sample into multiple subsequences with a fixed-size time window. Second, KNN selects the final training data from the candidate feature vectors. Finally, an SVR model is trained on this data and used to predict the time interval before the next purchase; we can then recommend that the user buy the item again after the predicted interval.

The solution to the second problem also comprises three main steps. First, we convert users' age information into the corresponding expert-defined life stages, and then count and analyze all purchase behaviors of users with age information to obtain a mapping model that gives the probability of each life stage given a purchase behavior. Second, based on the mapping model, we propose an algorithm to predict the current life stage of a user whose purchase records carry no age information. Finally, we recommend to users new items appropriate for their life stage.

4 Recommendation based on periodic regularity mining

In this section, we consider recommending to users the items that exhibit periodic purchase regularity.

We first calculate the time intervals between every two adjacent records of a user u purchasing an item or a kind of items i, i.e., \( \{x_{u,i}(1), \dots, x_{u,i}(t)\} \). We can treat the set of time intervals as an array and plot it. For example, Fig. 3 shows 4 sets of time intervals for different <u, i> tuples, where the x-axis denotes the array index and the y-axis denotes each time interval. The upper two plots of Fig. 3 show a straight line and a repeating wave respectively, from which we can easily conclude that there is obvious periodic regularity; in the lower two plots, however, no obvious periodic regularity is visible.

In our dataset, statistical analysis shows periodic regularity in most purchasing records. To judge whether a set of time intervals exhibits periodic regularity, we transform the question into a time series prediction problem. Time series analysis, especially time series prediction, has recently gained much attention, and many methods have been applied to obtain nonlinear, dynamic models, such as SVR and k-nearest-neighbor regression (KNNR). In this paper, we choose KNN + SVR after comparing the two methods experimentally.

Figure 4 shows the solution to the first problem. The goal is to obtain the value at time t + 1, namely \( x_{u,i}(t+1) \).

4.1 Time window based data grouping

Consider a time series \( \{x_{u,i}(1), \dots, x_{u,i}(l)\} \), where l is the length of the series. As Fig. 5 shows, an SVR model that depends on the past d points has the form

$$ {x}_{u,i}\left(l+1\right)=f\left({x}_{u,i}(l),{x}_{u,i}\left(l-1\right),\dots {x}_{u,i}\left(l-d+1\right)\right), $$
(1)

where d is the size of the time window.

For a time series prediction problem, the time series \( \{x_{u,i}(1), \dots, x_{u,i}(l)\} \) must be divided into samples that an SVR model can accept. Thus let \( \mathbf{x}_{j-d+1} = [x_{u,i}(j), x_{u,i}(j-1), \dots, x_{u,i}(j-d+1)] \) and \( y_{j-d+1} = x_{u,i}(j+1) \), where d ≤ j ≤ l − 1. Each time the time window moves one step forward, we obtain one piece of training data \( (\mathbf{x}_{j-d+1}, y_{j-d+1}) \), and finally we obtain the training set \( {\bigcup}_{j=d}^{l-1}\left\{{\mathbf{x}}_{j-d+1},{y}_{j-d+1}\right\} \). Moreover, to obtain \( y_{l-d+1} = x_{u,i}(l+1) \), the input testing feature vector is \( \mathbf{x}_{l-d+1} \). The value of d influences the final results, and we determine the optimal d experimentally.
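The grouping step might look as follows; this is a minimal sketch in which the helper name window_samples is ours, and the window content is kept in chronological order, matching the worked example in Section 4.2.

```python
import numpy as np

def window_samples(series, d):
    """Slide a window of size d over an interval series and emit
    (feature vector, next interval) training pairs (Section 4.1)."""
    X = [series[j - d:j] for j in range(d, len(series))]  # d consecutive intervals
    y = [series[j] for j in range(d, len(series))]        # the interval that follows
    return np.array(X, dtype=float), np.array(y, dtype=float)

series = [58, 51, 126, 2, 51, 65]
X, y = window_samples(series, d=3)
# X = [[58, 51, 126], [51, 126, 2], [126, 2, 51]], y = [2, 51, 65]
x_query = np.array(series[-3:], dtype=float)  # input vector for predicting x(l + 1)
```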

4.2 KNN based group selection

After the previous step, the training set contains l − d samples. If l is too large, training takes long and much noisy data may be included. Therefore, in the second part, KNN reduces the training dataset by selecting, from the candidates, the k feature vectors most similar to the input testing feature vector \( \mathbf{x}_{l-d+1} \). Cosine similarity measures the similarity between feature vectors.

Assume the training sample set is \( {\bigcup}_{j=d}^{l-1}\left\{{\mathbf{x}}_{j-d+1},{y}_{j-d+1}\right\} \) and the testing input feature vector is \( \mathbf{x}_{l-d+1} \). We first calculate its cosine similarity with each candidate feature vector \( \mathbf{x}_{j-d+1} \):

$$ \cos \uptheta =\frac{{\mathbf{x}}_{l-\mathrm{d}+1}\cdotp {\mathbf{x}}_{j-\mathrm{d}+1}}{\left\Vert {\mathbf{x}}_{l-\mathrm{d}+1}\right\Vert \left\Vert {\mathbf{x}}_{j-\mathrm{d}+1}\right\Vert } $$
(2)

The k feature vectors with the largest cosine similarities are taken as the nearest neighbors of the input feature vector. For example, suppose \( \{x_{u,i}(1), \dots, x_{u,i}(l)\} \) = (58, 51, 126, 2, 51, 65) is a time series and the time window size is 3. The candidate training vectors are then (58, 51, 126), (51, 126, 2) and (126, 2, 51), and the input feature vector is (2, 51, 65); the cosine similarities between the input feature vector and the candidates are 0.89, 0.59 and 0.33. If k = 2 in KNN, we choose (58, 51, 126) and (51, 126, 2) as the final training feature vectors for the next step.
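A minimal sketch of this selection step, reusing X, y and x_query from the previous sketch; it reproduces the similarity values of the example above.

```python
import numpy as np

def knn_select(X, y, x_query, k):
    """Keep the k training pairs whose feature vectors have the largest
    cosine similarity (Eq. 2) with the query vector."""
    sims = X @ x_query / (np.linalg.norm(X, axis=1) * np.linalg.norm(x_query))
    top = np.argsort(sims)[::-1][:k]
    return X[top], y[top], sims

X_k, y_k, sims = knn_select(X, y, x_query, k=2)
print(np.round(sims, 2))  # [0.89 0.59 0.33], as in the example above
# X_k = [[58, 51, 126], [51, 126, 2]] -- the two selected neighbors
```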

4.3 Time series prediction based on SVR

After obtaining the k training feature vectors, assume \( {\bigcup}_{i=1}^k\left\{{\mathbf{x}}_i,{y}_i\right\} \) is the training sample set, where k denotes the number of samples. Given a set of training samples, SVR aims to find the optimal function from the set of hypothesis functions

$$ \left\{f|f\left(\mathbf{x}\right)={\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b,\mathbf{w}\in {\mathrm{R}}^{\mathrm{d}},b\in \mathrm{R}\right\} $$
(3)

where w is the weight vector and b is the bias term. To obtain the weight vector w and the bias term b, we minimize the regularized sum of the loss function:

$$ \frac{1}{2}{\left\Vert \mathbf{w}\right\Vert}^2+C{\sum}_{\mathrm{i}=1}^{\uptau}L\left({y}_{\mathrm{i}},f\left({\mathbf{x}}_{\mathrm{i}}\right)\right) $$
(4)

where C > 0 is the regularization factor, ‖∙‖ denotes the 2-norm, and L(∙, ∙) is the loss function. SVR has different formulations when using different loss functions. Generally, the ε-insensitive loss function is used in SVR and has the form

$$ L\left(y,f\left(\mathbf{x}\right)\right)=\begin{cases}0, & \left|f\left(\mathbf{x}\right)-y\right|\le \varepsilon \\ \left|f\left(\mathbf{x}\right)-y\right|-\varepsilon, & \text{otherwise}\end{cases} $$
(5)

We can convert formula (4) into an equivalent formula (6) by introducing slack variables ξi and \( {\upxi}_i^{\ast } \):

$$ \begin{aligned} \min_{\mathbf{w},b}\ & \frac{1}{2}{\left\Vert \mathbf{w}\right\Vert}^2 + C\sum_{i=1}^{\tau}\left(\xi_i+\xi_i^{\ast}\right) \\ \text{s.t.}\ & \mathbf{w}^{\mathrm{T}}\mathbf{x}_i + b - y_i \le \varepsilon + \xi_i \\ & y_i - \mathbf{w}^{\mathrm{T}}\mathbf{x}_i - b \le \varepsilon + \xi_i^{\ast} \\ & \xi_i,\xi_i^{\ast}\ge 0,\ i=1,\dots,\tau \end{aligned} $$
(6)

It is not easy to solve formula (6) directly because of the high-dimensional feature space, so we convert it into the dual problem via a kernel function:

$$ \begin{aligned} \min_{\alpha,\alpha^{\ast}}\ & \frac{1}{2}\sum_{i=1}^{\tau}\sum_{j=1}^{\tau}\left(\alpha_i-\alpha_i^{\ast}\right)\left(\alpha_j-\alpha_j^{\ast}\right)k\left(\mathbf{x}_i,\mathbf{x}_j\right)-\sum_{i=1}^{\tau}\left(\alpha_i-\alpha_i^{\ast}\right)y_i+\varepsilon \sum_{i=1}^{\tau}\left(\alpha_i+\alpha_i^{\ast}\right) \\ \text{s.t.}\ & \sum_{i=1}^{\tau}\left(\alpha_i-\alpha_i^{\ast}\right)=0 \\ & 0\le \alpha_i,\alpha_i^{\ast}\le C,\ i=1,\dots,\tau, \end{aligned} $$
(7)

where \( \alpha_i,\alpha_i^{\ast} \), i = 1,…, τ are Lagrange multipliers. The three most commonly used kernel functions are the Gaussian radial basis function (RBF) kernel, the linear kernel and the polynomial kernel. Our experiments with these three kernels show that the RBF kernel performs best:

$$ k\left({\mathbf{x}}_{i,}{\mathbf{x}}_j\right)=\exp \left(-\upgamma {\left\Vert {\mathbf{x}}_i-{\mathbf{x}}_j\right\Vert}^2\right) $$
(8)

Namely, we have the regression function with the form

$$ f\left(\mathbf{x}\right)={\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b={\sum}_{i=1}^{\tau}\left(\alpha_i-\alpha_i^{\ast}\right)\exp\left(-\gamma{\left\Vert {\mathbf{x}}_i-\mathbf{x}\right\Vert}^2\right)+b, $$
(9)

SVR maps the original nonlinear data into a high-dimensional space where it becomes linearly predictable, and the kernel function makes the optimization problem with the loss function tractable; the prediction problem can therefore be cast as linear regression in a high-dimensional space. The training phase aims to identify appropriate values of the parameters C, ε and γ for our prediction model. Finally, the next purchase interval can be predicted using the trained SVR.
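Assuming scikit-learn's SVR as the underlying implementation (the paper does not name one), the final step might look as follows; the hyper-parameter values anticipate the tuning in Section 6, and X_k, y_k and x_query come from the previous sketches.

```python
from sklearn.svm import SVR

# Train an RBF-kernel SVR on the k selected neighbors and predict the
# next purchase interval; C, epsilon, gamma are the values tuned in Section 6.
model = SVR(kernel="rbf", C=1e-1, epsilon=1e0, gamma=1e2)
model.fit(X_k, y_k)
next_interval = model.predict(x_query.reshape(1, -1))[0]
print(f"predicted next interval: {next_interval:.1f} days")
```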

5 Life stage based recommendation

We introduced the architecture of life stage based recommendation in Fig. 2. Next, we describe its detailed process: the mapping model, the prediction of life stage, and the recommendation step.

Fig. 2 The framework of temporal consuming behavior based recommendation

Fig. 3 Some examples of periodic purchase regularity

Fig. 4 The solution to the first problem based on KNN + SVR

Fig. 5 Time window

Fig. 6 The purchasing quantity of 4 items changes with life stage

Fig. 7 Two different cases for \( s_{u,t} \) and \( s_{u,now} \)

Fig. 8 The working process of un-weighted bipartite networks

Fig. 9 The experimental results of determining the SVR model

Fig. 10 The experimental results of PPRR

Fig. 11 The experimental results of PLSP

Fig. 12 The comparison of three models

5.1 Mapping model for life stages

The first step of life stage based recommendation is to construct a mapping model between items and the life stages at which each item is most probably bought. Given all the relevant purchasing behaviors on item i, we first count the purchasing probability distribution of item i over the life stages.

Table 1 shows the mapping model, where M is the number of items, N is the number of life stages, and \( Z_{i,j} \) is the purchasing probability of item i at life stage j.

Table 1 Mapping Matrix ZM × N

A crucial aspect is how to calculate the elements of the mapping matrix. We considered several approaches to constructing the mapping model, such as logistic regression, Student's t test, and Grubbs' test. Inspired by Student's t test and Grubbs' test, we use Eq. 10 as the key judgment condition:

$$ \frac{Q_{i,j}-\min \left({Q}_i\right)}{\mathrm{std}\left({Q}_i\right)}\ge \beta, $$
(10)

where \( Q_{i,j} \) denotes the purchasing quantity of item i at life stage \( s_j \), \( Q_i = \{Q_{i,1}, Q_{i,2}, \dots, Q_{i,N}\} \), min(\( Q_i \)) is the minimum of \( Q_i \), and std(\( Q_i \)) is the standard deviation of \( Q_i \).

The rule for calculating \( Z_{i,j} \) is: find all the \( Q_{i,j} \) that satisfy Eq. 10 and let them form a set \( Q_i^{\prime} \); then \( Z_{i,j} = 0 \) if \( Q_{i,j} \notin Q_i^{\prime} \), else \( Z_{i,j} = Q_{i,j} / \sum Q_{i,j} \), where the sum runs over \( Q_{i,j} \in Q_i^{\prime} \). Note that we normalize only over the \( Q_{i,j} \) that satisfy Eq. 10, which ensures each row sums to 1.

The value of β controls the number of non-zero entries in a row. If β is too small, a row has more non-zero entries, meaning the item may be bought at several life stages and there is no clear mapping between the item and the life stages at which it is most probably bought. If β is too large, some plausible life stages may be ignored. Since β determines the mapping model, which in turn influences the accuracy of predicting users' current life stage, we tune β experimentally to minimize that prediction error in the next step.
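A minimal sketch of building one row of Z with Eq. 10; the purchase quantities and the helper name mapping_row are illustrative, and β = 2.0 anticipates the value tuned in Section 6.4.

```python
import numpy as np

def mapping_row(Q_i, beta=2.0):
    """Build one row of the mapping matrix Z from the purchase quantities
    of item i at each life stage (Eq. 10), normalized to sum to 1."""
    Q_i = np.asarray(Q_i, dtype=float)
    keep = (Q_i - Q_i.min()) / Q_i.std() >= beta  # Eq. 10
    Z_i = np.where(keep, Q_i, 0.0)
    s = Z_i.sum()
    return Z_i / s if s > 0 else Z_i              # guard: no stage passed the test

# Hypothetical quantities of one item over the 5 life stages:
print(mapping_row([100, 80, 2400, 2300, 120]))    # ~[0, 0, 0.51, 0.49, 0]
```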

A case example illustrates how to construct the mapping model. The sample data in Table 2 are part of our experimental dataset. The original purchasing behaviors are labeled with the baby's age, which we first convert into life stages according to Table 3. The candidate life stages, as well as the mapping between life stages and babies' age groups, were provided by marketing experts based on baby product standards and domain knowledge. We do not consider life stage 0 in Table 3, so we denote the life stage set as L = [1, 2, 3, 4, 5].

Table 2 Purchase distribution of five items
Table 3 Life stages for Mom-baby domain

For each item, we count its purchasing quantities at each life stage. Table 2 shows the purchasing quantities of five items at each life stage as an example.

To show the purchasing distribution more visually, we normalize the purchasing quantities and plot them in Fig. 6, where the x-axis denotes the life stage and the y-axis denotes the normalized purchasing quantity.

Taking the above five items as an example and applying the proposed construction rule, we obtain the mapping matrix \( Z_{M\times 5} \) shown in Table 4. Since L has 5 elements, the mapping matrix has 5 columns.

Table 4 Mapping matrix in Mom-Baby domain

While constructing the mapping model, we found many users in the dataset without age information. We therefore next predict the life stage of such a user from his purchasing records and the mapping model.

5.2 Prediction model for user's current life stage

For users without age information, we cannot directly convert a baby's age label into a life stage, so we predict their life stage from their recent behaviors. We consider a user's n latest purchasing records of different items. Since the previous step gives us the mapping model between items and life stages, and each item may map to several life stages, the key problem is how to predict a user's life stage by allocating weights to his n latest purchasing records.

In our solution, we first initialize the weight vector \( W_{1\times n} \) to [1/n, 1/n, …, 1/n], and then adjust the weight of each record according to the number of non-zero entries that the purchased item's row has in the mapping matrix \( Z_{M\times 5} \). If a row has only one non-zero value, the item maps to exactly one life stage, i.e., it is purchased only at that life stage. The fewer non-zero entries a row has, the more decisive the mapping is, so we increase the weights of such items: once we observe a purchase of such an item, it tells the user's life stage more definitely. Otherwise we decrease the weight. In addition, if multiple items among the n records map to the same life stage, we increase their weights, since together they provide stronger evidence of the user's life stage.

The pseudo-code for allocating weights to the records is shown below:

figure a

In lines 1–3, w is the total weight, initialized to 1, and we first update the weights of the items whose weight must decrease; Length() counts the number of non-zero entries in a row, namely \( {\left\{{Z}_{i,j}\right\}}_{j=1}^5 \). In lines 4–6, w is updated to the remnant weight. In line 7, n1 is the number of rows with the minimum number of non-zero entries. In lines 8–13, we update the weights of the items whose value vector \( {\left\{{Z}_{i,j}\right\}}_{j=1}^5 \) is unique, together with the remnant weight w; finally the remnant weight is allocated equally to the n2 remaining rows. In lines 3 and 12, ϑ and θ control the speed of weight updating: the smaller ϑ and θ are, the faster the weights change. We set ϑ = 3 and θ = 6, which repeated tests showed to allocate the weights well.
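Since the pseudo-code figure is not reproduced here, the following is our reconstruction from the textual description above and the worked example in this section; the θ = 6 branch for rows with duplicate value vectors is omitted, and the Z rows are hypothetical values chosen to reproduce the example's weights.

```python
import numpy as np

def allocate_weights(Z_rows, theta1=3):
    """Reconstructed weight allocation over a user's n latest records.
    Rows mapped to many life stages lose weight; the freed weight is
    shared equally by the rows mapped to the fewest stages."""
    n = len(Z_rows)
    lengths = np.array([np.count_nonzero(r) for r in Z_rows])
    W = np.full(n, 1.0 / n)
    non_min = lengths > lengths.min()
    W[non_min] = 1.0 / n - (lengths[non_min] - 1) / (theta1 * n)  # decrease step
    w = 1.0 - W[non_min].sum()                                    # remnant weight
    W[~non_min] = w / (~non_min).sum()                            # share the remnant
    return W

# Hypothetical mapping rows for the 5 records of the worked example,
# mapped to 1, 2, 2, 3 and 1 life stages respectively:
Z_rows = [np.array([0.0, 0.0, 0.0, 1.0, 0.0]),
          np.array([0.0, 0.6, 0.4, 0.0, 0.0]),
          np.array([0.0, 0.3, 0.7, 0.0, 0.0]),
          np.array([0.0, 0.2, 0.5, 0.3, 0.0]),
          np.array([0.0, 0.0, 0.0, 1.0, 0.0])]
print(allocate_weights(Z_rows))  # [1/3, 2/15, 2/15, 1/15, 1/3]
```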

According to the proposed mapping method, an item may map to multiple life stages. To multiply the mapping vector by the allocated weight vector \( W_{1\times n} \) and thus obtain a life stage from the user's n records, each item must map to exactly one life stage. We therefore map each item to the life stage whose index is closest to the expected stage value:

$$ {\mathrm{Map}}_{n\times 1}=\left\{\underset{j}{\arg\min}\left|j-\sum_{j^{\prime}=1}^{5}\left({Z}_{i,j^{\prime}}\times j^{\prime}\right)\right|\right\}. $$
(11)

Then the predicted life stage \( s_{u,t} \) is obtained from formula (12):

$$ {s}_{u,t}={\mathrm{W}}_{1\times n}\cdotp {\mathrm{Map}}_{n\times 1}. $$
(12)
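A sketch of Eqs. 11 and 12, reusing Z_rows and allocate_weights from the previous sketch; with those hypothetical rows it reproduces the worked example below.

```python
import numpy as np

def predict_stage(Z_rows, W, stages=np.arange(1, 6)):
    """Collapse each record's row to the stage nearest its expected value
    (Eq. 11), then combine with the record weights (Eq. 12)."""
    Map = [int(stages[np.argmin(np.abs(stages - float(np.dot(row, stages))))])
           for row in Z_rows]
    return float(np.dot(W, Map)), Map

s_ut, Map = predict_stage(Z_rows, allocate_weights(Z_rows))
print(Map, round(s_ut, 2))  # [4, 2, 3, 3, 4] 3.53
```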

Each purchase record corresponds to a specific time point. To obtain a corresponding time point for the calculated life stage \( s_{u,t} \), we propose formula (13):

$$ t=\left\lfloor {\mathrm{W}}_{1\times n}\cdotp {{\mathrm{Y}}_{1\times n}}^{\mathrm{T}}\right\rfloor \left|\left\lfloor {\mathrm{W}}_{1\times n}\cdotp {{\mathrm{M}}_{1\times n}}^{\mathrm{T}}\right\rfloor \right|\left\lfloor {\mathrm{W}}_{1\times n}\cdotp {{\mathrm{D}}_{1\times n}}^{\mathrm{T}}\right\rfloor, $$
(13)

where \( Y_{1\times n} \), \( M_{1\times n} \) and \( D_{1\times n} \) are the vectors of years, months and days of the n records. We compute the year, month and day parts separately and concatenate them with the symbol ∣. Given the estimated life stage \( s_{u,t} \) and its corresponding time point t, the user's current life stage \( s_{u,now} \) is predicted as:

$$ {s}_{u, now}=\left\lfloor {s}_{u,t}+\left({\mathrm{T}}_{\mathrm{now}}-\mathrm{t}\right)/{\mathrm{D}}_{s_{u,t}}\right\rfloor, $$
(14)

where \( {\mathrm{D}}_{s_{u,t}} \) is the duration of life stage \( s_{u,t} \), \( T_{now} \) is the current time point, and \( (T_{now} - t) \) and \( {\mathrm{D}}_{s_{u,t}} \) are measured in months.
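A sketch of Eqs. 13 and 14, under our reading that time points are encoded as yyyymmdd integers and that the elapsed time is counted in whole months (day parts ignored), which is an assumption.

```python
from math import floor
import numpy as np

def reference_time(W, dates):
    """Eq. 13: floor-weighted year, month and day parts of the n record
    dates, concatenated into one yyyymmdd integer."""
    Y, M, D = (np.array(part, dtype=float) for part in zip(*dates))
    return floor(W @ Y) * 10000 + floor(W @ M) * 100 + floor(W @ D)

def current_stage(s_ut, t, t_now, duration_months):
    """Eq. 14: extrapolate s_{u,t} to the present; elapsed time and the
    stage duration are both measured in months."""
    months = (t_now // 10000 - t // 10000) * 12 + (t_now // 100 % 100 - t // 100 % 100)
    return floor(s_ut + months / duration_months)

print(reference_time(np.array([0.5, 0.5]), [(2015, 9, 18), (2015, 11, 15)]))  # 20151016
print(current_stage(3.53, 20150610, 20160121, 6))  # 4, as in the example below
```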

We continue the example from the previous step. Assume a user u without age information and consider his 5 latest purchase records of different items. His purchasing behavior sequence is X = [(u, 50006042, 20150918), (u, 50007011, 20151001), (u, 50012370, 20151005), (u, 50018436, 20151115), (u, 50023606, 20160118)], and the corresponding mapping matrix \( Z_{5\times 5} \) consists of the first 5 rows of Table 4.

The initialized weight vector is \( W_{1\times 5} \) = [1/5, 1/5, 1/5, 1/5, 1/5]. Following the dynamic weight allocation algorithm, we first calculate \( W_{1,1} \) = 1/5 − (1 − 1)/(3 × 5) = 1/5, \( W_{1,2} \) = 1/5 − (2 − 1)/(3 × 5) = 2/15, \( W_{1,3} \) = 2/15, \( W_{1,4} \) = 1/15, \( W_{1,5} \) = 1/5, and w = 1 − 2/15 − 2/15 − 1/15 = 2/3. The number of rows with the minimum number of non-zero entries is 2, and both of these rows are the same. Next, we get \( W_{1,1} \) = w/2 = 1/3 and \( W_{1,5} \) = w/2 = 1/3, so the allocated weight vector is \( W_{1\times 5} \) = [1/3, 2/15, 2/15, 1/15, 1/3].

According to Eq. 11, we get \( \mathrm{Map}_{5\times 1} \) = [4, 2, 3, 3, 4]^T. According to Eq. 12, \( s_{u,t} \) = 4 × 1/3 + 2 × 2/15 + 3 × 2/15 + 3 × 1/15 + 4 × 1/3 = 3.53; that is, the user was in the middle of the third life stage when he purchased the 5 items. According to Eq. 13, t = ⌊2015 × 3/10 + 2015/15 + 2015/5 + 2015 × 3/10 + 2016 × 2/15⌋ ∣ ⌊9 × 3/10 + 10/15 + 10/5 + 11 × 3/10 + 1 × 2/15⌋ ∣ ⌊18 × 3/10 + 1/15 + 5/5 + 15 × 3/10 + 18 × 2/15⌋ = 20150610. With \( T_{now} \) = 20160121, we calculate \( s_{u,now} \) = ⌊3.53 + 7/6⌋ = 4, where 6 is the duration of life stage 3 according to Table 3.

5.3 Recommend new items based on life stage

Once we have estimated \( s_{u,now} \), we can recommend items appropriate for the user's current life stage. Considering a user's n latest purchase records of different items, the current life stage \( s_{u,now} \) may be the same as \( s_{u,t} \) or may follow it; we therefore propose two recommendation strategies for the two cases.

Figure 7 shows the two cases. We use the un-weighted bipartite networks (UBN) model for case (1) and the Bayesian model for case (2): in the first situation, several records in the same life stage provide a foundation for recommendation, while in the second there are none for reference.

(1) \( s_{u,now} \) is the same as \( s_{u,t} \): we adopt an un-weighted bipartite networks model to represent the relations between users and items, expressed as an adjacency matrix W = {w_ij}, where w_ij = 0 if there is no edge between user U_i and item I_j and w_ij = 1 otherwise. Assume each item holds a unit of resource; each item distributes its resource to all its neighboring users, and then each user redistributes the received resource to all the items he/she purchased. Accordingly, the resource that item I_m receives from item I_n is

$$ {R}_{mn}=\frac{1}{k\left({I}_n\right)}{\sum}_{i=1}^U\frac{w_{im}{w}_{in}}{k\left({U}_i\right)}, $$
(15)

where k(I_n) is the degree of I_n, namely the number of users who have purchased I_n, and k(U_i) is the degree of U_i, namely the number of items user U_i has purchased. This process can be expressed in matrix form as \( {\overrightarrow{f}}^{\prime }=R\cdot \overrightarrow{f} \), where R = {R_mn}, \( \overrightarrow{f} \) is the initial resource vector on items, and \( {\overrightarrow{f}}^{\prime } \) is the final resource vector. Given the target user U_i, the corresponding initial resource vector is defined as

$$ {f}_m^i={w}_{im} $$
(16)

According to the resource-allocation process discussed before, the final resource vector \( {\overrightarrow{f}}^{\prime } \) is

$$ {f^{\prime}}_m^i={\sum}_{n=1}^J{R}_{mn}{f}_n^i={\sum}_{n=1}^J{R}_{mn}{w}_{in} $$
(17)

Figure 8 illustrates the working process of un-weighted bipartite networks, where squares and circles represent items and users, respectively. The red circle denotes the target user and blue circles denote the others; red and blue squares denote items purchased at life stage \( s_{u,t} \) by the target user and by the others, respectively. The left side of Fig. 8 depicts the initial resource distribution and the right side the final distribution. We then rank all items that the target user has not purchased in life stage \( s_{u,now} \) in descending order of final resource value, and recommend the items with the highest values.
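A sketch of the resource-allocation process of Eqs. 15–17 on a toy network; variable names are ours.

```python
import numpy as np

def ubn_scores(Wb, target_user):
    """Final resource vector for one user on an un-weighted bipartite
    network, where Wb[u, m] = 1 iff user u purchased item m (Eqs. 15-17)."""
    k_user = Wb.sum(axis=1)                              # k(U_i): items per user
    k_item = Wb.sum(axis=0)                              # k(I_n): users per item
    R = (Wb / k_user[:, None]).T @ Wb / k_item[None, :]  # Eq. 15
    f0 = Wb[target_user]                                 # Eq. 16: initial resources
    return R @ f0                                        # Eq. 17: final resources

# Toy network: 3 users x 4 items.
Wb = np.array([[1, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 1, 1]], dtype=float)
scores = ubn_scores(Wb, target_user=0)
unseen = [int(m) for m in np.where(Wb[0] == 0)[0]]
print(sorted(unseen, key=lambda m: -scores[m]))          # [2, 3]: rank of unseen items
```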

(2) \( s_{u,now} \) follows \( s_{u,t} \): we adopt a Bayesian model to calculate the probability of a user purchasing an item at \( s_{u,now} \), i.e., the joint probability \( P(i, s_{u,now}) \), where i is the item. Let \( P(s_{u,now} \mid i) \) be the conditional probability of making the purchase at \( s_{u,now} \) given that the user purchased item i, and let P(i) be the probability of a user purchasing item i. By the chain rule:

$$ P\left(i,{s}_{u, now}\right)=P\left({s}_{u, now}|i\right)P(i). $$
(18)

We then sort all items purchased at \( s_{u,now} \) in descending order of \( P(i, s_{u,now}) \) and recommend the items with the highest values to the target user. Intuitively, the items most popular at \( s_{u,now} \) are recommended.
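A sketch of ranking by Eq. 18 from raw counts; note that \( P(s_{u,now} \mid i) \cdot P(i) \) collapses to the joint count over the total number of purchases, exactly as in the worked example below (16174/3415871).

```python
from collections import Counter

def bayes_rank(purchases, stage):
    """Rank items for a target life stage by P(i, s) = P(s|i) * P(i)
    (Eq. 18), estimated from a list of (item, stage) purchase records."""
    total = len(purchases)
    joint = Counter(purchases)
    scores = {i: c / total for (i, s), c in joint.items() if s == stage}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical records: item 'a' bought 3 times at stage 4, 'b' once, 'c' twice at stage 3.
purchases = [("a", 4)] * 3 + [("b", 4)] + [("c", 3)] * 2
print(bayes_rank(purchases, stage=4))  # ['a', 'b']
```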

We continue the example from the previous step, where \( s_{u,t} \) = 3 and \( s_{u,now} \) = 4; this belongs to the second case. As Table 5 shows, taking the five new items mapped to the fourth life stage as an example, we get \( P\left(121452056,4\right)=\frac{16174}{45907}\times \frac{45907}{3415871}=0.0047 \), P(50010557, 4) = 0.0093, P(50012359, 4) = 0.0076, P(121408024, 4) = 0.0026, and P(50008847, 4) = 0.0016. So the final recommendation list is [50010557, 50012359, 121452056, 121408024, 50008847]. If instead \( s_{u,t} \) = 3 and \( s_{u,now} \) = 3, the first case applies: we find all users who purchased the five items in our dataset, then all items those users purchased in the previous step, and thereby construct an un-weighted bipartite network similar to Fig. 8; a recommendation list is then obtained from the working process of the network. The final result is omitted here.

Table 5 Purchase distribution of five new items

6 Experiments

In this section, we first introduce the experimental setup. Then, for the solution to the first problem, we determine the parameters of the SVR model and analyze the effects of the time window size d and the number of nearest neighbors k; meanwhile, we compare KNN + SVR with two other methods, KNNR and SVR. For the solution to the second problem, we analyze the effects of β and n on life stage prediction and the effect of μ on recommending new items, using two-way analysis of variance (ANOVA) [47]; meanwhile, we compare our proposed recommendation model with two other models, UBN and Bayesian.

6.1 Experimental setup

To evaluate our proposed recommender system, we adopt the whole mum-baby dataset from TIANCHI, in which the data were anonymized. Table 6 describes the dataset; the bolded columns are used in our experiments.

Table 6 The description of dataset

Each purchasing record can be written as a 3-tuple <user_id, item_id, day>. Because of the data sparsity of the original dataset, we instead use the parent category to which an item belongs, i.e., the 3-tuple <user_id, cat_id, day>.

To evaluate the precision of recommendation based on periodic regularity mining (PPRR), we divide each time interval set \( \{x_{u,i}(1), \dots, x_{u,i}(n)\} \) into two parts. One part is the last interval \( x_{u,i}(n) \), which is compared with the predicted result; the remaining part, \( \{x_{u,i}(1), \dots, x_{u,i}(n-1)\} \), serves as the training set for the SVR model. We then evaluate PPRR by verifying whether \( x_{u,i}(n) \) is predicted correctly, namely

$$ \mathrm{PPRR}=\frac{{\mathrm{N}}_{\mid {\mathrm{T}}_{\mathrm{pre}}-{\mathrm{T}}_{\mathrm{real}}\mid \le 0.1\ast {\mathrm{T}}_{\mathrm{real}}}+{\mathrm{N}}_{\mid {\mathrm{T}}_{\mathrm{pre}}-{\mathrm{T}}_{\mathrm{real}}\mid \le 7}}{{\mathrm{N}}_{\mathrm{U}}}, $$
(19)

where \( {\mathrm{N}}_{\mid {\mathrm{T}}_{\mathrm{pre}}-{\mathrm{T}}_{\mathrm{real}}\mid \le 0.1\ast {\mathrm{T}}_{\mathrm{real}}} \) is the number of predictions whose difference from the real value \( T_{real} \) is at most ±0.1 × \( T_{real} \). This criterion works for large \( T_{real} \) values but fails for small ones; for example, if \( T_{pre} \) = 5 and \( T_{real} \) = 6, the case does not satisfy the 10% criterion, yet it should count as a correct prediction. Because the shortest period we consider is one week, and only 9.2% of the time intervals are less than 7 days, we also deem a prediction correct when \( \mid T_{pre} - T_{real} \mid \le 7 \); hence the second term \( {\mathrm{N}}_{\mid {\mathrm{T}}_{\mathrm{pre}}-{\mathrm{T}}_{\mathrm{real}}\mid \le 7} \) in formula (19).
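A sketch of the metric, interpreting the two counts in Eq. 19 as a union so that a prediction satisfying both criteria is not double-counted (our assumption).

```python
import numpy as np

def pprr(T_pre, T_real):
    """Eq. 19: a prediction counts as correct when it is within 10% of the
    real interval or within 7 days of it."""
    T_pre, T_real = np.asarray(T_pre, float), np.asarray(T_real, float)
    err = np.abs(T_pre - T_real)
    return ((err <= 0.1 * T_real) | (err <= 7)).mean()

print(pprr([5, 40, 100], [6, 60, 95]))  # 2/3: the 40-vs-60 case fails both criteria
```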

To evaluate the precision of life stage prediction (PLSP), we randomly divide the dataset into two parts: 80% as the training set and 20% as the test set. The training set is used to learn β and n in the formulas. We evaluate PLSP by comparing each user's real life stage with the predicted one, namely

$$ \mathrm{PLSP}=\frac{{\mathrm{N}}_{{\mathrm{S}}_{\mathrm{pre}}={\mathrm{S}}_{\mathrm{real}}}}{{\mathrm{N}}_{\mathrm{U}}}, $$
(20)

where \( {\mathrm{N}}_{{\mathrm{S}}_{\mathrm{pre}}={\mathrm{S}}_{\mathrm{real}}} \) is the number of users whose predicted life stage \( S_{pre} \) equals their real life stage \( S_{real} \) in the test set, and \( N_U \) is the number of users in the test set.

To evaluate the precision of life stage based recommendation (PLSR), each user's purchasing records are split in ascending order of purchase time: the last 20% form a test set and the remaining 80% form a training set. We first use the training set to tune the parameter μ, and then examine PLSR by verifying whether items purchased by the target user match the items we recommend, namely

$$ \mathrm{PLSR}=\frac{{\mathrm{N}}_{{\mathrm{L}}_{\mathrm{test}}\cap {\mathrm{L}}_{\mathrm{rec}}\ne \Phi}}{{\mathrm{N}}_{\mathrm{U}}}, $$
(21)

where \( {\mathrm{N}}_{{\mathrm{L}}_{\mathrm{test}}\cap {\mathrm{L}}_{\mathrm{rec}}\ne \Phi} \) is the number of users whose list of purchased records \( L_{test} \) in the test set has a non-empty intersection with their recommendation list \( L_{rec} \).

Due to the length limit of the paper, we do not describe the calculation formulas of Two-way ANOVA. In our Two-way ANOVA, we set α = 0.05.

6.2 Determining the parameters of the SVR model

In this part, we determine the most suitable kernel function and the parameters of the selected kernel.

The SVR implementation we use has the default parameter values C = 1e0, ε = 1e-1, γ = 0 and degree = 3, where γ is the RBF kernel parameter, degree is the polynomial kernel parameter, and C and ε are common to all three kernels. Keeping the other parameters at their defaults, we first vary C from 1e-4 to 1e4 (Fig. 9(a)) and then ε from 1e-4 to 1e4 (Fig. 9(b)). Figures 9(a) and 9(b) show that the RBF kernel generally outperforms the polynomial and linear kernels. When C > 1e0, PPRR decreases as C grows; when C ≤ 1e0, PPRR increases slowly as C decreases, and for C ≤ 1e-1 it no longer changes. So the appropriate value of C is 1e-1. Similarly, when ε > 1e0, PPRR decreases as ε grows, and for ε > 1e2 it no longer changes; when ε ≤ 1e0, PPRR decreases slowly as ε decreases, and for ε ≤ 1e-1 it no longer changes. So the appropriate value of ε is 1e0.

Next, we determine the parameter γ of the RBF kernel; Fig. 9(c) displays the results. PPRR increases slowly as γ grows when γ ≤ 1e2, but no longer changes when γ > 1e2, so the most appropriate value of γ is 1e2.

6.3 Analysis of recommendation based on periodic purchase regularity

In this section, based on the SVR model determined above, we first choose the number of nearest neighbors k for the KNN + SVR (RBF) method, then determine the time window size d, and finally compare three methods: KNNR, SVR (RBF) and KNN + SVR (RBF).

We fix d = 2 and vary k from 5 to 35; Fig. 10(a) shows the results. PPRR increases with k for k ≤ 25 but decreases for k > 25, so the most suitable value of k is 25.

Next, we determine the time window size d. As Fig. 10(b) shows, all three curves peak at d = 2, meaning that when most users purchase the same item again, the last two purchase times matter most: they have the greatest influence on the next purchase time. Figure 10(b) also shows that KNN + SVR (RBF) generally outperforms KNNR and SVR (RBF). Combining the conclusions of Sections 6.2 and 6.3, KNN + SVR performs best with C = 1e-1, ε = 1e0, γ = 1e2, k = 25 and d = 2, reaching a prediction precision of PPRR = 30.68%.

6.4 Analysis of life stage prediction

For the second problem, we aim to find the values of β in formula (10) and n that make life stage prediction most accurate, where n is the number of a user's purchasing records that we observe.

Figure 11(a) presents the impact of the regulatory factor β, where the x-axis is the number of records and the y-axis is the precision of life stage prediction. The prediction precision is lowest when β = 0.5 and similar for β = 1.0, 1.5 and 2.0; since the precision curve is most stable at β = 2.0, we take β = 2.0 as the optimal value.

The impact of n is shown in Fig. 11(b). When n = 10 and β ≠ 0.5, the prediction precision reaches its maximum and its trend is most stable. We therefore use only the latest 10 records to predict a user's life stage. With β = 2.0 and n = 10, the prediction precision is 77.2%.

In our experiment, β and n are both studied according to Fig. 11(a), where β has 4 levels and n has 16, giving 64 cells in all. Because there is only one observation per cell, we do not consider the interaction between the two factors. The two-way ANOVA table for Fig. 11 is shown in Table 7,

Table 7 The two-way ANOVA table for Fig. 11

where SS denotes the sum of squares, DF the degrees of freedom, MSS the mean sum of squares, and F the F-value. According to Table 7, Fβ = 174.95840 > F(3, 45) = 2.81154 and Fn = 3.91805 > F(15, 45) = 1.89488, so both factors have a statistically significant effect on PLSP.

6.5 Analysis of life stage recommendation

To observe the impact of μ, the length of the recommendation list, on the precision of recommendation, we vary μ as shown in Fig. 12. We compare the pure UBN model, the pure Bayesian model and the hybrid model described in Section 5.3.

Figure 12 shows that as μ increases, the precision of recommendation improves slowly, because a larger μ means a longer recommendation list; however, recommendation loses its purpose when μ is too large, so we set μ = 15. The experimental results show that the hybrid model is more effective than the pure UBN and Bayesian models, with a precision of 11.6%.

Similarly, the model category c and μ are both studied according to Fig. 12, where c has 3 levels and μ has 5, giving 15 cells in all.

The two-way ANOVA table for Fig. 12 is shown in Table 8. Fc = 5.94186 > F(2, 8) = 4.45897 and Fμ = 40.43605 > F(4, 8) = 3.83785, so both factors have a statistically significant effect on PLSR.

Table 8 The two-way ANOVA table for Fig. 12

7 Conclusions

In this paper, we focus on mining consuming behaviors with temporal evolution to make accurate recommendations in mobile marketing apps for maternal and child products. We consider two kinds of temporal factors in consuming behavior: periodic purchase regularity and demand evolution based on life stage. We first mine the periodic trends of users' consuming behavior from historical records and predict when a user will re-purchase an item, so as to recommend previously purchased items at the proper time. Second, we find the regularity of users' purchasing behavior across life stages and recommend new items that are needed and appropriate for their current life stage.

For mining periodic purchase regularity, a time window first divides the time series data into multiple feature vectors that form the training dataset; a KNN approach then reduces the training dataset; finally, the interval before the user's next purchase is predicted with an SVR model. For life stage based recommendation, the system first mines the relations between users' life stages and items; then, according to the mined relation model, it predicts a user's current life stage using the dynamic weight allocation algorithm; finally, it recommends appropriate new items to the target user via the un-weighted bipartite networks (UBN) model and the Bayesian model. The experimental results show that the proposed method is reasonable and effective.

This is a first step towards mining consuming behaviors with temporal evolution in mobile marketing apps for maternal and child products. In the future, we will apply the recommendation approach to apps in other fields, such as pet care, health keeping and weddings. In addition, further factors such as item popularity and item properties could be added to our system to make the recommendations more accurate.