1 Introduction

Nowadays, people are influenced to a large extent by the massive (and overwhelming) quantity of information that has become available due to the rapid development of the Internet and Information Technology (IT); this is known as the information overload problem [3]. Consequently, it is becoming increasingly difficult for users to find the information they really need. Recommender systems have therefore been proposed to help users find the content they want, such as research articles [8], points-of-interest [20, 33], questions [4] and music [23, 24]. Existing recommendation methods include collaborative filtering-based recommendation [12, 30], content-based recommendation [16], social network-based recommendation [1, 13] and hybrid recommendation [7].

For many real-world applications, such as music listening and game playing, users usually perform a series of actions within a period of time, forming behavior sequences. Such behavior sequences can be used to discover users' sequential patterns and to predict users' next new action (or item), a task called sequential recommendation, one of the typical applications of recommender systems. Traditional sequential recommendation models are mainly based on sequential pattern mining [32] and Markov Chain (MC) models [5]. The advent of deep learning has significantly boosted the performance of sequential recommendation [22, 31]. For example, the Recurrent Neural Network (RNN) has been successfully applied to sequence modeling and next-item prediction/recommendation [9]. Furthermore, as a variant of RNN, the Long Short-Term Memory (LSTM) network alleviates the vanishing gradient problem of RNN and provides better recommendation results. However, RNN and LSTM focus only on the relative order of items in a sequence and ignore other important temporal information. In particular, items (or actions) that occur close together in a behavior sequence tend to be strongly correlated (although this rule does not always hold in sequential recommendation, because users may exhibit haphazard behaviors that do not reflect their actual preferences).

In this paper, we propose to model item information, interval information and duration information in users' behavior sequences in order to predict their next new items (or actions). Specifically, the next item refers to a new item that appears in other users' historical records but not in the target user's own behavior sequence. For example, for a user's event sequence {A,B,A,C}, we use {A,B,A} to predict "C" rather than {A,B} to predict "A", because "A" has already appeared in the sequence and repeated predictions are meaningless. In this sequence, "C" is a new item. Obviously, in sequential recommender systems, it is more meaningful (and also more difficult) to recommend items that users may be interested in but have not yet interacted with, which we call new items. The scenario of this work is shown in Fig. 1. The length of time (time interval) between two adjacent items indicates the correlation between them; in other words, time information, in addition to order information, provides rich context for modeling users' dynamic preferences. Furthermore, the duration of a user's action or behavior is also related to his/her preference for the corresponding item. For example, a user may be very interested in a game (or similar games) if he/she plays it for a very long time (duration).
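To make the new-item restriction concrete, the following minimal Python sketch (our illustration, not the authors' code) masks every item already present in the target user's history before ranking the model's scores:

```python
import numpy as np

def recommend_new_items(scores, history, top_n=10):
    """Return the top-n item ids, excluding items already in the user's history."""
    masked = scores.astype(float).copy()
    masked[list(set(history))] = -np.inf   # repeated predictions are meaningless
    return np.argsort(-masked)[:top_n]

# For the sequence {A, B, A, C}: given the prefix {A, B, A}, items A and B are
# masked, so only genuinely new items such as C can be recommended.
```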

Fig. 1 Sequential behaviors with intervals and durations. ik represents the kth item in the sequence; Δtk represents the interval between ik and ik+1; and dk represents the duration of ik

In order to make better use of sequential patterns as well as time information, we present a novel recommendation model, namely the Interval- and Duration-aware LSTM with Embedding layer and Coupled input and forget gate (IDLSTM-EC). Specifically, an interval gate and a duration gate are first introduced to preserve users' short-term and long-term preferences. Time information from the two gates is then seamlessly combined to improve next-new-item recommendation. An embedding layer is used after the input layer to incorporate more important information, such as the global context and long-term memory. The main contributions of this paper are as follows:

  • We propose a novel next-item recommendation model, which can make better use of important time information (such as interval and duration) in time sequences;

  • We further improve the model’s effectiveness and efficiency by adding the embedding layer and coupled input and forget gate;

  • Experimental results on real-world datasets show that the proposed model outperforms the state-of-the-art baselines and can handle the problem of data sparsity effectively.

The rest of this paper is structured as follows: related works are introduced in Section 2. We then illustrate the motivation for this work with data statistics and analysis in Section 3. Section 4 discusses the proposed methods in detail, and Section 5 demonstrates the experimental results and analysis. Finally, Section 6 concludes the paper and outlines future work.

2 Related works

2.1 Traditional sequence models

Traditional sequential models can be divided into sequential pattern mining [32] and Markov Chain (MC) models [5]. Recommender systems based on sequential pattern mining first mine frequent patterns from sequence data and then recommend via sequential pattern matching. For the sake of efficiency, these models may filter out some infrequent but important patterns, which limits recommendation performance, especially in terms of coverage. Markov Chain (MC) methods are also used for sequence modeling. The main idea of such sequential recommendation models is to use the MC to model the probability of users' interaction events in the sequence and then predict the next event based on this probability. Specifically, the MC model assumes that the user's current interaction depends on one or more recent interaction events. Therefore, it can only capture local information in the sequence and ignores global information about the sequence. Rendle et al. proposed the Factorized Personalized Markov Chains (FPMC) model [19] and introduced an adaptation of the Bayesian Personalized Ranking (BPR) [18] framework for sequential data modeling and recommendation. However, the MC model mainly focuses on short-term relationships between items, and it cannot incorporate important information in long sequences.

2.2 Latent representation-based sequential models

Latent representation models learn latent representations of users or items, which capture latent dependencies and features. The main categories of latent representation models are factorization machines [10] and embedding models [26]. Sequential recommendation methods based on the factorization machine usually use matrix factorization to decompose the observed user-item interaction matrix into latent vectors of users and items. Nisha et al. [14] used network representation learning to capture implicit semantic social information and improve the performance of recommender systems. Wang et al. [25] proposed a Hierarchical Representation Model (HRM) based on users' overall interests and final behaviors. Pan et al. [15] combined factorization and neighborhood-based methods and proposed matrix factorization with multiclass preference context (MF-MPC). Shi et al. [20] used the factorization machine to construct a recommendation model, which effectively reduces model parameters and improves recommendation performance. Yu et al. [33] exploited users' contextual behavior semantics in a Point-of-Interest (POI) recommendation model to address the data sparsity problem. However, factorization-based sequential recommendation methods are easily affected by sparse observation data. Embedding-based sequential recommendation models usually map all user interactions in the sequence into a low-dimensional latent space through a new encoding scheme. Embedding models, such as word2vec and GloVe (Global Vectors) [17], are used in many fields; the vectors they produce are usually used as the input of neural networks. It should be noted that these representation vectors are obtained from the order of user-item interactions, which is completely different from the vectors in collaborative filtering. Embedding models also encourage models to use global rather than purely local information.

2.3 Deep learning-based sequential models

In recent years, the most commonly used deep learning method in sequential recommendation has been the Recurrent Neural Network (RNN), which is well suited to modeling complex dynamics in sequences due to its special structure [11, 21, 35]. Zhang et al. [35] proposed a novel RNN-based framework which models user sequence information through click events. Twardowski et al. [21] combined context information to build a recommender that can handle the long-term and short-term interests of users in the news domain. Hu et al. [11] proposed a neural network model that uses item context to better model users' purchasing behaviors. To improve session-based recommendation, Wang et al. [27] designed an effective Mixture-Channel Purpose Routing Network (MCPRN) that improves the accuracy and diversity of recommendations. Yu et al. [34] proposed a new sequential recommendation model, SLi-Rec, by combining the traditional RNN structure and matrix factorization techniques; however, SLi-Rec does not incorporate users' long-term interests into the neural network model. Wu et al. [28] proposed a long- and short-term preference learning model (LSPL) that considers both long-term and short-term interests; specifically, LSPL uses LSTM to capture sequential patterns and learn sequence context information. We have compared these traditional methods in detail, and their strengths and weaknesses are listed in Table 1.

Table 1 Strengths and weaknesses of traditional methods

3 Data analysis and motivation

Previous studies [28, 36] have shown that time information plays an important role in user preference modeling and can effectively improve recommendation performance. In this section, we introduce the time information in the experimental data and analyze it further. We use the game dataset as a case study to explain the role of time interval and duration information; some samples of game-playing data are shown in Table 2. For example, Record 1 in Table 2 indicates that user No. 386576 played "League of Legends" on 2016-09-01 at 01:00:47 (timestamp) for 8700 seconds (duration).

Table 2 Examples of game-playing data

The interval is the difference between two adjacent timestamps of the same user. For example, the interval between Record 2 and Record 1 in Table 2 is 63878 seconds. Zhu et al. [37] showed that the shorter the interval, the greater the impact of the current item on the next item. One reason is that users may repeatedly play similar games within a short period of time, which reflects their short-term preferences; for example, user No. 386576 frequently plays "League of Legends" within short intervals. Furthermore, Fig. 2a shows that the proportion of cases in which two adjacent games in the sequence are different increases overall as the time interval grows. In other words, a longer interval indicates a lower correlation between adjacent items, which also influences the modeling of users' preferences.
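Intervals are not stored explicitly in records such as those in Table 2, but they are easy to derive. The sketch below (the record layout is our assumption) computes Δt between adjacent records of the same user:

```python
def add_intervals(records):
    """records: (user_id, game, timestamp_sec, duration_sec) tuples, sorted by time."""
    last_seen = {}
    out = []
    for user, game, ts, dur in records:
        interval = ts - last_seen.get(user, ts)   # 0 for a user's first record
        last_seen[user] = ts
        out.append((user, game, ts, dur, interval))
    return out
```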

Fig. 2 Statistics of the game dataset

Furthermore, duration is also an important feature in sequence modeling and sequential prediction/recommendation. As shown in Table 2, the duration indicates how long a user plays a game. Generally, duration reflects the degree of a user's preference for the corresponding item: when a user plays a game for longer durations, he/she also tends to play it more frequently; in other words, he/she is more interested in that game. For example, in Table 2, the frequently played games ("League of Legends" and "CrossFire") have longer durations. Our data analysis in Fig. 2b further illustrates the relationship between a game's average duration and the frequency with which it is played. The results show that a longer duration indicates that users are interested in the corresponding game and tend to play it more frequently. In other words, users' preferences for a game are reflected in its duration.

In general, users have both long-term and short-term preferences [29]. Long-term interest refers to users' stable, long-standing preferences; for example, some users only like role-playing games, so they play this type of game most of the time. However, users' preferences can change over time, and the next item or action is more likely to depend on users' recent behaviors (short-term interest); for example, although some users predominantly like role-playing games, they may also try popular strategy games. In order to better capture users' long-term and short-term interests, we need to make better use of both time interval information and duration information. Therefore, we propose a novel recommendation model which incorporates interval and duration information into next-new-item recommendation.

4 The proposed method (IDLSTM-EC)

The time information in this work includes both the time interval and the duration. Specifically, the interval indicates the correlation between the current item and the next item in the sequence, and the duration indicates the user's preference for the corresponding event (similar to a rating). Inspired by the analysis and motivation in Section 3, we propose a novel next-item recommendation method, namely the Interval- and Duration-aware LSTM with Embedding layer and Coupled input and forget gate (IDLSTM-EC).

Figure 3 shows how the proposed model performs prediction and recommendation based on users' sequences. In the process, we first extract the interval Δt, the duration d and x from the sequence {game1, game2, ⋯, gamek}, where x is the one-hot vector of the game. We then feed this information into the IDLSTM-EC cell. Finally, the output of the IDLSTM-EC cell is passed through the softmax function to obtain the probability of each game being played next. Compared with RNN or LSTM, the proposed model incorporates three kinds of inputs (item, time interval and duration) into sequence modeling and recommendation in a unified way.
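To make this data flow concrete, the following minimal sketch (our illustration, not the authors' code) runs one user's sequence through a cell function idlstm_step, which is itself sketched after the equations later in this section; W_out and hidden_size are assumed parameter names:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_next(item_ids, intervals, durations, params, n_items):
    """Run one user's sequence through the cell and score every candidate game."""
    h = np.zeros(params["hidden_size"])
    c_hat = np.zeros(params["hidden_size"])
    for item_id, dt, d in zip(item_ids, intervals, durations):
        x = np.zeros(n_items)
        x[item_id] = 1.0                                     # one-hot game vector
        h, c_hat = idlstm_step(x, dt, d, h, c_hat, params)   # cell sketched below
    return softmax(params["W_out"] @ h)                      # probability of each next game
```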

Fig. 3 The architecture of the proposed model in sequential recommendation. A game sequence is taken as an example, where {game1, game2, ⋯, gamek} is the sequence; x is the one-hot vector of the game; Δt is the time interval; and d is the duration. IDLSTM-EC is the proposed model in this paper

The architectures of the proposed model and LSTM are shown in Fig. 4. Specifically, in order to illustrate the advantages of the proposed model, we use different colored lines to highlight the improvements made. In addition, parameters that appear in Section 4 are explained in Table 3. Next, we will describe the IDLSTM-EC model in detail.

Fig. 4 Architectures of models: a LSTM and b IDLSTM-EC. The IDLSTM-EC has an interval gate Ik and a duration gate Dk. Specifically, Ik is designed to model the time interval Δt in order to gauge the impact of the current event on the next event, and Dk is designed to model the duration d to indicate users' interests. Furthermore, the IDLSTM-EC uses the coupled input and forget gate, and the input x is converted to \(\hat x\) by the embedding layer. We use red lines to highlight the improved features of the IDLSTM-EC in comparison with traditional LSTM

Table 3 Parameter description

As shown in Fig. 4b, the IDLSTM-EC introduces an interval gate I and a duration gate D into the LSTM model. In the figure, purple lines highlight the processing of the time interval and duration by the interval gate and duration gate. Specifically, the interval gate models the impact of the current event on the next event on the basis of time interval information, while the duration gate models users' long-term interest in various items on the basis of duration information.

The equations for the interval gate Ik and the duration gate Dk are formally defined as follows:

$$ \begin{array}{c} I_{k}=\sigma_{t_{i}} \left( W_{t_{i}} x_{k} + \sigma^{\prime}_{t_{i}}\left( S_{t_{i}} {\Delta} t_{k}\right) +b_{t_{i}}\right), \end{array} $$
(1)
$$ \begin{array}{c} D_{k}=\sigma_{t_{d}}\left( W_{t_{d}} x_{k} + \sigma^{\prime}_{t_{d}}\left( S_{t_{d}} d_{k}\right) +b_{t_{d}}\right). \end{array} $$
(2)

Furthermore, the interval gate Ik and the duration gate Dk are added to the LSTM in Fig. 4a, which is defined as follows:

$$ \begin{array}{c} i_{k} = \sigma_{i} (W_{i} x_{k} + U_{i} h_{k-1} + P_{i} \circ \hat c_{k-1} + b_{i}), \end{array} $$
(3)
$$ \begin{array}{c} f_{k} = \sigma_{f} (W_{f} x_{k} + U_{f} h_{k-1} + P_{f} \circ \hat c_{k-1} + b_{f}), \end{array} $$
(4)
$$ \begin{array}{c} c_{k} = i_{k} \circ \sigma_{c} (W_{c} x_{k} +U_{c} h_{k-1} + b_{c}), \end{array} $$
(5)
$$ \begin{array}{c} {\hat c_{k}} = f_{k} \circ \hat c_{k-1} + {D_{k}} \circ c_{k}, \end{array} $$
(6)
$$ \begin{array}{c} {\widetilde c_{k}} = \hat c_{k} + {I_{k}} \circ c_{k}, \end{array} $$
(7)
$$ \begin{array}{c} o_{k} = \sigma_{o} (W_{o} x_{k} + {V_{o} {\Delta} t_{k}} + U_{o} h_{k-1} + P_{o} \circ \widetilde c_{k} + b_{o}), \end{array} $$
(8)
$$ \begin{array}{c} h_{k} = o_{k} \circ \sigma_{h} ({\widetilde c_{k}}). \end{array} $$
(9)

Input from the duration gate Dk is added to \(\hat c_{k}\) to associate the input vector xk with both the input gate and the duration gate. We then add the interval gate Ik to \(\widetilde c_{k}\) so that \(\widetilde c_{k}\) incorporates information from both the interval gate Ik and the duration gate Dk.

The cell \(\hat c_{k}\) is used to further model the user’s interest by adding information from the duration gate Dk. We also add cell \(\widetilde c_{k}\) to combine duration and interval information for recommendation.

Specifically, a small interval (i.e., a large interval gate Ik) means that the current item has a significant influence on the next item. Correspondingly, \(\hat c_{k-1}\) will be relatively small, and the next item is influenced to an even greater extent. In this way, the IDLSTM-EC can combine duration and interval information to perform more precise recommendation. On the other hand, \(\widetilde c_{k}\) is directly connected to the output gate and, together with the output gate, controls the output. In addition, Δt is added to the output gate to better control the output alongside the other parameters, where Vo is the weight coefficient of Δt in the output gate. The IDLSTM-EC thus combines time interval and duration information well, and the interval gate and duration gate enable the two kinds of time information to be preserved for a longer period of time.
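Read together, (1)-(9) amount to an LSTM step with two extra gates. The following numpy sketch is our illustration under stated assumptions: every σ is the logistic sigmoid, σc and σh are tanh (the paper does not fix these choices), weights are held in a dict p whose keys mirror Table 3, and the peephole terms P act element-wise:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def idlstm_step(x, dt, d, h_prev, c_hat_prev, p):
    """One forward step of equations (1)-(9); p maps weight names to arrays."""
    I = sigmoid(p["W_ti"] @ x + sigmoid(p["S_ti"] * dt) + p["b_ti"])               # (1) interval gate
    D = sigmoid(p["W_td"] @ x + sigmoid(p["S_td"] * d) + p["b_td"])                # (2) duration gate
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_prev + p["P_i"] * c_hat_prev + p["b_i"])  # (3)
    f = sigmoid(p["W_f"] @ x + p["U_f"] @ h_prev + p["P_f"] * c_hat_prev + p["b_f"])  # (4)
    c = i * np.tanh(p["W_c"] @ x + p["U_c"] @ h_prev + p["b_c"])                   # (5)
    c_hat = f * c_hat_prev + D * c                                                 # (6)
    c_tilde = c_hat + I * c                                                        # (7)
    o = sigmoid(p["W_o"] @ x + p["V_o"] * dt + p["U_o"] @ h_prev
                + p["P_o"] * c_tilde + p["b_o"])                                   # (8)
    h = o * np.tanh(c_tilde)                                                       # (9)
    return h, c_hat
```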

In order to further improve the effectiveness and efficiency of the proposed model, we introduce an embedding layer to utilize more sequence information, together with a coupled input and forget gate.

  • Adding the embedding layer: In the base IDLSTM model, all inputs are converted into one-hot vectors, which may cause some important information to be lost, such as correlations between different items. In fact, the co-occurrence and context relationships between inputs play important roles in sequential recommendation. However, the base model employs only part of the context information and fails to utilize the global context. In order to incorporate more contextual information, an embedding layer is added after the input to transform the original one-hot vectors into low-dimensional real-valued vectors (embeddings), which can effectively capture important features of items and their relationships in the training data. Specifically, the GloVe [17] method is used to train the embedding vectors. GloVe is a popular embedding method which obtains vectors through unsupervised learning; unlike other embedding methods, it incorporates both global information and context to capture more important information.

  • Coupling the input and forget gates: The number of parameters of the proposed model is reduced by coupling the input and forget gates. Thus, (4) is removed and (6) is modified as follows:

    $$ \begin{array}{c} \hat c_{k} = (1 - I_{k} \circ i_{k}) \circ \hat c_{k-1} + D_{k} \circ c_{k}. \end{array} $$
    (10)

    Specifically, \(\hat c_{k}\) is the main cell, and the input is affected by both Ik and the input gate; fk is replaced with (1 − Ik ∘ ik) in \(\hat c_{k}\). The coupled input and forget gate improves the model's efficiency by reducing the number of model parameters; this reduction also helps prevent overfitting to some extent, as the sketch below illustrates.
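In code, the coupling is a one-line substitution in the step function sketched above; (4) is simply no longer computed. A minimal sketch:

```python
# Coupled variant, equation (10): the forget gate f_k is no longer computed;
# its role is taken by (1 - I_k ∘ i_k), so fewer parameters need training.
c_hat = (1.0 - I * i) * c_hat_prev + D * c
```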

5 Experiments

In this section, we evaluate the proposed model as well as state-of-the-art baselines on two real-world datasets. The first dataset comprises game-playing records collected from a world-leading Internet café chain which has the largest number of game players in China. The second is a public music listening dataset, LastFM-1K, which includes the complete music listening sequences and timestamps of nearly 1,000 listeners up to May 5, 2009. We preprocessed the two datasets and deleted users and items with only a few records. Statistics for the final datasets are shown in Table 4, where #(∗) indicates the number of ∗, and Average (Item) indicates the average number of interactions per user.

Table 4 Statistics for the two datasets

5.1 Compared methods

In this section, the proposed model is compared with state-of-the-art recommendation methods, including traditional recommendation methods and the variants of LSTM mentioned above.

5.1.1 Baselines

Two kinds of recommendation methods have been adopted as baselines, including general recommendation models and sequence-based recommendation models. Specifically, general models mainly perform traditional, non-sequential recommendation, while sequence-based models can perform next-item recommendation via machine learning or neural networks. We also compare different versions of the proposed model to show the effectiveness of each improved component.

General recommendation models:

  • POP: Popularity predictor which recommends the most popular items to users.

  • UBCF: User-Based Collaborative Filtering.

  • BPR: Bayesian personalized ranking [18].

Sequence-based next-item recommendation models:

  • FPMC: Factorizing Personalized Markov Chains [19].

  • Session-RNN: A variant of traditional RNN which can capture the user’s short-term interest.

  • Peephole-LSTM: A variant of LSTM which adds a “peephole connection” to allow all gates to accept input from the state [6].

  • Peephole-LSTM with time: This model adds time information to the Peephole-LSTM for a fair comparison.

  • Time-LSTM: A variant of LSTM that adds two time gates to the traditional LSTM [37].

5.1.2 The proposed methods

Three variants of the proposed IDLSTM-EC model (IDLSTM-E, IDLSTM-C and IDLSTM) are included in the ablation experiments. In particular, they are used as baselines to show the effectiveness of the two key components of IDLSTM-EC (i.e., the embedding layer and the coupled input and forget gate). All methods are described as follows:

  • IDLSTM-EC: Interval- and Duration-aware LSTM with Embedding layer and Coupled input and forget gate.

  • IDLSTM-E: IDLSTM-EC model with only the embedding layer.

  • IDLSTM-C: IDLSTM-EC model with only the coupled input and forget gate.

  • IDLSTM: IDLSTM-EC model without the embedding layer and the coupled input and forget gate.

Specifically, IDLSTM-C and IDLSTM-E are used as baselines to evaluate the effectiveness of the embedding layer component and the coupled input and forget gate component in IDLSTM-EC, respectively. Besides, IDLSTM is used as a baseline to evaluate the effectiveness of combining the two key components in IDLSTM-EC.

5.2 Experimental setup

In the experiment, the task is to predict the next new item that users will be most likely to interact with according to existing behavior sequences. During the training phase, an improved stochastic gradient descent method called Adagrad [2] was used, which can adapt the learning rate to the parameters. Specifically, Adagrad can improve the convergence ability of the model by increasing the learning rate of sparse parameters. In addition, cross-entropy was chosen as the loss function, defined as follows:

$$ \begin{array}{c} Loss = -\frac{1}{M}\sum (pos_{i} \times y_{i} \log \hat{y}_{i}), \end{array} $$
(11)

where M is the number of training samples; yi is the ground-truth value of the real item; \(\hat {y}_{i}\) is the predicted value; and posi is the new-item indicator: posi = 1 when yi corresponds to a new item, and posi = 0 otherwise.
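A minimal sketch of (11), assuming y_true holds one-hot targets, y_pred the softmax outputs and pos the new-item indicator (names are ours):

```python
import numpy as np

def new_item_cross_entropy(y_true, y_pred, pos, eps=1e-12):
    """Equation (11): cross-entropy averaged over M samples, masked by pos_i."""
    per_sample = -(y_true * np.log(y_pred + eps)).sum(axis=1)   # -y_i log(y_hat_i)
    return (pos * per_sample).mean()                            # (1/M) sum pos_i * (...)
```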

All experiments were conducted on a PC with Intel(R) Core(TM) i9-7900X @ 3.30GHz and GeForce GTX 1080 Ti, 64GB memory and Ubuntu 16.04.

5.3 Evaluation metrics

The proposed model was evaluated with two metrics: Recall and MRR.

  • Recall: Recall (aka sensitivity) is defined as follows:

    $$ \begin{array}{c} Recall@n = \frac{\#(n,hit)}{\#(all)}, \end{array} $$
    (12)

    where #(n,hit) is the number of test cases whose target item appears in the top-n of the recommendation list, and #(all) is the number of all test samples. Recall is a common evaluation criterion, usually used to evaluate whether the recommendation list contains the target item.

  • MRR: MRR (Mean Reciprocal Rank) is a ranking evaluation metric which indicates the average of the reciprocal ranks of the target items in a recommendation list. Formally, it is defined as follows:

    $$ \begin{array}{c} MRR@n = \frac{1}{\#(all)}\times \sum \frac{1}{rank_{i}}, \end{array} $$
    (13)

    where ranki denotes the rank of the i-th test target item in the recommendation list; if ranki > n, \(\frac {1}{rank_{i}}=0\). The MRR is the average of the reciprocal ranks of the target items. When the Recall@n values of several models are similar (i.e., a similar proportion of target items appears in their recommendation lists), MRR can be used to further distinguish them. In particular, MRR@n is the same as Recall@n when n = 1.
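Both metrics can be computed in a single pass over the test set. The sketch below is our reading of definitions (12) and (13), not the authors' evaluation code:

```python
def recall_mrr_at_n(ranked_lists, targets, n=10):
    """ranked_lists: recommendation lists (item ids, best first), one per test case;
    targets: the ground-truth next new item for each test case."""
    hits, rr = 0, 0.0
    for recs, target in zip(ranked_lists, targets):
        top = list(recs[:n])
        if target in top:
            hits += 1
            rr += 1.0 / (top.index(target) + 1)   # 1/rank_i; contributes 0 if rank_i > n
    total = len(targets)
    return hits / total, rr / total               # Recall@n, MRR@n
```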

5.4 Comparison with baselines

Comparisons between the proposed model and the baseline models are presented in Table 5. The results show that the proposed model achieves the best performance in sequential recommendation. Furthermore, models with a neural network structure outperform models that do not consider sequential factors. Specifically, IDLSTM-EC outperforms the best baseline method by 11.1% and 32.5% in terms of Recall@10 and MRR@10, respectively, on the game dataset. The corresponding improvements on the LastFM dataset are 65.8% (Recall@10) and 18.3% (MRR@10). Next, we analyze the performance of each model in detail.

  • POP, UBCF, BPR and FPMC: The POP method only recommends items with high popularity, which results in low coverage. UBCF and BPR ignore the dependence of items in sequences and cannot model users' short-term preferences; however, in sequential recommendation, recent items usually play a large part in decision-making. The FPMC method achieves better performance than POP and UBCF because it combines matrix factorization and Markov chains to model users' behavior sequences. However, FPMC has some limitations when it comes to retaining sequence information for a long time, and it does not fit long sequences well.

    Table 5 Recall and MRR of the proposed methods and baselines (best results are highlighted in bold)
  • Session-RNN: Session-RNN mainly captures users' short-term interests but does not consider their long-term interests, which limits its performance; users' long- and short-term preferences both play an important role in sequential recommendation.

  • Peephole-LSTM and Peephole-LSTM with time: The Peephole-LSTM does not work well due to the lack of time information. Compared to Peephole-LSTM, the Peephole-LSTM with time incorporates time information into the input and achieves slightly better performance in most cases. However, adding time information directly to the input is not entirely effective, and neither approach can capture or preserve users' long-term preferences accurately.

  • Time-LSTM: Time-LSTM incorporates time information into the sequence modeling process in a more effective way, so it achieves better performance than Peephole-LSTM and Peephole-LSTM with time. However, the lack of duration information in Time-LSTM prevents it from fully utilizing time information and from capturing users' preferences accurately, which limits its performance.

  • IDLSTM and IDLSTM-E: IDLSTM and IDLSTM-E outperform all baselines in terms of Recall and MRR, which shows that it is better to use the duration gate and the interval gate together when performing recommendation. The reason is that the proposed methods can effectively utilize time interval and duration information via the gate mechanisms; the results also show that time interval and duration are both important for sequence modeling and for capturing users' long- and short-term preferences. Furthermore, IDLSTM-E performs considerably better than IDLSTM because its embedding layer, based on the GloVe method, can effectively capture global information in users' behavior sequences, which enables better performance in sequential recommendation.

  • IDLSTM, IDLSTM-C and IDLSTM-EC: Extensive experiments show that IDLSTM-C does not significantly improve upon IDLSTM in terms of accuracy. Efficiency comparisons (the time each model takes to run an epoch) are listed in Table 6, which shows that IDLSTM-C and IDLSTM-EC are approximately 6% faster than IDLSTM and IDLSTM-E. Therefore, the coupled input and forget gate improves the efficiency of the proposed model by reducing the number of parameters that need to be trained. Traditional methods require much less time due to their concise structure, but they cannot achieve accurate results. In addition, since recommendation models are generally pre-trained offline in practical applications, comparing test time is more meaningful; all recommendation methods can perform recommendation during the test phase within comparable and reasonable time.

    Table 6 Time taken for each model to run an epoch

In conclusion, traditional recommendation methods (such as UBCF and BPR) do not consider dynamic changes in user interests, which leads to poor results. Sequence-based methods (such as Session-RNN, Peephole-LSTM, Peephole-LSTM with time, Time-LSTM and IDLSTM(-EC)) achieve better performance than traditional recommendation methods due to the effectiveness of RNNs in modeling users' behavior sequences. In particular, the proposed model makes better use of time interval and duration information, which is very important for sequence modeling and sequential prediction/recommendation. In addition, the improvement of IDLSTM-EC over IDLSTM shows that the global information (related to users' long-term preferences) captured by IDLSTM-EC is quite important in sequential recommendation.

5.5 Effect of the number of units

In this subsection, we evaluate the influence of the number of cell units and number of embedding layer units using results from two experiments. In the first experiment, the effect of the number of cells was evaluated. The best number of units in the first experiment was then used in the second experiment to evaluate the effect of the number of embedding layer units.

5.5.1 Effect of the number of cell units

We first set the number of cell units to 16, 32, 64, 128, 256 and 1024, and then evaluated the impact of the number of cell units on IDLSTM and IDLSTM-C in terms of Recall and MRR. As shown in Fig. 5, the two models achieve similar Recall@10 and MRR@10. In addition, we found that IDLSTM-C takes less time than IDLSTM as the number of cell units increases, while the gains in Recall@10 and MRR@10 gradually level off. In particular, once the number of cell units exceeds 128, Recall@10 and MRR@10 improve little. Thus, the optimal number of cell units is 128, which enables the proposed model to capture the most important information.

Fig. 5 The effect of different numbers of cell units on Recall@10, MRR@10 and the time of an epoch

5.5.2 Effect of the number of embedding units

The effect of different numbers of embedding units was further investigated with the number of cell units set to 128, and the results are shown in Fig. 6. In particular, the results without an embedding layer are also plotted at the zero point of the x-axis for comparison. The results indicate that when the embedding layer has fewer than 32 units, the proposed model performs worse than the model without an embedding layer. Therefore, the embedding layer needs enough units to ensure that sequence information is well preserved in the recommendation model.

Fig. 6 The effect of different numbers of embedding units on Recall@10, MRR@10 and the time of an epoch

As shown in Fig. 6c and f, although the time varies, the overall fluctuation is small because the embedding layer's parameters account for only a small part of the model. Therefore, the number of embedding layer units does not have much impact on the efficiency of the proposed approach.

Furthermore, as shown in Fig. 6a, b, d and e, when the number of embedding layer units increases from 16 to 128, Recall@10 and MRR@10 also improve. However, when the number of embedding units exceeds 128, Recall@10 and MRR@10 show no significant increase and may even trend downward, because an excessive number of units can cause overfitting. Therefore, the number of embedding layer units was set to 128.

5.6 Impact of data sparsity

We also evaluated the proposed methods against the baselines on datasets with different sparsity to verify their ability to deal with sparse data. Specifically, items with a frequency of less than d were removed from the dataset, where d was set to 0, 5, 10, 15, 20, 30, 40 and 50; the sparsity of the corresponding datasets was (97.04%, 94.50%, 93.38%, 92.65%, 91.87%, 90.73%, 90.15%, 89.61%) for the game dataset and (99.72%, 98.68%, 97.87%, 96.15%, 94.74%, 88.89%, 86.23%, 85.71%) for LastFM. As shown in Fig. 7, the performance of the various methods did not significantly decrease as sparsity increased, because our task is to recommend next new items that users might be interested in but have not yet interacted with. Note that excluding low-frequency items to change the sparsity of the dataset may also remove key items or correlations. For example, if "C" is removed from the sequence, the next-new-item recommender (our work) performs one prediction, {A,A} → "B"; performance decreases if "C" was the key item for predicting "B". Even so, we found that the proposed method outperforms the baselines in terms of Recall@10 and MRR@10. Therefore, we conclude that our methods can deal with data of different sparsity effectively.
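The filtering protocol is straightforward to reproduce. The following hypothetical sketch drops items occurring fewer than d times and reports the sparsity of the remaining user-item matrix (function and field names are ours):

```python
from collections import Counter

def filter_and_sparsity(interactions, d):
    """interactions: (user, item) events. Drop items with frequency < d and
    return the kept events plus the sparsity of the resulting matrix."""
    freq = Counter(item for _, item in interactions)
    kept = [(u, i) for u, i in interactions if freq[i] >= d]
    users = {u for u, _ in kept}
    items = {i for _, i in kept}
    observed = len(set(kept))                         # distinct user-item pairs
    return kept, 1.0 - observed / (len(users) * len(items))
```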

Fig. 7 Comparison of different models using data with different sparsity

6 Conclusions

In this paper, we have proposed a novel time-aware sequence modeling method and applied it to next-new-item recommendation. Specifically, the proposed method introduces two gates: a duration gate for modeling user preferences and an interval gate for modeling the impact of the current item on the next item in a sequence. In addition, we adopted the GloVe method to take advantage of global context information and further improved efficiency with a coupled input and forget gate. Experiments on real-world datasets show that the proposed model outperforms state-of-the-art baselines, including LSTM and its variants. Furthermore, the experimental results also demonstrate the effectiveness of the proposed methods when handling sparse data.

In the future, we will try utilizing an attention mechanism to extract key features and their relevance from sequences. It is generally agreed that users’ personalized interests play an important role in recommendations. Therefore, we will try enhancing the model’s ability to adapt to users with different preferences. In addition, we will also consider incorporating content information such as text and description to further improve the performance of sequential recommendation.