1 Introduction

With the rapid development of the information age, Internet users are immersed in an environment of information explosion. An ensuing problem is how users can quickly and accurately obtain the information they need from the huge and complex mass of web data. Recommender systems [1, 2] have been proposed to solve this problem thanks to their powerful ability to capture users' interests, and have attracted a lot of attention from researchers. However, traditional recommender systems have difficulty capturing users' short-term preferences: historical interaction data are noisy, and historical information and personal profiles fail to reflect short-term preference shifts [3, 4], which leads to unsatisfactory recommendation results. Session-based recommendation was born to address these problems.

Session-based recommendation aims to predict the next item a user is likely to click based on the user's click sequence in the recent period. Because of this characteristic, session-based recommendation can better capture shifts in users' current interest preferences, thus improving the effectiveness and accuracy of the recommender system. How to learn the features of items in a session and capture user preferences from session data has therefore become a core problem in this field. As shown in Fig. 1, among the four user browsing records, sessions S1, S2, and S3 reflect users' interest in electronics and accessories, while the user in session S4 is clearly more likely to buy food.

Fig. 1 A toy example of item conversion in different sessions

In response to the problem illustrated by the above example, many researchers have explored how to obtain user preferences and generate accurate recommendations from such session data. Early studies of session-based recommendation mainly used collaborative filtering (CF) [5,6,7] and Markov chains (MC) [8, 9], which largely ignore item interactions in session data and are ineffective on sparse data. Matrix factorization (MF) [10,11,12] methods perform well on sparse matrices, but suffer from high time complexity and poor interpretability. Subsequently, recurrent neural networks (RNNs) [13, 14] and attention mechanisms [15,16,17], owing to their strength in handling sequential data, were introduced into recommender systems and used to capture the sequential relationships within a session. Although such sequence-based approaches bring some improvement to session-based recommendation, they still have many limitations. Thanks to the higher-order feature representation capability of graph structures, graph neural networks (GNNs) [18,19,20,21] have been widely used in various fields in recent years, and have become a mainstream approach in recommender systems in particular. Session-based recommendation based on graph neural networks [22,23,24] combines multiple sessions into a graph structure and learns the similarity of items in a session; although this provides more information for items, transferring information only through connected pairs of nodes still limits the performance of session-based recommendation.

Compared with an ordinary graph, where edges connect pairs of nodes, a hypergraph [25] connects two or more nodes per hyperedge, which can effectively transfer information among multiple nodes. By introducing the hypergraph structure, a hyperedge can connect multiple items when constructing a session graph, effectively capturing the complex relationships among two or more items in the session data. In session-based recommendation, researchers usually propagate information about all items at the session level [26, 27], which allows effective learning of the overall features of a session but ignores, to some extent, the structural features of the items within the session. In this paper, Session-based Recommendation with Fusion of Hypergraph Item Global and Context Features (FHGIGC) is proposed. Firstly, all session data are constructed as a global hypergraph in which different sessions are interconnected by shared nodes, so that a hypergraph neural network can learn item features from related sessions. Then, for each session, a local hypergraph is constructed by using context windows as hyperedges, and the interrelationships of items within the session are learned from their contextual information. Next, the learned item features are fused according to their importance by an attention mechanism to obtain the final item features and session features, which are finally used for the recommendation task. The main contributions can be summarized as follows:

  • To learn item features across all sessions, the sessions are constructed as a global hypergraph and information is propagated through a Hypergraph Convolutional Network (HGCN). Then, within each individual session, a local hypergraph is constructed from the contextual information of items, and information is propagated through a Hypergraph Attention Network (HGAT) to learn the item interaction features within the session.

  • An attention mechanism learns the different importance of inter-session holistic features and intra-session item relationship features, fuses the features learned from the global and local hypergraph structures accordingly, and finally generates personalized recommendations based on the fused feature information.

  • Extensive experiments were conducted on three datasets, Tmall, Diginetica, and RetailRocket, demonstrating that the FHGIGC model outperforms existing baseline models; ablation experiments further show that each designed module improves recommendation accuracy.

2 Related works

2.1 SBR based on traditional methods

Collaborative filtering makes recommendations mainly by calculating the similarity between users and items. Item-KNN [5] inferred items that might interest users by calculating the cosine similarity between items and recommending the most similar ones. With the development of machine learning, much research has combined machine learning with collaborative filtering. NCF [28] used a multilayer perceptron to learn user-item interactions. J-NCF [29] further combined neural collaborative filtering to learn deep user-item features from user-item interactions. Although these methods achieved some success, they ignore user preference shifts within the sequence, which affects the accuracy of the recommendation results.

Markov chain-based recommender systems predict subsequent items from the user's current session. FPMC [8] captures the sequential information of a session by combining a matrix factorization model with a first-order Markov chain. Shani et al. [30] predicted user preferences with a Markov decision process over users' sequential purchase behavior. Similarly, Zimdars et al. [9] treated the user interactions in a session as a sequence and made recommendations based on temporal relationships. However, Markov chain-based approaches focus only on first-order interaction relations and extract local features from the current sequence alone, ignoring global session information, which results in unsatisfactory recommendations.

2.2 SBR based on deep learning

Recurrent neural networks were first proposed for natural language processing, and their ability to model sequence structure has also attracted interest in session-based recommendation [31]. GRU4REC [13] first applied RNNs to session-based recommendation, iterating over the item sequence of a session through multilayer gated recurrent units (GRUs). HRNN [32] proposed a hierarchical recurrent neural network to model variations in users' personal preferences and to improve personalization by fusing personal history information. Jannach et al. [33] combined the recommendation results of GRU4REC and KNN in a fixed ratio. Considering that user interests change dynamically, ISLF [34] combines a variational autoencoder (VAE) with RNNs to capture user preferences in session sequences. These methods address, to some extent, the traditional methods' treatment of items in isolation and their inability to model sequential continuity over a period of time, but they still fail to distinguish the importance of different items within a session.

The attention mechanism computes the importance of each input using key-value pairs according to the task requirements, so as to focus on the more important inputs. NARM [35] captures the user's main preferences in the current session sequence by introducing an attention mechanism, and also incorporates the user's history information into the recommendation task. STAMP [36] implements session-based recommendation with a short-term memory attention model that fuses attention mechanisms and RNNs. HARSAM [37] improves recommendation capability by introducing a soft attention mechanism to model the user's interaction information and learn latent user representations. Atten-Mixer [38] leverages both concept-view and instance-view readouts to achieve multi-level reasoning over item transitions. The introduction of attention mechanisms has enabled deep learning approaches to make great progress in session-based recommendation. However, these approaches rely excessively on information about adjacent items in the sequential structure of a session and cannot learn the interrelationships of non-adjacent items.

2.3 SBR based on GNN

Graph neural networks provide a convenient and intuitive perspective for modeling complex relationships among nodes, and their applications in recommender systems have been expanding as research on graph neural networks has reached new heights in recent years. SR-GNN [39] proposed a graph neural network approach that models the relationships of items in a session, capturing user and item interaction features while preserving the structural information between items. PGGNN [40] proposes a dual position encoding (DPE) gated graph neural network to learn the positional information of items in a session. FGNN [41] generates session embeddings through a weighted graph attention layer (WGAT) and makes recommendations in the context of cross-session information. FGNN-BCS [42] further constructs a broadly connected session (BCS) graph that connects sessions to improve session embeddings. GC-SAN [43] learns long-term dependencies between session items by combining graph neural networks with self-attention mechanisms. GARG [44] further proposed an approach based on graph convolutional networks and attention mechanisms to improve recommendation performance. To avoid information loss when constructing session graphs, LESSR [45] modeled sessions as directed graphs with edge-order preserving aggregation (EOPA) and a fast graph attention layer. COTREC [46] improves session data utilization through a self-supervised learning approach based on graph convolutional networks. In all these methods, the GNN can only capture relationships between node pairs and cannot capture the higher-order relationships among multiple items in a session, which limits the effectiveness of recommendations to some extent.

A hypergraph is a generalized graph structure that can handle higher-order relationships among multiple nodes: each hyperedge can connect two or more nodes and treat them as a whole. This form of connection makes the hypergraph structure highly expressive for complex data relationships. DHCN [26] captures the relationships of all items in a session by modeling the session as a hyperedge and enhances the model's learning capability through contrastive learning. SHARE [47] creates a hypergraph for each session to capture the relevance of items within it. HyperS2Rec [27] combines the hypergraph structure with the sequential structure to learn the global and sequential relationships between session items, respectively, improving recommendation capability to some extent. The FHGIGC model proposed in this paper combines global and local hypergraph structures to effectively capture item consistency, relevant session information, and the contextual interaction information of items in session data.

3 The proposed model

3.1 Problem setup

The set of all items is denoted as \(V=\{v_1,v_2,\cdots ,v_N\}\), where N is the total number of items, and session m is denoted as \(S_m=\{v_{m1},v_{m2},\cdots ,v_{mr}\}\), where r is the length of the session. The purpose of session-based recommendation is to predict the next item of possible interest to the user based on such a session, i.e., \(v_{m(r+1)}\).

In the FHGIGC model proposed in this paper, sessions are modeled as global and local hypergraphs, and information is propagated through a hypergraph convolutional network and a hypergraph attention network, respectively. Figure 2 shows the overall framework of the proposed model, which consists of four main parts. The first part is the global item feature extraction module, which captures the item consistency information of each session and the information in related sessions. The second part is the item context interaction feature extraction module, which captures the contextual information of items within a session through the hypergraph attention network. The third part is the item feature fusion module, which fuses the global and local item feature vectors through an attention mechanism. The fourth part is the session feature fusion and prediction module, in which the item features in a session are fused by an attention mechanism with position embedding to obtain the corresponding session feature, which is then used to compute the probabilities of candidate items; the highest-ranked items are recommended to users as a recommendation list.

Fig. 2 The overall framework of the FHGIGC model. Module I is the global hypergraph based on HGCN, which learns global item features. Module II is the local hypergraph based on HGAT, which learns local item features. Module III fuses the global and local item features to generate fused item features. Module IV incorporates an attention mechanism with position embedding to fuse item features into session features and make recommendations

3.2 Global item feature extraction module

In order to capture the higher-order relationships among items in all sessions rather than between pairs of nodes, each session is treated as a hyperedge, so that different sessions are connected to each other by shared nodes while the nodes within each session are connected to each other within the hyperedge. The hyperedges \(\epsilon _1,\ \epsilon _2,\ \epsilon _3\) constructed from sessions \(S_1,\ S_2,\ S_3\), respectively, are shown in Fig. 2, and the global features of items are effectively captured by this structure.

The global hypergraph is denoted as \(G_g=(V,E)\), where V is the set of nodes consisting of the N items in all sessions and E is the set of all hyperedges; each hyperedge contains two or more nodes, and the total number of hyperedges is M. Each hyperedge corresponds to a single session, and different hyperedges are connected to each other through the nodes shared between sessions. The connectivity of the hypergraph is represented by an incidence matrix \(\textrm{H}\in R^{N\times M}\) with \(\textrm{H}_{ij}=1\) when node \(v_i\in \epsilon _j\) and \(\textrm{H}_{ij}=0\) otherwise. Each hyperedge \(\epsilon _j\) is initialized with weight \(\textrm{W}_{jj}\), and all weights form a diagonal matrix \(\textrm{W}\in R^{M\times M}\). The degrees of nodes and hyperedges are defined as \(\textrm{D}_{ii}=\sum _{j=1}^{M}\ \textrm{W}_{jj}\textrm{H}_{ij}\) and \(\textrm{B}_{jj}=\sum _{i=1}^{N}\ \textrm{H}_{ij}\), respectively, and form the diagonal matrices \(\textrm{D}\in R^{N\times N}\) and \(\textrm{B}\in R^{M\times M}\).
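To make this construction concrete, the following sketch builds the incidence matrix and degree matrices from raw session data. It is a minimal illustration assuming sessions are lists of integer item IDs; the function and variable names are ours, not from the original implementation.

```python
import numpy as np

def build_global_hypergraph(sessions, num_items):
    """Each session becomes one hyperedge; H[i, j] = 1 iff item i occurs in session j."""
    num_edges = len(sessions)
    H = np.zeros((num_items, num_edges))
    for j, session in enumerate(sessions):
        for item in set(session):
            H[item, j] = 1.0
    W = np.eye(num_edges)              # hyperedge weights, initialized to 1
    D = np.diag((H @ W).sum(axis=1))   # node degrees:      D_ii = sum_j W_jj H_ij
    B = np.diag(H.sum(axis=0))         # hyperedge degrees: B_jj = sum_i H_ij
    return H, W, D, B

# e.g. three sessions over five items; items 1, 2, 3 are shared nodes linking the hyperedges
H, W, D, B = build_global_hypergraph([[0, 1, 2], [1, 3], [2, 3, 4]], num_items=5)
```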

By constructing the hypergraph in the above way, a global hypergraph whose hyperedges are all the sessions in the dataset is obtained. It has been shown [48] that removing the nonlinear activation functions and the convolutional parameter matrices between the layers of a graph convolutional network does not harm downstream tasks, while greatly reducing the computational cost of information propagation. In addition, since information can be transmitted across multiple nodes within a hyperedge, global dependencies can be better captured by a hypergraph convolutional network. Based on these advantages, after obtaining the global hypergraph introduced above, the model propagates information through the hypergraph convolutional network (HGCN) [26, 49], and the propagation process is calculated as follows:

$$\begin{aligned} \textrm{x}^{(l+1)}_i = \sum ^N_{t=1} \sum ^M_{j=1} \textrm{H}_{ij}\textrm{H}_{tj}\textrm{W}_{jj}\textrm{x}^{(l)}_t \end{aligned}$$
(1)

where \(\textrm{x}_t^{(l)}\) and \(\textrm{x}_i^{(l+1)}\) are the input representation of item t at layer l and the output representation of item i at layer \(l+1\), respectively. \(\sum _{t=1}^{N}\textrm{H}_{tj}\textrm{W}_{jj}\textrm{x}_t^{(l)}\) is the hidden representation of hyperedge j at layer l, which represents how the hyperedge collects information from the nodes it connects. With row normalization, the matrix form of (1) is:

$$\begin{aligned} \textrm{X}^{(l+1)} = \textrm{D}^{-1}\textrm{HWB}^{-1}\mathrm{H^TX}^{(l)} \end{aligned}$$
(2)

where \(\textrm{X}^{(l)}\) and \(\textrm{X}^{(l+1)}\) are the input and output of layer l, respectively; the zeroth layer is obtained by random initialization, and the weight matrix is initialized to the identity matrix.

After that, the final output of the multilayer HGCN is obtained by averaging the outputs of all layers, calculated as follows:

$$\begin{aligned} \textrm{X}^G=\frac{1}{L+1} \sum _{l=0}^L \textrm{X}^{(l)} \end{aligned}$$
(3)
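A minimal sketch of this parameter-free propagation (Eqs. 2-3), reusing the matrices built above; it assumes every item occurs in at least one session so that D is invertible, and uses matrix inversion for readability even though the diagonal inverses would be computed elementwise in practice:

```python
import numpy as np

def hgcn_propagate(X0, H, W, D, B, num_layers=3):
    """Simplified HGCN: X^(l+1) = D^-1 H W B^-1 H^T X^(l), then average layers 0..L (Eq. 3)."""
    P = np.linalg.inv(D) @ H @ W @ np.linalg.inv(B) @ H.T
    outputs, X = [X0], X0
    for _ in range(num_layers):
        X = P @ X                      # one propagation step, Eq. 2
        outputs.append(X)
    return np.mean(outputs, axis=0)    # X^G, Eq. 3

X_global = hgcn_propagate(np.random.randn(5, 16), H, W, D, B)  # randomly initialized X^(0), d = 16
```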

3.3 Item context interaction feature extraction module

For each session, we consider the contextual information that the other items in the session can provide for each item. Inspired by Wang et al. [47], we set up context windows of different sizes in each session as the hyperedges of the local hypergraph; the items covered by a window are the nodes connected by the corresponding hyperedge. The process of constructing multiple hyperedges through the context windows is illustrated in Fig. 3, yielding the local hypergraph of the current session.

Fig. 3 Constructing hyperedges through contextual windows

The local hypergraph is denoted as \(G_l=(V_l,E_l)\), where \(V_l\) is the set of all items in the current session and \(E_l\) is the set of all hyperedges obtained through the context windows.
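The window construction can be sketched as follows (an illustration of the scheme in Fig. 3; the maximum window size is a hyperparameter, discussed in Section 4.7):

```python
def context_window_hyperedges(session, max_window):
    """Slide windows of size 2..max_window over the session; each window is one hyperedge."""
    hyperedges = []
    for w in range(2, max_window + 1):
        for start in range(len(session) - w + 1):
            hyperedges.append(session[start:start + w])
    return hyperedges

# context_window_hyperedges([5, 9, 2, 7], 3)
# -> [[5, 9], [9, 2], [2, 7], [5, 9, 2], [9, 2, 7]]
```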

Although the hypergraph convolutional network introduced in Section 3.2 simplifies the hypergraph structure to improve computational efficiency, in the local hypergraph only some of the nodes on a hyperedge may provide useful information for that hyperedge, and each hyperedge may affect its nodes differently. Therefore, we propagate information over the local hypergraph through an attention-based network, the hypergraph attention network (HGAT) [25, 47, 49]. Unlike a traditional graph structure, in which information is propagated between neighboring nodes, the hyperedges are treated as a kind of hypernode, and the node representation learning process is decomposed into two steps.

3.3.1 Node to hyperedge

The information is first passed from the node to the hyperedge, and the propagation through the attention mechanism is calculated as follows:

$$\begin{aligned} \textrm{e}_j^{(l)}= & {} \sum _{t\in \mathcal N_j} \textrm{h}_{t-j}^{(l)} \end{aligned}$$
(4)
$$\begin{aligned} \textrm{h}_{t-j}^{(l)}= & {} \alpha _{jt}\textrm{W}_1^{(l)}\textrm{x}_t^{(l-1)} \end{aligned}$$
(5)

where \(\mathcal {N}_j\) denotes the set of all items connected by hyperedge j, \(\textrm{e}_j^{(l)}\) denotes the feature vector of hyperedge j at layer l, \(\textrm{x}_t^{(l-1)}\) denotes the input representation of item t to layer l, \(\textrm{h}_{t-j}^{(l)}\) denotes the information passed from item t to hyperedge j, and \(\textrm{W}_1^{(l)}\in R^{d\times d}\) is a trainable parameter matrix. \(\alpha _{jt}\) is the attention coefficient between hyperedge j and item t, calculated as follows:

$$\begin{aligned} \alpha _{jt}=\frac{S(\hat{\textrm{W}}_1^{(l)}\textrm{x}_t^{(l-1)},\textrm{u}^{(l)})}{\sum _{f\in \mathcal N_j}S(\hat{\textrm{W}}_1^{(l)}\textrm{x}_f^{(l-1)},\textrm{u}^{(l)})} \end{aligned}$$
(6)

where \(\hat{\textrm{W}}_1^{(l)}\in R^{d\times d}\), \(\textrm{u}^{(l)}\in R^d\) are the trainable parameter matrix and attention parameters, respectively, and \(S(*)\) denotes the function for calculating the similarity score, which in the experiments is computed by scaling the dot product attention [50, 51] as follows:

$$\begin{aligned} S(\textrm{a},\textrm{b})=\frac{\textrm{a}^T\textrm{b}}{\sqrt{d}} \end{aligned}$$
(7)

3.3.2 Hyperedge to node

After completing the first step of propagation, similarly, the propagation from the hyperedge to the node is calculated as follows:

$$\begin{aligned} \textrm{x}_t^{(l)}= & {} \sum _{j\in \mathcal Y_t} \textrm{h}_{j-t}^{(l)} \end{aligned}$$
(8)
$$\begin{aligned} \textrm{h}_{j-t}^{(l)}= & {} \beta _{tj}\textrm{W}_2^{(l)}\textrm{e}_j^{(l)} \end{aligned}$$
(9)

where \(\mathcal {Y}_t\) denotes the set of all hyperedges connected to item t, \(\textrm{h}_{j-t}^{(l)}\) denotes the information passed from hyperedge j to item t, and \(\textrm{W}_2^{(l)}\in R^{d\times d}\) is a trainable parameter matrix. \(\beta _{tj}\) denotes the attention coefficient between item t and hyperedge j, calculated as follows:

$$\begin{aligned} \beta _{tj}=\frac{S(\hat{\textrm{W}}_2^{(l)}\textrm{e}_j^{(l)},\textrm{W}_3^{(l)}\textrm{x}_t^{(l-1)})}{\sum _{f\in \mathcal Y_t}S(\hat{\textrm{W}}_2^{(l)}\textrm{e}_f^{(l)},\textrm{W}_3^{(l)}\textrm{x}_t^{(l-1)})} \end{aligned}$$
(10)

where \(\hat{\textrm{W}}_2^{(l)},\ \textrm{W}_3^{(l)}\in R^{d\times d}\) are trainable parameter matrices.
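The two-step propagation of Eqs. 4-10 can be sketched as a single PyTorch layer as below. This is our own dense-matrix illustration, not the reference implementation: the attention weights are computed with a masked softmax for numerical stability rather than the raw similarity ratios written in Eqs. 6 and 10.

```python
import torch
import torch.nn as nn

class HGATLayer(nn.Module):
    """One HGAT layer: node -> hyperedge (Eqs. 4-6), then hyperedge -> node (Eqs. 8-10)."""
    def __init__(self, d):
        super().__init__()
        self.W1 = nn.Linear(d, d, bias=False)      # message transform, node -> hyperedge
        self.W1_hat = nn.Linear(d, d, bias=False)  # similarity transform in Eq. 6
        self.W2 = nn.Linear(d, d, bias=False)      # message transform, hyperedge -> node
        self.W2_hat = nn.Linear(d, d, bias=False)  # similarity transforms in Eq. 10
        self.W3 = nn.Linear(d, d, bias=False)
        self.u = nn.Parameter(torch.randn(d))      # attention vector u
        self.scale = d ** 0.5                      # scaled dot product, Eq. 7

    def forward(self, x, H_mask):  # x: N x d node features, H_mask: N x M incidence (0/1)
        neg = torch.finfo(x.dtype).min
        # node -> hyperedge: attention alpha_jt over the nodes t lying on hyperedge j
        s = (self.W1_hat(x) @ self.u) / self.scale                        # shape N
        s = s.unsqueeze(0).expand_as(H_mask.T).masked_fill(H_mask.T == 0, neg)
        alpha = torch.softmax(s, dim=1)                                   # M x N
        e = alpha @ self.W1(x)                                            # hyperedge features, M x d
        # hyperedge -> node: attention beta_tj over the hyperedges j containing node t
        sim = (self.W2_hat(e) @ self.W3(x).T) / self.scale                # M x N
        beta = torch.softmax(sim.T.masked_fill(H_mask == 0, neg), dim=1)  # N x M
        return beta @ self.W2(e)                                          # updated node features
```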

3.4 Item feature fusion module

Considering that global and local information have different importance for a target node, the importance of the representations learned in the global and local structures is captured by an attention mechanism [27], which assigns them different weights for feature fusion. The attention coefficients are calculated as follows:

$$\begin{aligned}{} & {} \mu _i^G=\textrm{q}_1^Ttanh(\textrm{W}_4\textrm{x}_i^G+\textrm{b}_1) \end{aligned}$$
(11)
$$\begin{aligned}{} & {} \mu _i^L=\textrm{q}_1^Ttanh(\textrm{W}_4\textrm{x}_i^L+\textrm{b}_1) \end{aligned}$$
(12)

where \(\textrm{x}_i^G\) and \(\textrm{x}_i^L\) denote the representations learned for item i in the global and local hypergraphs, respectively, and \(\textrm{q}_1\in R^d\), \(\textrm{W}_4\in R^{d\times d}\), \(\textrm{b}_1\in R^d\) are the attention parameters, a trainable parameter matrix, and a bias vector. \(tanh(*)\) is the activation function, which helps prevent vanishing or exploding gradients during training. The attention coefficients are normalized by the softmax function and expressed as:

$$\begin{aligned} \gamma _i^G=\frac{\textrm{exp}(\mu _i^G)}{\textrm{exp}(\mu _i^G)+\textrm{exp}(\mu _i^L)}=1-\gamma _i^L \end{aligned}$$
(13)

Finally, by linearly combining global and local information through the attention coefficients, the final representation can be calculated as follows:

$$\begin{aligned} \textrm{x}_i=\gamma _i^G\textrm{x}_i^G+\gamma _i^L\textrm{x}_i^L \end{aligned}$$
(14)
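A sketch of this fusion step (Eqs. 11-14), where the softmax over the two scores realizes Eq. 13; the class and parameter names are ours:

```python
import torch
import torch.nn as nn

class ItemFusion(nn.Module):
    """Attention fusion of global and local item features (Eqs. 11-14)."""
    def __init__(self, d):
        super().__init__()
        self.W4 = nn.Linear(d, d)               # W4 x + b1
        self.q1 = nn.Parameter(torch.randn(d))

    def forward(self, x_g, x_l):                # both N x d
        mu_g = torch.tanh(self.W4(x_g)) @ self.q1                   # Eq. 11
        mu_l = torch.tanh(self.W4(x_l)) @ self.q1                   # Eq. 12
        gamma = torch.softmax(torch.stack([mu_g, mu_l], 1), dim=1)  # Eq. 13, N x 2
        return gamma[:, :1] * x_g + gamma[:, 1:] * x_l              # Eq. 14
```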

3.5 Session feature fusion and prediction module

A session representation can be obtained by aggregating the features of the items in the session. It has been shown [40, 52] that each item in a session is influenced by its position, and that introducing reverse position embeddings plays an important role in capturing user preferences. Therefore, we introduce an attention mechanism with position embedding to capture the importance of items at different positions when computing the session representation. The position embeddings are initialized as \(\textrm{P}=\{\textrm{p}_1,\textrm{p}_2,\cdots ,\textrm{p}_m\}\), \(\textrm{p}_1,\textrm{p}_2,\cdots ,\textrm{p}_m\in R^d\), where m is the maximum session length. The initialized position embedding is concatenated with the item features and linearly transformed as follows:

$$\begin{aligned} \textrm{x}_{j,i}^*=tanh(\textrm{W}_5[\textrm{x}_{j,i}||\textrm{p}_{m_j-i+1}]+\textrm{b}_2) \end{aligned}$$
(15)

where \(\textrm{x}_{j,i}\) and \(\textrm{x}_{j,i}^*\) denote the representation of item i in session j before and after the transformation, respectively, \(\textrm{W}_5\in R^{d\times 2d}\) and \(\textrm{b}_2\in R^d\) are a trainable parameter matrix and a bias vector, and \(m_j\) denotes the length of session j. The \(||\) symbol denotes vector concatenation.

The attention coefficient is calculated as follows:

$$\begin{aligned}{} & {} \textrm{s}_j'=\frac{1}{m_j}\sum _{i=1}^{m_j}\textrm{x}_{j,i} \end{aligned}$$
(16)
$$\begin{aligned}{} & {} \theta _{j,i}=\textrm{q}_2^T\sigma (\textrm{W}_6\textrm{s}_j'+\textrm{W}_7\textrm{x}_{j,i}^*+\textrm{b}_3) \end{aligned}$$
(17)

where \(\textrm{q}_2\in R^d\), \(\textrm{W}_6\in R^{d\times d}\), \(\textrm{W}_7\in R^{d\times d}\), \(\textrm{b}_3\in R^d\) are the attention parameters, trainable parameter matrices, and a bias vector, respectively, \(\sigma (*)\) denotes the sigmoid activation function, and \(\textrm{s}_j^\prime \) is the average of all item representations within the session.

The calculated attention coefficients distinguish the importance of different items for the session feature, so the session feature can be obtained as a linear combination of item features:

$$\begin{aligned} \textrm{s}_j=\sum _{i=1}^{m_j}\theta _{j,i}\textrm{x}_{j,i} \end{aligned}$$
(18)
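A single-session sketch of this readout (Eqs. 15-18); batching and padding are omitted for clarity, and the reverse indexing of the position embeddings follows \(\textrm{p}_{m_j-i+1}\):

```python
import torch
import torch.nn as nn

class SessionReadout(nn.Module):
    """Position-aware attention readout fusing item features into one session vector."""
    def __init__(self, d, max_len):
        super().__init__()
        self.pos = nn.Parameter(torch.randn(max_len, d))  # p_1 .. p_m
        self.W5 = nn.Linear(2 * d, d)                     # Eq. 15, carries the bias b2
        self.W6 = nn.Linear(d, d, bias=False)
        self.W7 = nn.Linear(d, d)                         # carries the bias b3
        self.q2 = nn.Parameter(torch.randn(d))

    def forward(self, x):                                 # x: m_j x d, one session
        m_j = x.size(0)
        rev_pos = self.pos[:m_j].flip(0)                  # item i gets p_{m_j - i + 1}
        x_star = torch.tanh(self.W5(torch.cat([x, rev_pos], dim=1)))       # Eq. 15
        s_avg = x.mean(dim=0, keepdim=True)                                # Eq. 16
        theta = torch.sigmoid(self.W6(s_avg) + self.W7(x_star)) @ self.q2  # Eq. 17
        return (theta.unsqueeze(1) * x).sum(dim=0)                         # Eq. 18
```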

The final score is calculated by multiplying the session representation \(\textrm{s}_j\) with the candidate item embedding \(\textrm{x}_i\), and the probabilities are output through the softmax function:

$$\begin{aligned}{} & {} \hat{\textrm{z}}_{j,i}=\textrm{s}_j^T\textrm{x}_i \end{aligned}$$
(19)
$$\begin{aligned}{} & {} \hat{\textrm{y}}_j=softmax(\hat{\textrm{z}}_j) \end{aligned}$$
(20)

Finally, the K items with the highest scores form the recommendation list presented to users.

For each session \(\textrm{s}_j\), the loss function is defined as follows:

$$\begin{aligned} \mathcal {L}=-\sum _{i=1}^N\left( \textrm{y}_{j,i}\log (\hat{\textrm{y}}_{j,i})+(1-\textrm{y}_{j,i})\log (1-\hat{\textrm{y}}_{j,i})\right) \end{aligned}$$
(21)

where \(\hat{\textrm{y}}_{j,i}\) denotes the predicted probability of item i for session \(\textrm{s}_j\), and \(\textrm{y}_{j,i}=1\) if item i is the ground-truth next item of session \(\textrm{s}_j\), otherwise \(\textrm{y}_{j,i}=0\).
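Scoring and training objective in one short sketch (Eqs. 19-21); a small epsilon is added inside the logarithms for numerical stability, which the equations leave implicit:

```python
import torch
import torch.nn.functional as F

def score_and_loss(s_j, item_emb, target, k=20, eps=1e-10):
    """s_j: session vector (d,), item_emb: N x d candidate table, target: next-item id."""
    z = item_emb @ s_j                                      # Eq. 19: z_i = s_j^T x_i
    y_hat = F.softmax(z, dim=0)                             # Eq. 20
    y = F.one_hot(torch.tensor(target), item_emb.size(0)).float()
    loss = -(y * torch.log(y_hat + eps)
             + (1 - y) * torch.log(1 - y_hat + eps)).sum()  # Eq. 21
    return y_hat.topk(k).indices, loss                      # top-K list and training loss
```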

4 Experiment

The model was evaluated on three datasets, Tmall, Diginetica, and RetailRocket, and the following three questions were analyzed to verify the validity of the proposed FHGIGC model based on the experimental results:

  • Whether FHGIGC outperforms the baseline models on real datasets.

  • Whether the proposed FHGIGC improves recommendation performance, and what impact each module has on the model.

  • How different hyperparameter settings affect the overall prediction performance of the model.

In this section, the datasets used in the experiments and how the data were processed are first introduced; then the evaluation metrics and the parameter settings are given; finally, the experimental results are analyzed.

4.1 Dataset

Tmall: This dataset was first proposed in the IJCAI-15 competition and includes user shopping records from the Tmall shopping site from six months before the Double 11 to the day of the Double 11.

Diginetica: This dataset was proposed in the CIKM Cup 2016 and is session information extracted from e-commerce search engine logs, using only transaction data in this experiment.

RetailRocket: This dataset is a competition dataset published on Kaggle and contains 4.5 months of data from Russian online retailers.

For all of the above datasets, we follow the preprocessing of previous work [35, 39]: sessions containing only one item are deleted, items appearing fewer than five times in total are removed, and the dataset is expanded by splitting. A session \(S=\{v_1,v_2,\cdots ,v_t\}\) is divided into \(([v_1],v_2),([v_1,v_2],v_3),\cdots ,([v_1,v_2,\cdots ,v_{t-1}],v_t)\), where the last item of each subsequence is used as the label. The statistics of the three datasets are given in Table 1.
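This preprocessing can be sketched as follows (our own illustration of the standard pipeline from [35, 39], assuming sessions are lists of item IDs):

```python
from collections import Counter

def preprocess(sessions, min_count=5):
    """Drop rare items and length-1 sessions, then split each session into (prefix, label) pairs."""
    counts = Counter(item for s in sessions for item in s)
    cleaned = [[v for v in s if counts[v] >= min_count] for s in sessions]
    pairs = []
    for s in cleaned:
        if len(s) < 2:
            continue                      # sessions with only one item are deleted
        for t in range(1, len(s)):
            pairs.append((s[:t], s[t]))   # the last item of each subsequence is the label
    return pairs
```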

Table 1 Statistics of datasets

4.2 Metrics

In order to assess the recommendation performance of the model more intuitively, the FHGIGC model was evaluated and compared with the baseline models using two ranking-based evaluation metrics.

HR@K (Hit Rate): the proportion of test cases whose target item appears in the top-K recommendation list. For a test set of size N, HR@K is calculated as follows:

$$\begin{aligned} \mathrm{HR@K}=\frac{\textrm{Hit}_n}{N} \end{aligned}$$
(22)

where \(\textrm{Hit}_n\) denotes the number of test cases whose target item appears in the top-K recommendation list.

MRR@K (Mean Reciprocal Rank): the average of the reciprocal ranks of the target items in the top-K recommendation lists. For example, a target appearing in the first position is scored 1, the second position 0.5, and a target that does not appear in the top-K list is scored 0. For a test set of size N, MRR@K is calculated as follows:

$$\begin{aligned} \mathrm{MRR@K}=\frac{1}{N}\sum _{i=1}^N\frac{1}{\textrm{Rank}(i)} \end{aligned}$$
(23)

where \(\textrm{Rank}(i)\) denotes the rank of the target item of test case i in the top-K recommendation list (with the reciprocal taken as 0 if the target does not appear).
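Both metrics can be computed directly from the model's ranked lists, as in this sketch:

```python
def hr_mrr_at_k(ranked_lists, targets, k=20):
    """HR@K and MRR@K over N test cases (Eqs. 22-23); ranked_lists[i] is the model's
    ranked item list for case i, targets[i] the ground-truth next item."""
    hits, rr = 0, 0.0
    for ranked, target in zip(ranked_lists, targets):
        topk = list(ranked[:k])
        if target in topk:
            hits += 1
            rr += 1.0 / (topk.index(target) + 1)   # rank 1 -> 1, rank 2 -> 0.5, ...
    n = len(targets)
    return hits / n, rr / n
```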

4.3 Baselines

The ability of the FHGIGC model to improve recommendation performance was demonstrated by comparing its results with those of the following ten models.

POP: Recommends the K most frequently occurring items in the dataset.

Item-KNN [5]: Recommends similar items by calculating the cosine similarity between items.

FPMC [8]: Combines matrix factorization with a first-order Markov chain to generate sequence-based recommendations.

GRU4REC [13]: Computes session feature representations with gated recurrent units by treating the user's current session as a sequence.

NARM [35]: Improves GRU4REC by adding an attention layer to capture the important features in a session.

STAMP [36]: Replaces recurrent neural networks with multiple attention mechanisms and captures short-term preferences at the last node of the session.

SR-GNN [39]: Learns feature representations in graph neural networks by modeling multiple sessions as directed graphs, and fuses the feature representations of the items in a session by an attention mechanism.

GC-SAN [43]: Combines graph neural networks with self-attention to learn long-term dependencies of items in a session.

SHARE [47]: Sets contextual windows over a single session sequence to capture complex interactions between items within the session.

DHCN [26]: Constructs hypergraphs and line graphs to compute item and session feature representations through hypergraph neural networks, respectively, and learns the mutual information between them through contrastive learning.

4.4 Parameter setting

For a fair comparison, the embedding vector length and the batch size were both set to 100, following common practice in previous work. All learnable parameters are initialized from a Gaussian distribution with mean 0 and variance 0.1. The Adam optimizer is used, with the learning rate initialized to 0.001 and decayed by a factor of 0.1 every three epochs, and the model is trained for 30 epochs. Experiments show that the number of hypergraph network layers and the size of the sliding window affect the recommendation ability of the model differently on different datasets. For the baseline models, when both the evaluation metrics and the datasets are the same as in this paper, the best results reported in the original papers are used for comparison; otherwise, the baselines are run with their best-performing parameter settings.

4.5 Experiment result

The experimental results are presented in Table 2, from which it can be seen that the FHGIGC model proposed in this paper outperforms the baseline models on all three datasets. Specifically, we analyze the experimental results by answering the three questions posed above in Sections 4.5, 4.6, and 4.7, respectively.

Among the traditional methods, POP only recommends the most frequently occurring items to users without considering user preferences, and performs worst. The Item-KNN method recommends by calculating item similarity, but does not consider the sequential structure of session data, so it does not achieve better results. FPMC, which combines matrix factorization and Markov chain methods, outperforms Item-KNN on both the Tmall and RetailRocket datasets, which demonstrates to some extent the importance of sequential interaction information for capturing user preferences.

Table 2 Experimental results of FHGIGC and the comparison models, with the results of FHGIGC shown in bold

GRU4REC was the first model to apply RNNs to session-based recommendation, but its improvement in recommendation accuracy is quite limited, because it only considers the sequential relationships in a session and does not fully account for preference changes at different positions in the sequence. NARM and STAMP integrate attention mechanisms into RNN methods to capture shifts in user preferences more accurately. In particular, STAMP demonstrates the importance of short-term preferences for predicting the user's next click by replacing the RNN layer with multiple attention mechanisms and using the last item in the session as the user's short-term preference.

From the results of SR-GNN and GC-SAN, we can observe that graph neural network approaches outperform the traditional and deep learning methods, because the graph structure is better suited to capturing the transition relationships between pairs of items across sessions. However, the traditional graph structure cannot effectively capture the higher-order relationships among multiple nodes. SHARE and DHCN model the session data through hypergraphs, and their experimental results fully demonstrate the advantages of the hypergraph structure in session-based recommendation.

The FHGIGC model proposed in this paper combines global and local information over all sessions to improve the accuracy of recommendations. In contrast to SHARE, the model effectively learns item consistency and relevant session information from all sessions. In addition, unlike DHCN, the model learns the contextual features of items in the current session through the local hypergraph structure.

4.6 Ablation experiment

In order to verify the effect of the different modules of the proposed FHGIGC model on the recommendation results, ablation experiments were performed with four variants of the model, described as follows.

FHGIGC-G: In this variant, the global item feature extraction module is removed from the model, and the item features learned in the item context interaction feature extraction module are used as the final item features; this variant verifies the effectiveness of the global item feature extraction module.

FHGIGC-L: In this variant, the item context interaction feature extraction module is removed from the model, and the item features learned in the global item feature extraction module are used as the final item features; this variant verifies the effectiveness of the item context interaction feature extraction module.

FHGIGC-C: In this variant, the item feature fusion module is removed from the model, and the global and local hypergraph features are combined by direct summation; this variant verifies that global and local features have different importance.

FHGIGC-P: In this variant, the position embedding in the session feature fusion module is removed, and the session features are obtained by directly fusing the item features in the session; this variant verifies the different effects that items at different positions in the session have on the session features.

The experimental results of the four model variants are shown in Fig. 4. From these results, it can be observed that the proposed FHGIGC achieves the best performance. FHGIGC-G and FHGIGC-L show a significant decrease in performance compared to FHGIGC, which indicates the effectiveness of learning item features jointly through both structures. Meanwhile, the performance degradation of FHGIGC-G is more significant than that of FHGIGC-L, which indicates that related sessions can provide more accurate and richer information for the recommendation task. Comparing FHGIGC with FHGIGC-C shows that the information learned from the two structures does not influence the recommendation results to the same degree. FHGIGC also outperforms FHGIGC-P, which illustrates that items at different positions influence the session as a whole to different degrees, validating the role of position embedding in capturing the importance of items at different positions.

Fig. 4 Comparison of experimental results of the FHGIGC model and the proposed variants

4.7 Parameter experiment

To explore the best performance of the FHGIGC model, parameter experiments were conducted on each of the three datasets, with the best parameters found by grid search; the experimental results are shown in Fig. 5.

Fig. 5 Comparison of experimental results obtained by the FHGIGC model with different parameter settings

Number of HGCN layers: The number of hypergraph convolutional network layers is set to 1, 2, 3, 4, and 5, respectively. From the results in Fig. 5a, it can be observed that on the Tmall dataset the best results are obtained with a single layer, and the model becomes less effective as the number of layers increases, while on the Diginetica and RetailRocket datasets the best results are achieved with three layers. A possible explanation is that on the Tmall dataset, where the average session length is longer, a single-layer network can already capture the consistency within sessions, whereas on the Diginetica and RetailRocket datasets with shorter average lengths, a deeper network is needed so that the model can incorporate information from related sessions.

Number of HGAT layers: The number of hypergraph attention network layers is set to 1, 2, 3, 4, and 5, respectively. The results in Fig. 5b show that the best results are obtained with two layers, and the prediction performance decreases rather than improves as more layers are added. This is because deepening the graph neural network can over-smooth the information between nodes.

Size of the context window: The maximum context window size is set to 1, 2, 3, 4, and 5, respectively, with results displayed in Fig. 5c. When the maximum window size is 1, each hyperedge contains only one node, so no information propagates between nodes. When the maximum window size is 2, each hyperedge contains two nodes, in which case the hypergraph degenerates into an ordinary graph. The results show that larger window sizes achieve better results on the Tmall dataset, possibly because the average session length in Tmall is larger, allowing more hyperedges to be constructed with larger windows.

Fig. 6 Experimental results of the model on session samples of different lengths

4.8 Case study

In this section, we further explore the ability of the proposed method to capture user preferences in sessions of different lengths. The Tmall dataset is first divided into four groups according to session length: 3-4, 5-8, 9-11, and 12 or more. FHGIGC and DHCN are evaluated on each of the four groups, and the results are displayed in Fig. 6. From the results, it can be seen that FHGIGC performs more stably across sessions of different lengths and outperforms DHCN on short sessions, which indicates that fusing global and contextual information can effectively improve recommendation performance. In addition, the results show that recommendation performance is higher for longer sessions: as session length increases, the items in the global graph are more tightly connected and learn richer information, and more hyperedges can be constructed for the current session through the context windows of the local graph. Therefore, the proposed method can effectively capture the user preferences reflected in a session for the recommendation task.

5 Conclusion

In this paper, we propose a hypergraph neural network model, FHGIGC, for the session-based recommendation task, which learns relevant session information and item consistency information from a global hypergraph and captures the contextual features of items in a session from a local hypergraph. The item features learned from the two structures are integrated through an attention layer. The experiments show that the proposed model outperforms the comparison models on all three datasets, demonstrating that FHGIGC can effectively address the problem of missing item features within a session in existing session-based recommendation models and mitigate the negative impact of information scarcity in short sessions. Current research in session-based recommendation relies exclusively on session data. In future work, we will consider modeling item and spatiotemporal information in session data through heterogeneous graphs and introducing generative adversarial networks to capture dynamic feature changes in these data.