1 Introduction

Recommender systems have been widely used to alleviate information overload in real-world applications, such as social media [16], news [42], videos [39], E-commerce [15] and Point of Interest (POI) applications [30]. They aim to estimate whether a user will show a preference for an item, based on the user's historical interactions. Among existing recommendation methods, Collaborative Filtering (CF) based models [8, 14, 34, 36] have shown great performance in user and item representation learning. For example, Matrix Factorization [13] represents users and items with embedding vectors and models the user-item interactions with the inner product. Neural collaborative filtering [8] utilizes nonlinear neural networks with multiple hidden layers to capture the user-item interactions for better user and item representations.

Recently, GCN-based recommendation models have surged to learn better user and item representations on the user-item bipartite graph. The typical flow can be summarized as follows: 1) Initialize user and item representations by embedding them into the latent space; 2) Use an aggregation function over the neighbors of each node to update its representation iteratively; 3) Read out the final representation of each node by combining or concatenating the representations from all layers. The GCN paradigm of iteratively aggregating feature information from local graph neighbors has proven to be an efficient way to distill additional information from the graph structure and thus improve user and item representation learning. For example, GC-MC [1] explores the first-order connectivity between users and items by utilizing only one convolution layer in the user-item bipartite graph. NGCF [32] leverages the message-passing mechanism to obtain high-order connectivity and collaborative signals between users and items. LightGCN [9] simplifies the NGCF [32] model by removing the feature transformation and nonlinear activation components, leading to improvements in training efficiency and generalization ability.

Despite their effectiveness, GCNs are still vulnerable to noisy and incomplete graphs, which are common in real-world scenarios, due to their recursive message propagation mechanism [3, 4, 41]. Moreover, some recent GCN-based recommendation models (e.g., LightGCN [9]) remove the feature transformation during message propagation, which makes them unable to effectively capture graph structural features and more sensitive to noisy or missing information. In addition, they model users and items with real-valued embeddings, which have been shown to suffer from high distortion when modeling complex graphs [2, 18], further degrading the ability to capture graph structural features and leading to sub-optimal performance.

Can we move beyond the Euclidean space to learn better user and item representations and feature transformations, capture graph structural features more effectively, and thus improve both recommendation performance and model robustness? The Quaternion space, a hyper-complex vector space in which each quaternion is a hyper-complex number consisting of one real and three imaginary components, has shown great performance in representation learning [19, 20, 38]. The Hamilton product, the multiplication of quaternions, enhances the inter-latent interactions between the real and imaginary components of two quaternions. Any slight change in an input quaternion results in an entirely different output, leading to highly expressive computations and allowing intricate relations to be captured more powerfully [25]. Figure 1 shows the difference between real-value transformation and quaternion transformation. Quaternion-based methods have achieved significant success in various fields. For example, [5] builds deep quaternion networks in the Quaternion space for classification tasks. Zhu et al. [43] proposes a quaternion-based convolutional neural network (CNN) for image classification and denoising tasks. Parcollet et al. [24] applies the Quaternion space to recurrent neural networks (RNN) and long short-term memory networks (LSTM) for automatic speech recognition. Parcollet et al. [23] integrates multiple feature views in a quaternion-valued CNN used for sequence-to-sequence mapping with the CTC model. [22] investigates quaternion-based CNNs and RNNs for speech recognition. [28] proposes quaternion-based attention models and Transformers for NLP tasks.

Moreover, some work has introduced the Quaternion space into graph representation learning to obtain more expressive graph-level representations [19, 20, 38]. For example, [19] generalizes graph neural networks to the Quaternion space for graph classification, node classification, and text classification. Nguyen et al. [20] and Zhang et al. [38] introduce more expressive quaternion representations to model entities and relations in knowledge graph embeddings for knowledge graph completion. Although the Quaternion space has been introduced into various fields and tasks with remarkable performance improvements, there is almost no exploration of the Quaternion space in GCN-based recommendation scenarios. Some challenges in this process remain to be explored. The most crucial one is that the model should not be designed to be overly complex or redundant, so that the effectiveness of the Quaternion space can be validated and compared more intuitively. In other words, how to introduce the Quaternion space while keeping the model as simple as possible remains to be considered.

Fig. 1 Comparison between real-value transformation and quaternion transformation

To this end, in this paper, we propose a simple yet effective Quaternion-based Graph Convolution Network (QGCN) recommendation model, which improves both performance and robustness. Specifically, we first embed all users and items into the Quaternion space with quaternion embeddings. Then, we introduce the quaternion embedding propagation layers with quaternion feature transformation to perform message propagation for aggregating more useful information. Finally, we combine the embeddings generated at each layer with the mean pooling strategy to obtain the final embeddings for recommendation. The quaternion feature transformation enhances the inter-latent interactions between real and imaginary components, enabling it to capture the graph structural features more effectively, distinguish the contribution of different nodes during message propagation, and thus improve both performance and robustness. Extensive experiments are conducted on three public benchmark datasets to validate the effectiveness of our proposed QGCN model. Results show that QGCN outperforms the state-of-the-art methods by a large margin, which indicates that it can better learn user and item representations. Besides, with further robustness analysis, we find that the performance of our QGCN model remains steady in various noisy or incomplete graphs, while that of compared state-of-the-art methods declines dramatically. This indicates that our model is more robust and can effectively capture the graph structural features.

We summarize the contributions of this work as follows:

  • To the best of our knowledge, we are the first to introduce the Quaternion space into GCN-based recommendation models.

  • A QGCN model is proposed to model users and items in the Quaternion space and propagate them with quaternion feature transformation, which significantly enhances both recommendation performance and model robustness.

  • We conduct extensive experiments on three public benchmark datasets, and experimental results demonstrate the effectiveness of our QGCN model. Results of robustness analysis verify the effectiveness of the quaternion feature transformation in capturing the graph structural features.

2 Preliminaries

In this section, we cover the necessary background on quaternions before delving into the architecture of our proposed model.

2.1 Quaternion

A quaternion \(\textit{Q} \in \mathbb {H}\) is a hyper-complex number consisting of one real part and three imaginary parts, defined as:

$$\begin{aligned} Q = Q_{r} + Q_{i}\textbf{i} + Q_{j} \textbf{j} + Q_{k} \textbf{k}, \end{aligned}$$
(1)

where \(Q_{r}, Q_{i}, Q_{j}, Q_{k} \in \mathbb {R}\), and \(\textbf{i}\), \(\textbf{j}\), \(\textbf{k}\) are imaginary units, satisfying the following rule:

$$\begin{aligned} \textbf{i}^{2} = \textbf{j}^{2} = \textbf{k}^{2} = \textbf{i}\textbf{j}\textbf{k} = -1. \end{aligned}$$
(2)

An n-dimensional vector form of a quaternion \(\varvec{Q} \in \mathbb {H}^{n}\) is defined as:

$$\begin{aligned} \varvec{Q} = \varvec{Q_{r}} + \varvec{Q_{i}} \textbf{i} + \varvec{Q_{j}} \textbf{j} + \varvec{Q_{k}} \textbf{k}, \end{aligned}$$
(3)

where \(\varvec{Q_{r}}, \varvec{Q_{i}}, \varvec{Q_{j}}, \varvec{Q_{k}} \in \mathbb {R}^{n}\), i.e., each coefficient of the real unit and the imaginary units is an n-dimensional vector.

2.2 Quaternion addition

The addition of two quaternions \(\textit{Q}\) and \(\textit{P}\) is defined as:

$$\begin{aligned} Q + P = (Q_{r}+P_{r}) + (Q_{i}+P_{i})\textbf{i} + (Q_{j}+P_{j})\textbf{j} + (Q_{k}+P_{k})\textbf{k}. \end{aligned}$$
(4)

2.3 Quaternion inner product

The inner product of two quaternions \(\textit{Q}\) and \(\textit{P}\) is defined as:

$$\begin{aligned} Q \cdot P = Q_{r} \cdot P_{r} + Q_{i} \cdot P_{i} + Q_{j} \cdot P_{j} + Q_{k} \cdot P_{k}. \end{aligned}$$
(5)

2.4 Hamilton product

The quaternion product of two quaternions \(\textit{Q}\) and \(\textit{P}\) is defined as:

$$\begin{aligned} Q \otimes P &= (Q_{r} P_{r}-Q_{i} P_{i}-Q_{j} P_{j}-Q_{k} P_{k}) \\ &\quad + (Q_{i} P_{r}+Q_{r} P_{i}-Q_{k} P_{j}+Q_{j} P_{k}) \textbf{i} \\ &\quad + (Q_{j} P_{r}+Q_{k} P_{i}+Q_{r} P_{j}-Q_{i} P_{k}) \textbf{j} \\ &\quad + (Q_{k} P_{r}-Q_{j} P_{i}+Q_{i} P_{j}+Q_{r} P_{k}) \textbf{k}. \end{aligned}$$
(6)

We can further write the result of the Hamilton product above in matrix form as follows:

$$\begin{aligned} \left[ \begin{array}{c} 1 \\ \textbf{i} \\ \textbf{j} \\ \textbf{k} \end{array}\right]^{\textrm{T}} \left[ \begin{array}{cccc} Q_{r} & -Q_{i} & -Q_{j} & -Q_{k} \\ Q_{i} & Q_{r} & -Q_{k} & Q_{j} \\ Q_{j} & Q_{k} & Q_{r} & -Q_{i} \\ Q_{k} & -Q_{j} & Q_{i} & Q_{r} \end{array}\right] \left[ \begin{array}{c} P_{r} \\ P_{i} \\ P_{j} \\ P_{k} \end{array}\right]. \end{aligned}$$
(7)
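
To make the component-wise arithmetic concrete, the following is a minimal sketch (ours, not from the paper) of the Hamilton product for d-dimensional quaternion vectors in PyTorch, assuming the four coefficient vectors are concatenated along the last axis in the order [r | i | j | k]:

```python
import torch

def hamilton_product(q, p):
    """Hamilton product of two quaternion vectors, following Eqs. (6)-(7).

    q, p: tensors of shape (..., 4 * d); the last dimension stores the real,
    i, j, k coefficient vectors concatenated. Multiplication of the
    coefficient vectors is taken component-wise in this sketch.
    """
    q_r, q_i, q_j, q_k = torch.chunk(q, 4, dim=-1)
    p_r, p_i, p_j, p_k = torch.chunk(p, 4, dim=-1)
    out_r = q_r * p_r - q_i * p_i - q_j * p_j - q_k * p_k
    out_i = q_i * p_r + q_r * p_i - q_k * p_j + q_j * p_k
    out_j = q_j * p_r + q_k * p_i + q_r * p_j - q_i * p_k
    out_k = q_k * p_r - q_j * p_i + q_i * p_j + q_r * p_k
    return torch.cat([out_r, out_i, out_j, out_k], dim=-1)
```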

3 Methodology

Fig. 2 The architecture of our proposed QGCN model, which is formed by Quaternion Embedding Layer, Quaternion Embedding Propagation Layers and Prediction Layer

In this section, we present our proposed QGCN model. As illustrated in Figure 2, the model contains three main components: Quaternion Embedding Layer, Quaternion Embedding Propagation Layers, and Prediction Layer.

3.1 Quaternion embedding layer

Firstly, we embed all the users and items into the Quaternion space. For each user \(u \in \mathcal {U}\), we represent it with a quaternion ID embedding \(\textbf{e}^{0, Q}_{u} = \textbf{e}^{0}_{u, r} + \textbf{e}^{0}_{u, i}\textbf{i} + \textbf{e}^{0}_{u, j}\textbf{j} + \textbf{e}^{0}_{u, k}\textbf{k} \in \mathbb {H}^{d}\), where d denotes the quaternion dimension. Item quaternion ID embeddings are defined analogously.
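
For illustration, the quaternion embedding layer can be realized with four real-valued embedding tables, one per quaternion component. This is a minimal sketch under our own naming (`num_nodes` and `dim` are illustrative parameters, not from the paper):

```python
import torch
import torch.nn as nn

class QuaternionEmbedding(nn.Module):
    """Sketch of the Quaternion Embedding Layer (Section 3.1)."""

    def __init__(self, num_nodes, dim):
        super().__init__()
        # one real-valued table per quaternion component (r, i, j, k)
        self.r = nn.Embedding(num_nodes, dim)
        self.i = nn.Embedding(num_nodes, dim)
        self.j = nn.Embedding(num_nodes, dim)
        self.k = nn.Embedding(num_nodes, dim)
        for emb in (self.r, self.i, self.j, self.k):
            nn.init.xavier_uniform_(emb.weight)  # Xavier init, as in Section 4.2.3

    def forward(self, ids):
        # components concatenated along the last axis: (batch, 4 * dim)
        return torch.cat([self.r(ids), self.i(ids), self.j(ids), self.k(ids)], dim=-1)
```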

3.2 Quaternion embedding propagation layers

3.2.1 Quaternion embedding propagation

Next, we perform message propagation within the Quaternion Embedding Propagation Layers with quaternion feature transformation. As argued above, removing the feature transformation during message propagation makes the model unable to effectively capture graph structural features and more sensitive to noisy or missing information, further degrading its performance. Therefore, we introduce the feature transformation in the Quaternion space at each propagation layer to aggregate more useful information. To validate the quaternion feature transformation more intuitively, we adopt a simple message propagation procedure like the vanilla GCN [12] without the nonlinear activation function, involving only the user and item embeddings and the quaternion transformation matrices. We generate the quaternion transformation matrix at layer l as follows:

$$\begin{aligned} \textbf{W}^{l, Q} = \textbf{W}^{l}_{r} + \textbf{W}^{l}_{i}\textbf{i} + \textbf{W}^{l}_{j}\textbf{j} + \textbf{W}^{l}_{k}\textbf{k}, \end{aligned}$$
(8)

where \(\textbf{W}^{l}_{r}, \textbf{W}^{l}_{i}, \textbf{W}^{l}_{j}, \textbf{W}^{l}_{k} \in \mathbb {R}^{d \times d}\).

Thus, our quaternion embedding propagation rule in QGCN is defined as:

$$\begin{aligned} \textbf{e}^{l,Q}_{u} &= \sum \limits _{i \in \mathcal {N}_{u}} \frac{1}{\sqrt{\vert \mathcal {N}_{u} \vert \vert \mathcal {N}_{i} \vert }} \textbf{W}^{l,Q} \otimes \textbf{e}^{l-1,Q}_{i}, \\ \textbf{e}^{l,Q}_{i} &= \sum \limits _{u \in \mathcal {N}_{i}} \frac{1}{\sqrt{\vert \mathcal {N}_{i} \vert \vert \mathcal {N}_{u} \vert }} \textbf{W}^{l,Q} \otimes \textbf{e}^{l-1,Q}_{u}, \end{aligned}$$
(9)

where \(\textbf{e}^{l,Q}_{u}\) and \(\textbf{e}^{l,Q}_{i}\) respectively denote user u's and item i's quaternion embeddings after l layers of propagation; \(1 / \sqrt{\vert \mathcal {N}_{u} \vert \vert \mathcal {N}_{i} \vert }\) is the symmetric normalization term following the vanilla GCN [12], designed to keep the scale of embeddings from growing with graph convolution operations, where \(\mathcal {N}_{u}\) and \(\mathcal {N}_{i}\) respectively denote the items user u has interacted with and the users item i has interacted with; \(\textbf{W}^{l,Q} \in \mathbb {H}^{d \times d}\) is the quaternion feature transformation matrix at layer l; \(\otimes \) denotes the Hamilton product.

To facilitate the implementation of the quaternion embedding propagation, we derive the Hamilton product \(\otimes \) between \(\textbf{W}^{l, Q}\) and \(\textbf{e}^{l-1,Q}_{u}\) in Eq. (9) as follows (c.f. (7)):

$$\begin{aligned} \left[ \begin{array}{c} 1 \\ \textbf{i} \\ \textbf{j} \\ \textbf{k} \end{array}\right]^{\textrm{T}} \left[ \begin{array}{cccc} \textbf{W}^{l}_{r} & -\textbf{W}^{l}_{i} & -\textbf{W}^{l}_{j} & -\textbf{W}^{l}_{k} \\ \textbf{W}^{l}_{i} & \textbf{W}^{l}_{r} & -\textbf{W}^{l}_{k} & \textbf{W}^{l}_{j} \\ \textbf{W}^{l}_{j} & \textbf{W}^{l}_{k} & \textbf{W}^{l}_{r} & -\textbf{W}^{l}_{i} \\ \textbf{W}^{l}_{k} & -\textbf{W}^{l}_{j} & \textbf{W}^{l}_{i} & \textbf{W}^{l}_{r} \end{array}\right] \left[ \begin{array}{c} \textbf{e}^{l-1}_{u, r} \\ \textbf{e}^{l-1}_{u, i} \\ \textbf{e}^{l-1}_{u, j} \\ \textbf{e}^{l-1}_{u, k} \end{array}\right]. \end{aligned}$$
(10)

The result of Hamilton product \(\otimes \) between \(\textbf{W}^{l, Q}\) and \(\textbf{e}^{l-1,Q}_{i}\) can be derived similarly.
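
For illustration, Eq. (10) can be implemented with ordinary real-valued matrices by assembling a 4d x 4d block matrix. The sketch below (our own layout and helper names, assuming embeddings store the four components concatenated as [r | i | j | k]) performs one propagation step of Eq. (9); since the quaternion weight is shared across nodes and the operation is linear, transforming first and then aggregating with the normalized adjacency is equivalent:

```python
import torch

def quaternion_transform(W_r, W_i, W_j, W_k, E):
    """Hamilton product W^{l,Q} (x) e realized with real matrices (Eq. 10).

    W_*: (d, d) real component matrices of the quaternion weight.
    E:   (num_nodes, 4 * d) embeddings with components [r | i | j | k].
    """
    # 4d x 4d real block matrix equivalent to the Hamilton product
    W = torch.cat([
        torch.cat([W_r, -W_i, -W_j, -W_k], dim=1),
        torch.cat([W_i,  W_r, -W_k,  W_j], dim=1),
        torch.cat([W_j,  W_k,  W_r, -W_i], dim=1),
        torch.cat([W_k, -W_j,  W_i,  W_r], dim=1),
    ], dim=0)
    return E @ W.t()

def propagate(norm_adj, W_parts, E):
    """One step of Eq. (9): quaternion feature transformation followed by
    symmetric-normalized neighbor aggregation.
    norm_adj: sparse (num_nodes, num_nodes) matrix with 1/sqrt(|N_u||N_i|) entries."""
    return torch.sparse.mm(norm_adj, quaternion_transform(*W_parts, E))
```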

3.2.2 Dropout and L2Norm

Dropout randomly drops units of a neural network with a certain probability during training, which has proven to be an effective way to prevent neural networks from overfitting [10, 27]. Motivated by previous work introducing dropout into graph convolutional networks [1] and GCN-based recommendation models [32], we apply dropout to the user and item embeddings at each layer l with a dropout rate p, which is one of the critical hyper-parameters to be tuned. Then, we apply L2 normalization to them to improve training speed and stability. We summarize the dropout and L2 normalization as follows:

$$\begin{aligned} \textbf{e}^{l,Q}_{u} &= L2Norm \left( Dropout(\textbf{e}^{l,Q}_{u}) \right) , \\ \textbf{e}^{l,Q}_{i} &= L2Norm \left( Dropout(\textbf{e}^{l,Q}_{i}) \right) . \end{aligned}$$
(11)
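
A minimal sketch of Eq. (11), assuming node embeddings are stored as rows of a dense tensor:

```python
import torch.nn.functional as F

def dropout_and_norm(e, p, training=True):
    """Eq. (11): message dropout followed by L2 normalization of each
    node embedding; `p` is the tuned dropout rate."""
    e = F.dropout(e, p=p, training=training)
    return F.normalize(e, p=2, dim=-1)
```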

3.3 Prediction layer

After the above L layers of quaternion embedding propagation, dropout, and L2 normalization, we obtain \(L+1\) representations for each user u and item i, including the user embedding initialized at the quaternion embedding layer, \(\textbf{e}^{0,Q}_{u}\), and the user representations generated at each propagation layer, \(\{\textbf{e}^{1,Q}_{u}, \textbf{e}^{2,Q}_{u}, \dots , \textbf{e}^{L,Q}_{u}\}\). Similarly, for item i, we obtain \(L+1\) representations \(\{\textbf{e}^{0,Q}_{i}, \textbf{e}^{1,Q}_{i}, \textbf{e}^{2,Q}_{i}, \dots , \textbf{e}^{L,Q}_{i}\}\). Since the outputs of different layers express different orders of connectivity, utilizing the representations of all layers is an effective strategy for GCN-based models. A readout function is used to obtain the final node representation; Max, Sum, Concat, and Mean pooling are the most primitive and simple choices.

We concatenate the real and imaginary components of the node embeddings and apply the pooling methods as follows:

$$\begin{aligned} \textbf{e}^{l}_{u} = Concat \{\textbf{e}^{l,Q}_{u,r}, \textbf{e}^{l,Q}_{u,i}, \textbf{e}^{l,Q}_{u,j}, \textbf{e}^{l,Q}_{u,k}\}, \end{aligned}$$
(12)
$$\begin{aligned} \textbf{e}^{*}_{u} = Readout \{\textbf{e}^{l}_{u} \}_{l=1}^{L}, \end{aligned}$$
(13)

where Readout is the readout function (i.e. Max, Sum, Concat, Mean pooling) applied on the node embeddings generated at each layer. We further conduct experiments and investigate the influence of the readout function applied to our model in the ablation study part.

After generating the final user and item embeddings, we predict the preference score by the inner product of user u's and item i's final embeddings:

$$\begin{aligned} \hat{y}_{ui} = {\textbf{e}^{*}_{u}}^ \textrm{T} \textbf{e}^{*}_{i}. \end{aligned}$$
(14)
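
Putting Eqs. (12)-(14) together, a possible sketch of the prediction layer with Mean pooling as the readout (our own helper, assuming the per-layer embeddings already have their four quaternion components concatenated as in Eq. (12)):

```python
import torch

def predict(user_layers, item_layers):
    """Prediction-layer sketch for Eqs. (12)-(14).

    user_layers / item_layers: lists of per-layer embeddings, each of shape
    (batch, 4 * d). Mean pooling is used as the readout, the best-performing
    variant in the ablation study (Table 4).
    """
    e_u = torch.stack(user_layers, dim=0).mean(dim=0)   # readout, Eq. (13)
    e_i = torch.stack(item_layers, dim=0).mean(dim=0)
    return (e_u * e_i).sum(dim=-1)                       # inner product, Eq. (14)
```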

3.4 Optimization

We adopt \(\textit{Bayesian Personalized Ranking}\) (BPR) loss [26], which encourages the observed interactions to achieve higher scores than the unobserved ones. The objective function for our QGCN model is as follows:

$$\begin{aligned} Loss = \sum \limits _{u=1}^{M} \sum _{i \in \mathcal {N}_{u}} \sum _{j \notin \mathcal {N}_{u}} - \ln \sigma (\hat{y}_{ui} - \hat{y}_{uj}) + \lambda \Vert \varvec{\Theta } \Vert _{2}^{2} , \end{aligned}$$
(15)

where \(\sigma \) is the sigmoid function; \(\lambda \) is the weight of the \(L_{2}\) regularization term used to prevent overfitting; \(\varvec{\Theta } = \left\{ \{\textbf{e}_{u}^{0,Q}\}_{u \in \mathcal {U}}, \{\textbf{e}_{i}^{0,Q}\}_{i \in \mathcal {I}}, \{\textbf{W}^{l,Q}\}_{l \in [1,L]}\right\} \) denotes all trainable parameters of QGCN. The mini-batch Adam [11] is adopted to optimize the prediction model and update the model parameters. In particular, for a batch of randomly sampled triples, their representations are obtained by the propagation rules, and then the model parameters are updated using the gradients of the loss function.
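
A minimal sketch of the mini-batch BPR objective in Eq. (15); `params` stands for the trainable tensors in \(\varvec{\Theta }\), and the exact regularization scope is an implementation choice not fixed here:

```python
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores, params, reg_lambda):
    """BPR objective of Eq. (15) for a batch of (u, i, j) triples.

    pos_scores / neg_scores: predicted scores for observed and sampled
    unobserved interactions; reg_lambda is the L2 coefficient lambda.
    """
    ranking = -F.logsigmoid(pos_scores - neg_scores).sum()
    reg = sum(p.pow(2).sum() for p in params)
    return ranking + reg_lambda * reg
```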

3.5 Complexity analysis

Let \(\textbf{R}\), \(\vert V \vert \), \(\vert E \vert \), L, and d respectively denote the user-item interaction matrix, the number of nodes, the number of edges, the number of graph convolution layers, and the quaternion dimension.

3.5.1 Time complexity

The time complexity of our model mainly comes from three parts: the adjacency matrix, the graph convolution, and the BPR loss. For the adjacency matrix, the time complexity is \(\mathcal {O}(\vert E \vert )\), since we set \(\textbf{R}_{ui}=1\) in the user-item interaction matrix \(\textbf{R}\) if user u has interacted with item i. For the graph convolution, the quaternion embedding propagation has computational complexity \(\mathcal {O}(L \vert E \vert d^{2})\). For the BPR loss, the time complexity is \(\mathcal {O}(\vert E \vert d)\). Therefore, the overall time complexity of our model is \(\mathcal {O}(\vert E \vert +L\vert E\vert d^{2}+\vert E \vert d)\), in which the layer-wise propagation rule is the dominant operation. By deriving the Hamilton product in (9) with the real-valued operation (10), we can see that the time complexity of the quaternion propagation equals that of the ordinary matrix multiplication in the basic GCN propagation rule used by previous GCN-based recommendation models.

3.5.2 Space complexity

The space complexity of our model is mainly in the user and item embeddings and the quaternion transformation matrix at each layer. Therefore, the overall space complexity of our model is \(\mathcal {O}(\vert V \vert d+Ld^{2})\), which is equal to the space complexity of previous GCN-based recommendation models.

4 Experiments

In this section, we conduct experiments to answer the following research questions:

  • RQ1 How does our proposed QGCN model perform compared with the state-of-the-art baselines?

  • RQ2 How can QGCN alleviate the problem of noisy or incomplete graphs?

  • RQ3 What is the influence of readout function, quaternion embedding and quaternion weight matrices on the model performance?

  • RQ4 How do the key hyper-parameters, such as dropout rate and regularization affect the effectiveness of QGCN?

Table 1 Statistics of the experimental datasets

4.1 Datasets

To evaluate the effectiveness of QGCN, we conduct experiments on three publicly available benchmark datasets: Yelp2018 [9], Amazon-Book [9], and Amazon-Kindle-Store [17]. The first dataset is the 2018 edition of the Yelp dataset released by the Yelp challenge. The last two are widely used product recommendation datasets from Amazon reviews.

Following the general dataset settings in previous recommendation methods [9, 17, 32], we filter users and items with few interactions to ensure the quality of the datasets. Specifically, for all the datasets, we use the 10-core settings, which ensure that each user and item have at least 10 interactions. The detailed statistics of the three datasets are shown in Table 1.

We randomly split each dataset into training, validation, and testing sets with a ratio of 80:10:10 for each user. Each observed user-item interaction is treated as a positive instance. Then, we randomly sample one item that the user did not consume before as a negative instance to pair with the positive instance.
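
For reference, a simple sketch of the per-user 80/10/10 split and the 1:1 negative sampling described above (helper names and data layout are our assumptions, not from the paper):

```python
import random

def split_and_sample(user_items, all_items, ratios=(0.8, 0.1, 0.1)):
    """user_items: dict mapping each user to the list of items they interacted with;
    all_items: list of all item ids. Returns per-user splits and (u, pos, neg) triples."""
    train, valid, test, samples = {}, {}, {}, []
    for u, items in user_items.items():
        items = list(items)
        random.shuffle(items)
        n_train = int(ratios[0] * len(items))
        n_valid = int(ratios[1] * len(items))
        train[u] = items[:n_train]
        valid[u] = items[n_train:n_train + n_valid]
        test[u] = items[n_train + n_valid:]
        for pos in train[u]:
            neg = random.choice(all_items)
            while neg in user_items[u]:          # negative item must be unobserved
                neg = random.choice(all_items)
            samples.append((u, pos, neg))
    return train, valid, test, samples
```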

4.2 Experimental settings

4.2.1 Evaluation metrics

To evaluate the effectiveness of our model on top-K recommendation, we adopt two evaluation metrics widely used in previous work: Recall@K and NDCG@K. Here, we set \(K = 20\) by default, and the average results for all users in the testing set are reported. Their specific definitions are as follows:

  • Recall@K describes the percentage of user-item rating records included in the final recommendation list. We denote the recommendation list for a user as \(R_{K}\), and the corresponding testing set as T. Then, the specific definition of Recall@K is as follows:

    $$\begin{aligned} \text {Recall@K} = |T\cap R_{K} |/|T |. \end{aligned}$$
    (16)
  • NDCG@K, i.e., Normalized Discounted Cumulative Gain, measures the quality of the ranking and places more emphasis on the relevance of items at the top of the recommendation list. We denote the relevance of the i-th item in the recommendation list as \(r_{i}\), and the set of relevant items as R. Then, the specific definition of NDCG@K is given below (a code sketch of both metrics follows this list):

    $$\begin{aligned} \text {NDCG@K} = \sum \limits _{i=1}^{K} \frac{r_{i}}{\log _{2}(i+1)} \big / \sum \limits _{i=1}^{\vert R \vert } \frac{1}{\log _{2}(i+1)}. \end{aligned}$$
    (17)
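
A minimal sketch of both metrics with binary relevance, where the ideal DCG is truncated at K (a common convention; the paper does not spell out this detail):

```python
import math

def recall_at_k(recommended, relevant, k):
    """Recall@K as in Eq. (16): |T intersect R_K| / |T|."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(recommended, relevant, k):
    """NDCG@K as in Eq. (17) with binary relevance."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```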

4.2.2 Baselines

To demonstrate the effectiveness of our proposed QGCN model, we compare QGCN with the following competitive baseline methods:

  • NeuMF [8] is a state-of-the-art neural collaborative filtering model that captures the nonlinear interactions between user and item embeddings with multiple hidden layers.

  • HOP-Rec [37] is a state-of-the-art graph-based model that exploits the high-order connectivity between users and items by performing random walks to augment a user's interactions.

  • GC-MC [1] explores the first-order connectivity between users and items by utilizing only one convolution layer over the user-item bipartite graph.

  • NGCF [32] leverages the message-passing mechanism to obtain high-order connectivity and collaborative signals in the user-item interaction graph.

  • LightGCN [9] removes two components of NGCF, feature transformation and nonlinear activation, leading to improvements in training efficiency and generalization ability.

4.2.3 Parameter settings

We implement our QGCN model in PyTorch, optimize QGCN with Adam [11] with the default learning rate of 0.0001, and set the batch size to 2048 for speed. We apply a grid search for the only two hyper-parameters: the dropout rate is tuned among {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8} and the \(L_{2}\) regularization coefficient in (15) is searched in \(\{1e^{-6}, 1e^{-5}, \dots , 1e^{-2}\}\). The embedding parameters are initialized with the Xavier method [6].

Table 2 Overall performance comparison over three datasets

4.3 Performance comparison (RQ1)

Table 2 shows the performance with competing methods. The best results are highlighted in bold. We summarize the main observations as follows:

  • NeuMF, a state-of-the-art neural collaborative filtering model, performs relatively poorly since it captures the connectivity between user and item embeddings in the embedding learning process rather than leveraging the high-order user-item interactions.

  • Compared with NeuMF, GC-MC utilizes one convolution layer to explore the first-order connectivity between users and items and improves the performance, demonstrating the influence of first-order neighbors on representation learning.

  • HOP-Rec exploits the high-order connectivity between users and items by performing random walks to augment a user's interactions, resulting in better performance than GC-MC. NGCF performs much better than the above baselines; it leverages the message-passing mechanism to obtain high-order connectivity and collaborative signals in the user-item interaction graph. LightGCN simplifies the NGCF [32] model by removing the feature transformation and nonlinear activation components, leading to improvements in training efficiency and generalization ability.

  • QGCN outperforms all the baselines by a large margin over all the datasets. In particular, compared with the strongest baseline, i.e., LightGCN, QGCN gains on average 14.03% improvement w.r.t. Recall@20 and 15.30% improvement w.r.t. NDCG@20 over all the datasets. In collaborative filtering, learned user/item representations that are as close as possible to their true representations are necessary for better recommendation performance. The significant improvements reveal that QGCN can better capture high-order user-item connectivity and learn better user and item embeddings.

  • Specifically, our QGCN model gains a huge relative performance improvement of about 20%. We ascribe this to a characteristic of the datasets: sparsity. As the dataset becomes sparser, the quaternion feature transformation can highlight its contribution to distilling sufficient information from the sparse user-item interaction graph and further lead to a more significant performance improvement. Aside from the effect of sparsity, we conjecture that the size of the user-item interaction matrix matters as well: the Amazon-Book and Kindle-Store interaction matrices are about four times the size of Yelp2018's. Moreover, the observation that our QGCN model gains huge relative performance improvements on sparse and large graphs is of great significance for practical applications and real recommendation scenarios, since real-world graphs for recommendation are often extremely sparse.

  • For a more intuitive comparison, we further plot the training loss and testing recall per 10 epochs on Kindle-Store and Amazon-Book with the optimal settings for both LightGCN and our QGCN model in Figure 3; results on Yelp2018 show the same trend and are omitted for space. Our QGCN model obtains a lower training loss throughout the training process than LightGCN, which indicates that it fits the training data better; it also obtains better testing results, demonstrating its stronger generalization capability.

Fig. 3 Training curves of QGCN and LightGCN, evaluated by training loss and testing recall per 10 epochs on Kindle-Store and Amazon-Book (results on Yelp2018 show the same trend and are omitted for space)

Fig. 4 Effect of random edges injection. The bar represents Recall@20, while the line represents the relative performance change compared to the original result

Fig. 5 Effect of random edges discard. The bar represents Recall@20, while the line represents the relative performance change compared to the original result

4.4 Robustness analysis (RQ2)

4.4.1 Random Edges Injection

To investigate the robustness of our QGCN model to noisy graphs, we conduct simulated experiments to explore the influence of randomly injected edges. Specifically, we randomly connect unobserved edges in the user-item interaction graph \(\textbf{R}\) as noisy edges to construct a noisy graph for training. The noise ratio is set in \(\{5\%, 10\%, 15\%, 20\%, 25\% \}\). Note that the compared LightGCN model and our QGCN model are trained on the same constructed noisy graph for a fair comparison, and both are evaluated on the original graph (i.e., 0% edge injection). We plot Recall@20 and the relative change compared with the original performance of both LightGCN and our QGCN model on Kindle-Store and Yelp2018 in Figure 4. A sketch of the injection procedure is given below.
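
The following is our own sketch of the noisy-graph construction (the discard variant in Section 4.4.2 simply removes observed edges instead of adding unobserved ones):

```python
import numpy as np

def inject_random_edges(R, ratio, seed=0):
    """Add `ratio * nnz(R)` random unobserved user-item edges to the
    interaction matrix R (a scipy sparse 0/1 matrix) to build a noisy
    training graph; evaluation still uses the original graph."""
    rng = np.random.default_rng(seed)
    R = R.tolil(copy=True)
    n_users, n_items = R.shape
    n_noise = int(ratio * R.nnz)
    added = 0
    while added < n_noise:
        u, i = rng.integers(n_users), rng.integers(n_items)
        if R[u, i] == 0:            # only connect previously unobserved pairs
            R[u, i] = 1
            added += 1
    return R.tocsr()
```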

We observe that QGCN consistently outperforms LightGCN by a large margin under different ratios of random edge injection on both Kindle-Store and Yelp2018. As the noise ratio increases, the performance of LightGCN decreases accordingly, while that of our QGCN model remains almost unchanged. For example, Recall@20 of LightGCN on Yelp2018 with a 25% noise ratio is 0.0568, a drop of 12.48% (i.e., -12.48%) from the original performance of 0.0649. In contrast to this large drop for LightGCN, the performance of our QGCN model under a 25% noise ratio even rises by 0.75% (i.e., +0.75%) compared to that under a 0% noise ratio. The sharp decline of LightGCN's Recall@20 as the noise ratio increases reveals that LightGCN is extremely sensitive to noise, which is consistent with our earlier argument. Compared with the steep decline curve of LightGCN, the relative performance change curve of our QGCN model is much more steady, which demonstrates the robustness of our QGCN model to noisy graphs.

4.4.2 Random edges discard

In addition to containing a lot of noise, real-world user-item graphs are often incomplete as well. Thus, besides the simulated experiments on random edge injection, we also conduct experiments to explore the influence of randomly discarded edges. Similarly, we construct a corrupted graph by randomly disconnecting existing edges in the user-item interaction graph \(\textbf{R}\) with a drop ratio ranging in \(\{5\%, 10\%, 15\%, 20\%, 25\% \}\). We then train the compared LightGCN model and our QGCN model on the corrupted graph and evaluate on the original graph (i.e., 0% edge discard). The detailed Recall@20 and relative changes are shown in Figure 5.

We have similar observations from Figure 5. Specifically, QGCN consistently outperforms LightGCN by a large margin under different ratios of random edge discard on both Kindle-Store and Yelp2018. The steep performance decline curve of LightGCN is in sharp contrast to the steady curve of QGCN, demonstrating the robustness of our QGCN model to corrupted graphs.

The simulated experiments on random edge injection and discard both demonstrate the robustness of our QGCN model. QGCN remains steady on the differently constructed graphs while the baseline declines dramatically, which indicates that changing the graph structure by random edge injection or discard has almost no influence on our model's performance. We ascribe this to the expressive quaternion feature transformation, which distinguishes the contribution of different nodes and effectively captures the graph structural features during message propagation. Thus, it can aggregate more useful information and further lead to better model performance and robustness.

Table 3 Performance of our model and its variants
Table 4 Influence of readout function

4.5 Ablation study (RQ3)

4.5.1 Influence of components

We perform ablation studies to explore the contribution of different components to the model performance by comparing QGCN with the following two variants:

  • QGCN-Q: In this variant, we embed all users and items into the real-valued space instead of the Quaternion space and retain the feature transformation component.

  • QGCN-W: This variant removes the quaternion transformation matrices during message propagation.

Table 3 shows the results of QGCN and its two variants, with the best results highlighted in bold. QGCN performs much better than QGCN-Q, which shows the significant benefit of modeling in the Quaternion space. QGCN also outperforms QGCN-W in most cases, indicating the effectiveness of the quaternion transformation matrices. The comparison between QGCN and its two variants demonstrates that the design of our proposed QGCN model is reasonable and effective.

4.5.2 Influence of Readout Function

Since different pooling methods generate different final user and item embeddings, we conduct experiments to investigate the influence of the readout function applied to our model. Table 4 shows the results under different readout functions, with the best results highlighted in bold. We observe that Mean pooling performs relatively better than the other three readout functions, namely Max, Sum, and Concat pooling. We believe the Mean pooling method not only preserves node information but also smooths the user and item representations generated at each layer, leading to stronger generalization capability.

4.6 Hyper-parameter study (RQ4)

Fig. 6 Effect of dropout rate

4.6.1 Effect of dropout rate

Dropout randomly drops units of a neural network with a certain probability during training, which has proven to be an effective way to prevent neural networks from overfitting [10, 27]. Motivated by previous work applying dropout in graph convolutional networks [1] and GCN-based recommendation models [32], we investigate the influence of the dropout rate p ranging from 0.0 to 0.8 on our proposed QGCN model.

Figure 6 displays the experimental results, including Recall@20 and NDCG@20, under different dropout rates on Yelp2018 and Amazon-Book. A dropout rate of 0.1 leads to the best performance. Beyond the peak, the performance generally degrades because dropping too many neurons leads to underfitting and limits the expressiveness of our model. These observations are consistent with the findings of prior work [32] and demonstrate the effectiveness of a proper dropout rate setting in our model.

4.6.2 Effect of regularization

Regularization is an effective strategy to prevent overfitting, so we tune the \(L_{2}\) regularization coefficient \(\lambda \) among \(\{1e^{-6}, 1e^{-5}, \dots , 1e^{-2}\}\) to investigate its influence on our proposed model.

Figure 7 shows the performance of our QGCN model under different regularization coefficients \(\lambda \) on Yelp2018 and Amazon-Book. As shown in Figure 7, a too small or too large regularization coefficient results in relatively poor performance. Results are relatively steady when the regularization coefficient \(\lambda \) is set between \(1e^{-5}\) and \(1e^{-4}\), while the performance decreases significantly when \(\lambda \) is larger than \(1e^{-4}\) or smaller than \(1e^{-5}\). This indicates that a medium regularization coefficient is more suitable for our model. Specifically, the optimal regularization coefficients for Yelp2018, Amazon-Book, and Kindle-Store are \(1e^{-4}\), \(1e^{-5}\), and \(1e^{-4}\), respectively.

5 Related work

5.1 Quaternion-based applications

The Quaternion space is a hyper-complex vector space in which each quaternion is a hyper-complex number consisting of one real and three imaginary components. Owing to the Hamilton product, the multiplication of quaternions, the interactions between the real and imaginary components of two quaternions are enhanced, leading to highly expressive computations. In addition, any slight change in an input quaternion makes the Hamilton product generate an entirely different output [25], which further influences the final performance. The Quaternion space has been successfully employed in various fields and tasks, achieving remarkable performance improvements over basic real-valued models. For example, [5] provides the architecture components needed to build deep quaternion networks in the Quaternion space for classification tasks. Zhu et al. [43] re-designs basic modules such as convolution layers and fully-connected layers in the Quaternion space and proposes a quaternion-based convolutional neural network (CNN) for image classification and denoising tasks. Parcollet et al. [24] applies the Quaternion space to recurrent neural networks (RNN) and long short-term memory networks (LSTM) for automatic speech recognition. [23] integrates multiple feature views in a quaternion-valued CNN used for sequence-to-sequence mapping with the CTC model. Parcollet et al. [22] investigates quaternion-based CNNs and RNNs for speech recognition. Tay et al. [28] proposes quaternion-based attention models and Transformers for NLP tasks. Moreover, some work has introduced the Quaternion space into graph representation learning to obtain more expressive graph-level representations [19, 20, 38]. For example, [19] generalizes graph neural networks to the Quaternion space for graph classification, node classification, and text classification. Nguyen et al. [20] and Zhang et al. [38] introduce more expressive quaternion representations to model entities and relations in knowledge graph embeddings for knowledge graph completion.

Fig. 7 Effect of regularization

5.2 Collaborative filtering and graph-based recommendation

Collaborative Filtering (CF) based models [8, 14, 34, 36] have shown great performance in learning user and item representations. For example, Matrix Factorization [13] represents users and items with embedding vectors and models the user-item interactions with the inner product. Neural collaborative filtering [8] utilizes nonlinear neural networks with multiple hidden layers to capture the user-item interactions for better user and item representations. Another research line exploits the user-item interaction graph for recommendation. Prior efforts like ItemRank [7] adopt label propagation on the graph and encourage connected nodes to have similar labels. HOP-Rec [37] further performs random walks to augment a user's interactions and exploit the connectivity information.

Recently, Graph Neural Networks (GNN), such as GCN [12] and GAT [29], have shown promising performance in various fields. Moreover, some approaches focus on addressing flaws of GNNs: API-GNN [40] reduces the disturbance of neighborhood aggregation, and BGN [31] improves the efficiency and scalability of GNNs with a binarized graph neural network. In particular, GCN-based recommendation models have surged to learn better user and item representations on user-item bipartite graphs. For example, GC-MC [1] explores the first-order connectivity between users and items by utilizing only one convolution layer in the user-item bipartite graph. NGCF [32] leverages the message-passing mechanism to obtain high-order connectivity and collaborative signals between users and items. LightGCN [9] simplifies the NGCF [32] model by removing the feature transformation and nonlinear activation components, improving training efficiency and generalization ability. Apart from these GCN-based recommendation models, some methods leverage self-supervised learning for additional information [33, 35], and some study heterogeneous graphs, as opposed to homogeneous graphs [21].

6 Conclusion

In this work, we argued that removing the feature transformation and modeling users and items in the Euclidean space are limiting design choices, and we performed empirical studies to justify this argument. We moved beyond the Euclidean space, fully utilized the Quaternion space, a hyper-complex space, and proposed a simple yet effective Quaternion-based Graph Convolution Network model formed by a Quaternion Embedding Layer, Quaternion Embedding Propagation Layers, and a Prediction Layer. Extensive experiments on three public benchmark datasets were conducted to evaluate the effectiveness of our proposed model. Results showed that our model outperforms the state-of-the-art methods by a large margin, indicating that it learns better user and item representations. Besides, further robustness analysis demonstrated that our QGCN model is more robust to noisy and incomplete graphs and can effectively capture the graph structural features. Moreover, the detailed performance comparison showed that our QGCN model gains a huge performance improvement on sparse graphs, which is of great significance for practical applications and real recommendation scenarios.

This work represents an attempt to explore the Quaternion space to model users and items and the effectiveness of quaternion transformation in the Quaternion-based GCN collaborative filtering methods. We believe the insights in this study are enlightening for introducing the Quaternion space into other recommendation scenarios and digging into the nature and effectiveness of quaternion transformation.