1 Introduction

As an important means of alleviating information overload, recommender systems have been widely studied in industry and academia. Most current recommendation models are based on collaborative filtering (CF), which assumes that similar users are interested in similar items. Model-based CF methods often use the inner product to model the similarity between users and items, so nonlinear relations between users and items cannot be captured. With the development of deep learning, nonlinear neural networks have been introduced into model-based CF methods [1, 2], achieving good performance in multiple fields, such as social recommendation [3], sequential recommendation [4] and click-through rate prediction [5].

The above model-based CF methods basically consist of two parts: embedding representation and interaction modeling. The embedding layer randomly initializes users and items as low-dimensional real-valued representations, and an interaction module is then devised to reconstruct the historical interactions between users and items. In this paradigm, the user-item interactions are only used for model training. In fact, the user-item interactions carry rich high-order collaborative signals; if these signals are integrated into the embedding representations, we can obtain better user and item embeddings. Based on this observation, some studies [6, 7] constructed a bipartite graph from the user-item interactions and used graph-based methods to capture the high-order collaborative signals between users and items during the embedding process.

Despite the effectiveness of the above models, we argue that they have the following limitations. Firstly, they are based on real-valued operations and representations. Compared with representations in non-real spaces, such as the complex and quaternion spaces, real-valued representations have less representational capacity. As the most common non-real spaces, the complex and quaternion spaces have attracted the attention of scholars in recent years and have been widely used in different fields [8, 9]. The complex number \(C= r + a\textbf{i}\) and the quaternion \(Q = r + a\textbf{i} + b\textbf{j} + c\textbf{k}\) contain a real component and one or more imaginary components, so they have a richer representational capacity than real numbers.

Secondly, the above models tend to use nonlinear neural networks for interaction modeling, because the simplicity of the inner product makes it ineffective at capturing the complex relationships between users and items. However, some studies have shown that the simple inner product can model user preferences better than nonlinear neural networks [10,11,12]. Therefore, interaction modeling based on the inner product is still worth studying and exploring. In the complex space, the components of complex vectors can interact with each other explicitly through the Hermitian product [13] and inner product, which resembles multi-view representations but goes beyond them [14]. The Hamilton product in the quaternion space has the same advantage. Therefore, compared with the inner product in real space, the Hermitian and Hamilton products have better modeling capability while maintaining simplicity.

Thirdly, in general recommendation, the user-item interactions can be modeled as a bipartite graph, where each node represents a user or an item, and each edge represents an interaction between a user and an item [15, 16]. From the perspective of the bipartite graph, users and items belong to different sets, and there are obvious antisymmetric relations between them [14]. In the above models, neither the inner product nor nonlinear neural networks can model these antisymmetric relations in interaction modeling. The Hermitian product in complex space and the Hamilton product in quaternion space are not commutative, which gives them great potential to capture this asymmetry in recommender systems.

Thanks to the excellent properties of the complex and quaternion spaces and the advantages of using graphs to model recommendation, in this paper we propose, within the framework of graph neural networks, Graph Collaborative Filtering for recommendation in Complex and Quaternion space (GCFC and GCFQ). Specifically, we first initialize users and items with complex or quaternion representations, which endows them with a richer representational capacity. Then, based on the Hermitian product (for GCFC) or Hamilton product (for GCFQ) and embedding propagation layers, we further enrich the embeddings of users and items. Benefiting from the Hermitian or Hamilton product, we obtain both latent inter-dependencies and intra-dependencies between the components of users and items; through the embedding propagation layers, we capture high-order connectivities between users and items. Finally, in the interaction modeling layer, we aggregate the embeddings of the different propagation layers and use the Hermitian or Hamilton product with the inner product to capture the intricate antisymmetric relations between users and items. We apply GCFC and GCFQ to four real-world datasets, and the experimental results clearly demonstrate the superiority and effectiveness of the proposed models. This paper is an extension of our previous work [17]; it adds more technical and implementation details, more datasets, baselines, ablation experiments and discussions.

In summary, we make the following contributions: (1) We propose to model recommendation in the complex and quaternion spaces from the perspective of graphs, expanding the research on recommendation in non-real spaces. (2) We propose two novel graph neural network models for recommendation in the complex and quaternion spaces, GCFC and GCFQ. Based on the Hermitian or Hamilton product and embedding propagation layers, we enrich the embeddings of users and items and capture the intricate antisymmetric relations between them. (3) We conduct extensive experiments on four commonly used real-world datasets. The results show that GCFC and GCFQ achieve better performance than state-of-the-art recommendation solutions.

The rest of this paper is organized as follows: Section 2 introduces model-based collaborative filtering methods, graph-based recommendation and applications of complex and quaternion neural networks; Section 3 gives the necessary mathematical background on complex and quaternion algebra; Section 4 describes the two proposed graph recommendation models, GCFC and GCFQ, in detail; Section 5 presents extensive experiments on four real-world datasets to verify the effectiveness of GCFC and GCFQ; Section 6 provides a summary and future directions of this work.

2 Related work

2.1 Model-based collaborative filtering methods

Model-based CF methods can train model parameters directly for downstream tasks, so they achieve better performance. Matrix factorization (MF) [18] is the most representative model-based CF method; it maps the IDs of users and items to real-valued embeddings and then uses the inner product as the interaction function. To obtain richer embeddings, various kinds of side information have been introduced, such as visual content, textual content, social networks and knowledge graphs (KG) [12]. To capture the nonlinear relationships between users and items, nonlinear neural networks have been introduced into the interaction function [1, 2]. However, some studies [10,11,12] have shown that the simple inner product can model recommender systems better than nonlinear neural networks, although its symmetry and simplicity make it ineffective at capturing the complex asymmetric relationships between users and items. Different from the above research, we use the Hermitian or Hamilton product together with the inner product to model the interactions between users and items in the complex and quaternion spaces respectively. The Hermitian and Hamilton products have better modeling capability for the intricate antisymmetric relations between users and items while maintaining the simplicity of the inner product.

2.2 Graph-based recommendation

From the perspective of graphs, the interactions between users and items in recommendation can be seen as a bipartite graph. Early work used random walks on the bipartite graph to obtain high-order connectivities of users and items and improve performance [19, 20]. With the development of graph neural networks (GNN) [21], more and more research uses GNNs in the recommendation field, including social recommendation [3], sequential recommendation [22] and CTR prediction [23]. GCMC [6] performs a single embedding propagation step on the user-item graph, so it does not capture high-order connectivities. NGCF [24] conducts multiple embedding propagation steps on the user-item graph and concatenates the resulting representations as the final embedding. PinSAGE [7] uses GCN on the item-item graph to obtain high-order connectivities implicitly. For more information about the application of GNNs in recommendation, please refer to the survey [25]. Although the above work has achieved promising performance, it all operates in the real space, ignoring the rich representational capacity of the complex and quaternion spaces. Therefore, we propose to model recommender systems in the complex and quaternion spaces from the perspective of graphs, extending the research on recommender systems in non-real spaces.

2.3 Application of complex and quaternion neural networks

Due to their richer representational capacity, complex-valued deep neural networks [26] have been applied in many domains, including signal processing and computer vision. Yang et al. [27] proposed a complex transformer for sequence modeling. Wisdom et al. [28] proposed an RNN in complex space and applied it to a real-world speech task. The works [13, 29] used complex-valued embeddings to model knowledge graphs and capture antisymmetric relationships. For more applications of complex neural networks, please refer to the survey [9].

Since real-world data is often multidimensional, a specific approach is needed to consider the relationships between different dimensions. The quaternion, as an extension of the complex number, can encode up to four-dimensional information, so quaternion neural networks have attracted the attention of researchers. Parcollet et al. [30] used quaternion convolutional neural networks for image reconstruction. Shi et al. [31] proposed a quaternion block network to learn multi-modality interactions for visual question answering. Nguyen et al. [32] employed graph neural networks in quaternion space for node classification. Zhang et al. [33] used the quaternion space to model entities and relationships in knowledge graphs. Tay et al. [34] proposed a quaternion transformer for many NLP tasks. For more applications of quaternion neural networks, please refer to the survey [8].

As far as we know, only a few papers have considered using complex numbers or quaternions to represent users and items in recommendation. Zhang et al. [14] directly employed complex and quaternion embeddings to model recommendation on the basis of MF. Fang et al. [35] modeled users and items in quaternion space and propagated them with quaternion feature transformations. Tran et al. [36] used attention and RNNs to model users' long-term and short-term preferences for sequential recommendation in quaternion space. Inspired by the modeling of KGs in quaternion space, Li et al. [37] proposed to model a unified user-item KG in quaternion space for KG-aware recommendation. The above research shows that modeling recommendation in the complex and quaternion spaces can achieve better performance. Different from these studies, to the best of our knowledge, we are the first to use GNNs to model user-item interactions in the complex and quaternion spaces. Based on the Hermitian or Hamilton product and embedding propagation layers, we enrich the embeddings of users and items and capture the intricate antisymmetric relations between them.

3 Background of complex and quaternion algebra

In this section, we give the necessary mathematical background about complex and quaternion algebra. For more details, please refer to [8, 9].

   Complex algebra.

A complex number C, belonging to the complex space \(\mathbb {C}\), contains a real part and an imaginary part: \(C = r + a\textbf{i}\), where r and a are real numbers and the imaginary unit \(\textbf{i}\) satisfies \(\textbf{i}^2 = -1\). We can expand the real and imaginary parts into real-valued vectors to obtain a complex vector, and similarly a complex matrix. The Hermitian product [13] of two complex numbers is defined as:

$$\begin{aligned} \langle C_1, C_2 \rangle = \overline{C_1}C_2 = (r_1 - a_1\textbf{i})(r_2 + a_2\textbf{i}) = (r_1r_2 + a_1a_2) + (r_1a_2 - r_2a_1)\textbf{i} \end{aligned}$$
(1)

where \(\overline{C_1} = r_1 - a_1\textbf{i}\) denotes the complex conjugate of \(C_1\). From the above formula, we can see that the Hermitian product of two complex numbers is asymmetric, that is, \(\langle C_1, C_2 \rangle \ne \langle C_2, C_1 \rangle \). Many operations in real space can be applied to complex numbers: given an operator f in the real space, we can apply f to two complex numbers component-wise as \(f(C_1, C_2) = f(r_1,r_2) + f(a_1,a_2)\textbf{i}\).
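To make the asymmetry concrete, the following minimal Python sketch (ours, not from the original paper) computes the Hermitian product of formula (1) with plain real arithmetic:

```python
# Hermitian product <C1, C2> = conj(C1) * C2 of formula (1),
# with each complex number C = r + a*i given by its real pair (r, a).
def hermitian_product(r1, a1, r2, a2):
    real = r1 * r2 + a1 * a2   # real part of (r1 - a1*i)(r2 + a2*i)
    imag = r1 * a2 - r2 * a1   # imaginary part
    return real, imag

# Asymmetry: swapping the arguments flips the sign of the imaginary part.
print(hermitian_product(1.0, 2.0, 3.0, 4.0))  # (11.0, -2.0)
print(hermitian_product(3.0, 4.0, 1.0, 2.0))  # (11.0, 2.0)
```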

   Quaternion algebra.

A quaternion Q is an extension of the complex number, belonging to the quaternion space \( \mathbb {Q}\). Q contains one real part and three imaginary parts: \(Q = r + a\textbf{i} +b\textbf{j} + c\textbf{k}\), where r, a, b and c are all real numbers and the imaginary units \(\textbf{i}, \textbf{j}\) and \(\textbf{k}\) satisfy: \(\textbf{i}^2 = \textbf{j}^2 = \textbf{k}^2 = \textbf{ijk} = -1; \textbf{ij}=\textbf{k},\textbf{jk}=\textbf{i},\textbf{ki}=\textbf{j},\textbf{ji}=-\textbf{k},\textbf{kj}= -\textbf{i},\textbf{ik}=-\textbf{j}\). We can expand the real and imaginary parts into real-valued vectors to obtain a quaternion vector, and similarly a quaternion matrix. The Hamilton product of two quaternions is also a quaternion:

$$\begin{aligned} Q_1 \otimes Q_2 &= (r_1+ a_1\textbf{i} + b_1\textbf{j} + c_1\textbf{k}) \otimes (r_2+ a_2\textbf{i} + b_2\textbf{j} + c_2\textbf{k}) \nonumber \\ &= (r_1r_2 - a_1a_2 - b_1b_2 - c_1c_2) + (r_1a_2 + a_1r_2 + b_1c_2 - c_1b_2)\textbf{i} \nonumber \\ &\quad + (r_1b_2 - a_1c_2 + b_1r_2 + c_1a_2)\textbf{j} + (r_1c_2 + a_1b_2 - b_1a_2 + c_1r_2)\textbf{k} \end{aligned}$$
(2)

From the above formula, we can see that the Hamilton product is not commutative, that is \(Q_1 \otimes Q_2 \ne Q_2 \otimes Q_1\). Many operations in real space can be applied to quaternions: given an operator f in the real space, we can apply f to two quaternions component-wise as \(f(Q_1, Q_2) = f(r_1,r_2) + f(a_1,a_2)\textbf{i} + f(b_1,b_2)\textbf{j} + f(c_1, c_2)\textbf{k}\).
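As an illustration, the following minimal Python sketch (ours, not from the paper) implements formula (2) and demonstrates its non-commutativity:

```python
# Hamilton product of formula (2); each quaternion Q = r + a*i + b*j + c*k
# is represented by the 4-tuple (r, a, b, c) of its real components.
def hamilton_product(q1, q2):
    r1, a1, b1, c1 = q1
    r2, a2, b2, c2 = q2
    return (
        r1*r2 - a1*a2 - b1*b2 - c1*c2,   # real part
        r1*a2 + a1*r2 + b1*c2 - c1*b2,   # i part
        r1*b2 - a1*c2 + b1*r2 + c1*a2,   # j part
        r1*c2 + a1*b2 - b1*a2 + c1*r2,   # k part
    )

# Non-commutativity: Q1 ⊗ Q2 != Q2 ⊗ Q1 in general.
q1, q2 = (1, 2, 3, 4), (5, 6, 7, 8)
print(hamilton_product(q1, q2))  # (-60, 12, 30, 24)
print(hamilton_product(q2, q1))  # (-60, 20, 14, 32)
```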

4 Methodology

In this section, we introduce the proposed GCFC and GCFQ models in detail. We first formulate the problem, then give an overview of the overall framework, and finally describe each part of the models.

4.1 Problem formulation

Given a user set \(\mathcal {U} = \{u_1,u_2,...,u_m\}\), an item set \(\mathcal {V} = \{v_1,v_2,v_3,...,v_n\}\), and a user-item interaction matrix \(R \in \mathbb {R}^{m\times n} \), we construct the user-item bipartite graph \(G=(\{\mathcal {U,V}\},A)\). In the interaction matrix R, if there is implicit feedback between user u and item v, such as a purchase, click or view, then \(R_{uv}=1\); otherwise \(R_{uv}=0\). \(A \in \mathbb {R}^{(m + n)\times (m + n)}\) is the adjacency matrix of the user-item graph, constructed from R:

$$\begin{aligned} A= \left[ \begin{array}{cc} 0^{(m\times m)} & R \\ R^{T} & 0^{(n \times n)} \end{array}\right] \end{aligned}$$
(3)

Our task is to learn low-dimensional vector representations of users and items on the bipartite graph G, design prediction functions to calculate the probability of each user engaging with an item in the complex and quaternion spaces, and make Top-K recommendations for a target user based on the probability scores.
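For concreteness, the following NumPy sketch (toy sizes; all variable names are our own) builds A from R as in formula (3), together with the symmetrically normalized matrix L used by the propagation layers in Section 4.3.2:

```python
# Build the bipartite adjacency A of formula (3) and the normalized
# propagation matrix L = D^{-1/2} (A + I) D^{-1/2} (cf. formula (4)).
import numpy as np

m, n = 3, 4                                # toy numbers of users / items
R = np.random.randint(0, 2, size=(m, n))   # implicit-feedback matrix

A = np.block([[np.zeros((m, m)), R],
              [R.T, np.zeros((n, n))]])
A_tilde = A + np.eye(m + n)                # add self-loops
d = A_tilde.sum(axis=1)                    # node degrees (>= 1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = D_inv_sqrt @ A_tilde @ D_inv_sqrt      # (m+n) x (m+n) propagation matrix
```

In practice A and L are large and sparse, so a sparse matrix representation would be used instead of the dense arrays shown here.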

Fig. 1 Overall architecture of the proposed model GCFQ

4.2 Framework overview

Since GCFC and GCFQ have similar frameworks, we only show the overall framework of GCFQ, as illustrated in Figure 1. GCFQ is composed of three parts: (1) In the embedding layer, we randomly initialize quaternion vectors as the embeddings of users and items. (2) In the embedding propagation layers, by continuously gathering information from neighbors, high-order connectivities between users and items are integrated into the embeddings. Thanks to the Hamilton product, we obtain both latent inter-dependencies and intra-dependencies between the components of users and items. (3) In the interaction modeling layer, we aggregate the embeddings of the different propagation layers as the final embeddings of users and items, and then use the Hamilton product with the inner product to calculate the affinity score between users and items.

4.3 GCFC

4.3.1 Embedding layer

In the embedding layer, we use complex vectors as the embeddings of users and items. Note that we use real vectors to represent the real and imaginary parts of a complex vector, and then use real operations to simulate complex operations [27]. Thus, we have two initial complex embedding matrices \(H_C^{U,(0)} \in \mathbb {C}^{m\times d}\) and \(H_C^{V,(0)} \in \mathbb {C}^{n\times d}\), where m and n are the numbers of users and items respectively and d is the embedding size. Specifically, the initial complex embedding of user u is \(h_C^{u,(0)} = h_{C,r}^{u,(0)} + h_{C,i}^{u,(0)} \textbf{i}\), where \(h_{C,r}^{u,(0)}, h_{C,i}^{u,(0)} \in \mathbb {R}^{d / 2}\) and \(\textbf{i}\) is the imaginary unit. Similarly, the initial complex embedding of item v is \(h_C^{v,(0)} = h_{C,r}^{v,(0)} + h_{C,i}^{v,(0)} \textbf{i}\), where \(h_{C,r}^{v,(0)}, h_{C,i}^{v,(0)} \in \mathbb {R}^{d / 2}\). Thanks to the complex embeddings, the user and item representations have two components with a richer representational capacity.
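A minimal PyTorch-style sketch of this initialization (sizes and variable names are illustrative assumptions, not the paper's code) might look as follows:

```python
# Complex embeddings stored as pairs of real tensors of size d/2,
# so complex operations can be simulated with real arithmetic [27].
import torch
import torch.nn as nn

m, n, d = 1000, 2000, 64   # toy numbers of users/items, total embedding size

user_real = nn.Parameter(torch.empty(m, d // 2))
user_imag = nn.Parameter(torch.empty(m, d // 2))
item_real = nn.Parameter(torch.empty(n, d // 2))
item_imag = nn.Parameter(torch.empty(n, d // 2))
for p in (user_real, user_imag, item_real, item_imag):
    nn.init.xavier_uniform_(p)   # Xavier initialization, as in Section 5.1.3
```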

4.3.2 Embedding propagation layer

The user-item interaction graph contains high-order collaborative signals [24, 38, 39], which reflect the similarity between users and items. To integrate these signals into the embeddings of users and items, we introduce embedding propagation on the user-item graph, following the message-passing architecture of GNNs [21]. Specifically, each embedding propagation layer includes two steps: feature propagation and nonlinear transformation.

   Feature propagation.

When user u performs feature propagation in the \(l\)-th layer, u aggregates messages from its graph neighbors \(N_u\) and its own embedding at layer \(l-1\). We adopt the matrix-form feature aggregation used in GCN [21]:

$$\begin{aligned} L = \widetilde{D}^{-\frac{1}{2}} \tilde{A} \widetilde{D}^{-\frac{1}{2}} \end{aligned}$$
(4)
$$\begin{aligned} H_{C}^{(l)} = \langle L, H_{C}^{(l-1)}\rangle = L H_{C, r}^{(l-1)}+L H_{C, i}^{(l-1)} \textbf{i} \end{aligned}$$
(5)

where A is the adjacency matrix of the user-item graph G, \(\tilde{A}=A+I\) with identity matrix I; \(\widetilde{D}\) is the degree matrix of \(\tilde{A}\); \(L \in \mathbb {R}^{(m+n)\times (m+n)}\) represents the normalized adjacency matrix; \(\langle \cdot ,\cdot \rangle \) denotes the Hermitian product; \(H_{C}^{(l-1)} = H_{C,r}^{(l-1)} + H_{C,i}^{(l-1)} \textbf{i}\) is the embedding of users and items after \(l-1\) steps of embedding propagation, with \(H_{C}^{(l-1)} \in \mathbb {C}^{(m+n)\times d_{l-1}}\) and \(H_{C}^{(0)}=\left[ H_{C}^{U,(0)}; H_{C}^{V,(0)}\right] \).

   Nonlinear transformation.

After obtaining information from neighbors, we perform the feature transformation in formula (6) and the nonlinear activation in formula (7):

$$\begin{aligned} H_{C}^{(l)}&=\langle H_{C}^{(l)}, W_{C}^{(l)} \rangle \nonumber \\&=\left( H_{C, r}^{(l)}-H_{C, i}^{(l)} \textbf{i}\right) \left( W_{C, r}^{(l)}+W_{C, i}^{(l)} \textbf{i}\right) \nonumber \\&=\left( H_{C, r}^{(l)} W_{C, r}^{(l)}+H_{C, i}^{(l)} W_{C, i}^{(l)}\right) +\left( H_{C, r}^{(l)} W_{C, i}^{(l)} - H_{C, i}^{(l)} W_{C, r}^{(l)} \right) \textbf{i} \end{aligned}$$
(6)
$$\begin{aligned} H_{C}^{(l)}&= \sigma \left( H_{C}^{(l)}\right) =\sigma \left( H_{C, r}^{(l)}\right) +\sigma \left( H_{C, i}^{(l)}\right) \textbf{i} \end{aligned}$$
(7)

where \(W_{C}^{(l)}=W_{C, r}^{(l)}+W_{C, i}^{(l)} \textbf{i}\) is the trainable complex weight matrix with \(W_{C}^{(l)} \in \mathbb {C}^{(d_{l-1}/2) \times d_{l}}\), and \(\sigma (x)\) is a nonlinear activation function. From the above formulas, we can see that \(W_{C, r}^{(l)}\) and \(W_{C, i}^{(l)}\) are shared by the real and imaginary parts of \(H_{C}^{(l)}\) in the Hermitian product. This weight sharing enables the model to obtain both latent inter-dependencies and intra-dependencies between the components of users and items, leading to a more expressive model.
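Putting formulas (5)-(7) together, one GCFC propagation layer can be sketched as follows (a simplified illustration with dense matrices and ReLU; class and variable names are our own):

```python
# One embedding propagation layer of GCFC, simulated with real tensors.
import torch
import torch.nn as nn

class ComplexPropagationLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        # W = W_r + W_i * i, shared by both components (weight sharing).
        self.W_r = nn.Parameter(torch.empty(d_in, d_out))
        self.W_i = nn.Parameter(torch.empty(d_in, d_out))
        nn.init.xavier_uniform_(self.W_r)
        nn.init.xavier_uniform_(self.W_i)

    def forward(self, L, H_r, H_i):
        # Formula (5): propagation acts on each component separately.
        H_r, H_i = L @ H_r, L @ H_i
        # Formula (6): Hermitian product <H, W> = conj(H) * W.
        out_r = H_r @ self.W_r + H_i @ self.W_i
        out_i = H_r @ self.W_i - H_i @ self.W_r
        # Formula (7): component-wise nonlinear activation.
        return torch.relu(out_r), torch.relu(out_i)
```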

4.3.3 Interaction modeling layer

After L layers, we obtain \(L+1\) representations at different layers for user u and item v: \(\left\{ h_{C}^{u,(0)}, h_{C}^{u,(1)}, \ldots , h_{C}^{u,(L)}\right\} ,\left\{ h_{C}^{v,(0)}, h_{C}^{v,(1)}, \ldots , h_{C}^{v,(L)}\right\} \). We take the mean of the \(L+1\) representations to get the final representations \(h_{C}^{u}\) and \(h_{C}^{v}\) for the user and item. For example, \(h_{C}^{u}\) is computed as follows:

$$\begin{aligned} h_{C}^{u}&= {\text {mean}}\left( h_{C}^{u,(0)}, h_{C}^{u,(1)}, \ldots h_{C}^{u,(L)}\right) \nonumber \\&={\text {mean}}\left( h_{C, r}^{u,(0)}, \ldots , h_{C, r}^{u,(L)}\right) +{\text {mean}}\left( h_{C, i}^{u,(0)}, \ldots , h_{C, i}^{u,(L)}\right) \textbf{i} \end{aligned}$$
(8)

Then we use the Hermitian product to model the interaction between users and items:

$$\begin{aligned} r_{C}&= \left\langle h_{C}^{u}, h_{C}^{v}\right\rangle \nonumber \\&=\left( h_{C, r}^{u}-h_{C, i}^{u} \textbf{i}\right) \left( h_{C, r}^{v}+h_{C, i}^{v} \textbf{i}\right) \nonumber \\&=\left( h_{C, r}^{u} \cdot h_{C, r}^{v}+h_{C, i}^{u} \cdot h_{C, i}^{v}\right) +\left( h_{C, r}^{u} \cdot h_{C, i}^{v}-h_{C, i}^{u} \cdot h_{C, r}^{v}\right) \textbf{i} \end{aligned}$$
(9)

where “\(\cdot \)” denotes the dot product. Through the Hermitian product, we further obtain the inter-dependencies between the components of users and items. In addition, the Hermitian product is asymmetric, which captures the asymmetric relationship between users and items well. Finally, we take the mean of all components of \(r_C\) as the prediction score:

$$\begin{aligned} \hat{y}_{uv}=\left( r_{C, r}+r_{C, i}\right) / 2 \end{aligned}$$
(10)
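As an illustration, formulas (9) and (10) can be sketched for a batch of final embeddings as follows (component tensors as inputs; function and variable names are assumptions):

```python
# GCFC interaction modeling: Hermitian product with component-wise
# dot products (formula (9)), then the mean of the two components
# as the prediction score (formula (10)).
import torch

def gcfc_score(hu_r, hu_i, hv_r, hv_i):
    r_real = (hu_r * hv_r).sum(-1) + (hu_i * hv_i).sum(-1)
    r_imag = (hu_r * hv_i).sum(-1) - (hu_i * hv_r).sum(-1)
    return (r_real + r_imag) / 2
```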

4.4 GCFQ

4.4.1 Embedding layer

In the embedding layer, we use quaternion vectors as the embeddings of users and items. Thus, we have two initial quaternion embedding matrices \(H_Q^{U,(0)} \in \mathbb {Q}^{m\times d}\) and \(H_Q^{V,(0)} \in \mathbb {Q}^{n\times d}\). Specifically, the initial quaternion embedding of user u is \(h_Q^{u,(0)} = h_{Q,r}^{u,(0)} + h_{Q,i}^{u,(0)} \textbf{i} + h_{Q,j}^{u,(0)} \textbf{j} + h_{Q,k}^{u,(0)} \textbf{k}\), where \(h_{Q,r}^{u,(0)}, h_{Q,i}^{u,(0)}, h_{Q,j}^{u,(0)}, h_{Q,k}^{u,(0)}\in \mathbb {R}^{d / 4}\) and \(\textbf{i}, \textbf{j}, \textbf{k}\) are the imaginary units. Similarly, the initial quaternion embedding of item v is \(h_Q^{v,(0)} = h_{Q,r}^{v,(0)} + h_{Q,i}^{v,(0)} \textbf{i} + h_{Q,j}^{v,(0)} \textbf{j} + h_{Q,k}^{v,(0)} \textbf{k}\), where \(h_{Q,r}^{v,(0)}, h_{Q,i}^{v,(0)}, h_{Q,j}^{v,(0)}, h_{Q,k}^{v,(0)}\in \mathbb {R}^{d / 4}\). Using quaternion embeddings, the user and item representations have four components with a richer representational capacity.

4.4.2 Embedding propagation layer

Referring to the GCFC model above, we provide the matrix form of the embedding propagation layer in GCFQ. The embedding propagation consists of two steps: feature propagation and nonlinear transformation.

   Feature propagation.

We adopt the matrix-form feature aggregation used in GCN [21]:

$$\begin{aligned} H_{Q}^{(l)} = L \otimes H_{Q}^{(l-1)} = L H_{Q, r}^{(l-1)}+L H_{Q, i}^{(l-1)} \textbf{i} + L H_{Q, j}^{(l-1)} \textbf{j} + L H_{Q, k}^{(l-1)} \textbf{k} \end{aligned}$$
(11)

where L represents the normalized adjacency matrix; \(\otimes \) denotes the Hamilton product; \(H_{Q}^{(l-1)}=H_{Q, r}^{(l-1)} + H_{Q, i}^{(l-1)} \textbf{i} + H_{Q, j}^{(l-1)} \textbf{j} + H_{Q, k}^{(l-1)} \textbf{k}\) is the embedding of users and items after \(l-1\) steps of embedding propagation, with \(H_{Q}^{(l-1)} \in \mathbb {Q}^{(m+n) \times d_{l-1}}\) and \(H_{Q}^{(0)}=\left[ H_{Q}^{U,(0)}; H_{Q}^{V,(0)}\right] \).

   Nonlinear transformation.

After obtaining information from neighbors, we perform the feature transformation in formula (12) and the nonlinear activation in formula (13):

$$\begin{aligned} H_{Q}^{(l)} &= H_{Q}^{(l)} \otimes W_{Q}^{(l)} \nonumber \\ &= \left( H_{Q, r}^{(l)} W_{Q, r}^{(l)}-H_{Q, i}^{(l)} W_{Q, i}^{(l)}-H_{Q, j}^{(l)} W_{Q, j}^{(l)}-H_{Q, k}^{(l)} W_{Q, k}^{(l)}\right) \nonumber \\ &\quad +\left( H_{Q, r}^{(l)} W_{Q, i}^{(l)}+H_{Q, i}^{(l)} W_{Q, r}^{(l)}+H_{Q, j}^{(l)} W_{Q, k}^{(l)}-H_{Q, k}^{(l)} W_{Q, j}^{(l)}\right) \textbf{i} \nonumber \\ &\quad +\left( H_{Q, r}^{(l)} W_{Q, j}^{(l)}-H_{Q, i}^{(l)} W_{Q, k}^{(l)}+H_{Q, j}^{(l)} W_{Q, r}^{(l)}+H_{Q, k}^{(l)} W_{Q, i}^{(l)}\right) \textbf{j} \nonumber \\ &\quad +\left( H_{Q, r}^{(l)} W_{Q, k}^{(l)}+H_{Q, i}^{(l)} W_{Q, j}^{(l)}-H_{Q, j}^{(l)} W_{Q, i}^{(l)}+H_{Q, k}^{(l)} W_{Q, r}^{(l)}\right) \textbf{k} \end{aligned}$$
(12)
$$\begin{aligned} H_{Q}^{(l)} = \sigma \left( H_{Q}^{(l)}\right) =\sigma \left( H_{Q, r}^{(l)}\right) +\sigma \left( H_{Q, i}^{(l)}\right) \textbf{i} +\sigma \left( H_{Q, j}^{(l)}\right) \textbf{j}+\sigma \left( H_{Q, k}^{(l)}\right) \textbf{k} \end{aligned}$$
(13)

where \( W_{Q}^{(l)}=W_{Q, r}^{(l)}+W_{Q, i}^{(l)} \textbf{i}+W_{Q, j}^{(l)} \textbf{j}+W_{Q, k}^{(l)} \textbf{k}\) is the trainable quaternion weight matrix with \(W_{Q}^{(l)} \in \mathbb {Q}^{(d_{l-1}/4) \times d_{l}}\), and \(\sigma (x)\) is a nonlinear activation function. From the above formulas, we can see that \(W_{Q, r}^{(l)}, W_{Q, i}^{(l)}, W_{Q, j}^{(l)}\) and \(W_{Q, k}^{(l)}\) are shared by the real and three imaginary parts of \(H_{Q}^{(l)}\) in the Hamilton product. This weight sharing enables the model to obtain both latent inter-dependencies and intra-dependencies between the components of users and items, leading to a more expressive model.
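As an illustration, the quaternion feature transformation of formula (12) can be sketched with real matrices as follows (function and variable names are our own):

```python
# Quaternion feature transformation H ⊗ W of formula (12); the four weight
# components are shared across all components of H (weight sharing).
import torch

def quaternion_transform(H, W):
    """H, W: tuples (r, i, j, k) of real matrices; returns H ⊗ W."""
    Hr, Hi, Hj, Hk = H
    Wr, Wi, Wj, Wk = W
    out_r = Hr @ Wr - Hi @ Wi - Hj @ Wj - Hk @ Wk
    out_i = Hr @ Wi + Hi @ Wr + Hj @ Wk - Hk @ Wj
    out_j = Hr @ Wj - Hi @ Wk + Hj @ Wr + Hk @ Wi
    out_k = Hr @ Wk + Hi @ Wj - Hj @ Wi + Hk @ Wr
    return out_r, out_i, out_j, out_k
```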

4.4.3 Interaction modeling layer

After L layers, we obtain \(L+1\) representations at different layers for user u and item v: \(\left\{ h_{Q}^{u,(0)}, h_{Q}^{u,(1)}, \ldots , h_{Q}^{u,(L)}\right\} ,\left\{ h_{Q}^{v,(0)}, h_{Q}^{v,(1)}, \ldots , h_{Q}^{v,(L)}\right\} \). We take the mean of the \(L+1\) representations to get the final representations \(h_{Q}^{u}\) and \(h_{Q}^{v}\) for the user and item. Then we use the Hamilton product to model the interaction between users and items:

$$\begin{aligned} r_{Q}&= h_{Q}^{u} \otimes h_{Q}^{v} \nonumber \\&=\left( h_{Q, r}^{u} \cdot h_{Q, r}^{v}-h_{Q, i}^{u} \cdot h_{Q, i}^{v}-h_{Q, j}^{u} \cdot h_{Q, j}^{v}-h_{Q, k}^{u} \cdot h_{Q, k}^{v}\right) \nonumber \\&\quad +\left( h_{Q, r}^{u} \cdot h_{Q, i}^{v}+h_{Q, i}^{u} \cdot h_{Q, r}^{v}+h_{Q, j}^{u} \cdot h_{Q, k}^{v}-h_{Q, k}^{u} \cdot h_{Q, j}^{v}\right) \textbf{i} \nonumber \\&\quad +\left( h_{Q, r}^{u} \cdot h_{Q, j}^{v}-h_{Q, i}^{u} \cdot h_{Q, k}^{v}+h_{Q, j}^{u} \cdot h_{Q, r}^{v}+h_{Q, k}^{u} \cdot h_{Q, i}^{v}\right) \textbf{j} \nonumber \\&\quad +\left( h_{Q, r}^{u} \cdot h_{Q, k}^{v}+h_{Q, i}^{u} \cdot h_{Q, j}^{v}-h_{Q, j}^{u} \cdot h_{Q, i}^{v}+h_{Q, k}^{u} \cdot h_{Q, r}^{v}\right) \textbf{k} \end{aligned}$$
(14)

where “\(\cdot \)” denotes the dot product. Through the Hamilton product, we further obtain the inter-dependencies between the components of users and items. In addition, the Hamilton product is asymmetric, which captures the asymmetric relationship between users and items well. Finally, we take the mean of all components of \(r_Q\) as the prediction score:

$$\begin{aligned} \hat{y}_{uv}=\left( r_{Q, r}+r_{Q, i}+r_{Q, j}+r_{Q, k}\right) / 4 \end{aligned}$$
(15)
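Analogously to the GCFC scoring sketch above, formulas (14) and (15) can be illustrated as follows (component tensors as inputs; names are assumptions):

```python
# GCFQ prediction: Hamilton product with component-wise dot products
# (formula (14)), then the mean of the four components (formula (15)).
import torch

def gcfq_score(hu, hv):
    ur, ui, uj, uk = hu
    vr, vi, vj, vk = hv
    def dot(a, b):
        return (a * b).sum(-1)
    r = dot(ur, vr) - dot(ui, vi) - dot(uj, vj) - dot(uk, vk)
    i = dot(ur, vi) + dot(ui, vr) + dot(uj, vk) - dot(uk, vj)
    j = dot(ur, vj) - dot(ui, vk) + dot(uj, vr) + dot(uk, vi)
    k = dot(ur, vk) + dot(ui, vj) - dot(uj, vi) + dot(uk, vr)
    return (r + i + j + k) / 4
```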

4.5 Model training

This work mainly focuses on the top-K recommendation task, and we optimize the models with the Bayesian Personalized Ranking (BPR) loss [18]. Specifically, the BPR loss is formulated as:

$$\begin{aligned} Loss = \sum _{(u, i, j) \in T}-\ln \sigma \left( \hat{y}_{u i}-\hat{y}_{u j}\right) +\lambda \Vert \Theta \Vert _{2}^{2} \end{aligned}$$
(16)

where \(T = \{(u,i,j)\mid (u,i)\in \mathcal {R_+}, (u,j) \in \mathcal {R_-}\}\) is the training set, with observed interactions \(\mathcal {R_+}\) and unobserved interactions \(\mathcal {R_-}\); \(\sigma (x)\) is the sigmoid function; \(\hat{y}_{u i}\) and \(\hat{y}_{u j}\) are the learned prediction scores; \(\lambda \) is the \(L_2\) regularization coefficient; and \(\Theta \) denotes the trainable parameters, including \(H_{C / Q}^{U,(0)}, H_{C / Q}^{V,(0)}\) and \(W_{C / Q}^{(l)}\).
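A minimal PyTorch sketch of formula (16) for a batch of (user, positive item, negative item) triples (names are illustrative) is:

```python
# BPR loss of formula (16): pos_scores and neg_scores are the predicted
# scores for observed and sampled unobserved interactions; reg is λ.
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores, params, reg):
    loss = -F.logsigmoid(pos_scores - neg_scores).sum()
    l2 = sum(p.pow(2).sum() for p in params)  # ||Θ||_2^2
    return loss + reg * l2
```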

5 Experiments

In this section, we conduct extensive experiments on four real-world datasets to verify the effectiveness of the proposed GCFC and GCFQ. We first give the detailed experimental settings, then report model performance, hyper-parameter studies and model analysis.

5.1 Experimental settings

5.1.1 Datasets

We conduct experiments on four widely used benchmark datasets [12]: Amazon-Cloth, Amazon-Music, Amazon-Electronic and Book-Crossing. Their statistics are shown in Table 1. The three Amazon datasets contain user ratings of items on Amazon in the clothing, digital music and electronics categories respectively. For each user, we take ratings greater than 3 as positive feedback, and we apply a 5-core, 10-core and 5-core setting [24] respectively to ensure data quality. Book-Crossing is a dataset of user ratings of books; for each user, we take ratings greater than 0 as positive feedback and apply a 10-core setting.

Table 1 Dataset statistics

5.1.2 Baselines

We choose representative baselines for the comparison experiments, covering MF, MLP, NeuMF, DMF, GCMC, PinSage and NGCF in the real space, as well as CCF and QCF in non-real spaces. Since this work focuses on general recommendation, we do not include the sequential recommendation model [36] or the KG-aware recommendation model [37] in non-real spaces as baselines. The baselines are briefly introduced as follows:

  • MF [18]: This model randomly initializes embeddings of users and items, and then uses the inner product to predict the affinity score.

  • MLP and NeuMF [1]: MLP uses a nonlinear neural network to model interactions. Building on MF and MLP, NeuMF uses deep neural networks and the inner product jointly to predict the affinity score between user and item.

  • DMF [2]: This model uses ratings as features of users and items, maps the features with a nonlinear neural network, and predicts the affinity score between user and item by cosine similarity.

  • GCMC [6]: This model uses graph auto-encoder to generate user and item embeddings for recommendation.

  • PinSage [7]: This model uses GraphSAGE to generate item embeddings on the item-item graph. In this paper, we directly apply GraphSAGE to the user-item interaction graph for the recommendation task.

  • NGCF [24]: This model proposes to perform multiple embedding propagation on the user-item interaction graph to capture the collaborative signals between users and items.

  • CCF and QCF [14]: These two models use complex and quaternion representations, respectively, as the embeddings of users and items on the basis of MF.

5.1.3 Parameter settings and evaluation setup

For all baselines, the batch size is set to 1024 and the embedding dimensionality to 64; the Adam optimizer [40] is used to optimize all models, the Xavier initializer is used to initialize the model parameters, and the loss function is the BPR loss. For other hyperparameters, we follow the original papers of the baselines. Note that the dimensionality of the complex embeddings of users and items is 64 for GCFC, which means that each component of a complex embedding is a vector of size \(64/2=32\), as mentioned in Section 4.3; similarly, each component of a quaternion embedding is a vector of size \(64/4 = 16\) for GCFQ. To obtain high-order connectivities between users and items, the number of embedding propagation layers of GCFC and GCFQ is searched in \(\{1,2,3,4\}\). Finally, for all models, Bayesian HyperOpt [12] is used to optimize the learning rate and the \(L_2\) regularization coefficient w.r.t. NDCG@20 on each dataset for 30 trials.

In the experiments, we randomly select \(80\%\) of the historical user-item interactions as the training set, \(10\%\) as the validation set, and \(10\%\) as the test set. All models are trained on the training set, and the optimal model parameters are selected on the validation set. We use the two most common ranking metrics, Recall@K and NDCG@K, under the all-ranking protocol [38]. The above process is executed 10 times, and the average result is reported.
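For reference, a minimal sketch of per-user Recall@K and NDCG@K under this protocol (a common formulation; the helper name and signature are our own) is:

```python
# Recall@K and NDCG@K for one user: ranked_items is the full ranking over
# all items not seen in training; test_items is the user's held-out set.
import numpy as np

def recall_ndcg_at_k(ranked_items, test_items, k):
    top_k = ranked_items[:k]
    hits = [1.0 if it in test_items else 0.0 for it in top_k]
    recall = sum(hits) / len(test_items)
    dcg = sum(h / np.log2(idx + 2) for idx, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(idx + 2)
               for idx in range(min(k, len(test_items))))
    return recall, dcg / idcg
```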

Table 2 Performance of all methods on Amazon-Cloth and Amazon-Music datasets

5.2 Overall performance comparison

The comparison results of all models are shown in Tables 2 and 3, with the best results highlighted in bold. We can draw the following conclusions:

  • GCFQ performs best on all datasets, which shows the effectiveness of using graph neural networks to model recommendation in quaternion space. Compared with GCFC, GCFQ uses quaternion embeddings, which have four components and enhance the representational capacity of the model; thus, GCFQ performs better than GCFC. On the Amazon-Cloth and Amazon-Electronic datasets, GCFC obtains performance similar to GCMC, PinSage and NGCF. On the Amazon-Music and Book-Crossing datasets, GCFC achieves the best performance except for GCFQ, which indicates that GCFC is more suitable for datasets with higher density and shows the potential of modeling recommendation in complex space.

  • Compared with the models MF, MLP, NeuMF and DMF in the real space, the models in non-real spaces (CCF, QCF, GCFC, GCFQ) achieve better performance. This is because non-real embeddings have a richer representational capacity, and the Hamilton and Hermitian products can capture the intricate antisymmetric relations between users and items. In addition, in most cases MF, CCF and QCF perform better than the MLP and NeuMF models that use nonlinear neural networks, demonstrating the effectiveness of using the inner product to model interactions between users and items, which is consistent with previous studies [10,11,12].

  • Compared with MF, CCF and QCF, the models that use bipartite graphs to model the interactions between users and items (NGCF, GCFC, GCFQ, etc.) achieve better performance. The user-item interaction graph contains high-order collaborative signals, which reflect the similarity between users and items. By performing message passing on the bipartite graph, these signals can be integrated into the embeddings of users and items.

  • We attribute the excellent performance of GCFQ (GCFC) to the following reasons: (1) thanks to the quaternion (complex) representations, the embeddings of users and items have a richer representational capacity; (2) through the Hamilton (Hermitian) product and embedding propagation layers, we obtain both latent inter- and intra-dependencies between the components of users and items; (3) the Hamilton (Hermitian) product with the inner product captures the intricate antisymmetric relations between users and items.

Table 3 Performance of all methods on Amazon-Electronic and Book-Crossing datasets
Fig. 2 Effect of the number of embedding propagation layers on GCFC

Fig. 3 Effect of the number of embedding propagation layers on GCFQ

Fig. 4 Effect of embedding dimensionality on GCFC and GCFQ with Recall@20

5.3 Hyper-parameter studies

5.3.1 Number of embedding propagation layer

In this section, we explore the influence of the number of embedding propagation layers on GCFC and GCFQ. We search the number of layers in \(\{1, 2, 3, 4\}\) on the four datasets. The experimental results are shown in Figures 2 and 3, and we can draw the following conclusions: (1) For Amazon-Cloth and Amazon-Music, the best number of layers is 1 for both GCFC and GCFQ. These two datasets are small, and one embedding propagation layer already lets a node obtain enough information from its one-hop neighbors. (2) For Amazon-Electronic, the best number of layers is 3 for GCFC and 1 for GCFQ. Amazon-Electronic has more interaction information than Amazon-Cloth and Amazon-Music, so GCFC needs 3 layers to enhance its representation ability; GCFQ has stronger modeling ability and only needs 1 layer. (3) Book-Crossing is the largest of the four datasets and has the most abundant interaction information, so GCFC and GCFQ need 3 and 4 embedding propagation layers respectively.

5.3.2 Effect of embedding dimensionality

To study the impact of embedding dimensionality, we search the embedding dimensionality in \(\{16, 32, 64, 128, 256\}\) for GCFC and GCFQ on the four datasets. The experimental results are shown in Figures 4 and 5. For Amazon-Cloth, Amazon-Music and Amazon-Electronic, the best embedding dimensionality is 64; higher dimensionality leads to overfitting. For Book-Crossing, performance improves as the dimensionality increases, because Book-Crossing has more interaction information than the other three datasets and requires a higher embedding dimensionality to represent users and items.

Fig. 5 Effect of embedding dimensionality on GCFC and GCFQ with NDCG@20

5.4 Model analysis

5.4.1 Space and time complexity analysis

For GCFC, formulas (6) and (7) show that layer l introduces the embedding matrix \(H_{C}^{(l)} \in \mathbb {C}^{(m+n)\times d_{l}}\) and the trainable complex weight matrix \(W_{C}^{(l)} \in \mathbb {C}^{(d_{l-1}/2) \times d_{l}}\). Thus the space complexity of GCFC at each layer is \(O((m+n+d_{l-1}/2) \times d_{l})\), where m and n are the numbers of users and items respectively, and \(d_{l-1}\) and \(d_{l}\) are the embedding sizes at layers \(l-1\) and l, both much smaller than m and n. Similarly, the space complexity of GCFQ at each layer is \(O((m+n+d_{l-1}/4) \times d_{l})\). The proposed GCFC and GCFQ can be regarded as extensions of NGCF to the complex and quaternion spaces, and the space complexity of NGCF at each layer is about \(O((m+n+d_{l-1}) \times d_{l})\), so GCFC and GCFQ have space complexity similar to NGCF.

For GCFC and GCFQ, the difference in time complexity mainly lies in the nonlinear transformation. For GCFC, formulas (6) and (7) show that the matrix operations in the nonlinear transformation of layer l have computational complexity \(O((m+n) \times (d_{l-1} + 2) \times d_{l})\). Similarly, the time complexity of GCFQ is \(O((m+n) \times (d_{l-1} + 4) \times d_{l})\), and that of NGCF is about \(O((m+n) \times (d_{l-1} + 1) \times d_{l})\). The differences among NGCF, GCFC and GCFQ are caused by modeling the inter-dependencies and intra-dependencies between the components of users and items. As Section 5.2 shows, in some cases GCFC achieves the best performance except for GCFQ; therefore, if model runtime matters, GCFC is a good choice compared with GCFQ.

5.4.2 Representational capacity of non-real embedding

The proposed GCFC and GCFQ can be regarded as extensions of NGCF to the complex and quaternion spaces. NGCF employs real representations as the embeddings of users and items, whereas GCFC and GCFQ use complex and quaternion representations respectively. To verify the representational capacity of non-real embeddings, we compare GCFC and GCFQ with NGCF under different embedding dimensions. From Figures 4 and 5, we can find that: (1) For Amazon-Cloth, GCFQ and GCFC are better than NGCF up to the best embedding dimension of 64. As the dimension increases further, the performance of all three models decreases, and GCFQ and GCFC decrease faster. This is because Amazon-Cloth is smaller, so GCFQ and GCFC overfit more severely as the dimension grows; it also shows that the modeling ability of GCFQ and GCFC is stronger than that of NGCF. (2) For Amazon-Music, Amazon-Electronic and Book-Crossing, the performance of GCFQ and GCFC is consistently better than NGCF as the embedding dimension increases. This is because complex and quaternion embeddings have one real component and multiple imaginary components, and thanks to the Hermitian and Hamilton products, we obtain both latent inter- and intra-dependencies between the components of the embeddings. Therefore, compared with real embeddings, complex and quaternion embeddings have a better representational capacity.

5.4.3 Effect of Hermitian and Hamilton products in interaction modeling layer

Compared with the inner product, the Hermitian and Hamilton products enable the components of the user and item embeddings to interact with each other, which endows them with better modeling capability while maintaining simplicity. In addition, the Hermitian and Hamilton products are inherently asymmetric, which enables them to capture the asymmetry in recommendation. To verify their effectiveness, we propose two variants of each model: Model-in and Model-sys. Model-in replaces the Hermitian or Hamilton product in the interaction modeling layer with the inner product. Model-sys redefines the prediction function: when calculating the products between users and items in formulas (9) and (14), GCFC-sys and GCFQ-sys use the symmetrized products:

$$\begin{aligned} r_{C}&= (\left\langle h_{C}^{u}, h_{C}^{v}\right\rangle + \left\langle h_{C}^{v}, h_{C}^{u}\right\rangle ) / 2 \end{aligned}$$
(17)
$$\begin{aligned} r_{Q}&= (h_{Q}^{u} \otimes h_{Q}^{v} + h_{Q}^{v} \otimes h_{Q}^{u}) / 2 \end{aligned}$$
(18)

Through the above formulas, we eliminate the influence of the asymmetry of the Hermitian and Hamilton products on the models.
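Reusing the GCFC scoring sketch from Section 4.3.3, the symmetrized prediction of formula (17) can be illustrated as follows (function names are our own):

```python
# Symmetrized GCFC-sys score: averaging the two argument orders cancels
# the antisymmetric (imaginary) component of the Hermitian product.
def gcfc_sys_score(hu_r, hu_i, hv_r, hv_i):
    return (gcfc_score(hu_r, hu_i, hv_r, hv_i)
            + gcfc_score(hv_r, hv_i, hu_r, hu_i)) / 2
```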

We conduct experiments on the four datasets with the metrics Recall@20 and NDCG@20; the results are shown in Table 4, from which we draw the following conclusions: (1) Replacing the Hermitian and Hamilton products in the interaction modeling layer reduces model performance. This is because both latent inter-dependencies and intra-dependencies between the components of the user and item embeddings are obtained through the Hermitian and Hamilton products, whereas the inner product only models the interaction of the corresponding components; therefore, compared with the Hermitian or Hamilton product, the inner product has weaker modeling ability. (2) Eliminating the asymmetry of the Hermitian and Hamilton products also reduces performance. In recommender systems, users and items belong to different sets, and there is obvious asymmetry between them; ignoring this asymmetry degrades the model. From Table 1, we can see that the asymmetry of Book-Crossing is very strong, with more than twice as many items as users, so the performance of GCFC-sys and GCFQ-sys drops more on Book-Crossing than on the other datasets.

Table 4 Effect of Hermitian and Hamilton products on GCFC and GCFQ

6 Conclusion and future work

In this paper, we apply graph neural networks to recommendation in the complex and quaternion spaces, and propose two models, GCFC and GCFQ. The two models employ complex and quaternion embeddings respectively to represent users and items. Then, based on the Hermitian product (for GCFC) or Hamilton product (for GCFQ) and embedding propagation layers, we obtain the intricate relations between users and items. Finally, in the interaction modeling layer, we aggregate the embedding representations of the different propagation layers and use the Hermitian or Hamilton product with the inner product to obtain the final prediction score between users and items. Extensive experimental results on four widely used datasets show the effectiveness of GCFC and GCFQ.

In future work, we plan to explore several directions. For other recommendation scenarios, such as social recommendation, POI recommendation and interactive recommendation, we can model users and items in the complex and quaternion spaces to further improve performance. In addition, many studies model recommendation scenarios as heterogeneous information networks; how to apply the complex and quaternion spaces to heterogeneous information networks while reducing model runtime is a worthwhile research direction. Furthermore, beyond the complex and quaternion spaces, other non-real spaces, such as the octonion space, are also worth exploring for recommendation. Finally, when text and images about users and items are taken into account, how to model multimodal information and user-item interactions in the complex and quaternion spaces is a challenging direction.