Introduction

The Web consists of millions of pages containing information on almost every topic. This variety can overwhelm users searching for relevant information. To this end, numerous recommender systems that exploit users’ preferences have been proposed to supply them with relevant items such as movies, locations, news, and other products (Amir et al., 2022; Christoforidis et al., 2018; Ma et al., 2020; Chen et al., 2021; Christoforidis et al., 2021a; Kefalas & Symeonidis, 2015). In that direction, a lot of work related to paper recommendation has been conducted in the last five years (Cai et al., 2019; Bansal et al., 2016; Son & Kim, 2017; Cai et al., 2018a). Based on their information filtering methods, researchers categorize these models into content-based (CB) (Bhagavatula et al., 2018; Zhao et al., 2016), collaborative filtering (CF) (Bansal et al., 2016; Wang & Li, 2015), and graph-based (GB) (Cai et al., 2019; Yang et al., 2018) models. The CB models employ the descriptions and features of papers and user profiles to produce recommendations. They generate useful recommendations when item descriptions as well as users’ past history and profile information are available; otherwise, they suffer from the cold-start problem (Christoforidis et al., 2018). The CF-based models exploit past users’ ratings along with the social network. Their recommendations are robust if users’ ratings are available; otherwise, they suffer from sparsity, which leads to imprecise predictions (Son & Kim, 2017). This issue can be overcome with GB models (Cai et al., 2019; Yang et al., 2018), which use additional relationships among the nodes in the network. However, traditional GB models treat recommendation as a link prediction task and thus over-weight old and outdated nodes in the network (Son & Kim, 2017; Kefalas et al., 2018). To overcome the issues of traditional graph-based and CB methods, the heterogeneous information networks for academic paper recommendation (HNPR) model (Du et al., 2020) exploits authors’ collaborations, citation relations, and papers’ research areas to construct two types of heterogeneous information networks. The model employs a random walk-based strategy to traverse the edges of the constructed heterogeneous information networks and adopts natural language models to match word sequences for paper recommendation.

In recent years, different studies (Gupta & Varma, 2017; Cai et al., 2018a; Guo et al., 2019; Jiang et al., 2018) have employed homogeneous network representation learning (NRL) methods, such as LINE (Tang et al., 2015; Christoforidis et al., 2021b) and DeepWalk (Perozzi et al., 2014), to overcome the issues of the traditional models. Nevertheless, these methods cannot deal with the heterogeneity and multiplicity of citation networks. To overcome the heterogeneity issue, several NRL-based models (Jiang et al., 2018; Cai et al., 2019; Kong et al., 2019) have been employed to generate paper recommendations.

Although the current NRL-based models (Jiang et al., 2018; Cai et al., 2018a, 2019; Gupta & Varma, 2017) resolve the heterogeneity problem of homogeneous embedding models, they remain unable to exploit the salient factors and relations corresponding to the objects of a heterogeneous information network. For instance, DBLP is a heterogeneous information network in which multiple relations exist among the network objects, including relations among papers based on citations, authorship, field of study, and so on. These relations establish a view that is indispensable for NRL-based models to exploit. Nevertheless, existing NRL-based models cannot consider such semantics and contextual information and therefore fall short in capturing researchers’ preferences and generating quality results (Chen et al., 2019). To learn content-based embeddings, existing studies employ the Doc2vec (Le & Mikolov, 2014), BERT (Devlin et al., 2018), and SBERT (Reimers & Gurevych, 2019) language models. Yet, content-embedding approaches such as SBERT and BERT are trained on a general English corpus and, in contrast to domain-specific embedding models like SciBERT (Beltagy et al., 2019) and SPECTER (Cohan et al., 2020), cannot learn context-preserving node embeddings (Ali et al., 2021b). These models also fail to capture long-term dependencies and the significance of salient factors and are therefore limited in making robust and justifiable recommendations.

To overcome these issues, we present a personalized paper recommendation model termed Scientific Paper Recommendation by employing SPECTER with Memory Network (SPR-SMN), which exploits the semantic relations among the nodes of Heterogeneous Paper Networks (HPNs), such as papers, authors, fields of study, and content. The model employs the SPECTER language model to learn content-based node representations. Moreover, an end-to-end memory network is used to exploit long-range dependencies and robust semantics. The main contributions of this study are the following:

  • We present a novel paper recommendation model that employs an end-to-end memory network with SPECTER, which effectively exploits useful semantic relations and contextual information to generate robust paper recommendations.

  • The proposed model utilizes personalized information regarding a user, such as authorship, the field of study, paper content, and citation relations, to capture researchers’ preferences and make personalized recommendations.

  • We conduct exhaustive experiments on two real-world datasets to examine the performance of the proposed model relative to other state-of-the-art counterparts in terms of standard evaluation metrics, including nDCG, MAP, and recall.

Related work

The models providing paper recommendations can be classified into three main categories, namely: CB, CF, and GB. In the following sections, we explain state-of-the-art models that belong to these three classes.

CF and CB based citation recommendation models

CF-based models exploit the (explicit or implicit) opinions of users’ friends to make paper recommendations (Wang & Blei, 2011; Wang & Li, 2015). To this end, Collaborative Topic Regression (CTR) (Wang & Blei, 2011) recommends papers to users by seamlessly integrating both the feedback matrix and paper content information into a unified model. However, CTR faces the cold-start problem when the user-item rating matrix is sparse. PCTR (Wang & Li, 2015) resolves this issue by extending CTR with network structure information alongside user-item feedback in a principled hierarchical Bayesian model, thereby overcoming the cold-start problem faced by CTR. Bansal et al. (2016) introduced a model that generates CF-based paper predictions by employing a gated recurrent unit (GRU) network; it utilizes the ratings of a user’s friends along with the content of articles to make paper recommendations. Both of these models utilize content and auxiliary information to overcome the sparsity and cold-start problems faced by traditional CF-based models. Khadka et al. (2020) generated a high-level representation of a paper employing its topic information to produce citation recommendations; the research also contributes a new dataset consisting of the publication history of researchers and the content of scientific publications. McNee et al. (2002) exploited the citation network between authors and papers, employing four types of CF-based methods to make recommendations for a query user.

In contrast to CF, CB models represent research papers and users by exploiting the content, features, and descriptions of the corresponding papers and users to produce recommendations (Salloum & Rajamanthri, 2021; Ali et al., 2021c). To this end, Bollacker et al. (1998) proposed CiteSeer, the first CB academic paper recommendation system, which exploits TF-IDF vectors and citation relations. Likewise, a CB model (Amami et al., 2016) applies Latent Dirichlet Allocation (LDA) to the textual content of research papers to generate their latent representations. In particular, the model builds the representations of the researcher’s profile (based on the author’s written papers) and candidate papers using LDA, and then computes similarities between these representations to make the final recommendations. Similarly, Science Concierge (Achakulvisut et al., 2016) applied Latent Semantic Analysis (LSA) to the content of manuscripts to provide recommendations.

The CB and CF-based models discussed above can assist users by tailoring personalized recommendations. Nevertheless, CF-based models encounter various problems, especially cold-start and sparsity: when user ratings and profile information are limited, CF models struggle to produce justifiable results, and recommendations based on such insufficient information can be inaccurate (Son & Kim, 2017). In contrast, CB models require paper and user descriptions/features, and when such information is unavailable, they face the cold-start and overspecialization problems (Khusro et al., 2016). Besides, CB and CF models do not utilize auxiliary information sources and semantic relations in the HPNs; therefore, these models fail to capture meaningful semantics and generate relevant recommendations.

Deep learning and graph-based citation recommendation models

During the last decade, several DL-based paper recommendation models have employed multilayer perceptrons (MLPs) (Huang et al., 2015), convolutional neural networks (CNNs) (Yin & Li, 2017), recurrent neural networks (RNNs) (Bansal et al., 2016; Yang et al., 2018; Uddin et al., 2022), and generative adversarial networks (GANs) (Cai et al., 2018a) to produce quality recommendations (Ali et al., 2020a). For instance, Huang et al. (2015) generated recommendations for a citation context by using the semantic representations of citation contexts and relevant papers; their model utilized a multi-layer neural network to learn the probability of citing an article given the citation context. In the same direction, PCCR (Yang et al., 2018) adopted LSTMs (Abro et al., 2020) to learn the representations of citation contexts and research manuscripts through context and paper encoders, respectively. The model then finds the top-k citations for a given context using the cosine similarity between the embeddings of the context and the candidate papers. Similarly, a personalized model (Wang et al., 2020a) utilized author and citation information with a BiGRU network to make context-aware recommendations. On the other hand, p-CNN (Yin & Li, 2017), a personalized citation recommendation model, generates recommendations using a CNN. In particular, it exploits author information to compute the relevance between a citation context and a relevant paper, and it employs a discriminative training strategy to learn the parameters and generate relevant recommendations. Similarly, POLAR (Du et al., 2019) is an attention-based CNN model for paper recommendation; to capture salient factors and words, it employs an attention matrix that captures both local and global weights.

Recently, sophisticated graph (Goyal & Ferrara, 2018) and network representation (Cui et al., 2019) techniques exploit semantic relations between the nodes of a graph or network to learn their vector representations. Various paper recommendation models (Gupta & Varma, 2017; Cai et al., 2019; Kong et al., 2019; Ali et al., 2021a; Du et al., 2020) employ such embedding methods to make recommendations. To this end, Gupta and Varma (2017) employed Doc2vec (Le & Mikolov, 2014) and DeepWalk to learn the embeddings of papers’ content and network structure, respectively; the model then exploits similarities between the learned representations to produce paper recommendations. Likewise, VOPRec (Kong et al., 2019) generates recommendations by integrating text-based and structure-based embeddings learned with the Paper2vec (Ganguly & Pudi, 2017) and Struct2vec (Ribeiro et al., 2017) embedding methods, respectively. In contrast, BNR (Cai et al., 2019) and CR-HBNE (Ali et al., 2021a) employ Node2vec (Grover & Leskovec, 2016) to exploit the semantic relations between the objects of a heterogeneous bibliographic network and learn the embeddings of the participating nodes (i.e., papers, authors, content, venues, etc.); the learned node embeddings are then utilized to make the final recommendations for a query manuscript. More recently, Dai et al. (2021) proposed the GRSLA model, which exploits author information by introducing a novel author embedding method. The model uses an encoder-decoder architecture with three neural networks to alleviate the extendability issue of the author embedding vector faced by existing global citation recommendation models (Son & Kim, 2017; Cai et al., 2018b; Dai et al., 2019). In the same direction, Zhang and Zhu (2021) studied the citation recommendation problem from the perspective of the semantic representation of cited papers’ relations and content. The study designs 132 methods by integrating different NRL-based methods with text representation learning methods to generate the vector representations of research papers; the cosine similarity between these representations is then employed to deliver relevant recommendations.

Existing network representation learning-based recommendation models, such as VOPRec, GAN-HBNR (Cai et al., 2018b), GRSLA, and BNR, achieve superior results compared to random-walk-based and traditional CB and CF models. Nevertheless, these models are limited in exploiting the significance of semantic relations in HPNs and in dealing with the ‘cold-start papers’ problem. Furthermore, they cannot effectively utilize contextual information and salient factors to capture researchers’ preferences. A detailed survey on the classification of paper recommender systems, covering information filtering methods, the information sources exploited (e.g., keywords, title, citations, user profile), evaluation measures, and open challenges, can be found in Kreutz and Schenkel (2022).

Preliminaries and problem definition

This section illustrates the preliminaries required for the proposed model. For convenience, Table 1 lists all the symbols used in the rest of the paper.

Table 1 Symbols and their details used in this research

Definition 1

(Heterogeneous Papers Network). The network \(G=(N,E)\) is a variant of a Heterogeneous Information Network (HIN) (Ali et al., 2020c) with two mapping functions, viz., a node type mapping \(\varphi :N\rightarrow O\) and a relation type mapping \(\psi :E\rightarrow R\). Here, each node \(v \in N\) and each edge \(e \in E\) belong to a particular node type and relation type, respectively. Moreover, \(N=A \cup P \cup F\), with \(A=\left\{ a_1, a_2,...,a_n\right\}\), \(P=\left\{ p_1, p_2,...,p_n\right\}\), \(F=\left\{ f_1, f_2,...,f_n \right\}\) representing the sets of authors, papers, and fields of study, respectively. Additionally, \(E=\cup _{r\in R} E_r\) represents the edges, where \(E_r\) is the set of edges associated with relation \(r \in R\). In an HPN, we have \(|O|+|R|>2\).

Example 1

: The HPN in this work contains three object types, viz., papers, authors, and fields of study (FOS), which establish relations with each other. In particular, the authorship relation network \(G_{ap}\) is established between authors and papers when an author writes a paper. For instance, if author \(a_i\) writes paper \(p_j\), then the author-paper adjacency matrix \(A^s \in \Re ^{\left| A \right| \times \left| P \right| }\) has the value \(e_{a_i,p_j}=1\) to represent this relation. To capture researchers’ preferences, the proposed model also employs the paper-to-paper citation network \(G_{pp}\) and the paper-FOS network \(G_{pf}\). That is, if paper \(p_i\) establishes a citation link with \(p_j\), then the corresponding entry in the matrix \(P^s \in \Re ^{\left| P \right| \times \left| P \right| }\) has \(e_{p_i,p_j}=1\), otherwise 0. Likewise, the matrix \(F^s \in \Re ^{\left| P \right| \times \left| F \right| }\) records the relations between papers and their FOS, if any exist.
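To make the construction of these relation matrices concrete, the following sketch builds the three binary adjacency matrices from edge lists. It is a minimal illustration: the toy sizes and edge lists are placeholders, not drawn from the datasets used later.

```python
import numpy as np
from scipy.sparse import csr_matrix

n_authors, n_papers, n_fos = 3, 4, 2

# Toy edge lists (illustrative only): (author, paper), (citing, cited), (paper, FOS)
author_paper = [(0, 0), (0, 1), (1, 1), (2, 3)]
paper_paper = [(1, 0), (2, 1), (3, 2)]
paper_fos = [(0, 0), (1, 0), (2, 1), (3, 1)]

def adjacency(edges, shape):
    """Binary adjacency matrix with e_{i,j} = 1 for every listed relation."""
    rows, cols = zip(*edges)
    return csr_matrix((np.ones(len(edges)), (rows, cols)), shape=shape)

A_s = adjacency(author_paper, (n_authors, n_papers))  # authorship network G_ap
P_s = adjacency(paper_paper, (n_papers, n_papers))    # citation network G_pp
F_s = adjacency(paper_fos, (n_papers, n_fos))         # paper-FOS network G_pf
```

Sparse matrices keep the memory footprint manageable, since real HPNs such as DBLP contain millions of nodes with very few links per node.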

Problem Statement: Given a seed paper q along with the HPN \(G=(N,E)\), the proposed model aims to exploit the semantics of the HPN and recommend the top@k relevant papers for q.

Hypothesis: Using the auxiliary information sources and structural information incorporated in the HPNs will improve the recommendation results and overcome the cold-start paper problem.

SPR-SMN model

Figure 1 presents the architecture of the proposed model, which follows a three-step process. First, it learns embeddings of the contents of papers by employing the pre-trained SPECTER (Cohan et al., 2020) model. Next, it exploits researchers’ preferences by using author information, the FOS, and the citation relations of papers. Finally, it uses an end-to-end memory network with an attention mechanism to capture long-range dependencies and weight the significant information. The modules responsible for these steps are discussed in detail in the following sections.

Fig. 1 The architecture of the SPR-SMN model

Content-based paper embedding

In this section, we present how the model employs the SPECTER (Cohan et al., 2020) document embedding model to learn content-based representations of scientific papers. To learn semantic-aware embeddings, SPECTER employs citation-aware Transformers. General-purpose document embedding models, i.e., BERT (Devlin et al., 2018) and SBERT (Reimers & Gurevych, 2019), can produce context-aware document representations (Wang et al., 2020b; Gao et al., 2019). Nevertheless, traditional language models like Doc2vec and SBERT were trained on general corpora (e.g., Wikipedia), so they do not capture contextual information specific to scientific documents. Also, these models do not consider the relatedness between documents established by citation relations while generating document representations. To come up with a better solution, SPECTER fine-tunes the embeddings learned through SciBERT (Beltagy et al., 2019), which employs a corpus of scientific documents during training. To learn the representation \(c_i^p\) of a paper, SPECTER initially encodes the concatenated text (i.e., the abstract and title) of the paper utilizing a Transformer LM (namely SciBERT), as defined below.

$$\begin{aligned} \varvec{c_i^p}= \text {Transf}(input)_{[CLS]}, \end{aligned}$$
(1)

where Transf represents the forward function of the Transformer. The model takes as input the concatenated WordPieces of the title and abstract, separated by the [SEP] token and prepended with the [CLS] token. To enrich the embeddings learned using SciBERT, SPECTER employs citation relations between documents as a relatedness signal. Besides, SPECTER makes use of ‘hard negatives’ along with ‘simple negatives’ to learn more optimal and context-preserving embeddings. The model learns the nodes’ content-based embeddings by optimizing the margin loss objective in Eq. (2).

$$\begin{aligned} T_L= \max \left\{ d \left( P^m, P^+\right) - d \left( P^m, P^- \right) +w ,0 \right\} , \end{aligned}$$
(2)

where \(P^m\) denotes the query paper, \(P^+\) denotes a relevant paper, and \(P^-\) represents an irrelevant paper. Additionally, d denotes the Euclidean distance, and w represents the margin, which is set to 1. The model makes use of w to ensure that \(P^+\) is at least w closer to \(P^m\) than \(P^-\). During training, the model minimizes the distance between the query paper and related papers, while maximizing the distance between the query paper and irrelevant papers. At inference time, for an input paper \(c_i\), the model obtains the content-based paper embedding \(\varvec{c_i^p}\) by taking SPECTER’s pooled Transformer output activation. This way, the model captures the contextual information of the paper.
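As an illustration of Eq. (1), the sketch below obtains a paper’s pooled [CLS] embedding from the publicly released SPECTER checkpoint on Hugging Face; the title and abstract strings are placeholders, and the exact loading code used by SPR-SMN is not specified in the text.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

# Title and abstract are concatenated with the [SEP] token, as in Eq. (1)
paper = {"title": "A toy title", "abstract": "A toy abstract about citation networks."}
text = paper["title"] + tokenizer.sep_token + paper["abstract"]

inputs = tokenizer(text, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    output = model(**inputs)

c_i = output.last_hidden_state[:, 0, :]  # pooled [CLS] activation, shape (1, 768)
```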

Fig. 2 An illustration of the personalized embedding module

Personalization module

In paper recommendation models, researchers’ preferences regarding authors, citations, and the paper field of study play a prominent role in producing individualized recommendations. For instance, the author(s) of a paper can have a great influence on readers and citations. Mostly, a researcher follows particular researchers or research group(s) with similar research preferences. Likewise, an author who collaborates with a researcher is more relevant to that researcher than one with no collaboration and different research interests (Ali et al., 2020c). Besides, papers linked by citation relations are considered more related, and researchers cite papers very carefully; therefore, such relations have a great impact on personalizing the recommendations for a researcher. In particular, the probability that an author cites a paper already cited in their previous work is higher than for any other random paper. Similarly, a researcher likes studying papers that target the same field of study and authors with matching interests. Also, the study (Ali et al., 2020b) reveals that the most popular feature in citation recommendation models is the field of study, since it correlates papers based on similar keywords; thus, each paper is marked with multiple tags that give a short description of its contents. To exploit such useful relations, the proposed model employs a personalization module, depicted in Fig. 2. This module exploits author information, citation relations, and the paper field of study. In this regard, the model first establishes relation matrices between papers, authors, and FOS to capture researchers’ preferences. The author-paper relation matrix \(L^a \in \Re ^{\left| A \right| \times \left| P \right| }\) is maintained between authors and their papers. Similarly, the model establishes the paper-citation relation matrix \(P^s \in \Re ^{\left| P \right| \times \left| P \right| }\). The paper-FOS relation matrix \(F^s \in \Re ^{\left| P \right| \times \left| F \right| }\) is created between papers and fields of study. For instance, if a paper \(p_i\) belongs to a FOS \(f_j\), the corresponding cell in the matrix gets \(e_{p_i f_j}=1\) and 0 otherwise. Once we have these relation matrices, the model combines the paper-specific vectors \(l^a_i \in L^a\), \(p^s_i \in P^s\), and \(f^s_i \in F^s\) drawn from them with the content-based embedding \(c_i^p\) and passes them through a non-linear layer as follows.

$$\begin{aligned} z_i^p=ReLU\left( W_c^P c_i^p+W_a^P l_i^a+W_p^P p_i^s+W_f^P f_i^s+b^P \right) \end{aligned}$$
(3)

Here, \(W_c^P \in \Re ^{p_p \times p_h}\), \(W_a^P \in \Re ^{p_p \times \left| A \right| }\), \(W_p^P \in \Re ^{p_p \times \left| P \right| }\), and \(W_f^P \in \Re ^{p_p \times \left| F \right| }\) denote the learnable weight matrices, and \(b^P\) represents the bias. Next, the model employs a multi-layer perceptron to generate the final embedding vector \(\varphi _i^p\) of a paper, as defined below.

$$\begin{aligned} \varphi _i^p=MLP\left( z_i^p \right) , \end{aligned}$$
(4)

where MLP represents a multi-layer perceptron which can be computed using the following equation.

$$\begin{aligned} MLP(y)=L_2[ReLU(W_iy+b_i)], \end{aligned}$$
(5)

where ReLU provides non-linearity and \(L_2\) denotes the second linear layer. To capture researchers’ preferences and produce personalized results, it is intuitive to exploit author information, citation relations, and the paper’s field of study. Thus, to learn robust representations of the training papers, SPR-SMN employs the aforementioned relations between heterogeneous papers networks (HPNs). Likewise, the model adopts the same process to generate the context-aware representation \(\varphi _i^q\) of the query paper.
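A minimal PyTorch sketch of Eqs. (3)-(5) is given below. It assumes the content embedding \(c_i^p\) and the relation vectors \(l^a_i\), \(p^s_i\), \(f^s_i\) defined above; all layer sizes and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class PersonalizationModule(nn.Module):
    """Fuses a paper's content embedding with its author, citation,
    and FOS relation vectors, following Eqs. (3)-(5)."""
    def __init__(self, d_content, n_authors, n_papers, n_fos, d_hidden, d_out):
        super().__init__()
        self.W_c = nn.Linear(d_content, d_hidden, bias=False)
        self.W_a = nn.Linear(n_authors, d_hidden, bias=False)
        self.W_p = nn.Linear(n_papers, d_hidden, bias=False)
        self.W_f = nn.Linear(n_fos, d_hidden, bias=False)
        self.b = nn.Parameter(torch.zeros(d_hidden))
        # Eq. (5): a second linear layer applied to ReLU(W_1 y + b_1)
        self.mlp = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_out))

    def forward(self, c_i, l_a, p_s, f_s):
        z_i = torch.relu(self.W_c(c_i) + self.W_a(l_a)
                         + self.W_p(p_s) + self.W_f(f_s) + self.b)  # Eq. (3)
        return self.mlp(z_i)                                        # Eq. (4)

# Toy usage with random inputs in place of real relation vectors
module = PersonalizationModule(768, 500, 1000, 50, 128, 128)
phi = module(torch.randn(1, 768), torch.randn(1, 500),
             torch.randn(1, 1000), torch.randn(1, 50))
```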

Memory network

The proposed model employs an end-to-end memory network (Sukhbaatar et al., 2015) module to exploit long-term contextual information and salient factors. The output of the personalization module is fed as input to this module, whose working is illustrated as the final module in Fig. 1. To produce final recommendations for a query paper, the paper representations \(\varphi _i^p\) are converted into memory slots \(M_s \in \Re ^{p_m \times P}\), where P denotes the total number of papers. The model embeds the query paper q into an internal state vector \(e_q\) in the controller and computes the relevance between the query paper and each memory slot by passing their concatenation through a linear layer followed by a tanh activation, as defined below.

$$\begin{aligned} s_i= \tanh (W_{atn}[m_i:e_q]+b_{atn}), \end{aligned}$$
(6)

where \(s_i\) denotes the relevance score between the query paper and the i-th memory slot, and \(W_{atn}\) represents a weight matrix. The model then computes the attention weights for a query paper by feeding \(s_i\) into the softmax function, i.e., \(\alpha _i=softmax (s_i)\). The model employs an attention mechanism since it learns an adaptive weighting function that assigns significance/weight to the participating memory slots. The model computes the final output memory representation by taking a weighted sum over the paper embedding vectors \(C\in \Re ^{p_m\times N}\), as computed below.

$$\begin{aligned} o=\sum _{i=1}^N\alpha _i c_i , \end{aligned}$$
(7)

where o denotes the weighted sum based on the relationships between the training papers and the query paper. If a single-layer MemN2N is employed, the output embedding o and the input vector \(e_q\) are summed and fed into a final weight matrix followed by a softmax function to make predictions. Nevertheless, models that employ a single attention layer fall short of capturing comprehensive semantics and context-aware representations of the data (Sukhbaatar et al., 2015). Thus, SPR-SMN uses multiple processing layers. In particular, to make predictions for a query paper, at the \((k+1)\)-th modeling layer/hop, the model adds the output \(o^k\) and the input \(e_q^k\), followed by a softmax, as follows.

$$\begin{aligned} {\hat{a}}=softmax[W(o^k+e_q^k)], \end{aligned}$$
(8)

where \(W\in \Re ^{N\times p}\) denotes the weight matrix. The proposed model uses three hops and selects the top-10 recommendations for each query paper.
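The following sketch illustrates the multi-hop attention of Eqs. (6)-(8). It assumes the personalized paper embeddings serve as memory slots and uses three hops, as in the text; the names and the simple controller update between hops are illustrative simplifications.

```python
import torch
import torch.nn as nn

class MemoryNetwork(nn.Module):
    """Multi-hop end-to-end memory network over paper embeddings (Eqs. 6-8)."""
    def __init__(self, d, n_papers, hops=3):
        super().__init__()
        self.hops = hops
        self.attn = nn.Linear(2 * d, 1)    # W_atn and b_atn of Eq. (6)
        self.W = nn.Linear(d, n_papers)    # final weight matrix of Eq. (8)

    def forward(self, memory, e_q):
        # memory: (n_papers, d) slots m_i; e_q: (d,) query state
        for _ in range(self.hops):
            q = e_q.expand(memory.size(0), -1)
            s = torch.tanh(self.attn(torch.cat([memory, q], dim=-1)))  # Eq. (6)
            alpha = torch.softmax(s, dim=0)        # attention weights
            o = (alpha * memory).sum(dim=0)        # weighted sum, Eq. (7)
            e_q = e_q + o                          # o^k + e_q^k between hops
        return torch.softmax(self.W(e_q), dim=-1)  # Eq. (8)

net = MemoryNetwork(d=128, n_papers=1000)
scores = net(torch.randn(1000, 128), torch.randn(128))
top10 = scores.topk(10).indices  # top-10 recommendations for the query
```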

Model training

During training, the proposed model minimizes the cross-entropy loss computed between the prediction of the model, i.e., g, and the ground truth. Thus, the proposed model optimizes the following objective.

$$\begin{aligned} O=-\sum _{q \in C} \sum _{i \in P} P_q^{(i)} \log \left[ P_q^{(i)}(g)\right] , \end{aligned}$$
(9)

where C and P denote the set of training examples and the set of research papers, respectively. If the proposed model recommends the ground truth \(p^i\) for the query paper q within the top-k recommendations, it is treated as an accurate result and we have \(P_q^{(i)}=1\), otherwise 0. In addition, the model utilizes stochastic gradient descent, which learns the parameters of the model via backpropagation. To counter possible overfitting, the model employs dropout; for a better configuration, we experimented with different dropout rates.
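A sketch of this training procedure under the cross-entropy objective of Eq. (9) follows. The scoring model here is a stand-in for the assembled SPR-SMN modules, and the toy batch replaces the real training examples.

```python
import torch
import torch.nn as nn

n_papers, d = 1000, 128
# Stand-in scorer with dropout against overfitting (the real model stacks
# the personalization and memory modules described above)
model = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Dropout(p=0.2),
                      nn.Linear(d, n_papers))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()  # Eq. (9): cross-entropy over candidate papers

# Toy batch: query embeddings and the indices of their ground-truth papers
queries = torch.randn(32, d)
targets = torch.randint(0, n_papers, (32,))

optimizer.zero_grad()
loss = criterion(model(queries), targets)
loss.backward()                    # backpropagation through all parameters
optimizer.step()                   # stochastic gradient descent update
```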

Experimental study

In this section, we present the evaluation protocol, the datasets, and the models used for comparison.

Datasets

To assess the results generated by the models, we employed two datasets, viz., DBLP-V12 and the ACL Anthology Network (AAN). Further details and statistics of the datasets are tabulated in Table 2.

DBLP The DBLP-Citation-network V12 has a relatively large size among the DBLP datasets, with 3,501,133 research papers and 25,022,314 citations. The information it provides includes papers’ titles, abstracts, authors, venues, keywords, and citation relations.

Table 2 Datasets specifications

ACL Anthology Network (AAN) The AAN dataset is comparatively recent and holds papers related to computational linguistics and NLP. It contains 21,450 research articles, 17,335 authors, 311 venues, and 113,355 citation relations.

Datasets train-test split

To conduct experiments, we randomly split each dataset into two sets: the training set \(\Upsilon ^t\) and the test set \(\Upsilon ^p\). The training set consists of 80% of the papers, while the test set \(\Upsilon ^p\) contains the remaining 20%. Additionally, \(\Upsilon =\Upsilon ^t \cup \Upsilon ^p\) and \(\Upsilon ^t \cap \Upsilon ^p =\emptyset\). For a seed manuscript, the model provides top@k paper recommendations using \(\Upsilon ^t\). If the ground truth appears in the top@k, the result is considered relevant, otherwise not.
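A minimal sketch of this split follows, assuming a fixed random seed for reproducibility (the text does not specify one):

```python
import random

def split_papers(papers, train_ratio=0.8, seed=42):
    """Randomly split papers into disjoint training and test sets."""
    papers = list(papers)
    random.Random(seed).shuffle(papers)
    cut = int(train_ratio * len(papers))
    return papers[:cut], papers[cut:]  # training and test sets

train_set, test_set = split_papers(range(10_000))
assert not set(train_set) & set(test_set)  # the two sets are disjoint
```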

Metrics for evaluation

We employed recall, Mean Average Precision (MAP), and normalized Discounted Cumulative Gain (nDCG) as evaluation metrics, which are the most commonly used metrics in the relevant domain (Ali et al., 2020b; Kefalas & Manolopoulos, 2017; Kefalas et al., 2018).

Recall: examines the recommendations of the models using the percentage of relevant results delivered in the top-k list of recommendations, where \(k=\left\{ 20,40, 60, 80, 100 \right\}\).

$$\begin{aligned} Recall = \frac{1}{\left| Q \right| }\sum _{p \in Q}\frac{\left| R_p\cap T_p \right| }{\left| T_p \right| }, \end{aligned}$$
(10)

where Q represents the set of all target research papers, \(R_p\) denotes the list of top-k recommendations delivered for the seed paper p, and \(T_p\) is the set of its ground-truth relevant papers.

MAP: examines the significance of a model by checking whether the relevant articles are suggested in the top-k or not. Additionally, it penalizes errors that occur high up in the top@k ranking.

$$\begin{aligned} AP@k=\frac{1}{GTP}\sum _{i=1}^{k}\frac{TP_{seen}(i)}{i}\cdot rel(i), \end{aligned}$$
(11)

where \(TP_{seen}(i)\) denotes the number of true positives observed up to rank i, \(rel(i)=1\) if the item at rank i is relevant (0 otherwise), and GTP is the total number of ground-truth positives. We choose the cut-off value for the Average Precision (AP) as AP@10.

nDCG: evaluates the rank in the top-k of the true relevant papers suggested by the model (Ali et al., 2020c) and is computed as:

$$\begin{aligned} nDCG_{g} = \frac{DCG_{g}}{IDCG_{g}}, \end{aligned}$$
(12)

where \(DCG_{g}\) is the weighted sum of the relevance degrees of the ranked manuscripts, and \(IDCG_{g}\) is the DCG of the ideal ordering, by which the DCG values are normalized.
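For concreteness, the three metrics can be computed per query as sketched below, assuming binary relevance (a recommended paper is either in the ground-truth set or not); the toy lists are illustrative.

```python
import math

def recall_at_k(recommended, relevant, k):
    """Eq. (10): fraction of ground-truth papers found in the top-k list."""
    return len(set(recommended[:k]) & relevant) / len(relevant)

def ap_at_k(recommended, relevant, k):
    """Eq. (11): precision accumulated at each relevant rank, divided by GTP."""
    tp_seen, total = 0, 0.0
    for i, paper in enumerate(recommended[:k], start=1):
        if paper in relevant:
            tp_seen += 1
            total += tp_seen / i
    return total / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Eq. (12): DCG of the ranking normalized by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, p in enumerate(recommended[:k], start=1) if p in relevant)
    idcg = sum(1.0 / math.log2(i + 1)
               for i in range(1, min(len(relevant), k) + 1))
    return dcg / idcg

recs, truth = [3, 1, 7, 9, 4], {1, 9, 5}
print(recall_at_k(recs, truth, 5), ap_at_k(recs, truth, 5), ndcg_at_k(recs, truth, 5))
```

MAP and the reported recall follow by averaging these per-query scores over all queries in \(\Upsilon ^p\).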

Models used in the experiments

This section provides details about the models that are used as baselines for the proposed model. The following subsections explain these models.

  • CCA (Gupta & Varma, 2017) learns low-dimensional representations using content and network proximity. To learn content and network embeddings, it uses the Doc2Vec and DeepWalk embedding methods, respectively. Finally, the similarities between the learned vectors are computed to make relevant recommendations. We set the embedding dimensionality to 64 for DeepWalk and 300 for Doc2Vec.

  • BNR (Cai et al., 2019) is an NRL method that explores network proximity and relevant papers’ content to provide recommendations for a seed paper. To conduct fair experiments, the parameters are set as follows: dimensions = 128, context size = 10, and walks per vertex = 80. The tuning parameter \(\beta\) has great importance for the model’s performance, and it gives the best results when set to \(\beta = 0.7\). Also, we set the return parameter p = 1 and the in-out parameter q = 2.

  • SCR-NTR (Qiu et al., 2021) is a citation recommendation model that exploits network and textual information to generate recommendations. To learn text and network representations, the model utilizes the BERT and HeGAN representation learning methods, respectively. The parameter settings for text representation are as follows: learning rate 0.0001, 5 learning epochs, 12 bidirectional Transformer layers, a hidden layer size of 768, 12 attention heads, and 110 M total parameters. For network representation, the numbers of discriminator and generator training steps per epoch are set to 15 each.

  • NNRank (Bhagavatula et al., 2018) creates node embeddings employing a neural network and provides top-k results for a query paper using the cosine similarity between the embeddings learned for the corresponding nodes in the network. We use a learning rate of 0.001, a batch size of 512, an abstract length of 500, 5 nearest neighbors, and dense dimensions of 75.

  • PR-HNE (Ali et al., 2020c) is a heterogeneous network embedding model that employs multiple relation networks to generate recommendations for a query manuscript. The model employs two proximity concepts to capture semantic relations between network objects and learn their representations. Finally, the dot product between these embeddings is computed to rank the top-k papers for a query manuscript. All the parameters are set according to the experimental setup of the PR-HNE model.

  • GCR-GAN (Ali et al., 2021b) is a global citation recommendation model that exploits the network structure and relevant content using a generative adversarial network. Like the proposed model, GCR-GAN learns papers’ content-based embeddings using the SPECTER document embedding model; however, its network-based representations are learned using the GAN. In contrast, the proposed model employs a personalization module and an end-to-end memory network to capture user preferences and long-range contextual information, respectively. In addition, the proposed model leverages the paper field of study to capture more semantics in the network. In this set of experiments, we used the default parameter settings.

Fig. 3 Radar chart showing the MAP, nDCG, and Recall scores on the a DBLP-V12 and b AAN datasets

Comparative analysis with other baselines

This section analyzes the results generated by the proposed model compared to the baselines. In particular, we judge the recommendations produced by the different models on the DBLP and AAN datasets using the evaluation metrics, namely recall, MAP, and nDCG, as depicted in Fig. 3. Table 3 and Fig. 4 reveal that SPR-SMN outperforms its counterparts on the DBLP and AAN datasets, respectively. The results exhibit that CCA generated the weakest results. The reason behind this poor performance is its inability to exploit the structure of the heterogeneous bibliographic network, author information, and the paper field of study. Moreover, the model utilizes Doc2vec to learn textual representations, which, in contrast to SPECTER, cannot capture robust contextual information and therefore fails to learn semantic-preserving paper content embeddings. BNR, on the other hand, achieves better results than CCA, as it exploits heterogeneous information sources to learn node embeddings and produce citation recommendations. Nevertheless, the model generates less effective vectors since it employs the DeepWalk method, which learns only shallow node representations. SCR-NTR gains improvements in nDCG and MAP scores compared to the BNR model because it employs more sophisticated network and text representation learning methods, viz., HeGAN and BERT, respectively, to exploit semantic relations in the network. Yet, the model is limited by its use of BERT, which, in contrast to the domain-specific SPECTER model, fails to produce context-aware content embeddings since it is pretrained on a general English (Wikipedia) corpus and does not consider hard negatives. Besides, the model does not exploit personalized information, i.e., the paper field of study, to generate individualized recommendations.

It is also notable that GCR-GAN returned the second-best results because of its ability to consider the network structure along with author and auxiliary information sources to make personalized recommendations. However, the author vector in GCR-GAN is obtained using the author adjacency matrix, which contains only linked neighbors’ information. Also, when a new author is introduced, the model must expand the dimensions of the author adjacency matrix and be retrained from scratch, which leads to extendability problems. Therefore, SPR-SMN outperforms GCR-GAN along with the other competitors in terms of the nDCG, MAP, and recall metrics, which demonstrates that the proposed model is comparatively efficient in producing better-ranked paper recommendations.

Also, we notice that SPR-SMN gained almost 4% and 3% improvements in MAP and nDCG scores over GCR-GAN on the DBLP-V12 dataset. Considering the recall results (i.e., Rec@20, Rec@40, Rec@60, Rec@80, and Rec@100), SPR-SMN obtains better results than the other baselines, which exhibits the stability and robustness of the proposed model’s results. Similarly, the proposed model gained nearly 4% and 5% better MAP and nDCG scores relative to the second-best model, i.e., GCR-GAN, on the AAN dataset. This is attributed to the fact that SPR-SMN exploits semantic relations and contextual information corresponding to research papers and authors, which helps the model capture researchers’ preferences and produce quality recommendations. The reason for SPR-SMN’s superior results is its exploitation of semantics through the SPECTER model, which is trained on a domain-specific corpus and utilizes citation-informed Transformers to produce semantic-aware embeddings of papers. Further, the personalization module exploits researchers’ preferences, viz., author information, relevant content, the paper field of study, and citation relations, which boosts the results. Additionally, the model stacks memory layers to enhance its learning ability and grasp long-range dependencies among the significant factors (Table 4).

Table 3 Results reported using the DBLP-V12 dataset, where bold indicates the best and ‘–’ the second-best performer
Table 4 Results reported employing the AAN dataset, where bold indicates the best and ‘–’ the second-best performer
Fig. 4 Comparison of recommendation models regarding the recall scores a on the DBLP-V12 dataset, b on the ACL Anthology dataset, and c with respect to cold-start papers

Ablation study

In this section, we analyze the impact of each module of the proposed model in terms of MAP, nDCG, and recall scores. In particular, we judge the influence of the content embedding module \({\textbf {SPR-SMN}}_{\textbf {E}}\), the personalization module \({\textbf {SPR-SMN}}_{\textbf {{E+P}}}\), and the memory network module \({\textbf {SPR-SMN}}_{\textbf {{E+P+M}}}\) over the two datasets. The results presented in Table 5 clearly indicate that \({\textbf {SPR-SMN}}_{\textbf {E}}\) produced the weakest results among the variants since it employs only content information and captures neither personalized information nor long-range dependencies. On the contrary, \({\textbf {SPR-SMN}}_{\textbf {{E+P}}}\) gained a significant improvement over the previous variant since it exploits personalized information, including authors, the paper field of study, and citation relations. Finally, combining all components, including the memory network, denoted as \({\textbf {{SPR-SMN}}}_{\textbf {{E+P+M}}}\), further improves the model’s accuracy. This is due to the integration of an end-to-end memory network with an attention mechanism, which helps capture long-range dependencies and exploit salient factors.

Table 5 The influence of integrating different modules on the recommendation results, where bold results indicate the best model

Impact of integrating information networks

We analyze the influence of the participating relation networks in the personalized embedding module; the results of this study are tabulated in Table 6. In particular, we examine the significance of incorporating author information, paper citation relations, and the paper FOS by comparing different variants of the proposed model. These include \({\textbf {SPR-SMN}}_{\textbf {{NP}}}\), which employs none of the relation networks (authors, citations, and FOS) and relies only on papers’ content to generate recommendations. \({\textbf {SPR-SMN}}_{\textbf {{A}}}\) extends the previous variant by adding the author relation to enhance the results. \({\textbf {SPR-SMN}}_{\textbf {{AC}}}\) incorporates citation relations along with author information. \({\textbf {SPR-SMN}}_{\textbf {{ACF}}}\) integrates all relation networks, including the field of study.

These results demonstrate that \({\textbf {SPR-SMN}}_{\textbf {{NP}}}\) produced the weakest results among the variants since it employs none of the personalized author information, citation relations, and field of study. It can also be noticed that exploiting author relations in \({\textbf {SPR-SMN}}_{\textbf {{A}}}\) has a great impact on the relevance of the results. On the contrary, the field of study has a comparatively smaller impact on the final recommendations. The results of the final variant \({\textbf {SPR-SMN}}_{\textbf {{ACF}}}\) exhibit that the citation relation is the second most influential relation. To conclude, these findings suggest that integrating the various information networks yields better results, as exploiting these relations helps the model capture researchers’ preferences and produce more individualized results.

Table 6 The influence of utilizing various relation networks on the recommendation results, where bold results indicate the best model

Performance over cold-start papers

The problem of ‘cold-start papers’ occurs because of the unavailability of information about papers needed to recommend them to users. The missing information may include paper content, citation relations, the field of study, and author information. To analyze the performance of SPR-SMN, we selected 20,454 cold-start papers and used them to produce recommendations. Table 7 shows that even with missing information, the model can utilize auxiliary information sources to produce useful recommendations. In particular, if a paper has no citation information, the proposed model employs its field of study, content, and author information to make recommendations. We notice that our model gained 7% and 2% better MAP and Rec@100 results compared to the second best-performing model, i.e., GCR-GAN. This significant gain demonstrates that the context-preserving content-based embeddings learned using the SPECTER model, together with the exploitation of personalized information, helped the proposed model produce improved results. On the other hand, GCR-GAN and PR-HNE perform competitively with each other.

Table 7 Analysis of results using cold-start papers, where bold results reveal the best and ‘–’ the runner-up model

Impact of parameters

This section discusses the settings of the different parameters employed by SPR-SMN and their impact on the resulting recommendations. We use the SPECTER model to learn content-based representations of papers; SPECTER utilizes 768-dimensional node embeddings, and we set the loss margin parameter to \(w=1\). For training, SPECTER employs five negative samples, comprising two hard negatives and three easy negatives. For the batch size, we choose the value 16. To learn papers’ content embeddings, we used the abstracts of research papers with a size of 517. To find the best learning rate, we performed a grid search over a typical range of hyperparameters. To analyze the impact of the learning rate, we present the MAP, nDCG, and Rec@20 results on the DBLP and AAN datasets, as depicted in Fig. 6. It is noticeable that the model performs poorly at a learning rate of 0.1, which shows that it is unable to converge well at a high learning rate. In contrast, a small value, i.e., 0.0001, requires more time for convergence. The results improve steadily at values of 0.001 and 0.005, where the model gains significant improvements in recall, MAP, and nDCG on the DBLP dataset; a similar trend is observed on the AAN dataset. Through a fine-grained search, we discovered that 0.001 provides the best results on both the DBLP and AAN datasets.

Fig. 5 The influence of the k value in the memory network module

Finally, we analyze the model’s performance as the number of memory network layers (hops) k varies from 1 to 5. Figure 5 depicts the impact of k on the model’s results. We observe that the model achieves its highest performance when k equals 3: as k increases from 1 to 3, the model achieves better results, but when k increases from 3 to 5, performance degrades slightly. For simplicity, we set \(k=3\). On the other hand, the dimensionality of the MLP layer has relatively little impact on the final results. In Fig. 7, we notice that the model achieves the highest performance when the MLP layer dimensionality is 120 and 140 on the DBLP-V12 and AAN datasets, respectively.

Fig. 6 The influence of various learning rates on the final recommendations using the DBLP dataset

Fig. 7 The dimensions of the MLP layer in the personalization module using the DBLP-V12 and AAN datasets

Error analysis

Our system achieves a MAP of 62.5 on the DBLP dataset. To further analyze the model and explore possible improvements, we conducted a manual error analysis by randomly choosing 100 errors from our recommendation results. During this analysis, we found that many errors occurred due to missing information about papers and authors. In particular, 33 out of 100 errors occurred due to missing citation relations, and 27 happened due to the unavailability of paper contents (title and abstract). Further, 22 out of 100 inaccurate predictions were due to the unavailability of author information, while the remaining 18 were caused by other factors such as missing FOS and papers written in non-English languages. With this in mind, we focused more effort on adding the missing citations, paper contents, and author information, which helped our model achieve an improved MAP score.

Conclusion and future work

Several models have been proposed in the literature to make personalized paper recommendations for researchers. However, these models are limited in exploiting several salient factors and semantic relations in the heterogeneous paper networks to capture researchers’ preferences and generate relevant results. The existing models also encounter the ‘cold-start papers’ problem, which is addressed by the SPR-SMN model by employing contextual information and semantic relations corresponding to the network. The model uses SPECTER along with an end-to-end memory network to capture long-range contextual information. Its personalization module uses authors’ information, the paper’s FOS, and citations. The experimental results reveal the effectiveness of SPR-SMN against state-of-the-art baselines. The key findings of this research are the following.

  • Employing the SPECTER document embedding model yields robust and semantic-preserving paper content embeddings.

  • Exploiting personalized information, viz., paper citation relations, author information, and the field of study, helps the model better grasp researchers’ preference dynamics and produce justifiable results.

  • Using an end-to-end memory network with an attention mechanism helps the model to exploit rich semantics and capture long-range dependencies.

  • Integrating all these aspects into a unified model alleviates the cold-start paper and lack of personalization problems.

In the future, we will analyze the influence of temporal dynamics and the significance of contextual information using a hierarchical attention mechanism.