Keywords

1 Introduction

Due to the overwhelming amount of academic papers published every year, researchers need to spend much time on searching relevant or interested papers, i.e., recommending academic papers becomes a challenging task. To address the problem, many digital libraries and recommender systems have been developed to recommend relevant literature according to the provided keywords or users’ profiles. Although search engines accelerate search process, finding satisfactory scientific articles is still time-consuming and inconvenient.

A number of techniques have been proposed to recommend academic papers. Content-based filtering (CBF) [11] views content of document as keywords to measure similarity between papers. Collaborative filtering (CF) is a classical recommendation method which recommends items to users based on others who have similar preferences [13]. Citation-based analysis uses citation relationships to calculate relatedness among papers [16]. Hybrid methods have been proposed to improve the recommendation results, which combine two or more recommendation techniques to get better performance [6].

The techniques mentioned above are mostly used to recommend academic papers that are similar to a given paper in some respects. As shown in Fig. 1(a), given a paper \(u_1\) in the field of graph embedding, the recommended papers are \(u_6\), \(u_2\), \(u_4\), and \(u_8\). In these recommended papers: \(u_6\) surveys the study of dimensionality reduction, which has the same background knowledge with \(u_1\); papers \(u_2\) and \(u_4\) are topically related to \(u_1\) and focus on addressing problem of network representation; paper \(u_8\) provides a new fundamental theory which can be transferred to deal with the research problem in \(u_1\). Existing recommendation methods rank papers based on relevance and return a set of discrete papers relevant to the given query. However, for researchers who concentrate on learning existing literature about the topic of the given paper, they want to know how the topic develops and evolves, and understand what it originates from. Conventional academic paper recommendation ignores the relationships of recommended papers and can not provide effective results to help researchers trace the development of the topic.

Fig. 1.
figure 1

Comparison of conventional recommendation and path-based recommendation.

To address above-mentioned problem, we propose a novel method called PAPR (Path-based Academic Paper Recommendation), which considers time-oriented development of the topic of the given paper, and recommends a sequence of academic papers. Figure 1(b) illustrates how our method works. Our method takes a paper \(u_1\) as input and produces a 3-hop path \({u_2,u_3,u_4}\) from the paper. The proposed method concentrates on creating a deep and consecutive paper path. Nodes on the path are papers highly relevant and important, which represent the evolution and development of the paper’s topic. Addressing this task will be beneficial to researchers who start to investigate a new topic.

In this paper, we take both content and network information into account to obtain better performance on paper recommendation. We learn the textual vector for each paper based on the semantic information (e.g., title and abstract) and utilize the network information to get relatedness of papers. Then, we combine semantic similarity and network relatedness to obtain the united similarity between papers. To get the path of recommendation, we use beam search [18] to get papers which have high relevance with all papers in the existing path.

The main contributions of this paper can be summarized as follows:

  • We introduce a novel academic paper recommendation method to trace the origin and development of the topic of the given paper, which is beneficial for people to have an sight into the development of a research topic.

  • We propose a novel strategy to measure the relevance between two papers and generate a consecutive paper path for recommendation, where both text semantic information and network structural information are considered.

  • We conduct extensive experiments on two real-world datasets DBLP and ACM, and the results show that our proposed approach performs better than all compared methods.

The remaining sections of this paper are as follows. Section 2 presents the concept of scientific network and the problem definition. The overview of the path-based paper recommendation is introduced in Sect. 3. Section 4 describes the experimental setup and discusses the experimental results. Section 5 presents related work on recommending research papers. Section 6 concludes the paper.

2 Problem Definition

In this section, we introduce the concept of scientific network and then formally define the problem. The notations we will use are summarized in Table 1.

Table 1. Notations and explanations.

Definition 1

(Academic Network). An academic network is defined as a graph \(G=(V,E)\), where \(V=\lbrace v_1,...,v_n\rbrace \) represents n vertices and E is the set of edges in G. Each node v and each link e are associated with their mapping functions \(\psi (v): V \rightarrow T_V\) and \(\varphi (e): E \rightarrow T_E\), respectively. \(T_V\) and \(T_E\) denote the sets of predefined objects and relation types. A mapping function \(\Phi :V\rightarrow \mathbb {R}^{d} \left( d\ll |V| \right) \) is used to learn the latent representation of papers.

Example 1

As shown in Fig. 2(a). we construct an academic graph. It consists of multiple types of objects(Author, Paper, Venue) and relations(publish relation between papers and venues, written relation between papers and authors).

Fig. 2.
figure 2

Result of a path-based paper recommendation.

Definition 2

(Academic Paper Path). An academic paper path is defined as a path in the form of \(v_{i_1}\rightarrow v_{i_2}\rightarrow ... \rightarrow v_{i_k}\), which describes a chain from new papers to old papers. For adjacent nodes \(v_{i_{k-1}}\) and \(v_{i_k}\) in the path, there is a citation relation between them.

Example 2

From Fig. 2(b), we can observe there is a paper path T which can be denoted as \(T=v_1\rightarrow v_2\rightarrow v_3\rightarrow v_4\). To extend the path, papers cited by \(v_4\) will be considered and one of them will be chosen as next node according to the similarity between cited papers and T.

Definition 3

(Path Recommendation Probability). Given a path \(T=v_{i_1}\rightarrow v_{i_2}\rightarrow ... \rightarrow v_{i_k}\), which denotes a path of recommended papers, the probability of recommending the paper path is defined as the similarity between the representation of the given paper and that of the recommended path.

Example 3

According to Fig. 2(c), there are three paper paths named path1, path2, path3. Then, we can return the path having the maximum similarity with \(v_1\), after computing the similarity between \(v_1\) and path1, path2, path3 separately.

Problem Formalization. Given a graph G and a query paper \(v_{i_1}\), the Path-based Academic Paper Recommendation is to select a path T defined on the graph G, which has the maximum path recommendation probability.

3 Proposed Approach

To generate an ordered path for paper recommendation, we define the relatedness measurement of papers and the construction of path recommendation. We give an overview of our recommendation framework in Fig. 3.

Fig. 3.
figure 3

Overview of recommendation framework.

3.1 Text Learning

The semantic relatedness of papers is measured mainly based on textual information. We use words from metadata (e.g., title, abstract) to build texts of documents. An effective method to compute semantic similarity of two papers is to generate word embeddings and combine them to form paper embeddings. Specifically, let \(c_i, c_j\) denote the vector representation of two papers \(P_i,P_j\) and the semantic relatedness \(F_s(P_i,P_j)\) is defined by cosine function:

$$\begin{aligned} F_s(P_i,P_j)=cos(c_i,c_j)=\frac{c_i \cdot c_j}{\parallel c_i \parallel \parallel c_j \parallel } \end{aligned}$$
(1)

where \(c_i, c_j\) are weighted average vector representation of words in \(P_i, P_j\) respectively. For the vector representation \(c_i\) of paper \(P_i\), let \(w=(w_1,...,w_p)\) denote word vectors of unique words in paper and we have

$$\begin{aligned} \widehat{c_i}=\sum _{k=1}^p \text {tfidf}(w_{k})\cdot w_k \end{aligned}$$
(2)

where TF-IDF is used to measure importance of word vectors and we use L2-normalization to get final text-level representation, \(\mathbf {c_i=\widehat{c_i} /\parallel \widehat{c_i} \parallel }\).

3.2 Network Learning

The academic network is fundamentally a multi-relational heterogeneous graph where edges indicate many relationships and contain different semantic relatedness. In our model, we consider paper-paper network and author-paper-venue network and adopt two different methods to obtain relatedness between papers.

Paper-Paper Space. The paper-paper relationship constructs a homogeneous network for the academic graph. To evaluate the latent interactions between nodes, we use random walk based sampling strategy to learn latent representation of nodes. In the generation of random walks, we use two parameters p and q proposed by [5] to bias our random walks towards local neighborhood or tend to move further away. After generating a random walk, we define a window size k and \(N_s(u)\) denotes neighborhood for node u in the slide window. Let \(\phi _1 : V \rightarrow \mathbb {R}^d\) be the mapping function from nodes to feature representations. We now try to maximize the likelihood function

$$\begin{aligned} \max \sum _{u \in V} log Pr(N_s(u)| u ) \end{aligned}$$
(3)

and we have

$$\begin{aligned} Pr(N_s(u)| u ) = \prod _{n_i \in N_s(u)} Pr(n_i|u) = \prod _{n_i \in N_s(u)} \frac{\exp (\phi _1(n_i)\cdot \phi _1(u))}{\sum _{v \in V}\exp (\phi _1(v)\cdot \phi _1(u))} \end{aligned}$$
(4)

Finally, we optimize above equation using stochastic gradient ascent and skip-gram architecture to get node representation. Then we can get the similarity of two nodes \(v_i,v_j\) in paper-paper space:

$$\begin{aligned} f_c(v_i,v_j)=cos(\phi _1(v_i),\phi _1(v_j))=\frac{\phi _1(v_i)\cdot \phi _1(v_j)}{\parallel \phi _1(v_i) \parallel \parallel \phi _1(v_j) \parallel } \end{aligned}$$
(5)

Author-Paper-Venue Space. The paper-paper relation can not discover relatedness information between papers without citation relationship. To get a better relatedness evaluation, we also consider other relationships. Specifically, we use two most common and effective meta-path schemes, which are “author-paper-author”(APA) and “author-paper-venue-paper-author”(APVPA). We use meta-path-based random walk strategy [4] to incorporate different types of nodes into skip-gram and learn effective node representations. Given a node v, we maximize the probability of having the heterogeneous context \(N_t(v), t \in T_V\):

$$\begin{aligned} arg max_{\theta } \sum _{v \in V} \sum _{t \in T_V} \sum _{c_t \in N_t(v)} \log p(c_t|v;\theta ) \end{aligned}$$
(6)

where \(N_t(v)\) denotes v’s neighborhood with the \(t^{th}\) type of nodes. Let \(\phi _2 : V \rightarrow \mathbb {R}^d\) be the mapping function for nodes and \(p(c_t|v;\theta )\) is commonly defined as a softmax function:

$$\begin{aligned} p(c_t|v;\theta ) = \frac{\exp ^{\phi _2(c_t)\cdot \phi _2(v)}}{\sum _{u \in V} \exp ^{\phi _2(u)\cdot \phi _2(v)}} \end{aligned}$$
(7)

To achieve efficient optimization, we also use negative sampling for network learning. After learning network representation, the similarity of two nodes \(v_i,v_j\) in author-paper-venue space can be defined as:

$$\begin{aligned} f_a(v_i,v_j) = cos(\phi _2(v_i),\phi _2(v_j))=\frac{\phi _2(v_i)\cdot \phi _2(v_j)}{\parallel \phi _2(v_i) \parallel \parallel \phi _2(v_j) \parallel } \end{aligned}$$
(8)

Network-Based Relatedness. To make full use of network information, the final network embedding for a paper consists of paper-paper and author-paper-venue space. Formally, we formulate the relatedness of two paper \(v_i\) and \(v_j\):

$$\begin{aligned} F_n(v_i,v_j) = \alpha f_a(v_i,v_j) + (1-\alpha ) f_c(v_i,v_j) \end{aligned}$$
(9)

where \(\alpha \in [0,1]\) is to adjust the weights of two parts.

3.3 Aggregated Method

The united relatedness measurement includes semantic similarity and network closeness. We can get the relatedness \(Aggr(P_i,P_j)\) between papers \(P_i\) and \(P_j\):

$$\begin{aligned} Aggr(P_i,P_j)=\lambda F_s(P_i,P_j)+(1-\lambda )F_n(P_i,P_j) \end{aligned}$$
(10)

where \(\lambda \in [0,1]\) trades off the weight of \(F_s\) against \(F_n\).

Time Decay. To get a path which can illustrate the development of the field of study, it is necessary to avoid overemphasizing old important articles. In this paper, we use a time-based decay parameter \(\gamma \) to reduce influence of older articles. The final relatedness \(Sim(P_i,P_j)\) can be defined as:

$$\begin{aligned} Sim(P_i,P_j)=Aggr(P_i,P_j)\cdot \exp ^{- \gamma \cdot |Time(P_i)-Time(P_j)|} \end{aligned}$$
(11)

where \(\gamma \) is positive, \(Time(P_i)\) and \(Time(P_j)\) denote papers’ published years.

3.4 Path Generation

To get an ordered path, we define a strategy to extend the path from a given node. We propose a two-stage beam search component for path generation, as shown in Fig. 4.

Fig. 4.
figure 4

Two stage path generation.

First Stage. In the first stage, we improve cohesiveness among nodes in the generated path. Specifically, given a path T in the candidate paths, which is denoted as \((u_1,u_2,...,u_l)\), we need to choose the next node to extend the path. The similarity of candidate node \(\widehat{u}\) and path T can be defined as:

$$\begin{aligned} F_h(T,\widehat{u})=\frac{1}{l} \sum _{i=1}^{l} Sim(u_i,\widehat{u}) \end{aligned}$$
(12)

To get better results, we use beam search to get a set of candidate nodes \(CanSet=\{u_{l+1}^{(1)},...,u_{l+1}^{(k)}\}\), which has top-k high similarity with the path. We extend the given path T to k new paths. For all candidate paths, we do the same operation and get the new set of all extended candidate paths.

Second Stage. In the second stage, we try to reduce the size of the set of extended candidate paths. To keep topic similarity between the generated paths and the original paper, we define the similarity between the path \(T=(u_1,u_2,...,u_l)\) and the original node \(u_1\) as follows:

$$\begin{aligned} F_h(T,u_1)=\frac{1}{l-1} \sum _{i=2}^l Sim(u_i,u_1) \end{aligned}$$
(13)

We choose the most similar k candidate paths as the i-step generated paths. We do two-stage path-generation repeatedly to get the final generated paths and rank these paths in order of decreasing similarity for paper recommendation.

4 Experiment

In this section, we introduce the datasets and experimental setup, as well as the evaluation metrics and baseline methods. We also present and analyze the results of our experiments.

4.1 Data Preparation

Dataset. The experiments are conducted on two different datasets: the DBLP dataset and the ACM dataset [19]. The information of title, abstract, keywords, authors, publication venue and publication year is used in both datasets. Statistics of the two constructed heterogeneous bibliographic networks are summarized in Table 2. We preprocess texts of each paper, remove stop words and words appearing less than 10 times and then stem each word.

Table 2. Statistics of two datasets.

Evaluation Settings. To conduct the experiments, we invite 10 experts to help on evaluating the quality of recommended results. We randomly choose 200 papers in DBLP and ACM as test set respectively. For each given paper, 3 experts annotate ordered paths and we choose the overlap results as ground truth papers. All meta-data (e.g., title and abstract) can be available for annotators.

4.2 Experimental Settings

Baseline Methods. Several widely deployed paper recommendation approaches were implemented. We compared the recommendation results of the following methods in academic network:

LDA: LDA [1] is a celebrated generative model for text documents that learns representations for documents as distributions over word topics.

LSI: LSI [3] uses singular value decomposition on the BOW representation to arrive at a semantic feature space.

PathSim: PathSim [17] is a meta path-based method to search similar papers in heterogeneous information networks. It considers different linkage paths to study similarity among the same type of objects in academic networks.

HeteSim: HeteSim [15] is a path-constrained method to measure the relatedness of heterogeneous objects in heterogeneous networks.

PageRank: PageRank [10] is a method to derive an object’s importance based on authority propagation in the heterogeneous bibliographic network. It tends to rank papers based on citation analysis in citation networks.

Deepwalk: Deepwalk [12] is a network-only representation learning method. Deepwalk takes random walk paths from network as sentences and nodes as words to learn the node representations by applying the Skip-Gram algorithm.

Node2vec: Node2vec [5] proposes an improvement to the random walk phase of DeepWalk and combines DFS-like and BFS-like neighborhood exploration.

PAPR: Our proposed method combines text and network structure information to learn representations for nodes. To evaluate the benefit brought by time decay introduced in Sect. 3.3 and two-stage path-generation introduced in Sect. 3.4, we design two variations of PAPR. PAPR1 only uses two-stage path-generation and PAPR2 only uses time decay.

Parameter Settings. In the section of text learning, we adopt 100 dimension, 10 window size, skip-gram model and 5 negative samples for word2vec. In the section of network learning, we set 128 dimension, 10 window size as the basic parameters. The number of walks per node is 20 and the walk length is 30. In all methods, biased parameters are also fine-tuned to report the best performance.

Table 3. Recommendation performance comparisons on DBLP and ACM datasets in terms of Precision and Recall.

Evaluation Metrics. We employ Precision and Recall at position M (P@M and R@M) as the evaluation metrics. Precision@M is defined as the fraction of ground truth papers contained by the M-length recommended path and Recall@M is defined as the fraction of ground truth papers that appear in the M-length recommended path contained by the whole ground truth. For each given paper \(P_i\), we have:

$$\begin{aligned} P@M=\frac{|N(M;P_i)|}{|N_p(P_i)|} \qquad R@M=\frac{|N(M;P_i)|}{|N_r(P_i)|} \end{aligned}$$
(14)

where \(N(M;P_i)\) is the set of ground truth papers in the M-length recommended path, \(N_p(P_i)\) is the set of papers in the recommended path and \(N_r(P_i)\) is the set of papers in the ground truth path.

4.3 Result Analysis

Evaluation results on different methods are shown in Table 3. We compare the proposed method with different baselines using Precision and Recall. Table 3 summarizes the comparison results on both DBLP and ACM datasets. It can be easily observed that our proposed method outperforms other methods and gains a performance improvement of more than 20% over the best competitive algorithm. In general, the method LDA gets the lowest precision on ACM dataset, which shows that the similarity of papers can not be evaluated only based on content information. PathSim and HeteSim have the similar performance. However, they don’t perform well on DBLP dataset because they heavily depend on hand-engineered meta paths. To analyze the effect of time-decay component and two-stage path generation component in our method, we also compare the proposed method with its variation PAPR1 and PAPR2. We can observe that PAPR gains a performance improvement of more than 10% over PAPR1 and more than 6% over PAPR2 on precision metrics, since time-decay strategy and two-stage path-generation play important roles for better recommendation.

Fig. 5.
figure 5

Precision and recall with varying \(\alpha \), \(\lambda \) and \(\gamma \) for PAPR.

4.4 Parameter Study

In this section, we study the impact of parameters \(\alpha \), \(\lambda \) and \(\gamma \) on dataset DBLP. \(\alpha \) is proposed to balance paper-paper space and author-paper-venue space. Observed from Fig. 5(a)(d), PAPR achieves the best performance when \(\alpha \) = 0.2. This is because, citation relation has a leading role and two meta-path relations (APA, APVPA) make a great supplement for relatedness measurement. \(\lambda \) trades off the weight of text semantic and network structure. We set \(\alpha =0.2\) to evaluate influence of text similarity and network similarity. From Fig. 5(b)(e), we find that the lower \(\lambda \) can get better results of Precision@3 and Recall@3. Meanwhile, the higher \(\lambda \) gets better results of Precision@8 and Recall@8. It means that short path depends more on network structure and long path depends more on text semantics. \(\gamma \) is used to reduce the influence of older papers. Figure 5(c)(f) show the influence of time decay. We can observe that the precision and recall increase quickly at first and then decline slowly, which means time decay is helpful to get a better time-oriented path but excessive decay has no better effects.

Table 4. Performance comparision over running time of different beam sizes.

To achieve the trade-off between running time and the quality of the paths, we also perform experiments over average running time of different beam sizes on each testcase. Table 4 illustrates the results w.r.t. Precision and Recall on different beam sizes in DBLP and ACM dataset. We can see that both running time and quality of paths increase as the beam size increases. For running time, it increases steadily as beam size increases. However, there is a much more rapid increase on performance when beam size increases from 1 to 5, and Precision and Recall have few improvements from 5 to 7 in both datasets. Therefore, we set beam size as 5 to obtain most of the benefits of efficiency and performance.

5 Related Work

5.1 Content-Based Algorithms

Content-based filtering is a widely used method to compare similarity of items in paper recommendation systems. It uses keywords extracted from papers’ texts to evaluate similarity of articles. [21] feeds documents’ titles and abstracts into TF-IDF model and learns probabilistic model to evaluate relatedness of documents. [9] develops a paper recommendation system which constructs users’ profiles based on candidate papers’ titles. However, these methods mostly use Bag-of-Words model, which has difficulty in finding conceptually similar work [2].

5.2 Collaborative Filtering

Collaborative filtering is a popular and widely used method in recommender systems. However, collaborative filtering cannot generate accurate recommendations without sufficient initial ratings from users. In order to alleviate the cold start problem, [8] proposes a context-based collaborative filtering, which incorporates co-occurrence relations into rating matrix. [14] combines the matrix factorization with the topic modeling to achieve a better performance. Nevertheless, collaborative filtering needs much computing time and offline data processing.

5.3 Network-Based Algorithms

Network-based paper recommendations concentrate on analyzing citation network to understand the relationship between scholarly papers. Two main methods in citation analysis are co-citation analysis and bibliographic coupling. However, these methods can not address complex relationships in networks [22]. [20] proposes a method based on random walk with restart(RWR) to measure vertex-to-vertex relevance. [7] considers a heterogeneous network to recommend similar papers. Nevertheless, these works mainly extract hand-engineered structural information to calculate paper similarity, which is inflexible and time-consuming.

6 Conclusion

In this paper, we propose a Path-based Academic Paper Recommendation, namely PAPR, to perform academic paper recommendation. Specifically, to acquire an ordered path of relevant scholarly papers, we design a flexible and expressive model, which takes both text semantics and network structure into account. To facilitate evaluation of similarity between papers, textual attributes and network structure are embedded in vector space. The experiments based on real datasets show that our model outperforms other methods.