
1 Introduction

Data discovery and reuse play an essential role in scientific research by helping researchers find data [4, 19]. Researchers typically reuse datasets from colleagues or collaborators, and the credibility of such datasets is critical to the scientific process [11, 25]. Datasets sourced from a network of personal relationships (colleagues or collaborators) can carry limitations, as contacts tend to recommend only datasets that they themselves find helpful [2]. Yet, because research needs vary, one person’s noisy data may be another person’s valuable data. Moreover, datasets retrieved through relational networks tend to be limited to certain research areas.

As an emerging dataset discovery tool, a dataset search engine can help researchers find datasets of interest in open data repositories. Moreover, due to the increasing number of open data repositories, many dataset search engines, such as Google Dataset Search [3] and Mendeley Data, cover more than ten million datasets. While dataset search engines bring convenience to researchers, they also have certain limitations. Like general search engines, dataset search engines require the researcher to provide keywords to drive the search; they filter, rank, and return datasets based on the given keywords. To use a dataset search engine, researchers need to summarize the datasets they are looking for into these keywords, with the risk that the keywords do not cover all the desired properties and that unexpected but relevant datasets are missed. Thus, the standard pathway “scientific items \(\rightarrow \) keywords \(\rightarrow \) scientific item sets” used by existing dataset search engines has inherent limitations.

This paper proposes a recommendation method based on entity vectors trained on a citation network. This approach supports data discovery via the more direct “scientific items \(\rightarrow \) scientific items” pathway. Because our approach does not require converting scientific items (papers and datasets) into keywords, it avoids the drawbacks described above. Furthermore, we combine this new recommendation method with existing recommendation methods into an integrated ensemble recommendation method. This paper also provides a benchmark corpus for scientific item recommendation and a benchmark evaluation test. By performing benchmark tests on randomly selected scientific items from this benchmark corpus, we conclude that our integrated recommendation method using citation network entity embeddings can obtain a precision of about 70%.

Specifically, in this paper, we study three research questions:

  • Will a citation network help in scientific item discovery?

  • Can we do dataset discovery purely by link prediction on a citation network?

  • Will the addition of citation-network link prediction help for scientific item discovery?

The main contributions of this paper are: 1) we propose a method for recommending scientific items based on entity embeddings in an academic citation graph, 2) we propose a benchmark corpus and evaluation test for scientific item recommendation methods, 3) we identify an ensemble method that has high precision for scientific item recommendation, and 4) we provide the pre-trained entity embeddings for our large-scale academic citation network as an open resource for reuse by others.

2 Related Work

Data reuse aims to facilitate replication of scientific research, make scientific assets available to the public, leverage research investment, and advance research and innovation [19]. Many current works focus on supporting and bringing convenience to data reuse. Wilkinson et al. provided the FAIR guiding principles to support scientific data reuse [28]. Pierce et al. provided data reuse metrics for scientific data so that researchers can track how the scientific data is used or reused [22]. Duke and Porter provided a framework for developing ethical principles for data reuse [10]. Faniel et al. provided a model to examine the relationship between data quality and user satisfaction [12].

Dataset recommendation has also been a popular research trend in recent years. Farber and Leisinger recommended suitable datasets for a given research problem description [14]. Patra et al. provided an information retrieval (IR) paradigm for scientific dataset recommendation [20]. Altaf et al. recommended scientific datasets based on users’ research interests [1]. Chen et al. proposed a three-layered network (composed of authors, papers, and datasets) for scientific dataset recommendation [5].

3 Link Prediction with Graph Embedding on a Citation Network

The link prediction training method we use is KGloVe [7]. KGloVe collects co-occurrence statistics of nodes from random walks, using personalized PageRank. Then GloVe [21] is used to generate entity embeddings from the co-occurrence matrix. In this paper, we apply KGloVe on 638,360,451 triples of the Microsoft Academic Knowledge Graph (MAKG) [13] citation network (containing 481,674,701 nodes) to generate a co-occurrence matrix of the scientific items. Then we use the GloVe method on this co-occurrence matrix to obtain the scientific entity (item) embeddings. The trained embeddings are made available for future work. After training the entity embeddings on the MAKG citation network, we perform link prediction between scientific items (papers and/or datasets) using a similarity metric in the embedding space. We use cosine similarity, which is the most commonly used similarity for such embeddings.

Definition 1 (Link Prediction for scientific items with Entity Embedding)

Let \(E = \{e_1, e_2, ...\}\) be a set of scientific entities (also known as scientific items). Let \({{\,\mathrm{emb}\,}}\) be an embedding function for entities such that \({{\,\mathrm{emb}\,}}(e)\) is the embedding of entity \(e\in E\), where \({{\,\mathrm{emb}\,}}(e)\) is a real-valued vector of a fixed length.

Let \(\cos : E \times E \rightarrow [-1,1]\) be a function such that \(\cos (a,b) = \frac{{{\,\mathrm{emb}\,}}(a)\cdot {{\,\mathrm{emb}\,}}(b)}{||{{\,\mathrm{emb}\,}}(a)||\cdot ||{{\,\mathrm{emb}\,}}(b)||}\) where \(a,b \in E\).

Given a threshold t, we define link prediction with entity embedding in E as a function \(LP_E:E\rightarrow 2^E\) with \(LP_E(e_s) = \left\{ r \in E \mid r \ne e_s, \cos (e_s, r) \ge t \right\} \), i.e., the set of entities whose embeddings are sufficiently similar to that of the seed entity \(e_s\).
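To make Definition 1 concrete, the following is a minimal Python sketch of \(LP_E\), assuming the pre-trained entity embeddings are available as a mapping from MAG identifiers to NumPy vectors; the data layout and names are illustrative, not the format of our released embeddings.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def link_prediction(seed: str, embeddings: dict[str, np.ndarray], t: float) -> set[str]:
    """LP_E(seed): all entities whose cosine similarity to the seed is at least t."""
    seed_vec = embeddings[seed]
    return {e for e, vec in embeddings.items()
            if e != seed and cosine(seed_vec, vec) >= t}
```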

4 Dataset Recommendation Methods

In this section we use the previous definition of link prediction to introduce two new dataset recommendation methods, as well as three methods from our previous work. We also propose an open-access scientific item recommendation evaluation benchmark, including a corpus and an evaluation pipeline (Fig. 1).

Fig. 1. Pipeline of the scientific item recommendation and evaluation benchmark.

4.1 Dataset Recommendation Methods

The dataset recommendation methods in this section use a combination of link-prediction and ranking approaches to recommend scientific items based on a given scientific item.

Data Recommendation with Link Prediction Using a Citation Network. This scientific entity (item) recommendation method is based on Definition 1, where a set of entities is returned such that the cosine similarity between these entities and the given entity is at least a threshold t. Based on the list of scientific items returned by the link prediction algorithm, the recommendation method considers only the top-n results of that list, with the value of n chosen as a parameter of the method. Formally, this is defined as follows:

Definition 2

(Top-n Scientific Entity (Item) Recommendation with Link Prediction). Let \(E = \{e_1, e_2, ...\}\) be a set of scientific entities (also known as scientific items). Let \(LP_E\) be a link prediction function using embeddings in E (see Definition 1). Top-n scientific entity recommendation with link prediction using embeddings is a function \(DRLP_{E}^n\) that maps an entity \(e_s\) to \((r_1, \dots , r_m)\), the longest ordered list of \(m \le n\) pairwise distinct elements of \(LP_E(e_s)\) such that \(\forall i = 1 \dots m - 1, \cos (e_s, r_i) \ge \cos (e_s, r_{i+1})\).

In words, this function maps an entity (scientific item) to a list of at most n other entities (scientific items) that are closest to it in the embedding space, ordered by decreasing similarity.
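Building on the sketch given for Definition 1, a minimal illustration of the top-n recommendation function \(DRLP_E^n\) could look as follows (again, names are illustrative):

```python
def recommend_top_n(seed: str, embeddings: dict, t: float, n: int) -> list[str]:
    """DRLP_E^n(seed): at most n candidates from LP_E(seed),
    ordered by decreasing cosine similarity to the seed."""
    candidates = link_prediction(seed, embeddings, t)
    ranked = sorted(candidates,
                    key=lambda e: cosine(embeddings[seed], embeddings[e]),
                    reverse=True)
    return ranked[:n]
```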

We can now combine this general definition with a specific embedding function \({{\,\mathrm{emb}\,}}\) to create a specific link-prediction-based recommendation method. In particular, we use KGloVe embeddings from the MAKG citation network to create a recommendation method based on link prediction from a citation network.

Scientific Items Recommendation with BERT-based Link Prediction. The method from the previous subsection used the embeddings computed on the citation graph to determine similarity between data items. This is a plausible choice, since we can expect the MAKG citation graph to give a reasonable signal for similarity in the scientific domain: it captures the scientific relationships between items. In contrast, we also experimented with other models to compute the similarity between items. In particular, we used the pretrained BERT model [9, 23] as an example of a cross-domain model, to see whether such a generic pretrained model would also suffice to compute the similarity metric that underlies our link-prediction-based recommendation algorithm. The pretrained BERT model used in this paper is the all-mpnet-base-v2 model from the SentenceTransformers Python library. BERT-based link prediction for scientific items is obtained by applying the pretrained BERT model to the descriptive metadata of the scientific items, which consists of the title of the dataset and a short accompanying text, to obtain a BERT embedding for each item. We then use these BERT embeddings in Definition 2 to make scientific item recommendations.
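As an illustration, item embeddings can be obtained from the title and short description with the SentenceTransformers library roughly as follows; the metadata shown is invented for the example and is not taken from the benchmark corpus.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")

# Hypothetical metadata: MAG identifier -> "title. short description"
metadata = {
    "mag:123": "Arctic sea ice extent 1979-2020. Monthly satellite-derived ice extent.",
    "mag:456": "Global temperature anomalies. Gridded monthly temperature anomaly data.",
}

ids = list(metadata)
vectors = model.encode([metadata[i] for i in ids])  # one vector per item
bert_embeddings = {i: np.asarray(v) for i, v in zip(ids, vectors)}

# These embeddings can now be plugged into the same top-n
# recommendation function sketched for Definition 2.
```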

Scientific Items Recommendation with BM25-based Data Ranking. BM25-based data ranking is the recommendation approach provided in our previous paper [27]. Given a seed scientific item, we rank the list of candidate scientific items with the popular BM25 method from information retrieval [24], using the descriptive metadata of the items (title and textual description); a higher ranking position means a better recommendation.
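A minimal sketch of such BM25-based ranking over item metadata, here using the rank_bm25 package (the tokenisation and example data are simplifications, not our released implementation):

```python
from rank_bm25 import BM25Okapi

# Candidate items described by their title + short description (illustrative data).
candidates = {
    "mag:123": "arctic sea ice extent monthly satellite observations",
    "mag:456": "global temperature anomalies gridded monthly data",
}
ids = list(candidates)
bm25 = BM25Okapi([candidates[i].split() for i in ids])

def rank_candidates(seed_text: str) -> list[str]:
    """Rank candidate items by BM25 score against the seed item's metadata."""
    scores = bm25.get_scores(seed_text.split())
    return [i for i, _ in sorted(zip(ids, scores), key=lambda p: p[1], reverse=True)]
```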

Scientific Items Recommendation with Graph Walk. The co-author network-based graph walk method is a scientific item recommendation method that we previously proposed in [26]. Such a graph walk on a co-author network performs the recommendation task according to the “scientific items \(\rightarrow \) author \(\rightarrow \) co-author network \(\rightarrow \) author \(\rightarrow \) scientific items” pathway. In order to reduce the number of candidate recommendations, we only consider items connected to authors within an n-hop distance of the author of the seed item in the co-author network.
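The n-hop restriction can be sketched as a breadth-first search over the co-author network, assuming the network is given as an adjacency mapping from authors to co-authors (a simplification of the actual MAKG data):

```python
from collections import deque

def authors_within_n_hops(start: str, coauthors: dict[str, set[str]], n: int) -> set[str]:
    """Breadth-first search: all authors reachable from `start` in at most n hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        author, depth = frontier.popleft()
        if depth == n:
            continue
        for neighbour in coauthors.get(author, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen

# Candidate items are then restricted to those written by authors in this set.
```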

Dataset Recommendation with Pre-trained Author Embedding. Finally, in earlier work [26] we proposed a recommendation method for scientific items based on pre-trained co-authorship embeddings. This approach is analogous to our proposed method using embeddings from the MAKG citation network (Definition 1), but uses embeddings computed from the MAKG co-author network instead.

5 Scientific Items Recommendation Benchmark

To evaluate the performance of scientific item recommendation methods, we propose here an open-source, generalized benchmark corpus and evaluation process for scientific item recommendation. Scientific items in general can be publications, datasets, graphs, tables, geographic data, etc.

Table 1. Statistics of benchmark corpus

5.1 Benchmark Corpus

The benchmark corpus is an HDT/RDF graph [15, 18] stored as triples of the form “[scientific item] [link] [scientific item].” The scientific items are the intersection of the scientific items in ScholeXplorer and MAKG (Microsoft Academic Knowledge Graph). This intersection is computed by matching the DOIs of scientific items (datasets and/or papers) between ScholeXplorer and MAKG. We represent all scientific items by the identifier used in the Microsoft Academic Graph (MAG). With the help of these MAG identifiers, information about the scientific items (such as title, providers, publishers, or creators) is easily accessible in MAKG. The bi-directional links between these items come from ScholeXplorer, and all links are provided by data sources managed by publishers, data centers, or other organizations.
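Conceptually, the intersection is a join on (normalised) DOIs; the sketch below assumes both sources have already been reduced to identifier-to-DOI mappings, which abstracts away the actual HDT/RDF processing.

```python
def intersect_by_doi(makg_items: dict[str, str], scholix_items: dict[str, str]) -> dict[str, str]:
    """Map ScholeXplorer identifiers to MAG identifiers via shared (normalised) DOIs.

    makg_items:    MAG identifier     -> DOI
    scholix_items: Scholix identifier -> DOI
    """
    doi_to_mag = {doi.lower(): mag_id for mag_id, doi in makg_items.items()}
    return {sx_id: doi_to_mag[doi.lower()]
            for sx_id, doi in scholix_items.items()
            if doi.lower() in doi_to_mag}
```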

In Table 1, we show the statistics of our benchmark corpus. There are more than 3 million items and more than 15 million bi-directional links between them. We provide a data subset with only the bi-directional links between scientific papers, consisting of 2.9 million scientific papers and 14.3 million links between them. We also provide a data subset with only the bi-directional links between scientific datasets, consisting of 1,544 scientific items and 2,335 links between them. We have made this corpus available at https://zenodo.org/record/6386897.

5.2 Benchmark Evaluation

The goal of our benchmark is to evaluate the performance of scientific item recommendation methods on all datasets in the benchmark corpus, with the option to use only a randomly selected subset. We use the F1-measure [6] to evaluate how well recommendation methods reconstruct the bi-directional links between scientific items. It consists of three evaluation metrics: recall, precision, and F1-score. Recall is the percentage of true links (i.e., links in the corpus that start from the seed item) that the recommendation method recovers. Precision is the percentage of scientific items recommended by the method that are correct (i.e., present in the gold standard). Finally, the F1-score is the harmonic mean of recall and precision.
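For a single seed item, these metrics can be computed as in the following sketch; the benchmark then averages the values over all evaluated seed items.

```python
def precision_recall_f1(recommended: set[str], gold_links: set[str]) -> tuple[float, float, float]:
    """Compare recommended items against the gold-standard links of one seed item."""
    hits = len(recommended & gold_links)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(gold_links) if gold_links else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```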

6 Experiments and Results

This section presents the setup and results of our experiments on the recommendation methods proposed in Sect. 4, using the evaluation benchmark from Sect. 5. The implementation of the recommendation methods and the code of all experiments can be found at https://github.com/XuWangVU/datarecommend.

6.1 Experimental Setup

We set up three evaluation experiments using three sets of data randomly selected from the benchmark corpus. The statistics of the selected data are shown in Table 2. For each seed scientific item, we look for recommendations among all the candidate scientific items and return a sorted subset of these candidates.

Table 2. Statistics of experiments

The recommendation methods evaluated in the experiments comprise the five methods described in Sect. 4. Beyond these single methods, we also tested ensemble methods that combine multiple methods to make recommendations. All methods (including the ensemble methods) fall into two pathway-based categories: pathways with authors and pathways without authors. All methods (including the ensemble methods) used in our experiments are listed in Table 3.

Table 3. Scientific items recommendation methods used for experiments.

We use thresholds for two methods: a distance threshold for graph walks and a similarity threshold between author embeddings. The distance threshold for graph walks is the maximum number of hops in a graph walk; for example, hop1 means that only authors at distance 1 from the given author are considered. The author embedding similarity threshold means that only authors whose embedding similarity with the given author is greater than or equal to the threshold are considered.

Each recommendation method is assigned a parameter. For the graph walk method, we use hop1, hop2, or hop3 to represent the distance threshold of the graph walk. For the similarity method between pretrained MAKG author embeddings, we use similarity thresholds ranging from 0.3 to 0.7, in steps of 0.1. For the BM25-based ranking method, we use the parameter \(p_{bm25} = 2 \cdot outdegree(seed)\), where outdegree(seed) is the number of scientific items linked from the seed in the benchmark corpus; in other words, we only consider the top \(p_{bm25}\) results in the list returned by the ranking method. For both the link prediction method using citation network embeddings and the BERT-based link prediction method, we use a parameter of 0.8, meaning that we only consider the top 80% of the sorted lists returned by these methods.
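To illustrate how these parameters act as cutoffs on the ranked candidate lists, a small sketch (the function names are ours and not part of the released code):

```python
def bm25_cutoff(ranked: list[str], outdegree_seed: int) -> list[str]:
    """BM25 ranking: keep only the top p_bm25 = 2 * outdegree(seed) items."""
    return ranked[: 2 * outdegree_seed]

def top_fraction(ranked: list[str], fraction: float = 0.8) -> list[str]:
    """Embedding-based link prediction: keep the top 80% of the sorted list."""
    return ranked[: int(len(ranked) * fraction)]
```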

6.2 Experimental Results

Fig. 2. Precision comparison in Experiment 1, Experiment 2, and Experiment 3.

Table 4 shows the results of the scientific item recommendation methods that do not consider authors in the pathway, while Tables 5, 6 and 7 show the results of methods that do consider authors. We use color-coding of the cells to indicate different value ranges: red indicates relatively poor performance compared with related settings, green indicates outstanding performance, and yellow indicates average performance.

Table 4. Results of Experiment(EXP) 1, 2 & 3 without graph walk and author embedding.

In the experiments that do not consider authors, we found that recall, precision, and F1-score were usually not high, except for the method that only uses BERT, where we obtained a recall of over 0.95. However, even this method does not achieve a sufficiently high precision.

When the author network is taken into consideration, the precision improves considerably, and some integrated methods achieve precision results of 0.7 or even 0.8. Unfortunately, these high precision rates come with a decreased recall, which means that the methods return few, but often correct, recommendations.

This behavior, i.e., high precision at relatively low recall, is typical of and sufficient for recommendation engines. Hence, we explore these results in more detail. A comparison of the precision of the different methods can be found in Fig. 2. For Experiment 1, we observe little variability, likely due to the small data size. For Experiments 2 and 3, however, the precision increases with a higher distance threshold for the graph walk or a higher threshold for the author embedding similarity.

Based on the comparison of the results of the different methods in Tables 5, 6 and 7 and Fig. 2, we conclude that all recommendation methods that use data ranking (BM25) or link prediction (citation embedding) achieve high precision on our scientific item recommendation benchmark when combined with the graph walk and author embedding similarity methods in an ensemble.

7 Conclusion and Discussion

In this paper, we have investigated the use of a large-scale citation network for recommending scientific items, given a scientific item provided by the user, according to the well-known paradigm “if you like this dataset, you might also like these other datasets”. The method uses low-dimensional vector-space embeddings computed from the citation graph to compute the cosine similarity between datasets as the basis for its recommendations. By itself, this method performed unsatisfactorily on our benchmark under a variety of experimental settings.

We therefore also studied the behaviour of this method in an ensemble with a number of other methods: recommendations based on n-hop walks in a co-author graph (\(n=1,2,3\)), recommendations based on embeddings computed over this co-author graph, recommendations based on the BERT language model, and the BM25 method from information retrieval. We studied a large variety of the most promising combinations of methods under different experimental settings. In our largest experimental setting, the ensemble methods that used the embeddings from the citation network outperformed those that did not, with a precision of 0.64 under a variety of settings. This acceptable precision in a recommendation setting comes at the price of a low recall, a behaviour that is typical of recommendation engines.

This allows us to succinctly answer the research questions we formulated in the introduction of this paper:

  • Will a citation network help in dataset discovery? Answer: yes

  • Can we do dataset discovery purely by link prediction on a citation network? Answer: no

  • Will the addition of citation-network link prediction help for dataset discovery? Answer: yes

We performed our experiments on a newly constructed benchmark set, using the KGloVe method to train scientific entity (item) embeddings from the Microsoft Academic Knowledge Graph, containing a citation network of 100 million edges. We have made this benchmark corpus available online.

The methods that we designed and evaluated in this paper are clearly not the final word on how to recommend scientific items. Likely, the results can be improved not only by tuning parameters to specific datasets, but also by adding other applicable existing methods. The dataset could also be expanded. We have used both citation and co-author networks as signals for academic similarity, but other academic networks exist as well. Including those is a subject of future work.

The link prediction used in this paper relies on pre-trained embedding models. One drawback of this type of model is that it requires an embedding for each entity in the graph, and hence many existing models do not scale well enough. In the future, several approaches could be investigated to overcome this limitation. One option is to use a model that can work in an inductive setting, based on the description or even the content of the datasets; an example of such a method is BLP [8]. To reduce the number of embeddings, we could also use a model that only keeps embeddings for some entities in the graph, like NodePiece [16]. Another direction could be to scale models using summarization, as was done in [17].