
1 Introduction

The peer-review process is a cornerstone of current academic research and is widely regarded as an essential part of scholarly communication. The selection of an expert reviewer plays a crucial role in this process. While reviewing, a reviewer needs to assess a) the technical quality of the work, b) its reproducibility, c) its impact on the community, and d) the extent to which it is original and novel. For this, the reviewer assigned to a manuscript must be an expert in the domain of the submitted manuscript.

A framework is therefore needed that scrutinizes the allocation of expert reviewers to submitted manuscripts. This work is not an attempt to propose a better reviewer-manuscript match-making system, but rather a framework for evaluating the reliability of match-making algorithms. The framework is conference-agnostic and does not depend on whether the actual (semi-)manual allocation was perfect.

Several attempts have been made to develop automated systems, such as TPMS [9], GRAPE [10], SubSift [11], and Erie [20], to find a perfect match. The authors of [28] survey the range of approaches for matching a reviewer with a manuscript. The authors of [12, 14, 16, 17, 24] consider keywords as a matching parameter. The authors of [4, 15, 18, 26] use the Latent Dirichlet Allocation (LDA) approach, while in [27], apart from LDA, the authors also consider the concept of freshness to capture how a reviewer's research interests change over time. Bibliography-based matching has been proposed in [21]. The authors of [22] work on expertise, authority, and diversity parameters, while the authors of [23] consider a set of references and pedagogical facets. The Hiepar-MLC approach [31] uses a two-level bidirectional GRU with an attention mechanism to capture word-sentence-document information. To the best of our knowledge, no consistency analysis of these approaches in the context of reviewer-manuscript matching has been performed yet.

By consistency we mean that, if an approach agrees with a certain set of reviewers by giving them a higher similarity score, then it should give a significantly lower similarity score to the remaining reviewers, showing that the system is less ambivalent. A detailed explanation of consistency is given in Sect. 2. We bring different paradigms together and perform the analysis over an actual dataset provided by a conference organized in 2019. In our analysis, the Contextual Neural Topic Modeling (CNTM) approach provided the most stable and reliable results, suggesting a new direction in which CNTM can be explored in more detail for developing a reviewer-manuscript match-making system.

2 Problem Formulation

The reviewer-manuscript match-making process is accomplished mainly under two constraints: a) a workload constraint and b) a review coverage constraint. The workload constraint limits the maximum number of manuscripts that can be allocated to an individual reviewer, while the review coverage constraint specifies the number of reviews required per manuscript to complete the peer-review process.

Let \(\mathcal {R}=\{r^{(i)}\}_{i=1}^{n}\) be the set of n reviewers and \(\mathcal {M}=\{m^{(j)}\}_{j=1}^{m}\) be the set of m manuscripts submitted for review. Let \([\varPi ]^{n}=(\pi ^{(1)},\pi ^{(2)},\dots ,\pi ^{(n)})\) denote the profiles of the n reviewers, where a reviewer's profile represents the reviewer's expertise. The construction of profiles is described in Sect. 4.1. We define \(\sigma _{rt}\) as the match-making similarity function applied to the reviewer and manuscript profiles to obtain the similarity score matrix between reviewers and manuscripts, using any match-making representational technique (say, rt). The similarity matrix S is obtained as:

$$\begin{aligned} \mathrm {S}=\sigma _{rt}[\varPi ,\mathcal {M}] \end{aligned}$$

\(S \in [0,1]^{n\times m}\) is the similarity matrix between reviewers and manuscripts, with entry \(S_{ij}\) the similarity between reviewer \(r^{(i)}\) and manuscript \(m^{(j)}\). The higher the similarity score, the more closely the reviewer's expertise matches the manuscript's theme. Let \(\{\mathcal {R}^{(ar)}\}\) be the set of K reviewers allocated to a particular manuscript and \(\{\mathcal {R}^{(nar)}\}\) be the set of non-allocated reviewers, so that \(\{\mathcal {R}^{(nar)}\}=\mathcal {R} - \{\mathcal {R}^{(ar)}\}\).

Fig. 1. Example of consistency for an algorithmic approach, selecting a set of reviewers out of the global pool of reviewers who signed up for the review process

It is necessary to determine the consistency of the approach adopted to calculate the similarity. By consistency, we mean the agreement of a match-making algorithm with a certain set of reviewers, to whom it gives a higher similarity score, while it should disagree with the remaining reviewers by giving them a significantly lower similarity score. We define a term, the degree of consistency, denoted \(\varDelta \), which captures how consistent a particular algorithm is in its decision of predicting reviewers. Figure 1 shows the set of reviewers predicted by a match-making algorithm to review a particular manuscript, drawn from the global pool of reviewers who actually signed up for the review process. The degree of consistency is defined as the absolute difference between the average similarity score of the predicted reviewers and the average similarity score of the remaining reviewers.

$$\begin{aligned} AS_{ar}=\left[ \frac{\sum _{i=1}^{m} S_{ir_{k}}}{m} \right] , r_{k} \in \{\mathcal {R}^{(ar)}\}, 0 \le k \le K \\ AS_{nar}=\left[ \frac{\sum _{i=1}^{m} S_{ir_{k}}}{m} \right] , r_{k} \in \{\mathcal {R}^{(nar)}\}, 0 \le k \le n-K \end{aligned}$$
$$\begin{aligned} \varDelta = abs\left( AS_{ar} - AS_{nar}\right) \end{aligned}$$
(1)

Here, \(AS_{ar}\) is the average similarity score of the allocated reviewers, \(AS_{nar}\) is the average similarity score of the non-allocated reviewers, and \(\varDelta \) is the degree of consistency. The larger the value of \(\varDelta \), the more consistent the algorithm is in its decision of predicting the reviewers.
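
As a minimal numeric illustration with hypothetical scores (not taken from the conference dataset), consider one manuscript and five signed-up reviewers, three of whom are the allocated set:

```python
import numpy as np

# Hypothetical similarity scores of one manuscript against five reviewers,
# values in [0, 1] (rows: manuscripts, columns: reviewers).
S = np.array([[0.82, 0.74, 0.79, 0.21, 0.15]])

allocated = [0, 1, 2]      # indices of the K allocated reviewers
non_allocated = [3, 4]     # remaining reviewers from the global pool

AS_ar = S[:, allocated].mean()       # average similarity of allocated reviewers
AS_nar = S[:, non_allocated].mean()  # average similarity of non-allocated reviewers
delta = abs(AS_ar - AS_nar)          # degree of consistency

print(f"AS_ar={AS_ar:.3f}, AS_nar={AS_nar:.3f}, delta={delta:.3f}")
# AS_ar=0.783, AS_nar=0.180, delta=0.603 -> the larger delta is, the more
# decisively the approach separates allocated from non-allocated reviewers.
```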

3 Conference Dataset Description

The Technical Program Committee Chair of the “MultiTrack Conf” conference provided us with complete data on a) all submitted manuscripts, b) the full list of reviewers with their affiliations (which we call the Global pool), c) the track-wise list of reviewers (which we call the Track pool), and d) the manuscripts allocated to sets of reviewers (which we call the Original allocation). “MultiTrack Conf” was a multi-track engineering conference organized in 2019. Table 1 summarizes the conference data.

Table 1. “MultiTrack Conf” conference dataset details

4 Methodologies Implemented and Result Analysis

This section describes the representation approaches used for the match-making process, along with the experimental setup and the evaluation method used to assess the consistency of the different approaches.

4.1 Experimental Setup

The first step is to create profiles for the manuscripts and the reviewers. From each reviewer's name and affiliation, publication titles and publication years are extracted using ORCID [1]. Using these publication details, DOIs are retrieved via Crossref [3]. Finally, the abstracts of the papers are fetched from Semantic Scholar [2]. Publication details of some reviewers are not available in ORCID; their abstracts are obtained by web scraping.
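
This collection pipeline can be sketched roughly as follows using the public ORCID, Crossref, and Semantic Scholar REST endpoints. This is an illustrative reconstruction, not the authors' actual scripts; the JSON field names are assumptions based on the current public APIs.

```python
import requests

def orcid_titles(orcid_id):
    """Fetch publication titles and years from the public ORCID v3.0 API.
    Field layout assumed from the public API; adjust if the schema differs."""
    r = requests.get(f"https://pub.orcid.org/v3.0/{orcid_id}/works",
                     headers={"Accept": "application/json"})
    r.raise_for_status()
    for group in r.json().get("group", []):
        summary = group["work-summary"][0]
        title = summary["title"]["title"]["value"]
        date = summary.get("publication-date") or {}
        year = (date.get("year") or {}).get("value")
        yield title, year

def crossref_doi(title):
    """Look up a DOI for a publication title via the Crossref works endpoint."""
    r = requests.get("https://api.crossref.org/works",
                     params={"query.bibliographic": title, "rows": 1})
    items = r.json()["message"]["items"]
    return items[0]["DOI"] if items else None

def semanticscholar_abstract(doi):
    """Retrieve the abstract from the Semantic Scholar Graph API."""
    r = requests.get(f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}",
                     params={"fields": "abstract"})
    return (r.json() or {}).get("abstract")
```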

To build a reviewer's profile, we hypothesize that the publications of the past 5 years, or the 20 most recent papers (limits derived empirically from the publication frequency of reviewers in our dataset), indicate the reviewer's domain of research interest. Hence, for each reviewer, the publications of the last 5 years or the 20 most recent publications, whichever limit is reached first, are profiled. The titles and abstracts of these publications collectively form the reviewer's profile. The title and abstract provided to the conference are used to build the manuscript's profile, since by the general structure of research papers the title and abstract reflect the core theme of the entire paper.
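
The 5-year / 20-paper rule can be expressed as a small filter; the exact interplay of the two limits below is our reading of "whichever is reached first", not the authors' code.

```python
def recent_publications(pubs, ref_year=2019, window=5, cap=20):
    """Keep publications from the last `window` years and cap them at the `cap`
    most recent. `pubs` is a list of dicts with a "year" field (assumed layout)."""
    recent = [p for p in pubs if p.get("year") and ref_year - int(p["year"]) < window]
    recent.sort(key=lambda p: int(p["year"]), reverse=True)
    return recent[:cap]
```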

Before applying any match-making algorithm, a pre-processing step removes English stopwords and research stopwords. Research stopwords such as author, efficiency, proposal, and study are words that are frequently repeated in publications and generally do not convey sufficient information as standalone entities.
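
A minimal sketch of this pre-processing, assuming NLTK's English stopword list and an illustrative (incomplete) research-stopword set beyond the examples given above:

```python
import re
from nltk.corpus import stopwords   # requires: nltk.download("stopwords")

# Illustrative research stopwords; the paper's full list is not given.
RESEARCH_STOPWORDS = {"author", "authors", "efficiency", "proposal", "study",
                      "paper", "approach", "result", "results", "method"}
STOPWORDS = set(stopwords.words("english")) | RESEARCH_STOPWORDS

def preprocess(text):
    """Lowercase, tokenize on alphabetic runs, and drop English/research stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]
```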

Algorithm 1: Reviewer-manuscript match-making process (pseudocode figure)

The match-making process between reviewers and manuscripts is given in Algorithm 1. During each allocation, the reviewer workload and manuscript coverage constraints are taken into consideration. The balanced optimized Hungarian approach [19] is adopted for the assignment, with the constraint pair \((\mu ,\lambda )\) set to (6, 3). The similarity matrix S between reviewers and manuscripts can be obtained using the different approaches described in Sect. 4.3.
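
To illustrate how such a constrained assignment can be carried out, the sketch below runs one Hungarian round per required review with SciPy's linear_sum_assignment, replicating each reviewer by its remaining capacity. It is a simplified stand-in for the balanced optimized Hungarian approach of [19], not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def allocate(S, mu=6, lam=3):
    """Assign `lam` reviewers per manuscript with at most `mu` manuscripts per
    reviewer. S has shape (manuscripts x reviewers) with scores in [0, 1]."""
    n_man, n_rev = S.shape
    capacity = np.full(n_rev, mu)
    assigned = [set() for _ in range(n_man)]
    for _ in range(lam):
        # Replicate each reviewer by its remaining capacity.
        cols = [j for j in range(n_rev) for _ in range(capacity[j])]
        cost = np.empty((n_man, len(cols)))
        for c, j in enumerate(cols):
            cost[:, c] = -S[:, j]          # maximize similarity = minimize -S
        for i in range(n_man):             # forbid assigning the same reviewer twice
            for c, j in enumerate(cols):
                if j in assigned[i]:
                    cost[i, c] = 1e6
        rows, chosen = linear_sum_assignment(cost)
        for i, c in zip(rows, chosen):
            j = cols[c]
            assigned[i].add(j)
            capacity[j] -= 1
    return assigned                        # assigned[i]: reviewer indices for manuscript i
```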

Table 2. Description of the algorithms implemented with the corresponding approaches adopted in the Reviewer Matching Problem (the number of latent topics for topic modeling approaches is set to 20; all other hyper-parameters are set to their defaults.)

4.2 Evaluation Method

To evaluate the approaches, the top-3 reviewers are assigned to each submitted manuscript using Algorithm 1. This allocation is compared with the original allocation done by the track chair. Three other modes of allocation are also used to study the consistency of the approaches. The first allocates reviewers from the global pool, which we call global pool allocation. The second restricts reviewers to the track they selected, which we call track-based reviewer allocation. The third allocates a manuscript among the reviewers who were not originally assigned that manuscript, which we call global pool minus original allocation. Let \(AS_{oa}\) be the average similarity score of the originally allocated reviewers and \(AS_{goa}\) be the average similarity score of the global pool minus original allocation. Eq. 1 for the degree of consistency then becomes:

$$\begin{aligned} \varDelta = abs\left( AS_{oa} - AS_{goa}\right) \end{aligned}$$
(2)
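
A compact sketch of this evaluation, assuming \(\varDelta \) is averaged over all manuscripts (the exact aggregation used for Fig. 2 is not stated explicitly and is our assumption):

```python
import numpy as np

def mean_delta(S, original_alloc):
    """Average Eq. (2) over all manuscripts: compare the originally allocated
    reviewers with the global pool minus that allocation. S is the
    (manuscripts x reviewers) similarity matrix produced by an approach;
    original_alloc[i] is the set of reviewer indices assigned by the track chair."""
    deltas = []
    for i, allocated in enumerate(original_alloc):
        mask = np.zeros(S.shape[1], dtype=bool)
        mask[list(allocated)] = True
        AS_oa = S[i, mask].mean()        # original allocation
        AS_goa = S[i, ~mask].mean()      # global pool minus original allocation
        deltas.append(abs(AS_oa - AS_goa))
    return float(np.mean(deltas))
```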

4.3 Methodologies Description and Implementation

This subsection discusses the representational paradigms for queries (manuscripts) and targets (reviewers), along with their implementation for obtaining the similarity between reviewers and manuscripts. We have implemented several approaches: statistics-based keyword extraction methods such as TextRank [25], RAKE [30], and YAKE [7]; a probabilistic topic modeling approach, Latent Dirichlet Allocation (LDA) [6]; neural topic modeling approaches, Contextual Neural Topic Modeling (CNTM) [5] and BERTopic [13]; and transformer-based embedding approaches, Universal Sentence Encoder (USE) [8] and Sentence-BERT (SBERT) [29]. We also evaluated the consistency of the existing system Erie [20] used in the IEEE INFOCOM conference. Table 2 describes the approaches and their implementations for obtaining the similarity between reviewers and manuscripts.
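
For the transformer-based embedding variants, the similarity matrix can be obtained roughly as follows with the sentence-transformers library; the checkpoint name, the example texts, and any rescaling of cosine scores into [0, 1] are our assumptions, not the paper's exact configuration.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative SBERT-based similarity between reviewer profiles and manuscripts.
model = SentenceTransformer("all-MiniLM-L6-v2")

reviewer_profiles = ["deep learning for wireless channel estimation ...",
                     "graph algorithms and combinatorial optimization ..."]
manuscripts = ["a neural receiver design for OFDM systems ..."]

R = model.encode(reviewer_profiles, convert_to_tensor=True)
M = model.encode(manuscripts, convert_to_tensor=True)

# Cosine similarity lies in [-1, 1]; clipping or rescaling to [0, 1] would be
# needed to match the score range assumed in Sect. 2.
S = util.cos_sim(M, R)     # shape: (manuscripts x reviewers)
print(S)
```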

4.4 Result Analysis

Using the approaches mentioned in Sect. 4.3, the similarity between reviewers and manuscripts is obtained. To calculate the consistency of each approach, the degree of consistency (\(\varDelta \)) is computed using Eq. 2. Figure 2 compares the consistency of the different approaches and the existing system Erie over the original allocation and the three other modes of reviewer allocation. In terms of the degree of consistency \((\varDelta )\), Fig. 2 shows that CNTM with the word embedding variant is more consistent than the other approaches.

Fig. 2. Comparison chart of consistency in terms of similarity score with the delta differential (\(\varDelta \)) as a measure of consistency

Keywords are important facets of any paper. However, authors tend to provide either very specific yet peripheral keywords (e.g., Adam optimizer) or very broad ones (e.g., Artificial intelligence), so relying only on author-tagged keywords is not a good idea. Hence, keyword extraction is adopted to extract the core concepts of a paper. These approaches showed consistency with a low delta differential value. Keyword-based matching also fails to treat semantically related concepts, such as plagiarism and copy, as similar. We therefore introduced transformer-based contextual embeddings into the representation to study their consistency; they showed higher similarity agreement but a lower delta differential component.
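
A hedged sketch of keyword-extraction based matching using the yake package, with a simple Jaccard overlap between extracted key phrases standing in for the matching step; the actual matching used for each approach is the one listed in Table 2.

```python
import yake

# Extract up to 20 key phrases (1-2 words) per document.
extractor = yake.KeywordExtractor(lan="en", n=2, top=20)

def keyphrases(text):
    """Return the set of extracted key phrases, lowercased."""
    return {kw.lower() for kw, score in extractor.extract_keywords(text)}

def keyword_similarity(reviewer_profile, manuscript_text):
    """Jaccard overlap between the two key-phrase sets (illustrative metric)."""
    a, b = keyphrases(reviewer_profile), keyphrases(manuscript_text)
    return len(a & b) / len(a | b) if a | b else 0.0
```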

A publication is a collection of (latent) topics representing certain themes, so we analyzed a topic modeling approach, LDA, to study consistency. This approach clusters topics based on their representative words, but the issue of semantic relevance persists. We therefore introduced and tested contextual embeddings within a topic modeling approach, CNTM, where semantically related words are placed in the same topic cluster. For instance, biological cell and electrolytic cell, despite sharing the word cell, would fall into different clusters representing the biological/medical and the electronics domains, respectively. Variants of CNTM are also applied to study their consistency. The consistency analysis is also performed over the existing reviewer assignment system, Erie, used in the IEEE INFOCOM conference. As seen in Fig. 2, the CNTM model using word embeddings proved more consistent than any other approach considered in this study.
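
As a minimal sketch of the topic-modeling route, the LDA variant can be approximated with gensim by comparing document-topic distributions (20 latent topics, as in Table 2); the CNTM variants additionally feed contextual SBERT embeddings into the topic model, which is not shown here.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.matutils import cossim

def lda_similarity(reviewer_tokens, manuscript_tokens, corpus_tokens, k=20):
    """Train LDA on the tokenized corpus and compare the reviewer and manuscript
    by the cosine similarity of their topic distributions (illustrative only)."""
    dictionary = Dictionary(corpus_tokens)
    corpus = [dictionary.doc2bow(doc) for doc in corpus_tokens]
    lda = LdaModel(corpus, id2word=dictionary, num_topics=k, random_state=0)
    r_vec = lda.get_document_topics(dictionary.doc2bow(reviewer_tokens),
                                    minimum_probability=0.0)
    m_vec = lda.get_document_topics(dictionary.doc2bow(manuscript_tokens),
                                    minimum_probability=0.0)
    return cossim(r_vec, m_vec)
```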

5 Conclusion and Future Work

We bring various algorithmic approaches from different paradigms, together with the existing system Erie, onto a common platform to study a consistency framework for evaluating match-making approaches. From the analysis performed, we establish that a reviewer-manuscript match-making system based on Contextual Neural Topic Modeling (CNTM) with word embeddings may yield a better match, as it directly incorporates the SBERT embeddings used in the model. In the future, we plan to develop a match-making system that considers Conflicts of Interest (COIs) and applies sentiment analysis to the reviews provided by reviewers, which will help identify detailed, high-quality reviews. We would also like to extend the consistency study to the full text of publications. Finally, we plan to develop a match-making system that reduces the burden on TPCs and thus promises a higher-quality peer-review process in conferences.