
1 Introduction

The peer-review process is a cornerstone of current academic research and is widely regarded as an essential part of scholarly communication. The selection of an expert reviewer plays a crucial role in this process. While reviewing, a reviewer needs to assess a) the technical quality of the work, b) its reproducibility, c) its impact on the community, and d) the extent to which it is original and novel. For this, the reviewer assigned to a manuscript must be an expert in the domain of the submitted manuscript.

A framework is therefore needed that scrutinizes the allocation of expert reviewers to submitted manuscripts. This work is not an attempt to propose a better reviewer-manuscript match-making system, but rather a framework for evaluating the reliability of match-making algorithms. The framework is conference-agnostic and does not depend on whether the actual (semi-)manual allocation was perfect.

Several attempts have been made to develop automated systems, such as TPMS [9], GRAPE [10], SubSift [11], and Erie [20], to find a perfect match. The authors of [28] survey the range of approaches for matching a reviewer with a manuscript. The authors of [12, 14, 16, 17, 24] consider keywords as a matching parameter. The authors of [4, 15, 18, 26] use the Latent Dirichlet Allocation (LDA) approach, while in [27], apart from LDA, the authors also consider the concept of freshness to capture how a reviewer's research interests change over time. Bibliography-based matching has been proposed in [21]. The authors of [22] work on expertise, authority, and diversity parameters, while the authors of [23] consider a set of references and pedagogical facets. The Hiepar-MLC approach [31] uses a two-level bidirectional GRU with an attention mechanism to capture word-sentence-document information. To the best of our knowledge, no consistency analysis of these approaches in the context of reviewer-manuscript matching has been performed yet.

By consistency we mean that, if an approach agrees with a certain set of reviewers by giving them a higher similarity score, then it should give a significantly lower similarity score to the remaining reviewers, showing that the system is less ambivalent. A detailed explanation of consistency is given in Sect. 2. We bring different paradigms together and perform the analysis over an actual dataset provided by a conference organized in 2019. In our analysis, the Contextual Neural Topic Modeling (CNTM) approach provided the most stable and reliable results, suggesting a new direction in which CNTM can be explored in more detail for developing a reviewer-manuscript match-making system.

2 Problem Formulation

The reviewer-manuscript match-making process is accomplished mainly under two constraints: a) a workload constraint and b) a review coverage constraint. The workload constraint limits the maximum number of manuscripts that can be allocated to an individual reviewer, while the review coverage constraint specifies the number of reviews required per manuscript to complete the peer-review process.

Let \(\mathcal {R}=\{r^{(i)}\}_{i=1}^{n}\) be the set of n reviewers and \(\mathcal {M}=\{m^{(j)}\}_{j=1}^{m}\) be the set of m manuscripts submitted for review. Let \([\varPi ]^{n}=(\pi ^{(1)},\pi ^{(2)},\dots ,\pi ^{(n)})\) denote the profiles of the n reviewers, where a reviewer's profile represents the reviewer's expertise. The construction of profiles is described in Sect. 4.1. We define \(\sigma _{rt}\) as the match-making similarity function applied to the reviewer and manuscript profiles to obtain the similarity score matrix between reviewers and manuscripts, using any match-making representational technique (say, rt). The similarity matrix S is obtained as:

$$\begin{aligned} \mathrm {S}=\sigma _{rt}[\varPi ,\mathcal {M}] \end{aligned}$$

\(S \in [0,1]^{n\times m}\) is the similarity matrix between reviewers and manuscripts, with entry \(S_{ij}\) the similarity between reviewer \(r^{(i)}\) and manuscript \(m^{(j)}\). The higher the similarity score, the more closely the reviewer's expertise matches the manuscript's theme. Let \(\{\mathcal {R}^{(ar)}\}\) be the set of K reviewers allocated to a particular manuscript and \(\{\mathcal {R}^{(nar)}\}\) be the set of non-allocated reviewers, so that \(\{\mathcal {R}^{(nar)}\}=\mathcal {R} - \{\mathcal {R}^{(ar)}\}\).

Fig. 1. Example of consistency for an algorithmic approach, selecting a set of reviewers out of the global pool of reviewers who signed up for the review process

It is necessary to determine the consistency of the approach adopted to calculate the similarity. By consistency, we mean the agreement of a match-making algorithm with a certain set of reviewers, to whom it gives a higher similarity score, while it should disagree with the remaining reviewers by giving them a significantly lower similarity score. We define a term, the degree of consistency, denoted \(\varDelta \), which captures how consistent a particular algorithm is in its decision of predicting reviewers. Figure 1 shows the set of reviewers predicted by a match-making algorithm to review a particular manuscript, drawn from the global pool of reviewers who actually signed up for the review process. The degree of consistency is defined as the absolute difference between the average similarity score of the predicted reviewers and the average similarity score of the remaining reviewers.

$$\begin{aligned} AS_{ar}=\left[ \frac{\sum _{i=1}^{m} S_{ir_{k}}}{m} \right] , r_{k} \in \{\mathcal {R}^{(ar)}\}, 0 \le k \le K \\ AS_{nar}=\left[ \frac{\sum _{i=1}^{m} S_{ir_{k}}}{m} \right] , r_{k} \in \{\mathcal {R}^{(nar)}\}, 0 \le k \le n-K \end{aligned}$$
$$\begin{aligned} \varDelta = abs\left( AS_{ar} - AS_{nar}\right) \end{aligned}$$
(1)

Here, \(AS_{ar}\) is the average similarity score of the allocated reviewers, \(AS_{nar}\) is the average similarity score of the non-allocated reviewers, and \(\varDelta \) is the degree of consistency. The larger the value of \(\varDelta \), the more consistent the algorithm is in its decision of predicting the reviewers.
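
As a minimal numeric illustration with hypothetical scores (not taken from the conference dataset), consider one manuscript and five signed-up reviewers, three of whom are the allocated set:

```python
import numpy as np

# Hypothetical similarity scores of one manuscript against five reviewers,
# values in [0, 1] (rows: manuscripts, columns: reviewers).
S = np.array([[0.82, 0.74, 0.79, 0.21, 0.15]])

allocated = [0, 1, 2]      # indices of the K allocated reviewers
non_allocated = [3, 4]     # remaining reviewers from the global pool

AS_ar = S[:, allocated].mean()       # average similarity of allocated reviewers
AS_nar = S[:, non_allocated].mean()  # average similarity of non-allocated reviewers
delta = abs(AS_ar - AS_nar)          # degree of consistency

print(f"AS_ar={AS_ar:.3f}, AS_nar={AS_nar:.3f}, delta={delta:.3f}")
# AS_ar=0.783, AS_nar=0.180, delta=0.603 -> the larger delta is, the more
# decisively the approach separates allocated from non-allocated reviewers.
```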

3 Conference Dataset Description

The Technical Program Committee Chair of the “MultiTrack Conf” conference provided us with complete data on a) all submitted manuscripts, b) the full list of reviewers with their affiliations (which we call the Global pool), c) the track-wise list of reviewers (which we call the Track pool), and d) the manuscripts allocated to sets of reviewers (which we call the Original allocation). “MultiTrack Conf” was a multi-track engineering conference organized in 2019. Table 1 summarizes the conference data.

Table 1. “MultiTrack Conf” conference dataset details

4 Methodologies Implemented and Result Analysis

This section describes the representation approaches used for the match-making process, along with the experimental setup and the evaluation method used to assess the consistency of the different approaches.

4.1 Experimental Setup

The first step is to create profiles for the manuscripts and the reviewers. From each reviewer's name and affiliation, publication titles and publication years are extracted using ORCID [1]. Using these publication details, DOIs are retrieved via Crossref [3]. Finally, the abstracts of the papers are fetched from Semantic Scholar [2]. Publication details of some reviewers are not available in ORCID; their abstracts are obtained by web scraping.
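
This collection pipeline can be sketched roughly as follows using the public ORCID, Crossref, and Semantic Scholar REST endpoints. This is an illustrative reconstruction, not the authors' actual scripts; the JSON field names are assumptions based on the current public APIs.

```python
import requests

def orcid_titles(orcid_id):
    """Fetch publication titles and years from the public ORCID v3.0 API.
    Field layout assumed from the public API; adjust if the schema differs."""
    r = requests.get(f"https://pub.orcid.org/v3.0/{orcid_id}/works",
                     headers={"Accept": "application/json"})
    r.raise_for_status()
    for group in r.json().get("group", []):
        summary = group["work-summary"][0]
        title = summary["title"]["title"]["value"]
        date = summary.get("publication-date") or {}
        year = (date.get("year") or {}).get("value")
        yield title, year

def crossref_doi(title):
    """Look up a DOI for a publication title via the Crossref works endpoint."""
    r = requests.get("https://api.crossref.org/works",
                     params={"query.bibliographic": title, "rows": 1})
    items = r.json()["message"]["items"]
    return items[0]["DOI"] if items else None

def semanticscholar_abstract(doi):
    """Retrieve the abstract from the Semantic Scholar Graph API."""
    r = requests.get(f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}",
                     params={"fields": "abstract"})
    return (r.json() or {}).get("abstract")
```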

To build a reviewer's profile, we hypothesize that the publications of the past 5 years, or the 20 most recent papers (limits derived empirically from the publication frequency of reviewers in our dataset), indicate the reviewer's domain of research interest. Hence, for each reviewer, the publications of the last 5 years or the 20 most recent publications, whichever limit is reached first, are profiled. The titles and abstracts of these publications collectively form the reviewer's profile. The title and abstract provided to the conference are used to build the manuscript's profile, since by the general structure of research papers the title and abstract reflect the core theme of the entire paper.
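
The 5-year / 20-paper rule can be expressed as a small filter; the exact interplay of the two limits below is our reading of "whichever is reached first", not the authors' code.

```python
def recent_publications(pubs, ref_year=2019, window=5, cap=20):
    """Keep publications from the last `window` years and cap them at the `cap`
    most recent. `pubs` is a list of dicts with a "year" field (assumed layout)."""
    recent = [p for p in pubs if p.get("year") and ref_year - int(p["year"]) < window]
    recent.sort(key=lambda p: int(p["year"]), reverse=True)
    return recent[:cap]
```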

Before applying any match-making algorithm, a pre-processing step removes English stopwords and research stopwords. Research stopwords such as author, efficiency, proposal, and study are words that are frequently repeated in publications and generally do not convey sufficient information as standalone entities.
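
A minimal sketch of this pre-processing, assuming NLTK's English stopword list and an illustrative (incomplete) research-stopword set beyond the examples given above:

```python
import re
from nltk.corpus import stopwords   # requires: nltk.download("stopwords")

# Illustrative research stopwords; the paper's full list is not given.
RESEARCH_STOPWORDS = {"author", "authors", "efficiency", "proposal", "study",
                      "paper", "approach", "result", "results", "method"}
STOPWORDS = set(stopwords.words("english")) | RESEARCH_STOPWORDS

def preprocess(text):
    """Lowercase, tokenize on alphabetic runs, and drop English/research stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]
```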

Algorithm 1: Reviewer-manuscript match-making process (pseudocode figure)

The match-making process between reviewers and manuscripts is given in Algorithm 1. During each allocation, the reviewer workload and manuscript coverage constraints are taken into consideration. The balanced optimized Hungarian approach [19] is adopted for the assignment, with the constraint pair \((\mu ,\lambda )\) set to (6, 3). The similarity matrix S between reviewers and manuscripts can be obtained using the different approaches described in Sect. 4.3.
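
To illustrate how such a constrained assignment can be carried out, the sketch below runs one Hungarian round per required review with SciPy's linear_sum_assignment, replicating each reviewer by its remaining capacity. It is a simplified stand-in for the balanced optimized Hungarian approach of [19], not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def allocate(S, mu=6, lam=3):
    """Assign `lam` reviewers per manuscript with at most `mu` manuscripts per
    reviewer. S has shape (manuscripts x reviewers) with scores in [0, 1]."""
    n_man, n_rev = S.shape
    capacity = np.full(n_rev, mu)
    assigned = [set() for _ in range(n_man)]
    for _ in range(lam):
        # Replicate each reviewer by its remaining capacity.
        cols = [j for j in range(n_rev) for _ in range(capacity[j])]
        cost = np.empty((n_man, len(cols)))
        for c, j in enumerate(cols):
            cost[:, c] = -S[:, j]          # maximize similarity = minimize -S
        for i in range(n_man):             # forbid assigning the same reviewer twice
            for c, j in enumerate(cols):
                if j in assigned[i]:
                    cost[i, c] = 1e6
        rows, chosen = linear_sum_assignment(cost)
        for i, c in zip(rows, chosen):
            j = cols[c]
            assigned[i].add(j)
            capacity[j] -= 1
    return assigned                        # assigned[i]: reviewer indices for manuscript i
```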

Table 2. Description of the algorithms implemented with the corresponding approaches adopted in the Reviewer Matching Problem (the number of latent topics for topic modeling approaches is set to 20; all other hyper-parameters are set to their defaults.)

4.2 Evaluation Method

To evaluate the approaches, the top-3 reviewers are assigned to each submitted manuscript using Algorithm 1. This allocation is compared with the original allocation done by the track chair. Three other modes of allocation are also used to study the consistency of the approaches. The first allocates reviewers from the global pool, which we call global pool allocation. The second restricts reviewers to the track they selected, which we call track-based reviewer allocation. The third allocates a manuscript among the reviewers who were not originally assigned that manuscript, which we call global pool minus original allocation. Let \(AS_{oa}\) be the average similarity score of the originally allocated reviewers and \(AS_{goa}\) be the average similarity score of the global pool minus original allocation. Eq. 1 for the degree of consistency then becomes:

$$\begin{aligned} \varDelta = abs\left( AS_{oa} - AS_{goa}\right) \end{aligned}$$
(2)
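
A compact sketch of this evaluation, assuming \(\varDelta \) is averaged over all manuscripts (the exact aggregation used for Fig. 2 is not stated explicitly and is our assumption):

```python
import numpy as np

def mean_delta(S, original_alloc):
    """Average Eq. (2) over all manuscripts: compare the originally allocated
    reviewers with the global pool minus that allocation. S is the
    (manuscripts x reviewers) similarity matrix produced by an approach;
    original_alloc[i] is the set of reviewer indices assigned by the track chair."""
    deltas = []
    for i, allocated in enumerate(original_alloc):
        mask = np.zeros(S.shape[1], dtype=bool)
        mask[list(allocated)] = True
        AS_oa = S[i, mask].mean()        # original allocation
        AS_goa = S[i, ~mask].mean()      # global pool minus original allocation
        deltas.append(abs(AS_oa - AS_goa))
    return float(np.mean(deltas))
```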

4.3 Methodologies Description and Implementation

This subsection discusses the representational paradigms for queries (manuscripts) and targets (reviewers), along with their implementation for obtaining the similarity between reviewers and manuscripts. We have implemented several approaches: statistics-based keyword extraction methods such as TextRank [25], RAKE [30], and YAKE [7]; a probabilistic topic modeling approach, Latent Dirichlet Allocation (LDA) [6]; neural topic modeling approaches, Contextual Neural Topic Modeling (CNTM) [5] and BERTopic [13]; and transformer-based embedding approaches, Universal Sentence Encoder (USE) [8] and Sentence-BERT (SBERT) [29]. We also evaluated the consistency of the existing system Erie [20] used in the IEEE INFOCOM conference. Table 2 describes the approaches and their implementations for obtaining the similarity between reviewers and manuscripts.
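
For the transformer-based embedding variants, the similarity matrix can be obtained roughly as follows with the sentence-transformers library; the checkpoint name, the example texts, and any rescaling of cosine scores into [0, 1] are our assumptions, not the paper's exact configuration.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative SBERT-based similarity between reviewer profiles and manuscripts.
model = SentenceTransformer("all-MiniLM-L6-v2")

reviewer_profiles = ["deep learning for wireless channel estimation ...",
                     "graph algorithms and combinatorial optimization ..."]
manuscripts = ["a neural receiver design for OFDM systems ..."]

R = model.encode(reviewer_profiles, convert_to_tensor=True)
M = model.encode(manuscripts, convert_to_tensor=True)

# Cosine similarity lies in [-1, 1]; clipping or rescaling to [0, 1] would be
# needed to match the score range assumed in Sect. 2.
S = util.cos_sim(M, R)     # shape: (manuscripts x reviewers)
print(S)
```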

4.4 Result Analysis

Using the approaches mentioned in Sect. 4.3, the similarity between reviewers and manuscripts is obtained. To calculate the consistency of each approach, the degree of consistency (\(\varDelta \)) is computed using Eq. 2. Figure 2 compares the consistency of the different approaches and the existing system Erie over the original allocation and the three other modes of reviewer allocation. In terms of the degree of consistency \((\varDelta )\), Fig. 2 shows that CNTM with the word embedding variant is more consistent than the other approaches.

Fig. 2. Comparison chart of consistency in terms of similarity score with the delta differential (\(\varDelta \)) as a measure of consistency

Keywords are important facets of any paper. However, authors tend to provide either very specific yet peripheral keywords (e.g., Adam optimizer) or very broad ones (e.g., Artificial intelligence), so relying only on author-tagged keywords is not a good idea. Hence, keyword extraction is adopted to extract the core concepts of a paper. These approaches showed consistency with a low delta differential value. Keyword-based matching also fails to treat semantically related concepts, such as plagiarism and copy, as similar. We therefore introduced transformer-based contextual embeddings into the representation to study their consistency; they showed higher similarity agreement but a lower delta differential component.
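
A hedged sketch of keyword-extraction based matching using the yake package, with a simple Jaccard overlap between extracted key phrases standing in for the matching step; the actual matching used for each approach is the one listed in Table 2.

```python
import yake

# Extract up to 20 key phrases (1-2 words) per document.
extractor = yake.KeywordExtractor(lan="en", n=2, top=20)

def keyphrases(text):
    """Return the set of extracted key phrases, lowercased."""
    return {kw.lower() for kw, score in extractor.extract_keywords(text)}

def keyword_similarity(reviewer_profile, manuscript_text):
    """Jaccard overlap between the two key-phrase sets (illustrative metric)."""
    a, b = keyphrases(reviewer_profile), keyphrases(manuscript_text)
    return len(a & b) / len(a | b) if a | b else 0.0
```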

A publication is a collection of (latent) topics representing certain themes, so we analyzed a topic modeling approach, LDA, to study consistency. This approach clusters topics based on their representative words, but the issue of semantic relevance persists. We therefore introduced and tested contextual embeddings within a topic modeling approach, CNTM, where semantically related words are placed in the same topic cluster. For instance, biological cell and electrolytic cell, despite sharing the word cell, would fall into different clusters representing the biological/medical and the electronics domains, respectively. Variants of CNTM are also applied to study their consistency. The consistency analysis is also performed over the existing reviewer assignment system, Erie, used in the IEEE INFOCOM conference. As seen in Fig. 2, the CNTM model using word embeddings proved more consistent than any other approach considered in this study.
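
As a minimal sketch of the topic-modeling route, the LDA variant can be approximated with gensim by comparing document-topic distributions (20 latent topics, as in Table 2); the CNTM variants additionally feed contextual SBERT embeddings into the topic model, which is not shown here.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.matutils import cossim

def lda_similarity(reviewer_tokens, manuscript_tokens, corpus_tokens, k=20):
    """Train LDA on the tokenized corpus and compare the reviewer and manuscript
    by the cosine similarity of their topic distributions (illustrative only)."""
    dictionary = Dictionary(corpus_tokens)
    corpus = [dictionary.doc2bow(doc) for doc in corpus_tokens]
    lda = LdaModel(corpus, id2word=dictionary, num_topics=k, random_state=0)
    r_vec = lda.get_document_topics(dictionary.doc2bow(reviewer_tokens),
                                    minimum_probability=0.0)
    m_vec = lda.get_document_topics(dictionary.doc2bow(manuscript_tokens),
                                    minimum_probability=0.0)
    return cossim(r_vec, m_vec)
```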

5 Conclusion and Future Work

We bring various algorithmic approaches from different paradigms, together with the existing system Erie, onto a common platform to study a consistency framework for evaluating match-making approaches. From the analysis performed, we establish that a reviewer-manuscript match-making system based on Contextual Neural Topic Modeling (CNTM) with word embeddings may yield a better match, as it directly incorporates the SBERT embeddings used in the model. In the future, we plan to develop a match-making system that considers Conflicts of Interest (COIs) and applies sentiment analysis to the reviews provided by reviewers, which will help identify detailed, high-quality reviews. We would also like to extend the consistency study to the full text of publications. Finally, we plan to develop a match-making system that reduces the burden on TPCs and thus promises a higher-quality peer-review process in conferences.