Introduction

Innovation research has increasingly recognized the central role of interdisciplinarity in supporting cutting-edge science and innovation, as well as in finding solutions to complex societal problems (Schmidt, 2008). Interdisciplinary research (IDR) has thus received considerable attention in science and technology circles, which has led to the establishment of numerous IDR-driven research groups, centers, and programs worldwide (Anzai et al. 2012; Avila-Robinson & Sengoku, 2017).

Notwithstanding the rapid and broad diffusion of interdisciplinarity in research and practice, the field still demands a more solid conceptualization as well as more substantial operationalization frameworks. According to several studies, IDR has neither coherent and generally accepted definitions nor consistent and valid assessment methods (Huutoniemi et al. 2010; Siedlok & Hibbert, 2014; Wagner et al. 2011). A confusing mix of terminologies has proliferated in the literature, including multidisciplinarity, transdisciplinarity, fusion research, convergence, and anti-disciplinarity (Lauto & Sengoku, 2015; MIT, 2016). The subtle differences between and within the meanings of these constructs have prevented the formation of a consensual view regarding IDR among policymakers, scientists, and research managers (Amir-Aslani & Mangematin, 2010). Moreover, the complexity behind IDR rules out the definition of a single and absolute assessment method (Adams et al. 2016); instead, policymakers and researchers must deal with a wide array of quantitative and qualitative approaches to operationalize interdisciplinarity (Abramo et al. 2012; Wagner et al. 2011).

Considering quantitative approaches, bibliometric measures have occupied a prominent position in the IDR literature (Wagner et al. 2011). Over the years, several bibliometric measures of IDR have been proposed, including integration (Porter et al. 2007), diversity and coherence (Rafols & Meyer, 2010, Rafols, 2014), Hill index (Zhang et al. 2016), DIV (Leydesdorff et al. 2019b), DIV* (Leydesdorff et al. 2019a, Rousseau, 2019), and betweenness centrality (Leydesdorff, 2007). Recent research has evaluated the validity and consistency of these bibliometric metrics of IDR, highlighting the large differences and inconsistencies between these measures (Adams et al. 2016; Wang & Schneider, 2019). Additionally, several qualitative approaches, mostly based on surveys and peer reviews, have analyzed the outcomes and factors influencing IDR (Katoh et al. 2018; Lauto & Sengoku, 2015; Wagner et al. 2011). However, to date, no study has investigated the adequacy and consistency between bibliometric approaches and scientists’ self-assessment scores with respect to the interdisciplinarity of their research.

To address these gaps, this study aims to understand the extent to which bibliometric measures of IDR reflect scientists’ perceptions of the interdisciplinarity of their research and the factors influencing the differences in the scientists’ perception and the outcomes of bibliometric measures. We investigate the following questions: Are bibliometric metrics of interdisciplinarity consistent with the perceptions of scientists? What factors seem to play a role in defining the similarities and differences between the outcomes of bibliometric metrics of interdisciplinarity and scientists’ perceptions? How do these perceptions differ a) from concepts that are more intuitive for scientists, such as the scientific impact of research and b) across fields of research?

We employ a unique dataset, obtained from an interdisciplinarity-oriented research center in a leading Japanese university, which combines several bibliometric measures with scientists' self-assessment of the interdisciplinarity and scientific impact of their publications over 8 years. The case study of this paper encompasses a research program initiated by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) in Japan in the mid-2000s as part of a high-end funding program, the World Premier International Research Center Initiative (WPI) program. Since its foundation, this research center has had a steady record of producing Nobel Prize-worthy results in fields such as stem cell biology and coordination chemistry and has had a clear vision toward the integration of different knowledge domains. Using bibliometric and survey-based data, this study assessed the research efforts of our case study across a set of indicators, including collaboration, nature of knowledge, scientific impact, and interdisciplinarity. The disagreement between qualitative and quantitative evaluations and the significant field-specific nature of interdisciplinarity observed in this study call for the use of multidimensional approaches for assessing IDR, as well as the building of a consensus about the meaning of interdisciplinarity among scientists. This is in sharp contrast to other aspects of research such as scientific impact. Our findings can be useful for managing interdisciplinarity efforts more effectively in different organizational contexts.

The remainder of this paper is organized as follows. Section "Bibliometric Measures of Interdisciplinarity" provides an overview of multiple bibliometric measures of interdisciplinarity. Section "Data, Methodology, and Descriptive Statistics" describes the data and research methods used in this study. Section "Results" presents the results drawn from different statistical approaches aimed at understanding the consistency of bibliometric measures and scientists’ perspectives. Section "Discussions" closes the paper with the conclusions and implications drawn from the analyses.

Bibliometric measures of interdisciplinarity

Wagner et al. (2011) define IDR as approaches that aim to “integrate separate disciplinary data, methods, tools, concepts, and theories to create a holistic view or common understanding of a complex issue, question, or problem.” The operationalization of IDR encompasses not only the number of and differences in the proportions of the disciplines constituting a particular body of research, that is, variety and balance, but also, and most importantly, how unrelated or distant these disciplines are in cognitive terms (i.e., disparity (Leydesdorff et al. 2019b, Porter et al. 2007, Rafols, 2014, Stirling, 2007)). Variety, balance, and disparity together describe the diversity of the fields of research. In addition to diversity, the evaluation of interdisciplinarity involves coherence, which relates to the degree of interconnection between the disciplines encompassing a body of research (Rafols, 2014).

Wagner et al. (2011) provide an overview of the different qualitative and quantitative approaches for understanding and assessing IDR. Among these methods, bibliometric indicators of IDR have generated significant interest in the literature. The bibliometric assessment of IDR involves different levels of analysis (papers, journals, and institutions) and different types of metadata (texts, citations, co-authorships, etc.) (Wang et al. 2015). Focusing on the types of metadata used in the existing literature, we can discern the following five types of bibliometric indicators developed for the assessment of the interdisciplinarity of research: citation-based methods, semantic analysis of texts, network-based measures, hybrid approaches combining text and citation analysis, and co-authorship relationships. Each of these approaches is described below.

The majority of bibliometric indicators of interdisciplinarity use the list of cited references from articles, grants, and project reports. The earliest approaches relied on the proportion of cited references from disciplinary categories outside those of a particular study, also referred to as the Pratt index (Morillo et al. 2001). Building on Stirling (2007), Rafols and Meyer (2010) proposed the use of the Rao-Stirling (RS) index, or quadratic entropy, to measure disciplinary diversity. Recently, the RS index has been contested in the literature because of its low discriminatory power and the exclusion of disciplinary balance in its calculation. Jensen and Lutkouskaya (2014), Soós and Kampis (2011), and Porter et al. (2008) use the RS diversity index in their studies. Based on the RS index, Rafols (2014) proposed the use of the coherence index to evaluate the level of interconnection between disciplines. Meanwhile, following Leinster and Cobbold (2012), Zhang et al. (2016) proposed the use of a Hill-type index, referred to as 2DS, as an alternative to the RS index. They used the Leuven-Budapest (ECOOM) subject-classification scheme to classify the disciplinary fields of the cited references of studies. A similar approach is defined by Mugabushaka et al. (2016). More recently, Leydesdorff et al. (2019b) developed DIV, an alternative IDR measure that integrates balance, variety, and disparity in one measure. This measure has been updated to DIV* to reflect the principles for interdisciplinarity measures set out by Rousseau (2019). Other research efforts have proposed the use of simpler IDR indicators, mostly related to balance, to assess IDR through the list of cited references of studies, including the Gini coefficient, Shannon's entropy (Ávila-Robinson & Miyazaki, 2013; Silva et al. 2013), and the Herfindahl–Hirschman index or Simpson index (Anzai et al. 2012).

Approaches based on the semantic analysis of texts use natural language processing techniques to extract terms from full texts or summaries of articles, grant proposals, or research projects for the assessment of IDR (Adams et al. 2016). Despite their usefulness, text-based approaches have limitations; full-text data of research papers are not usually readily available, and the lack of standard approaches to categorize textual data complicates the calculation of interdisciplinarity measures (Adams et al. 2016). Thus, Waltman and Van Eck (2012), Ruiz-Castillo and Waltman (2015), and Klavans and Boyack (2017), harnessing the advantages of both textual data and citation data, utilized several hybrid approaches.

Several studies propose the use of network measures for the assessment of IDR. Among these studies, those focusing on betweenness centrality have attracted the greatest attention (Leydesdorff, 2007; Rafols et al. 2012; Leydesdorff et al. 2018). As described by Wang and Schneider (2019), the additional network-based measures of IDR used in the literature are the clustering coefficient and average similarity. Both of these measures are used by Rafols et al. (2012) in their study of the impact of journal rankings on interdisciplinary research.

Finally, building on the initial efforts of Schummer (2004), recent research efforts have discussed the use of the specialization of authors as a way to assess IDR (Abramo et al. 2012, 2017, 2018; Adams et al. 2016; Zhang et al. 2018). However, despite its potential benefits, the use of co-authorship as a proxy for IDR measurement is complicated because authors do not usually list their discipline of specialty in articles, and their area of specialty is not necessarily reflected in their affiliation to a particular department (Wagner et al. 2011).

The consistency of bibliometric indicators used to measure IDR has been questioned in the recent literature (Adams et al. 2016; Wang & Schneider, 2019). These comparative studies have observed large deviations among the outcomes of the different bibliometric measures of IDR. However, they have focused on the differences between bibliometric measures; to date, no study has investigated the consistency between bibliometric indicators of IDR and the perspectives of scientists, that is, the main actors of scientific and technological research.

Data, methodology, and descriptive statistics

Description of the dataset and variables

To evaluate the differences between measures of interdisciplinarity based on bibliometrics and scientists’ perceptions, we used a dataset comprising 1,078 scientific articles published by an interdisciplinarity-oriented research center at a Japanese university over the sample period 2008–2015. The core work of this research center is pertinent to the integration of biological cells/tissues and material technologies. In 2015, the research center comprised 25 principal investigators (PIs) and 180 other researchers including associate professors, assistant professors, postdoctoral students, and research fellows. These researchers included a diverse mix of physicists, chemists, biologists, material scientists, and engineers.

The reasons for the selection of this research center for this study are as follows: (a) since its inception, the fostering of IDR has been one of the core missions of this research center; (b) this research center has a steady record of producing highly significant results in fields such as stem cell biology and coordination materials; (c) the authors of this study had access to the data pertinent to this research center, necessary for conducting this study; and (d) additional bibliometric data related to this research center were readily available in public bibliographic databases. These reasons make this research center a suitable case for analysis.

Although this study focuses on a single research center, it uses a unique dataset that combines the research center’s scientists’ perceptions of interdisciplinarity with several quantitative measures. Our research encompasses various indicators grouped into four main categories: interdisciplinarity, scientific impact, cognitive characterization of knowledge, and collaboration.

Interdisciplinarity

We used several indicators to evaluate the level of interdisciplinarity of scientific research, including traditional bibliometric measures, cell-material index—a customized interdisciplinarity measure—and the qualitative assessment through scientists’ self-assessment. These indicators are described below.

For the bibliometric indicators, we included four measures frequently used in the literature: Shannon's entropy [IDR_ENTROPY], Rao–Stirling index [IDR_RAO], 2DS index [IDR_2DS], and DIV* index [IDR_DIVX] (Table 1). We also used [IDR_RAOT], which adjusts [IDR_RAO] with the natural logarithm of the length of each publication’s reference list. While [IDR_ENTROPY] combines the properties of variety (number of disciplines) and balance (evenness of the distribution of research disciplines), the rest of the measures of interdisciplinarity combine the properties of variety, balance, and disparity (degree of difference among research disciplines), which together constitute diversity (Rafols & Meyer, 2010).
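As an illustration of how variety, balance, and disparity enter these measures, the following minimal sketch computes Shannon's entropy, the Rao–Stirling index, and a Hill-type 2DS value for one publication. It is not the R script used in this study. The function names, the example proportions, and the similarity matrix are hypothetical, and the sketch assumes that disparity is one minus the cosine similarity, that the Rao–Stirling sum runs over all ordered pairs of subject categories (the full matrix), and that 2DS can be written as 1 / (1 − RS) when the proportions sum to one.

```python
import math


def shannon_entropy(p):
    """Shannon entropy H = -sum(p_i * ln p_i): variety and balance only."""
    return -sum(x * math.log(x) for x in p if x > 0)


def rao_stirling(p, sim):
    """Rao-Stirling diversity: sum over ordered pairs i != j of
    p_i * p_j * d_ij, with disparity d_ij = 1 - s_ij (assumption)."""
    n = len(p)
    return sum(
        p[i] * p[j] * (1.0 - sim[i][j])
        for i in range(n)
        for j in range(n)
        if i != j
    )


def hill_2ds(p, sim):
    """Hill-type 2DS written as 1 / (1 - RS); assumes sum(p) == 1."""
    return 1.0 / (1.0 - rao_stirling(p, sim))


# Hypothetical publication citing three subject categories.
proportions = [0.5, 0.3, 0.2]
similarity = [  # made-up cosine similarities between categories
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

print(round(shannon_entropy(proportions), 4))
print(round(rao_stirling(proportions, similarity), 4))
print(round(hill_2ds(proportions, similarity), 4))
```

Summing over unordered pairs (i < j) instead would halve the Rao–Stirling value, so the full-matrix versus half-matrix convention needs to be stated whenever values are compared across studies.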

Table 1 List of interdisciplinarity indicators used in this study. [IDR_RAO] is based on the full cosine matrix, whereas [IDR_2DS] is based on the half cosine matrix using Stirling (2007)'s original definition

We selected the above citation-based indicators for IDR because they are included in the majority of studies on quantitative measures of interdisciplinarity. Moreover, these indicators represent a mix of traditional and new measures proposed in the literature for the assessment of interdisciplinarity of publications from multiple perspectives. The bibliometric measures of interdisciplinarity in Table 1 rely on evaluations at the journal level, and they are usually approximated by the Web of Science’s subject categories (SC) extracted from the list of references cited by articles. This database assigns each indexed journal to one or more subject categories according to its general contents.

To increase the reliability of the estimated level of interdisciplinarity, we reassigned new SCs to reference papers categorized as “Multidisciplinary Sciences” (MS), according to their relevant references. Out of approximately 32,000 references cited by the publications collected for the research center considered in this study, 2,971 references belong to MS journals. Despite their low share, MS journals are of the utmost importance as they include some of the most highly cited journals, such as Nature, Science, PNAS, PLOS ONE, Scientific Reports, and Nature Communications. The interdisciplinarity indicators shown in Table 1 were calculated by using a revised version of the R script described by Rafols (2014). We confirmed our results with the interdisciplinarity evaluation routine described in http://www.leydesdorff.net/wc15/. In our analyses, we excluded SCs with proportions ≤ 0.025.
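The exclusion threshold can be illustrated with a short sketch. It assumes that each cited reference has already been mapped to a subject category label, and it takes a flat list of such labels as input. The function name and the example data are hypothetical, and the renormalization of the remaining proportions is an assumption, since the text above does not state whether proportions were rescaled after the exclusion.

```python
from collections import Counter


def sc_proportions(reference_scs, threshold=0.025, renormalize=True):
    """Share of each subject category (SC) among cited references,
    dropping SCs whose share is <= threshold."""
    counts = Counter(reference_scs)
    total = sum(counts.values())
    shares = {sc: c / total for sc, c in counts.items()}
    kept = {sc: s for sc, s in shares.items() if s > threshold}
    if renormalize and kept:
        # Assumption: rescale the kept shares so that they sum to 1.
        kept_total = sum(kept.values())
        kept = {sc: s / kept_total for sc, s in kept.items()}
    return kept


# Hypothetical reference list: 40 references across three SCs.
refs = ["Chemistry"] * 20 + ["Cell Biology"] * 19 + ["Physics"] * 1
kept = sc_proportions(refs)
print(sorted(kept))  # "Physics" has a share of exactly 0.025 and is dropped
```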

To overcome the limitations of journal-based bibliometric indicators of interdisciplinarity, we proposed an alternative measure, which we refer to as the cell-material (CM) Index [IDR_CM], specifically developed for the research center considered in this study. The CM Index quantifies the degree of interaction between cell- and materials-related terms, which is the core research area of our case study. To estimate the CM Index, we first extracted keywords from the publications published by the research center considered for this study. We classified each relevant keyword according to its technical domain, i.e., biology- or materials science-related keywords. Subsequently, we used VOSviewer software (Van Eck & Waltman, 2011) to visualize these data as a co-word map. We used this software because it features a layout arrangement in which the distances between nodes represent the degree of interrelatedness. For publications that combine both cell- and material-related terms, we calculated the distance between the average locations of the two domains using the following equation:

$$\text{CM Index}=\sqrt{{\left({\overline{X}}_{C}-{\overline{X}}_{M}\right)}^{2}+{\left({\overline{Y}}_{C}-{\overline{Y}}_{M}\right)}^{2}}$$

Here, \({\overline{X}}_{C}\) and \({\overline{Y}}_{C}\), and \({\overline{X}}_{M}\) and \({\overline{Y}}_{M}\), refer to the mean center of the coordinates of cell- and material-related terms within the co-word map, respectively. CM Index refers to [IDR_CM]. Positive CM Index values point to an integration between cells and materials in publications; the higher the CM Index, the more disparate the cell-material integration efforts. We assigned a value of 0 to papers that only included cell- or material-related terms.
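The calculation above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name and the example coordinates are hypothetical, and the map coordinates of each keyword are assumed to have been exported from the co-word map beforehand.

```python
import math


def cm_index(cell_coords, material_coords):
    """Euclidean distance between the mean centers of the cell-related
    and material-related keywords of one publication.

    Each argument is a list of (x, y) map coordinates. Returns 0.0 when
    a publication contains terms from only one of the two domains.
    """
    if not cell_coords or not material_coords:
        return 0.0
    x_c = sum(x for x, _ in cell_coords) / len(cell_coords)
    y_c = sum(y for _, y in cell_coords) / len(cell_coords)
    x_m = sum(x for x, _ in material_coords) / len(material_coords)
    y_m = sum(y for _, y in material_coords) / len(material_coords)
    return math.hypot(x_c - x_m, y_c - y_m)


# Hypothetical coordinates: cell terms centered at (1, 0), material term at (4, 4).
print(cm_index([(0.0, 0.0), (2.0, 0.0)], [(4.0, 4.0)]))  # 5.0
print(cm_index([(0.0, 0.0)], []))                        # 0.0
```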

[IDR_QUAL] is a qualitative indicator used for measuring the level of interdisciplinarity of the collected scholarly articles. This measure relied on the self-assessment by the PIs regarding the level of interdisciplinarity of the publications of their research groups. This self-assessment was part of the yearly evaluation conducted by the research center's management. PIs were asked to evaluate the interdisciplinarity of their publications by using the four-level scale shown in Table 2. The interdisciplinarity scores submitted by PIs were then peer-reviewed by an internal panel of scientists for the assessment of their adequacy and to avoid any biases.

Table 2 Four-level scale for the self-assessment of interdisciplinarity of scholarly articles, as defined by the research center’s management

Scientific impact indicators

To assess the scientific impact of publications, we used the following measures: impact factor, citation-based measures, altmetrics, technology impact, and qualitative indicators, described below.

To obtain the journal impact factor [IMP_IFACTOR], we used Clarivate's InCites Journal Citation Reports database (https://jcr.clarivate.com/), as of the year 2016.

For citation-based indicators, we used two measures: the number of raw citations [IMP_CITRAW] and field-normalized citations [IMP_CITFNORM]. We included self-citations in our calculations. The number of raw citations [IMP_CITRAW] for each article was estimated by using a citation window of 3 years after its publication provided by Elsevier's Scopus bibliographic database. [IMP_CITFNORM] was obtained from the Field-Weighted Citation Impact (FWCI) scores provided by Elsevier's Scopus bibliographic database. The FWCI score corrects the differences by normalizing the number of citations by year of publication, document type, and associated discipline of scholarly articles. FWCI scores with values greater than 1.0 signify documents with citation levels higher than the average citation level of their year, document type, and scientific field.

We used an altmetrics index [IMP_ALTM], which measures the degree of public attention received by an article. Specifically, we used the Altmetric Attention Score collected from the Dimensions database of Digital Science & Research Solutions Inc. (https://app.dimensions.ai/discover/publication). This index is derived from the attention received by scholarly articles in multiple non-traditional sources such as news, blogs, policy documents, patents, Twitter, the F1000 database, etc. Additionally, we measured the technological impact [IMP_TECHIMP] of publications by collecting the data regarding the number of times an article was cited in patents. This information was also obtained from the Dimensions database (https://app.dimensions.ai/discover/publication). We acknowledge that the use of different databases for the citation-based measures described above could result in different levels of impact assessment, as databases differ in terms of their coverage and classification systems. Nevertheless, previous research efforts across bibliographic databases have shown that citation counts across subject categories are highly correlated between databases, despite differing in absolute numbers (Martín-Martín et al. 2018). Because the comparisons between citation measures from different bibliographic databases are not conducted in absolute terms, we expect the impact of these differences on this study to be limited.

Finally, [IMP_QUAL] measures the level of the scientific impact of the collected scholarly articles. This indicator relied on the self-assessment by the PIs of the scientific impact of the publications of their research group. This self-assessment was part of the yearly evaluation conducted by the research center's management. PIs were asked to assess the scientific impact of their publications by using the four-level scale shown in Table 3. The scientific impact scores submitted by PIs underwent a peer-review by an internal panel of scientists for the assessment of their adequacy and to avoid any biases.

Table 3 Four-level scale for the self-assessment of scientific impact of scholarly articles, as defined by the research center’s management

Collaboration indicators

To study the interrelations of collaboration with interdisciplinarity and scientific impact, we defined the following measures for this study:

  • [COLL_NUAUTHOR] The number of coauthors in a paper.

  • [COLL_NUCOUNTRY] The number of different countries of origin listed in the coauthors of a paper.

  • [COLL_INTERPI] The number of collaborations between the PIs affiliated to the research center under study.

  • [COLL_INTRAORG] The proportion of coauthors from the university hosting the research center under study.

  • [COLL_NONACAD] The proportion of coauthors from organizations other than universities and public academic research organizations.

  • [COLL_PROX] The average/median distance between coauthors’ cities outside the campus premises of the research center, based on Google’s geolocation data.
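The collaboration indicators listed above could be derived from author records along the following lines. This is a minimal sketch under stated assumptions: the record fields (`country`, `org`, `academic`), the helper names, and the example data are hypothetical, and the great-circle (haversine) distance between city coordinates is used as one possible reading of the proximity measure, which may differ from how the geolocation data were actually processed.

```python
import math
from itertools import combinations
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def proximity(city_coords):
    """Mean and median pairwise distance between coauthors' cities."""
    dists = [haversine_km(a, b) for a, b in combinations(city_coords, 2)]
    if not dists:
        return 0.0, 0.0
    return mean(dists), median(dists)


def collaboration_indicators(authors, host_org):
    """Count- and share-based indicators for one paper (hypothetical fields)."""
    n = len(authors)
    return {
        "COLL_NUAUTHOR": n,
        "COLL_NUCOUNTRY": len({a["country"] for a in authors}),
        "COLL_INTRAORG": sum(a["org"] == host_org for a in authors) / n,
        "COLL_NONACAD": sum(not a["academic"] for a in authors) / n,
    }


# Hypothetical paper with four coauthors.
authors = [
    {"country": "JP", "org": "Host University", "academic": True},
    {"country": "JP", "org": "Host University", "academic": True},
    {"country": "US", "org": "Another University", "academic": True},
    {"country": "US", "org": "Some Company", "academic": False},
]
indicators = collaboration_indicators(authors, "Host University")
print(indicators)
print(proximity([(35.01, 135.77), (35.68, 139.69), (42.36, -71.06)]))
```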

Cognitive characterization of knowledge

We used two measures to cognitively characterize the knowledge contained in publications: the nature of knowledge [NAT_KNOW] and the cognitive cluster to which a publication belongs [CLU_KNOW].

We defined [NAT_KNOW] as an approximation of the nature of the knowledge included in the publications. This indicator is based on a customized taxonomy, defined by the authors, of the stage of problem solution carried out by scientists in the research described in their publications. For each knowledge domain involved in this research center, we defined three stages of the process of problem solution: basic understanding (level 1), intermediary solutions or proofs of concept (level 2), and downstream activities such as applications (level 3). The authors allocated a stage of problem solution to each publication by consensus. To minimize subjectivity, we developed a matrix containing specific component technologies for each stage of the process of problem solution across the scientific and technological fields relevant for the research center considered in this study. As these definitions are highly dependent on the field of study, an in-depth understanding of the different technologies is necessary for the correct allocation of articles. For this, we consulted the technical literature and expert advice, as described in Avila-Robinson and Sengoku (2017).

We also included the indicator [CLU_KNOW], which describes the cluster obtained for each publication from a bibliographic coupling network elaborated by the authors. After using appropriate data cleaning approaches and applying cosine normalization, we utilized the VOSviewer software (Van Eck & Waltman, 2011) to estimate these clusters. In total, 13 clusters were extracted from the bibliographic coupling network. [CLU_KNOW] was used as the control variable. These clusters revealed the following major research topics of the research center under study: (1) Cholesterol, (2) Plasma membrane and signaling, (3) Stem cells, (4) DNA nanotechnology, (5) Drug delivery approaches, (6) Gene switches, (7) Glycotechnology, (8) Inorganic materials and photovoltaics, (9) Metal–organic Frameworks, (10) Organic materials, (11) Terahertz technologies, (12) Bionanotechnologies, and (13) Cell imaging technologies.
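To make the idea of a cosine-normalized bibliographic coupling link concrete, the following minimal sketch computes the coupling strength between two publications from their reference lists. The function name and the reference identifiers are hypothetical, and the sketch assumes the common set-based form of the cosine measure; it does not reproduce the clustering step carried out in VOSviewer.

```python
import math


def coupling_strength(refs_a, refs_b):
    """Cosine-normalized bibliographic coupling between two publications:
    shared references divided by the geometric mean of the two
    reference-list lengths."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical reference lists sharing two of four references each.
paper_1 = ["r1", "r2", "r3", "r4"]
paper_2 = ["r3", "r4", "r5", "r6"]
print(coupling_strength(paper_1, paper_2))  # 0.5
```

The normalization keeps publications with long reference lists from dominating the network simply because they have more opportunities to share references.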

Additional indicators

Additional indicators were used to evaluate the collected publications, including the publication year [YEAR] and the length of the list of cited references of articles [REFS].

  • [YEAR] The publication year of the paper.

  • [REFS] Number of references per article.

General research methodology

The research methodology of this study proceeded in the following three general steps (Fig. 1): data collection, data processing and extraction of bibliometric measures, and statistical analyses.

Fig. 1 General research methodology

First, we collected the scholarly articles published by the research center considered for this study from the years 2008–2015 from Clarivate’s Web of Science bibliographic database. In total, we collected 1208 publications, and after removing publications with incomplete data, we considered 1077 documents for conducting analyses. We used these documents as the basis for the research methods described in this section.

Next, we cleaned and processed the collected publication data to calculate the different bibliometric measures described in Sect. "Description of the dataset and variables". The estimation of these measures relied on data obtained from multiple bibliographic databases, including Elsevier's Scopus, Clarivate's InCites Journal Citation Reports, and Digital Science’s Dimensions databases. As described in that section, we conducted two types of network analysis: (a) a bibliographic coupling network that relates publications based on the number of references they have in common (Kessler, 1963) to estimate the cognitive clusters that describe [CLU_KNOW]; and (b) a co-word network that relates keywords based on the number of times they appear together in a text (Callon et al. 1983) as the basis for the estimation of [IDR_CM].
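The co-word relation in (b) can be illustrated with a minimal sketch that counts how often two keywords appear in the same document. The function name and the keyword lists are hypothetical, and each keyword pair is counted at most once per document, which is an assumption about how co-occurrences were tallied.

```python
from collections import Counter
from itertools import combinations


def coword_counts(docs_keywords):
    """Number of documents in which each unordered keyword pair co-occurs."""
    counts = Counter()
    for keywords in docs_keywords:
        # Deduplicate and sort so that each pair has one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical keyword lists for three publications.
docs = [
    ["stem cell", "hydrogel", "scaffold"],
    ["stem cell", "hydrogel"],
    ["hydrogel", "porous material"],
]
links = coword_counts(docs)
print(links[("hydrogel", "stem cell")])  # 2
```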

After a series of data cleaning procedures on the collected data, we conducted the following four analyses aimed at assessing the consistency of bibliometric measures with scientists’ perceptions: (a) pairwise comparisons of bibliometric measures, visualized as box plots, through the independent samples Kruskal–Wallis test across the scientists’ perspectives; (b) bivariate correlation analysis of all measures based on the Spearman rho statistic; (c) error bar analysis of means with a 95% confidence interval across bibliometric measures of interdisciplinarity and scientific impact; and (d) analysis of differences in perspectives across cognitive clusters in a 2-dimensional space obtained through Kruskal's non-metric multidimensional scaling. We conducted these analyses using the R programming language.

The next section describes the results of the different statistical approaches used to assess the differences between measures of interdisciplinarity based on bibliometrics and the scientists’ perceptions, as well as their relation to scientific impact.

Results

The following four statistical approaches were used to understand the consistency between bibliometric measures and the scientists’ perceptions of interdisciplinarity: an independent samples Kruskal–Wallis test for measures of scientific impact and interdisciplinarity across the levels of scientists’ perspectives; a bivariate correlation analysis based on the Spearman rho statistic across all measures; a confidence interval analysis; and a correlation analysis across cognitive clusters, visualized in a 2-dimensional space using Kruskal's non-metric multidimensional scaling. Together, these approaches address the research questions of this study.

Descriptive statistics and discrimination capability of bibliometric measures and scientists’ perceptions

Table 4a presents the descriptive statistics obtained for the collected data. In total, we collected 22 bibliometric and additional measures, including aspects such as cognitive issues, collaboration, scientific impact, and interdisciplinarity. In addition, Table 4b describes the distribution of scores across scientists’ perception measures of interdisciplinarity and scientific impact.

Table 4 a Descriptive statistics estimated for the collected data and b Distribution of scores across scientists’ perception measures of interdisciplinarity and scientific impact

This section also explores the capability of bibliometric measures of interdisciplinarity and scientific impact to discriminate among different levels of scientists’ perceptions. For this, we used box plots to visualize relevant measures of interdisciplinarity and scientific impact across the four levels of scientists’ perceptions, from 0 to 3 (see Fig. 2, top and bottom, respectively). We also estimated pairwise comparisons using the independent samples Kruskal–Wallis test to evaluate whether there is a statistically significant difference between scientists’ perception scores for interdisciplinarity and research impact measures. The significance values shown in this figure were adjusted by using the Bonferroni correction for significant pairs.
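The pairwise comparison procedure can be sketched as follows. This is an illustration in Python rather than the R code used in this study, and it rests on assumptions: each pair of levels is compared with a two-group Kruskal–Wallis test (tie-corrected H statistic, chi-square approximation with one degree of freedom), and the Bonferroni adjustment multiplies each p-value by the number of pairs. Whether the original analysis ranked each pair separately or used ranks from the pooled sample is not stated, so the function names, the pairing strategy, and the example scores are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    return max(h, 0.0) / correction if correction > 0 else 0.0


def pairwise_bonferroni(groups_by_level):
    """Bonferroni-adjusted p-values for all pairs of levels (df = 1)."""
    pairs = list(combinations(sorted(groups_by_level), 2))
    adjusted = {}
    for a, b in pairs:
        h = kruskal_h([groups_by_level[a], groups_by_level[b]])
        p = math.erfc(math.sqrt(h / 2))  # chi-square survival function, df = 1
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return adjusted


# Hypothetical indicator values grouped by self-assessment level (0-3).
scores = {
    0: [0.10, 0.12, 0.15, 0.11],
    1: [0.14, 0.18, 0.16, 0.20],
    2: [0.22, 0.25, 0.19, 0.27],
    3: [0.35, 0.40, 0.38, 0.33],
}
for pair, p_adj in pairwise_bonferroni(scores).items():
    print(pair, round(p_adj, 3))
```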

Fig. 2 Comparisons of the discriminatory power of bibliometric measures of interdisciplinarity (top) and scientific impact (bottom) across the levels from scientists’ self-assessments (0–3). Box plots evaluate the data distribution. The p-values are denoted by * and ** for the statistical significance at the 0.05 level and 0.01 level, respectively. ns, not statistically significant

An examination of these results reveals that bibliometric measures of interdisciplinarity (Fig. 2, top) display a relatively higher potential to discriminate between the different levels of scientists' perceptions than measures of scientific impact (Fig. 2, bottom).

Regarding interdisciplinarity measures (Fig. 2, top), the greatest differences are observed between level 3 and the rest of the levels, and between level 2 and level 0, suggesting a stronger discriminatory potential at extreme levels of the scale. This pattern of significant comparisons is approximately consistent across interdisciplinarity measures. Furthermore, the ranges of values across the levels of scientists’ self-assessment vary among interdisciplinarity measures. While [IDR_RAO] and [IDR_2DS] show wider measurement ranges, [IDR_DIVX] and [IDR_ENTROPY] display much narrower measurement ranges.

We obtained different results for measures of scientific impact (Fig. 2, bottom), in which the greatest discriminatory potential was between level 0 and the remaining levels. This pattern of significant comparisons is likewise consistent across the measures of scientific impact. Measures of scientific impact tend to be highly affected by the skewed nature of the citation data, i.e., the outliers implicit in any citation data.

Interrelationships among measures

Correlation analysis was used to assess the patterns of interrelationship among the measures described above, particularly focusing on the correlations between bibliometric measures and scientists' perceptions of scientific impact and interdisciplinarity. Because of the characteristics of our data, we conducted a bivariate correlation analysis on the basis of the Spearman rho statistic, the correlation matrix of which is presented in Table 5. In this table, correlations significant at the 0.01 level are highlighted in bold. We describe relevant insights from the correlation matrix below.
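As an illustration of the statistic behind Table 5, the sketch below computes the Spearman rho as the Pearson correlation of average ranks, which handles the many ties produced by ordinal self-assessment scores (0–3). The function names and the sample vectors are hypothetical, and significance testing is omitted.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:  # a constant variable has no defined rank correlation
        return float("nan")
    return cov / (sx * sy)


# Hypothetical data: self-assessed scores (0-3) against a bibliometric measure.
self_assessment = [0, 1, 1, 2, 3, 3, 2, 0]
bibliometric = [0.12, 0.30, 0.28, 0.41, 0.66, 0.52, 0.39, 0.20]
rho = spearman_rho(self_assessment, bibliometric)
```

Because only ranks enter the calculation, the result is insensitive to the extreme values that are typical of citation data.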

Table 5 Bivariate correlation matrix based on the Spearman rho statistic

Comparisons between bibliometric measures and scientists’ perceptions of interdisciplinarity and scientific impact

In contrast to recent discussions regarding the low consistency among interdisciplinarity indicators (Adams et al. 2016; Zhang et al. 2016), our results indicate a relatively high correlation among interdisciplinarity measures, including Shannon’s entropy [IDR_ENTROPY], DIV* index [IDR_DIVX], Rao-Stirling score [IDR_RAO], and 2DS index [IDR_2DS], with ρ ≥ 0.527. Additionally, the correlations between these bibliometric indicators of interdisciplinarity and our customized measure [IDR_CM], which assesses cell and material integration using keywords, were significant, with ρ values between 0.130 and 0.419.

We found mild yet significant correlations between [IDR_QUAL] and bibliometric measures of interdisciplinarity, with ρ values ranging between 0.202 and 0.266. These values signify the disparities between the perceptions of the scientists about interdisciplinarity and the respective results from the bibliometric measures. Contrastingly, we observed relatively higher correlation levels between [IDR_QUAL] and our customized cell-material integration index [IDR_CM] (ρ = 0.324), which may suggest that the use of term-based bibliometric indicators can provide greater accuracy in the assessment of IDR.

Similar to bibliometric measures of interdisciplinarity, bibliometric indicators of scientific impact share relatively high and significant correlations between them, yet at lower levels than those obtained for interdisciplinarity measures. Regarding the comparisons between bibliometric measures and scientists’ perceptions of scientific impact, we observed significant correlations between [IMP_QUAL] and [IMP_IFACTOR] (ρ = 0.468), which signifies the high association of scientists’ perceptions of the scientific impact of their research with the journal impact factor. This result is not unexpected, as the existing literature highlights the role of the impact factor as a key proxy for scientific research impact. Additionally, traditional scientific impact measures based on citation data, such as [IMP_CITRAW] and [IMP_CITFNORM], show lower levels of correlation with [IMP_QUAL] (ρ = 0.309 and ρ = 0.269, respectively), which are nonetheless slightly higher than those obtained for interdisciplinarity measures. Moreover, we observed mild yet statistically significant correlations between [IMP_QUAL] and alternative scientific impact measures, such as the altmetrics index [IMP_ALTM] (ρ = 0.244).

Comparisons between interdisciplinarity and scientific impact

We also observed relatively mild and negative correlations (ρ ≤ -0.276) between bibliometric measures of interdisciplinarity and scientific impact, which is consistent with Lariviere and Gingras (2010), Yegros et al. (2015), and Levitt and Thelwall (2008). Similarly, the scientists’ self-assessment of the research impact, [IMP_QUAL], shows negligible yet negative correlations with the bibliometric measures of interdisciplinarity. Interestingly, however, the relation between the scientists’ self-assessment scores on interdisciplinarity [IDR_QUAL] and scientific impact [IMP_QUAL] showed a positive and significant ρ value of 0.298. Another exception with a positive correlation, yet at a lower ρ value of 0.142, was that between [IDR_QUAL] and altmetrics [IMP_ALTM]. Neither of these two scientific impact measures, [IMP_QUAL] and [IMP_ALTM], relates scientific impact to citation counts.

Interactions of interdisciplinarity and scientific impact with collaboration measures

Contradictory relationships between collaboration and interdisciplinarity have been proposed in the existing literature. Some argue that collaboration contributes significantly to IDR, whereas others have found inverse relationships between external collaboration and interdisciplinarity (Sanz Menéndez et al. 2001). In our data, the degree of correlation between bibliometric measures of collaboration and interdisciplinarity is relatively weak (ρ ≤ 0.187) and is confined to [IDR_ENTROPY] and a couple of relationships with [IDR_DIVX], involving [COLL_NUAUTHORS] and [COLL_INTERPI]. This lack of collaboration-interdisciplinarity interrelationships agrees with the literature that dissociates interdisciplinarity from collaboration schemes (Dai & Boos, 2017; Hessels & Kingstone, 2019).

Contrastingly, from the scientists’ perspective, interdisciplinarity [IDR_QUAL] and bibliometric measures of collaboration are correlated at particular points of the interrelationship. Interestingly, we observed relatively mild levels of correlation between [IDR_QUAL] and extreme schemes of collaboration, including those with distant partners [COLL_PROX] (ρ = 0.212) and with different countries [COLL_NUCOUNTRY] (ρ = 0.224), and those between PIs inside the research center [COLL_INTERPI] (ρ = 0.295). The latter suggests a significant association of scientists’ perceptions of interdisciplinarity and collaborative efforts.

Our findings demonstrated relatively weak yet positive correlations between bibliometric measures of collaboration and scientific impact. Correlations between collaboration and scientific impact-related measures are non-existent or at lower levels (ρ < 0.154), except for the impact factor [IMP_IFACTOR] and altmetrics [IMP_ALTM] with the number of coauthors [COLL_NUAUTHORS], both with ρ = 0.197. Researchers’ self-assessment of the scientific impact leans toward a low or negligible correlation with collaboration indicators. Among the collaboration indicators, [COLL_NUAUTHORS] displayed the highest level of correlation at ρ = 0.172.

Cognitive measures and additional measures

As described above, we approximated the nature of the knowledge involved in publications with [NAT_KNOW], which refers to the general stages of the problem solution process. These three stages were defined as follows: (1) basic understanding, (2) intermediary solutions or proofs of concept, and (3) applications.

Interestingly, the results in Table 5 show significant levels of correlation between [NAT_KNOW] and bibliometric measures of scientific impact, and particularly between [NAT_KNOW] and interdisciplinarity measures. For the former, correlations ρ varied from 0.086 to 0.168; for the latter, correlations ρ ranged from 0.171 to 0.305. These results suggest that problem- or mission-oriented research tends to be somewhat correlated with interdisciplinarity (Kueffer et al. 2012, Whitesides, 2010). With regard to the perceptions of scientists, only [IDR_QUAL] appeared with a low yet statistically significant correlation at ρ = 0.134 with [NAT_KNOW].

Consistent with previous studies (Adams et al. 2016; Zhang et al. 2016), our results indicate a mildly negative correlation between the length of the list of references [REFS] in scholarly articles and bibliometric measures of interdisciplinarity. This correlation is of particular concern in light of the overshooting bias that these measures have in research fields in which scholarly articles traditionally have fewer references, such as computer science and mathematics (Moed, 2006). As observed, this significant correlation was eliminated after adjusting [IDR_RAO] with the natural logarithm of each publication’s length of references, as inferred from [IDR_RAOT]. This may suggest the need to apply appropriate transformations to interdisciplinarity measures to correct any biases related to the length of references.

Comparisons of the propensities and perspectives of research fields toward interdisciplinarity and scientific impact measures across cognitive clusters

Previous research has observed the differing propensities of cognitive fields to engage in interdisciplinarity, as well as scientists’ different perspectives regarding scientific impact and interdisciplinarity (Avila-Robinson & Sengoku, 2017). To this end, we conducted two analyses. First, we evaluated the propensity of research fields toward interdisciplinarity and scientific impact through the construction of error bars on means (95% confidence interval) across relevant bibliometric measures. Second, we conducted a correspondence analysis based on the relevant bivariate correlations discussed in Sect. "Descriptive statistics and discrimination capability of bibliometric measures and scientists’ perceptions", across the 13 cognitive clusters obtained from the bibliographic coupling network.

Comparison of the propensities of research fields across bibliometric measures of interdisciplinarity and scientific impact

Figure 3 presents the error bars that correspond to the 95% confidence interval for the means of bibliometric measures of scientific impact (left) and interdisciplinarity (right) across the 13 cognitive clusters extracted from the bibliographic coupling network. We decided to use this approach to compare the propensities of research fields toward interdisciplinarity and scientific impact across cognitive clusters.
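The error bars can be sketched as follows. The example assumes a normal approximation for the 95% confidence interval of the mean (the standard error multiplied by the z quantile); the study does not state whether a Student t interval was used instead, and for small clusters the t interval would be wider. The function names and values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, mean, upper) for the mean, using a normal approximation."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m = mean(values)
    standard_error = stdev(values) / math.sqrt(n)  # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * standard_error, m, m + z * standard_error


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (relative dispersion)."""
    return stdev(values) / mean(values)


# Hypothetical values of one bibliometric measure for two cognitive clusters.
clusters = {
    "cluster_a": [0.42, 0.51, 0.47, 0.55, 0.49, 0.44],
    "cluster_b": [0.21, 0.35, 0.18, 0.40, 0.27, 0.30],
}
intervals = {name: mean_confidence_interval(v) for name, v in clusters.items()}
```

Clusters whose intervals do not overlap can be read as differing in their mean level of the measure, which is how the error bars are used to compare propensities across fields.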

Fig. 3

Confidence interval comparisons across scientific impact (left) and interdisciplinarity measures (right). Means are denoted by red punctuated lines. The blue lines connect the medians across clusters. The 95% confidence level was used in the estimation of confidence intervals. Groups refer to cognitive clusters obtained from the bibliographic coupling network. The 13 clusters were arranged according to the scientific fields that they encompass per Clarivate's ESI classification. The clusters are classified as follows: (1) Cholesterol; (2) Plasma membrane and signaling; (3) Stem cells; (4) DNA nanotechnology; (5) Drug delivery approaches; (6) Gene switches; (7) Glycotechnology; (8) Inorganic materials and photovoltaics; (9) Metal Organic Frameworks; (10) Organic materials; (11) Terahertz technologies; (12) Bionanotechnologies; and (13) Cell-imaging technologies

As shown in Fig. 3, we classified these clusters into five different groups according to the general scientific fields involved: biology, integration of biology and chemistry, chemistry and material sciences, physics, and multiple research fields.

The examination of Fig. 3 reveals three aspects. First, the curves from the measures of scientific impact and interdisciplinarity show contrary patterns across clusters, which is related to the negative correlations found for these measures in Sect. "Interrelationships among measures". Second, the coefficient of variation, a measure of relative dispersion, is four times smaller for measures of interdisciplinarity than for measures of scientific impact, which is unsurprising given the highly skewed nature of citation data. Third, the findings indicate field domain-dependent behavior across clusters, suggesting that interdisciplinarity and research impact depend strongly on the field of research, as observed in the patterns described below. Biology-oriented fields display relatively low levels of scientific impact and mild levels of interdisciplinarity. Clusters that integrate biology and chemistry tend to have a lower scientific impact, but above average levels of interdisciplinarity. Chemistry- and physics-related clusters indicate average values of impact and lower levels of interdisciplinarity. Contrastingly, fields that involve multiple disciplines, such as bionanotechnologies and drug delivery, display relatively low scientific impact but high levels of interdisciplinarity. These patterns are more closely evaluated in the next section.

Scientists’ perspectives toward scientific impact and interdisciplinarity across cognitive clusters

We analyzed the differences in scientists’ perspectives regarding interdisciplinarity and scientific impact across scientific fields by comparing the correlation levels between [IMP_QUAL] and [IDR_QUAL] and the rest of the measures, as shown in Table 6. We applied non-metric multidimensional scaling to group the interrelationships between the measures (Fig. 4).

Table 6 Spearman rho correlation values for interrelations between scientists’ perspectives, [IMP_QUAL] and [IDR_QUAL], and the rest of measures
Fig. 4

Cognitive clusters with similar patterns of correlations are grouped via non-metric multidimensional scaling and K-means for (a) scientific impact and (b) interdisciplinarity values shown in Table 6. Axes represent relative distances among clusters

The perspectives of scientists regarding scientific impact show high correlation levels across all the clusters and measures. All the clusters display at least one significant correlation, at 0.05 and 0.01 significance levels, as shown in Table 6 (top). In contrast, the level of correlation of measures of interdisciplinarity across clusters is significantly lower; half of the clusters show significant correlations (Table 6, bottom). Interestingly, measures related to collaboration are significantly more highly correlated with scientists’ perspectives regarding interdisciplinarity, but not regarding scientific impact. This is particularly relevant to measures regarding the number of coauthors and the number of collaborations with PIs. These findings suggest the close associations, or confusion, that scientists tend to have between interdisciplinarity and collaboration, as suggested in Sect. "Descriptive statistics and discrimination capability of bibliometric measures and scientists’ perceptions".

Figure 4 is a visual representation of the interrelationships among cognitive clusters based on the correlation levels in Table 6. We produced this representation by projecting the evaluation of interdisciplinarity, scientific impact, and collaboration metrics onto a 2-dimensional space using Kruskal's non-metric multidimensional scaling and grouping the clusters with similar patterns by applying K-means clustering. In this figure, the y-axis and the x-axis refer to a statistical and relative representation of the cluster distances, respectively. Although clusters from similar research field domains tend to be located closer to each other, they usually group with clusters from other research field domains in terms of the interrelations shown in Fig. 4.
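The grouping step can be sketched with a plain Lloyd-style K-means on 2-dimensional coordinates. The sketch assumes the multidimensional scaling step has already produced one (x, y) point per cognitive cluster; the coordinates below are hypothetical, and seeding the centroids with the first k points is a simplification chosen for reproducibility, not the procedure reported in the study.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Group 2-D points into k clusters; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(points[i]) for i in range(k)]  # simplification: first k points
    labels = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Hypothetical 2-D coordinates for six cognitive clusters after scaling.
coordinates = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
group_labels, group_centroids = kmeans(coordinates, k=2)
```

Because the axes of a multidimensional scaling plot carry only relative distances, the groups are meaningful through which clusters fall together, not through the coordinate values themselves.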

The group comprising glycotechnology, organic materials, metal–organic frameworks, cholesterol, stem cells, DNA nanotechnology, plasma membrane and signaling, and gene switches is characterized by a high association of scientific impact with impact factor and citation-based measures (Fig. 4a). Although not significant, the correlations of this group with the measures of interdisciplinarity displayed negative values. Moreover, some collaboration-related measures are highly relevant. In contrast, the group of inorganic materials and cell-imaging technologies displays high-to-mild correlations with scientific impact measures. This group showed positive and relatively high correlations with interdisciplinarity measures. The group of drug delivery approaches, bionanotechnologies, and terahertz technologies displays mild-to-low correlations with scientific impact measures and negative correlations with collaboration measures. Finally, the group that comprises plasma membrane and signaling, DNA nanotechnology, and gene switches displays positive correlations with bibliometric measures of scientific impact. This group showed positive correlations with some measurements of collaboration, such as the number of authors and countries. Overall, there appears to be an alignment between scientists’ perceptions of impact and their bibliometric counterparts, regardless of the cognitive cluster or field domain.

Figure 4b describes four main groups of clusters obtained from the scientists’ perspectives regarding interdisciplinarity. The first group comprises inorganic materials, terahertz technologies, bionanotechnologies, and cell-imaging technologies. The perceptions of scientists working in these clusters align with outcomes of bibliometric measures of interdisciplinarity, as seen by the positive and mild correlations. They also display high correlations with collaboration measures, particularly with inter-PI collaborations within the research center. The group comprising stem cells, drug delivery approaches, and metal–organic frameworks also correlates with the bibliometric measures of interdisciplinarity, although to a lesser extent. In addition, most of the correlations of these clusters with scientific impact were negative. The group composed of cholesterol, glycotechnology, and organic materials displayed positive correlations with collaboration measures, followed by mild-to-low correlations with measures of scientific impact, and negative correlations with measures of interdisciplinarity. Compared with the first group, this group strongly stresses the role of the number of co-authors, number of countries, and distant countries. Finally, the group including DNA nanotechnology, gene switches, and plasma membrane and signaling shows strong, positive correlations with measures of scientific impact. This is also the only group with a consistent composition and pattern for the perception of scientists regarding the impact and interdisciplinarity of their research.

Discussions

This study described a quantitative approach to assess the differences between scientists’ perceptions of the interdisciplinarity of their research and the results obtained from bibliometric indicators that are typically used in the assessment of IDR. Although this study focused on a single research center, it is one of the few studies of its kind to use a unique dataset that conflates scientists’ perceptions of interdisciplinarity and scientific impact with several quantitative measures and control variables. The data for this analysis comprised scientists’ self-assessment of interdisciplinary publications of a Japanese university's cutting-edge, fusion research-inspired research institute. A series of bibliometric indicators encompassing interdisciplinarity, collaboration, and scientific impact were defined. Building on these indicators, we defined a series of statistical analyses, namely bivariate correlation analyses, independent samples Kruskal–Wallis tests, confidence intervals analysis, and correlations and K-means clustering analysis across cognitive clusters.

The findings of this study highlighted the differences in the perceptions of the scientists regarding the interdisciplinarity and scientific impact of their research. As expected, their perceptions regarding the scientific impact of their research are clear-cut and heavily rooted in the impact factor, although other measures—citations and altmetrics—are also statistically significant. For the case of interdisciplinarity, there was no single metric, or “holy grail,” that reflected the scientists’ perceptions, which is in line with previous research such as Adams et al. (2016) and Wang and Schneider (2019). Despite mild yet significant correlations and solid discrimination capabilities between qualitative and quantitative evaluations of interdisciplinarity, our findings call for research managers to use multidimensional and composite approaches, in terms of the types and number of measures, when analyzing interdisciplinarity.

When compared with the findings of previous research, our results highlighted the strong field-dependent nature of the scientists’ perceptions of the interdisciplinarity of their research. Our findings revealed that interdisciplinarity is a highly relative term; what counts as interdisciplinarity for a biologist may differ from what counts as such for a chemist or a physicist. This high field-specificity has been observed in other studies; however, it has not yet been quantified (Sanz Menéndez et al. 2001). Scientists from different research fields appear to have divergent, disparate perceptions of interdisciplinarity and scientific impact. This could be driven by the different distances and propensities of conducting interdisciplinarity across fields of research (Molas-Gallart et al. 2014), or by the misunderstanding or confusion among scientists about what interdisciplinarity means and entails. Therefore, research managers and administrators should consider these subtle disciplinary differences and homogenize how research groups conceptualize and operationalize interdisciplinarity. In this regard, a consensual view regarding the characterization and operationalization of IDR among policymakers, scientists, and research managers is of importance, as previously described by Wagner et al. (2011) and Siedlok and Hibbert (2014).

Additionally, although collaboration is often viewed as an unnecessary aspect for conducting IDR (Sanz Menéndez et al. 2001; Porter & Rafols, 2009), our findings indicated that collaborative interdisciplinarity, rather than cognitive interdisciplinarity, was more closely related to the scientists’ perceptions. Interestingly, the collaborative perception of interdisciplinarity varied between two extremes: intra-collaboration among the PIs of the research institute under study and collaboration schemes with international partners and multiple countries. In relation to this, despite the calls for the “flattening” of the world through electronic approaches, such as virtual conferences, physicality is a key issue for fostering interdisciplinarity, as described by Claudel et al. (2017). According to Littmann et al. (2020), these results demonstrate the need to consider the evaluation of IDR efforts from cognitive (differences between the underlying bodies of knowledge) and social (differences in the bodies of knowledge encompassed by coauthors) domains.

There are multiple ways to conduct IDR. Given the multidimensional, complex, and field-specific nature of interdisciplinarity, understanding scientists’ perceptions can serve as guiding posts for the operationalization of interdisciplinarity in a particular organizational unit, be it people, research groups, research institutes, or regions. The integration of qualitative and quantitative approaches can, in turn, provide ways to foster and manage more effective interdisciplinarity in any type of organization.

Limitations of this study

Although this study focuses on a single research center, it uses a unique dataset that conflates scientists’ perceptions of interdisciplinarity and scientific impact of their research with several quantitative measures and control variables. Future studies should focus on replicating this study with other research centers, including comparisons with scholars not working in interdisciplinarity-driven research centers. Future studies should also include additional measures to evaluate scientists’ perceptions of interdisciplinarity and scientific impact of their research. Furthermore, the evaluation of the scientists’ perspectives regarding scientific impact and interdisciplinarity of their research in this study relied on a self-assessment procedure designed by the research center under study. Hence, the scientists’ perspectives are perhaps consciously or unconsciously biased toward higher levels of scientific impact and interdisciplinarity.