1 Science, Society, Public Funding, and Research

Interest in the methodology for assessment (and especially in the methodology for quantitative assessment) of research systems is growing. The reasons for this are the importance of science for society and economics and the wish for effective use of public funds for research. It has been emphasized in Chap. 1 of this book that science is a system of organized knowledge that is a driving force of positive social evolution. Advances in science lead to technological innovation, and because of this, science may be an important component of the growth of a country’s GDP.

Scientific systems are both social and economic systems. They require specific management and large public investment. The good shape of research facilities and institutions and the high status of national researchers are important conditions for increasing research production and the number of technological innovations. Such investments should ensure a sufficient size of the national research community. This size is very important. If a nation has a scientific or technological problem, then an adequate size of the group of corresponding qualified researchers increases the probability of solving the problem.

Kealey [2] formulated several hypotheses about the research funding. These hypotheses are as follows:

  1. The percentage of national GDP spent for research and development increases with national GDP per capita.

  2. Public and private funding displace each other.

  3. Public and private displacements are not equal: public funds displace more than they themselves provide.

The hypotheses of Kealey are a consequence of observing the evolution of funding in developed countries, where the private funding of research and development (R & D) activities is large. But even in this case, private funding cannot substitute for public funding. Without public funding, developed countries may lose their leading technological position with respect to emerging large economies (some of which use massive public funding of R & D). This displacement may strike the private sector in the corresponding country, and as a consequence, the ability of the private sector to fund R & D may decrease. In turn, further displacement of the private sector of the country from world markets may follow.

Public funding of R & D is also extremely important for developing economies, where the ability of the private sector to fund research activities is limited. There are threshold values of many indicators that must be exceeded for successful economic development. One such threshold value is the percentage of GDP spent for R & D. Without sufficient public funding and with very low private funding, this threshold value may not be reached, and the corresponding developing country will remain an economic laggard.

The first hypothesis of Kealey is of limited validity even for developed countries, since the percentage of GDP spent for R & D cannot grow indefinitely. Kealey recognizes this and sets an upper bound of 10 % of GDP. We are far away from this value today (twenty years after Kealey’s book). Different factors have already begun to influence spending for R & D. The increase of R & D funding has slowed in many countries. In other countries, one observes cuts in R & D spending. Hence it is not surprising that the economic growth rates have decreased: an important engine of growth does not have enough fuel.

Kealey’s hypothesis that government funding of civil R & D disproportionately displaces private funding is quite interesting. If one believes in this hypothesis, then a decrease in public funding should lead to an increase in private funding. This is certainly not the case in developing countries. And even in developed countries, if a private company remains without sufficient public R & D support (and without other kinds of support supplied by the state), then it may soon experience problems with competitors from other countries whose governments support public funding of R & D. Such public funding of R & D may be very useful for increasing the competitiveness of a nation’s private companies.

Research systems are open and dissipative. Thus in order to keep such a system far from equilibrium, flows of energy, matter, and information must be directed toward the system. These flows ensure the possibility of self-organization, i.e., a sequence of transitions toward states of greater organization. If the above-mentioned flows decrease below some threshold level, then the corresponding dissipative structures can no longer exist, and the system may end at a state of equilibrium (with a great deal of chaos and minimal organization). Thus such a decrease can lead to instabilities and the degradation of corresponding systems.

Instabilities (crises) have an important role in the evolution of science. They may lead to changes in the state of research systems. This change may be positive, but it may also lead to destruction of the corresponding systems. Because of this, one has to be very careful in the management of a research system in the critical regime of instability. Appropriate management requires analysis, forecasting, and finding solutions that can lead to ending the instability. Mathematical modeling and quantitative tools are very important for all of the above. For example, the evolution of research fields and systems may be followed very effectively by constructing knowledge maps and landscapes [3–9].

2 Assessment of Research Systems. Indicators and Indexes of Research Production

In addition to knowledge about (i) the importance of science and (ii) the importance of a sufficient amount of knowledge about specific features of research systems, one may need to know about assessment of research systems and about quantitative tools for such assessment. These important topics have been discussed in Chaps. 2 and 3 of the book. The quality of scientific production is important, since scientific information of high quality produced by researchers may be transformed into advanced technology for the production of high-quality goods and services. In order to manage quality, one introduces certain quality management systems (QMS), which are sets of tools for guiding and controlling an organization with respect to aspects of quality: human resources; working procedures, methodologies, and practices; and technology and know-how. In order to understand research systems, one needs to know about their specific statistical features. One such specific feature is that an important difference may exist between the statistical characteristics of processes in nature and those in society. The statistical characteristics of most natural processes are Gaussian, while those of many social processes are non-Gaussian. Because of this, objects and processes in the social sciences usually depend on many more factors than the objects and processes studied in the natural sciences. And research systems are social systems, too.

The need for multifactor analysis becomes obvious when one has the complex task of evaluating the research production of researchers or groups of researchers. The production of researchers has many quantitative and qualitative characteristics. Because of this, one has to use a combination of qualitative and quantitative methods for a successful evaluation of researchers and their production. One should select carefully the sets of indicators, indexes, and tools for evaluation of research production. The principle of Occam’s razor is valid also in scientometrics. The number of indices applied should be the lowest possible, yet it must still be sufficient. Thus evaluators should apply only those indicators and indexes that are absolutely necessary for the process of evaluation of individual researchers or groups of researchers [1].

Research productivity is closely connected to the communication of the results of research activities. This communication is channelled nowadays in large part through the scientific journals, where the majority of results are published. And most indexes for evaluation have been developed for analysis of research publications (as units of scientific information) and their citations (as units of impact of scientific information). Thus the focus in Chaps. 2 and 3 was on these two groups of indexes and indicators. The characteristics of research productivity that are subject to evaluation usually are latent ones (described by latent variables that are not directly measurable). But by means of systems of indicators and indexes, one may evaluate these latent variables. Usually one needs more than one indicator or index for a good evaluation of a latent variable.

3 Frequency and Rank Approaches to Scientific Production. Importance of the Zipf Distribution

Frequency and rank approaches are appropriate for describing the research production of different classes of researchers. The rank approach is appropriate for describing the production of the class of highly productive researchers, in which there are rarely two researchers with the same number of publications/citations, and the ranking may be constructed effectively. The frequency approach is appropriate for a description of the production of less-productive researchers, many of whom have the same number of publications, and because of this, they cannot be effectively ranked. The areas of dominance of the above-mentioned two approaches are different. The frequency approach is dominant in the natural sciences, while the rank approach is more likely to be used in the social sciences. Because of the central limit theorem, the normal distribution plays a central role in the world of Gaussian distributions. Because of the Gnedenko–Doeblin theorem, the Zipf distribution plays an important role in the world of non-Gaussian distributions. Non-Gaussian power-law distributions occur frequently in the area of dynamics of research systems. A consequence of these laws is the concentration–dispersion effect, leading to the fact that in a research organization, there is usually a small number of highly productive researchers and a large number of less-productive researchers. Let me stress again that the laws discussed in Chap. 4 of this book (and the laws of scientometrics in general) must not be regarded as strict rules (such as, e.g., the laws in physics). Instead, the above-mentioned laws should be treated as statistical laws (i.e., as laws representing probabilities).

Nevertheless, the statistical laws discussed in the book and the corresponding indicators and indexes can be used for evaluation and forecasting: it is likely that a paper by a researcher with large values of the h- and g-indexes will be more frequently cited than a paper by a scientist from the same research field whose values of the h- and g-indexes are much lower. It is probable that a paper published in a journal that has a large impact (Garfield) factor will be more frequently cited than a paper on the same subject published in a journal with a smaller impact factor.
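The h- and g-indexes mentioned above are easy to calculate by hand, and a short program may help the reader to experiment with them. The following is a minimal sketch in Python, assuming the usual definitions: h is the largest number such that h papers have at least h citations each, and g is the largest number such that the g most cited papers together have at least g² citations. The function names and the sample citation counts are illustrative assumptions, and the g-index is capped here at the number of papers, which is one common convention and not the only one.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g*g citations.

    Assumption: g is capped at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts of one researcher's papers.
sample = [25, 8, 5, 3, 3, 2, 1, 0]
print("h-index:", h_index(sample))
print("g-index:", g_index(sample))
```

For a record dominated by one highly cited paper, the g-index comes out larger than the h-index, since the g-index gives credit for the citations accumulated by the top papers.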

4 Deterministic and Probability Models of Science Dynamics and Research Production

The main focus of this book is on the mathematical tools for assessment of research production, on mathematical modeling of dynamics of research systems, and especially on mathematical models connected to the dynamics of research publications and their citations. Such mathematical models can be deterministic or probabilistic. These two classes of models are discussed in Chap. 5. The deterministic models (e.g., epidemic models, logistic curve models, models of competition between systems of ideas) may be more familiar to the reader. Because of this, Chap. 5 is more focused on probabilistic models. Probabilistic models lead to an explanation of many interesting characteristics connected to the dynamics of research publications and their citations. For example, one can prove the (intuitive) fact that there are publications that will never be cited. Many well-known heavy-tail and other statistical distributions such as the Yule distribution, Waring distribution, negative binomial distribution, and rare event distributions such as the Gumbel distribution, Weibull distribution, etc., are used in these models to describe production/citation dynamics, aging of scientific information, etc. In addition to the statistical laws, two kinds of (Matthew) effects connected to citation information are described. The first effect is that researchers (or journals) that have a relatively high standard may obtain more citations than deserved. This effect is accompanied by a second effect, known as the “invitation paradox”: many papers published in journals with a high impact factor are cited less frequently than expected on the basis of the journal’s impact factor. Thus “for many are called, but few are chosen” (second Matthew effect).
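The remark above that some publications will never be cited can be illustrated numerically. The sketch below assumes (purely for illustration) that the number of citations of a paper follows a negative binomial distribution, one of the distributions named above, with P(X = k) = C(k + r − 1, k) pʳ (1 − p)ᵏ. The parameter values and function names are hypothetical and are not taken from the models of Chap. 5.

```python
import math


def neg_binomial_pmf(k, r, p):
    """P(X = k) for a negative binomial with shape r > 0 and 0 < p < 1.

    The binomial coefficient is evaluated through log-gamma functions so
    that non-integer values of r are allowed.
    """
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def share_never_cited(r, p):
    """Probability of zero citations under the assumed distribution."""
    return neg_binomial_pmf(0, r, p)


# Hypothetical parameters: the mean number of citations is r * (1 - p) / p.
r, p = 0.5, 0.1
print("mean citations:", r * (1 - p) / p)
print("share of uncited papers:", share_never_cited(r, p))
```

Under this assumption the share of uncited papers equals pʳ, which is strictly positive for every admissible choice of parameters, even when the mean number of citations is large.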

Let us note that there are many more models connected to dynamics of science and technology [10–12]. Some of these models are evolutionary models [13–16]. In general, the models of the dynamics of science and technology are among the mathematical tools and models connected to social dynamics (for several references, see [17–40]), which is a rapidly growing research area drawing the attention of an increasing number of researchers.

5 Remarks on Application of Mathematics

Mathematics is used for the quantification of research structures, processes, and systems [41–44]. A large field of research is concerned with the application of mathematical models and statistics to research and with the quantification of the process of written communication. This field of research is covered by bibliometrics [45, 46]. Bibliometrics is used not only in the area of research evaluation. Methods of bibliometrics are applied, for example, to the investigation of the emergence of new disciplines, the study of interactions between science and technology, and the development of indicators that can be used for planning and evaluation of different aspects of scientific activity [47].

One has to be careful in the use of methods of bibliometrics for research evaluation, since these methods are based on the assumption that carrying out research and communicating the results go hand in hand. This assumption is not true in all cases, e.g., research for military purposes. An additional assumption is that publications can be taken to represent the output of science. This assumption is not true in all cases, e.g., in the case of research for the needs of large corporations, since a significant part of such research is not published. But in the cases in which the assumption holds, the arrays of publications can be quantified and analyzed to study trends of development in science (national, global, etc.) as well as to study the production of scientific groups and institutions.

Mathematical tools are also used in citation analysis. The analysis of citations, however, is not connected only to mathematics. There exist also qualitative aspects such as quality, importance, and the impact of citations on research publications. The quality of a citation is an inherent property of the research work [48]. Judgment of quality can be made only by peers who can evaluate cognitive, technological, and other aspects connected to the scientific work and to the place of the citation in the work. The importance of a citation is based on external appraisal [49]. Importance refers to the potential influence on surrounding research activities. We note that self-citations do not have an external appraisal. Because of this, they are not as important as other citations and are usually excluded from the citation analysis of an evaluated scientist, research group, or organization. Finally, the impact of a citation is also based on external appraisal. The impact of citations reflects their actual influence. A citation reflects to some extent the influence of the cited source on the research community. We note here that review articles are generally more frequently cited than regular research articles. In addition, numbers of citations differ across different areas of scientific research. The impact of citations may be measured by different indicators. Such indicators are, for example, the number of citations of the corresponding paper, the average number of citations per paper (this measures the impact of the corresponding scientist), the number of citations of a paper over the past few (three, four, five, or more) years, the age distribution of the citations of the corresponding article, etc. Let us note that citation analysis has other interesting aspects [50, 51], e.g., cocitations [52–55] (whose strength can be measured by the Jaccard index or Salton’s cosine [56]). Cocitation analysis may also be used for visualization of scientific disciplines [57], for detection of research fronts [58], or even as a measure of intellectual structure in a group of researchers [59].
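The two cocitation measures named above can be sketched in a few lines. The example assumes that each document is represented by the set of papers that cite it; two documents are cocited by every paper that belongs to both sets. The identifiers of the citing papers and the function names are hypothetical. The Jaccard index is the size of the intersection divided by the size of the union, and Salton’s cosine is the size of the intersection divided by the geometric mean of the two set sizes.

```python
import math


def jaccard_index(citing_a, citing_b):
    """|A ∩ B| / |A ∪ B| for the sets of papers citing documents A and B."""
    union = citing_a | citing_b
    if not union:
        return 0.0
    return len(citing_a & citing_b) / len(union)


def salton_cosine(citing_a, citing_b):
    """|A ∩ B| / sqrt(|A| * |B|) for the sets of papers citing A and B."""
    if not citing_a or not citing_b:
        return 0.0
    return len(citing_a & citing_b) / math.sqrt(len(citing_a) * len(citing_b))


# Hypothetical data: identifiers of the papers citing two documents.
cites_doc_a = {"p1", "p2", "p3", "p4"}
cites_doc_b = {"p3", "p4", "p5"}

print("cocitation count:", len(cites_doc_a & cites_doc_b))
print("Jaccard index:", jaccard_index(cites_doc_a, cites_doc_b))
print("Salton's cosine:", salton_cosine(cites_doc_a, cites_doc_b))
```

Both measures lie between 0 and 1, and Salton’s cosine is never smaller than the Jaccard index for the same pair of documents, so the two should not be mixed within a single cocitation map.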

Another field of mathematics that has been much used in recent years in studies on research systems is graph theory and the associated theory of networks [60]. Methods such as mapping and clustering are used for processing citation and cocitation networks, coauthorship networks, and other bibliometric networks [61–63], and corresponding software such as Gephi, Pajek, and Sci2 [64–68] is used for visualization of these networks. In more detail, one may study the organization of large research systems on the basis of the information contained in the nodes and links of the corresponding large networks. There are community-detection methods [69, 70] that reveal important structures (e.g., strongly interconnected modules that often correspond to important functional units) in networks. One such method is the map equation method [71]. Let us consider a network on which a network partition is performed (say the n nodes of the network are grouped into m modules). The map equation specifies the theoretical modular description length L(M) of how concisely we can describe the trajectory of a random walker guided by the possibly weighted directed links of the network. Here M denotes a network partition of the n network nodes into m modules, with each node assigned to a module. The description length L(M) given by the map equation is then minimized over possible network partitions M. The network partition that gives the shortest description length best captures the community structure of the network with respect to the dynamics on the network. The map equation framework is able to capture easily citation flow or flow of ideas, because it operates on the flow induced by the links of the network. Because of this, the map equation method is suitable for analysis of bibliometric networks.
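The comparison of description lengths for different partitions can be sketched for a very small network. The code below is a simplified illustration and not the method of [71] in full: it assumes an undirected, unweighted network (so that the visit rate of the random walker at a node is its degree divided by twice the number of links), it uses the commonly quoted expanded two-level form L(M) = q log q − 2 Σᵢ qᵢ log qᵢ − Σₐ pₐ log pₐ + Σᵢ (qᵢ + pᵢ) log(qᵢ + pᵢ), where qᵢ is the exit rate of module i, q is the sum of the exit rates, and pᵢ is the total visit rate of the nodes in module i, and it only evaluates L(M) for partitions supplied by hand instead of searching for the minimum. The toy network and all names are hypothetical.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x) with the convention 0 * log 0 = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level description length L(M) in bits per step.

    Assumptions: undirected, unweighted links; no teleportation.
    edges: list of (u, v) pairs; module_of: dict mapping node -> module.
    """
    degree = defaultdict(int)
    cut = defaultdict(int)  # link ends that leave each module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            cut[module_of[u]] += 1
            cut[module_of[v]] += 1
    two_m = 2 * len(edges)

    # Visit rate of the random walker at each node.
    p = {node: d / two_m for node, d in degree.items()}
    within = defaultdict(float)  # total visit rate inside each module
    for node, rate in p.items():
        within[module_of[node]] += rate
    exit_rate = {mod: cut[mod] / two_m for mod in within}
    q = sum(exit_rate.values())

    return (plogp(q)
            - 2 * sum(plogp(qi) for qi in exit_rate.values())
            - sum(plogp(pa) for pa in p.values())
            + sum(plogp(exit_rate[mod] + within[mod]) for mod in within))


# Toy network: two triangles joined by a single link (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_module = {node: "A" for node in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
singletons = {node: node for node in range(6)}

for name, partition in [("one module", one_module),
                        ("two modules", two_modules),
                        ("singletons", singletons)]:
    print(name, map_equation(edges, partition))
```

For this toy network the partition into the two triangles gives a shorter description length than either a single module or six single-node modules, in agreement with the intuitive community structure.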

Finally, let us note that an entire research area exists called computational and mathematical organization theory. Researchers working in this area focus on developing and testing organizational theory using formal models [72–74]. The models of this theory can be very useful for managers and evaluators of research organizations. Let us mention several areas that employ such models:

  1. Innovation diffusion from the point of view of complex systems theory [75];

  2. Public funding of nanotechnology [76];

  3. Technology innovation alliances and knowledge transfer [77];

  4. Attitude change in large organizations [78];

  5. Complexity of project dynamics [79];

  6. Corruption in education organizations [80];

  7. Reputation and meeting techniques for support of collaboration [81];

  8. Spreading of behavior in organizations [82];

  9. Communication and organizational social networks [83];

  10. Politics [84].

6 Several Very Final Remarks

Not everything that counts can be counted, and

not everything that can be counted counts.

Albert Einstein

It is time to end our journey through the huge area of evolution of research systems and assessment of research production. There were two competing concepts as this book was being planned: (i) the concept of a scientific monograph and (ii) the concept of an introductory book with elements of a handbook. The first variant would lead to a book twice as big as it is now. Mathematical theorems would be proved there, indexes and indicators would be discussed in much greater detail, and larger sets of topics would be described. Such a book would meet the expectations of the members of group 3 of potential readers mentioned in the preface. But I wanted to write a book for a much larger set of readers: those from target groups 1 and 2 of the preface. Because of this, the second concept was realized. The introductory character of the book allowed me to concentrate the text around science dynamics and assessment of important elements of research production. The aspect of a handbook allowed me to describe many indexes and models in a small number of pages. Of course, the realization of the concept of an introductory text with the aspect of a handbook meant that many topics from this area of research have not been discussed. I have not discussed important questions such as how researchers choose the list of references for their publications: What is the motivation to cite some publications and not others? Are there reference standards? Can scientific information be institutionalized? And so on. Instead, the focus was set on mathematical tools and models. In addition, some indexes and models have been presented very briefly. This is compensated by a sufficient number of warning messages about the proper use of indexes; by the large number of references, where the reader will find additional information; and by clear statements about the conditions of validity of the models discussed.

There are numerous examples of calculation of indexes, and many more examples could be (i) provided on the basis of the excellent databases available and (ii) found in the lists of references by the interested reader. My experience shows that the shortest way to become familiar with the indexes and with the conditions for their proper application is to calculate them oneself. So my advice to the reader is to perform many such calculations in order to gain experience about the proper and improper application of the indexes. Many years ago (when I was much younger), I needed about a year of practice before I could begin to apply the quantities and tools of nonlinear time series analysis in a proper way. So be patient, carry out a large enough number of exercises, and the results will come.

This is an introductory book, and the introduction has been made from the point of view of mathematics. Paul Dirac once said, “If there is a God, he’s a great mathematician.” The achievements of the mathematical theory of research systems are very useful, for science dynamics and research production have quantitative characteristics, and knowledge about those characteristics may help evaluators to perform appropriate assessment of researchers, research groups, research organizations, and systems. One of Plato’s ideas was that a good decision is based on knowledge (and not only on numbers). I hope that this book may help the reader to understand better the processes and structures connected to the dynamics of science and research production. This may lead to better assessment and management of research structures and systems as well as to increased productivity of researchers. If this book contributes to an increased understanding of complex science dynamics and to better assessment of research even in a single country and even in a small number of research groups in that country, I will be happy, and the goal of the book will have been achieved.