
1 Introductory Remarks

Interest in models of research dynamics and research production has grown considerably since 1963, when Derek de Solla Price published the book Little Science, Big Science [1], which presented the first systematic approach to the structure of modern science. Researchers began to construct models for the growth of the scientific literature. This growth was assumed to be exponential for science as a whole, but it could also be logistic or even linear for some scientific disciplines. In addition, models of aging and obsolescence of scientific information appeared [2–4]. At approximately the same time as Price, Goffman and Newill [5] developed an intellectual epidemics model of scientific communication. From the point of view of this model, the diffusion of ideas in a population of scientists can be compared to the spread of a virus that causes an epidemic in some population. The model of Goffman and Newill was followed by other models that connected science dynamics to population dynamics. Several such models are discussed below.

The number of models in the area of research dynamics grows continuously. Many mathematical models connected to the dynamics of research organizations can supply useful information in support of the assessment of research production. This book focuses mainly on science dynamics and on results obtained by research on publications and citations. This focus limits the set of models under discussion and determines the selection presented below. In principle, two kinds of models may be developed: deterministic models and probability models. The discussion below begins with models for the dynamics of research publications. First, several forms of growth functions are described. Then two deterministic epidemic models (the SI model and the Goffman–Newill model) are presented. As an example of a deterministic nonepidemic model, the Price model of knowledge growth is discussed. Sangwal's nucleation model for citation dynamics follows; it is the only deterministic model of citation dynamics considered here. The reason for this limited coverage is as follows. A citation may be considered a unit of importance of scientific information. This unit is small, and citations may arise more frequently than the larger units of scientific information (research publications). Finally, citations may arise quite irregularly. For these reasons, citation dynamics is treated mainly from the point of view of probability models. The presentation of deterministic models continues with a model of competition of ideas, which is important for the evolution of research structures and systems. Next, the reproduction transport equation model of the dynamics of scientific fields is discussed. The part devoted to deterministic models ends with a model of science as a component of the economic growth of a country.

The greater part of the chapter is devoted to probability models. This part begins with several general remarks on Poisson processes and their connection to the Yule and Waring distributions and to the GIGP distribution. Then a probability model of research publications based on the Yule stochastic process is described. After that, attention is focused on models connected to citations of research publications. These models describe the citation dynamics of a set of simultaneously appearing research publications and the citation behavior of sets containing subsets of publications published at the same time. The discussion is based on the Poisson distribution and on the mixed Poisson distribution, which will be related to the Yule distribution. Models for the aging of scientific information follow (aging of information is an important topic connected to the dynamics of citations of research publications). Two probability models of the aging of scientific information are considered: a model based on a death stochastic process and a model based on a nonstationary birth process. The latter model leads to the Waring distribution and to the negative binomial distribution. The Waring distribution is discussed in greater detail: the truncated Waring distribution and the multivariate Waring distribution are described. On the basis of the truncated Waring distribution, a model of brain drain in the case of massive migration through migration channels is mentioned. A description of a variational approach to research production and of two models of a production–citation process follows. Then the GIGP distribution model for bibliometric data is discussed, followed by a master equation model of scientific productivity. The chapter ends with a probability model for the importance of the human factor in science.

2 Deterministic Models Connected to Research Publications

2.1 Simple Models. Logistic Curve and Other Models of Growth

One may consider simple exponential or logistic models of the growth of a number of items. For the case of the exponential model, the assumption is that the growth is proportional to the number of existing items,

$$\begin{aligned} \frac{dN}{dt} = k N, \end{aligned}$$
(5.1)

where k is a parameter. The solution of (5.1) is \(N(t) = N_0 \exp (kt)\), where \(N_0\) is the number of available items at \(t=0\). In many cases, it is of interest to know when the initial number of items \(N_0\) will double. For the exponential model, this doubling time is \(t^* = \ln (2)/k\). The exponential model may be considered, e.g., an approximation of the initial increase in the number of research publications in a newly established research field (more details follow below).
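A minimal Python sketch of the exponential model (5.1) and of the doubling time \(t^* = \ln (2)/k\) follows. The growth rate k = 0.07 per year is a hypothetical value chosen only for illustration.

```python
import math

def exponential_growth(n0: float, k: float, t: float) -> float:
    """Solution N(t) = N0 exp(k t) of dN/dt = k N, eq. (5.1)."""
    return n0 * math.exp(k * t)

def doubling_time(k: float) -> float:
    """Time t* = ln(2)/k after which N(t*) = 2 N0 (requires k > 0)."""
    if k <= 0:
        raise ValueError("doubling requires a positive growth rate k")
    return math.log(2.0) / k

k = 0.07                     # hypothetical: 7% growth per year
t_star = doubling_time(k)
print(f"doubling time: {t_star:.2f} years")      # about 9.9 years
print(exponential_growth(100.0, k, t_star))       # approximately 200
```

Note that the doubling time does not depend on \(N_0\): any number of items doubles in the same time \(t^*\).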

If we consider a longer time interval, then the initial exponential increase of the number of items may cease. In this case, one may consider another model, the logistic model of growth:

$$\begin{aligned} \frac{dN}{dt} = kN(a - N), \end{aligned}$$
(5.2)

where k and a are (positive) parameters. The solution of the logistic equation (5.2) is

$$\begin{aligned} N = \frac{a}{1+ \left( \frac{a}{N_0} -1\right) \exp (-kat)}. \end{aligned}$$
(5.3)

This solution has a region of almost exponential growth (when \(N \ll a\)), a region of almost linear growth around \(N = a/2\), and a region of saturation around \(N \approx a\), where the remaining gap \(a - N\) decays almost exponentially.
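The closed form (5.3) can be checked against a direct numerical integration of (5.2). The sketch below uses a hand-written fourth-order Runge–Kutta step (chosen only to keep it dependency-free) and hypothetical parameters a = 1000, k = 0.0005, \(N_0 = 10\). Setting \(N = a/2\) in (5.3) gives the midpoint time \(t = \ln (a/N_0 - 1)/(ka)\), which is about 9.2 for these values.

```python
import math

def logistic_closed_form(t: float, n0: float, k: float, a: float) -> float:
    """Solution (5.3) of dN/dt = k N (a - N) with N(0) = n0."""
    return a / (1.0 + (a / n0 - 1.0) * math.exp(-k * a * t))

def logistic_rk4(t_end: float, n0: float, k: float, a: float, steps: int = 10_000) -> float:
    """Integrate (5.2) from 0 to t_end with the classical Runge-Kutta (RK4) scheme."""
    def rhs(n: float) -> float:
        return k * n * (a - n)

    h = t_end / steps
    n = n0
    for _ in range(steps):
        k1 = rhs(n)
        k2 = rhs(n + 0.5 * h * k1)
        k3 = rhs(n + 0.5 * h * k2)
        k4 = rhs(n + h * k3)
        n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n

# Hypothetical parameters: saturation level a = 1000 items, k = 0.0005, N0 = 10.
n0, k, a = 10.0, 0.0005, 1000.0
t_mid = math.log(a / n0 - 1.0) / (k * a)   # time at which N = a/2
for t in (5.0, t_mid, 20.0):
    print(f"t={t:6.2f}  closed form={logistic_closed_form(t, n0, k, a):8.3f}  "
          f"RK4={logistic_rk4(t, n0, k, a):8.3f}")
```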

Logistic curves are frequently applied to model a variety of processes, e.g., the growth of scientific publications [6–10]. To describe trajectories of growth or decline in socio-technical systems, one generally uses the following three-parameter logistic curve [11]:

$$\begin{aligned} x(t) = \frac{K}{1+ \exp [-\alpha t - \beta ]}, \end{aligned}$$
(5.4)

where the quantities are as follows:

  • x(t): number of units in the species or growing variable to study,

  • K: the asymptotic limit of growth,

  • \(\alpha \): growth rate, which specifies the “width” of the curve for x(t),

  • \(\beta \): specifies the time \(t_m\) at which the curve reaches the midpoint of the growth trajectory, \( x(t_m) = 0.5 \;K\); the exponent in (5.4) vanishes there, so \(t_m = -\beta /\alpha \).

The parameters K, \(\alpha \), and \(\beta \) are usually obtained by fitting the available data. It is well known that many cases of epidemic growth can be described by parts of an appropriate logistic curve. However, not every interaction scheme leads to logistic growth [12]. The evolution of systems in such regimes may be described by more complex curves, such as a combination of two or more simple three-parameter functions [11, 13].
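A short sketch of the three-parameter curve (5.4) with its midpoint \(t_m = -\beta /\alpha \). To make the "width" set by \(\alpha \) concrete, it also computes the time needed to go from 10% to 90% of K, which follows from (5.4) as \(\ln (81)/\alpha \); the helper name `rise_time_10_90` and the parameter values are hypothetical. In practice K, \(\alpha \), and \(\beta \) would come from a nonlinear least-squares fit (for example with SciPy's `curve_fit`), which is not shown here.

```python
import math

def logistic3(t: float, K: float, alpha: float, beta: float) -> float:
    """Three-parameter logistic curve (5.4): x(t) = K / (1 + exp(-alpha t - beta))."""
    return K / (1.0 + math.exp(-alpha * t - beta))

def midpoint_time(alpha: float, beta: float) -> float:
    """t_m with x(t_m) = K/2: the exponent -alpha t - beta vanishes at t_m = -beta/alpha."""
    return -beta / alpha

def rise_time_10_90(alpha: float) -> float:
    """Time needed to go from 10% to 90% of K, one measure of the curve's width."""
    return math.log(81.0) / alpha

# Hypothetical fitted values, for illustration only.
K, alpha, beta = 5000.0, 0.3, -6.0
t_m = midpoint_time(alpha, beta)
print(t_m, logistic3(t_m, K, alpha, beta))   # midpoint near t = 20, x close to K/2
print(rise_time_10_90(alpha))                # about 14.6 time units
```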

Let us consider in more detail the logistic growth of knowledge and the aging of scientific information. In this case, the logistic curve arises as a consequence of two processes: an increase in the amount of scientific information and the aging of scientific information. If scientific information only increased (without aging), the increase could be assumed proportional to the amount of available information,

$$\begin{aligned} \frac{dx}{dt} = \alpha x \rightarrow x = x_0 \exp (\alpha t), \end{aligned}$$
(5.5)

where \(\alpha \) is a coefficient (the assumption is that each element produces a new element with a constant intensity \(\alpha \)). This leads to exponential growth of scientific information. Such a situation can be observed in new areas of research in which the information is relatively new (and not yet aged). For more mature research areas, the coefficient \(\alpha \) depends on the amount of information x, \(\alpha = f(x)\), and decreases with the aging of the scientific information. A simple assumption is that \(\alpha \) decreases linearly with x, i.e., \(\alpha = a - bx\) with \(a, b > 0\). Then

$$\begin{aligned} \frac{dx}{dt} = (a - bx)x. \end{aligned}$$
(5.6)

Equation (5.6) is the logistic equation. Its solution is

$$\begin{aligned} x(t) = \frac{a}{b[1+ \sigma \exp (-at)]}, \end{aligned}$$
(5.7)

where \(\sigma \) is a coefficient that can be determined from the initial conditions. From (5.7), it follows that the speed of the increase of scientific information is

$$\begin{aligned} \mathrm{Eff}= \frac{dx}{dt} = \frac{\sigma a^2}{b} \frac{\exp (-at)}{\left[ 1 + \sigma \exp (-at)\right] ^2}. \end{aligned}$$
(5.8)

The quantity Eff can be considered a measure of the effectiveness of the scientific field. This effectiveness (i) increases while the scientific field is new; (ii) passes through a maximum at \(t=\ln (\sigma )/a\), where \(x = a/(2b)\) (the maximum “expectation” of the scientific field); (iii) tends to 0 as \(t \rightarrow \infty \) (the scientific field is exhausted).
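The effectiveness can be checked numerically. Differentiating (5.7) gives \(dx/dt = (\sigma a^2/b)\, e^{-at}/[1+\sigma e^{-at}]^2\), whose maximum lies where \(\sigma e^{-at} = 1\), i.e., at \(t = \ln (\sigma )/a\), where \(x = a/(2b)\) and \(\mathrm{Eff} = a^2/(4b)\). The sketch below, with hypothetical values a = 0.4, b = 0.001, \(\sigma = 50\), compares this analytic peak with a brute-force grid scan and verifies the derivative by a central finite difference.

```python
import math

def x_logistic(t: float, a: float, b: float, sigma: float) -> float:
    """Eq. (5.7): x(t) = a / (b [1 + sigma exp(-a t)])."""
    return a / (b * (1.0 + sigma * math.exp(-a * t)))

def eff(t: float, a: float, b: float, sigma: float) -> float:
    """dx/dt of (5.7): (sigma a^2 / b) exp(-a t) / [1 + sigma exp(-a t)]^2."""
    e = math.exp(-a * t)
    return sigma * a * a / b * e / (1.0 + sigma * e) ** 2

a, b, sigma = 0.4, 0.001, 50.0             # hypothetical parameter values
t_peak = math.log(sigma) / a               # analytic maximum of Eff
grid = [i * 0.001 for i in range(40_001)]  # t from 0 to 40
t_scan = max(grid, key=lambda t: eff(t, a, b, sigma))

h = 1e-6                                   # central finite difference as a cross-check
fd = (x_logistic(t_peak + h, a, b, sigma) - x_logistic(t_peak - h, a, b, sigma)) / (2 * h)
print(t_peak, t_scan)                      # both close to 9.78
print(fd, eff(t_peak, a, b, sigma), a * a / (4 * b))  # all close to 40
```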

In general, the growth can be described by the relationship

$$\begin{aligned} \frac{dx}{dt} = \alpha (x) x. \end{aligned}$$
(5.9)

If we are interested in the growth around some value \(x=x_0\), then we can represent \(\alpha (x)\) by a Taylor series,

$$\begin{aligned} \alpha (x) = \alpha (x_0) + \frac{1}{1!} \frac{d \alpha }{d x}\mid _{x=x_0}(x-x_0) + \frac{1}{2!} \frac{d^2 \alpha }{d x^2}\mid _{x=x_0}(x-x_0)^2 + \nonumber \\ \frac{1}{3!} \frac{d^3 \alpha }{d x^3}\mid _{x=x_0}(x-x_0)^3 + \dots . \end{aligned}$$
(5.10)

If only the first term of (5.10) is used, the local growth around \(x=x_0\) is exponential. If the first two terms are needed, the local growth can be logistic. If three or more terms are needed, the local growth is more complicated.

Logistic growth is not the only possible growth connected to the evolution of scientific information. The study of Menard [14] revealed three types of research fields with respect to the growth of the total number of publications in the field: stable fields (linear or exponential growth at small rates); exponentially growing fields (rapidly growing fields); and cyclic fields, in which periods of stable growth and periods of fast growth alternate [15, 16]. Below are the mathematical relationships for several kinds of growth functions that may interest readers who encounter growth phenomena in their research:

  1.

    Gompertz growth function [10]

    $$\begin{aligned} x(t) = D A^{B^t}, \end{aligned}$$
    (5.11)

    where \(D>0\) and \(\log (A) \log (B) >0\).

  2.

    Ware growth function [17]

    $$\begin{aligned} x(t) = \delta (1 - \varphi ^{-t}), \end{aligned}$$
    (5.12)

    where \(\delta >0\) and the constant \(\varphi \) is greater than 1.

  3.

    Power law growth function [16]

    $$\begin{aligned} x(t) = a + b t^{\gamma }, \end{aligned}$$
    (5.13)

    where \(a >0\) and \(b>0\). For \(0< \gamma <1\), the growth is concave and without an upper limit; for \(\gamma =1\), the growth is linear; for \(\gamma >1\), the growth is convex.
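The three growth functions (5.11)–(5.13) can be written directly as Python functions. The parameter values in the sketch are hypothetical; for the Gompertz curve we take \(0 < A, B < 1\), which satisfies \(\log (A)\log (B) > 0\) and gives a curve that rises from \(DA\) towards the ceiling D.

```python
def gompertz(t, D, A, B):
    """Eq. (5.11): x(t) = D * A**(B**t), with D > 0 and log(A) log(B) > 0."""
    return D * A ** (B ** t)

def ware(t, delta, phi):
    """Eq. (5.12): x(t) = delta * (1 - phi**(-t)), with delta > 0 and phi > 1."""
    return delta * (1.0 - phi ** (-t))

def power_law(t, a, b, gamma):
    """Eq. (5.13): x(t) = a + b * t**gamma, with a > 0 and b > 0."""
    return a + b * t ** gamma

# Hypothetical parameters, for illustration only.
ts = [0, 5, 10, 20, 39]
print([round(gompertz(t, 1000.0, 0.05, 0.85), 1) for t in ts])   # rises towards D = 1000
print([round(ware(t, 1000.0, 1.1), 1) for t in ts])              # rises towards delta = 1000
print([round(power_law(t, 10.0, 3.0, 0.5), 1) for t in ts])      # concave, unbounded (gamma < 1)
```

Both the Gompertz and the Ware curves saturate, while the power law with \(0<\gamma <1\) keeps growing, only ever more slowly.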

2.2 Epidemic Models

Below, we discuss two epidemic models of the diffusion of knowledge through research publications. Epidemic models were originally used in population dynamics [18–24], and for many years most models of population dynamics were of interest only to biologists [25–30]. Today, these models are applied in many more areas of science [26–40]. Epidemic models are of great interest for research on scientific systems, too, because some stages of the processes by which ideas spread within a population (e.g., of scientists) have features like those of the spread of epidemics [41–43].

Epidemic models are a subclass of the more general class of Lotka–Volterra models [44–49], which are used in research on systems in biological population dynamics, social dynamics, and economics, as well as for modeling processes connected to the spread of knowledge, ideas, and innovations [50–53].

The central concept of the epidemic models is the concept that scientific results spread to scientific communities by an epidemic diffusion process whereby more and more members of the scientific community are “infected” by the new scientific ideas and results. An important channel for spreading of this “infection” is research publications.

2.3 Change in the Number of Publications in a Research Field. SI (Susceptibles–Infectives) Model of Change in The Number of Researchers Working in a Field

Three basic classes of populations are important in epidemic research [54]:

  • The susceptibles S, who can become infectives on coming in contact with infectious material (the infectious material in our case is the scientific ideas).

  • The infectives I, who host the infectious material.

  • The recovered R, who are removed from the epidemic.

Accordingly, one class of epidemic models is called the SIR model (susceptibles–infectives–recovered (removed)). Nowakowska [55] discussed several discrete epidemic models for predicting changes in the number of publications in a given scientific field. The main assumption of these models is that the number of publications in the next period of time (say, one year) depends on the number of publications that have recently appeared and on the degree to which the subject has been exhausted. The number of publications is expected to behave as follows: the numbers of publications appearing in successive periods should first increase, then reach a maximum, and then decrease as the problem becomes more and more exhausted. A mathematical relationship that reflects such behavior was proposed by Daley [56]:

$$\begin{aligned} p_{t+1} = c_t p_t \left( N - \sum \limits _{i=1}^t p_i \right) , \end{aligned}$$
(5.14)

where

  • \(p_t\): number of publications written in the period t;

  • N: number of publications that have to appear in order to exhaust the problems in the research field.

  • \(c_t\): coefficient that can be connected to the number of researchers \(x_t\) working in the field: \(c_t = 1 - (1-d)^{x_t}\), where d is a parameter.
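Relationship (5.14) is easy to iterate. In the Python sketch below, the number of researchers \(x_t\) is supplied as a function; the example keeps it constant at 10, which, like the values of N, d, and \(p_1\), is a hypothetical choice. The guard against negative counts is our own addition, since (5.14) itself does not prevent \(N - \sum p_i\) from becoming negative.

```python
def daley_publications(p1: float, N: float, d: float, researchers, periods: int) -> list:
    """Iterate eq. (5.14): p_{t+1} = c_t p_t (N - sum_{i=1}^{t} p_i).

    `researchers(t)` returns x_t, the number of active researchers in period t,
    and c_t = 1 - (1 - d)**x_t as in the text.
    """
    p = [p1]          # p[0] holds p_1
    total = p1        # running sum p_1 + ... + p_t
    for t in range(1, periods):
        c_t = 1.0 - (1.0 - d) ** researchers(t)
        p_next = max(c_t * p[-1] * (N - total), 0.0)  # guard against negative counts
        p.append(p_next)
        total += p_next
    return p

# Hypothetical example: N = 1000 publications exhaust the field,
# 10 researchers stay active in every period, d = 0.0002.
pubs = daley_publications(p1=5.0, N=1000.0, d=0.0002, researchers=lambda t: 10, periods=20)
print([round(x, 1) for x in pubs])
```

With these values, the yearly output rises for several periods, peaks, and then falls off as the cumulative count approaches the exhaustion level, which is the qualitative behavior described above.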

The epidemic part of the model is connected to the researchers who produce publications in the corresponding research field, and the number of these researchers may change. Some factors contribute to a decrease in the number of researchers (they retire or lose interest in the corresponding research problems). Other factors contribute to an increase in the number of authors in the field: new authors may begin to write publications (young researchers who are beginning their research careers, or researchers who have become interested in the problems of the field). We shall treat this increase in the number of authors as infection and the entire process as an epidemic.

Let us assume that at a certain moment t, the epidemic’s state is (\(x_t, y_t\)), where

  • \(x_t\) is the number of infectives: authors who write publications in the corresponding scientific field;

  • \(y_t\) is the number of susceptibles.

Then:

  1.

    for a sufficiently short time interval \(\varDelta t\), one may expect that the number of infectives \(x_{t+\varDelta t}\) will be equal to \(x_t - a x_t \varDelta t + b x_t y_t \varDelta t\),

  2.

    while the number of susceptibles \(y_{t + \varDelta t}\) will be equal to \(y_t - b x_t y_t \varDelta t\) (a and b are suitable constants).

Here \(a x_t \varDelta t\) is the expected number of individuals who either “die” or “recover” during the interval (\(t, t + \varDelta t\)), and \(b x_t y_t \varDelta t\) is the expected number of new infections. The equations of this model are

$$\begin{aligned} x_{t +\varDelta t}= & {} x_t -a x_t \varDelta t + b x_t y_t \varDelta t, \nonumber \\ y_{t+\varDelta t}= & {} y_t - b x_t y_t \varDelta t. \end{aligned}$$
(5.15)

The coefficients a and b may depend on the attractiveness of the research field, on the degree to which it is exhausted, etc. Once appropriate relationships for a and b are specified, one may investigate numerically the dynamics of the infectives x and the susceptibles y, i.e., the dynamics of the researchers producing publications in the corresponding research field.
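As a minimal numerical illustration, the Python sketch below iterates (5.15) with constant a and b (an assumption; the text allows them to depend on the state of the field). From (5.15), the increment of x is \(x_t(-a + b y_t)\varDelta t\), so the number of active authors grows only while \(y_t > a/b\) and declines afterwards. All numerical values are hypothetical.

```python
def si_model(x0: float, y0: float, a: float, b: float, dt: float, steps: int):
    """Iterate the discrete scheme (5.15) with constant coefficients a and b."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        new_infections = b * x * y * dt          # expected number of new authors
        x, y = x - a * x * dt + new_infections, y - new_infections
        xs.append(x)
        ys.append(y)
    return xs, ys

# Hypothetical values: 5 active authors, 200 potential authors, a/b = 50.
dt = 0.1
xs, ys = si_model(x0=5.0, y0=200.0, a=0.1, b=0.002, dt=dt, steps=600)
peak = max(range(len(xs)), key=xs.__getitem__)
print(f"active authors peak at t = {peak * dt:.1f} with about {xs[peak]:.0f} authors")
print(f"susceptibles left at the end: {ys[-1]:.1f}")
```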

2.4 Goffman–Newill Continuous Model for the Dynamics of Populations of Scientists and Publications

The model discussed above is an example of a discrete model. Now let us consider a continuous epidemic model connected with the dynamics of researchers and publications. Such a model is the Goffman–Newill model.

The Goffman–Newill model of intellectual epidemics is based on the Reed–Frost epidemic model [57–59], which was developed during the 1930s by Lowell Reed and Wade Frost of the Johns Hopkins University. In the Reed–Frost model, one assumes a fixed population of size N. At each time, there is a certain number of cases of disease, C, and a certain number of susceptibles, S. Each case is assumed to be infectious for a fixed length of time, and the latent period is ignored; individuals who recover are assumed to be immune to further infection. During the infectious period of each case, susceptibles may be infected, and the disease may propagate further. The Goffman–Newill model [5, 60, 61] exploits the idea that the spread of scientific ideas within a population of scientists can be studied on the basis of the publications of the members of that population. The main process in the model is the transfer of infectious material (ideas) between humans by means of an intermediate host (a written article).
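The Reed–Frost model is described above only in words. In its usual expected-value (deterministic) form, which we assume here, the expected number of new cases in the next generation is \(C_{t+1} = S_t[1 - (1-p)^{C_t}]\) and \(S_{t+1} = S_t - C_{t+1}\), where p is the probability that one case infects one given susceptible during its infectious period. Note the similarity to the coefficient \(c_t = 1 - (1-d)^{x_t}\) in Daley's model (5.14). A minimal Python sketch with hypothetical values:

```python
def reed_frost_expected(N: float, C0: float, p: float, generations: int):
    """Expected-value form of the Reed-Frost chain-binomial model.

    p is the probability that one case infects one given susceptible during
    its (fixed) infectious period.  A susceptible escapes all C current cases
    with probability (1 - p)**C, so the expected number of new cases is
    S * (1 - (1 - p)**C).  Cases are infectious for exactly one generation and
    are then immune (removed).
    """
    S, C = N - C0, C0
    history = [(S, C)]
    for _ in range(generations):
        C = S * (1.0 - (1.0 - p) ** C)
        S -= C
        history.append((S, C))
    return history

# Hypothetical population of 1000 with one initial case and p = 0.002.
hist = reed_frost_expected(N=1000.0, C0=1.0, p=0.002, generations=30)
for g, (S, C) in enumerate(hist[:8]):
    print(g, round(S, 1), round(C, 1))
```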

Let F be a scientific field and SF a subfield of F. We use the following notation: \(N_0\) is the number of scientists writing papers in the field F at \(t_0\), and \(I_0\) is the number of scientists writing papers in SF at \(t_0\) (the number of infectives). Thus \(S_0 = N_0 - I_0\) is the number of susceptibles. There is no removal (i.e., no scientists move out of the corresponding population) at \(t_0\), but there is removal R(t) at later times t. In addition, \(N_0'\) is the number of papers produced in F at \(t_0\), and \(I_0'\) is the number of papers produced in SF at this time.

The process of intellectual infection takes place as follows:

  1.

    A member of F is infected by a paper from \(I'\);

  2.

    After some latency period, this infected member produces “infected” papers in \(N'\), i.e., the infected member produces a paper in the subfield SF citing a paper from \(I'\);

  3.

    These “infected” papers may infect other scientists from F and its subfields, so that the intellectual infection spreads from SF to the other subfields of F.

Let \(\beta \) be the rate at which susceptibles from class S become “intellectually infected” (i.e., move to class I), and let \(\beta '\) be the rate at which papers in SF are cited by members of F who are producing papers in SF. As the infection process develops, some susceptibles and infectives are removed: some scientists are no longer active, and some papers are no longer cited. Let \(\gamma \) and \(\gamma '\) be the rates of removal of infectives from the populations I and \(I'\), respectively, and let \(\delta \) and \(\delta '\) be the rates of removal from the populations of susceptibles S and \(S'\). Moreover, there can be a supply of infectives and susceptibles in F and SF. Let the rates of introduction of new susceptibles be \(\mu \) and \(\mu '\) (the rates at which new authors and new papers are introduced in F), and let the rates of introduction of new infectives be \(\upsilon \) and \(\upsilon '\) (the rates at which new authors and new papers are introduced in SF). In addition, within a short interval of time, a susceptible can remain susceptible, become an infective, or be removed; an infective can remain an infective or be removed; the removed remain removed; and immunes remain immune and do not return to the population of susceptibles.

Let us also impose the condition that the populations are homogeneously mixed. Then the system of model equations is

$$\begin{aligned} \frac{dS}{dt}= & {} - \beta S I' - \delta S + \mu ; \quad \frac{dI}{dt} = \beta S I' - \gamma I + \upsilon \end{aligned}$$
(5.16)
$$\begin{aligned} \frac{dR}{dt}= & {} \gamma I + \delta S; \quad \frac{dS'}{dt} = - \beta ' S' I - \delta ' S' + \mu ' \end{aligned}$$
(5.17)
$$\begin{aligned} \frac{dI'}{dt}= & {} \beta ' S' I - \gamma ' I' + \upsilon '; \quad \frac{dR'}{dt} = \gamma ' I' + \delta ' S'. \end{aligned}$$
(5.18)

The conditions for development of an epidemic are as follows:

  1.

    If, as an initial condition at \(t_0\), a single infective is introduced into the populations \(N_0\) and \(N'_0\), then for an epidemic to develop, the number of infectives must increase in both populations.

  2.

    Thus, for \( \rho = \frac{\gamma - \upsilon }{\beta }\) and \( \rho ' = \frac{\gamma ' - \upsilon '}{\beta '}, \) the threshold for the epidemic arises from the conditions \(\beta S I' > \gamma I - \upsilon \) and \(\beta ' S' I > \gamma ' I' - \upsilon '\), so that the threshold is

    $$\begin{aligned} S_0 S'_0 > \rho \rho '. \end{aligned}$$
    (5.19)
  3.

    The development of an epidemic is given by the equation for \(\frac{dI}{dt}\).

  4.

    The peaks of the epidemics occur at time points where \(\frac{d^2 I}{dt^2} =0\), while the epidemic’s size is given by \(I(t \rightarrow \infty )\).
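The system (5.16)–(5.18) can be explored numerically. The Python sketch below integrates it with a hand-written fourth-order Runge–Kutta step and checks the threshold (5.19); all rates and initial values are hypothetical. The code uses \(\delta '\) in the equation for \(S'\), matching the removal term \(\delta ' S'\) in \(dR'/dt\). With \(\upsilon = \upsilon ' = 0\), the condition \(S_0 S'_0 > \rho \rho '\) is exactly the condition under which the linearized (I, I′) equations grow: their matrix \(\begin{pmatrix} -\gamma & \beta S_0 \\ \beta ' S'_0 & -\gamma ' \end{pmatrix}\) has a negative trace, and its determinant \(\gamma \gamma ' - \beta \beta ' S_0 S'_0\) is negative (giving one positive eigenvalue) precisely when \(S_0 S'_0 > \gamma \gamma '/(\beta \beta ')\).

```python
def rhs(state, p):
    """Right-hand sides of (5.16)-(5.18); state = (S, I, R, S', I', R')."""
    S, I, R, Sp, Ip, Rp = state
    new_authors = p["beta"] * S * Ip      # susceptible authors infected by SF papers
    new_papers = p["beta_p"] * Sp * I     # papers "infected" by SF authors
    return (
        -new_authors - p["delta"] * S + p["mu"],
        new_authors - p["gamma"] * I + p["upsilon"],
        p["gamma"] * I + p["delta"] * S,
        -new_papers - p["delta_p"] * Sp + p["mu_p"],
        new_papers - p["gamma_p"] * Ip + p["upsilon_p"],
        p["gamma_p"] * Ip + p["delta_p"] * Sp,
    )

def rk4_step(state, p, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, c):
        return tuple(s + c * h * ki for s, ki in zip(state, k))
    k1 = rhs(state, p)
    k2 = rhs(shifted(k1, 0.5), p)
    k3 = rhs(shifted(k2, 0.5), p)
    k4 = rhs(shifted(k3, 1.0), p)
    return tuple(s + h * (q1 + 2 * q2 + 2 * q3 + q4) / 6.0
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))

def above_threshold(S0, Sp0, p):
    """Threshold condition (5.19): S_0 S'_0 > rho rho'."""
    rho = (p["gamma"] - p["upsilon"]) / p["beta"]
    rho_p = (p["gamma_p"] - p["upsilon_p"]) / p["beta_p"]
    return S0 * Sp0 > rho * rho_p

# Hypothetical rates; upsilon = upsilon' = 0 (no external supply of infectives).
params = dict(beta=0.001, beta_p=0.002, gamma=0.1, gamma_p=0.2,
              delta=0.01, delta_p=0.01, mu=1.0, mu_p=2.0,
              upsilon=0.0, upsilon_p=0.0)
state = (500.0, 1.0, 0.0, 300.0, 1.0, 0.0)   # one infective author, one infective paper
print("above threshold:", above_threshold(state[0], state[3], params))

h, trajectory = 0.01, [state]
for _ in range(3000):                          # integrate up to t = 30
    state = rk4_step(state, params, h)
    trajectory.append(state)
I_values = [s[1] for s in trajectory]
print("largest number of infective authors:", round(max(I_values), 1))
```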

The Goffman–Newill model stimulated much research on modeling processes in science with models from population dynamics and epidemiology. Let us mention here just the models of the growth of mathematics specialties [62] and of the growth of papers in a specialty [63–67]. Additional categories of researchers can be added to SIR-type models. One example is the addition of a class of researchers who have been exposed to the corresponding scientific ideas; in this way, one obtains a class of epidemic SEIR models of research production [68, 69].

2.5 Price Model of Knowledge Growth. Cycles of Growth of Knowledge

An example of a nonepidemic model of knowledge growth is the model of Price [70, 71]. The model is based on the following assumptions:

  1.

    The growth is measured by the number of important publications appearing at a given time.

  2.

    The growth has a continuous character, and a finite time period \(T = \mathrm{const}\) is needed to build up a result of fundamental character.

  3.

    The interactions between various scientific fields are neglected.

Assume, in addition, that the number of scientists publishing results in this field is constant. Then the rate of scientific growth (the growth of the number of publications x) at time t is proportional to the number of important publications at time \(t - T\), where T is the time period required to build up a fundamental result. The model equation is

$$\begin{aligned} \frac{dx}{dt} = \alpha x(t - T), \end{aligned}$$
(5.20)

where \(\alpha \) is a constant, and the initial condition \(x(t) = \phi (t)\) is defined on the interval \([-T,0]\).

Often, the population of researchers varies. Then, to describe the evolution of the average number of papers per researcher, the linear right-hand side of (5.20) is replaced by the following nonlinear model:

$$\begin{aligned} \frac{dx}{dt}= f(x(t-T),x(t)), \end{aligned}$$
(5.21)

where f is a homogeneous function of degree one. The simplest form of such a function is a linear function. Let us assume that the population of researchers L grows at the constant rate \(n =\frac{1}{L} \frac{dL}{dt}\) and let \(z = x/L\) be the mean number of papers written by a researcher. Then the evolution of the number of papers written by a researcher has the form

$$\begin{aligned} \frac{dz}{dt} = \alpha z(t-T) - n z(t). \end{aligned}$$
(5.22)

We note the following:

  1.

    If \(n=0\) and \(T=0\), the Price model of exponential growth is recovered.

  2.

    Equation (5.22) is linear, but cyclic behavior may appear because of the feedback between the delayed and nondelayed terms.
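Equation (5.22) is a delay differential equation, so it needs a history \(\phi \) on \([-T, 0]\) rather than a single initial value. The Python sketch below uses the simplest explicit Euler scheme (our choice, not a method from the text) with a hypothetical constant history; it first checks the case \(n = 0\), \(T = 0\), where the solution should be close to \(e^t\).

```python
def price_dde(alpha: float, n: float, T: float, phi, t_end: float, h: float = 1e-3) -> list:
    """Explicit Euler scheme for eq. (5.22): dz/dt = alpha z(t - T) - n z(t).

    phi(t) supplies the history on [-T, 0]; the delay is rounded to m = round(T/h)
    steps, so T should be (close to) a multiple of h.  Returns z at t = 0, h, 2h, ...
    """
    m = round(T / h)
    z = [phi((k - m) * h) for k in range(m + 1)]   # z[k] approximates z((k - m) h)
    for _ in range(round(t_end / h)):
        k = len(z) - 1                             # index of the current time
        z.append(z[k] + h * (alpha * z[k - m] - n * z[k]))
    return z[m:]

# T = 0 and n = 0: exponential growth, so z(1) should be close to e = 2.718...
print(price_dde(alpha=1.0, n=0.0, T=0.0, phi=lambda t: 1.0, t_end=1.0)[-1])
# Hypothetical delayed case: T = 2 and a growing researcher population (n = 0.05).
z = price_dde(alpha=0.3, n=0.05, T=2.0, phi=lambda t: 1.0, t_end=30.0)
print(z[::5000])   # values at t = 0, 5, 10, ..., 30
```

For more accurate work, a dedicated delay-equation solver or a higher-order scheme would be preferable; Euler is used here only to keep the delayed term easy to see.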

The Price model was criticized on the following points: the quality of research is omitted, and many scientific products that seem new are not really new; creativity and innovation are confused; and creative papers with new ideas and results are given the same importance as trivial duplications. Price answered by formulating the hypothesis that one may study only the growth of important discoveries, inventions, and scientific laws rather than the totality of important and trivial things. Such growth would then follow the same pattern as that described above, but much more slowly.

3 A Deterministic Model Connected to Dynamics of Citations

Sangwal [72–75] proposed a model of the growth of citations of a scientist based on the progressive nucleation mechanism known from chemistry [76]. In chemistry, this mechanism describes the simultaneous nucleation of nuclei and their growth into crystallites of visible size. If the initial volume of the crystallizing phase is V and the crystallized volume at time t is \(V_c(t)\), then the ratio \(V_c/V\) satisfies the following relationship:

$$\begin{aligned} \alpha (t) = \frac{V_c(t)}{V} = \left\{ 1 - \exp \left[ - \left( \frac{t}{ \varTheta } \right) ^q\right] \right\} , \end{aligned}$$
(5.23)

where the relationships for the time constant \(\varTheta \) and for the exponent q are

$$ q = 1 + \nu d; \ \ \ \varTheta = \left( \frac{q}{kG^{q-1}J_s} \right) ^{1/q}, $$

and the parameters are as follows:

  • \(\nu >0\): a constant;

  • d: dimension of the growing nucleus (can be 1, 2, 3);

  • k: shape factor of the nucleus (\(k=4\pi /3\) for a spherical nucleus);

  • \(G = \frac{r^{1/\nu }}{t}\);

  • r: radius of the growing nucleus;

  • \(J_s\): rate of stationary nucleation.

When \(k J_s = G\), then \(\varTheta = \frac{q^{1/q}}{k J_s}\), which will be the case of interest for us. In this case, the nuclear radius grows in time as \(r(t) \propto t^\nu \).
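The simplification of \(\varTheta \) can be verified numerically: substituting \(G = kJ_s\) gives \(k G^{q-1} J_s = (kJ_s)^q\), hence \(\varTheta = q^{1/q}/(kJ_s)\). The sketch below takes a spherical nucleus (\(k = 4\pi /3\)) and hypothetical values \(\nu = 0.5\), d = 3, \(J_s = 0.05\).

```python
import math

def nucleation_exponent(nu: float, d: int) -> float:
    """q = 1 + nu d, where d is the dimension of the growing nucleus."""
    return 1.0 + nu * d

def time_constant(q: float, k: float, G: float, J_s: float) -> float:
    """General case: Theta = (q / (k G^(q-1) J_s))^(1/q)."""
    return (q / (k * G ** (q - 1.0) * J_s)) ** (1.0 / q)

def time_constant_special(q: float, k: float, J_s: float) -> float:
    """Special case k J_s = G: Theta = q^(1/q) / (k J_s)."""
    return q ** (1.0 / q) / (k * J_s)

k = 4.0 * math.pi / 3.0                  # shape factor of a spherical nucleus
J_s = 0.05                               # hypothetical stationary nucleation rate
q = nucleation_exponent(nu=0.5, d=3)     # q = 2.5
theta_general = time_constant(q, k, G=k * J_s, J_s=J_s)
theta_special = time_constant_special(q, k, J_s)
print(q, theta_general, theta_special)   # the two Theta values agree up to rounding
```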

The process of nucleation can also be used to describe the growth of citations of a paper written by a scientist. In this case,

$$\begin{aligned} \alpha (t) = \frac{C(t)}{C_{max}} = \left\{ 1 - \exp \left[ - \left( \frac{t}{ \varTheta } \right) ^q\right] \right\} , \end{aligned}$$
(5.24)

where \(C_{max}\) is the maximum number of citations that a paper can receive, and C(t) is the cumulative number of citations of the paper up to time t. The other parameters are defined as above (recall that \(\varTheta = \frac{q^{1/q}}{k J_s}\)). The nucleation model can be transferred to the description of the accumulation of citations of a paper if the following conditions are met:

  • Citations received by a paper and the paper earning these citations compose a closed system in which the process of occurrence of citations is stationary.

  • Citations of a paper continue to accumulate over time, and their number finally approaches a constant value \(C_{max}\), which is the maximum number of citations received by the paper at time T.

  • The cumulative number of citations C(t) of the paper at time t is determined by the maximum number of citations \(C_{max}\), a time constant \(\varTheta \), and an exponent q. The citation patterns of different papers of an author are characterized by different values of \(C_{max}\), \(\varTheta \), and q for each paper.

If a researcher has authored n papers, then the cumulative fraction \(\alpha _s(t)\) of the citations of these papers is

$$\begin{aligned} \alpha _s(t) = \sum \limits _{i=1}^n \alpha _i(t). \end{aligned}$$
(5.25)

If we assume that the researcher publishes papers at equal time intervals \(\varDelta T\), then

$$\begin{aligned} \alpha _s(t) = \sum \limits _{i=1}^n \alpha _i[t-(i-1)\varDelta T] = \sum \limits _{i=1}^n \left\{ 1 - \exp \left[ - \left( \frac{t-(i-1) \varDelta T}{\varTheta _i} \right) ^{q_i}\right] \right\} . \end{aligned}$$
(5.26)

One can fit the model parameters to the data of the researcher whose production is evaluated. In most cases, the fit describes the accumulation of citations very well [75].
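The single-paper curve (5.24) and the sum over a researcher's papers can be evaluated as follows. In this sketch we sum from i = 1 to n, give paper i its own \(\varTheta _i\) and exponent \(q_i\), and treat a paper's contribution as zero before its publication date (t − (i − 1)ΔT ≤ 0); these conventions, and all numbers, are assumptions for illustration. A useful check is that at \(t = \varTheta \) a single paper has reached \(1 - 1/e \approx 0.632\) of \(C_{max}\), whatever the value of q. Fitting \(\varTheta _i\) and \(q_i\) to real citation counts would require a nonlinear least-squares routine (e.g., SciPy's `curve_fit`), which is not shown.

```python
import math

def citation_fraction(t: float, theta: float, q: float) -> float:
    """Eq. (5.24): C(t)/C_max = 1 - exp[-(t/Theta)^q] for t > 0, and 0 otherwise."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / theta) ** q))

def researcher_fraction(t: float, dT: float, thetas, qs) -> float:
    """Sum of the fractions of n papers, paper i (i = 1..n) published at (i - 1) dT."""
    return sum(
        citation_fraction(t - (i - 1) * dT, theta, q)
        for i, (theta, q) in enumerate(zip(thetas, qs), start=1)
    )

# Hypothetical researcher with three papers published two years apart.
thetas, qs, dT = [4.0, 6.0, 3.5], [1.8, 2.2, 1.5], 2.0
for t in (0, 2, 5, 10, 20, 40):
    print(t, round(researcher_fraction(t, dT, thetas, qs), 3))   # tends to n = 3
```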

4 Deterministic Models Connected to Research Dynamics

4.1 Continuous Model of Competition Between Systems of Ideas

Ideas can diffuse not only among scientists in one organization but also in space (e.g., from scientists in one country to scientists in other countries). Thus one may include spatial variables in models describing the diffusion of ideas. Such models can be of great interest in periods of globalization of economies, knowledge, and technology [77–82]. Below, we describe a model closely connected to the space–time models of the migration of populations [83, 84].

The diffusion of ideas is often accompanied by competition between systems of ideas. Let a population of N individuals occupy a two-dimensional plane. We assume that:

  • there exists a set of ideas \(P=\{P_{0},P_{1}, \dots , P_{n} \}\);

  • \(N_i\) members of the population are followers of the set \(P_i\) of ideas;

  • the \(N_0\) members of the class \(P_{0}\) do not support any set of ideas.

In such a way, the population is divided into \(n+1\) subpopulations of followers of different sets of ideas, and \(N = N_{0} + N_{1} + \dots + N_{n}\). Let a small region \(\varDelta S = \varDelta x \varDelta y\) be selected in the plane. In this region, there are \(\varDelta N_{i}\) individuals holding the ith set of ideas, \(i=0,1,\dots ,n\). If \(\varDelta S\) is sufficiently small, the density of the ith population can be defined as \(\rho _{i} (x, y, t) = \frac{\varDelta N_{i}}{\varDelta S}\). Further, we assume that members of the ith population are capable of moving through the borders of the area \(\varDelta S\). Let \(\mathbf {j}_{i}(x,y,t)\) be the current of this movement. The total change in the number of members of the ith population is

$$\begin{aligned} \frac{\partial \rho _{i}}{\partial t} + \mathrm{div} \mathbf {j}_{i} = C_{i}, \end{aligned}$$
(5.27)

where the changes are summarized by the function \(C_{i}(x,y,t)\).

The first term in (5.27) describes the net rate of increase of the density of the ith population. The second term describes the net rate at which members leave the area across its borders (i.e., the negative of the net rate of immigration into the area). The right-hand side of (5.27) describes the net rate of increase exclusive of migration. The quantities \(\mathbf {j}_{i}\) and \(C_{i}\) are specified as follows. The current \(\mathbf {j}_{i}\) is assumed to have two parts: a nondiffusive part \(\mathbf {j}_{i}^{(1)}\) and a diffusive part \(\mathbf {j}_{i}^{(2)}\) that has the general form of linear multicomponent diffusion [77] (\(D_{ik}\) are the diffusion coefficients):

$$\begin{aligned} \mathbf {j}_{i} = \mathbf {j}_{i}^{(1)} + \mathbf {j}_{i}^{(2)} = \mathbf {j}_{i}^{(1)} - \sum _{k=0}^{n} D_{ik} (\rho _{i}, \rho _{k},x, y, t) \nabla \rho _{k}. \end{aligned}$$
(5.28)
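As a minimal illustration of (5.27) and (5.28), the sketch below makes several simplifying assumptions that are not part of the model in the text: one spatial dimension, a single population, no nondiffusive current (\(\mathbf {j}^{(1)} = 0\)), and a constant diffusion coefficient D. It then performs explicit finite-difference steps of \(\partial \rho /\partial t = D\, \partial ^2\rho /\partial x^2 + C\) with zero-flux borders. With C = 0, the total number of followers is conserved while the initial concentration spreads out, which the example checks.

```python
def diffusion_step(rho, D, dx, dt, source=None):
    """One explicit step of d(rho)/dt = D d^2(rho)/dx^2 + C on a 1-D grid.

    rho: densities at the grid points; source: optional list of C values.
    Zero-flux borders are imposed by mirroring the edge values (ghost cells).
    The explicit scheme is stable only if D*dt/dx**2 <= 1/2.
    """
    r = D * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable time step: need D*dt/dx**2 <= 0.5")
    n = len(rho)
    new = []
    for i in range(n):
        left = rho[i - 1] if i > 0 else rho[i]
        right = rho[i + 1] if i < n - 1 else rho[i]
        c = source[i] if source is not None else 0.0
        new.append(rho[i] + r * (left - 2.0 * rho[i] + right) + dt * c)
    return new

# Hypothetical start: 100 followers of one set of ideas concentrated in one cell.
rho0 = [0.0] * 50
rho0[25] = 100.0
rho = rho0
for _ in range(500):
    rho = diffusion_step(rho, D=1.0, dx=1.0, dt=0.25)
print(round(sum(rho0), 6), round(sum(rho), 6), round(max(rho), 3))
```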

A further assumption is that some of the followers of the set of ideas \(P_{i}\) are capable of changing to another set of ideas, e.g., they can change \(P_{i}\) for \(P_{j}\). It can be assumed that the following processes can occur with respect to the members of the subpopulations:

  • Deaths: described by a term \(r_{i} \rho _{i}\); that is, we assume that the number of deaths in the ith population is proportional to its population density. In general, \(r_{i}=r_{i}(\rho _{\nu }, x, y, t; p_{\mu })\), where \(\rho _{\nu }\) stands for (\(\rho _{0}, \rho _{1}, \dots , \rho _{n}\)) and \(p_{\mu }\) stands for \((p_{1},\dots , p_{M})\), a set of parameters of the environment.

  • Noncontact conversion: this class includes changes from \(P_j\) to \(P_i\) that do not result from interpersonal contact between members of the populations. A reason for noncontact conversion can be the existence of mass communication media (scientific books, the influence of mass media, etc.). For the ith population, the change in the number of members by this kind of conversion is \(\sum _{j=0}^{n} f_{ij} \rho _{j}\), \(f_{ii}=0\). In general, \(f_{ij}=f_{ij}(\rho _{\nu }, x, y, t; p_{\mu })\).

  • Contact conversion: this happens through interpersonal contacts among the members of the population. Such contacts can occur in groups of two members (binary contacts), three members (ternary contacts), four members, etc. As a result of the contacts, members of each population can change their sets of ideas. For binary contacts, let us assume that the probability that a member of the jth population changes to \(P_{i}\) is proportional to the number of his or her contacts with members of the ith population, i.e., proportional to the density of the ith population. Then the total rate of conversions from \(P_{j}\) to \(P_{i}\) is \(a_{ij} \rho _{i} \rho _{j}\), where \(a_{ij}\) is a parameter.

    Next, a change in the set of ideas can take place through ternary contact. This requires a group of three members. We assume that such a group exists with a probability proportional to the corresponding densities of the populations concerned. In a ternary contact between members of the ith, jth, and kth populations, members of the jth and kth populations can change their sets of ideas to \(P_{i}\); the rate of such conversions is \(b_{ijk} \rho _{i} \rho _{j} \rho _{k}\), where \(b_{ijk}\) is a parameter. In general, \( a_{ij}=a_{ij}(\rho _{\nu }, x, y, t; p_{\mu }) \); \( b_{ijk} = b_{ijk}(\rho _{\nu }, x, y, t; p_{\mu }) \); etc.

On the basis of the above, the term \(C_{i}\) can be written as

$$\begin{aligned} C_{i}=r_{i} \rho _{i} + \sum _{j=0}^{n} f_{ij} \rho _{j} + \sum _{j=0}^{n} a_{ij} \rho _{i} \rho _{j} + \sum _{j,k=0}^{n} b_{ijk} \rho _{i} \rho _{j} \rho _{k}+ \dots .\end{aligned}$$
(5.29)

Hence the model system of equations is

$$\begin{aligned} \frac{\partial \rho _{i}}{\partial t} + \mathrm{div} \mathbf {j}_{i}^{(1)} - \sum _{j=0}^{n} \mathrm{div} (D_{ij} \nabla \rho _{j}) = r_{i} \rho _{i} + \sum _{j=0}^{n} f_{ij} \rho _{j} + \nonumber \\ \sum _{j=0}^{n} a_{ij} \rho _{i} \rho _{j} + \sum _{j,k=0}^{n} b_{ijk} \rho _{i} \rho _{j} \rho _{k}+ \dots . \end{aligned}$$
(5.30)

The density of the entire population is \(\rho = \sum _{i=0}^{n} \rho _{i}\). This density can change over time. One possible assumption is that \(\rho \) changes over time according to the Verhulst law

$$\begin{aligned} \frac{\partial \rho }{\partial t} = r \rho \left( 1- \frac{\rho }{C} \right) , \end{aligned}$$
(5.31)

where \(C(\rho _{\nu }, x, y, t; p_{\mu })\) is the carrying capacity of the environment and \(r(\rho _{\nu }, x, y, t; p_{\mu })\) is a positive or negative growth rate.

Now let us consider the case in which the current \(\mathbf {j}_{i}^{(1)}\) is negligible, i.e., \(\mathbf {j}_{i}^{(1)} \approx 0\). In addition, we consider only the case in which all parameters are constants. The model system of equations becomes

$$\begin{aligned} \frac{\partial \rho _{i}}{\partial t} - \sum _{j=0}^{n} D_{ij} \varDelta \rho _{j} = r_{i} \rho _{i} + \sum _{j=0}^{n} f_{ij} \rho _{j} + \sum _{j=0}^{n} a_{ij} \rho _{i} \rho _{j} + \nonumber \\ \sum _{j,k=0}^{n} b_{ijk} \rho _{i} \rho _{j} \rho _{k}+ \dots , \end{aligned}$$
(5.32)

where

$$\begin{aligned} \;\;\;\; \varDelta = \frac{\partial ^{2}}{ \partial x^{2}} + \frac{\partial ^{2}}{\partial y^{2}}, i=0,1,2,\dots ,n. \end{aligned}$$
(5.33)

Next we separate the dynamics of the averaged quantities from the dynamics of the fluctuations. If q(x, y, t) is a quantity defined in an area S, then the corresponding plane-averaged quantity is

$$\begin{aligned} \overline{q} = \frac{1}{S} \int \int _{S} dx dy \ q(x,y,t). \end{aligned}$$
(5.34)

The fluctuations are denoted by Q(x, y, t):

$$\begin{aligned} q(x,y,t)= \overline{q}(t) + Q (x,y,t). \end{aligned}$$
(5.35)

We assume that (i) the territory S is large enough, (ii) the plane average of every combination of fluctuations vanishes, and (iii) \(\int \int _{S} dx dy \varDelta Q_{k}\) is finite. Then \(\overline{\varDelta Q_{k}} = \frac{1}{S} \int \int _{S} dx dy \varDelta Q_{k} \rightarrow 0\) for large S. Under these assumptions, plane averaging of (5.32) separates the dynamics of the averaged quantities from the dynamics of the fluctuations. The result is

$$\begin{aligned} \overline{\rho }_{0} = \overline{\rho } - \sum _{i=1}^{n} \overline{\rho }_{i}; \frac{d \overline{\rho }}{d t} = r \overline{\rho } \left( 1 - \frac{\overline{\rho }}{C} \right) \end{aligned}$$
(5.36)
$$\begin{aligned} \frac{d \overline{\rho }_{i}}{dt} = r_{i} \overline{\rho }_{i} + \sum _{j=0}^{n} f_{ij} \overline{\rho }_{j} + \sum _{j=0}^{n} a_{ij} \overline{\rho }_{i} \overline{\rho }_{j} + \sum _{j,k=0}^{n} b_{ijk} \overline{\rho }_{i} \overline{\rho }_{j}\overline{\rho }_{k} + \dots . \end{aligned}$$
(5.37)

Instead of (5.36), we can write an equation for \(\overline{\rho }_0\) of the same form as (5.37). In that case, the total population density \(\overline{\rho }\) will in general not follow the Verhulst law.

Equations (5.36) and (5.37) represent the model of competition among sets of ideas proposed in [85]. There also exists a discrete version of this model [86], and it can be applied to competition between different sets of ideas (scientific, political, religious, technological, etc.).
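The averaged system (5.36)–(5.37) can be explored numerically. The sketch below is a minimal illustration under stated assumptions, not the procedure of [85]: all coefficients are constants, the series in (5.37) is truncated after the binary-contact term (all \(b_{ijk}=0\)), and the names `rhs` and `integrate` are hypothetical. The state vector holds \(\overline{\rho }\) and \(\overline{\rho }_1,\dots ,\overline{\rho }_n\); \(\overline{\rho }_0\) is recovered from (5.36), and a fixed-step fourth-order Runge–Kutta scheme advances the system.

```python
import numpy as np

def rhs(y, r, C, r_i, f, a):
    """Right-hand side of (5.36)-(5.37), truncated after the binary-contact term.

    y = [rho_bar, rho_1, ..., rho_n]; rho_0 is recovered from (5.36).
    r_i (length n+1), f and a ((n+1) x (n+1)) index all subpopulations 0..n;
    only rows 1..n enter the returned derivatives (assumption: b_ijk = 0).
    """
    rho_bar = y[0]
    rho = np.concatenate(([rho_bar - y[1:].sum()], y[1:]))   # rho_0, ..., rho_n
    d_total = r * rho_bar * (1.0 - rho_bar / C)               # Verhulst law (5.36)
    d_rho = r_i * rho + f @ rho + rho * (a @ rho)             # (5.37) without ternary terms
    return np.concatenate(([d_total], d_rho[1:]))

def integrate(y0, t_end, dt, *params):
    """Classical fourth-order Runge-Kutta with a fixed step dt."""
    y = np.asarray(y0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(y, *params)
        k2 = rhs(y + 0.5 * dt * k1, *params)
        k3 = rhs(y + 0.5 * dt * k2, *params)
        k4 = rhs(y + dt * k3, *params)
        y = y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y
```

For instance, with one competing set of ideas (n = 1), no deaths or noncontact conversion, and a single binary conversion channel \(P_0 \rightarrow P_1\) with \(a_{10}=2\) (hypothetical values), the total density approaches the carrying capacity C while the followers of \(P_1\) gradually absorb the whole population.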

4.2 Reproduction–Transport Equation Model of the Evolution of Scientific Subfields

By means of migration, people can move from one territory to another. A change of research field by a scientist may also be considered a migration process [82, 87]. To study this, let us represent research problems by sequences of signal words or macro-terms \(P_i = (m_i^1, m_i^2,\dots , m_i^k, \dots , m_i^n)\), which are registered according to the frequency of their occurrence in the texts. Then:

  • Each point of the problem space, described by a vector \(\mathbf {q}\), corresponds to a research problem, with the problem space containing all scientific problems (no matter whether they are under investigation or not).

  • The scientists distribute themselves over the space of scientific problems with density \(x(\mathbf {q},t)\). Thus there is a number \(x(\mathbf {q},t)d\mathbf {q}\) of scientists working at time t in the element \(d \mathbf {q}\).

  • The field mobility processes correspond to a density change of scientists in the problem space, i.e., instead of working on problem \(\mathbf {q}\), a scientist may begin to work on problem \(\mathbf {q'}\).

  • As a result, \(x(\mathbf {q},t)\) decreases and \(x(\mathbf {q'},t)\) increases.

This movement of scientists can be described by means of a reproduction–transport equation:

$$\begin{aligned} \frac{\partial x(\mathbf {q},t)}{\partial t} = x (\mathbf {q},t) \ w(\mathbf {q} \mid x) + \frac{\partial }{\partial \mathbf {q}} \left( f(\mathbf {q},x) + D(\mathbf {q}) \frac{\partial x(\mathbf {q},t)}{\partial \mathbf {q}} \right) . \end{aligned}$$
(5.38)

In (5.38), self-reproduction and decline are represented by the term \(w(\mathbf {q} \mid x) \ x(\mathbf {q}, t)\); for the reproduction rate function \(w(\mathbf {q} \mid x)\), one can write the relationship

$$\begin{aligned} w(\mathbf {q} \mid x) = a(\mathbf {q}) + \int d \mathbf {q'} \, b(\mathbf {q}, \mathbf {q'}) \, x(\mathbf {q'},t). \end{aligned}$$
(5.39)

The local value of \(a(\mathbf {q})\) expresses the rate at which the number of scientists in field \(\mathbf {q}\) changes through self-reproduction and decline. The function \(b(\mathbf {q},\mathbf {q'})\) describes the influence exerted on the field \(\mathbf {q}\) by the neighboring field \(\mathbf {q'}\). Field mobility is modeled by the term \(\frac{\partial }{\partial \mathbf {q}} \left( f(\mathbf {q},x) + D(\mathbf {q}) \frac{\partial }{\partial \mathbf {q}} x(\mathbf {q},t) \right) \).

To use this equation, one needs initial conditions, and the coefficients must be determined from statistical data on the distribution of scientists over research problems.
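A crude numerical treatment of (5.38)–(5.39) may help to see how the reproduction and mobility terms interact. The sketch assumes a one-dimensional problem axis q on a uniform grid, no directed drift (\(f=0\)), a constant diffusion coefficient D, zero-flux boundaries, and reads the integrand of (5.39) as \(b(\mathbf {q},\mathbf {q'})\,x(\mathbf {q'},t)\); `simulate_fields` is a hypothetical name, and the coefficient arrays would in practice have to come from data.

```python
import numpy as np

def simulate_fields(x0, q, D, a, b, t_end, dt):
    """Explicit finite-volume sketch of (5.38)-(5.39) on a 1D problem axis q.

    Assumptions not taken from the text: uniform grid, f(q, x) = 0, constant D,
    zero-flux boundaries, forward Euler in time (stable if D*dt/dq**2 <= 0.5).
    a holds a(q_i); b is the matrix b(q_i, q_j).
    """
    dq = q[1] - q[0]
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(int(round(t_end / dt))):
        # reproduction rate (5.39): a(q) plus the quadrature of b(q, q') x(q') over q'
        w = a + (b @ x) * dq
        # diffusive flux D dx/dq at interior cell faces; zero at both boundaries
        flux = np.zeros(x.size + 1)
        flux[1:-1] = D * np.diff(x) / dq
        # (5.38) with f = 0: dx/dt = w x + d/dq (D dx/dq)
        x = x + dt * (w * x + np.diff(flux) / dq)
    return x
```

With \(a=b=0\) the scheme only redistributes scientists over the problem axis, so their total number is conserved; with a constant \(a(\mathbf {q})=a_0\) the total grows approximately as \(e^{a_0 t}\). The explicit time step must satisfy \(D\,\varDelta t/\varDelta q^2 \le 1/2\) for stability.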

4.3 Deterministic Model of Science as a Component of the Economic Growth of a Country

Below we discuss one component of a model of the evolution of the GDP (gross domestic product) of a country. This component is connected to the role of technology in increasing the GDP [88–90].

The GDP of a country may grow extensively through an inflow of workforce or capital into the national economic structures and systems [91]. But the GDP of a country may also grow intensively through advances in science and technology. Let us discuss a simple model in which the GDP Y has the form

$$\begin{aligned} Y(t) = Y (L(t), C(t), T(t)). \end{aligned}$$
(5.40)

The quantities in (5.40) are as follows:

  • L(t): labor (human resources);

  • C(t): production resources;

  • T(t): technology level.

Note that the above quantities are not chosen arbitrarily. They represent important factors that may influence the GDP of a country.

The change in the GDP over time is given by

$$\begin{aligned} \frac{dY}{dt} = \frac{\partial Y}{\partial L} \frac{dL}{dt} + \frac{\partial Y}{\partial C} \frac{dC}{dt} + \frac{\partial Y}{\partial T} \frac{dT}{dt}. \end{aligned}$$
(5.41)

The term \((\partial Y/\partial T) (dT/dt)\) describes the change in the GDP due to the evolution of technology. This component of the change in the GDP will be of interest to us below. Note that if technology advances \((dT/dt>0)\), this term contributes to the growth of the GDP. If technology for some reason deteriorates \((dT/dt<0)\), then it can contribute to a decrease in the GDP.

The sensitivity of the GDP to the technology level may be assumed to be [92]

$$\begin{aligned} \frac{\partial Y}{\partial T} = \frac{Y}{T}. \end{aligned}$$
(5.42)

Equation (5.42) means that a relative increase in the technology level leads to the same relative increase in the GDP (e.g., raising T by \(1\,\%\) raises Y by \(1\,\%\)). Then the studied term from (5.41) becomes

$$\begin{aligned} \frac{\partial Y}{\partial T} \frac{dT}{dt} = Y \left( \frac{1}{T} \frac{dT}{dt}\right) . \end{aligned}$$
(5.43)

Next we discuss how the term \((1/T)(dT/dt)\) depends on \(S_T\), the growth in knowledge about technology. The growth in knowledge about technology will then be connected to the growth in scientific knowledge, which will be denoted by S.

We adopt the following notation:

  • \(I_T\): the investment directed to applications of the results of new technologies (machines, processes, etc.);

  • \(I_O\): the investment in older technologies;

  • \(\gamma \): coefficient of proportionality between the growth of knowledge about technology \(S_T\) and growth of scientific knowledge S.

Then the relationship between T and S is

$$\begin{aligned} \frac{1}{T} \frac{dT}{dt} = \gamma \frac{I_T}{I_O} \frac{1}{S} \frac{dS}{dt}. \end{aligned}$$
(5.44)

Equation (5.44) leads to the following conclusions:

  1. Importance of fundamental research: Research, and especially fundamental research, leads to an increase in scientific knowledge. If there is no growth in scientific knowledge \((dS/dt = 0)\), then there is no technological evolution \(((1/T)(dT/dt) = 0)\), and an important factor for the growth of the national GDP is lost.

  2. Importance of the transfer of scientific knowledge to knowledge about technology: If \(\gamma =0\), i.e., there is no transfer, then \((1/T)(dT/dt) =0\) (no technology evolution) even if scientific knowledge grows. Thus what is important for a country is to increase \(\gamma \) (for example, by strengthening the engineering sciences and creating new engineering institutes). The value of \(\gamma \) for developed countries is about 0.5 (a \(1\,\%\) growth in scientific knowledge results in a \(0.5\,\%\) growth in the number of patents).

  3. Importance of investment in new technologies: If there is no such investment \((I_T=0)\), then there is no evolution of technology \(((1/T)(dT/dt) =0)\), even if scientific knowledge grows and is intensively transferred to knowledge about technology.

The rate of growth of scientific knowledge (1 / S) (dS / dt) is assumed to depend on two main factors: the funding of (investment in) science I and the labor L (“human resources” or the number of qualified scientists). Let us set

$$\begin{aligned} \frac{1}{S} \frac{dS}{dt} = \phi (I,L). \end{aligned}$$
(5.45)

Let us assume that \(\phi (I,L)\) is a homogeneous function of degree \(\alpha \) with respect to the funding I and a homogeneous function of degree \(\beta \) with respect to the human resources L. Then we obtain the relationship

$$\begin{aligned} \phi = a I^\alpha L^\beta = \frac{1}{S} \frac{dS}{dt}, \end{aligned}$$
(5.46)

where a is a coefficient of integration (homogeneity implies \(\phi (I,L) = I^\alpha L^\beta \phi (1,1)\), so that \(a = \phi (1,1)\)). Hence a power-law relationship may exist between the rate of growth of scientific knowledge, on the one hand, and the investment and the number of qualified scientists, on the other. We stress the words power law, since such laws arise frequently in studies of research systems (for examples, see Chap. 5).

Equation (5.46) leads to interesting conclusions.

  1. Exponential growth of knowledge in an established research area. Let us consider an established research area with constant investment in science, \(I=\mathrm{const}\), and a constant number of qualified scientists, \(L=\mathrm{const}\). From (5.46), we obtain the relationship

    $$\begin{aligned} S = S_0 \exp [a I^\alpha L^\beta t] \end{aligned}$$
    (5.47)

    (\(S_0\) is a constant of integration), which means that the scientific knowledge in this area is growing exponentially.

  2. Double-exponential growth of scientific knowledge in a new research area. Let us now consider a new research area in which the number of scientists grows exponentially over time, \(L=\exp (\mu t)\), and the funding is constant and large enough: \(I=\mathrm{const}\). Then the growth of scientific knowledge in this area is double-exponential,

    $$\begin{aligned} S = S_0 \exp \left[ \frac{aI^\alpha }{\mu \beta } \exp (\mu \beta t) \right] . \end{aligned}$$
    (5.48)
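The closed forms (5.47) and (5.48) can be checked against a direct numerical integration of (5.46). In the sketch below both closed forms are written in terms of the initial value S(0); for (5.48) this amounts to choosing \(S_0 = S(0)\exp [-aI^\alpha /(\mu \beta )]\). Parameter values and function names are hypothetical.

```python
import math

def knowledge_rk4(S_init, a, I, alpha, beta, L_of_t, t_end, dt):
    """Integrate (1/S) dS/dt = a I^alpha L(t)^beta, i.e. (5.46), with classical RK4."""
    def dS(t, S):
        return a * I**alpha * L_of_t(t)**beta * S
    S = S_init
    n_steps = int(round(t_end / dt))
    for n in range(n_steps):
        t = n * dt
        k1 = dS(t, S)
        k2 = dS(t + dt / 2, S + dt / 2 * k1)
        k3 = dS(t + dt / 2, S + dt / 2 * k2)
        k4 = dS(t + dt, S + dt * k3)
        S += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return S

def s_exponential(S_init, a, I, alpha, beta, L, t):
    # (5.47) for constant I and L, with S_0 = S(0)
    return S_init * math.exp(a * I**alpha * L**beta * t)

def s_double_exponential(S_init, a, I, alpha, beta, mu, t):
    # (5.48) for L = exp(mu t), with S_0 chosen so that S(0) = S_init
    k = a * I**alpha / (mu * beta)
    return S_init * math.exp(k * (math.exp(mu * beta * t) - 1.0))
```

The exponent of the double-exponential solution itself grows exponentially, so even a modest \(\mu \) makes the new research area outpace the established one after a while.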

The substitution of (5.44)–(5.46) in (5.43) leads to the following relationship for the influence of science on the change of GDP of a country:

$$\begin{aligned} \frac{\partial Y}{\partial T} \frac{dT}{dt} = \gamma a \frac{I_T}{I_O} I^\alpha L^\beta Y. \end{aligned}$$
(5.49)

Equation (5.49) shows that countries that have a large GDP possess advantages (since \( \frac{\partial Y}{\partial T} \frac{dT}{dt} \propto Y\)), and in addition, the human factor and investment in science are very important. Thus every nation should try to build a community of qualified researchers and should invest sufficiently in the national research system. If this is not the case, then the process of global competition among the nations will lead inevitably to a brain drain.
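The chain of substitutions that leads to (5.49) can be mirrored by three small functions, one per relation. This is only an illustrative sketch: the function names are hypothetical, and apart from \(\gamma \approx 0.5\), which the text quotes for developed countries, the numbers in the example are invented.

```python
def knowledge_growth_rate(a, I, L, alpha, beta):
    # (5.46): (1/S) dS/dt = a I^alpha L^beta
    return a * I**alpha * L**beta

def technology_growth_rate(gamma, I_T, I_O, s_rate):
    # (5.44): (1/T) dT/dt = gamma (I_T / I_O) (1/S) dS/dt
    return gamma * (I_T / I_O) * s_rate

def gdp_technology_term(Y, t_rate):
    # (5.43): (dY/dT)(dT/dt) = Y (1/T) dT/dt, using dY/dT = Y/T from (5.42)
    return Y * t_rate
```

For example, with hypothetical values a = 0.01, I = 4, L = 9, \(\alpha =\beta =0.5\), \(I_T/I_O=0.2\), and Y = 1000, the knowledge growth rate is 0.06, the technology growth rate is 0.006, and the technology term of the GDP change is 6 (in units of Y per unit time). Setting \(I_T=0\) switches the term off, in line with conclusion 3 above.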

The model above represents a global point of view of the importance of science as a component of the economic growth of a country. There also exists a local point of view regarding this importance, in which one considers the growth of the output of a worker with advancing technology. A mathematical model of this relationship may be based on the Cobb–Douglas production function and on the Solow model. The form of the Cobb–Douglas production function is [93, 94]

$$\begin{aligned} Y = A K^\alpha L^{1-\alpha }, \end{aligned}$$
(5.50)

where

  • Y: output per worker;

  • K: physical capital per worker;

  • L: human capital per worker (labor);

  • A: productivity;

  • \(\alpha \): output elasticity of the physical capital;

  • \(\beta = 1- \alpha \): output elasticity of the human capital.

Looking at (5.50), we can conclude that technological advance allows (by increasing productivity) given quantities of physical and human capital to be combined to produce more output than was possible with older technology. Hence changes in technology directly affect economic growth. In addition, human capital L per worker cannot grow indefinitely. Then in order to increase the output Y, one has to increase the physical capital K per worker (there are also limits to this increase), or one can increase productivity A by advancing technology. Thus even when K and L have reached their maximum values, as long as A (productivity) continues to grow as a consequence of technological advance, income per capita will continue to grow too.

The result of the mathematical theory is that the growth rate \(Y^*= (1/Y)(dY/dt)\) of output per worker (in the steady state of the production system) is connected to the growth of productivity A, which means that there is a strong connection between the growth of output and technological progress. Namely, if the rate of advance of technology is \(A^* = (1/A)(dA/dt)\), then

$$\begin{aligned} Y^* = A^* \left( \frac{1}{1-\alpha } \right) . \end{aligned}$$
(5.51)

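Equation (5.51) tells us that technological advance (through research and development) is extremely important for economic growth. Since \(0<\alpha <1\), the factor \(1/(1-\alpha )\) exceeds one, so output per worker grows faster than productivity itself; for example, \(\alpha = 1/3\) gives \(Y^* = 1.5\, A^*\).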
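Relation (5.51) can be illustrated with a small simulation. The sketch assumes a textbook Solow setting that the text does not spell out: human capital per worker is fixed at one, output per worker is \(y = A k^\alpha \) with \(A(t) = e^{gt}\), and capital per worker evolves as \(dk/dt = s y - \delta k\) with a hypothetical saving rate s and depreciation rate \(\delta \). After transients die out, the growth rate of y should approach \(g/(1-\alpha )\). The function name is hypothetical.

```python
import math

def solow_output_growth(alpha, s, delta, g, k0=1.0, t_end=300.0, dt=0.01):
    """Late-time growth rate of output per worker y = A k^alpha.

    Hypothetical sketch: human capital per worker fixed at 1, A(t) = exp(g t)
    grows exogenously, capital per worker follows dk/dt = s y - delta k
    (saving rate s, depreciation delta), integrated by forward Euler.
    """
    steps_per_unit = int(round(1.0 / dt))
    n_steps = int(round(t_end / dt))
    k = k0
    log_y = []                          # log output per worker at t = 0, 1, 2, ...
    for n in range(n_steps + 1):
        y = math.exp(g * n * dt) * k**alpha
        if n % steps_per_unit == 0:
            log_y.append(math.log(y))
        k += dt * (s * y - delta * k)   # capital accumulation
    return log_y[-1] - log_y[-2]        # growth of log y over the last unit of time
```

With \(\alpha =1/3\) and productivity growing at \(2\,\%\) per year, the simulated growth of output per worker settles at about \(3\,\%\) per year, in line with (5.51).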

5 Several General Remarks About Probability Models and Corresponding Processes

In many cases, mathematical models of the mechanisms of production of scientific information use the concept of a population of “sources” producing “items” observed over time [95]. The observation of the items produced by a source is equivalent to the observation of a stochastic point process: a sequence of events occurring randomly in time. The modeling of the corresponding process requires specification of the probabilistic mechanism producing the observed events.

The simplest available point process is the Poisson process, which corresponds to the situation that events occur completely at random over time with the overall average rate of occurrence remaining constant, so that the expected number of events occurring increases linearly with time.

In order to model more realistic situations, the rate of the Poisson process may:

  1. vary in time deterministically [96]. In this case, the number of occurring events may have nonlinear variation in time, and the process is called an inhomogeneous Poisson process;

  2. vary in time stochastically [97, 98]. Such a process is called a doubly stochastic Poisson process or Cox process.

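The homogeneous and the inhomogeneous Poisson processes have independent increments; for the doubly stochastic Poisson process, the increments are independent only conditionally on the realized rate. The homogeneous Poisson process, and the doubly stochastic Poisson process when its rate is a stationary random process, have stationary increments. Thus these processes are able to model situations in which the probability distribution of the number of events in a period of time depends only on the length of the period and not on the time at which it begins.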
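An inhomogeneous Poisson process with a bounded rate can be simulated by thinning (the Lewis–Shedler method): candidate events are generated from a homogeneous process with the maximal rate, and each candidate at time s is kept with probability \(\lambda (s)/\lambda _{\max }\). The decaying rate \(\lambda (s)=\lambda _0 e^{-cs}\) used below, which mimics an item that is cited less and less often, and all function names are hypothetical illustrations.

```python
import math
import random

def inhomogeneous_poisson_times(rate, rate_max, t_end, rng):
    """Event times on [0, t_end] of an inhomogeneous Poisson process,
    generated by thinning; requires rate(s) <= rate_max on [0, t_end]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)          # candidate from a homogeneous process
        if t > t_end:
            return times
        if rng.random() < rate(t) / rate_max:   # keep with probability rate(t)/rate_max
            times.append(t)

def aging_rate(s, lam0=5.0, c=0.5):
    """Hypothetical decaying rate lam0 * exp(-c s) of an aging item."""
    return lam0 * math.exp(-c * s)
```

The expected number of events on [0, T] is \(\int _0^T \lambda (s)\,ds = \lambda _0(1-e^{-cT})/c\), which grows nonlinearly in T, and the count is Poisson distributed, so its variance matches its mean.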

When the entire population of sources is studied, it may happen that the rate of production varies between different sources. The observed process is then a mixture of the individual processes, and it can be modeled mathematically by mixing the parameters determining the rates of production of the individual sources. The resulting mixed process may still have stationary increments, but because of the mixing, the increments are no longer independent.

We are going to describe briefly three kinds of mixed processes that will arise in the models discussed below: the Greenwood–Yule process (gamma–Poisson process), the GIGP (generalized inverse Gaussian–Poisson process), and the Waring process (a mixture of negative binomial processes) [95]. Let us consider a source that produces \(X_t\) (\(t \ge 0\)) items in the interval [0, t]. The process of production of items (the point process) is specified by a parameter \(\theta \), and we know the form of the process \(\{X_t \mid \theta \}\) for a given value of \(\theta \). For a given \(\theta \), the increments of the conditional process are stationary and independent; after mixing over \(\theta \) they remain stationary but are no longer independent, and

$$\begin{aligned} p(X_t=r) = E_\theta P(X_t = r \mid \theta ) = \int dx f_\theta (x) p(X_t=r \mid \theta =x). \end{aligned}$$
(5.52)

The above-mentioned three processes will be obtained by specifying the probability distribution function \(f_\theta (x)\) and the form of the conditional process \(\{ X_t\mid \theta \}\). For example, in order to obtain the Greenwood–Yule process (also called the gamma–Poisson process), we have to assume that each source produces items as a Poisson process whose rate has a gamma distribution. In detail,

$$\begin{aligned} p(X_t=r \mid \lambda ) = \exp (-\lambda t) \frac{(\lambda t)^r}{r!}; \ \ r=0,1,\dots , \end{aligned}$$
(5.53)

where \(\lambda \) is the rate of the Poisson process; \(\lambda \) has a gamma distribution with scale parameter \(\beta \) and index \(\nu \):

$$\begin{aligned} f_\lambda (x) = \frac{\beta ^{-\nu } x^{\nu -1}}{\varGamma (\nu )} \exp (- \frac{x}{\beta }); \ \ x>0. \end{aligned}$$
(5.54)

As a result of substituting (5.53) and (5.54) in (5.52), one obtains the negative binomial distribution of index \(\nu \) and parameter \(p_t=1/(1+\beta t)\):

$$\begin{aligned} p(X_t = r) = \left( {\begin{array}{c}r+ \nu - 1\\ r\end{array}}\right) \left( \frac{1}{1+ \beta t}\right) ^\nu \left( \frac{\beta t}{1+ \beta t}\right) ^r; \ \ r=0,1,\dots . \end{aligned}$$
(5.55)
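The mixing step (5.52) can be verified numerically for the gamma–Poisson case. The sketch below evaluates the integral of the Poisson probabilities (5.53) against the gamma density (5.54) with a midpoint rule and compares the result with the negative binomial probabilities of index \(\nu \) and parameter \(p_t=1/(1+\beta t)\); the function names and parameter values are hypothetical.

```python
import math
import numpy as np

def nb_pmf(r, nu, beta, t):
    """Negative binomial probability of r items, index nu, p_t = 1/(1 + beta t)."""
    p = 1.0 / (1.0 + beta * t)
    log_p = (math.lgamma(r + nu) - math.lgamma(nu) - math.lgamma(r + 1)
             + nu * math.log(p) + r * math.log(1.0 - p))
    return math.exp(log_p)

def mixed_pmf(r, nu, beta, t, x_max=60.0, n=60000):
    """Midpoint-rule evaluation of (5.52): the Poisson probability (5.53)
    with rate x, averaged over the gamma density (5.54) on (0, x_max]."""
    h = x_max / n
    x = (np.arange(n) + 0.5) * h
    log_gamma_pdf = -nu * math.log(beta) + (nu - 1) * np.log(x) - math.lgamma(nu) - x / beta
    log_poisson = -x * t + r * np.log(x * t) - math.lgamma(r + 1)
    return float(np.sum(np.exp(log_gamma_pdf + log_poisson)) * h)
```

For \(\nu =2\), \(\beta =1.5\), t = 2, the negative binomial probabilities have mean \(\nu \beta t = 6\) and variance \(\nu \beta t(1+\beta t)=24\), in agreement with the moment formulas given later in this section; a pure Poisson count with mean 6 would have variance 6, so the mixing produces strong overdispersion.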

The GIGP (generalized inverse Gaussian–Poisson process) is obtained when the probability distribution function for the rate \(\lambda \) of the Poisson process (5.53) is

$$\begin{aligned} f_\lambda (x) = c(\alpha ,\gamma ,\theta ) x^{\gamma -1} \exp \left[ -x \left( \frac{1}{\theta } -1 \right) - \frac{\alpha ^2 \theta }{4x} \right] , \end{aligned}$$
(5.56)

where \(x>0\), \(-\infty< \gamma < \infty \), \(\alpha \ge 0\), \(0<\theta <1\), and the normalizing constant is

$$\begin{aligned} c(\alpha ,\gamma ,\theta ) = \frac{(1-\theta )^{\gamma /2}}{2 (\alpha \theta /2)^\gamma K_\gamma \{\alpha (1-\theta )^{1/2} \}}, \end{aligned}$$
(5.57)

where \(K_\gamma \{\alpha (1-\theta )^{1/2}\}\) is the modified Bessel function of the second kind of order \(\gamma \). The substitution of the density (5.56) in (5.52) leads to the distribution

$$\begin{aligned} p(X_t = r) = \frac{(1-\theta _t)^{\gamma /2}}{K_\gamma \{ \alpha (1-\theta )^{1/2}\}} \frac{(\alpha _t \theta _t/2)^r}{r!} K_{r+ \gamma }(\alpha _t); \ \ r=0,1,\dots , \end{aligned}$$
(5.58)

where \(\theta _t = (t \theta )/[1+ \theta (t-1)]\) and \(\alpha _t = \alpha [1+(t-1) \theta ]^{1/2}\). This distribution reduces to the GIGP distribution when \(t=1\) (then \(\theta _t = \theta \) and \(\alpha _t = \alpha \)). Because of this, the process \(X_t\) described by (5.58) will be called a GIGP process and may be denoted by GIGP\((\alpha _t,\theta _t,\gamma )\). Sichel [99, 100] used \(\gamma = -1/2\), i.e., the GIGP\((\alpha _t,\theta _t,-1/2)\) distribution

$$\begin{aligned} p(X_t=r) = \left( \frac{2 \alpha _t}{\pi } \right) ^{1/2} \exp [\alpha (1-\theta )^{1/2}] \frac{(\alpha _t \theta _t/2)^r}{r!} K_{r-1/2}(\alpha _t); \ \ r = 0,1,\dots , \end{aligned}$$
(5.59)

in many practical applications.
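For \(\gamma =-1/2\) only Bessel functions of half-integer order appear in (5.59), and these are elementary: \(K_{\pm 1/2}(z)=\sqrt{\pi /(2z)}\,e^{-z}\), and higher orders follow from the recurrence \(K_{\nu +1}(z)=K_{\nu -1}(z)+(2\nu /z)K_\nu (z)\). The sketch below uses this to tabulate (5.59) without a special-function library; the function names are hypothetical, and the probabilities are truncated at a finite \(r_{\max }\).

```python
import math

def bessel_k_half_orders(r_max, z):
    """Return [K_{-1/2}(z), K_{1/2}(z), ..., K_{r_max-1/2}(z)].

    Uses K_{-1/2} = K_{1/2} = sqrt(pi / (2 z)) exp(-z) and the upward recurrence
    K_{nu+1}(z) = K_{nu-1}(z) + (2 nu / z) K_nu(z), which is stable for K.
    """
    k_half = math.sqrt(math.pi / (2.0 * z)) * math.exp(-z)
    ks = [k_half, k_half]
    for r in range(1, r_max):
        nu = r - 0.5                       # order of ks[r]
        ks.append(ks[r - 1] + (2.0 * nu / z) * ks[r])
    return ks[: r_max + 1]

def sichel_pmf(r_max, alpha, theta, t):
    """p(X_t = r) for r = 0..r_max under GIGP(alpha_t, theta_t, -1/2), as in (5.59)."""
    theta_t = t * theta / (1.0 + theta * (t - 1.0))
    alpha_t = alpha * math.sqrt(1.0 + (t - 1.0) * theta)
    ks = bessel_k_half_orders(r_max, alpha_t)
    prefactor = math.sqrt(2.0 * alpha_t / math.pi) * math.exp(alpha * math.sqrt(1.0 - theta))
    c = alpha_t * theta_t / 2.0
    return [prefactor * c**r / math.factorial(r) * ks[r] for r in range(r_max + 1)]
```

With \(\alpha =2\), \(\theta =0.5\), t = 2, the tabulated probabilities sum to one, and their mean and variance reproduce the moment formulas for the GIGP distribution with \(\gamma =-1/2\) given later in this section.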

Finally, we consider the Waring process (which will be discussed at length below). For this process, each source produces items as a negative binomial process with parameter q and index \(\psi \):

$$\begin{aligned} p(X_t=r\mid q) = \left( {\begin{array}{c}r+ \psi t -1\\ r\end{array}}\right) q^{\psi t} (1-q)^r; \ \ r=0,1,\dots , \end{aligned}$$
(5.60)

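and the parameter q has a beta distribution with parameters a and b, \(f_q(x) = x^{a-1}(1-x)^{b-1}/B(a,b)\), \(0<x<1\). Equivalently, the mean production rate per unit time, \(\lambda = \psi (1-q)/q\), has the density

$$\begin{aligned} f_\lambda (x) = \frac{1}{B(a,b)} \frac{\psi ^a x^{b-1}}{(x+\psi )^{a+b}}; \ \ x>0. \end{aligned}$$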
(5.61)

Averaging (5.60) over the distribution of q by means of (5.52) leads to

$$\begin{aligned} p(X_t=r) = \frac{\varGamma (\psi t + a)}{B(a,b) \varGamma (\psi t)} \frac{\varGamma (r+\psi t) \varGamma (r+b)}{r! \varGamma (r+\psi t+a+b)}. \end{aligned}$$
(5.62)

Equation (5.62) describes the generalized Waring distribution [101–103]; \(\varGamma \) is the gamma function, and B is the beta function.
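Because of the gamma functions, (5.62) is best evaluated in log space. The sketch below does so with `math.lgamma` and checks, for hypothetical parameters with a > 1, that the probabilities sum to one and that their mean equals \(\psi b t/(a-1)\); the function name is hypothetical. The tail decays only as a power of r (roughly \(r^{-(a+1)}\)), so a long truncation is needed.

```python
import math

def waring_pmf(r, a, b, psi, t):
    """Generalized Waring probability (5.62), evaluated via log-gamma functions."""
    log_beta_ab = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    s = psi * t
    log_p = (math.lgamma(s + a) - log_beta_ab - math.lgamma(s)
             + math.lgamma(r + s) + math.lgamma(r + b)
             - math.lgamma(r + 1) - math.lgamma(r + s + a + b))
    return math.exp(log_p)
```

The heavy tail is the practical signature of the beta mixing: unlike the gamma–Poisson and GIGP cases, moments of order k exist only when a > k.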

Some remarks about the moments of the obtained distributions follow. Moments of all orders exist for the gamma–Poisson distribution and for the GIGP distribution. For the existence of moments of the generalized Waring distribution, one has to impose conditions on the parameters (a > 1 for the mean and a > 2 for the variance; see (5.67) and (5.68)). For the gamma–Poisson distribution, the mean \(E[X_t]\) and the variance \(V[X_t]\) are

$$\begin{aligned} E[X_t] = \nu \beta t, \end{aligned}$$
(5.63)
$$\begin{aligned} V[X_t] = \nu \beta t (1 + \beta t). \end{aligned}$$
(5.64)

For the GIGP distribution with \(\gamma = -1/2\),

$$\begin{aligned} E[X_t] = \frac{\alpha \theta t}{2(1-\theta )^{1/2}}, \end{aligned}$$
(5.65)
$$\begin{aligned} V(X_t) = \frac{\alpha \theta t}{4(1-\theta )^{3/2}}[2(1-\theta )+t \theta ]. \end{aligned}$$
(5.66)

For the generalized Waring distribution,

$$\begin{aligned} E[X_t] = \frac{\psi bt}{a-1}; \ \ a>1, \end{aligned}$$
(5.67)
$$\begin{aligned} V(X_t) = \frac{\psi b t (a+b-1)}{(a-1)^2(a-2)}(a-1+ \psi t); \ \ a>2. \end{aligned}$$
(5.68)

6 Probability Model for Research Publications. Yule Process

Probability models are very interesting and powerful tools for the study of the dynamics of research systems and the characteristics of research production. Let us demonstrate this with a probability model of the dynamics of research publications [104] that will lead us to the famous statistical distribution of Yule.

Let us now consider scientific publications from the following point of view. A researcher has x publications. When he/she writes one more publication, we consider this a transition to another state, characterized by \(x+1\) publications. The occurrence of a new publication is a rare event, and because of this, we consider the occurrence of new publications to be a Poisson-type pure multiplicative (pure birth) random process, in which the probability of a transition to a new state in the time interval \((t,t+ \varDelta t)\) depends on the state of the system at time t.

6.1 Definition, Initial Conditions, and Differential Equations for the Process

We begin our study at the point in time where a studied researcher has one publication. Let \(p_x(t)\) be the probability that a researcher has x publications at time t. Then the initial condition is \(p_x(0)=1\) if \(x=1\) and \(p_x(0) =0 \) if \(x \ne 1\). The process evolves according to the following two rules:

  1. The probability of a transition from state x to state \(x+1\) in the interval \((t, t+ \varDelta t)\) is proportional to the length \(\varDelta t\) of the interval. We denote this probability by \(\lambda (x)\varDelta t\).

  2. The probability of two or more transitions during the interval \(\varDelta t\) is negligibly small.

Because of the above rules, the probability of no transition from state x to state \(x+1\) in the time interval \((t,t+\varDelta t)\) is \(1-\lambda (x) \varDelta t\).
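Rules 1 and 2 imply that the researcher stays in state x for an exponentially distributed time with rate \(\lambda (x)\) before moving to \(x+1\). The process can therefore be simulated event by event (a Gillespie-type simulation). The sketch below accepts an arbitrary rate function; its name and the example rates are hypothetical.

```python
import random

def simulate_publications(rate, t_end, rng, x0=1):
    """Simulate one researcher from state x0 at t = 0 up to t_end.

    In state x the waiting time to the next publication is exponential with
    rate rate(x), which follows from rules 1 and 2. Returns the state at t_end.
    """
    x, t = x0, 0.0
    while True:
        lam = rate(x)
        if lam <= 0.0:              # no further transitions are possible
            return x
        t += rng.expovariate(lam)   # waiting time in state x
        if t > t_end:
            return x
        x += 1                      # transition x -> x + 1
```

With a constant rate \(\lambda (x)=2\) the number of new publications by time 1.5 is Poisson with mean 3, so the average final state is about 4; passing a rate that grows with x gives multiplicative behavior.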

The probability that our system (the researcher) is in state x (has x publications) at time \(t+ \varDelta t\) is the sum of two terms: the probability that the system was in state \(x-1\) at time t and jumped to state x within the interval \((t, t+ \varDelta t)\), and the probability that it was already in state x at time t and did not jump to state \(x+1\) within that interval. In symbols, this reads

$$\begin{aligned} p_x(t+\varDelta t) = [1 - \lambda (x) \varDelta t]p_x(t) + \lambda (x-1)p_{x-1}(t) \varDelta t. \end{aligned}$$
(5.69)

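Subtracting \(p_x(t)\) from both sides, dividing by \(\varDelta t\), and letting \(\varDelta t \rightarrow 0\), we obtain the following system of differential equations for the probabilities: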

$$\begin{aligned} \frac{d p_0(t)}{dt}= & {} - \lambda (0) p_0(t), \nonumber \\ \frac{d p_x(t)}{dt}= & {} -\lambda (x) p_x(t)+ \lambda (x-1)p_{x-1}(t), \quad x=1,2,\dots . \end{aligned}$$
(5.70)

6.2 How a Yule Process Occurs

In order to continue analysis of (5.70), we have to determine \(\lambda (x)\). We shall use the linear hypothesis for the parameter \(\lambda (x)\):

The probability of a transition increases proportionally to the number of publications:

$$\begin{aligned} \lambda (x)=\lambda x, \end{aligned}$$
(5.71)

where \(\lambda \) is a constant.

In other words, the linear hypothesis states that an author who already has many publications needs less time to produce another one. In this way, our stochastic process becomes a linear pure multiplicative process (Yule process) [105–109].

Using (5.71), one obtains the following solution of the system of equations (5.70): \(p_x(t)=0\) when \(x=0\) and

$$\begin{aligned} p_x(t)=[1-\exp (-\lambda t)]^{x-1}\exp (-\lambda t). \end{aligned}$$
(5.72)

Let us recall that in the case under discussion, the distribution (5.72) gives the probability that a researcher will have x publications at time t if at time \(t=0\) he/she had one publication.
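The solution (5.72) can be checked by integrating the system (5.70) directly with \(\lambda (x)=\lambda x\). The sketch below truncates the state space at a hypothetical \(x_{\max }\) (probability that flows beyond it is discarded, which is negligible for the parameters used) and uses a fixed-step fourth-order Runge–Kutta scheme; the function names are hypothetical.

```python
import numpy as np

def yule_master_equation(lam, t_end, x_max=200, dt=1e-3):
    """Integrate the truncated system (5.70) with lambda(x) = lam * x (5.71),
    starting from p_1(0) = 1. Entry k of the result holds p_k(t_end)."""
    rates = lam * np.arange(x_max + 1, dtype=float)   # lambda(x); lambda(0) = 0
    p = np.zeros(x_max + 1)
    p[1] = 1.0

    def rhs(p):
        dp = -rates * p                    # outflow from state x to x + 1
        dp[1:] += rates[:-1] * p[:-1]      # inflow from state x - 1
        return dp

    for _ in range(int(round(t_end / dt))):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * dt * k1)
        k3 = rhs(p + 0.5 * dt * k2)
        k4 = rhs(p + dt * k3)
        p = p + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

def yule_closed_form(x, lam, t):
    # (5.72): p_x(t) = (1 - e^{-lam t})^{x-1} e^{-lam t} for x >= 1
    q = np.exp(-lam * t)
    return (1.0 - q) ** (x - 1) * q
```

For \(\lambda =1\) and t = 1.5 the numerical probabilities agree with (5.72), and the mean number of publications equals \(e^{1.5}\).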

6.3 Properties of Research Production According to the Model

  1. Expected value.

    The expected value is the mean number of publications that are expected to have been written by time t. It is

    $$\begin{aligned} E[x(t)] = \exp (\lambda t), \end{aligned}$$
    (5.73)

    which is often observed in practice and is called the law of exponential growth of science.

  2. \(\lambda \): a measure of the publication activity of the researchers.

    After a “differentiation” of (5.73), one obtains

    $$\begin{aligned} \lambda = \frac{d x_t/dt}{x_t}, \end{aligned}$$
    (5.74)

    which means that \(\lambda \) is the relative growth rate of the expected number of publications \(x_t = E[x(t)]\), i.e., a measure of the intensity of publication (and partially of the scientific) activity of a researcher.

  3. Research work in a research area for some finite time.

    Usually, a researcher works for some (finite) time on problems from some research area and then changes the research area of work (or retires). This time depends on the potential of the research area, on the talent of the researcher, on the age of the researcher, on the work conditions, etc. The finite time of work is different for different researchers and is a random variable whose distribution can be obtained from queuing theory. The distribution is

    $$\begin{aligned} p(t) = \nu \exp (-\nu t), \end{aligned}$$
    (5.75)

    where \(\nu =1/t^*\) and \(t^*\) is the average value of t. This random duration of activity in a research area can be incorporated into the Yule distribution by reading (5.72) as the conditional probability \(p_x(t)=p(x \mid t)\). Then, in order to obtain the distribution of the number of publications per researcher observed in a database, we have to calculate the following integral:

    $$\begin{aligned} p(x)=\int \limits _{0}^{\infty } dt \ p(x \mid t) p(t) = \nonumber \\ \int \limits _{0}^{\infty } dt \ [1-\exp (-\lambda t)]^{x-1} \exp (-\lambda t) \nu \exp (-\nu t). \end{aligned}$$
    (5.76)

    The integration of (5.76) leads to the Yule distribution

    $$\begin{aligned} p(x)=\alpha B(x,\alpha +1), \end{aligned}$$
    (5.77)

    where:

    • \(B(x,\alpha +1)= \frac{\varGamma (x) \varGamma (\alpha +1)}{\varGamma (x+ \alpha +1)}\) is the beta function;

    • \(\varGamma (x)=(x-1)!\) is the gamma function;

    • \(\alpha = \nu /\lambda \).
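The passage from (5.76) to (5.77) can be verified numerically. Analytically, the substitution \(u=e^{-\lambda t}\) turns (5.76) into \(\alpha \int _0^1 (1-u)^{x-1}u^{\alpha }\,du = \alpha B(x,\alpha +1)\). The sketch below instead integrates the geometric law (5.72), weighted by the exponential density (5.75), with a midpoint rule and compares the result with \(\alpha B(x,\alpha +1)\) computed through log-gamma functions. Parameter values and function names are hypothetical.

```python
import math
import numpy as np

def yule_pmf(x, alpha):
    """Yule distribution (5.77): p(x) = alpha * B(x, alpha + 1), x = 1, 2, ..."""
    return alpha * math.exp(math.lgamma(x) + math.lgamma(alpha + 1) - math.lgamma(x + alpha + 1))

def mixed_over_activity_time(x, lam, nu, t_max=80.0, n=80000):
    """Midpoint-rule evaluation of (5.76): the geometric law (5.72) at time t,
    weighted by the exponential activity-time density (5.75), integrated over t."""
    h = t_max / n
    t = (np.arange(n) + 0.5) * h
    q = np.exp(-lam * t)
    integrand = (1.0 - q) ** (x - 1) * q * nu * np.exp(-nu * t)
    return float(integrand.sum() * h)
```

With \(\lambda =1\) and \(\nu =0.5\) (so \(\alpha =0.5\)), the two evaluations agree for small and moderately large x, and \(p(1)=\alpha /(\alpha +1)=1/3\).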

The Yule distribution obtained above leads to several interesting conclusions about research production.

  1. Asymptotic behavior: For large x, one obtains \(\frac{\varGamma (x)}{\varGamma (x + \alpha +1)} \approx \frac{1}{x^{\alpha +1}}\) (the Stirling approximation was used). Let us in addition assume that \(\alpha \) is small. Then \(\varGamma (\alpha +1) \approx 1\), and the Yule distribution reduces to

    $$\begin{aligned} p(x) \approx \alpha \varGamma (\alpha +1) \frac{1}{x^{\alpha +1}} \approx \frac{\alpha }{x^{\alpha +1}}, \end{aligned}$$
    (5.78)

    which is the law of Pareto for \(x_0=1\) and small values of \(\alpha \). Thus on the basis of the hypothesis that the scientific activity is a random branching multiplicative process with linear increase of effectiveness of the researchers (Yule process), we have obtained one of the basic laws of research production.

  2.

    Evaluation of the parameter \(\alpha \): This can be done on the basis of the Yule distribution for researchers who have just one publication. For these researchers,

    $$\begin{aligned} p(1)= \frac{\alpha \varGamma (1) \varGamma (\alpha +1)}{\varGamma (\alpha +2)}= \frac{\alpha }{\alpha +1} \end{aligned}$$
    (5.79)

    (we have used \(\varGamma (1)=1\) and \(\varGamma (\alpha +2) = (\alpha +1) \varGamma (\alpha +1)\)). Then, taking into account that \(p(1) = N_1/N\) is the proportion of researchers with one publication (\(N_1\) of them) in a group of N researchers, we obtain

    $$\begin{aligned} \alpha = \frac{p(1)}{1-p(1)} = \frac{N_1}{N-N_1}. \end{aligned}$$
    (5.80)

    Thus we can evaluate \(\alpha \) by taking N and \(N_1\) from a large enough database.
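The two conclusions above can be illustrated together. The Python sketch below estimates \(\alpha \) from (5.80) for hypothetical counts N and \(N_1\), checks that (5.79) returns \(p(1)=N_1/N\), and compares the exact Yule probabilities with the Pareto approximation (5.78). All numbers and function names are assumptions made for illustration.

```python
import math


def alpha_from_counts(n_total: int, n_single: int) -> float:
    """Estimate alpha from eq. (5.80): alpha = N1 / (N - N1)."""
    if not 0 < n_single < n_total:
        raise ValueError("expected 0 < N1 < N")
    return n_single / (n_total - n_single)


def yule_pmf(x: int, alpha: float) -> float:
    """Yule probability alpha * B(x, alpha + 1), eq. (5.77)."""
    return alpha * math.exp(math.lgamma(x) + math.lgamma(alpha + 1) - math.lgamma(x + alpha + 1))


def pareto_approx(x: float, alpha: float) -> float:
    """Large-x, small-alpha approximation alpha / x**(alpha + 1), eq. (5.78)."""
    return alpha / x ** (alpha + 1)


if __name__ == "__main__":
    # Hypothetical database: N = 1000 researchers, N1 = 200 with a single publication.
    a = alpha_from_counts(1000, 200)                  # 200 / 800
    print("alpha =", a, " p(1) =", yule_pmf(1, a))    # p(1) = alpha / (alpha + 1) = N1 / N
    for x in (10, 100, 1000, 10000):
        # For large x this ratio tends to Gamma(alpha + 1), not exactly to 1.
        print(x, yule_pmf(x, a) / pareto_approx(x, a))
```

For \(\alpha =0.25\) the ratio settles near \(\varGamma (1.25) \approx 0.906\), which shows the size of the error introduced by setting \(\varGamma (\alpha +1) \approx 1\) in (5.78).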

7 Probability Models Connected to Dynamics of Citations

7.1 Poisson Model of Citations Dynamics of a Set of Articles Published at the Same Time

Citation analysis is a frequently used method for assessing research impact [110–114]. An important topic in research on citations is the investigation of citation distributions. This research may follow two paths [115]:

  1.

    Path 1: Take a particular source—book, article, journal issue, journal volume, etc.—and study the age distribution of the cited articles in the studied source [116].

  2.

    Path 2: Take a collection of sources (articles published in a journal, or articles from some scientific field) at a given time and then follow up and note the times at which each source from the collection is cited [117, 118].

Below, we present a probability model obtained by following Path 2. It assumes continuous time, aging of the published material (in the course of time, the material becomes obsolescent and is cited less frequently), and the existence of publications that are never cited. The model is as follows [115]. Let us consider a population of sources that produces items over time. For citation analysis, the population consists of a collection of articles published at the same time \(t=0\), and the items produced by the articles are their citations. The assumption is that citations are received randomly over time. Since different articles belong to scientific areas of different popularity, differ in relevance, etc., their citation rates also differ. We assume that the citation rate of a randomly chosen source is a random variable \(\varLambda \) with probability distribution \(F_\varLambda \) over the population of sources. Let \(X_t\) be the number of citations to a randomly chosen source (article) in the interval [0, t]. The probability that this number of citations is equal to r is

$$\begin{aligned} p(X_t=r) = \int \limits _0^\infty d F_\varLambda (\lambda ^*) \ P(X_t=r \mid \varLambda = \lambda ^*). \end{aligned}$$
(5.81)

We can recognize the process \(\{X_t, t \ge 0\}\) as a counting process, and the model (5.81) is a mixture of counting processes with mixing distribution \(F_\varLambda \) and mixing parameter \(\lambda ^*\). Next, one has to specify the nature of the process behind the conditional term \(P(X_t=r \mid \varLambda = \lambda ^*)\). The initial assumption can be that this process is a Poisson process [119–122] with stationary and independent increments. This leads to the distribution

$$\begin{aligned} P(X_t = r \mid \varLambda = \lambda ^*) = \exp (-\lambda ^* t) \frac{(\lambda ^* t)^r}{r!}; \ \ \ r=0,1,2,\dots . \end{aligned}$$
(5.82)

In (5.82), \(\lambda ^* = \text {const}\), and the mean of the Poisson distribution is \(\lambda ^* t\). We note here that numerous models of citation distribution have been proposed, based on different choices of the density \(f(\lambda ^*)\) of the mixing distribution, \(d F_\varLambda (\lambda ^*) = f(\lambda ^*) d \lambda ^*\) [123].

Let us now consider the case in which \(\lambda ^*\) depends on time. Since \(\lambda ^*\) can be associated with the citation rate of a given paper, it can vary with time t. If \(\lambda ^* = \lambda ^*(t)\), then (5.82) has to be replaced by the more general equation [124]

$$\begin{aligned} P(X_t = r \mid \varLambda = \lambda ^*) = \exp [-M(\lambda ^*,t)] \frac{M(\lambda ^*, t)^r}{r!}; \ \ \ r=0,1,2,\dots , \end{aligned}$$
(5.83)

where

$$ M(\lambda ^*,t) = \int \limits _0^t ds \lambda ^*(s). $$

In the case of citations of articles, an almost universal citation pattern in time c(t) can be observed. Then we can assume that the citation rate \(\lambda ^*(t)\) of a paper has the particular form

$$\begin{aligned} \lambda ^*(t) = \lambda c(t), \end{aligned}$$
(5.84)

where \(\lambda = \text {const}\). Then

$$\begin{aligned} M(\lambda ^*,t) = \int \limits _0^t ds \ \lambda c(s) = \lambda C(t); \ \ C(t) = \int \limits _0^t ds \ c(s) \end{aligned}$$
(5.85)

and

$$\begin{aligned} P(X_t = r \mid \varLambda = \lambda ^*) = \exp [-\lambda C(t)] \frac{[\lambda C(t)]^r}{r!}; \ \ \ r=0,1,2,\dots . \end{aligned}$$
(5.86)

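The mean of the Poisson process is \(\lambda C(t)\); c(t) is called the obsolescence density function, and C(t) is called the obsolescence distribution function (\(t>0\)). We assume that \(\lim \limits _{t \rightarrow \infty } C(t) < \infty \); normalizing C(t) as a distribution function, we take \(\lim \limits _{t \rightarrow \infty } C(t) = 1\), which is used below.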
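Equation (5.86) states that, for a fixed \(\lambda \), the citation count by time t is Poisson with mean \(\lambda C(t)\). The Python sketch below checks this by simulating citation times with rate \(\lambda c(t)\) through Lewis-Shedler thinning. The obsolescence density \(c(t)=\mu \exp (-\mu t)\) is a hypothetical choice made only for illustration (the text does not fix c(t)); it gives \(C(t)=1-\exp (-\mu t)\) and \(C(\infty )=1\).

```python
import math
import random


def citation_count_by_thinning(lam: float, mu: float, t_end: float, rng: random.Random) -> int:
    """Number of citations in [0, t_end] for a Poisson process with rate lam * c(t).

    Hypothetical obsolescence density: c(t) = mu * exp(-mu * t), so that
    C(t) = 1 - exp(-mu * t). Events are generated by Lewis-Shedler thinning
    of a homogeneous process with rate lam * mu, which bounds lam * c(t) for t >= 0.
    """
    rate_max = lam * mu
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate_max)          # next candidate event
        if t > t_end:
            return count
        if rng.random() < math.exp(-mu * t):    # accept with prob lam * c(t) / rate_max
            count += 1


def poisson_mean(lam: float, mu: float, t_end: float) -> float:
    """Mean lam * C(t) of eq. (5.86) for the same hypothetical c(t)."""
    return lam * (1.0 - math.exp(-mu * t_end))


if __name__ == "__main__":
    rng = random.Random(1)
    lam, mu, t_end = 8.0, 0.3, 5.0   # illustrative values
    counts = [citation_count_by_thinning(lam, mu, t_end, rng) for _ in range(20000)]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    # For a Poisson count, mean and variance should both be close to lam * C(t).
    print(mean, var, poisson_mean(lam, mu, t_end))
```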

The substitution of (5.86) in (5.81) leads to the final relationship for the citation production distribution:

$$\begin{aligned} p(X_t=r) = \int \limits _0^\infty d F_\varLambda (\lambda ) [\lambda C(t)]^r \left[ \frac{\exp [-\lambda C(t)]}{r!} \right] , \ r=0,1,2,\dots . \end{aligned}$$
(5.87)

This can also be written as the expected value

$$\begin{aligned} p(X_t = r) = E_\varLambda [P(X_t=r \mid \varLambda )]. \end{aligned}$$
(5.88)

From (5.87), one can obtain the first-citation distribution. Let T be the time from publication to the first citation of a randomly chosen source (article). We can consider T a random variable. For times \(t<T\), the number of citations of the paper is 0. Let \(F_T(t)\) be the cumulative distribution function of the first-citation time: \(F_T(t) =p(T \le t)\). Since \(p(T \le t) = 1 -p(T>t)\) and the event \(T>t\) is the same as the event \(X_t=0\), we have

$$\begin{aligned} F_T(t) = 1-p(X_t=0) = 1 - \int \limits _0^\infty dF_\varLambda (\lambda ) \exp [- \lambda C(t)]. \end{aligned}$$
(5.89)

An interesting consequence of the first-citation distribution (5.89) is as follows.

There will be publications that will never be cited.

This feature follows from the relationship \(\lim \limits _{t \rightarrow \infty } F_T(t) <1\). Indeed, we can see that

$$ \int \limits _0^\infty dF_\varLambda (\lambda ) \exp [- \lambda C(t)] = L_\varLambda [C(t)] $$

is the Laplace transform of the distribution of \(\varLambda \), evaluated at C(t). Since \(\varLambda \) is nonnegative and finite, \(L_\varLambda (s) >0\) for every finite s, in particular \(L_\varLambda (1) >0\). Then, with \(\lim \limits _{t \rightarrow \infty } C(t) = 1\),

$$ \lim \limits _{t \rightarrow \infty } F_T(t) = 1 - \lim \limits _{t \rightarrow \infty } p(X_t=0) = 1- \lim \limits _{t \rightarrow \infty } L_\varLambda (C(t)) = 1 - L_\varLambda (1) <1. $$

The model developed above can be used for obtaining the nth citation distribution [125]. The result for the nth citation distribution is

$$\begin{aligned} F_n(t) = p(T_n<t) = \int \limits _0^{C(t)} ds \ \frac{s^{n-1}}{(n-1)!} E_\varLambda [\varLambda ^n \exp (-\varLambda s)]; t < \infty , \end{aligned}$$
(5.90)
$$\begin{aligned} p(T_n = \infty ) = \int \limits _1^{\infty } ds \ \frac{s^{n-1}}{(n-1)!} E_\varLambda [\varLambda ^n \exp (-\varLambda s)]. \end{aligned}$$
(5.91)
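Equations (5.89)–(5.91) become explicit once a mixing distribution is chosen. The Python sketch below assumes a gamma-distributed \(\varLambda \) with shape \(\nu \) and rate \(\alpha \) (the choice discussed in the next subsection), for which \(E_\varLambda [\varLambda ^n \exp (-\varLambda s)] = \alpha ^\nu \varGamma (\nu +n)/[\varGamma (\nu )(\alpha +s)^{\nu +n}]\), and takes \(C(\infty )=1\). It evaluates (5.90) by quadrature, checks the case n = 1 against the closed form of (5.89), and checks that \(F_n(\infty ) + p(T_n=\infty ) = 1\). Function names and parameter values are hypothetical.

```python
import math


def mixed_moment(n: int, s: float, nu: float, alpha: float) -> float:
    """E[Lambda**n * exp(-Lambda * s)] for Lambda ~ Gamma(shape nu, rate alpha)."""
    return math.exp(nu * math.log(alpha) + math.lgamma(nu + n) - math.lgamma(nu)
                    - (nu + n) * math.log(alpha + s))


def nth_citation_cdf(n: int, c_t: float, nu: float, alpha: float, steps: int = 20000) -> float:
    """F_n(t) of eq. (5.90) for a given value c_t = C(t), by midpoint quadrature in s."""
    h = c_t / steps
    norm = math.factorial(n - 1)
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h
        total += s ** (n - 1) / norm * mixed_moment(n, s, nu, alpha)
    return total * h


def first_citation_cdf(c_t: float, nu: float, alpha: float) -> float:
    """Eq. (5.89) in closed form: 1 - L_Lambda(C(t)) = 1 - (alpha / (alpha + C(t)))**nu."""
    return 1.0 - (alpha / (alpha + c_t)) ** nu


def prob_never_reaching_n(n: int, nu: float, alpha: float) -> float:
    """p(T_n = infinity) = p(X_infinity < n) with C(infinity) = 1; the total citation
    count is then negative binomial (gamma mixture of Poisson distributions)."""
    p = alpha / (alpha + 1.0)
    return sum(math.exp(math.lgamma(r + nu) - math.lgamma(nu) - math.lgamma(r + 1)
                        + nu * math.log(p) + r * math.log(1.0 - p))
               for r in range(n))


if __name__ == "__main__":
    nu, alpha = 2.0, 3.0   # hypothetical gamma mixing parameters
    print("never cited:", prob_never_reaching_n(1, nu, alpha))   # (alpha / (alpha + 1))**nu
    for c_t in (0.25, 0.5, 1.0):
        print(c_t, nth_citation_cdf(1, c_t, nu, alpha), first_citation_cdf(c_t, nu, alpha))
```

Under these assumptions the share of never-cited papers is \([\alpha /(\alpha +1)]^\nu \), a concrete instance of \(1 - \lim _{t \rightarrow \infty } F_T(t) = L_\varLambda (1) > 0\).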

7.2 Mixed Poisson Model of Papers Published in a Journal Volume

The accumulation of citations shows varying dynamic behavior over the lifetime of a paper, and among other things, this behavior is also influenced by the reputation of the journal in which the paper was published. In most cases, the number of citations grows slowly immediately after publication, usually because it takes some time for citing papers to appear in print and to be entered into citation databases. After this initial period, citations increase faster, since citations lead to new readers who may also cite the publication. Finally, the material of the paper becomes outdated and/or obsolete, and the number of citations per year decreases. This is the typical behavior, but there exist other patterns of behavior such as “sleeping beauties,” “shooting stars,” etc. [126, 127].

The investigation of citation behavior in journal volumes can be based on the mixed Poisson distribution [128–131] model of Burrell [115, 125]. A journal volume can be treated as a collection of papers, usually from the same year and with common characteristics. The main assumption is that each paper generates citations according to a Poisson process with a constant (latent) rate \(\lambda \) but that these rates vary across the collection as a random variable \(\varLambda \). Then the probability that a paper will have generated r citations by time t is

$$\begin{aligned} p(X_t=r \mid \varLambda = \lambda ) = \exp (-\lambda t) \frac{(\lambda t)^r}{r!}. \end{aligned}$$
(5.92)

The population distribution of randomly chosen papers of unknown \(\lambda \) is a mixture of Poisson distributions of the kind (5.92),

$$\begin{aligned} p(X_t =r) = \int \limits _0^\infty dF_\varLambda (\lambda ) \exp (-\lambda t) \frac{(\lambda t)^r}{r!}, \end{aligned}$$
(5.93)

where \(F_\varLambda (\lambda )\) is the cumulative distribution of \(\lambda \) (of the latent rate), also called the mixing distribution.

There are different possibilities for the form of the mixing distribution [132–134], but the most widely used is the gamma distribution with shape parameter \(\nu \) and rate parameter \(\alpha \):

$$\begin{aligned} \frac{d}{d \lambda } F_\varLambda (\lambda ) = \frac{\alpha ^\nu \lambda ^{\nu -1}}{\varGamma (\nu )} \exp (-\alpha \lambda ). \end{aligned}$$
(5.94)

The appearance of the gamma distribution above is not a coincidence: the gamma mixture of Poisson distributions is a negative binomial distribution [135–137] (a fact proved by Greenwood and Yule [138]). Yule is the same scientist who first described the preferential attachment process (Yule process). This negative binomial distribution is

$$\begin{aligned} P(X_t = r) = \left( {\begin{array}{c}r+\nu - 1\\ \nu - 1\end{array}}\right) \left( \frac{\alpha }{\alpha +t} \right) ^\nu \left( 1- \frac{\alpha }{\alpha +t} \right) ^r, \ \ r= 0, 1,2, \dots . \end{aligned}$$
(5.95)
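The gamma-Poisson mixture behind (5.95) is easy to reproduce by simulation: draw a latent rate from the gamma density (5.94), then draw a Poisson count with mean \(\lambda t\). The Python sketch below compares the empirical frequencies with (5.95). The parameter values are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, which is adequate only for moderate means.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Poisson variate by Knuth's multiplication method (fine for moderate means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def nb_pmf(r: int, nu: float, alpha: float, t: float) -> float:
    """Negative binomial probability of eq. (5.95)."""
    p = alpha / (alpha + t)
    log_coef = math.lgamma(r + nu) - math.lgamma(nu) - math.lgamma(r + 1)
    return math.exp(log_coef + nu * math.log(p) + r * math.log(1.0 - p))


def simulate_gamma_poisson(nu: float, alpha: float, t: float, n: int, seed: int = 7) -> list[int]:
    """Draw a latent rate from the gamma density (5.94), then a Poisson count with mean rate * t."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1 / alpha for rate alpha.
    return [sample_poisson(rng.gammavariate(nu, 1.0 / alpha) * t, rng) for _ in range(n)]


if __name__ == "__main__":
    nu, alpha, t = 1.5, 2.0, 4.0   # hypothetical parameter values
    draws = simulate_gamma_poisson(nu, alpha, t, 50000)
    for r in range(6):
        print(r, draws.count(r) / len(draws), nb_pmf(r, nu, alpha, t))
```

The sample mean should also be close to \(\nu t/\alpha \), the mean of (5.95).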

In most cases, citations of a paper do not accumulate at a uniform rate over time. Thus in most cases, \(\lambda \) is not a constant, and the rate \(\lambda (t)\) differs from paper to paper. It can be assumed [115] that \(\lambda (t)\) may be written in the form

$$\begin{aligned} \lambda (t) = \lambda c(t), \end{aligned}$$
(5.96)

where c(t) describes a pattern of citation behavior that is the same for all articles in the collection under discussion (i.e., c(t) describes a sort of obsolescence). The function c(t) is the probability density function of obsolescence, and \(C(t) = \int \limits _0^t ds \ c(s)\) is the cumulative distribution function of obsolescence.

With the obsolescence distribution, the model discussed above leads to the following negative binomial distribution for the probability that a paper in a collection of papers will have r citations [139]:

$$\begin{aligned} p(X_t = r) = \left( {\begin{array}{c}r+\nu - 1\\ \nu -1\end{array}}\right) \left( \frac{\alpha }{\alpha +C(t)} \right) ^\nu \left( 1 - \frac{\alpha }{\alpha +C(t)} \right) ^r, \ \ \ r=0,1,2,\dots . \end{aligned}$$
(5.97)

Many assumptions can be made about the form of C(t). Two possibilities are as follows:

  • Logistic function: \(C(t) = 1/(1+a\exp (-bt))\);

  • Weibull distribution: \(C(t) = 1 - \exp [-(t/b)^2].\)

The values of C(t) can be determined by fitting citation data. Additional information about the investigation of citations in several research disciplines can be found in [140], where a Poisson distribution and an exponential distribution are used for describing such data.
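To see what (5.97) implies in practice, the Python sketch below evaluates it with the two forms of C(t) listed above. Two quantities are reported: the expected number of citations \(\nu C(t)/\alpha \) (the mean of the negative binomial distribution with \(t\) replaced by \(C(t)\)) and the share of still-uncited papers \(p(X_t=0) = [\alpha /(\alpha +C(t))]^\nu \). The values of \(\nu \), \(\alpha \), a, and b are hypothetical placeholders, not values fitted to any data.

```python
import math
from typing import Callable


def logistic_C(t: float, a: float = 20.0, b: float = 1.0) -> float:
    """Logistic obsolescence distribution C(t) = 1 / (1 + a exp(-b t))."""
    return 1.0 / (1.0 + a * math.exp(-b * t))


def weibull_C(t: float, b: float = 4.0) -> float:
    """Weibull-type obsolescence distribution C(t) = 1 - exp(-(t / b)**2)."""
    return 1.0 - math.exp(-((t / b) ** 2))


def citation_pmf(r: int, t: float, nu: float, alpha: float, C: Callable[[float], float]) -> float:
    """Eq. (5.97): negative binomial probability of r citations by time t."""
    c_t = C(t)
    if c_t == 0.0:                 # no obsolescence mass yet: no citations possible
        return 1.0 if r == 0 else 0.0
    p = alpha / (alpha + c_t)
    log_coef = math.lgamma(r + nu) - math.lgamma(nu) - math.lgamma(r + 1)
    return math.exp(log_coef + nu * math.log(p) + r * math.log(1.0 - p))


def expected_citations(t: float, nu: float, alpha: float, C: Callable[[float], float]) -> float:
    """Mean of (5.97): nu * C(t) / alpha."""
    return nu * C(t) / alpha


if __name__ == "__main__":
    nu, alpha = 2.0, 0.1   # hypothetical mixing parameters
    for t in (1, 2, 5, 10, 20):
        print(t,
              round(expected_citations(t, nu, alpha, logistic_C), 2),
              round(citation_pmf(0, t, nu, alpha, logistic_C), 4),   # share still uncited
              round(expected_citations(t, nu, alpha, weibull_C), 2),
              round(citation_pmf(0, t, nu, alpha, weibull_C), 4))
```

Fitting would then consist of choosing a and b (together with \(\nu \) and \(\alpha \)) so that these curves match observed citation counts.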

8 Aging of Scientific Information

As a consequence of the continuous research efforts of scientists, a continuous flow of new scientific information exists, and existing scientific information ages. As a consequence of these two processes, there is a continuous reorganization of the structure of scientific information. For example, suppose a scientist publishes an article. At first, interest in the article may be significant (a large number of citations, for example). Then interest decreases as the information in the article ages and the scientific potential of the obtained results decreases. If one studies closely the number of citations of a publication, three periods can usually be distinguished:

  1.

    First two years after publication: with rare exceptions, articles are not cited much in this period (they are not very well known to the corresponding scientific community). The exceptions are extremely important, however: if an article is very much cited within this initial period, it is highly probable that it will become a very influential publication that may contribute much to the development of the corresponding scientific field.

  2.

    Next five years: here the publication receives most of its citations as it becomes well known. If it receives no citations in this period, the publication has been judged by the corresponding scientific community to be of little use. This judgment is valid in the general case, but there can be rare exceptions: “sleeping beauties” that suddenly become current many years after publication [141].

  3.

    More than seven years after publication: the number of citations usually begins to decrease, and the publication slowly moves toward the scientific archives.

The above considerations show that by their continuous work in obtaining new knowledge, researchers continuously renew the structure of scientific information by making room for new information and compressing aged information (this information, compressed into citations, appears in some of the new publications). In this process, researchers mainly use the achievements of the previous generation of researchers.

8.1 Death Stochastic Process Model of Aging of Scientific Information

The main assumptions of the model are as follows [142]:

  1.

    At the initial moment of the study, there is some portion of the scientific publications that are cited. The number of citations of these publications x(t) decreases with advancing time. The number of citations at \(t=0\) is \(x_0\).

  2.

    The probability that the number of citations decreases from x to \(x-1\) in the interval \((t,t+\varDelta t)\) is \(\mu _x \varDelta t\). Thus the probability that the number of citations does not decrease in this interval is \(1 - \mu _x \varDelta t\).

Then the probability that at the moment \(t+\varDelta t\) there will be x citations of the scientific publications is

$$\begin{aligned} p_x(t+ \varDelta t) = (1 - \mu _x \varDelta t) p_x(t) + \mu _{x+1}p_{x+1}(t)\varDelta t. \end{aligned}$$
(5.98)

On the basis of the assumption that the intensity with which citations are decreasing is proportional to the number of citations, \(\mu _x = \mu x\), and imposing the initial condition \(p_x(0) = 1\) for \(x=x_0 \ge 1\) and \(p_x(0)=0\) for \(x \ne x_0\), one obtains the following solution of (5.98):

$$\begin{aligned} p_x(t) = \frac{x_0!}{x!(x_0-x)!} \exp (-\mu x_0 t)[\exp (\mu t)-1]^{x_0-x} \end{aligned}$$
(5.99)

for \(0 \le x \le x_0\). From (5.99), the average number of citations with advancing time is

$$\begin{aligned} x_t = x_0 \exp (- \mu t), \end{aligned}$$
(5.100)

which means that

there occurs an aging of scientific information according to an exponential law. This is a rapid pace of aging, and significant scientific efforts are needed in order to compensate for it by the production of new scientific information.
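The solution of (5.98) with \(\mu _x = \mu x\) can be written in binomial form, \(p_x(t) = \binom{x_0}{x} q^x (1-q)^{x_0-x}\) with \(q=\exp (-\mu t)\), since \(\exp (-\mu x_0 t)[\exp (\mu t)-1]^{x_0-x} = q^x(1-q)^{x_0-x}\). The Python sketch below uses this form, confirms the exponential mean (5.100), and compares it with a Gillespie-type simulation of the death process. The parameter values are hypothetical.

```python
import math
import random


def death_pmf(x: int, x0: int, mu: float, t: float) -> float:
    """Solution of (5.98) with mu_x = mu * x in binomial form:
    C(x0, x) * q**x * (1 - q)**(x0 - x), where q = exp(-mu t)."""
    q = math.exp(-mu * t)
    return math.comb(x0, x) * q**x * (1.0 - q) ** (x0 - x)


def simulate_death(x0: int, mu: float, t_end: float, rng: random.Random) -> int:
    """Gillespie simulation: in state x the next decrease occurs after an
    exponential waiting time with rate mu * x."""
    x, t = x0, 0.0
    while x > 0:
        t += rng.expovariate(mu * x)
        if t > t_end:
            break
        x -= 1
    return x


if __name__ == "__main__":
    x0, mu, t = 50, 0.2, 3.0   # hypothetical values
    rng = random.Random(3)
    sims = [simulate_death(x0, mu, t, rng) for _ in range(20000)]
    exact_mean = sum(x * death_pmf(x, x0, mu, t) for x in range(x0 + 1))
    # Simulated mean, mean of (5.99), and x0 * exp(-mu t) from (5.100).
    print(sum(sims) / len(sims), exact_mean, x0 * math.exp(-mu * t))
```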

8.2 Inhomogeneous Birth Process Model of Aging of Scientific Information. Waring Distribution

Another approach to the aging of scientific information was proposed by Schubert and Glänzel [143] and discussed by Schubert, Glänzel, and Schoepflin [4, 144]. As we shall see below, the model of Schubert and Glänzel is quite interesting, because it is a deterministic one, yet it is connected to the less well known but very interesting Waring distribution. We shall see in addition that this model (which can be connected to an inhomogeneous birth process) leads to results very similar to those of the model discussed above, which is based on a death process. The Waring distribution is of great interest to us, since it is a generalization of several important statistical distributions appearing in research on science dynamics and research production. Below, we first describe a simple model that leads to the Waring distribution. Then we consider a particular case of a stochastic process connected to repetitive events (such as publishing papers and obtaining citations), and finally, we consider the aging of scientific information (scientific articles) from the point of view of the citations they obtain.

8.2.1 Waring Distribution

The Waring distribution is a distribution with a very long tail. Because of this property, it is well suited for describing characteristics of many biological and social systems. We shall see below that the Waring distribution is connected to other interesting distributions that are presented in this book: the Yule and Zipf distributions.

The Waring distribution may be connected to publication activity, and publication activity may be considered a measure of research productivity. Within the context of the epidemic model of Goffman and Newill (discussed above), the susceptible and infected persons have to be continuously replaced by persons entering the system, i.e., the population of researchers should be considered an open population. As we shall see below, the model by Schubert and Glänzel [143] describes similar processes connected to publication activity. The model assumes three groups in the population: a group that is entering the system, a group that is in the system, and a group that is leaving the system. In more detail, we consider an infinite array of cells (boxes) indexed in succession by nonnegative integers. The amount x of some substance can move between the cells. Let \(x_i\) be the amount of the substance in the ith cell. Then

$$\begin{aligned} x = \sum \limits _{i=0}^\infty x_ i. \end{aligned}$$
(5.101)

The fractions \(y_i = x_i/x\) can be considered probability values of a distribution of a discrete random variable \(\zeta \):

$$\begin{aligned} y_i = p(\zeta = i), \ i=0,1, \dots . \end{aligned}$$
(5.102)

We assume that the expected value of the random variable \(\zeta \) is finite and that the content \(x_i\) of any cell can change under any of the following three processes:

  1.

    Substance may enter the system of cells from the external environment through the 0th cell at rate s.

  2.

    Substance may be transferred from the ith cell into the \((i+1)\)th cell at rate \(f_i\).

  3.

    Substance may leak out of the ith cell into the external environment at rate \(g_i\).

The stochastic process connected to the movement of the substance between the cells is formed by changes in the content of the cells, e.g., by changes in the number of papers published by authors who have entered the system. The stochastic model is obtained if x(t) is considered the (random) publication activity process of an arbitrary author, i.e., the number of papers this author has published in the time interval between 0 and t; then \(p(x(t) = i) = y_i\) is the probability that this author has published i papers in this interval.

The three processes mentioned above can be modeled mathematically by a system of ordinary differential equations:

$$\begin{aligned} \frac{dx_0}{dt}= & {} s-f_0 - g_0; \nonumber \\ \frac{dx_i}{dt}= & {} f_{i-1} -f_i - g_i. \end{aligned}$$
(5.103)

The following forms of the flows are assumed in [143] (\(\alpha , \beta , \gamma , \sigma \) are constants):

$$\begin{aligned} s= & {} \sigma x; \ \ \sigma> 0 \rightarrow \text {self-reproducing property} ,\nonumber \\ f_i= & {} (\alpha + \beta i) x_i; \ \ \ \alpha >0, \ \beta \ge 0 \rightarrow \text {cumulative advantage of higher cells}, \nonumber \\ g_i= & {} \gamma x_i; \ \ \ \gamma \ge 0 \rightarrow \text {uniform leakage over the cells}. \end{aligned}$$
(5.104)

Substitution of (5.104) in (5.103) leads to the relationships

$$\begin{aligned} \frac{dx_0}{dt}= & {} \sigma x - \alpha x_0 - \gamma x_0; \nonumber \\ \frac{dx_i}{dt}= & {} [\alpha + \beta (i-1)]x_{i-1} - (\alpha + \beta i +\gamma )x_i. \end{aligned}$$
(5.105)

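Let us sum all the equations in (5.105). The transfer terms cancel (each flow \(f_i\) leaves the ith cell and enters the \((i+1)\)th cell), and the result of the summation is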

$$\begin{aligned} \frac{dx}{dt} = (\sigma - \gamma ) x, \end{aligned}$$
(5.106)

and the solution for x is

$$\begin{aligned} x = x(0) \exp [(\sigma - \gamma )t], \end{aligned}$$
(5.107)

where x(0) is the amount of x at \(t=0\). Three regimes of change of x(t) follow from (5.107):

  1.

    Regime of exponential growth (\(\sigma > \gamma \)).

  2.

    Stationary regime (\(\sigma = \gamma \)).

  3.

    Regime of exponential decay (\(\sigma < \gamma \)).

The distribution of \(y_i\) will lead us to the Waring distribution. From (5.105), with the help of (5.106) and the relationship \(\frac{dy_i}{dt} = \frac{1}{x^2}\left[ x \frac{dx_i}{dt} - x_i \frac{dx}{dt}\right] \), one obtains

$$\begin{aligned} \frac{dy_0}{dt}= & {} \sigma -(\alpha + \sigma ) y_0; \nonumber \\ \frac{dy_i}{dt}= & {} [\alpha +\beta (i-1)]y_{i-1} - (\alpha + \beta i + \sigma ) y_i. \end{aligned}$$
(5.108)

The solution of (5.108) is

$$\begin{aligned} y_i = y_i^* + \sum \limits _{j=0}^i b_{ij} \exp [-(\alpha + \beta j + \sigma )t], \end{aligned}$$
(5.109)

where \(y_i^*\) is the stationary solution of (5.108) given by the relationships

$$\begin{aligned} y_0^*= & {} \frac{\sigma }{\sigma + \alpha }, \nonumber \\ y_i^*= & {} \frac{\alpha + \beta (i-1)}{\alpha + \beta i + \sigma } y_{i-1}^*, \ i=1,2,\dots . \end{aligned}$$
(5.110)

The coefficients \(b_{ij}\) are determined by the initial conditions. All exponents \(-(\alpha + \beta j + \sigma )t\) in (5.109) are negative (since \(\alpha , \sigma >0\) and \(\beta \ge 0\)), and because of this, as \(t \rightarrow \infty \), the sum in (5.109) vanishes and the system approaches the stationary distribution (5.110). Thus the distribution of \(y_i\) tends to a stationary one, even though the total amount x itself is stationary only when \(\sigma = \gamma \).

Thus starting from any initial distribution, after some time, the system reaches a steady state in which the content of each cell changes exponentially with the same exponent \(\sigma - \gamma \) as the total amount x in (5.107) (growth for \(\sigma > \gamma \), decay for \(\sigma < \gamma \)), and the distribution of the substance among the cells is given by (5.110).

This distribution is called the Waring distribution.

The form of the Waring distribution is

$$\begin{aligned} P(\zeta = i) = \frac{a}{a+k} \frac{(k-1)^{[i]}}{(a+k)^{[i]}}; \ \ k^{[i]} = \frac{(k+i)!}{k!}, \end{aligned}$$
(5.111)

with parameters \(k = \alpha /\beta \) and \(a=\sigma /\beta \).

We note that the words “after some time” above mean that the Waring distribution can be considered a good approximation of the considered process for large enough finite times when the stationary state of distribution of substance among the cells has almost been reached.
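The stationary distribution can be computed directly from the recursion (5.110). Dividing numerator and denominator by \(\beta \) gives \(y_0^* = a/(a+k)\) and \(y_i^* = \frac{k+i-1}{k+i+a} y_{i-1}^*\), whose product form is \(\frac{a}{a+k}\frac{k(k+1)\cdots (k+i-1)}{(a+k+1)\cdots (a+k+i)}\). The Python sketch below compares both forms and checks the moments (5.112) and (5.113). The parameter values (chosen with \(a>2\) so that the variance exists) are hypothetical.

```python
import math


def waring_recursion(k: float, a: float, i_max: int) -> list[float]:
    """Stationary probabilities from recursion (5.110), with numerator and
    denominator divided by beta: k = alpha / beta, a = sigma / beta."""
    y = [a / (a + k)]
    for i in range(1, i_max + 1):
        y.append((k + i - 1) / (k + i + a) * y[-1])
    return y


def waring_pmf(i: int, k: float, a: float) -> float:
    """Product form a/(a+k) * [k (k+1) ... (k+i-1)] / [(a+k+1) ... (a+k+i)],
    evaluated with log-gamma functions."""
    return a / (a + k) * math.exp(math.lgamma(k + i) - math.lgamma(k)
                                  + math.lgamma(a + k + 1) - math.lgamma(a + k + i + 1))


if __name__ == "__main__":
    k, a = 1.5, 4.5   # hypothetical parameters
    probs = waring_recursion(k, a, 20000)
    mean = sum(i * p for i, p in enumerate(probs))
    var = sum(i * i * p for i, p in enumerate(probs)) - mean**2
    print(sum(probs), mean, k / (a - 1))                                   # eq. (5.112)
    print(var, k * a * (k + a - 1) / ((a - 1) ** 2 * (a - 2)))             # eq. (5.113)
```

Truncating the infinite sums at 20000 terms is harmless here because the probabilities decay like \(i^{-(1+a)}\).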

8.2.2 Parameters and Particular Cases of the Waring Distribution

The Waring distribution is quite interesting, since it contains as particular cases the distributions of Yule and Zipf.

The expected value of the Waring distribution is

$$\begin{aligned} E[\zeta ] = \frac{k}{a-1}; \ \ a >1. \end{aligned}$$
(5.112)

We note that \(a>1\) is a condition for a finite expected value (such a finite value was assumed above). Then from the definition of a, it follows that \(\sigma > \beta \).

The variance of the Waring distribution is

$$\begin{aligned} D^2[\zeta ] = \frac{ka(k+a-1)}{(a-1)^2(a-2)}; \ \ a>2. \end{aligned}$$
(5.113)

Several special cases of the Waring distribution are

  1.

    \(\beta =0\) (geometric distribution).

    This case is also called the model of Frank and Coleman [145, 146], or the case without cumulative advantage, since \(f_i = \alpha x_i\). Here

    $$\begin{aligned} P(\zeta =i) = q(1-q)^i; \ \ q = \frac{\sigma }{\sigma + \alpha }. \end{aligned}$$
    (5.114)
  2.

    \(\alpha \rightarrow 0\), \(\beta \ne 0\), i.e., \(k \rightarrow 0\) (Yule distribution).

    As \(k \rightarrow 0\), we have \(P(\zeta = 0) \rightarrow 1\), so the distribution has to be conditioned on \(\zeta > 0\). The conditional Waring distribution then reduces to the Yule distribution [147],

    $$\begin{aligned} P(\zeta = i \mid \zeta > 0) = a B(a+1,i), \end{aligned}$$
    (5.115)

    where B is the beta function. Let us note that in this case, \(f_i = \beta i x_i\), which is also known as Gibrat's law, much used in economics for describing size distributions of business systems [148] or size distributions of cities [149].

  3.

    \(i \rightarrow \infty \) (Zipf distribution).

    As \(i \rightarrow \infty \), the Waring distribution becomes

    $$\begin{aligned} P(\zeta = i) \rightarrow \frac{c}{i^{(1+a)}}, \end{aligned}$$
    (5.116)

    which is the frequency form of the Zipf distribution (c is an appropriate constant depending on the parameters of the distribution).
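The three special cases can be checked with the recursion (5.110) written in the original parameters \(\alpha , \beta , \sigma \). In the Python sketch below, the Yule limit is approximated by a small but nonzero \(\alpha \), and for the Zipf tail the constant is taken as \(c = a\varGamma (a+k)/\varGamma (k)\), which follows from the product form of the stationary probabilities. All parameter values and function names are hypothetical.

```python
import math


def waring_stationary(alpha: float, beta: float, sigma: float, i_max: int) -> list[float]:
    """Stationary distribution from recursion (5.110), in the original parameters."""
    y = [sigma / (sigma + alpha)]
    for i in range(1, i_max + 1):
        y.append((alpha + beta * (i - 1)) / (alpha + beta * i + sigma) * y[-1])
    return y


def yule_pmf(i: int, a: float) -> float:
    """a * B(i, a + 1), eq. (5.115)."""
    return a * math.exp(math.lgamma(i) + math.lgamma(a + 1) - math.lgamma(i + a + 1))


def geometric_case(alpha: float, sigma: float, i_max: int) -> list[tuple[float, float]]:
    """beta = 0: pairs (Waring value, q (1 - q)**i) with q = sigma / (sigma + alpha)."""
    q = sigma / (sigma + alpha)
    y = waring_stationary(alpha, 0.0, sigma, i_max)
    return [(y[i], q * (1 - q) ** i) for i in range(i_max + 1)]


def yule_case(beta: float, sigma: float, i_max: int,
              tiny_alpha: float = 1e-6) -> list[tuple[float, float]]:
    """alpha -> 0: pairs (P(zeta = i | zeta > 0), a B(i, a + 1)) with a = sigma / beta."""
    y = waring_stationary(tiny_alpha, beta, sigma, i_max)
    a = sigma / beta
    return [(y[i] / (1.0 - y[0]), yule_pmf(i, a)) for i in range(1, i_max + 1)]


def zipf_constant_ratio(alpha: float, beta: float, sigma: float, i: int) -> float:
    """Large i: P(zeta = i) * i**(1 + a) divided by c = a Gamma(a + k) / Gamma(k)."""
    k, a = alpha / beta, sigma / beta
    y = waring_stationary(alpha, beta, sigma, i)
    c = a * math.gamma(a + k) / math.gamma(k)
    return y[i] * i ** (1 + a) / c


if __name__ == "__main__":
    print(geometric_case(0.8, 0.2, 5))
    print(yule_case(1.0, 1.5, 5))
    print(zipf_constant_ratio(1.5, 1.0, 2.5, 100000))   # close to 1
```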

8.2.3 Truncated Waring Distribution

For some applications, one may need a model with a finite number of cells. In this case, we consider an array of \(N+1\) cells (boxes) indexed in succession by nonnegative integers, i.e., the first cell has index 0, and the last cell has index N. We assume that there exists an amount x of some substance that is distributed among the cells. Let \(x_i\) be the amount of the substance in the ith cell. Then

$$\begin{aligned} x = \sum \limits _{i=0}^N x_ i. \end{aligned}$$
(5.117)

The fractions \(y_i = x_i/x\) can be considered probability values of the distribution of a discrete random variable \(\zeta \),

$$\begin{aligned} y_i = p(\zeta = i), \ i=0,1, \dots , N. \end{aligned}$$
(5.118)

The process of transfer of substance between the cells can be modeled mathematically by a system of ordinary differential equations:

$$\begin{aligned} \frac{dx_0}{dt}= & {} s-f_0-g_0; \nonumber \\ \frac{dx_i}{dt}= & {} f_{i-1} -f_i - g_i, \ i=1,2,\dots , N-1; \nonumber \\ \frac{dx_N}{dt}= & {} f_{N-1} - g_N . \end{aligned}$$
(5.119)

The forms of the amounts of the moving substances are the same as in (5.104). The substitution of (5.104) in (5.119) leads to the relationships

$$\begin{aligned} \frac{dx_0}{dt}= & {} \sigma x - \alpha x_0 - \gamma x_0; \nonumber \\ \frac{dx_i}{dt}= & {} [\alpha + \beta (i-1)]x_{i-1} - (\alpha + \beta i +\gamma )x_i, \ i=1,2,\dots ,N-1, \nonumber \\ \frac{dx_N}{dt}= & {} [\alpha + \beta (N-1)]x_{N-1} - \gamma x_N. \end{aligned}$$
(5.120)

Let us now derive the distribution of \(y_i\). From (5.120), we obtain

$$\begin{aligned} \frac{dy_0}{dt}= & {} \sigma -(\alpha + \sigma ) y_0; \nonumber \\ \frac{dy_i}{dt}= & {} [\alpha +\beta (i-1)]y_{i-1} - (\alpha + \beta i + \sigma ) y_i, \ i=1,2,\dots ,N-1; \nonumber \\ \frac{dy_N}{dt}= & {} [\alpha +\beta (N-1)]y_{N-1} - \sigma y_N. \end{aligned}$$
(5.121)

We search for a solution of (5.121) in the form

$$\begin{aligned} y_i = y_i^* + F_i(t), \end{aligned}$$
(5.122)

where \(y_i^*\) is the stationary solution of (5.121) given by the relationships

$$\begin{aligned} y_0^*= & {} \frac{\sigma }{\sigma + \alpha }; \nonumber \\ y_i^*= & {} \frac{\alpha + \beta (i-1)}{\alpha + \beta i + \sigma } y_{i-1}^*, \ i=1,2,\dots , N-1; \nonumber \\ y_N^*= & {} \frac{\alpha +\beta (N-1)}{\sigma } y_{N-1}^*. \end{aligned}$$
(5.123)

For the functions \(F_i\), we obtain the system of equations

$$\begin{aligned} \frac{dF_0}{dt}= & {} -(\alpha + \sigma ) F_0; \nonumber \\ \frac{dF_i}{dt}= & {} [\alpha +\beta (i-1)]F_{i-1} - (\alpha + \beta i + \sigma ) F_i, \ i=1,2,\dots ,N-1, \nonumber \\ \frac{dF_N}{dt}= & {} [\alpha +\beta (N-1)]F_{N-1} - \sigma F_N. \end{aligned}$$
(5.124)

The solutions of these equations are

$$\begin{aligned} F_0(t) = b_{00} \exp [-(\alpha + \sigma )t], \end{aligned}$$
(5.125)
$$\begin{aligned} F_1(t) = b_{10} \exp [-(\alpha + \sigma )t] + b_{11} \exp [-(\alpha + \beta + \sigma )t], \end{aligned}$$
(5.126)
$$ \dots $$
$$\begin{aligned} F_i(t) = \sum \limits _{j=0}^i b_{ij} \exp [-(\alpha + \beta j + \sigma )t]; \ i=1,2,\dots ,N-1, \end{aligned}$$
(5.127)
$$\begin{aligned} F_N(t) = \sum \limits _{j=0}^N b_{Nj} \exp [-(\alpha + \beta j + \sigma )t], \end{aligned}$$
(5.128)

where

$$\begin{aligned} b_{ij}= & {} \frac{\alpha + \beta (i-1)}{\beta (i-j)} b_{i-1,j}; \ i=1,\dots ,N-1; \ j=0,\dots , i-1; \nonumber \\ b_{Nj}= & {} - \frac{\alpha + \beta (N-1)}{\alpha + j \beta } b_{N-1,j}, \ j=0, \dots , N-1; \nonumber \\ b_{NN}= & {} 0. \end{aligned}$$
(5.129)

The \(b_{ij}\) that are not determined by (5.129) may be determined by the initial conditions. All exponents in \(F_i(t)\) are negative, and because of this, as \(t \rightarrow \infty \), we have \(F_i(t) \rightarrow 0\), and the system approaches the stationary distribution (5.123). The form of this stationary distribution is

$$\begin{aligned} P(\zeta = i)= & {} \frac{a}{a+k} \frac{(k-1)^{[i]}}{(a+k)^{[i]}}; \ \ k^{[i]} = \frac{(k+i)!}{k!}; \ i=0,\dots ,N-1, \nonumber \\ P(\zeta = N)= & {} \frac{1}{a+k} \frac{(k-1)^{[N]}}{(a+k)^{[N-1]}}, \end{aligned}$$
(5.130)

with parameters \(k = \alpha /\beta \) and \(a=\sigma /\beta \).

The obtained distribution is called the truncated Waring distribution. The distribution (5.130) has a concentration of substance in the last (Nth) cell: the amount of substance that, in the nontruncated Waring distribution, is spread over the cells N, \(N+1\), \(\dots \) is collected in this single cell.
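The convergence claim for the truncated system can be observed numerically. The Python sketch below integrates the fraction equations (5.121) with the explicit Euler method (a crude scheme chosen only for brevity), starting with all substance in cell 0, and compares the result with the stationary solution (5.123). The parameter values, the number of cells, and the step size are hypothetical.

```python
def truncated_waring(alpha: float, beta: float, sigma: float, n: int) -> list[float]:
    """Stationary solution (5.123) of the truncated system with cells 0, ..., n."""
    y = [sigma / (sigma + alpha)]
    for i in range(1, n):
        y.append((alpha + beta * (i - 1)) / (alpha + beta * i + sigma) * y[-1])
    y.append((alpha + beta * (n - 1)) / sigma * y[-1])   # last cell has no outgoing transfer
    return y


def integrate_fractions(alpha: float, beta: float, sigma: float, n: int,
                        y_init: list[float], t_end: float, dt: float = 1e-3) -> list[float]:
    """Explicit Euler integration of the fraction equations (5.121)."""
    y = list(y_init)
    for _ in range(int(round(t_end / dt))):
        dy = [0.0] * (n + 1)
        dy[0] = sigma - (alpha + sigma) * y[0]
        for i in range(1, n):
            dy[i] = (alpha + beta * (i - 1)) * y[i - 1] - (alpha + beta * i + sigma) * y[i]
        dy[n] = (alpha + beta * (n - 1)) * y[n - 1] - sigma * y[n]
        y = [yi + dt * di for yi, di in zip(y, dy)]
    return y


if __name__ == "__main__":
    alpha, beta, sigma, n = 1.0, 0.5, 1.5, 10   # hypothetical parameters
    start = [1.0] + [0.0] * n                   # all substance initially in cell 0
    y_t = integrate_fractions(alpha, beta, sigma, n, start, t_end=20.0)
    y_star = truncated_waring(alpha, beta, sigma, n)
    for i, (u, v) in enumerate(zip(y_t, y_star)):
        print(i, round(u, 6), round(v, 6))
```

The last entry of `y_star` equals the tail mass \(\sum _{i \ge N} P(\zeta = i)\) of the nontruncated distribution with the same parameters, which is the concentration effect described above.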

8.2.4 A Nonstationary Birth Process. Negative Binomial Distribution, Papers, and Citations

Let us consider the nontruncated version of the Waring distribution. In addition, let us assume that the system is completely isolated from external influences. This means that no substance enters or leaves the system. Thus the amounts of the moving substances are

$$\begin{aligned} \sigma = 0; \ g_i = 0; \ f_i = (\alpha +\beta i)x_i; \ \frac{\alpha (t)}{\beta (t)} =N >0. \end{aligned}$$
(5.131)

The last of the above relationships allows \(\alpha \) and \(\beta \) to depend on time while their ratio remains fixed at N; the process is therefore nonstationary, since the substance flow can depend on time. The governing equations become

$$\begin{aligned} \frac{dy_0}{dt}= & {} -\beta (t) N y_0; \nonumber \\ \frac{dy_i}{dt}= & {} \beta (t) [(N+i-1) y_{i-1}- (N+i) y_i], \end{aligned}$$
(5.132)

with initial conditions \(y_i(0)=1\) if \(i=0\) and \(y_i(0)=0\) otherwise. What one needs is to obtain the distribution \(y_i=p(x(t)=i)\) connected to the process. We recall that \(p(x(t)=i)\) is the probability that an author in a system has published i papers in the period t. This distribution can be obtained from (5.132), and its form is very similar to the form of the distribution obtained on the basis of the model of death process above [4]:

$$\begin{aligned} p(x(t) = k) = \left( {\begin{array}{c}N+k-1\\ k\end{array}}\right) \exp [-N \rho (t)] \{ 1- \exp [-\rho (t)] \}^k, \end{aligned}$$
(5.133)

where \(\rho (t) = \int \limits _0^t d \tau \ \beta (\tau )\). Equation (5.133) is a negative binomial distribution. In addition to the probability p(x(t)), one can also define transition probabilities \(p_{i,k}(s,t)\): the probability that at time t, the substance is in the kth unit if at time \(s<t\) it was in the ith unit. From the point of view of scientists and articles, \(p_{i,k}(s,t)\) is the probability that an author will own k articles at time t if at time s he/she owns \(i \le k\) articles. In this case, the evolution of the transition probability [144] is given by

$$\begin{aligned} \frac{\partial p_{i,k}(s,t)}{\partial t} = \beta (t) [(N+k-1) p_{i,k-1}(s,t) - (N+k) p_{i,k}(s,t)], \end{aligned}$$
(5.134)

with initial conditions \(p_{i,k}(s,s) = 1\) if \(k=i\) and \(p_{i,k}(s,s) = 0\) otherwise.
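As a numerical cross-check of (5.133), one can integrate the birth-process equations \(dy_0/dt = -\beta (t) N y_0\), \(dy_i/dt = \beta (t)[(N+i-1)y_{i-1} - (N+i)y_i]\) directly and compare the result with the negative binomial closed form. The following Python sketch assumes, purely for illustration, a hypothetical intensity \(\beta (t) = 1/(1+t)\), so that \(\rho (t) = \ln (1+t)\); the truncation level K, the step count, and the fixed-step Runge–Kutta integrator are our own choices, not part of the model.

```python
import math

def beta(t):
    """Hypothetical intensity beta(t) = 1/(1+t), chosen only for illustration."""
    return 1.0 / (1.0 + t)

def rho(t):
    """rho(t) = integral of beta from 0 to t, i.e. ln(1+t) for the beta above."""
    return math.log1p(t)

def rhs(t, y, N):
    """Right-hand side of the birth-process equations for y_0, ..., y_K."""
    b = beta(t)
    dy = [-b * N * y[0]]
    for i in range(1, len(y)):
        dy.append(b * ((N + i - 1) * y[i - 1] - (N + i) * y[i]))
    return dy

def integrate(N, t_end, K=30, steps=4000):
    """Classical fourth-order Runge-Kutta from y_i(0) = 1 if i == 0 else 0."""
    y = [1.0] + [0.0] * K
    h = t_end / steps
    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, y, N)
        k2 = rhs(t + h / 2, [yi + h / 2 * d for yi, d in zip(y, k1)], N)
        k3 = rhs(t + h / 2, [yi + h / 2 * d for yi, d in zip(y, k2)], N)
        k4 = rhs(t + h, [yi + h * d for yi, d in zip(y, k3)], N)
        y = [yi + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for yi, d1, d2, d3, d4 in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def negbin_closed_form(k, N, t):
    """p(x(t) = k) from (5.133); the binomial coefficient is computed via lgamma."""
    r = rho(t)
    log_binom = math.lgamma(N + k) - math.lgamma(k + 1) - math.lgamma(N)
    return math.exp(log_binom - N * r + k * math.log1p(-math.exp(-r)))

y = integrate(N=2.5, t_end=3.0)
for k in range(6):
    print(k, y[k], negbin_closed_form(k, 2.5, 3.0))
```

Because the equation for \(y_i\) involves only \(y_{i-1}\) and \(y_i\), truncating the system at K does not change the first K+1 components, so the agreement is limited only by the step size of the integrator.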

Citations are repetitive events exactly like papers. Thus the discussion of the nonstationary birth process connected to papers carries over unchanged to the nonstationary birth process connected to citations. In the first case, we have a scientist who publishes papers; in the second case, we have a paper that receives citations. Then (5.133) gives the probability that a paper will have received k citations at time t, and (5.134) governs the transition probability that a paper will have received k citations at time t if it had i citations at time s. The distribution connected to the transition probability \(p_{i,k}\) is also a negative binomial distribution. In more detail, the probability that a paper receives j citations during the time interval from s to t, given that it had received i citations up to time s, \(p_{i,j}(s,t) = p[x(t)-x(s)=j \mid x(s)=i]\), is

$$\begin{aligned} p_{i,j}(s,t) = \left( {\begin{array}{c}N+i+j-1\\ j\end{array}}\right) \exp \{-[\rho (t)-\rho (s)](N+i)\} (1 - \exp \{-[\rho (t)-\rho (s)]\})^j, \end{aligned}$$
(5.135)

i.e., the substance flow during the time period \(t-s\) has a negative binomial distribution with parameters \(\exp [-\rho (t)+\rho (s)]\) and \(N+i\), where i is the index of the unit that was reached by the substance at time s [143, 144, 150].

With respect to the aging of scientific information, it is important to study the mean value function \(M_i(s,t)\). It shows that the number of citations that a paper is expected to receive during the period from s to t is a linear function of the number i of citations that it received up to time s:

$$\begin{aligned} M_i(s,t) = E[x(t)-x(s) \mid x(s)=i] = (N+i)\{\exp [\rho (t) - \rho (s)] -1 \} = c_s(t)i+d_s(t). \end{aligned}$$
(5.136)

We note that \(\frac{d_s(t)}{c_s(t)}=N = \text {const}\) is independent of time, and \(c_s(t) = \exp [\rho (t)-\rho (s)]-1\) is a characteristic of the aging process. Large \(c_s(t)\) characterizes slowly aging literature.

Let us define

$$\begin{aligned} M(s,t) = E[x(t)-x(s)] = N \{\exp [\rho (t)] - \exp [\rho (s)]\} \end{aligned}$$
(5.137)

and

$$\begin{aligned} q(s,t) = \frac{E[x(s)+N]}{E[x(t)+N]}. \end{aligned}$$
(5.138)

Then (5.135) can be written as

$$\begin{aligned} p_{i,j}(s,t) = \left( {\begin{array}{c}N+i+j-1\\ j\end{array}}\right) q(s,t)^{N+i}[1-q(s,t)]^j, \end{aligned}$$
(5.139)

and the expected number of citations received during the period from s to t by a paper that has received i citations up to time s is

$$\begin{aligned} M_i(s,t)=(N+i) \frac{E[x(t)-x(s)]}{E[x(s)]+N}. \end{aligned}$$
(5.140)

Finally, from (5.139), one obtains that the probability that an article that has received \(i \ge 0\) citations up to time s will receive no further citations up to time t is

$$\begin{aligned} p_{i,0}(s,t) = p[x(t)-x(s)=0 \mid x(s) = i ] = q(s,t)^{N+i}. \end{aligned}$$
(5.141)
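The conditional distribution (5.139) is easy to evaluate once \(q(s,t) = \exp \{-[\rho (t)-\rho (s)]\}\) is known. The Python sketch below checks, for illustrative values of N, i, and q (the function names are ours), that (5.139) is normalized, that its mean equals \((N+i)(1/q-1)\) as in (5.136), and that \(p_{i,0}(s,t) = q^{N+i}\) as in (5.141).

```python
import math

def p_increment(j, i, N, q):
    """p_{i,j}(s,t) from (5.139): probability of j new citations, given i at time s."""
    log_binom = math.lgamma(N + i + j) - math.lgamma(j + 1) - math.lgamma(N + i)
    return math.exp(log_binom + (N + i) * math.log(q) + j * math.log1p(-q))

def mean_increment(i, N, q):
    """M_i(s,t) = (N + i) * (1/q - 1), i.e. (5.136) with q = exp(-[rho(t) - rho(s)])."""
    return (N + i) * (1.0 / q - 1.0)

N, i, q = 1.7, 4, 0.6
probs = [p_increment(j, i, N, q) for j in range(400)]   # tail beyond 400 is negligible
print(sum(probs), sum(j * p for j, p in enumerate(probs)), mean_increment(i, N, q))
```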

The lifetime distribution of a process \(\{X(t)\}\) is defined by

$$\begin{aligned} F(t) = \frac{M(0,t)}{M(0,\infty )}, \ \ t \ge 0. \end{aligned}$$
(5.142)

Let us choose the following particular form of \(f_i\) [151]:

$$\begin{aligned} f_i = (N+i) \alpha ^* \beta ^* \exp (-\alpha ^* t) x_i = \beta ^* N (1+i/N) \alpha ^* \exp (-\alpha ^*t) x_i, \ N>0, \ \alpha ^*>0, \beta ^* >0. \end{aligned}$$
(5.143)

The time-invariant part of \(f_i\) is proportional to \(1+i/N\), and because of this, the rate of transfer from the ith cell to the \((i+1)\)th cell increases with i (which can be considered a local form of the cumulative advantage principle). The time-dependent component of \(f_i\) reflects the local exponential aging of the process (aging of the content relative to an individual unit). In this case \(\beta (t) = \alpha ^* \beta ^* \exp (-\alpha ^* t)\) and \(\rho (t) = \beta ^*[1-\exp (-\alpha ^* t)]\). Then

$$\begin{aligned} M(s,t) = N \{\exp [\beta ^*(1-\exp (-\alpha ^*t))] - \exp [\beta ^*(1-\exp (-\alpha ^*s))] \} \end{aligned}$$
(5.144)

and

$$\begin{aligned} F(t) = \frac{\exp [\beta ^*(1-\exp [-\alpha ^*t])] -1}{\exp (\beta ^*)-1}. \end{aligned}$$
(5.145)

Finally, let us discuss the particular case in which the model describes articles that obtain citations. One can define the obsolescence function H(s): the probability that a paper will not be cited after a given time s. The definition is

$$\begin{aligned} H(s) = p(x(\infty ) - x(s) =0). \end{aligned}$$
(5.146)

The obsolescence function for our particular case is

$$\begin{aligned} H(s) = \{1+ \exp (\beta ^*) - \exp [\beta ^*(1-\exp (-\alpha ^*s))] \}^{-N}. \end{aligned}$$
(5.147)

We note that \(H(\infty ) = 1\), i.e., at infinity, every publication is obsolete. We have \(H(0) = \exp (-\beta ^*N)\), i.e., the probability that a paper is already obsolete at the moment it is published equals the probability that it will never be cited.
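The closed form (5.147) can be cross-checked against the definition (5.146). Conditioning on x(s) and using (5.141) with \(t \rightarrow \infty \) gives \(H(s) = E[q(s,\infty )^{N+x(s)}]\), where x(s) has the negative binomial distribution (5.133) and, for the intensity \(\beta (t) = \alpha ^*\beta ^*\exp (-\alpha ^* t)\) implied by (5.143), \(\rho (t) = \beta ^*[1-\exp (-\alpha ^* t)]\). The Python sketch below evaluates this expectation as a truncated series and compares it with (5.147); the parameter values and the truncation level are arbitrary illustrations.

```python
import math

def rho(t, a_star, b_star):
    """rho(t) = b* (1 - exp(-a* t)) for beta(t) = a* b* exp(-a* t), cf. (5.143)."""
    return b_star * (1.0 - math.exp(-a_star * t))

def H_closed(s, N, a_star, b_star):
    """Obsolescence function (5.147)."""
    return (1.0 + math.exp(b_star) - math.exp(rho(s, a_star, b_star))) ** (-N)

def H_series(s, N, a_star, b_star, kmax=2000):
    """H(s) = sum_i P(x(s) = i) * q(s, inf)^(N + i), truncated at kmax."""
    r_s = rho(s, a_star, b_star)
    log_q = -(b_star - r_s)                       # ln q(s, infinity); rho(inf) = b*
    if r_s == 0.0:                                # x(0) = 0 with probability 1
        return math.exp(N * log_q)
    total = 0.0
    for i in range(kmax + 1):
        # log of the negative binomial probability (5.133) at time s
        log_p = (math.lgamma(N + i) - math.lgamma(i + 1) - math.lgamma(N)
                 - N * r_s + i * math.log1p(-math.exp(-r_s)))
        total += math.exp(log_p + (N + i) * log_q)
    return total

for s in (0.5, 2.0, 10.0):
    print(s, H_series(s, 1.5, 0.3, 2.0), H_closed(s, 1.5, 0.3, 2.0))
```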

8.2.5 A Case of Brain Drain: Migration Channel for Research Personnel

Let us now discuss one application of the truncated Waring distribution. We consider a sequence of \(N+1\) countries that form a channel. As a result of a large migration movement, a flow of researchers moves through this channel from the country of entrance to the final destination country, which is attractive to them in terms of good conditions for life and work. We may assume, for example, a situation of war in some region and the motion of a large group of researchers from that region to another (more attractive) region. The motion starts from an entry country, and the researchers have to move through a sequence of countries in order to reach the final destination country, which is very attractive from the point of view of the researchers. We may think about the sequence of countries as a sequence of boxes (cells). The entry country will be the box with label 0, and the final destination country will be the box with label N. Let us consider a number x of researchers that have entered the channel and are distributed among the countries. Let \(x_i\) be the number of researchers in the ith country. This number can change on the basis of the following three processes: (a) a number s of researchers enter the channel from the external environment through the country of entrance (0th cell); (b) a number \(f_i\) of researchers move from the ith country to the \((i+1)\)th country; (c) a number \(g_i\) of the researchers change their status (e.g., they do not move farther in the direction of the final destination country, and they are no longer active in the field of research). For the case of a large number of migrating researchers, the values of \(x_i\) can be determined by (5.103).
The relationships (5.104) mean that (a) the number of researchers s that enter the channel is proportional to the number of researchers in all countries that form the channel; (b) there may be a preference for some countries, e.g., migrants may prefer the countries near the end of the migration channel (and the final destination country may be the most preferred one); (c) the conditions along the channel are assumed to be the same with respect to “leakage” of researchers, i.e., the same proportion \(\gamma \) of researchers move out of the area of research work in every country of the channel.

As can be seen from (5.107), the change in the number of researchers depends on the values of \(\sigma \) and \(\gamma \). If \(\sigma > \gamma \), the number of researchers in the channel increases exponentially. If \(\sigma < \gamma \), the number of researchers in the channel decreases exponentially. The dynamics of the distribution of the researchers in the channel is modeled by (5.108). When the time since the beginning of the operation of the channel becomes large enough, the distribution of the researchers in the countries that form the migration channel becomes close to the stationary distribution described by (5.110). Let us stress that the stationary distribution described by (5.110) is very similar to the Waring distribution, but there is a significant difference between the two distributions due to the finite length of the migration channel: there may be a large concentration of researchers in the final destination country, especially if this country is very attractive for researchers.

The parameters that govern the distribution of researchers in the countries that form the channel are \(\sigma \), \(\alpha \), \(\beta \), and \(\gamma \). The parameter \(\sigma \) is the “gate” parameter, since it regulates the number of researchers that enter the channel. If \(\sigma \) is large, then the number of researchers in the channel may increase very rapidly, and this can lead to problems in the corresponding countries. We note that \(\sigma \) appears in each term of the truncated Waring distribution. This means that the situation at the entrance of the migration channel significantly influences the distribution of researchers in the countries of the channel.

The parameter \(\gamma \) regulates the “absorption” of the channel, since it regulates the change of status of some researchers: they may settle in the corresponding country and accept a job outside their field of research. A large value of \(\gamma \) may compensate for the value of \(\sigma \) and may even lead to a decrease in the number of researchers in the channel. The parameter \(\alpha \) regulates the motion of the researchers from one country to the next country of the channel. A small value of \(\alpha \) means that the researchers tend to concentrate in the entry country (and possibly in the second country of the channel). An increase in \(\alpha \) leads to an increase in the proportion of researchers that reach the second half of the migration channel and especially the final destination country.

The parameter \(\beta \) regulates the attractiveness of the countries along the channel. Large values of \(\beta \) mean that the final destination country is very attractive to researchers (e.g., it has excellent conditions for work, and the salaries are high). This increases the attractiveness of the countries in the second half of the channel (researchers are more eager to reach these countries, because the remaining distance to the final destination country is shorter). If for some reason \(\beta \) is kept at a high value, then almost all the researchers may settle in the final destination country.

8.2.6 Multivariate Waring Distribution

One can define the multivariate Waring distribution as follows [152]. Let a and b be positive real numbers. Let \(a^{(k)} = \frac{\varGamma (a+k)}{\varGamma (a)}\) denote the rising factorial, where \(\varGamma (x) = \int \limits _0^\infty dt \ \exp (-t) t^{x-1}\) is the gamma function [153]. Let \(p(x_1=k_1,\dots ,x_n=k_n;a,b_1,\dots ,b_n)\) be the probability that \(x_1=k_1,\dots ,x_n=k_n\) with parameters \(a,b_1,\dots ,b_n\). The multivariate Waring distribution is given by the relationship

$$\begin{aligned} p(x_1=k_1,\dots ,x_n=k_n;a,b_1,\dots ,b_n) = \nonumber \\ a \frac{\varGamma \left( \sum \limits _{i=1}^n k_i -n+1 \right) \varGamma \left( \sum \limits _{i=1}^n b_i + a\right) }{\varGamma \left( \sum \limits _{i=1}^n k_i + \sum \limits _{i=1}^n b_i - n+a +1\right) } \prod \limits _{i=1}^n \frac{\varGamma (k_i + b_i -1)}{\varGamma (k_i) \varGamma (b_i)}, \end{aligned}$$
(5.148)

where \(k_i = 1,2,\dots \) for \(i=1,\dots ,n\), and a and \(b_i\) are positive real numbers. For \(n=1\), the multivariate Waring distribution reduces to the univariate Waring distribution

$$\begin{aligned} p(x=k;a,b) = a \frac{\varGamma (b+k-1) \varGamma (a+b)}{\varGamma (b) \varGamma (a+b+k)}. \end{aligned}$$
(5.149)

Let \(a^{(b)} = \frac{\varGamma (a+b)}{\varGamma (a)}\). Then the univariate form of the Waring distribution can be written as

$$\begin{aligned} p(x=k;a,b) = a \frac{b^{(k-1)}}{(a+b)^{(k)}}. \end{aligned}$$
(5.150)

Two interesting properties of the multivariate Waring distribution are as follows:

  1.

    Let the multivariate random variable \((x_1,\dots ,x_n)\) follow the multivariate Waring distribution (5.148) with \(a>n\). Then the product moment \(E(x_1 x_2 \cdots x_n)\), denoted below by \(E(x_1,\dots ,x_n)\), is

    $$\begin{aligned} E(x_1,\dots ,x_n) = a \int \limits _0^1 dx \ (1-x)^{a-n-1} \prod \limits _{i=1}^n(1-x+b_i x). \end{aligned}$$
    (5.151)
  2.

    Every marginal distribution of the multivariate Waring distribution is also a (multivariate) Waring distribution:

    $$\begin{aligned} \sum \limits _{k_{s+1}=1}^\infty \dots \sum \limits _{k_n=1}^\infty p(x_1 = k_1, \dots ,x_s = k_s,x_{s+1}=k_{s+1}, \dots , x_n = k_n; a,b_1,\dots ,b_n) = \nonumber \\ p(x_1 = k_1, \dots ,x_s = k_s; a,b_1,\dots ,b_s). \end{aligned}$$
    (5.152)

The simplest case of the multivariate Waring distribution is the bivariate Waring distribution

$$\begin{aligned} p(x=k,y=j;a,b,c) = a \frac{(k+j-2)!\, b^{(k-1)}c^{(j-1)}}{(a+b+c)^{(k+j-1)}(k-1)!\,(j-1)!}, \end{aligned}$$
(5.153)

with product moment (the expected value of the product xy)

$$\begin{aligned} E(x,y) = 1 + \frac{b+c}{a-1} + \frac{2bc}{(a-1)(a-2)} \end{aligned}$$
(5.154)

and covariance

$$\begin{aligned} \text {Cov}(x,y) = 1 + \frac{b+c}{a-1} + \frac{2bc}{(a-1)(a-2)}- \left( 1 + \frac{b}{a-1} \right) \left( 1+ \frac{c}{a-1} \right) . \end{aligned}$$
(5.155)

If (x, y) follows the bivariate Waring distribution, then the conditional probability \(p(x=k \mid y=m)\) is

$$\begin{aligned} p(x=k\mid y=m) = \frac{1}{(k-1)!} \frac{(a+c)^{(b)}}{(a+c+m)^{(b)}} \frac{b^{(k-1)} m^{(k-1)}}{(a+b+c+m)^{(k-1)}}, \end{aligned}$$
(5.156)

and the conditional expectation \(E(x \mid y=m)\) is

$$\begin{aligned} E(x \mid y=m) = 1+ \frac{b}{a+c-1} m. \end{aligned}$$
(5.157)
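The bivariate formulas can be checked numerically by summing the probability function (5.153) over a truncated grid. The Python sketch below (function names and parameter values are illustrative; a is taken fairly large so that the heavy tails are negligible at the truncation level) verifies the normalization, the product moment (5.154), the marginal property (5.152) with the univariate form (5.150), and the conditional expectation (5.157), using the fact that the marginal of y is, by symmetry, the Waring distribution with parameters a and c.

```python
from math import exp, lgamma

def log_rising(x, k):
    """log of the rising factorial x^{(k)} = Gamma(x + k) / Gamma(x)."""
    return lgamma(x + k) - lgamma(x)

def waring_pmf(k, a, b):
    """Univariate Waring distribution (5.150) on k = 1, 2, ..."""
    return a * exp(log_rising(b, k - 1) - log_rising(a + b, k))

def bivariate_waring_pmf(k, j, a, b, c):
    """Bivariate Waring distribution (5.153) on k, j = 1, 2, ..."""
    return a * exp(lgamma(k + j - 1)                   # log (k + j - 2)!
                   + log_rising(b, k - 1) + log_rising(c, j - 1)
                   - log_rising(a + b + c, k + j - 1)
                   - lgamma(k) - lgamma(j))            # log (k-1)! (j-1)!

def moments(a, b, c, kmax=300):
    """Truncated sums of the total mass and of E(xy)."""
    total = exy = 0.0
    for k in range(1, kmax + 1):
        for j in range(1, kmax + 1):
            p = bivariate_waring_pmf(k, j, a, b, c)
            total += p
            exy += k * j * p
    return total, exy

a, b, c = 8.0, 1.5, 2.0
total, exy = moments(a, b, c)
closed = 1 + (b + c) / (a - 1) + 2 * b * c / ((a - 1) * (a - 2))   # (5.154)
print(total, exy, closed)
```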

The multivariate Waring distribution was applied to the study of scientific productivity among authors in six main Chinese journals of information science during the three-year periods 1987–1989 and 1990–1992 [152].

8.3 Quantities Connected to the Age of Citations

After the publication of an article, some time elapses before the article is cited. Let T be the time between the publication of the article and the publication of the citing source. In general, T is a random variable, and one can study the distribution of the time to the first citation [115] or to the nth citation [125]. Here we mention several quantities connected to the time of the first citation (these quantities can also be applied to the time of the second citation, etc.) [154]. Let us assume that T is a continuous quantity, and let f(t) be the probability density function of the distribution of T. Then one can define the age-specific citation rate

$$\begin{aligned} r(t) = - \frac{d}{dt} [\ln R(t)], \end{aligned}$$
(5.158)

where \(R(t) = R_T(t) = p(T>t)\) is called the reliability function of T (here \(p(T>t)\) means the probability that \(T>t\)); it is connected to the probability density by

$$ f(t) = -\frac{dR}{dt}. $$

From (5.158), it follows that

$$\begin{aligned} R(t) = \exp \left( - \int \limits _0^t ds r(s) \right) . \end{aligned}$$
(5.159)

Assuming different kinds of distributions for f(t), we can obtain the corresponding relationship for the age-specific citation rate. Since citations (in most cases) can be considered rare events, we can use distributions connected to the theory of extreme events, such as the following:

  • The exponential distribution \(f(t) = \lambda \exp (-\lambda t)\). In this case, \(R(t) = \exp (-\lambda t)\) and

    $$\begin{aligned} r(t) = \lambda . \end{aligned}$$
    (5.160)

    Thus a constant age-specific citation rate implies an exponential distribution of the citation age.

  • The Weibull distribution of citation age T with shape parameter \(\beta >0\) and scale parameter \(\alpha >0\). Here the reliability function is \(R(t)=\exp [-(t/\alpha )^\beta ]\), and the age-specific citation rate is

    $$\begin{aligned} r(t) = \frac{\beta t^{\beta -1}}{\alpha ^\beta }. \end{aligned}$$
    (5.161)
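The relation between (5.158) and (5.159) can be illustrated numerically. The Python sketch below recovers r(t) from R(t) by a central finite difference and R(t) from r(t) by the trapezoidal rule, and compares the results with (5.160) and (5.161); the step sizes and parameter values are illustrative choices, and the trapezoidal routine assumes that r(0) is finite (for the Weibull case, \(\beta \ge 1\)).

```python
import math

def hazard_from_survival(R, t, dt=1e-6):
    """Age-specific citation rate r(t) = -d/dt ln R(t), via a central difference."""
    return -(math.log(R(t + dt)) - math.log(R(t - dt))) / (2.0 * dt)

def survival_from_hazard(r, t, steps=10_000):
    """R(t) = exp(-integral_0^t r(s) ds), via the trapezoidal rule (needs finite r(0))."""
    h = t / steps
    integral = h * (0.5 * r(0.0) + sum(r(i * h) for i in range(1, steps)) + 0.5 * r(t))
    return math.exp(-integral)

def exponential_R(lam):
    """Reliability function of the exponential distribution."""
    return lambda t: math.exp(-lam * t)

def weibull_R(alpha, beta):
    """Reliability function R(t) = exp[-(t/alpha)^beta] of the Weibull distribution."""
    return lambda t: math.exp(-((t / alpha) ** beta))

def weibull_rate(alpha, beta):
    """Closed form (5.161): r(t) = beta * t^(beta-1) / alpha^beta."""
    return lambda t: beta * t ** (beta - 1) / alpha ** beta

print(hazard_from_survival(exponential_R(0.5), 2.0))            # constant rate, (5.160)
print(hazard_from_survival(weibull_R(3.0, 2.0), 1.5), weibull_rate(3.0, 2.0)(1.5))
```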

9 Probability Models Connected to Research Dynamics

9.1 Variation Approach to Scientific Production

The occurrence of laws in the form of hyperbolic relationships (such as the laws of Zipf and Pareto) and the persistence of such laws may lead to the following assumption:

A research organization is in an equilibrium state with respect to scientific production if the statistical laws for the characteristic quantities of this productivity are given by hyperbolic relationships.

We can extend the above assumption further by assuming that the parameters of the statistical laws have selected values (for example, \(\alpha =1\)) when the research organization is in an equilibrium state. Conversely, if the distributions of the quantities are not described by the appropriate hyperbolic relationships, then the research organization (and its structure and system of functioning) may not be in an equilibrium state.

Equilibrium states of various systems may be studied by variational methods [155]. A hint at the possible applicability of a variational approach in the social sciences is connected to George Zipf, who explained what is now known as Zipf’s law in the field of linguistics [156] by means of the principle of least effort:

Human communication is based on two opposite tendencies: the one who speaks tries to use the minimum number of words, and the one who hears tries to understand the speaker by investing minimal effort.

Let the effort E(x) of a researcher to produce x publications be proportional to the time he or she invests in research: \(E(x) \propto t\). There is a law for an exponentially growing science which states that scientific production grows exponentially with invested time: \(x(t)=\exp (\lambda t)\), where \(\lambda \) is a parameter. Hence \(t= \frac{1}{\lambda } \ln (x)\) and

$$\begin{aligned} E(x) \propto \frac{1}{\lambda } \ln (x) = \rho \ln (x). \end{aligned}$$
(5.162)

This relationship will be used below in the variational principle of Boltzmann [104, 157].

The principle of maximum entropy (variational principle of Boltzmann) applies to systems whose states x are distributed with probability density p(x) (\(\int dx \ p(x)=1\)). It states that at an equilibrium state with fixed mean energy

$$\begin{aligned} E = \int dx \ p(x) E(x), \end{aligned}$$
(5.163)

the entropy

$$\begin{aligned} H = - \int dx \ p(x) \ln [p(x)] \end{aligned}$$
(5.164)

has a maximum value.

The function p(x) above is the probability (density) that a researcher has produced x publications, E(x) from (5.162) is the effort needed to produce x publications, and E is the mean effort (mean “energy”) spent in the course of the scientific work. The solution of the above variational problem is

$$\begin{aligned} p(x)=(1/Z) \exp [-\lambda ^* E(x)] = (1/Z) (1/x^{\rho \lambda ^*}), \end{aligned}$$
(5.165)

where Z is the statistical sum and \(\lambda ^*\) is a parameter that can be determined from the normalization condition and the boundary condition.

Here we take the lowest state to be \(x_0=1\) (a researcher must have at least one publication). Then

$$\begin{aligned} E = \int \limits _{1}^\infty dx \ p(x) E(x) \end{aligned}$$
(5.166)

and

$$\begin{aligned} p(x) = (\rho /E) 1/(x^{1+ \rho /E}) = \alpha /(x^{1+\alpha }); \ \ \alpha = (\rho /E). \end{aligned}$$
(5.167)

This is the law of Pareto (also called the Zipf–Pareto law).

The entropy of a system that obeys the law (5.167) is

$$\begin{aligned} H = - \int \limits _1^\infty dx \ p(x) \ln [p(x)] = 1 + \frac{1}{\alpha }-\ln (\alpha ). \end{aligned}$$
(5.168)
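The entropy (5.168) and the mean effort \(E = \rho /\alpha \) of the Zipf–Pareto law (5.167) can be checked by numerical quadrature. The Python sketch below substitutes \(x = e^u\), which turns \(p(x)\,dx\) into the exponential density \(\alpha e^{-\alpha u}\,du\) and \(E(x)=\rho \ln (x)\) into \(\rho u\), so that the integrals over \([1,\infty )\) become integrals over \([0,\infty )\); the cutoff and the grid size are illustrative choices.

```python
import math

def pareto_entropy_and_effort(alpha, rho, u_max=None, steps=100_000):
    """Numerically evaluate H from (5.168) and the mean effort E from (5.166).

    After x = exp(u): p(x) dx = alpha*exp(-alpha*u) du,
    ln p(x) = ln(alpha) - (1 + alpha)*u, and E(x) = rho*u.
    """
    if u_max is None:
        u_max = 60.0 / alpha                     # exp(-60): neglected tail is negligible
    h = u_max / steps
    H = E = 0.0
    for n in range(steps + 1):
        u = n * h
        w = 0.5 if n in (0, steps) else 1.0      # trapezoidal weights
        dens = alpha * math.exp(-alpha * u)      # density in the variable u
        H += w * dens * -(math.log(alpha) - (1.0 + alpha) * u)
        E += w * dens * rho * u
    return H * h, E * h

H, E = pareto_entropy_and_effort(alpha=0.5, rho=1.0)
print(H, 1 + 1 / 0.5 - math.log(0.5), E, 1.0 / 0.5)
```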

“Temperature”: The analogy with thermodynamics may be continued: one may introduce a quantity called “temperature.” This quantity is a measure of the external influence on the scientific system.

“Temperature” can be introduced by comparing the results for Lagrange multipliers in statistical mechanics (where \(\lambda ^* \propto 1/T\)) with the case of scientific production (where \(\lambda ^* = (1+\alpha )/\rho \)). Thus the “temperature” is

$$\begin{aligned} T \propto \frac{\rho }{1+\alpha }. \end{aligned}$$
(5.169)

Using (5.169), we can write the Zipf–Pareto law (5.167) as

$$\begin{aligned} p(x) = \frac{\alpha }{x^{k \rho /T}}, \end{aligned}$$
(5.170)

where k is a coefficient of proportionality. From (5.169), \(1+\alpha = \frac{k \rho }{T}\), i.e., \(\alpha = \frac{k \rho }{T}-1\) (so that \(\alpha >0\) requires \(k\rho /T>1\)), and the final form of the Zipf–Pareto law (5.170) is

$$\begin{aligned} p(x) = \frac{\frac{k \rho }{T}-1}{x^{k \rho /T}}. \end{aligned}$$
(5.171)

There are two parameters in (5.171):

  • k: characteristic of the efforts of the researcher in the publication process. These efforts can depend on the talent of the researcher but also on the conditions of work, salary, etc. Increasing research efforts lead to a decreasing value of k.

  • T: characteristic of external influence on research organization. The parameter T can be connected to different flows toward the scientific structures (e.g., to money flows). Then if the money flow increases, the system is “heated,” and if the money flow decreases, the system is “frozen.”

Let us analyze (5.171). We shall see the role of better work conditions and increased funding in increasing research production. Both \(\rho \) and T enter (5.171) only through the exponent \(k\rho /T = 1+\alpha \), and

$$ \frac{\partial p}{\partial \alpha } = \frac{1-\alpha \ln (x)}{x^{1+\alpha }}. $$

Thus a decrease in \(\alpha \) decreases p(x) for \(x < \exp (1/\alpha )\) and increases p(x) for \(x > \exp (1/\alpha )\), i.e., it shifts probability toward researchers with many publications.

  1.

    Let us fix the number of publications x. Thus we can study the influence of \(\rho \) and T. Let us fix also T (for example, a fixed quantity of money flows to the scientific organization, and other external conditions are fixed). Then a decrease in \(\rho \) will decrease the exponent \(k\rho /T\), i.e., it will decrease \(\alpha \). Hence p will increase for every \(x > \exp (1/\alpha )\). This means that initiatives to decrease the necessary expenditures of effort by researchers in the publication production process (for example, an initiative for better work conditions or better social networking in the research organization) may increase the probability that a researcher will have a large number of publications.

  2.

    Let us now fix x and \(\rho \) and increase T (for example, by increasing the money flow toward the research organization). The exponent \(k\rho /T\), and with it \(\alpha \), decreases. Thus p increases for every \(x > \exp (1/\alpha )\), which means that one can expect that research production will increase with increased funding.
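The effect of a change of the exponent on the Zipf–Pareto density \(p(x) = \alpha /x^{1+\alpha }\) of (5.167) can be tabulated directly. The Python sketch below (illustrative values of \(\alpha \); the helper names are ours) compares p(x) for \(\alpha = 1\) and \(\alpha = 0.8\) and evaluates the analytic derivative \(\partial p/\partial \alpha = x^{-(1+\alpha )}[1-\alpha \ln (x)]\), whose sign changes at \(x = \exp (1/\alpha )\): a smaller \(\alpha \) lowers p(x) for small x and raises it in the tail.

```python
import math

def zipf_pareto(x, alpha):
    """Zipf-Pareto density (5.167): p(x) = alpha / x**(1 + alpha) on x >= 1."""
    return alpha / x ** (1.0 + alpha)

def dp_dalpha(x, alpha):
    """Analytic derivative of p with respect to alpha."""
    return x ** -(1.0 + alpha) * (1.0 - alpha * math.log(x))

def crossover(alpha):
    """x above which a small decrease of alpha increases p(x): exp(1/alpha)."""
    return math.exp(1.0 / alpha)

if __name__ == "__main__":
    for x in (1.5, 2.0, 3.0, 10.0, 100.0):
        print(f"x={x:6.1f}  p(alpha=1.0)={zipf_pareto(x, 1.0):.5f}  "
              f"p(alpha=0.8)={zipf_pareto(x, 0.8):.5f}")
```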

Finally, let us note that thermodynamic models are also used in other areas of science such as technological forecasting and the theory of manpower systems [158, 159].

The variational approach can also be applied to discrete distributions (e.g., for studying the circulation of documents) [160]. Let us consider a finite probability distribution \(P=\{p_1,\dots ,p_n \}\), where \(p_i \ge 0\) for \(i=1,\dots ,n\) and \(\sum \limits _{i=1}^n p_i =1\). The entropy attached to this probability distribution is

$$\begin{aligned} H_n(P)=- \sum \limits _{i=1}^n p_i \ln (p_i). \end{aligned}$$
(5.172)

The entropy is a measure of uncertainty. It is maximal (equal to \(\ln (n)\)) when all outcomes are equally likely, i.e., the uniform distribution contains the largest amount of uncertainty.

Let X be a random variable with values in \(\{1, \dots ,n \}\), and let \(p_i\) be the probability of the value i. We have the constraint

$$\begin{aligned} \sum \limits _{i=1}^n p_i =1, \end{aligned}$$
(5.173)

and we impose an additional constraint on the expected value of X:

$$\begin{aligned} E(X) = \sum \limits _{i=1}^n i p_i = \mu . \end{aligned}$$
(5.174)

According to the principle of maximum entropy, we have to find the distribution P that maximizes the entropy (5.172) subject to the constraints (5.173) and (5.174). Introducing two Lagrange multipliers \(\alpha \) and \(\beta \), we have to find a maximum for the functional

$$\begin{aligned} L = H_n(P) - \alpha \left( \sum \limits _{i=1}^n p_i -1 \right) - \beta \left( \sum \limits _{i=1}^n i p_i - E(X) \right) . \end{aligned}$$
(5.175)

The necessary conditions for an extremum of L from (5.175) are that the following partial derivatives vanish:

$$\begin{aligned} \partial L/\partial p_i= & {} - \ln (p_i) - 1 - \alpha - \beta i; \ \ i=1,\dots , n ,\nonumber \\ \partial L /\partial \alpha= & {} 1 - \sum \limits _{i=1}^n p_i ,\nonumber \\ \partial L /\partial \beta= & {} E(X) - \sum \limits _{i=1}^n i p_i. \end{aligned}$$
(5.176)

The solution of these equations is

$$\begin{aligned} p_i = \frac{\exp (-\beta _0 i)}{\sum _{j=1}^n \exp (-\beta _0 j)}, \end{aligned}$$
(5.177)

where \(\beta _0\) is the solution of the equation

$$\begin{aligned} \sum \limits _{i=1}^n [i-E(X)] \exp [-\beta _0 (i-E(X))] =0. \end{aligned}$$
(5.178)

A similar calculation can also be made for the case of more than two constraints.
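The maximum-entropy solution (5.177)–(5.178) is easy to compute: the mean of \(p_i \propto \exp (-\beta i)\) decreases monotonically in \(\beta \), so \(\beta _0\) can be found by bisection. The Python sketch below is one such implementation under our own choices (bracket doubling, bisection tolerance, exponent shifting to avoid overflow); the function names are hypothetical.

```python
import math

def maxent_distribution(n, mu, tol=1e-12):
    """Maximum-entropy distribution on {1, ..., n} with prescribed mean mu.

    Finds beta0 with sum_i (i - mu) exp(-beta0 * i) = 0 by bisection and
    returns (beta0, [p_1, ..., p_n]) with p_i proportional to exp(-beta0 * i).
    """
    if not 1 < mu < n:
        raise ValueError("mu must lie strictly between 1 and n")

    def probs(beta):
        exps = [-beta * i for i in range(1, n + 1)]
        m = max(exps)                        # shift exponents to avoid overflow
        w = [math.exp(e - m) for e in exps]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(beta):
        return sum(i * p for i, p in enumerate(probs(beta), start=1))

    lo, hi = -1.0, 1.0
    while mean(lo) < mu:                     # the mean decreases in beta: widen bracket
        lo *= 2
    while mean(hi) > mu:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > mu:
            lo = mid
        else:
            hi = mid
    beta0 = 0.5 * (lo + hi)
    return beta0, probs(beta0)

def entropy(p):
    """Entropy (5.172), with the convention 0 * ln 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

beta0, p = maxent_distribution(6, 2.0)
print(beta0, [round(pi, 4) for pi in p], entropy(p))
```

For \(\mu = (n+1)/2\), the procedure returns \(\beta _0 \approx 0\) and the uniform distribution, in agreement with the remark that the uniform distribution maximizes the entropy.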

9.2 Modeling Production/Citation Process

Joint modeling of production and citation processes in science attracted considerable attention after the introduction of the h-index of Hirsch. Below, we shall consider two models of the processes connected to the h-index.

9.2.1 Model of h-Index Based on Paretian Distributions

Discrete Paretian distributions and the Price distribution are widely used for modeling publication activity and citation processes [161]. The properties of these distributions needed for the investigation of the Hirsch index can be expressed by means of Gumbel’s characteristic extreme values [162], because the Hirsch index can be defined on the basis of Gumbel’s rth characteristic values.

Gumbel’s rth characteristic values are defined as follows. Let us consider a random variable X that gives the citation rate of a paper. We define

  • \(p_k = P(X=k)\): probability distribution of X (\(k \ge 0)\);

  • \(F(k) = P(X<k)\): cumulative distribution function of X.

Gumbel’s rth characteristic extreme value is then defined as

$$\begin{aligned} u_r = \text {max} \{ k: G(k) \ge r/n \}, \end{aligned}$$
(5.179)

where

  • \(G(k) =G_k = 1 - F(k) = P(X \ge k)\);

  • n: size of a given sample (here, the number of papers) drawn from the distribution F.

The Hirsch index can be defined analogously to Gumbel’s rth characteristic extreme value as follows:

$$\begin{aligned} h = u_h. \end{aligned}$$
(5.180)

9.2.2 Case of Paretian Distribution of the Random Variable X

A distribution of a random variable (in our case, the distribution of citations X) is a Paretian distribution if it obeys Zipf’s law asymptotically:

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } k^\alpha G_k = \text {const} > 0. \end{aligned}$$
(5.181)

Below, we shall use a prominent member of the class of Paretian distributions, namely the Pareto distribution \(p_k = P(X=k) \approx \frac{d}{(N+k)^{1+\alpha }}\), where d is a positive constant. This distribution is Paretian as \(k \rightarrow \infty \). For the case \(k\gg N\), we obtain

$$\begin{aligned} G_k = P(X \ge k) \approx \frac{d_1}{k^\alpha }, \end{aligned}$$
(5.182)

where \(d_1\) is a positive constant. Then

$$\begin{aligned} u_r \approx c_1 \left( \frac{n}{r} \right) ^{1/\alpha }, \end{aligned}$$
(5.183)

where \(c_1 = d_1^{1/\alpha }\) is a positive constant (this follows from the condition \(G(u_r) \approx r/n\)). Equation (5.183) leads to the following equation for the Hirsch index (under the assumption \(n \gg 1\)):

$$\begin{aligned} h = u_h \approx c_1 \left( n/h \right) ^{1/\alpha }. \end{aligned}$$
(5.184)

From here, we obtain

$$\begin{aligned} h \approx c_2 n^{1/(1+\alpha )}, \end{aligned}$$
(5.185)

where \(c_2 =c_1^{\alpha /(1+\alpha )}\).

We can draw the following conclusions from (5.185) (recall that we assume that the citation distribution is a discrete Paretian distribution with finite expectation, i.e., \(\alpha >1\)).

 

  1.

    If the number of underlying papers is large enough, then the Hirsch index h is proportional to the \((1+\alpha )\)th root of the number of publications. Usually \(\alpha \) is close to 1. Then h is proportional to the square root of the number of publications.

  2.

    The number of citations of the papers from the Hirsch core (which contains the h-papers: papers that received at least h citations each) is proportional to \(h^2\) for \(\alpha >1\) and a large value of k [161].

9.2.3 Case of Price Distribution of the Random Variable X

We recall that in our case, the random variable X is the citation rate of a paper. The Price distribution is [163]

$$\begin{aligned} p_k = P(X=k) = N \left( \frac{1}{N+k}-\frac{1}{N+k+1} \right) = \frac{N}{(N+k)(N+k+1)}, \end{aligned}$$
(5.186)

where \(k \ge 0\) and N is a positive parameter.

Since N is only required to be positive, it need not be an integer. In addition, the Price distribution contains the case \(k=0\), and it reduces to the law of Lotka (for research publications) when \(k\gg N\). Moreover, since \(p_k \approx N/k^2\) for large k, the Price distribution has no moments of order 1 or higher (in particular, its mean is infinite). The distribution (5.186) is called the Price distribution, since it contains as a limiting case the square root law of Price (which states that half of the scientific papers are contributed by the top square root of the total number of scientific authors) [163]. Let us stress that the Price distribution is a particular case (for \(\alpha =1\)) of the Waring distribution [101, 164]

$$\begin{aligned} p_k = P(X=k) = \frac{\alpha }{N+\alpha } \frac{N}{N+\alpha +1} \cdots \frac{N+k-1}{N+\alpha +k}, \end{aligned}$$
(5.187)

where \( k \ge 0\) and \(\alpha \) and N are positive parameters.

For the case in which the distribution of the citation rate is described by the Price distribution, one obtains

$$\begin{aligned} G_k = \frac{N}{N+k}. \end{aligned}$$
(5.188)

Thus the distribution is Paretian (but note that the expected value of X for this distribution is \(\infty \), in contrast to the finite expectation connected to the Pareto distribution discussed above).

The Gumbel rth extreme value is

$$\begin{aligned} u_r = \left[ \frac{N(n-r)}{r}\right] , \ \ r=1,2,\dots ,n, \end{aligned}$$
(5.189)

where \([\dots ]\) denotes the integer part of the corresponding argument.

The corresponding h index is a solution of the equation

$$\begin{aligned} h = u_h \approx \frac{N(n-h)}{h}. \end{aligned}$$
(5.190)

This is the quadratic equation \(h^2 + Nh - nN = 0\). Its positive root and the approximation valid for \(n\gg 1\) (more precisely, \(n \gg N\)) are

$$\begin{aligned} h = \left( \frac{N^2}{4} + nN\right) ^{1/2} - \frac{N}{2} \approx (nN)^{1/2}, \end{aligned}$$
(5.191)

which means the following:

The h-index is proportional to the square root of the number of publications (if the citation rate is described by the Price distribution and all other assumptions are valid).
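For the Price distribution, the Gumbel characteristic values (5.189) are explicit, so the h-index can be computed directly and compared with (5.191). The Python sketch below takes h as the largest r with \(u_r \ge r\), which is our discrete reading of \(h = u_h\) (since \(u_r\) is integer-valued, an exact fixed point need not exist); the values of N and n are illustrative.

```python
import math

def gumbel_u(r, n, N):
    """Gumbel's r-th characteristic extreme value (5.189) for the Price distribution."""
    return math.floor(N * (n - r) / r)

def h_from_gumbel(n, N):
    """Largest r in 1..n with u_r >= r (a discrete reading of h = u_h)."""
    h = 0
    for r in range(1, n + 1):
        if gumbel_u(r, n, N) >= r:
            h = r
        else:
            break  # u_r decreases while r increases, so no later r can qualify
    return h

def h_closed_form(n, N):
    """Positive root of h^2 + N h - N n = 0, cf. (5.191)."""
    return math.sqrt(N * N / 4 + n * N) - N / 2

for n, N in ((100, 1.0), (1000, 2.5), (10_000, 1.0)):
    print(n, N, h_from_gumbel(n, N), h_closed_form(n, N), math.sqrt(n * N))
```

The integer part in (5.189) makes the discrete value differ from the continuous root by less than one.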

9.2.4 Model of h-Index Based on the Poisson Distribution

Another model of the h-index is based on the publication–citation model of Burrell [165, 166]. This model describes the publishing record of a scientist who publishes papers at random times. These papers then attract citations, and both the publication process and the citation accumulation process are random. The scientist is assumed to start his/her publishing career at \(t=0\), and for the period up to the time \(T>0\), the model makes the following assumptions:

  1.

    Poisson process of publishing

    The author publishes papers according to a Poisson process at rate \(\theta \). The distribution of the number of publications \(Y_T\) at time T is

    $$\begin{aligned} P(Y_T = r) = \exp (-\theta T) \frac{(\theta T)^r}{r!}, \ \ r=0,1,2,\dots , \end{aligned}$$
    (5.192)

    with expected value \(E[Y_T]=\theta T\).

  2.

    Poisson process of receiving citations

    Each of the publications receives citations according to a Poisson process of rate \(\varLambda \), which can vary from paper to paper.

  3.

    Variation of the rate \(\varLambda \)

    The citation rate \(\varLambda \) varies over the set of publications of the scientist according to a gamma distribution of index \(\nu >1\) and parameter \(\alpha >0\):

    $$\begin{aligned} f_\varLambda (\lambda ) = \frac{\alpha ^\nu }{\varGamma (\nu )} \lambda ^{\nu -1} \exp (-\alpha \lambda ), \end{aligned}$$
    (5.193)

    where \(0< \lambda < \infty \).

The model leads to the following distribution of the citations of a randomly chosen paper of the scientist [166]:

$$\begin{aligned} P(X_T=r) = \frac{\alpha }{T(\nu -1)} B \left( \frac{T}{\alpha +T}; r+1, \nu -1 \right) , \ \ \ r=0,1,2, \dots , \end{aligned}$$
(5.194)

where

$$ B(x;a,b) = \frac{\varGamma (a+b)}{\varGamma (a) \varGamma (b)} \int \limits _0^x dy \ y^{a-1}(1-y)^{b-1} $$

is the cumulative distribution function of the beta distribution of the first kind with parameters a and b (the regularized incomplete beta function).

What remains to be calculated is \(E[N(n;T)]\), where N(n; T) is the number of papers receiving at least n citations by the time T.

  • Case of \(n=0\) citations

    $$\begin{aligned} E[N(0;T)] = \theta T, \end{aligned}$$
    (5.195)

    i.e., since every paper has at least zero citations, this is simply the expected total number of papers of the scientist, which increases linearly with time.

  • Case of \(n \ne 0\) citations. In this case [166],

    $$\begin{aligned} E[N(n;T)] = \theta T \left[ 1- \frac{\alpha }{T(\nu -1)} \sum _{r=0}^{n-1} B \left( \frac{T}{\alpha +T}; r+1, \nu -1 \right) \right] , n= 1,2,\dots . \end{aligned}$$
    (5.196)

Equation (5.196) has interesting consequences:

  1. Publish or perish! The expected number of papers with at least n citations is proportional to the publication rate \(\theta \).

  2. A long career in science is a good thing! The expected number of papers with at least n citations is increasing in T for every n.

  3. No one is a genius! The expected number of papers with at least n citations is decreasing in n for every T.

Finally, the h-index can be defined as

$$\begin{aligned} h(T) = \text {max} \{n: n \le E[N(n;T)] \}, \end{aligned}$$
(5.197)

and, as seen from (5.196), the h-index depends on the intensity of publication, the length of the scientific career, and other parameters (such as the parameters \(\alpha \) and \(\nu \) of the gamma distribution (5.193), which can vary from scientist to scientist).
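The quantities (5.194)–(5.197) are straightforward to evaluate numerically. The Python sketch below computes \(P(X_T=r)\), \(E[N(n;T)]\) and h(T). It relies only on the standard library. The regularized incomplete beta function is therefore approximated by Simpson's rule instead of a special-function routine, which is an implementation choice of this sketch. The function names and the parameter values in the usage lines are hypothetical.

```python
import math


def reg_inc_beta(x, a, b, steps=2000):
    """Regularized incomplete beta B(x; a, b), by composite Simpson's rule.

    Assumes a >= 1, b > 0 and 0 <= x < 1, so the integrand y^(a-1) (1-y)^(b-1)
    is finite on [0, x]; steps must be even.
    """
    if x <= 0.0:
        return 0.0
    h = x / steps
    total = 0.0
    for i in range(steps + 1):
        y = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * y ** (a - 1) * (1.0 - y) ** (b - 1)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * total * h / 3.0


def prob_citations(r, T, alpha, nu):
    """P(X_T = r) of (5.194) for a randomly chosen paper (requires nu > 1)."""
    return alpha / (T * (nu - 1)) * reg_inc_beta(T / (alpha + T), r + 1, nu - 1)


def expected_papers_at_least(n, T, theta, alpha, nu):
    """E[N(n;T)] of (5.195)-(5.196): expected number of papers with >= n citations."""
    below = sum(prob_citations(r, T, alpha, nu) for r in range(n))
    return theta * T * (1.0 - below)


def h_index(T, theta, alpha, nu, n_max=10_000):
    """h(T) = max{n : n <= E[N(n;T)]} of (5.197).

    E[N(n;T)] decreases in n, so the search stops at the first n that fails.
    """
    below = 0.0  # running sum of P(X_T = r) for r < n
    h = 0
    for n in range(1, n_max + 1):
        below += prob_citations(n - 1, T, alpha, nu)
        if n <= theta * T * (1.0 - below):
            h = n
        else:
            break
    return h


# Hypothetical scientist: 5 papers per year, citation rates gamma(nu=3, alpha=5).
print(h_index(T=10, theta=5, alpha=5, nu=3))
print(h_index(T=20, theta=5, alpha=5, nu=3))
```

Because E[N(n;T)] is monotone in n, the running sum in `h_index` avoids recomputing \(P(X_T \ge n)\) from scratch for every n.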

9.3 The GIGP (Generalized Inverse Gaussian–Poisson Distribution): Model Distribution for Bibliometric Data. Relation to Other Bibliometric Distributions

Up to now, we have discussed several distributions that may be used to model different aspects of research dynamics and to fit bibliometric data. Sichel [167, 168] argues that one distribution is particularly well suited to modeling bibliometric data: the GIGP (generalized inverse Gaussian–Poisson) distribution. Although the GIGP distribution looks complicated, it usually fits bibliometric data very well. It may be obtained as follows. Consider a researcher who publishes on average \(\lambda _i\) papers per unit time, so that the expected number of papers published by this researcher in time t is \(\lambda _i t\). Let us assume that the statistical variability around the average \(\lambda _i t\) follows a Poisson distribution. Within a group of researchers, the value of \(\lambda _i\) varies, since some researchers are more productive than others. Let us assume that the values of \(\lambda _i\) are distributed according to a generalized inverse Gaussian distribution law (called a GIG distribution).Footnote 1 Then we arrive at the compound Poisson distribution called GIGP [170]:

$$\begin{aligned} p(r,t) = \frac{(1-\theta _t)^{\gamma _t/2}}{K_{\gamma _t}[\alpha _t \sqrt{1- \theta _t}]} \frac{(\alpha _t \theta _t)^r}{2^r r!} K_{r+\gamma _t}(\alpha _t), \end{aligned}$$
(5.198)

where \(r=0,1,2,\dots \); \(0 \le \theta _t \le 1\); \(- \infty< \gamma _t < \infty \); \(\alpha _t > 0\); \(K_\nu (z)\) is the modified Bessel function of the second kind of order \(\nu \); and t is the length of the considered time period. The time-dependent parameters are as follows:

$$\begin{aligned} \alpha _t = \alpha \sqrt{1+ \theta (t-1)}; \ \ \theta _t = \frac{\theta t}{1+ \theta (t-1)}; \ \ \gamma _t = \gamma . \end{aligned}$$
(5.199)

From (5.198), once p(0) and p(1) are known, the probabilities p(r) for \(r=2,3,\dots \) can be calculated by means of the recurrence relation

$$\begin{aligned} p(r) = \left( \frac{r+\gamma -1}{r} \right) \theta _t p(r-1) + \frac{\alpha _t^2 \theta _t^2}{4r(r-1)} p(r-2). \end{aligned}$$
(5.200)

The GIGP can also describe data defined on the domain \(r=1,2,3,\dots \) (i.e., without a zero class). For this purpose, the distribution (5.198) has to be truncated at zero. The result is

$$\begin{aligned} p(r,t) = \frac{(\alpha _t \theta _t)^r K_{\gamma +r}(\alpha _t)}{2^r r!\{(1-\theta _t)^{-\gamma /2} K_\gamma [\alpha _t(1-\theta _t)^{1/2}] - K_\gamma (\alpha _t)\}}. \end{aligned}$$
(5.201)
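With the recurrence (5.200), the GIGP can be tabulated once p(0) and p(1) have been obtained from (5.198). The standard-library Python sketch below does this and also applies the zero truncation (5.201) and the time dependence (5.199). As an assumption of the sketch (not of the source), \(K_\nu (z)\) is evaluated from the integral representation \(K_\nu (z)=\int _0^\infty \exp (-z\cosh s)\cosh (\nu s)\,ds\) using the trapezoidal rule. In practice one would call a special-function library instead. The parameter values are hypothetical.

```python
import math


def bessel_k(nu, z, h=0.01):
    """K_nu(z) for moderate z > 0 via int_0^inf exp(-z cosh s) cosh(nu s) ds.

    Trapezoidal rule on [0, inf): the integrand is smooth, even in s and decays
    double-exponentially, so the rule is very accurate here.
    """
    def f(s):
        e = -z * math.cosh(s)
        return 0.5 * (math.exp(e + nu * s) + math.exp(e - nu * s))

    total = 0.5 * f(0.0)
    k = 1
    while True:
        term = f(k * h)
        total += term
        if term <= 1e-17 * total:  # integrand is unimodal: stop deep in the tail
            break
        k += 1
    return h * total


def gigp_pmf(r_max, alpha, theta, gamma):
    """GIGP probabilities p(0), ..., p(r_max) of (5.198) for one period.

    alpha, theta, gamma play the roles of alpha_t, theta_t, gamma_t; 0 < theta < 1.
    """
    norm = (1.0 - theta) ** (gamma / 2.0) / bessel_k(gamma, alpha * math.sqrt(1.0 - theta))
    p = [norm * bessel_k(gamma, alpha)]
    p.append(norm * alpha * theta / 2.0 * bessel_k(gamma + 1.0, alpha))
    for r in range(2, r_max + 1):
        # recurrence (5.200)
        p.append((r + gamma - 1.0) / r * theta * p[r - 1]
                 + alpha ** 2 * theta ** 2 / (4.0 * r * (r - 1)) * p[r - 2])
    return p[: r_max + 1]


def zero_truncated(p):
    """Zero-truncated GIGP (5.201): p(r) / (1 - p(0)) for r = 1, 2, ..."""
    return [pr / (1.0 - p[0]) for pr in p[1:]]


def period_params(alpha, theta, t):
    """alpha_t and theta_t of (5.199) for a period of length t (gamma_t = gamma)."""
    d = 1.0 + theta * (t - 1.0)
    return alpha * math.sqrt(d), theta * t / d


# Hypothetical parameters; gamma = -1/2 is the IGP special case.
a_t, th_t = period_params(alpha=1.0, theta=0.8, t=2)
p = gigp_pmf(400, a_t, th_t, gamma=-0.5)
print(sum(p), p[:4])
```

For \(\gamma = -1/2\) the Bessel functions have closed forms, and p(0) reduces to \(\exp [\alpha _t(\sqrt{1-\theta _t}-1)]\). This gives a convenient check of the implementation.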

The GIGP distribution has been used to describe bibliometric data such as the number of articles published in the area of operations research, the scattering of literature in applied geophysics, the literature on mast cells, publications of a group of chemists several years after receiving their doctoral degrees, in-house journal use in libraries, etc. [167].

The GIGP distribution (5.198) has three parameters. If some of these parameters are fixed at particular values, the GIGP distribution reduces to other well-known distributions. Some examples of such reductions are as follows:

  1. Negative binomial distribution: \(\alpha = 0\); \(\gamma >0\).

  2. Zero-truncated negative binomial distribution: \(\alpha =0\); \(-1< \gamma <1\).

  3. Fisher logarithmic series distribution: \(\alpha = 0\); \(\gamma = 0\).

  4. Inverse Gaussian–Poisson (IGP) distribution: \(\gamma = -1/2\); \(r=0,1,2,\dots \).

The upper tail (i.e., for large values of r) of the GIGP distribution is given by the following relationship [168]:

$$\begin{aligned} p(r) \sim \frac{c \theta ^r}{r^{1-\gamma }}, \end{aligned}$$
(5.202)

where c is a normalizing constant, \(0 < \theta \le 1\), and \(-\infty< \gamma < \infty \). Taking the logarithm of both sides of (5.202), one can write

$$\begin{aligned} Y = A - (1-\gamma )X - B \exp (X), \end{aligned}$$
(5.203)

where \(Y=\ln p(r)\); \(X=\ln r\); \(A= \ln c\); \(B=-\ln \theta \ge 0\). Thus, in log–log coordinates, the tail of the GIGP distribution for \(\gamma <1\) is at first approximately a straight line with slope \(-(1-\gamma )\). With increasing r, the term \(B\exp (X) = Br\) makes it bend downward (the curve is concave, since \(d^2Y/dX^2 = -B\exp (X) \le 0\)). If \(\theta =1\), then \(B=0\), and the tail described by (5.203) becomes exactly linear: \(p(r) \sim c/r^{1-\gamma }\) is a power law. The GIGP distribution for this case therefore corresponds to the distributions of Lotka and Zipf discussed in a previous chapter of this book.
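Since (5.203) is linear in the unknowns A, \(1-\gamma \) and B (with \(\exp (X) = r\)), the tail parameters can be estimated by ordinary least squares of \(\ln p(r)\) on \(\ln r\) and r. The standard-library Python sketch below does this. It assumes that only tail data (large r) are passed in and that (5.202) holds there. The data in the usage lines are synthetic, generated from (5.202) with made-up parameters. The function names are hypothetical.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m x = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [vi] for row, vi in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_gigp_tail(rs, ps):
    """Least-squares fit of (5.203), ln p = A - (1 - gamma) ln r - B r.

    Returns estimates (c, gamma, theta) with c = exp(A) and theta = exp(-B).
    """
    design = [(1.0, math.log(r), float(r)) for r in rs]  # columns 1, X, exp(X)
    y = [math.log(p) for p in ps]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    a, c1, c2 = solve3(xtx, xty)  # c1 = -(1 - gamma), c2 = -B = ln(theta)
    return math.exp(a), 1.0 + c1, math.exp(c2)


# Synthetic tail generated from (5.202) with hypothetical c, gamma, theta.
c_true, gamma_true, theta_true = 0.5, -0.5, 0.98
rs = list(range(20, 201))
ps = [c_true * theta_true ** r / r ** (1 - gamma_true) for r in rs]
print(fit_gigp_tail(rs, ps))
```

With real data, the fitted \(\theta \) indicates how far the tail is from a pure Lotka/Zipf power law. A value close to 1 corresponds to \(B \approx 0\).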

9.4 Master Equation Model of Scientific Productivity

We already know that productivity is an important element in the evolution of a research community. It is possible to derive an equation that accounts for the stochastic fluctuations in the productivity of the members of a scientific organization [171]. To obtain this model equation, we assume that the main processes in the evolution of a scientific community are the following:

  1. the self-reproduction of scientists,

  2. aging and death of scientists,

  3. departure of scientists from the scientific field due to mobility or abandoning research activities.

Let a be the scientific age (number of years devoted to scientific research) of a researcher, and let a scientific productivity index \(\xi \) be incorporated into the researcher state space (\(\xi \) and a are assumed to be continuous variables with values in \([0,\infty )\)). The dynamics of the research community are described by a number density function \(n(a, \xi , t)\), which specifies the age and productivity structure of the scientific community at time t. For example, the number of researchers with age in \([a_1,a_2]\) and scientific productivity in \([\xi _1, \xi _2]\) at time t is given by the integral \(\int \limits _{a_1}^{a_2} \int \limits _{\xi _1}^{\xi _2} da\ d \xi \ n(a, \xi , t)\).

The following master equation for this function \(n( a,\xi , t )\) can be derived [171]:

$$\begin{aligned} \left( \frac{\partial }{\partial a} + \frac{\partial }{\partial t} \right) n( a,\xi , t ) = -[J(a,\xi ,t) + w (a, \xi , t)] n(a,\xi ,t) + \nonumber \\ \int \limits _{-\infty }^{\xi } d \xi ' \chi (a, \xi - \xi ',\xi ',t) n(a,\xi - \xi ',t), \nonumber \\ \end{aligned}$$
(5.204)

where \(w(a, \xi , t)\) denotes the departure rate of community members. If x(t) is a random process describing the scientific productivity variation and \(p_a(x, t \mid y, \tau )\) (\(\tau < t\)) is the transition probability density corresponding to such a process, then

$$\begin{aligned} \chi (a, \xi , \xi ', t) = \lim _{\varDelta t \rightarrow 0}\frac{p_a(\xi +\xi ', t + \varDelta t \mid \xi , t)}{\varDelta t}. \end{aligned}$$
(5.205)

The transition rate \(J(a, \xi , t)\) at time t from the productivity level \(\xi \) is by definition

$$ J ( a,\xi ,t )= \int _{-\xi }^{\infty } d \xi ' \ \chi (a, \xi , \xi ',t). $$

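The increment \(\xi '\) may be positive or negative. The equation for \(n(a, \xi , t)\) can be obtained in the following way. First, a balance over a short time interval \((t, t+\varDelta t)\), during which the scientific age increases by \(\varDelta a = \varDelta t\), gives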

$$\begin{aligned} n( a + \varDelta a,\xi ,t + \varDelta t) = n( a,\xi ,t ) - J ( a,\xi , t )n( a,\xi , t ) \varDelta t + \nonumber \\ \int _{-\infty }^{\xi } \chi (a, \xi - \xi ',\xi ', t) n( a,\xi - \xi ',t ) d\xi ' \varDelta t - w( a,\xi ,t )n( a,\xi , t ) \varDelta t , \end{aligned}$$
(5.206)

where:

  • the first two terms on the right-hand side, which combine to \([1-J(a, \xi , t) \varDelta t] n(a, \xi , t)\), describe the individuals whose scientific productivity does not change in \((t, t + \varDelta t)\);

  • the integral term describes the individuals whose scientific productivity becomes equal to \(\xi \) because of increase or decrease in \((t, t + \varDelta t)\);

  • the last term corresponds to the departure of individuals through stopping research activities or death.

Since \(\varDelta a = \varDelta t\), expanding \(n(a + \varDelta t, \xi , t + \varDelta t)\) around \((a, t)\), dividing by \(\varDelta t\), and retaining terms up to the first order in \(\varDelta t\), one obtains the master equation (5.204).

The above master equation is difficult to analyze, and therefore it is often reduced to an approximation similar to the well-known Fokker–Planck equation. Let

$$\begin{aligned} \mu _k (a,\xi , t) = \int _{-\xi }^{ \infty } d \xi ' (\xi ')^k \chi (a, \xi , \xi ',t)= \lim _{\varDelta t \rightarrow 0}\frac{1}{\varDelta t} <(\xi ')^k> ; \nonumber \\ k=1,2,\dots , \end{aligned}$$
(5.207)

where the brackets denote the average with respect to the conditional probability density \(p_a(\xi +\xi ', t + \varDelta t \mid \xi , t)\). In addition, we make the following assumptions:

  • \(\mu _1, \mu _2 < \infty \);

  • \(\mu _k = 0\) for \(k \ge 3\);

  • \(n(a, \xi , t)\) and \(\chi (a, \xi , \xi ', t)\) are analytic in \(\xi \) for all a, t, and \(\xi '\).

The assumption \(\mu _k =0\) for \(k \ge 3\) demands that productivity change continuously, i.e., when \(\varDelta t \rightarrow 0\), the probability of large fluctuations \(\mid \xi ' \mid \) must decrease so quickly that \(<\mid \xi ' \mid ^3>\) tends to 0 faster than \(\varDelta t\).

When the above assumptions hold, the function n satisfies the equation

$$\begin{aligned} \left( \frac{\partial }{\partial a} + \frac{\partial }{\partial t} \right) n = -\frac{\partial (\mu _1 n)}{\partial \xi } + \frac{1}{2} \frac{\partial ^2 (\mu _2 n)}{ \partial \xi ^2}- wn. \end{aligned}$$
(5.208)

The following notes are in order here.

  1. If \(w=0\), (5.208) reduces to the Fokker–Planck equation.

  2. Equation (5.208) describes the evolution of the scientific community through a drift along the age component and a drift and diffusion with respect to the productivity component.

  3. The diffusion term characterized by the diffusivity \(\mu _2\) takes into account the stochastic fluctuations of scientific productivity conditioned by internal factors (such as individual abilities, labor motivations, etc.) and external factors (such as labor organization, stimulation systems, etc.).

  4. The initial and boundary conditions for (5.208) are:

    • \(n(a,\xi ,0) = n^0 (a, \xi )\), where \(n^0(a,\xi )\) is a known function defining the community age and productivity distribution at time \(t=0\);

    • \(n(0,\xi , t) = \nu (\xi ,t)\), where the function \(\nu (\xi , t)\) represents the intensity of input flow of new members at age \(a=0\) and \(\nu (\xi , 0) = n^0(0,\xi )\).

  5. In addition, \(n(a, \xi , t) \rightarrow 0\) as \(a \rightarrow \infty \).

Finding the general solution of (5.208) with the above initial and boundary conditions remains a difficult task. For many practical applications, however, knowledge of the first and second moments of the distribution function \(n(a, \xi , t)\) is sufficient. Equation (5.208) can be solved numerically or can be reduced to a system of ordinary differential equations [171].
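As one illustration of a numerical treatment of (5.208), the Python sketch below advances a discretized density by one time step. The scheme is chosen for this example only and is not the method of [171]. It assumes constant coefficients \(\mu _1 \ge 0\), \(\mu _2 \ge 0\) and \(w \ge 0\). It also assumes an age step equal to the time step, so that the operator \(\partial /\partial a + \partial /\partial t\) becomes an exact shift along characteristics, and zero flux at both ends of the productivity grid. All names and values are hypothetical.

```python
def fp_step(n, inflow, mu1, mu2, w, h, dxi):
    """One step of length h for (5.208) with constant mu1 >= 0, mu2 >= 0, w >= 0.

    n[i][j] approximates n(a_i, xi_j, t) with a_i = i*h and xi_j = j*dxi;
    inflow[j] approximates nu(xi_j, t), the boundary condition at a = 0.
    """
    if h * (mu1 / dxi + mu2 / dxi ** 2) > 1.0:
        raise ValueError("explicit step too large for the xi grid")
    # 1) Age drift: shift each cohort one age cell; the oldest cell is dropped,
    #    approximating n -> 0 as a -> infinity.
    shifted = [list(inflow)] + [row[:] for row in n[:-1]]
    out = []
    for row in shifted:
        m = len(row)
        # 2) Productivity drift/diffusion in conservative form:
        #    dn/dt = -dF/dxi with F = mu1*n - (mu2/2)*dn/dxi (upwind drift), F = 0 at both ends.
        flux = [0.0] * (m + 1)
        for j in range(1, m):
            flux[j] = mu1 * row[j - 1] - 0.5 * mu2 * (row[j] - row[j - 1]) / dxi
        new_row = [row[j] - h / dxi * (flux[j + 1] - flux[j]) for j in range(m)]
        # 3) Departures at rate w.
        out.append([(1.0 - w * h) * v for v in new_row])
    return out


def total(n):
    """Total number of researchers represented by the grid."""
    return sum(sum(row) for row in n)


# Hypothetical grid: 6 age cells, 40 productivity cells, one initial cohort.
n0 = [[0.0] * 40 for _ in range(6)]
n0[1][10] = 100.0
n1 = fp_step(n0, inflow=[0.0] * 40, mu1=0.5, mu2=0.2, w=0.05, h=0.1, dxi=0.5)
print(total(n1))  # up to rounding, 100 * (1 - 0.05 * 0.1)
```

The flux form conserves the number of researchers exactly apart from the departure term. The condition checked at the top keeps the explicit update non-negative.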

In a similar way, a model of personnel movement can be obtained [172]. The model equation for this case is

$$\begin{aligned} \left( \frac{\partial }{\partial a} + \frac{\partial }{\partial t} \right) n(a,t) = - n(a,t)[w_1(a,t)+w_2(a,t)] + r(a,t) v(t), \end{aligned}$$
(5.209)

where a is the age variable; t is the time; n(a, t) is the density of researchers having age a at time t; v(t) is the intensity of the input flow of new researchers at time t; r(a, t) is the density of the age distribution of this input flow; \(w_1(a,t)\) is the intensity of departure due to death, retirement, etc.; and \(w_2(a,t)\) is the intensity of the regulated departure of researchers, so that \(w(a,t)=w_1(a,t)+w_2(a,t)\) is the total age-specific intensity of departure. Also, \(a_0\) denotes the minimum age of researchers and A the maximum admissible age of researchers; \(a_0\) and A enter the initial condition

$$\begin{aligned} n(a,0) = n^0(a), \ \ a_0 \le a \le A, \end{aligned}$$
(5.210)

and the boundary condition is

$$\begin{aligned} n(a_0,t) = 0 , \ \ t \ge 0. \end{aligned}$$
(5.211)
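Along the characteristics \(a - t = \text {const}\), (5.209) reduces to the ordinary differential equation \(dn/ds = -w\,n + r\,v\). The Python sketch below uses this to advance n(a, t) on an age grid \(a_0, a_0+h, \dots , A\) whose spacing equals the time step. The sketch makes three assumptions of its own. First, w, r and v are frozen at the start of each step, so the scheme is first order in time. Second, the cohort in the last age cell leaves at the maximum age A. Third, the rate functions in the usage lines are hypothetical.

```python
import math


def personnel_step(n, t, h, a0, w1, w2, r, v):
    """Advance the age profile n[i] ~ n(a0 + i*h, t) of (5.209) to time t + h.

    w1(a, t), w2(a, t), r(a, t) and v(t) are user-supplied callables.
    """
    new = [0.0] * len(n)  # new[0] = n(a0, t + h) = 0 by the boundary condition (5.211)
    for i in range(1, len(n)):
        a = a0 + (i - 1) * h          # age of this cohort at the start of the step
        rate = w1(a, t) + w2(a, t)    # total departure intensity w = w1 + w2
        source = r(a, t) * v(t)       # input flow of new researchers at age a
        if rate > 0.0:
            # exact solution of dn/ds = -rate * n + source over one step
            decay = math.exp(-rate * h)
            new[i] = n[i - 1] * decay + source * (1.0 - decay) / rate
        else:
            new[i] = n[i - 1] + source * h
    return new  # the cohort that was in the last cell (age A) has left


# Hypothetical setting: ages 25..65 in 1-year steps, 2 % yearly departures due to
# death/retirement, no regulated departures, no new entrants, 100 researchers aged 27.
a0, A, h = 25.0, 65.0, 1.0
n = [0.0] * (int((A - a0) / h) + 1)
n[2] = 100.0
for step in range(3):
    n = personnel_step(n, step * h, h, a0,
                       w1=lambda a, t: 0.02, w2=lambda a, t: 0.0,
                       r=lambda a, t: 0.0, v=lambda t: 0.0)
print(n[5])  # close to 100 * exp(-0.06)
```

A policy of regulated departures can be explored by passing a different `w2`, for example one that is positive only above a chosen age.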

10 Probability Model for Importance of the Human Factor in Science

Below, we discuss a probability model connected to the importance of the human factor in science. One often hears that technological evolution is closely connected to the growth of science and that the growth of science depends heavily on the human factor (the number and quality of scientists). Such statements are no surprise, since a connection has been observed between the values of scientometric indicators of the research production of a country's researchers and the corresponding GDP [173–177]. A research organization may have a perfect structure with respect to research positions and the research equipment associated with those positions. The research positions may be connected by a perfect system of relations, and the processes in the organization may be carefully planned. But this is not enough. In order to put all this into effective action, one needs researchers. Researchers of good quality have to fill the research positions, and they have to perform actions that contribute to a smooth flow of the processes in the research organization. Only then can the work of this organization be effective. In addition, a researcher does not work alone [178–183]. Teamwork and collaboration among scientists and scientific groups are becoming ever more important for solving the scientific problems of today [184–189].

This shows that the human factor is of great importance for research organizations. Accordingly, we shall discuss below, with the help of a simple mathematical model, the importance of the size of the research community.

10.1 The Effective Solutions of Research Problems Depend on the Size of the Corresponding Research Community

It is intuitive that larger research communities can solve more complex problems [92]. Let us consider some research problem and let \(\beta \) be the mean probability that a qualified researcher will solve the problem. Then:

  • \(1-\beta \) is the probability that the researcher will not solve the problem.

  • \((1-\beta )^n\) is the probability that a group of n qualified researchers, each working on the problem independently, will not solve the problem.

Thus the probability that the same group of n qualified researchers will solve the complex problem (which is not likely to be solved by a single researcher, i.e., \(\beta \ll 1\)) is

$$\begin{aligned} p_n =1-(1-\beta )^n = 1 - \exp [n \ln (1-\beta )] \approx 1-\exp (-n\beta ). \end{aligned}$$
(5.212)

If the research group is small, i.e., \(n \beta \ll 1\), then from (5.212), we obtain the linear relationship

$$\begin{aligned} p_n \approx \beta n. \end{aligned}$$
(5.213)

Thus an increase in the size of the group of qualified researchers increases the probability of solving the problem. When the group is small, the probability of solving the problem is proportional to the group size. When the size of the group increases, the nonlinear terms become significant, and the probability \(p_n\) increases more slowly than a linear function, approaching 1 for \(n\beta \gg 1\).
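To see how quickly (5.212) departs from the linear regime (5.213), one can tabulate the exact probability together with its two approximations. The Python sketch below does this. It is illustrative only, and the value of \(\beta \) is hypothetical. The sketch also shows that the gain from adding one more researcher, \(p_{n+1}-p_n = \beta (1-\beta )^n\), shrinks as n grows.

```python
import math


def p_exact(n, beta):
    """p_n = 1 - (1 - beta)^n of (5.212); log1p/expm1 keep it accurate for small beta."""
    return -math.expm1(n * math.log1p(-beta))


def p_exponential(n, beta):
    """Approximation p_n ~ 1 - exp(-n beta) of (5.212), valid for beta << 1."""
    return -math.expm1(-n * beta)


def p_linear(n, beta):
    """Small-group approximation p_n ~ beta n of (5.213), valid for n beta << 1."""
    return beta * n


beta = 0.01  # hypothetical single-researcher success probability
for n in (1, 10, 50, 100, 300):
    print(n, round(p_exact(n, beta), 4), round(p_exponential(n, beta), 4), p_linear(n, beta))
```

The three values agree for \(n\beta \ll 1\). For \(n\beta \gtrsim 1\), the linear formula overestimates the probability and eventually exceeds 1.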

10.2 Increasing Complexity of Problems Requires Increase of the Size of Group of Researchers that Has to Solve Them

Scientific organizations evolve and usually become more complex [190, 191]. One factor behind such a development is the need to solve research problems of increasing complexity. This increasing complexity leads to a decreasing probability \(\beta \) that a single researcher can solve such a problem. To compensate for this decrease, one may increase the size of the research group that has to solve the problem.

Let us study this situation mathematically. Compensating for the decrease in \(\beta \) means keeping \(dp_n/dt \ge 0\). Differentiating the approximation \(p_n \approx 1-\exp (-n\beta )\) from (5.212) with respect to time, one obtains

$$\begin{aligned} \frac{1}{n}\frac{dn}{dt} \ge - \frac{1}{\beta } \frac{d \beta }{dt}. \end{aligned}$$
(5.214)

Taking into account that \((d \beta /dt) <0\), the increase in the size of the research group with increasing complexity of the solved problem has to be

$$\begin{aligned} \frac{dn}{dt} \ge \frac{n}{\beta } \left( - \frac{d \beta }{dt} \right) . \end{aligned}$$
(5.215)

The above simple model leads to the following conclusions. As the complexity of scientific problems increases with time, one needs larger research collectives in order to maintain a high probability of solving the problems. Thus, if a government wants effective solutions of national scientific and technological problems, it has to support a sufficiently large national research community. A decrease in the number of researchers diminishes the national scientific capacity: for small groups, by (5.213), the probability of solving problems important to society decreases in proportion to the decrease in the size of the corresponding research community.
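Equality in (5.214) means that \(n\beta \) stays constant. The smallest group that keeps \(p_n\approx 1-\exp (-n\beta )\) from falling is therefore \(n(t) = n(0)\beta (0)/\beta (t)\). The short Python sketch below evaluates this under an assumption that the text does not make, namely that \(\beta \) decays exponentially, \(\beta (t)=\beta _0\exp (-kt)\). In that case, (5.215) requires the group to grow at least at the same exponential rate k. The names and values are hypothetical.

```python
import math


def beta_of_t(t, beta0=0.01, k=0.05):
    """Hypothetical exponential decay of the single-researcher success probability."""
    return beta0 * math.exp(-k * t)


def minimal_group_size(t, n0, beta0=0.01, k=0.05):
    """Smallest n(t) satisfying (5.214) with equality: n(t) beta(t) = n0 beta0."""
    return n0 * beta0 / beta_of_t(t, beta0, k)


n0 = 20
for t in (0, 10, 20, 40):
    n_t = minimal_group_size(t, n0)
    p_t = 1.0 - math.exp(-n_t * beta_of_t(t))
    print(t, round(n_t, 1), round(p_t, 4))  # p_t stays at 1 - exp(-0.2)
```

In practice, n(t) would be rounded up to the next integer, which keeps \(p_n\) slightly above its initial value.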

Note that the value of the parameter \(\beta \) plays an important role in the above model. This value must be kept as large as possible; in other words, an effective scientific community consists of qualified scientists. In addition, let us note that research groups in most cases consist not only of researchers; there is also supporting staff. In connection with this, certain scaling properties may exist for research units [192]. For example, a power-law relationship may exist between the number of supporting staff \(N_s\) and the number of academic staff \(N_A\) of a research institution: \(N_s = C N_A^\beta \), where C is a constant and \(\beta \) is the exponent of the power law (not to be confused with the probability \(\beta \) above). For the case of the UK National Health Service, \(C\approx 0.07\) and \(\beta \approx 1.3\). This is an example of a quantitative power-law relationship connected to the parts of research (and other) organizations. Such power laws have been discussed in Chap. 4.
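As a quick numerical reading of the power law \(N_s = C N_A^\beta \), the Python lines below use the values quoted above and assume nothing else. They show the superlinearity of the law: doubling the academic staff multiplies the supporting staff by \(2^{1.3}\approx 2.46\), not by 2. The function name is hypothetical.

```python
def support_staff(n_academic, c=0.07, exponent=1.3):
    """Supporting staff N_s = C * N_A**exponent, with the UK values quoted in the text."""
    return c * n_academic ** exponent


for n_a in (100, 200, 1000):
    print(n_a, round(support_staff(n_a)))
print(support_staff(200) / support_staff(100))  # equals 2**1.3 up to rounding
```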

11 Concluding Remarks

In this chapter, selected classes of deterministic and probability models connected to science dynamics and research production have been discussed. The focus was on models connected to the dynamics of research systems, and especially on models for deterministic and statistical properties of the processes of publication and citation of research publications. Some of the models have been described very briefly, while for some (probability) models, more discussion has been provided (in cases in which one can obtain interesting conclusions without having to perform long mathematical calculations). This manner of presentation permitted a description of more than twenty models in relatively few pages. We hope that the selected set of models has given the reader a good impression of the mathematical tools and methods used for modeling complex processes and the nonlinear dynamics connected to research systems.

Other deterministic and probability models also exist. For example, there is a model of science as a part of a global model of a social system. In this model, the scientific system can be treated as a system that has entrances and exits [92]. The inputs (various flows) come from the other parts of the social system to the entrances of the science subsystem. At the exits, there are scientific output flows to other parts of the social system. The input flows can be, for example, flows of funding or of human resources. The main output flow is scientific knowledge, and part of this flow is the flow of publications.

Finally, let us make several remarks on the limited dependent variable models and on the generalized Zipf distribution, since these topics are of significant interest for research in the area of informetrics.

Limited dependent variable models (e.g., binary, ordinal, and count data regression models) may be used for analysis of all kinds of categorical and count data in bibliometrics and scientometrics (such as assessment scores, citation counts, career transitions, editorial decisions, or funding decisions) [193]. The main advantage of limited dependent variable models is that in using them, one may identify the main explanatory variables in a multivariate framework, and in addition, one may estimate the size of the (marginal) effects of these variables.

Let us consider the group of regression models. Limited dependent variable models are the subgroup of this group in which the variable of interest has a limited range of possible values. This variable may have a binary outcome (e.g., whether a journal article was cited over a certain period), or it may take multiple discrete values (e.g., in the assessment of research or in peer review).

In the case of a binary regression model, we have a variable \(y_i\) that can take only the values 0 and 1. We may model the probability that this variable will take value 1 depending on the values of other variables \(x_{1i}, \dots , x_{ki}\) as follows:

$$\begin{aligned} p(y_i=1 \mid x_{1i}, \dots ,x_{ki}) = L(\beta _0 + \beta _1 x_{1i} + \dots + \beta _{k} x_{ki}), \end{aligned}$$
(5.216)

where \(L(x) = \frac{\exp (x)}{1+\exp (x)}\) is the logistic function (whose values lie between 0 and 1). The model (5.216) is called the logit model. The coefficients \(\beta _0, \dots , \beta _k\) of the logit model may be estimated by maximizing the likelihood of the data with respect to the coefficients.
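Maximum-likelihood estimation of (5.216) can be sketched in a few lines. The Python example below fits a logit model with one covariate by Newton–Raphson iterations on the log-likelihood, which is a standard choice of optimizer but only one of several possible ones. The covariate, the "true" coefficients and the random data are synthetic and hypothetical. In practice, one would use a statistics package and more covariates.

```python
import math
import random


def logistic(x):
    """Logistic function L(x) = exp(x) / (1 + exp(x)) of (5.216), overflow-safe."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_logit(xs, ys, iterations=25):
    """Maximum-likelihood estimates (b0, b1) of p(y=1 | x) = L(b0 + b1 x).

    Newton-Raphson on the log-likelihood, with a single covariate for brevity.
    Assumes the classes are not perfectly separated by x (otherwise no finite MLE).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # observed information (minus the Hessian)
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] (d0, d1) = (g0, g1)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic data from a hypothetical model with b0 = -0.5, b1 = 1.5;
# y could stand for "cited within a year" (1) or not (0).
rng = random.Random(42)
xs = [rng.uniform(-2.0, 2.0) for _ in range(5000)]
ys = [1 if rng.random() < logistic(-0.5 + 1.5 * x) else 0 for x in xs]
b0, b1 = fit_logit(xs, ys)
print(round(b0, 3), round(b1, 3))
```

At the maximum, the score equations \(\sum _i (y_i - p_i) = 0\) and \(\sum _i (y_i - p_i)x_i = 0\) hold, which is a simple way to check convergence.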

The binary logistic model may be used to analyze or predict whether articles will be cited [194], to analyze funding and editorial decisions [195] and the winning of scientific awards [196], etc. [197, 198]. One illustration of the application of the model can be found in [193], where the dependent variable indicates whether an article was cited in another published article during the calendar year following its publication.

For the ordinal regression model, the variable of interest \(y_i\) is an ordinal variable that can take only the values \(j = 1, 2, \dots , J\). In this model, the cumulative probability \(\delta _{ij} = p(y_i \le j)\), i.e., the probability that observation i is in the jth category or lower, can be modeled by the logit relationship

$$\begin{aligned} \mathrm{logit} (\delta _{ij}) = \alpha _j - \beta _1 x_{1i} - \dots - \beta _{k} x_{ki}, \end{aligned}$$
(5.217)

where \(\mathrm{logit}(p) = \log (\frac{p}{1-p}) = \log (p) - \log (1-p)\). Ordinal regression models are applied when we are interested in more detail about the investigated variable than a binary regression model provides. For example, a binary regression analysis of citations tells us only whether an article has been cited, not how many times. If we are interested in the number of citations (e.g., grouped into ordered classes), we may use the ordinal regression model above. Such models have been used in the peer assessment of research groups [199] and for predicting the effect of international coauthorship on citation impact [200].
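In (5.217), the cumulative probabilities \(\delta _{ij}\) determine the probability of each category as a difference of consecutive cumulative probabilities. A minimal Python sketch follows. The cut points, coefficients and covariates are hypothetical. With the sign convention of (5.217), a positive \(\beta _m x_{mi}\) shifts probability toward higher categories.

```python
import math


def logistic(x):
    """Logistic function L(x) = 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ordinal_logit_probs(cutpoints, betas, x):
    """Category probabilities P(y = j), j = 1..J, under the cumulative logit (5.217).

    cutpoints: increasing alpha_1 < ... < alpha_{J-1}; P(y <= J) = 1.
    betas, x: coefficients beta_1..beta_k and covariates x_1..x_k of one observation.
    """
    eta = sum(b * xk for b, xk in zip(betas, x))
    cumulative = [logistic(a - eta) for a in cutpoints] + [1.0]  # delta_j
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]


# Hypothetical 4-level peer-review score with two covariates.
cuts = [-1.0, 0.5, 2.0]
coef = [0.8, -0.3]
low = ordinal_logit_probs(cuts, coef, [0.0, 1.0])
high = ordinal_logit_probs(cuts, coef, [3.0, 1.0])
print([round(p, 3) for p in low], [round(p, 3) for p in high])
```

The cut points must be increasing for all category probabilities to be non-negative. Estimation software usually enforces this during fitting.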

Finally, one may use count data models if the modeled variable represents the frequency of an event. The count data models can be Poisson models, negative binomial models, etc. The Poisson model is for a count variable \(y_i\) that can take only nonnegative integer values \(0,1,2,\dots \). It is assumed that \(y_i\), conditional on the independent variables, has a Poisson distribution (\(y=0,1,2,\dots \)):

$$\begin{aligned} p(y_i = y \mid x_{1i}, \dots , x_{ki}) = \frac{\mu _i^y \exp (-\mu _i)}{y!}, \end{aligned}$$
(5.218)

where \(\mu _i\) is the expected value of the distribution that is modeled by

$$\begin{aligned} \mu _i = \exp [\beta _0 + \beta _1 x_{1i} + \dots + \beta _k x_{ki}]. \end{aligned}$$
(5.219)

A limitation of the Poisson regression model is that the Poisson distribution is completely determined by its mean and that the variance is assumed to equal the mean. This restriction may be violated in many applications, since the variance is often greater than the mean. Then there is overdispersion: the variance is greater than the variance implied by assuming a Poisson distribution. One possibility for dealing with overdispersion is to use a negative binomial regression model. This model allows the conditional mean \(\mu _i\) of \(y_i\) to differ from its variance \(\mu _i + a \mu _i^2\) by estimating an additional dispersion parameter a.
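The Poisson model (5.218)–(5.219) and the negative binomial model differ in the variance they allow for a given mean. The standard-library Python sketch below uses the common NB2 parametrization. This is an assumption of the sketch, although it is consistent with the variance \(\mu _i + a\mu _i^2\) quoted above. The sketch checks numerically that both distributions have mean \(\mu _i\), while only the negative binomial has the larger variance. The coefficients and the covariate value are hypothetical.

```python
import math


def poisson_mean(betas, x):
    """mu_i = exp(beta_0 + beta_1 x_1i + ... + beta_k x_ki) of (5.219)."""
    return math.exp(betas[0] + sum(b * xk for b, xk in zip(betas[1:], x)))


def poisson_pmf(y, mu):
    """Poisson probability (5.218)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def negbin_pmf(y, mu, a):
    """NB2 probability with mean mu and variance mu + a * mu**2 (a > 0)."""
    s = 1.0 / a  # 'size' parameter
    return math.exp(math.lgamma(y + s) - math.lgamma(s) - math.lgamma(y + 1)
                    + s * math.log(s / (s + mu)) + y * math.log(mu / (s + mu)))


def mean_and_variance(pmf, y_max=1000):
    """Mean and variance of a count distribution, truncated at y_max."""
    m1 = sum(y * pmf(y) for y in range(y_max + 1))
    m2 = sum(y * y * pmf(y) for y in range(y_max + 1))
    return m1, m2 - m1 * m1


mu = poisson_mean([0.5, 0.3], [3.0])  # hypothetical coefficients and covariate
print(mean_and_variance(lambda y: poisson_pmf(y, mu)))
print(mean_and_variance(lambda y: negbin_pmf(y, mu, a=0.5)))
```

As \(a \rightarrow 0\), the negative binomial probabilities approach the Poisson ones. The dispersion parameter a therefore measures the departure from the Poisson assumption.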

A Poisson model may be used to identify the effects of coauthorship networks on performance of scholars [201]. Negative binomial regression models can be applied to study citation counts for the purpose of determining the relative importance of authors and journals [202], for comparing sets of papers [203], and for modeling the number of papers [204].

There is a generalization of the Zipf distribution (called the generalized Zipf distribution) that contains as particular cases a family of skew distributions, referred to as being of Zipf type, which have been found to describe a wide range of phenomena both within and outside the information sciences. The generalized Zipf distribution is defined as follows [205]. Let

$$\begin{aligned} d(k \mid f) = \frac{\log [\,f(k+1)]-\log [\,f(k)]}{\log (k+1) - \log (k)}, \end{aligned}$$
(5.220)

where \(f(k)>0\) and the integer k is greater than 1. Let N be the set of natural numbers \(1,2,\dots \) and X a random variable defined on N. Let \(P(k)=P(X=k)\) and \(F(k) = P(X\ge k)=\sum \limits _{i \ge k} P(i)\) be the probability function and the survival function of X, respectively. A distribution F defined on N is a generalized Zipf distribution with exponent \(\alpha >0\) if and only if \(d(k \mid F) \rightarrow -\alpha \) as \(k \rightarrow \infty \), i.e.,

$$\begin{aligned} \lim _{k \rightarrow \infty } d(k \mid F) = \lim _{k \rightarrow \infty } \frac{\log [F(k+1)]-\log [F(k)]}{\log (k+1) - \log (k)} = -\alpha . \end{aligned}$$
(5.221)

It is easy to check that the Waring distribution with

$$\begin{aligned} F(k) = \frac{\beta ^{(k-1)}}{(\alpha + \beta )^{(k-1)}} , \ \beta ^{(k)} = \beta (\beta +1) \dots (\beta +k-1) \end{aligned}$$
(5.222)

is a particular case (belongs to the class) of the generalized Zipf distribution. But the geometric distribution (\(P(k)=\theta (1-\theta )^{k-1}\) and \(F(k)=(1-\theta )^{k-1}\)) does not belong to the class of generalized Zipf distributions.

The class of generalized Zipf distributions has several properties. To state the first of them, we need to know when a function \(\varphi (k)\) varies gradually. Let \(\varphi (k)\) be a positive function defined on N. Then \(\varphi (k)\) varies gradually if and only if

$$\begin{aligned} \lim _{k \rightarrow \infty } d(k \mid \varphi ) = \lim _{k \rightarrow \infty } \frac{\log \varphi (k+1)-\log \varphi (k)}{\log (k+1) - \log (k)} =0. \end{aligned}$$
(5.223)

F(k) is a generalized Zipf distribution of exponent \(\alpha >0\) if and only if [205]

$$\begin{aligned} F(k)=\frac{\varphi (k)}{k^\alpha }, \end{aligned}$$
(5.224)

where \(\varphi (k)\) is a gradually varying function. An example of a distribution that belongs to the class of generalized Zipf distributions is the Yule distribution, with

$$\begin{aligned} F(k)=\frac{(k-1)!}{(\alpha +1)^{(k-1)}}. \end{aligned}$$
(5.225)

We can write this distribution in the form (5.224), where

$$\begin{aligned} \varphi (k)=\frac{(k-1)!}{(\alpha +1)^{(k-1)}}k^\alpha . \end{aligned}$$
(5.226)

One can also define the proportional hazard rate

$$\begin{aligned} r(k)=\frac{kP(k)}{F(k)}, \end{aligned}$$
(5.227)

and the conditional expectation

$$\begin{aligned} e(m)=E[X \mid X \ge m] = \sum \limits _{k \ge m} k \frac{P(X=k)}{P(X \ge m)}. \end{aligned}$$
(5.228)

Then the following two statements can be proved [205]. First of all, F(k) is a generalized Zipf distribution with exponent \(\alpha >0\) if and only if

$$\begin{aligned} \lim _{k \rightarrow \infty } r(k) = \alpha . \end{aligned}$$
(5.229)

Next, F(k) is a generalized Zipf distribution with exponent \(\alpha >1\) if and only if

$$\begin{aligned} \lim _{k \rightarrow \infty } \frac{e(k)}{k} = \lim _{k \rightarrow \infty } [e(k+1)-e(k)] = \frac{\alpha }{\alpha - 1}. \end{aligned}$$
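The limits (5.221) and (5.229) can be checked numerically. The Python sketch below relies on the ratio \(F(k+1)/F(k) = (\beta +k-1)/(\alpha +\beta +k-1)\), which follows from (5.222) with rising factorials. Using this ratio, it evaluates \(d(k \mid F)\) and r(k) for the Waring distribution (and, with \(\beta = 1\), the Yule distribution) at increasing k. It then contrasts these values with the geometric distribution, whose d(k | F) diverges to \(-\infty \). The exponent and parameter values are hypothetical.

```python
import math


def waring_log_ratio(k, alpha, beta):
    """log[F(k+1)/F(k)] for the Waring survival function (5.222).

    With rising factorials, F(k+1)/F(k) = (beta+k-1)/(alpha+beta+k-1);
    beta = 1 gives the Yule distribution (5.225).
    """
    return math.log1p(-alpha / (alpha + beta + k - 1))


def geometric_log_ratio(k, theta):
    """log[F(k+1)/F(k)] for the geometric survival function F(k) = (1-theta)^(k-1)
    (independent of k)."""
    return math.log1p(-theta)


def d_index(log_ratio, k):
    """d(k | F) of (5.220): discrete log-log slope of F between k and k+1."""
    return log_ratio / math.log1p(1.0 / k)


def hazard_rate(log_ratio, k):
    """r(k) = k P(k)/F(k) = k [1 - F(k+1)/F(k)] of (5.227)."""
    return -k * math.expm1(log_ratio)


alpha = 2.5  # hypothetical exponent
for k in (10, 1_000, 100_000):
    yule = waring_log_ratio(k, alpha, beta=1.0)
    geo = geometric_log_ratio(k, theta=0.3)
    print(k, d_index(yule, k), hazard_rate(yule, k), d_index(geo, k))
```

Working with the ratio through `log1p` and `expm1` avoids the cancellation that would occur if \(\log F(k)\) were computed separately for large k and then differenced.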
(5.230)