1 The Principle of Classification: How to Neutralize the Immanence of Variability

Since the birth of modern science, the development of statistical thought has gone hand in hand with the evolution of the semantic concept of variability. The variability of natural and social phenomena has been the challenge that Galilean science took up by replacing the apparent disorder found in Nature with the order of scientific laws. The need to investigate phenomena that produce many different results has shifted the interest of scientific research from the single case to all cases taken as a whole. The search for laws concerning a group considered as a whole has found its empirical ground in the variability of reality and of the phenomena that constitute it.

1.1 The Gnosiological Strength of Classification

Scientific knowledge responds to the need for simplification with respect to the multitude of aspects and manifestations in which reality appears to our senses. To classify means to group together the single items that make up a population according to similarities and differences with respect to some characteristics, replacing the plurality of individuals with the typology of the classes. Through the principle of classification it is possible to understand the statistical properties of a group only by considering as essential the characteristics according to which similarities and differences are recognized, and by ignoring the many other characteristics that make the single observations appear heterogeneous (Scardovi et~al. 1983).

In this attempt to define the classification process followed by modern science, much circularity emerges, which requires us to accept some concepts a priori as postulates. We should define what a population is and what its elementary constituents are, what a phenomenon is and what the relational properties are in which it occurs, that is, the latent factors that determine its empirical manifestations. Nowadays modern statistics adopts, as a common and shared heritage, many of these concepts, such as “phenomenon,” “population,” “category,” “statistical unit,” “elementary event,” “characteristic,” or “observable variable.” Each of these concepts has been investigated by the greatest thinkers, from the philosophers of classical antiquity to the great statisticians of the twentieth century, who needed to ground the epistemological statements of the new sciences to which they supplied their method, renewing it at the roots, beginning with the language.

The ability to classify is innate in human beings and in many animal species. Ordinary language itself is grounded in classification. A common noun already expresses a classificatory identity that enables us to recognize different entities as the same, solely on the basis of a few shared features deemed essential. Here we find again the “Platonic idea” that associates with each word a class (in itself homogeneous) of facts and things that are similar in some respects (the principle of relative similarity), which allows us to recognize what belongs to the class and what is excluded from it.

1.2 The Phenomenon, a Necessary Abstraction

At this point, however, we cannot neglect a term currently used in scientific language which, together with the concept of class (category), contains all the strength and the semantic ambiguity of the statistical method: the word phenomenon. Karl Pearson writes in The Grammar of Science (1892, Chapter II): “… we have frequently spoken of the classification of facts as the basis of the scientific method, we also have had occasion to use the words real and unreal, and universe and phenomenon. It is appropriate, therefore, that before proceeding further we should endeavour to clarify our ideas as to what these terms mean … But what are these facts in themselves, and what is for us the criterion of their reality?” (Pearson 1911). To approach the scientific concept of phenomenon we should revisit the classic Galilean experiment. It did not claim to reproduce reality, but a phenomenon, that is, a slice of reality freed from everything that makes it unique and unrepeatable. The same happens when we observe a fragment of reality, even outside an experimental setting.

As the best dictionaries state, “a phenomenon is an observable fact or event, an item of experience or reality, a fact or event in the changing and perceptible forms as distinguished from the permanent essences of things.” We must now ask ourselves what implied relationship transforms our perceptions into “facts” related to each other, so as to become inter-subjective and shared macro-concepts. When we define a phenomenon, it loses its historical context and becomes an idealized model that goes beyond contingency; it becomes a conceptual artifact (observational or experimental). Within this idealized model, all circumstances (facts or perceptions) unrelated to those of interest are considered irrelevant. Moreover, the variability induced by circumstances not connected to those considered “strictly related” to the phenomenon is eliminated, because it is regarded as an element of disturbance.

The process of classification extends the epistemological rules of the experimental method to all observable phenomena. The principle of classification is based on a set of relational connections that allow us to separate what, according to our perceptions, are shared or shareable similarities from what are irrelevant differences, which as such can be virtually eliminated. This process, certainly innate in human beings and consolidated by the need to survive in our natural and social environment, has developed into a rational ability that has led us to classify perceptions into homogeneous classes and phenomena, i.e., sets of inter-connected categories. Compared with the concept of class, the concept of phenomenon includes a further abstraction that codifies within a closed system all the relational connections between certain categories of facts, according to a kind of centripetal force, and turns them into a unicum, precisely the phenomenon (Krantz et~al. 1971–1990).

Modern science has refined these rational abilities and has thoroughly analyzed the philosophical canons that lead from experience to abstract theory. In this context we find the ideal continuity between the statistical and the experimental method. But the path has been very long. Most of modern science is based on the concepts of class and phenomenon. Its roots dip into Aristotelian science and pre-Galilean Scholasticism, which sought its authority in the most extreme classificatory “bulimia.” Redemption from those early classification schemes, tarnished by the contamination of the ruling esotericisms, was achieved by Linnaeus’ Systema Naturae (1759), which goes beyond the creationist paradigm that had inspired it and definitively places the principle of classification among the fundamental epistemological canons (Scardovi et~al. 1983).

If the statistical method, together with the experimental one, is the rational foundation of the modern sciences, it has accompanied the development of many scientific disciplines, both in phases of normality and in phases of transformation. In alternating phases, all the sciences have taken advantage of the strategies offered by statistics and, in turn, statistics has taken advantage of the discoveries of the other sciences: first astronomy, then biology, physics, psychology, genetics, the social sciences, and so on. It is a matter of fact that the entire scientific methodology has developed around the many facets in which variability is expressed.

Initially, the aim was to neutralize variability in order to look for invariants. Following the principle of classification, statistics dealt with means, moments, and frequency distributions (Galton 1885). Then the same principle coherently led statistics to identify the different types of variability that the observational sciences began to bring to light. With Pearson (1912) and Fisher (1930b), statistics built very powerful methods for decomposing variability, comparing the (systematic) variability between groups with the (accidental, random) variability within groups. Subsequently, variability was employed to search for the relations that make a phenomenon a conventional concept that can be described by logical or functional relations between its basic components (Galton 1883, 1885; Pearson 1901–1902). We can think of analysis of variance, regression and correlation analysis, exploratory factor analysis, structural equation models, generalized latent variable models, and so on.
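This decomposition can be summarized by a standard identity, sketched here only as an illustration (the notation is not the author's): for observations $x_{ij}$, with $j = 1, \dots, n_i$, in groups $i = 1, \dots, k$, with group means $\bar{x}_i$ and overall mean $\bar{x}$,
\[
\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^{2}
= \sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{x}\right)^{2}
+ \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^{2},
\]
so that the total variability splits exactly into the systematic (between-group) and the accidental (within-group) components.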

2 Combinatorial System, Induced Variability, Probability

In the history of scientific thought, as in human history, the concepts of variability and uncertainty have often been associated. They are very different from each other, yet so intertwined as to be often confused.

The variability of the physical world has forced man to create coping strategies in order to find regularities and make predictions. What is the world like beyond our observational perspective, beyond the present time?

Dealing with uncertainty has been more difficult: uncertainty is a product of variability, but it concerns the single fact, the individual occurrence. In the distant past as today, fortune tellers, oracles, astrologers, and magicians have brought relief to the worries of human beings. Nonetheless, uncertainty has always been a challenge, a creative moment of the mind that has manifested itself in games of chance (Monari et~al. 2005).

Among the first objects devoted to this purpose were astragals (small anklebones of sheep or dogs), adopted by man for games of chance and cited by historians or represented in graffiti, murals, and decorated vases (David 1962). Subsequently, with a decisive leap in abstraction, astragals were replaced by dice, artifacts with which man tried, perhaps unconsciously, to give shape to a very sophisticated ideal concept, that of symmetry, so as to ensure each of the six faces of the die an equal possibility of appearing in each throw. The die becomes the symbolic representation of an immutable physical object, one that becomes variable when it is used. Even in an irrational way, man becomes the creator of variability in order to challenge it. The variability of dice is not a phenomenal variability, but a pure mental abstraction. One may play dice without dice: just think of all the possibilities and pick one.

Man was therefore familiar with gambling and uncertainty. Why, then, did antiquity not produce a mathematics of games that could anticipate modern probability theory, in the same way as the forms of the physical world inspired Euclidean geometry? Many answers have been suggested, all unsatisfactory. We must move far forward in time to find the first attempts at describing the possible outcomes of games of chance such as throwing dice or coins, attempts that became the empirical premise of modern combinatorics. Yet these attempts still did not mention any measure of the potential combinatorial macro-states, seen as aggregations of micro-states (elementary events) that produce the same synthetic result: the outcome (success or failure) of the game.

The history of scientific thought recognizes Luca Pacioli’s Summa de arithmetica, geometria, proportioni et proportionalità (1494) and Gerolamo Cardano’s (1501–1576) De ludo aleae as the forerunners of a new formal language able to describe the space of events of a random experiment, whose dimension is much broader than the few elements that generate the experiment (Hacking 1975). The same language was used by Galilei in his famous essay Sulla scoperta dei dadi (1635), where he shows the possible combinations of points in the throw of three dice whose sum is equal to or less than ten.
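As an illustrative sketch (added here, not drawn from Galileo's essay), the following Python snippet enumerates the 216 equally likely micro-states of three dice and aggregates them into macro-states according to their sum, the kind of counting that Galileo carried out by hand:

```python
from itertools import product
from collections import Counter

# All 6^3 = 216 equally likely micro-states of a throw of three dice
micro_states = list(product(range(1, 7), repeat=3))

# Aggregate micro-states into macro-states identified by the sum of the points
macro_states = Counter(sum(throw) for throw in micro_states)

for total in sorted(macro_states):
    count = macro_states[total]
    print(f"sum {total:2d}: {count:2d} micro-states, probability {count}/216")
```

The enumeration shows, for instance, that a sum of 10 is produced by 27 micro-states while a sum of 9 is produced by only 25, an asymmetry invisible to anyone who counts only the partitions of the two sums.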

Uncertainty could therefore be measured in a rudimentary form, the distribution of all possible events, which anticipated the concept of a random variable. The awareness of the randomness of events introduces a new rational outlook on the interpretation of the variability of real phenomena, which can be ideally represented by games of chance in the same way as the perfect shapes of Euclidean geometry represent physical objects. These real transpositions of abstract concepts (thought experiments or simulations) have offered a multitude of scholars the intuitive hook for understanding the rational foundations of probability and its theorems.

From combinatorics is born the idea of a new variability that is no longer that of real phenomena, but comes only from the speculative ability of the mind. This new idea of “random variability” is a brilliant and subversive product of rational thinking that has revolutionized science.

Almost a hundred years had to pass from the work of Cardano before Pascal (1654) could see in those combinatorial schemes the logical premises of his probability theory. In the language of combinatorics, which is completely deterministic and mathematical, Pascal also found the easiest language with which to explain to the scientific world the power of his new logic, that of probability. It was a language that did not scare the scientists of those times, because combinatorial variability remained governed and governable by man; it was a playful mind game that had nothing to do with the reality of phenomena. It remained completely subordinate to what would become Laplace’s determinism, in which probability was confined to neutralizing the effects of accidental errors of measurement in the search for the “true” value of the observed magnitude.

In that “neutral” context the first probabilists managed to demonstrate fundamental theorems; think of De Moivre, Lagrange, Bernoulli, and Gauss (Hacking 1975; Hald 2003; Stigler 1986). These leading figures of modern thinking, however, were not only mathematicians; they were above all physicists and astronomers, and their philosophical speculations were strongly influenced by observational experience. Gauss (1809) derived his famous model in a purely analytical way, after assuming some formal preconditions that he had taken from the evidence on the distribution of repeated measurements of astronomical magnitudes, following an entirely circular logical path. That evidence had already led Lagrange (1806) to indicate the arithmetic mean of instrumental measurements as the most likely value of an unknown quantity. And he did so before the adventurous inversion of De Moivre’s theorem generated the ambiguous confusion with the law of large numbers, logically resolved only with modern statistical inference (Porter 1986).
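A compressed sketch of that circular path, added here only as an illustration (the notation is not Gauss's own): if independent measurements $x_1, \dots, x_n$ of a magnitude $\mu$ follow the error law
\[
\varphi(x_i - \mu) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x_i-\mu)^{2}}{2\sigma^{2}}\right),
\]
then the most probable value of $\mu$ maximizes $\prod_i \varphi(x_i-\mu)$, hence minimizes $\sum_i (x_i-\mu)^{2}$, and is therefore the arithmetic mean $\bar{x} = \frac{1}{n}\sum_i x_i$; conversely, requiring the arithmetic mean to be the most probable value singles out precisely this exponential-of-squares form among smooth error laws.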

However, for many years probability continued to serve as a convenient way to compensate for the cognitive limitations of the human mind, a mind that could never compete with the “infinite intelligence” postulated by Laplace (1814). But the subtle workings of this new logic and its language were broadening the possible horizons of scientific thinking. Once a phenomenon was identified, it could be described by a statistical model able to interpret and manage both the “accidental” variability that differentiates individuals and the “systematic” variability that characterizes the phenomenon in its essential traits (Scardovi 1982).

3 The Return Match of Variability

In 1859 Darwin published On the Origin of Species and formulated his theory of evolution by natural selection, which offered science a new way of reading the variability described until then by deterministic models. In Darwin’s theory species are not immutable but evolve, conditioned by the environment (Darwin 1972). Beyond its ethical and philosophical impact, which has not yet diminished, this theory opened up two huge issues: (1) proving the new theory in quantitative terms, and (2) finding the processes that determine the phenotypic changes upon which the environment can act selectively.

The first issue promoted the rise of the statistical method through the fundamental works of Galton, Pearson, and Weldon, together with the journals that launched statistics into the world as a unifying method of the modern sciences: the Journal of the Royal Statistical Society and Biometrika.

Variability was no longer a state of disorder to be eliminated in order to find the true laws of nature; it became in itself a source of knowledge. The laws of the physical world could be discovered only by studying the variability of its phenomena, namely the set of relationships that conventionally connect the observed facts. In this context originate the theory of linear regression, in which the concept of causality is nested, and that of linear correlation, where the concept of cause fades into a state of interdependence. The latter offered Pearson (1912) and Spearman (1904) the idea of a latent explanation underlying observed phenomena, the “factors,” in a constant pursuit of a deeper causal system.

The second issue relates more closely to scientific research, and particularly to biology in its new form, genetics. Mendel (1866) had the brilliant idea of the genetic inheritance of characters, expressed in the simplest form of a diallelic gene by the expansion of Newton’s binomial formula $[p(A) + p(a)]^{n}$, where the exponent indicates the generations, A and a denote the dominant and recessive alleles that determine the phenotype, and the binomial coefficients define the numerical proportions of genotypes and phenotypes.
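As a worked reading of the simplest case, added here for illustration: for the second generation of the classic monohybrid cross (the offspring of two heterozygotes Aa), the expansion with exponent 2 gives
\[
[p(A) + p(a)]^{2} = p(A)^{2} + 2\,p(A)\,p(a) + p(a)^{2},
\]
which, with $p(A) = p(a) = \tfrac{1}{2}$, yields the genotypic proportions $1\,AA : 2\,Aa : 1\,aa$ and, with A dominant, the familiar $3:1$ phenotypic ratio.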

The structure underlying the representation of the hereditary process experimentally demonstrated by Mendel, and by those who came after him to codify molecular and population genetics, is the basic one of the repeated toss of a coin, which describes the aggregation of micro-states (the possible combinatorial outcomes) into macro-states (all the combinations that produce the same expected result) (Monari et~al. 2009; Monari and Scardovi 1989). In this new paradigm, combinatorics and variability become the modus intellegendi of a new science that finds in ancient gambling the most appropriate language for giving semantic, and at the same time formal, content to the explanation of its processes.

If analogy has a place in the evolution of scientific thought, then we can understand the intellectual growth of Ronald Fisher, who, starting from the discoveries of the life sciences, was led to his extraordinary contributions to statistics (Monari et~al. 2009). Fisher gave an original theoretical layout to population genetics (Fisher 1930a) by blending the Darwinian theory of evolution with Mendelian genetics.

What has modern statistics taken from this huge body of work, beyond the strength of the method? The answer is: a new way of dealing with variability. The arithmetic mean is no longer the end point of a science that seeks above all the invariants. As a model of invariance, the arithmetic mean becomes the starting point for investigating variability. The standard deviation is no longer merely the worrying measure of dispersion or the reassuring measure of precision. The analysis of variance breaks new ground because it allows us to distinguish the variability within groups, a sign of a system in equilibrium, from the variability between groups, a sign of significant differences among them. Fisher ingeniously associated the first type of variability with that of combinatorial schemes generated by constant probabilities, and translated it into the language of random sampling, where variability is just sampling error. He associated the second type of variability with the variability that occurs when an innovative factor breaks the balance and changes the original connections (parameters, probabilities, etc.). This entirely new perspective on variability has completely transformed modern statistics, which became much more than a mere tool for quantitative research; it became the explanatory language of the new sciences (Monari et~al. 2009).
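A minimal computational sketch of this between/within decomposition, added here only as an illustration (the three groups and their values are invented, and the code is not drawn from Fisher's own work):

```python
import numpy as np

# Hypothetical measurements for three groups (invented data)
groups = [
    np.array([5.1, 4.9, 5.4, 5.0, 5.2]),
    np.array([5.8, 6.1, 5.9, 6.3, 6.0]),
    np.array([5.0, 5.3, 4.8, 5.1, 5.2]),
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# Between-group (systematic) and within-group (accidental) sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_obs) - len(groups)

# Fisher's F ratio: systematic variance compared with sampling (error) variance
f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}, F = {f_ratio:.2f}")
```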

4 Time and Variability

Modern science had to wait for the twentieth century to acknowledge that time is an intrinsic factor in the variability of phenomena. There is the variability expressed by the uncertainty of a future event when the conditions that regulate the phenomenon are unknown: here time is inert, and uncertainty about the future is the same as about the unknown past. And there is the variability that is instead created and shaped by time: time becomes a factor of variability because it intervenes to give a direction to the phenomena that evolve, in the same way as evolutionary turning points mark time. When this variability intervenes, time becomes irreversible and phenomena cannot return to their previous state (in the sense that the probability of such a return is 0).

The dominant Laplacian philosophy strengthened the thesis that the order of the universe was fixed at the origin and could remain unchanged and unchangeable in astronomical phenomena as in the phenomena of life (Spearman 1904). Laplace’s thinking has influenced all of modern science, which strenuously fought the first signs of weakness when, in the second half of the nineteenth century, Charles Darwin disrupted all research canons with a dynamic and evolutionary explanation of natural variation in which time is beaten out by the clock of the passing generations. Darwin’s concept of time is a time measured along the direction imprinted by environmental factors on the combinatorial variability of genetic crosses as the generations pass. For the first time, a time that allows no return is established, just as time in Boltzmann’s physics allowed no logical return (Boltzmann 1905).

The question of prediction then remained open. In an entirely deterministic world, all events would be predictable. If they are not, it is only because we do not have the “infinite intelligence” that would allow us to know at any moment all the forces by which nature is moved. For centuries this view allowed humans to foresee the large astronomical phenomena and to classify a living being into its species. Here uncertainty is only a limit of the researcher’s skill, which does not remove semantic value from scientific law, and the variability of the single components is only a factor of disturbance. Prediction of each single event is possible only through cognitive approximation, but in principle it could be exact.

The new science of Darwin and Boltzmann is no longer like this: it shows another world, an indeterministic one that can be explained only by the language of probability. Living species are no longer immutable, but become a continuous interlaced web of genetic combinations, contingent factors, and environmental contexts. Thus, although a single molecule follows the laws of classical physics, a population of molecules follows other rules, statistical and combinatorial ones, which lead that population toward the most probable state of maximum entropy, driven only by random combinations of elementary events.

Once again statistics lends its semantic language and acquires new tools: new measures of variability in terms of entropy, the formalization of stochastic processes, and time series analysis, which decomposes a phenomenon that changes over time into all its possible components (Scardovi 1982).
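As a small illustration of an entropy-based measure of variability, added here as a sketch (the class frequencies are invented): the Shannon entropy of a frequency distribution is zero when a single class absorbs everything and maximal when all classes are equally frequent.

```python
import math

def shannon_entropy(frequencies):
    """Shannon entropy (in bits) of a discrete frequency distribution."""
    total = sum(frequencies)
    probs = [f / total for f in frequencies if f > 0]
    return -sum(p * math.log2(p) for p in probs)

# Invented class frequencies: a concentrated and a uniform distribution
print(shannon_entropy([97, 1, 1, 1]))     # low entropy: almost no variability
print(shannon_entropy([25, 25, 25, 25]))  # maximum entropy for four classes: 2 bits
```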

5 Statistical Laws and Revealing Variability

Galileo’s experimental rationality pictured a nature that could be described through the language of mathematics (today, of statistics), in which qualities could be converted into quantities. This was the kind of science practiced in astronomy, an observational science that could only emulate the canons of the Galilean experiment. Moreover, Galileo wanted to establish a way of thinking free from metaphysical prejudice and anchored in experience. Galileo’s rationality is that of Kepler, of the great astronomers who came before him, and of those who followed him up to Newton. The laws of astronomy sought regularities within the intricate web of variability in movements, sizes, spaces and, above all, measurements. Those laws had to interpret the divine plan, but they also had to convince people through the accuracy of their predictions of celestial events.

The laws of Galilean science aimed to combine two objectives: (a) to explain phenomena in a causal context, and (b) to predict events not yet explained by those laws. The statements of modern science have not always achieved both objectives. Indeed, the most revolutionary scientific theories of the twentieth century were very powerful as explanatory models but often very weak as forecasting models when applied to single events. This is because the new theories are first of all “statistical” theories. Science has learned to deal with statistical populations and collective properties.

The characteristics of these laws are properties that relate to a phenomenon as a whole and not to its elementary (inessential) components. Scientific interest shifts from single units to groups, in search of statistical regularities that become group properties. For example, the sex ratio at birth is a feature of many species and does not apply to a single birth, in the same way as the second law of thermodynamics does not refer to a single molecule but describes the possible states of a large set (population) of molecules. The genetic theory of heredity, too, does not allow us to determine with certainty what each individual will be like, but it accurately describes the genetic structure of a group.

Under what conditions may a statistical property, a regularity, a frequency distribution observed in a group free itself from the group dimension to become a parameter of a population, a scientific law, a theory? The law of large numbers attempted to answer this. The answer is not a mathematical theorem, nor a scientific discovery; it is the expression of the rational ability of man to find rules in the repetition of his experiences and perceptions. The repetition of experiences and observations in the search for regularities is a common feature of scientific research, whatever the factual or epistemological context in which it takes place. The distinction between absolute laws and statistical laws, which has so animated the philosophical debate of the last centuries, is resolved statistically by distinguishing phenomena with no variability from phenomena consisting of many different elementary events, in which the variability between single units cannot be eliminated.
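A minimal simulation of the intuition behind the law of large numbers, added here only as a sketch (the event and its probability are invented): as the number of repetitions grows, the relative frequency of the event stabilizes around its probability.

```python
import random

random.seed(42)
p = 0.3  # hypothetical probability of the elementary event

for n in (10, 100, 1_000, 10_000, 100_000):
    successes = sum(random.random() < p for _ in range(n))
    print(f"n = {n:6d}: relative frequency = {successes / n:.4f}")
```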

When the phenomena of interest to science are “statistical,” variability becomes the key to explanation and acquires its own semantic meaning. A macro phenomenon is statistically stable because it is the result of many irregular micro phenomena. This means going beyond the single phenomenon and looking for a different perspective: the physical sciences shifted their attention to macro phenomena in a more agile and less demanding way than the life sciences and the social sciences.

The first epistemological consequence is that every semantic distinction between sample and population blurs when the sample is large enough to bring out the law, in the same way as, faced with a statistical statement, the distinction between validation and confutation blurs (Scardovi 1988, 1999). In the “statistical laws,” the analysis of variability becomes the main focus of scientific research, and statistics plays a leading role: it is no longer just a tool. Statistical language becomes the language of the new scientific theories, and the statistical method for the study of variability in all its facets offers the interpretive keys for all phenomena, as well as the conceptual tool for following the evolution of a phenomenon through the transformations of its internal variation or of its entropic system.