The term “variance” was coined by Ronald Fisher in 1918 in his famous paper on population genetics, The Correlation Between Relatives on the Supposition of Mendelian Inheritance, published by the Royal Society of Edinburgh: “It is … desirable in analyzing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance …” (p. 399). Interestingly, according to O. Kempthorne, this paper was initially rejected by the Royal Society of London; “probably the reason was that it constituted such a great advance on the thought in the area that the reviewers were unable to make a reasonable assessment.”
The variance of a random variable (or a data set) is a measure of the dispersion or spread of the variable (data) around its mean (expected value).
Definition. Let X be a random variable with second moment E(X²) and let μ = E(X) be its mean. The variance of X is defined by (see, e.g., Feller 1968, p. 228)
$$Var(X) = E{(X - \mu )}^{2} = E({X}^{2}) - {\mu }^{2}.\qquad (1)$$
The variance of a random variable is also frequently denoted by Var(X), \({\sigma }_{X}^{2}\), or simply σ², when the context is clear. The positive square root of the variance is called the standard deviation.
From (1), the variance of X can be interpreted as the “mean of the squares of deviations from the mean” (Kendall 1945, p. 39). Since the deviations are squared, it is clear that variance cannot be negative. Variance is a measure of dispersion “since if the values of a random variable X tend to be far from their mean, the variance of X will be larger than the variance of a comparable random variable Y whose values tend to be near their mean” (Mood et al. 1974, p. 67). It is obvious that a constant has variance 0, since there is no spread. Because the deviations are squared, the variance is expressed in the original units squared (inches², euro²), which are difficult to interpret.
To compute the variance of a random variable, one needs to know the probability distribution of X. If X is a discrete random variable with probability mass function P(X = x), then
$$Var(X) =\sum\limits_{x}{(x - \mu )}^{2}\,P(X = x).\qquad (2)$$
When X is a continuous random variable with probability density function f(x), then
$$Var(X) =\int\nolimits_{-\infty }^{\infty }{(x - \mu )}^{2}f(x)\,dx.\qquad (3)$$
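As a small illustration of the discrete case, the variance of the number of heads in two fair coin tosses can be computed directly from its pmf (a plain-Python sketch; the chosen distribution is Bin(2, 0.5), so the answer should match npq = 0.5):

```python
# pmf of X = number of heads in two fair coin tosses
xs = [0, 1, 2]
ps = [0.25, 0.5, 0.25]

# mean: mu = sum of x * p(x)
mu = sum(x * p for x, p in zip(xs, ps))

# variance per (2): weighted average of squared deviations from the mean
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

print(mu, var)  # 1.0 0.5, agreeing with npq = 2 * 0.5 * 0.5
```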
Example 1. If X has a Uniform distribution on [a, b], with pdf f(x) = 1 ∕ (b − a), then
$$E(X) =\int\nolimits_{a}^{b} \frac{x} {b - a}\,dx = \frac{a + b} {2}$$
and
$$E({X}^{2}) =\int\nolimits_{a}^{b} \frac{{x}^{2}} {b - a}\,dx = \frac{{a}^{2} + ab + {b}^{2}} {3}.$$
Hence the variance is equal to
$$Var(X) = E({X}^{2}) - {[E(X)]}^{2} = \frac{{(b - a)}^{2}} {12}.$$
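The result of Example 1 can be verified numerically, e.g. by a midpoint-rule approximation of the defining integral (a sketch; a = 2, b = 5 are arbitrary illustrative values):

```python
# Numerically check Var(X) = (b - a)^2 / 12 for X ~ U(a, b)
a, b = 2.0, 5.0
mu = (a + b) / 2.0           # mean of U(a, b)

# midpoint rule for the integral of (x - mu)^2 * 1/(b - a) over [a, b]
n = 100_000
h = (b - a) / n
var = sum((a + (i + 0.5) * h - mu) ** 2 for i in range(n)) * h / (b - a)

print(var, (b - a) ** 2 / 12)  # both approximately 0.75
```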
The following table provides expressions for variance for some standard univariate discrete and continuous probability distributions.
| Distribution | Notation | Variance |
|---|---|---|
| Bernoulli | Be(p) | pq |
| Binomial | Bin(n, p) | npq |
| Geometric | Ge(p) | q ∕ p² |
| Poisson | Po(λ) | λ |
| Uniform | U(a, b) | (b − a)² ∕ 12 |
| Exponential | Exp(λ) | 1 ∕ λ² |
| Normal | N(μ, σ) | σ² |
| Standard Normal | N(0, 1) | 1 |
| Student's t | t(ν) | ν ∕ (ν − 2) for ν > 2 |
| F | F(ν₁, ν₂) | \(\frac{2{\nu }_{2}^{2}({\nu }_{ 1}+{\nu }_{2}-2)} {{\nu }_{1}{({\nu }_{2}-2)}^{2}({\nu }_{2}-4)}\) for ν₂ > 4 |
| Chi-square | χ²(ν) | 2ν |

The Cauchy distribution possesses neither mean nor variance.

Next, we list some important properties of variance.

1. The variance of a constant is 0; in other words, if all observations in the data set are identical, the variance takes its minimum possible value, which is zero.

2. If b is a constant, then
$$Var(X + b) = Var\;X,$$
which means that adding a constant to a random variable does not change the variance.
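The closed-form variances in the table of standard distributions can be spot-checked by simulation. A minimal sketch using only the standard library (the choice Exponential(λ) with λ = 4 is arbitrary; its tabulated variance is 1 ∕ λ²):

```python
import random

random.seed(0)  # reproducible draws

lam = 4.0
n = 200_000

# draw an Exponential(lambda) sample and compute its sample variance
sample = [random.expovariate(lam) for _ in range(n)]
m = sum(sample) / n
var = sum((x - m) ** 2 for x in sample) / (n - 1)

print(var, 1 / lam ** 2)  # both approximately 0.0625
```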
3. If a and b are constants, then
$$Var(aX + b) = {a}^{2}\,Var\;X.$$

4. If two random variables X and Y are independent, then
$$\begin{array}{rcl} Var(X + Y )& =& Var\;X + Var\;Y, \\ Var(X - Y )& =& Var\;X + Var\;Y. \\ \end{array}$$

5. The previous property can be generalized: the variance of a sum of independent random variables is equal to the sum of the variances of these random variables,
$$Var\left (\sum\limits_{i=1}^{n}{X}_{ i}\right ) = \sum\limits_{i=1}^{n}Var({X}_{ i}).$$
This result is called the Bienaymé equality (see Loève 1977, p. 12, or Roussas 1997, p. 171).

6. If two random variables X and Y are independent and a and b are constants, then
$$Var(aX + bY ) = {a}^{2}\,Var\;X + {b}^{2}\,Var\;Y.$$
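These properties can be checked empirically. A sketch that simulates independent X ~ N(0, 1) and Y ~ U(0, 1) (arbitrary choices, with Var X = 1 and Var Y = 1 ∕ 12) and compares both sides of property 6:

```python
import random

random.seed(1)

a_c, b_c = 3.0, -2.0   # arbitrary constants a and b
n = 100_000

xs = [random.gauss(0.0, 1.0) for _ in range(n)]    # Var X = 1
ys = [random.uniform(0.0, 1.0) for _ in range(n)]  # Var Y = 1/12

def sample_var(v):
    """Sample variance with the n - 1 divisor."""
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / (len(v) - 1)

# left side: variance of the linear combination aX + bY
lhs = sample_var([a_c * x + b_c * y for x, y in zip(xs, ys)])
# right side: a^2 Var X + b^2 Var Y
rhs = a_c ** 2 * sample_var(xs) + b_c ** 2 * sample_var(ys)

print(lhs, rhs)  # both close to 9 * 1 + 4 * (1/12)
```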
In practice, the variance of a population, σ², is usually not known, and therefore it can only be estimated using the information contained in a sample of observations drawn from that population. If \({x}_{1},{x}_{2},\ldots ,{x}_{n}\) is a random sample of size n selected from a population with mean μ, then the sample variance is usually denoted by s² and is defined by
$${s}^{2} = \frac{1} {n - 1}\sum\limits_{i=1}^{n}{({x}_{ i} -\overline{x})}^{2},\qquad (4)$$
where \(\overline{x}\) is the sample mean. The sample variance depicts the dispersion of sample observations around the sample mean. The squared deviations in (4) are divided by n − 1, not by n, in order to obtain an unbiased estimator of the population variance, E(s²) = σ². The factor 1 ∕ (n − 1) inflates the sample variance just enough to make it unbiased; it is known as Bessel’s correction (after Friedrich Bessel). Although the sample variance defined as in (4) is an unbiased estimator of the population variance, the same does not hold for its square root: the sample standard deviation is a biased estimator of the population standard deviation.
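The effect of Bessel's correction is easy to see by simulation: averaging the divide-by-n and divide-by-(n − 1) estimators over many small samples drawn from a population with σ² = 1 (a sketch; the sample size n = 5 is an arbitrary choice):

```python
import random

random.seed(2)

n, reps = 5, 50_000
biased_sum = unbiased_sum = 0.0

for _ in range(reps):
    # small sample from N(0, 1), so the true variance is 1
    s = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(s) / n
    ss = sum((x - m) ** 2 for x in s)
    biased_sum += ss / n          # divide by n: systematically too small
    unbiased_sum += ss / (n - 1)  # divide by n - 1: centered on sigma^2 = 1

print(biased_sum / reps, unbiased_sum / reps)
# the divide-by-n average sits near (n-1)/n = 0.8; the corrected one near 1.0
```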
Example 2. The first column of the following table contains the first five measurements of the speed of light, in units of thousands of km/s, from the classical experiments performed by Michelson in 1879 (data from Ernest N. Dorsey’s 1944 paper “The Velocity of Light”).
x i | \({x}_{i} -\overline{x}\) | \({({x}_{i} -\overline{x})}^{2}\) | x i 2 |
---|---|---|---|
299.85 | − 0. 048 | 0.002304 | 89,910.0225 |
299.74 | − 0. 158 | 0.024964 | 89,844.0676 |
299.90 | 0.002 | 0.000004 | 89,940.0100 |
300.07 | 0.172 | 0.029584 | 90,042.0049 |
299.93 | 0.032 | 0.001024 | 89,958.0049 |
Σ 1499.49 | 0.000 | 0.057880 | 449,694.1099 |
Since the sample mean is equal to \(\ \overline{x} = \frac{\sum\limits_{i=1}^{5}{x}_{ i}} {5} = \frac{1499.49} {5} = 299.898,\) using the formula given in (4) results in the variance value
$${s}^{2} = \frac{0.057880} {5 - 1} = 0.01447.$$
In the past, instead of the “definitional” formula (4), the following (so-called shorthand) formula was commonly used, but it has become obsolete with the wide availability of statistical software, spreadsheets, and Internet applets:
$${s}^{2} = \frac{1} {n - 1}\left (\sum\limits_{i=1}^{n}{x}_{ i}^{2} -\frac{{\left (\sum\limits_{i=1}^{n}{x}_{ i}\right )}^{2}} {n} \right ).$$
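Applied to the Michelson data of Example 2, the definitional formula (4) and the shorthand formula give the same result (a plain-Python sketch):

```python
# first five Michelson measurements (thousands of km/s), from Example 2
data = [299.85, 299.74, 299.90, 300.07, 299.93]
n = len(data)

mean = sum(data) / n  # 299.898

# definitional formula (4): sum of squared deviations over n - 1
defn = sum((x - mean) ** 2 for x in data) / (n - 1)

# shorthand formula: uses sum of squares and the squared total
short = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(mean, defn, short)  # variance approximately 0.01447 either way
```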
About the Author
Abdulbari Bener, Ph.D., joined the Department of Public Health at the Weill Cornell Medical College as Research Professor of Public Health. Professor Bener is Director of the Medical Statistics and Epidemiology Department at Hamad Medical Corporation, Qatar. He is also an advisor to the World Health Organization and Adjunct Professor and Coordinator for the postgraduate and master of public health (MPH) programs of the School of Epidemiology and Health Sciences, University of Manchester. He is a Fellow of the Royal Statistical Society (FRSS) and a Fellow of the Faculty of Public Health (FFPH). Dr. Bener holds a Ph.D. degree in Medical Statistics (Biometry) and Genetics from University College London, and a B.Sc. degree from Ankara University, Faculty of Education, Department of Management, Planning and Investigation. He completed research fellowships in the Departments of Genetics and Biometry and of Statistics and Computer Sciences at University College London. He has held academic positions in public health, epidemiology, and statistics at universities in Turkey, Saudi Arabia, Kuwait, the United Arab Emirates, Qatar, and England. Professor Bener has been author or coauthor of more than 430 published journal articles; Editor, Associate Editor, Advisory Editor, and Assistant Editor for several journals; and referee for over 23 journals. He has contributed to more than 15 book chapters and supervised the theses of 40 postgraduate students (M.Sc., MPH, M.Phil., and Ph.D.).
References and Further Reading
Fisher RA (1918) The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edinb 52:399–433
Dorsey EN (1944) The velocity of light. T Am Philos Soc 34(Part 1): 1–110, Table 22
Feller W (1968) An introduction to probability theory and its applications, vol 1, 3rd edn. Wiley, New York
Kempthorne O (1968) Book reviews. Am J Hum Genet 20(4):402–403
Kendall M (1945) The advanced theory of statistics. Charles Griffin, London
Loève M (1977) Probability theory I, 4th edn. Springer, New York
Mood AM, Graybill FA, Boes DC (1974) Introduction to the theory of statistics, 3rd edn. McGraw-Hill, London
Roussas G (1997) A course in mathematical statistics, 2nd edn. Academic Press, San Diego
© 2011 Springer-Verlag Berlin Heidelberg
Bener, A., Lovric, M. (2011). Variance. In: Lovric, M. (eds) International Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04898-2_634