Abstract
This paper presents a construction of confidence intervals for the common variance of normal distributions based on generalized confidence intervals, and compares the results with a large sample approach. A Monte Carlo simulation was used to evaluate the coverage probability and average length of the confidence intervals. The simulation studies showed that the generalized confidence interval approach provided much better confidence interval estimates than the large sample approach. Two real data examples are presented to illustrate our approaches.
1 Introduction
The construction of confidence intervals for a normal variance is well known, simple to apply, and has attracted a great deal of attention from researchers. An investigation of the history and development of confidence intervals for a normal variance was given by Cohen [1], who constructed confidence intervals for the variance that had the same length as the usual minimum length interval but greater coverage probability. Analogously, Shorrock [2, 3] presented an improved interval based on Stein’s technique and a smooth version of Cohen’s interval using Brewster and Zidek’s technique. Stein-type improvements of confidence intervals for the normal variance with unknown mean were also obtained by Nagata [4]. Casella [5] constructed a class of intervals, each of which improved both coverage probability and size over the usual interval. Lastly, Kubokawa [6] presented a unified approach to the variance estimation problem. Many other researchers are also interested in the estimation of variance; see, e.g., Shorrock and Zidek [7]. Sarkar [8] constructed the shortest confidence interval and Iliopoulos and Kourouklis [9] presented a Stein-type interval for generalized variances.
The motivation of this paper comes from the analysis of variance (ANOVA), which is used to compare several means. The assumptions of the analysis of variance are normality, homogeneity of variance, and independence of errors. Quantitative data consisting of n observations from each of k populations may come from different times or places, and experimental situations may be repeated many times. In this case, if the variances are homogeneous, what is the best way to construct a confidence interval for the common variance so as to obtain a single estimate? Therefore, interval estimation procedures for the common variance of normal distributions are of interest.
Developing procedures for interval estimation of a common variance based on several independent normal samples is important both in practice and in theory. Thus, the goal of this paper is to provide two approaches for confidence interval estimation of the common variance derived from several independent samples from normal distributions: the generalized confidence interval approach and the large sample approach. The notion of generalized confidence intervals was proposed by Weerahandi [10]. The generalized confidence interval approach has been successfully used to construct confidence intervals for many common parameters, and these ideas have since been applied to solve many statistical problems; see, for example, Tian [11], Tian and Wu [12], Krishnamoorthy [13], and Ye et al. [14]. To our knowledge, there is no previous work on inference for the common variance of normal distributions that uses a generalized confidence interval approach and compares it with the large sample approach.
The remainder of the paper is organized as follows. Section 2 introduces the basic properties of normal distribution. Section 3 presents the generalized variable approach developed and describes computational procedures. Section 4 presents simulation results to evaluate the performances of generalized confidence interval approach and the large sample approach on coverage probabilities and average lengths. Section 5 illustrates the proposed approaches with real examples. Finally, conclusions are given in Sect. 6.
2 Properties of Normal Distribution
If the random variable X follows the normal distribution, that is, \(X\sim N \left( \mu ,\sigma ^{2}\right) \), the probability density function of X is given by \(f\left( x\right) = \dfrac{1}{\sigma \sqrt{2\pi }}\exp \left( -\dfrac{\left( x-\mu \right) ^{2}}{2\sigma ^{2}}\right) , \quad -\infty< x<\infty .\)
The maximum likelihood estimators (MLE) of \(\mu \) and \(\sigma ^{2}\) are \(\widehat{\mu }\) and \(\widehat{\sigma }^{2}\) respectively,
where \(\widehat{\mu } = \overline{X} = \dfrac{1}{n}\sum \limits ^{n}_{i=1}X_{i}\), \( \widehat{\sigma }^{2} = \dfrac{1}{n}\sum \limits ^{n}_{i=1}\left( X_{i}-\overline{X}\right) ^{2}\).
The estimator \(\widehat{\sigma }^{2}\) is the sample variance of the sample \(\left( X_{1}, X_{2},..., X_{n}\right) \). In practice, another estimator is often used instead of \(\widehat{\sigma }^{2}\). This other estimator is denoted \(S^{2}\) and is also called the sample variance. The estimator \(S^{2}\) differs from \(\widehat{\sigma }^{2}\) by having \(\left( n-1\right) \) instead of n in the denominator.
Thus \(S^{2} = \dfrac{n}{n-1}\widehat{\sigma }^{2} = \dfrac{1}{n-1}\sum \limits ^{n}_{i=1}\left( X_{i}-\overline{X}\right) ^{2}\).
The estimator \(S^{2}\) is an unbiased estimator of the underlying parameter \(\sigma ^{2}\), whereas \(\widehat{\sigma }^{2}\) is biased.
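The relationship \(S^{2} = \frac{n}{n-1}\widehat{\sigma }^{2}\) and the direction of the bias can be checked numerically; a minimal sketch using NumPy (sample values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=20)
n = len(x)

sigma2_mle = ((x - x.mean()) ** 2).sum() / n   # MLE: divides by n (biased downward)
s2 = x.var(ddof=1)                             # sample variance: divides by n - 1 (unbiased)

# S^2 = n/(n-1) * sigma2_mle, so the two estimators differ only by a constant factor
ratio = s2 / sigma2_mle
```

For any sample, `ratio` equals \(n/(n-1)\) exactly, so \(S^{2}\) is always slightly larger than \(\widehat{\sigma }^{2}\).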
Theorem 1
Suppose \(X\sim N \left( \mu ,\sigma ^{2}\right) \), where \(\mu , \sigma ^{2}\) are respectively the population mean and population variance of X. Then the estimator of \(\sigma ^{2}\) is \(\widehat{\sigma }^{2}= S^{2}\), and the variance of \(\widehat{\sigma }^{2}\) is \( var\left( \widehat{\sigma }^{2}\right) = \dfrac{2\sigma ^{4}}{n-1}.\)
Proof
Let \(X_{1}, X_{2},..., X_{n}\) be independent and identically distributed random variables with mean \( \mu \) and variance \(\sigma ^{2}\); then \(\overline{X}\) and \( S^{2} \) are unbiased estimators of \( \mu \) and \( \sigma ^{2} \):
where \(\overline{X} = \dfrac{1}{n}\sum \limits ^{n}_{i=1}x_{i}\), \(S^{2} = \dfrac{1}{n-1}\sum \limits ^{n}_{i=1}\left( x_{i}-\overline{X}\right) ^{2}\).
Also, by the Lehmann-Scheffé theorem, the estimator \( S^{2} \) is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples, both \( S^{2} \) and \(\widehat{\sigma }^{2}\) have scaled chi-squared distributions with \(\left( n-1\right) \) degrees of freedom,
so, \( S^{2}\sim \dfrac{\sigma ^{2}}{n-1}\cdot \chi _{n-1}^{2}\), \(\widehat{\sigma }^{2}\sim \dfrac{\sigma ^{2}}{n}\cdot \chi _{n-1}^{2},\)
so the sampling distribution of \( \dfrac{\left( n-1\right) S^{2}}{\sigma ^{2}} \) is chi-square with \( n-1 \) degrees of freedom. For the chi-square distribution, the mean and variance are \( E\left( \chi ^{2}_{n-1}\right) = n-1\) and \( var\left( \chi ^{2}_{n-1}\right) = 2\left( n-1\right) . \)
We can use this to get the mean and variance of \( S^{2} \): \( E\left( S^{2}\right) = \dfrac{\sigma ^{2}}{n-1}E\left( \chi ^{2}_{n-1}\right) = \sigma ^{2}\) and \( var\left( S^{2}\right) = \dfrac{\sigma ^{4}}{\left( n-1\right) ^{2}}var\left( \chi ^{2}_{n-1}\right) = \dfrac{2\sigma ^{4}}{n-1}. \)
Hence, \( var\left( \widehat{\sigma }^{2}\right) = var\left( S^{2}\right) = \dfrac{2\sigma ^{4}}{n-1}. \)
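The identity \( var\left( S^{2}\right) = 2\sigma ^{4}/\left( n-1\right) \) from Theorem 1 can be verified by simulation; a quick numerical check (the sample size and replication count are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 15, 4.0, 200_000

# Draw many independent samples of size n and compute S^2 for each
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)

empirical = s2.var()                   # Monte Carlo estimate of var(S^2)
theoretical = 2 * sigma2**2 / (n - 1)  # 2*sigma^4/(n-1) from Theorem 1
```

With 200,000 replications the empirical variance of \(S^{2}\) falls within a few percent of the theoretical value, and the mean of the \(S^{2}\) draws is close to \(\sigma ^{2}\), confirming unbiasedness.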
3 The Confidence Interval Approaches of the Common Variance
3.1 The Generalized Confidence Interval Approach
The generalized confidence intervals (GCI) are based on the simulation of a known generalized pivotal quantity (GPQ). Weerahandi [10] introduced the concept of a generalized pivotal quantity for a parameter \(\theta \) as follows:
Suppose that \(X_{ij}\sim N \left( \mu _{i},\sigma ^{2}_{i}\right) \), for \(i = 1,2,...,k\), \(j= 1,2,...,n_{i}\), are random samples from distributions that depend on a vector of parameters \(\underset{\sim }{\theta } = \left( \theta ,\underset{\sim }{\nu }\right) \), where \( \theta \) is the parameter of interest and \( \underset{\sim }{\nu } \) is a vector of nuisance parameters. A generalized pivot \( R\left( \underset{\sim }{X},\underset{\sim }{x},\theta ,\underset{\sim }{\nu }\right) \) for interval estimation, where \(\underset{\sim }{x}\) is an observed value of \( \underset{\sim }{X} \), is a random variable having the following two properties:
1. \( R\left( \underset{\sim }{X},\underset{\sim }{x},\theta ,\underset{\sim }{\nu }\right) \) has a distribution free of the vector of nuisance parameters \( \underset{\sim }{\nu }\).
2. The observed value of \( R\left( \underset{\sim }{X},\underset{\sim }{x},\theta ,\underset{\sim }{\nu }\right) \) is \( \theta \).
Let \( R_{\alpha } \) be the \( 100\alpha \)-th percentile of R. Then \( R_{\alpha } \) is the \( 100\left( 1-\alpha \right) \)% lower bound for \( \theta \), and \(\left( R_{\alpha /2}, R_{1-\alpha /2}\right) \) is a \( 100\left( 1-\alpha \right) \)% two-sided generalized confidence interval for \( \theta \).
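For a single normal variance, the two properties above are easy to realize: \(R = (n-1)s^{2}/V\) with \(V\sim \chi ^{2}_{n-1}\) has a distribution free of \(\mu \), and substituting the observed value \(V = (n-1)s^{2}/\sigma ^{2}\) returns \(\sigma ^{2}\). A minimal sketch of the percentile construction (sample values and the number of GPQ draws are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# One observed sample; the true variance here is 1.0
n = 25
x = rng.normal(loc=10.0, scale=1.0, size=n)
s2_obs = x.var(ddof=1)

# GPQ draws: R = (n-1) * s2_obs / V, with V ~ chi^2_{n-1}
m = 100_000
V = rng.chisquare(df=n - 1, size=m)
R = (n - 1) * s2_obs / V

# Percentiles of R give the two-sided 95% generalized confidence interval
lower, upper = np.percentile(R, [2.5, 97.5])
```

For this one-sample case the Monte Carlo percentiles approximate the classical chi-square interval \(\left( (n-1)s^{2}/\chi ^{2}_{1-\alpha /2,n-1},\, (n-1)s^{2}/\chi ^{2}_{\alpha /2,n-1}\right) \).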
Generalized Variable Approach. Consider k independent normal populations with a common variance \( \theta \). Let \(X_{i1}, X_{i2},..., X_{in_{i}}\) be a random sample from the i-th normal population, \(X_{ij}\sim N \left( \mu _{i},\sigma ^{2}_{i}\right) \).
Thus \(\theta = \sigma ^{2}_{1} = \sigma ^{2}_{2} = \cdots = \sigma ^{2}_{k}\).
Let \( S_{i}^{2}\) denote the sample variance of the data \( X_{ij} \) for the i-th sample and let \( s_{i}^{2}\) denote its observed value. From \(\dfrac{\left( n_{i}-1\right) S^{2}_{i}}{\sigma _{i}^{2}}\sim \chi ^{2}_{n_{i}-1}\), we obtain \(\sigma _{i}^{2} = \dfrac{\left( n_{i}-1\right) S^{2}_{i}}{V_{i}}\), where \( V_{i} \) is a \(\chi ^{2}\) variate with \(n_{i}-1\) degrees of freedom. We therefore have the generalized pivot \(R_{\sigma _{i}^{2}} = \dfrac{\left( n_{i}-1\right) s^{2}_{i}}{V_{i}}. \qquad (1)\)
The generalized pivotal quantity for estimating \( \theta \) based on the i-th sample is \(R_{\theta }^{\left( i\right) } = R_{\sigma _{i}^{2}}. \qquad (2)\)
From the i-th sample, the maximum likelihood estimator of \( \theta \) is \(\widehat{\theta }^{\left( i\right) } = \widehat{\sigma }_{i}^{2} = \dfrac{1}{n_{i}}\sum \limits _{j=1}^{n_{i}}\left( X_{ij}-\overline{X}_{i}\right) ^{2}. \qquad (3)\)
The large sample variance of \(\widehat{\theta }^{\left( i\right) }\) is \( var\left( \widehat{\theta }^{\left( i\right) }\right) = \dfrac{2\sigma _{i}^{4}}{n_{i}}. \qquad (4)\)
The generalized pivotal quantity we propose for the common variance \( \theta \) is a weighted average of the generalized pivots \( R_{\theta }^{\left( i\right) } \) based on the k individual samples (see Ye et al. [14]): \(R_{\theta } = \sum \limits _{i=1}^{k}R_{w_{i}}R_{\theta }^{\left( i\right) }, \qquad (5)\)
where
That is, \(R_{{\text {var} \left( {\hat{\theta }^{{(i)}} } \right) }}\) is \(\text {var} (\hat{\theta }^{{(i)}} )\) with \(\sigma _{i}^{2}\) replaced by \(R_{{\sigma _{i}^{2} }}\).
Computing Algorithms. For a given data set \(X_{{ij}}\) for \(i = 1,2, \ldots ,k,\,\, j = 1,2, \ldots ,n_{i}\), the generalized confidence intervals for \(\theta \) can be computed by the following steps.
1. Compute \(\bar{x}_{i}\) and \(s_{i}^{2}\) for \(i = 1,2, \ldots ,k\).
2. Generate \(V_{i} \sim \chi _{{n_{i} - 1}}^{2}\) and then calculate \(R_{{\sigma _{i}^{2} }}\) from (1) for \(i = 1,2, \ldots ,k\).
3. Calculate \(R_{\theta }^{{(i)}}\) from (2) for \(i = 1,2, \ldots ,k\).
4. Repeat step 2 and calculate \(R_{{w_{i} }}\) from (6), (7) and (8) for \(i = 1,2, \ldots ,k\).
5. Compute \(R_{\theta }\) following (5).
6. Repeat steps 2–5 a total of m times to obtain an array of \(R_{\theta }\)’s.
7. Rank this array of \(R_{\theta }\)’s from smallest to largest.
The \(100\alpha \)-th percentile of the \(R_{\theta }\)’s, \(R_{\theta } (\alpha )\), is an estimate of the lower bound of the one-sided \(100(1-\alpha )\%\) confidence interval, and \(\left( R_{\theta }(\alpha /2),R_{\theta } (1-\alpha /2)\right) \) is a two-sided \(100(1-\alpha )\%\) confidence interval.
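The seven steps above can be sketched in Python. The weight formulas (6), (7) and (8) are not spelled out above, so this sketch assumes the usual inverse-variance weights, \(R_{w_{i}}\propto 1/R_{var ( \widehat{\theta }^{(i)})}\) with \(var ( \widehat{\theta }^{(i)}) = 2\sigma _{i}^{4}/n_{i}\); treat these, and the use of \(s_{i}^{2}\) for \(\widehat{\theta }^{(i)}\), as assumptions rather than the authors' exact formulas:

```python
import numpy as np

def gci_common_variance(samples, alpha=0.05, m=5000, seed=0):
    """Two-sided 100(1-alpha)% generalized CI for the common variance
    of k independent normal samples (a sketch of steps 1-7)."""
    rng = np.random.default_rng(seed)
    n = np.array([len(s) for s in samples], dtype=float)
    s2 = np.array([np.var(s, ddof=1) for s in samples])       # step 1: s_i^2

    V = rng.chisquare(df=n - 1, size=(m, len(samples)))       # step 2: V_i ~ chi^2_{n_i-1}
    R_sig2 = (n - 1) * s2 / V                                 # GPQ for each sigma_i^2
    R_var = 2 * R_sig2**2 / n                                 # assumed var(theta_hat_i), sigma_i^2 -> R
    w = (1 / R_var) / (1 / R_var).sum(axis=1, keepdims=True)  # assumed normalized inverse-variance weights
    R_theta = (w * R_sig2).sum(axis=1)                        # step 5: weighted average

    # steps 6-7: percentiles of the m draws of R_theta
    return np.percentile(R_theta, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Two samples sharing the same true variance sigma^2 = 1
rng = np.random.default_rng(3)
data = [rng.normal(0.0, 1.0, 30), rng.normal(5.0, 1.0, 40)]
low, high = gci_common_variance(data)
```

Vectorizing over the m replications replaces the explicit repeat loop of steps 2–5 without changing the procedure.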
3.2 The Large Sample Approach
The large sample estimate of normal variance is a pooled estimate of the common normal variance defined as
where \(\widehat{\theta }^{\left( i\right) }\) is defined in (3) and \(\widehat{var}\left( \widehat{\theta }^{\left( i\right) }\right) \) is an estimate of \( var\left( \widehat{\theta }^{\left( i\right) }\right) \) in (4) with \( \sigma _{i}^{2} \) replaced by \(s_{i}^{2}\).
Hence, the large sample solution for confidence interval estimation is
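The pooled estimate and its interval can be sketched as follows. Since the displayed formula is not reproduced here, the sketch assumes the common large sample (Wald-type) construction \(\widehat{\theta }\pm z_{1-\alpha /2}\sqrt{1/\sum _{i}1/\widehat{var}( \widehat{\theta }^{(i)})}\) with \(\widehat{var}( \widehat{\theta }^{(i)}) = 2s_{i}^{4}/n_{i}\); this is an interpretation, not a form confirmed by the text:

```python
import numpy as np

def large_sample_ci(samples, z=1.96):
    """Wald-type 95% CI for the common variance: pooled inverse-variance
    estimate +/- z * standard error (a sketch under assumed weights)."""
    n = np.array([len(s) for s in samples], dtype=float)
    s2 = np.array([np.var(s, ddof=1) for s in samples])
    var_hat = 2 * s2**2 / n                     # estimated var(theta_hat_i), sigma_i^2 -> s_i^2
    inv = 1.0 / var_hat
    theta_hat = (s2 * inv).sum() / inv.sum()    # pooled point estimate
    se = np.sqrt(1.0 / inv.sum())               # large sample standard error
    return theta_hat - z * se, theta_hat + z * se

rng = np.random.default_rng(4)
data = [rng.normal(0.0, 1.0, 50), rng.normal(2.0, 1.0, 60)]
low, high = large_sample_ci(data)
```

Unlike the GPQ interval, this interval is symmetric about the pooled estimate, which is one reason its coverage can fall below the nominal level for small samples.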
Computing Algorithms. For a given data set \( X_{ij} \) for \( i = 1,2,...,k,\,\, j= 1,2,...,n_{i}\), the large sample confidence interval for \( \theta \) can be computed by the following steps.
4 Simulation Studies
A simulation study was performed to estimate the coverage probabilities and average lengths of confidence intervals for the common variance of normal distributions for various combinations of the number of samples (\(k=2\) and \(k=6\)) and the sample sizes \( n_{1}=...=n_{k}=n \), where n was 10, 30, 50, 100 or 200. The population mean of the normal data within each sample was 1, and the population standard deviation was \( \sigma \) = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00 or 2.00. In this simulation study, we compared two methods: our proposed generalized confidence interval approach and the large sample approach. For each parameter setting, 5000 random samples were generated, and 2500 \(R_{\theta }\)’s were obtained for each random sample.
Tables 1 and 2 present the coverage probabilities and average lengths for the 2- and 6-sample cases, respectively. In both cases, the generalized confidence interval approach and the large sample approach underestimate the coverage probabilities for most of the scenarios, especially when the sample size is small. Additionally, the coverage probabilities of the generalized confidence interval approach are better than those of the large sample approach for all sample sizes, especially when the sample size is small. Overall, both approaches have coverage probabilities close to the nominal level as the sample size increases. In this case, there is no need to compare the average lengths of the two intervals, since the large sample approach provides coverage probabilities below those of the generalized confidence interval approach in almost all cases. Finally, the generalized confidence interval approach provided much better results than the large sample approach in terms of coverage probabilities.
5 An Empirical Application
In this section, two real data examples are presented to illustrate the generalized confidence interval approach and the large sample approach. The first data set compares two different procedures for determining the shear strength of steel plate girders; data are available for nine girders under each of two procedures, the Karlsruhe method and the Lehigh method [15]. The data from Table 3 were used to test the hypothesis of equal mean treatment effects under the assumptions of normality, homogeneity of variance, and independence of errors. The Shapiro-Wilk normality test indicated that the two sets of data come from normal populations, and the variances were homogeneous by Levene’s test. The sample variances of the normal data were 0.0213 and 0.0024 for the Karlsruhe method and the Lehigh method, respectively. Using the generalized confidence interval approach, the generalized confidence interval for the overall variance was (0.0012, 0.0104), with an interval length of 0.0092. In comparison, the confidence interval by the large sample approach was (0.0003, 0.0050), with an interval length of 0.0047.
The second example concerns blood sugar levels (mg/100 g) measured on ten animals from each of five different breeds [16]. The results are presented in Table 4 and were used to test the hypothesis of equality of means for the five breeds. The five data sets were confirmed to come from normal populations by the Shapiro-Wilk normality test, and the variances were homogeneous by Levene’s test. The sample variances of the normal data were 84.0000, 124.6667, 126.5444, 101.1111 and 173.1667 for breeds A, B, C, D and E, respectively. Using the generalized confidence interval approach, the generalized confidence interval for the overall variance was (56.9113, 156.9829), with an interval length of 100.0716. In comparison, the confidence interval by the large sample approach was (62.6062, 155.0373), with an interval length of 92.4311.
6 Discussion and Conclusions
This paper has presented a simple approach to constructing confidence intervals for the common variance of normal distributions. The proposed confidence intervals were constructed by two approaches: the generalized confidence interval approach and the large sample approach. The generalized confidence interval approach provided coverage probability close to the nominal level of 0.95 and outperformed the large sample approach for all sample sizes. The average lengths increased as \(\sigma \) increased for both approaches. The results indicated that the confidence interval for the common variance of normal distributions based on the generalized confidence interval approach is better than that based on the large sample approach. In conclusion, the generalized confidence interval approach can be successfully used to estimate the common variance of normal distributions. This conclusion supports the research of Tian [11], Tian and Wu [12], Krishnamoorthy [13] and Ye et al. [14].
References
Cohen, A.: Improved confidence intervals for the variance of a normal distribution. J. Am. Stat. Assoc. 67, 382–387 (1972)
Shorrock, G.: A minimax generalized Bayes confidence interval for a normal variance. Ph.D. dissertation, Department of Statistics, Rutgers University (1982)
Shorrock, G.: Improved confidence intervals for a normal variance. Ann. Stat. 18, 972–980 (1990)
Nagata, Y.: Improvements of interval estimations for the variance and the ratio of two variances. J. Jpn. Stat. Soc. 19, 151–161 (1989)
Casella, G., Goutis, C.: Improved invariant confidence intervals for a normal variance. Ann. Statist. 19, 2015–2031 (1991)
Kubokawa, T.: A unified approach to improving equivariant estimators. Ann. Stat. 22, 290–299 (1994)
Shorrock, R.W., Zidek, J.V.: An improved estimator of the generalized variance. Ann. Stat. 4(3), 629–638 (1976)
Sarkar, S.K.: On improving the shortest length confidence interval for the generalized variance. J. Multivar. Anal. 31, 136–147 (1989)
Iliopoulos, G., Kourouklis, S.: On improved interval estimation for the generalized variance. J. Stat. Plan Infer. 66, 305–320 (1998)
Weerahandi, S.: Generalized confidence intervals. J. Am. Stat. Assoc. 88, 899–905 (1993)
Tian, L.: Inferences on the common coefficient of variation. Stat. Med. 24, 2213–2220 (2005)
Tian, L., Wu, J.: Inferences on the common mean of several log-normal populations: the generalized variable approach. Biometrical J. 49, 944–951 (2007)
Krishnamoorthy, K., Lu, Y.: Inference on the common means of several normal populations based on the generalized variable method. Biometrics 59, 237–247 (2003)
Ye, R.D., et al.: Inferences on the common mean of several inverse Gaussian populations. Comput. Stat. Data Anal. 54, 906–915 (2010)
Montgomery, D.C.: Design and Analysis of Experiments, p. 57. Wiley, New York (2001)
Rencher, A.C., Schaalje, B.G.: Linear Models in Statistics, p. 373. Wiley, New Jersey (2008)
Acknowledgments
The first author gratefully acknowledges the financial support from the Faculty of Applied Science and the Graduate College of King Mongkut's University of Technology North Bangkok, Thailand.
© 2016 Springer International Publishing AG
Smithpreecha, N., Niwitpong, SA., Niwitpong, S. (2016). Confidence Intervals for Common Variance of Normal Distributions. In: Huynh, VN., Inuiguchi, M., Le, B., Le, B., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2016. Lecture Notes in Computer Science(), vol 9978. Springer, Cham. https://doi.org/10.1007/978-3-319-49046-5_48
Print ISBN: 978-3-319-49045-8
Online ISBN: 978-3-319-49046-5