Analysis of Medical Data Using Interval Estimators for Common Mean of Gaussian Distributions with Unknown Coefficients of Variation

Thangjai, Warisa; Niwitpong, Sa-Aat; Niwitpong, Suparat

doi:10.1007/978-3-030-98018-4_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13199)

Included in the following conference series:

International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making


Abstract

The common mean of Gaussian distributions is a parameter of interest when analyzing medical data. In practice, the population coefficient of variation (CV) is unknown because the population mean and variance are unknown. In this study, the common mean of Gaussian distributions with unknown CVs is considered, and four new interval estimators for it are proposed using the generalized confidence interval (GCI), large sample (LS), adjusted method of variance estimates recovery (adjusted MOVER), and standard bootstrap (SB) approaches. The proposed interval estimators are compared with a previously reported one based on the GCI approach. Monte Carlo simulation was used to evaluate the performance of the interval estimators in terms of coverage probability and average length, and medical datasets were used to illustrate the approaches. Our findings show that the interval estimator based on the GCI approach for the common mean of Gaussian distributions with unknown CVs provided the best performance in terms of coverage probability for all sample sizes. However, the adjusted MOVER and SB approaches can be considered as alternatives when the sample size is large ($n_{i} \ge 100$).

1 Introduction

The population coefficient of variation (CV), which is free from the unit of measurement, is defined as the ratio of the population standard deviation to the population mean, $\tau =\sigma /\mu $, and has been widely applied in many fields, e.g., agriculture, biology, and environmental and physical sciences. Estimation with a known CV has been studied by many scholars. For example, Gerig and Sen [1] used Canadian migratory bird survey data from 1969 and 1970 while assuming that the CV for each province was known. Meanwhile, estimating the mean of Gaussian distributions with a known CV has also been studied extensively (e.g., Searls [2] and Niwitpong [3]). However, when the population mean and variance are unknown, the CV is also unknown and needs to be estimated. Srivastava [4] and Srivastava [5] proposed an estimator for the normal population mean with an unknown CV and indicated that it is more efficient than the sample mean estimator. Srivastava and Singh [6] later presented a uniformly minimum variance unbiased estimator of the efficiency ratio and compared it with an existing estimator. Sahai [7] provided an estimator for the normal mean with an unknown CV and studied it along the same lines as the estimators of Srivastava [4] and Srivastava [5]. Meanwhile, Thangjai et al. [8] presented confidence intervals for the normal mean and the difference between two normal means with unknown CVs. In addition, Thangjai et al. [9] proposed Bayesian confidence intervals for the means of normal distributions with unknown CVs.

In practice, samples are collected at different time points, and the problem of estimating common parameters under these circumstances has been widely studied by several researchers. Krishnamoorthy and Lu [10] proposed the generalized variable approach for inference on the common mean of normal distributions. Lin and Lee [11] developed a new generalized pivotal quantity based on the best linear unbiased estimator for constructing confidence intervals for the common mean of normal distributions. Tian [12] presented procedures for inference on the common CV of normal distributions. Tian and Wu [13] provided the generalized variable approach for inference on the common mean of log-normal distributions. Thangjai et al. [14] investigated a new confidence interval for the common mean of normal distributions using the adjusted method of variance estimates recovery (adjusted MOVER) approach. Finally, the estimator of Srivastava [4] is well established for constructing confidence intervals for the common mean of Gaussian distributions with unknown CVs.

Interval estimators for the common mean of Gaussian distributions with unknown CVs have been proposed in several medical science studies, such as the common percentage of albumin in human plasma proteins from four sources (Jordan and Krishnamoorthy [15]) and quality assurance in medical laboratories for the diagnostic determinations of hemoglobin, red blood cells, the mean corpuscular volume, hematocrit, white blood cells, and platelets in normal and abnormal blood samples (Tian [12] and Fung and Tsang [16]).

Herein, the concepts in Thangjai et al. [8] and Thangjai and Niwitpong [17] are extended to k populations to construct new interval estimators for the common mean of Gaussian distributions with unknown CVs. The interval estimators constructed using the generalized confidence interval (GCI), large sample (LS), adjusted MOVER, and standard bootstrap (SB) approaches are compared with the GCI approach of Lin and Lee [11]. The GCI approach, first introduced by Weerahandi [18], has been used successfully to construct interval estimators (e.g., Krishnamoorthy and Lu [10], Lin and Lee [11], Tian [12], Tian and Wu [13], Ye et al. [19]). The LS approach, which relies on the central limit theorem, was first proposed by Tian and Wu [13] (along with a GCI approach) to construct confidence intervals for the common mean of log-normal distributions. The adjusted MOVER approach, motivated by Zou and Donner [20] and Zou et al. [21], was extended by Thangjai et al. [14] and Thangjai and Niwitpong [17] to construct an interval estimator for a common parameter.

2 Preliminaries

This section presents the definitions and results needed to construct interval estimators for the common Gaussian mean with unknown CVs.

Let $X = (X_{1},X_{2},...,X_{n})$ be a random sample from the Gaussian distribution with mean $\mu $ and variance $\sigma ^{2}$. The population CV is $\tau = \sigma /\mu $. Let $\bar{X}$ and $S^{2}$ be the sample mean and sample variance of X, respectively. The CV estimator is $\hat{\tau } = S/\bar{X}$.

Following Srivastava [4] and Thangjai et al. [8], the Gaussian mean estimator when the CV is unknown, $\hat{\theta }$, is

$$\begin{aligned} \hat{\theta } = \frac{n\bar{X}}{n+\frac{S^{2}}{\bar{X}^{2}}}. \end{aligned}$$

(1)
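As a minimal sketch of Eq. (1), the estimator can be computed directly from a sample. The function name `theta_hat` and the three data values are illustrative assumptions, not taken from the paper.

```python
from statistics import fmean, variance

def theta_hat(x):
    """Estimator of Eq. (1): n * xbar / (n + s^2 / xbar^2)."""
    n = len(x)
    xbar = fmean(x)
    s2 = variance(x)  # sample variance with divisor n - 1
    return n * xbar / (n + s2 / xbar**2)

sample = [4.0, 5.0, 6.0]      # made-up data, not from the paper
estimate = theta_hat(sample)  # xbar = 5, s^2 = 1, so 15 / 3.04
print(estimate)
```

For a positive sample mean the estimate is slightly smaller than $\bar{x}$, since the denominator exceeds n.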

According to Thangjai et al. [8], the mean and variance of the mean estimator with unknown CV are

$$\begin{aligned} E\left( \hat{\theta }\right) =&\left( \frac{\mu }{1+\left( \frac{\sigma ^{2}}{n\mu ^{2}+\sigma ^{2}}\right) \left( 1+\frac{2\sigma ^{4}+4n\mu ^{2}\sigma ^{2}}{\left( n\mu ^{2}+\sigma ^{2}\right) ^{2} }\right) }\right) \nonumber \\ *&\left( 1+\frac{\left( \frac{n\sigma ^{2}}{n\mu ^{2}+\sigma ^{2}}\right) ^{2}\left( \frac{2}{n}+\frac{2\sigma ^{4}+4n\mu ^{2}\sigma ^{2}}{\left( n\mu ^{2}+\sigma ^{2}\right) ^{2}}\right) }{\left( n+\left( \frac{n\sigma ^{2}}{n\mu ^{2}+\sigma ^{2}}\right) \left( 1+\frac{2\sigma ^{4}+4n\mu ^{2}\sigma ^{2}}{\left( n\mu ^{2}+\sigma ^{2}\right) ^{2}}\right) \right) ^{2}}\right) \end{aligned}$$

(2)

and

$$\begin{aligned} Var\left( \hat{\theta }\right) =&\left( \frac{\mu }{1+\left( \frac{\sigma ^{2}}{n\mu ^{2}+\sigma ^{2}}\right) \left( 1+\frac{2\sigma ^{4}+4n\mu ^{2}\sigma ^{2}}{\left( n\mu ^{2}+\sigma ^{2}\right) ^{2}}\right) }\right) ^{2}\nonumber \\ *&\left( \frac{\sigma ^{2}}{n\mu ^{2}}+\frac{\left( \frac{n\sigma ^{2}}{n\mu ^{2}+\sigma ^{2}}\right) ^{2}\left( \frac{2}{n}+\frac{2\sigma ^{4}+4n\mu ^{2}\sigma ^{2}}{\left( n\mu ^{2}+\sigma ^{2}\right) ^{2}}\right) }{\left( n+\left( \frac{n\sigma ^{2}}{n\mu ^{2}+\sigma ^{2}}\right) \left( 1+\frac{2\sigma ^{4}+4n\mu ^{2}\sigma ^{2}}{\left( n\mu ^{2}+\sigma ^{2}\right) ^{2} }\right) \right) ^{2}}\right) . \end{aligned}$$

(3)
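Equations (2) and (3) share several sub-expressions, which a direct transcription makes easier to see. The following is a sketch only; the function name `moments_theta` and the intermediate variable names are assumptions introduced for illustration.

```python
def moments_theta(mu, sigma2, n):
    """Return (E, Var) of the estimator, transcribed from Eqs. (2) and (3)."""
    a = n * mu**2 + sigma2                               # n*mu^2 + sigma^2
    c = (2 * sigma2**2 + 4 * n * mu**2 * sigma2) / a**2  # repeated fraction
    shrink = mu / (1 + (sigma2 / a) * (1 + c))           # first factor
    b = n * sigma2 / a
    ratio = b**2 * (2 / n + c) / (n + b * (1 + c))**2    # shared second-factor term
    mean = shrink * (1 + ratio)                          # Eq. (2)
    var = shrink**2 * (sigma2 / (n * mu**2) + ratio)     # Eq. (3)
    return mean, var

# Illustrative values only: for large n the variance is close to sigma^2 / n.
mean, var = moments_theta(mu=1.0, sigma2=1.0, n=10000)
print(mean, var)
```

With these inputs the variance is close to $\sigma ^{2}/n$ and the mean is close to $\mu $, which is the behaviour one would expect as n grows.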

Consider k independent Gaussian distributions with a common mean and unknown CVs. Let $X_{i} = (X_{i1},X_{i2},...,X_{in_{i}})$ be a random sample from the i-th Gaussian distribution with common mean $\mu $ and possibly unequal variances $\sigma ^{2}_{i}$, that is, $X_{ij} \sim N(\mu ,\sigma ^{2}_{i})$; $i = 1,2,...,k$, $j = 1,2,...,n_{i}$.

For the i-th sample, let $\bar{X}_{i}$ and $\bar{x}_{i}$ be the sample mean and the observed sample mean of $X_{i}$, respectively, and let $S^{2}_{i}$ and $s^{2}_{i}$ be the sample variance and the observed sample variance of $X_{i}$, respectively. Following Thangjai et al. [8], the well-established estimator of Srivastava [4] is applied to each sample:

$$\begin{aligned} \hat{\theta }_{i} = \frac{n_{i}\bar{X}_{i}}{n_{i}+\frac{S^{2}_{i}}{(\bar{X}_{i})^{2}}}; i=1,2,...,k. \end{aligned}$$

(4)

This paper constructs confidence intervals for the common Gaussian mean with unknown CVs. Following Graybill and Deal [22], the pooled estimator of the common mean is defined as

$$\begin{aligned} \hat{\theta } = {\sum \limits ^{k}_{i=1}\frac{\hat{\theta }_{i}}{\widetilde{Var}\left( \hat{\theta }_{i}\right) }}\Bigg /{\sum \limits ^{k}_{i=1}\frac{1}{\widetilde{Var}\left( \hat{\theta }_{i}\right) }}, \end{aligned}$$

(5)

where $\widetilde{Var}\left( \hat{\theta }_{i}\right) $ denotes the estimator of $Var\left( \hat{\theta }_{i}\right) $, which is obtained from Eq. (3) with $n$, $\mu $, and $\sigma ^{2}$ replaced by $n_{i}$, $\bar{x}_{i}$, and $s^{2}_{i}$, respectively.
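The pooled estimator in Eq. (5) is an inverse-variance weighted average, which can be sketched as follows. The function name `pooled_estimate` and the numerical inputs are hypothetical; in the paper's setting the variances would come from Eq. (3).

```python
def pooled_estimate(thetas, variances):
    """Inverse-variance weighted average of per-sample estimates, as in Eq. (5)."""
    weights = [1.0 / v for v in variances]
    return sum(w * t for w, t in zip(weights, thetas)) / sum(weights)

# Made-up inputs: the estimate with the smaller variance receives more weight.
print(pooled_estimate([1.0, 3.0], [1.0, 3.0]))  # 1.5
```

With equal variances the result reduces to the simple average of the per-sample estimates.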

2.1 GCI

Definition 1

Let $X = (X_{1},X_{2},...,X_{n})$ be a random sample from a distribution $F(x|\delta )$, where $x = (x_{1},x_{2},...,x_{n})$ is an observed sample, $\delta = (\theta ,\nu )$ is an unknown parameter vector, $\theta $ is the parameter of interest, and $\nu $ is a vector of nuisance parameters. Let $R = R(X;x,\delta )$ be a function of X, x and $\delta $. The random quantity R is called a generalized pivotal quantity if it satisfies the following two properties; see Weerahandi [18]:

(i) The probability distribution of R is free of unknown parameters.
(ii) The observed value of R does not depend on the vector of nuisance parameters.

The $100(\alpha /2)$-th and $100(1-\alpha /2)$-th percentiles of R are the lower and upper limits of the $100(1-\alpha )\%$ two-sided GCI.

Following Thangjai et al. [8], the generalized pivotal quantities of $\sigma ^{2}_{i}$, $\mu _{i}$, and $\theta _{i}$ based on the i-th sample are defined as follows:

$$\begin{aligned} R_{\sigma ^{2}_{i}} = \frac{\left( n_{i}-1\right) s^{2}_{i}}{V_{i}}. \end{aligned}$$

(6)

$$\begin{aligned} R_{\mu _{i}} = \bar{x}_{i}-\frac{Z_{i}}{\sqrt{U_{i}}}\sqrt{\frac{\left( n_{i}-1\right) s^{2}_{i}}{n_{i}}} \end{aligned}$$

(7)

and

$$\begin{aligned} R_{\theta _{i}} = \frac{n_{i}R_{\mu _{i}}}{n_{i}+\frac{R_{\sigma ^{2}_{i}}}{(R_{\mu _{i}})^{2}}}, \end{aligned}$$

(8)

where $V_{i}$ and $U_{i}$ denote chi-squared random variables, each with $n_{i}-1$ degrees of freedom, and $Z_{i}$ denotes a standard normal random variable.

According to Tian and Wu [13], the generalized pivotal quantity for the common Gaussian mean with unknown CVs is a weighted average of the generalized pivotal quantities $R_{\theta _{i}}$ based on the k individual samples. It is given by

$$\begin{aligned} R_{\theta } = {\sum \limits ^{k}_{i=1}\frac{R_{\theta _{i}}}{R_{Var(\hat{\theta }_{i})}}}\Bigg /{\sum \limits ^{k}_{i=1}\frac{1}{R_{Var(\hat{\theta }_{i})}}}, \end{aligned}$$

(9)

where $R_{Var(\hat{\theta }_{i})}$ is obtained from Eq. (3) with $n$, $\mu $, and $\sigma ^{2}$ replaced by $n_{i}$, $R_{\mu _{i}}$, and $R_{\sigma ^{2}_{i}}$, respectively.

Hence, $R_{\theta }$ is the generalized pivotal quantity for $\theta $ and satisfies conditions (i) and (ii) in Definition 1. The confidence interval for the common Gaussian mean with unknown CVs can then be constructed from $R_{\theta }$.

Therefore, the $100(1-\alpha )\%$ two-sided confidence interval for the common Gaussian mean with unknown CVs based on the GCI approach is

$$\begin{aligned} CI_{GCI} = [L_{GCI},U_{GCI} ] = [R_{\theta }\left( \alpha /2\right) ,R_{\theta }\left( 1-\alpha /2\right) ], \end{aligned}$$

(10)

where $R_{\theta }\left( \alpha /2\right) $ and $R_{\theta }\left( 1-\alpha /2\right) $ denote the $100(\alpha /2)$-th and $100(1-\alpha /2)$-th percentiles of $R_{\theta }$, respectively.
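In practice the percentiles of $R_{\theta }$ are obtained by Monte Carlo. The sketch below simulates Eqs. (6) to (9) and reads off the percentiles in Eq. (10); the function names, the number of draws, the seed, and the summary statistics are all assumptions made for illustration, and the paper does not prescribe this particular implementation.

```python
import random
from math import sqrt

def var_theta(mu, sigma2, n):
    """Variance formula transcribed from Eq. (3)."""
    a = n * mu**2 + sigma2
    c = (2 * sigma2**2 + 4 * n * mu**2 * sigma2) / a**2
    shrink = mu / (1 + (sigma2 / a) * (1 + c))
    b = n * sigma2 / a
    ratio = b**2 * (2 / n + c) / (n + b * (1 + c))**2
    return shrink**2 * (sigma2 / (n * mu**2) + ratio)

def percentile(sorted_vals, p):
    """Linear-interpolation percentile of a sorted list, with 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def gci_common_mean(xbar, s2, n, alpha=0.05, draws=5000, seed=1):
    """Monte Carlo GCI for the common mean from per-sample summary statistics."""
    rng = random.Random(seed)
    r_theta = []
    for _ in range(draws):
        num = den = 0.0
        for xb, v, m in zip(xbar, s2, n):
            V = rng.gammavariate((m - 1) / 2, 2.0)  # chi-squared, m - 1 df
            U = rng.gammavariate((m - 1) / 2, 2.0)  # chi-squared, m - 1 df
            Z = rng.gauss(0.0, 1.0)
            r_sigma2 = (m - 1) * v / V                       # Eq. (6)
            r_mu = xb - Z / sqrt(U) * sqrt((m - 1) * v / m)  # Eq. (7)
            r_th = m * r_mu / (m + r_sigma2 / r_mu**2)       # Eq. (8)
            w = 1.0 / var_theta(r_mu, r_sigma2, m)
            num += w * r_th
            den += w
        r_theta.append(num / den)                            # Eq. (9)
    r_theta.sort()
    return percentile(r_theta, alpha / 2), percentile(r_theta, 1 - alpha / 2)

# Made-up summary statistics for k = 2 samples (not from the paper).
print(gci_common_mean(xbar=[10.0, 10.2], s2=[4.0, 9.0], n=[30, 40]))
```

A chi-squared variable with d degrees of freedom is drawn here as a gamma variable with shape d/2 and scale 2, which avoids any dependency outside the standard library.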

2.2 LS Confidence Interval

According to Graybill and Deal [22] and Tian and Wu [13], the LS estimate of the common Gaussian mean with unknown CVs is the pooled estimator defined in Eq. (5), where $\hat{\theta }_{i}$ is defined in Eq. (4) and $\widetilde{Var}\left( \hat{\theta }_{i}\right) $ denotes the estimator of $Var\left( \hat{\theta }_{i}\right) $, which is obtained from Eq. (3) with $n$, $\mu $, and $\sigma ^{2}$ replaced by $n_{i}$, $\bar{x}_{i}$, and $s^{2}_{i}$, respectively.

The distribution of $\hat{\theta }$ is approximately Gaussian when the sample size is large, so quantiles of the standard Gaussian distribution are used to construct the confidence interval for $\theta $. Therefore, the $100(1-\alpha )\%$ two-sided confidence interval for the common Gaussian mean with unknown CVs based on the LS approach is

$$\begin{aligned} CI_{LS}= & {} [L_{LS},U_{LS}] \nonumber \\= & {} [\hat{\theta }-z_{1-\alpha /2}\sqrt{1\Bigg /{\sum \limits ^{k}_{i=1}\frac{1}{\widetilde{Var}\left( \hat{\theta }_{i}\right) }}},\hat{\theta }+z_{1-\alpha /2}\sqrt{1\Bigg /{\sum \limits ^{k}_{i=1}\frac{1}{\widetilde{Var}\left( \hat{\theta }_{i}\right) }}}], \end{aligned}$$

(11)

where $z_{1-\alpha /2}$ denotes the $(1-\alpha /2)$-th quantile of the standard normal distribution.
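Equation (11) can be evaluated from summary statistics alone. The following sketch assumes hypothetical function names and made-up inputs; `var_theta` transcribes Eq. (3) with plug-in values.

```python
from math import sqrt
from statistics import NormalDist

def var_theta(mu, sigma2, n):
    """Variance formula transcribed from Eq. (3)."""
    a = n * mu**2 + sigma2
    c = (2 * sigma2**2 + 4 * n * mu**2 * sigma2) / a**2
    shrink = mu / (1 + (sigma2 / a) * (1 + c))
    b = n * sigma2 / a
    ratio = b**2 * (2 / n + c) / (n + b * (1 + c))**2
    return shrink**2 * (sigma2 / (n * mu**2) + ratio)

def ls_interval(xbar, s2, n, alpha=0.05):
    """Large-sample interval of Eq. (11) from per-sample summary statistics."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    num = den = 0.0
    for xb, v, m in zip(xbar, s2, n):
        theta_i = m * xb / (m + v / xb**2)  # Eq. (4)
        w = 1.0 / var_theta(xb, v, m)       # plug-in inverse variance
        num += w * theta_i
        den += w
    theta = num / den                       # Eq. (5)
    half = z * sqrt(1.0 / den)
    return theta - half, theta + half

# Made-up summary statistics for k = 2 samples (not from the paper).
print(ls_interval(xbar=[10.0, 10.2], s2=[4.0, 9.0], n=[30, 40]))
```

The interval is symmetric about the pooled estimate, with half-width determined by the sum of the inverse variance estimates.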

2.3 Adjusted MOVER Confidence Interval

Recall that Z is a standard normal random variable with mean 0 and variance 1, defined as follows:

$$\begin{aligned} Z = \frac{\bar{X}-\mu }{\sqrt{\widetilde{Var}\left( \hat{\theta }\right) }} \sim N(0,1). \end{aligned}$$

(12)

The confidence interval for the mean of a Gaussian distribution is

$$\begin{aligned} CI_{\mu } = [l,u] = [\bar{x}-z_{1-\alpha /2}\sqrt{\widetilde{Var}\left( \hat{\theta }\right) },\bar{x}+z_{1-\alpha /2}\sqrt{\widetilde{Var}\left( \hat{\theta }\right) }]. \end{aligned}$$

(13)

For $i=1,2,...,k$, the lower limit ($l_{i}$) and upper limit ($u_{i}$) for the normal mean $\mu _{i}$ based on the i-th sample can be defined as

$$\begin{aligned} l_{i} = \bar{x}_{i}-z_{1-\alpha /2}\sqrt{\widetilde{Var}\left( \hat{\theta }_{i}\right) } \end{aligned}$$

(14)

and

$$\begin{aligned} u_{i} = \bar{x}_{i}+z_{1-\alpha /2}\sqrt{\widetilde{Var}\left( \hat{\theta }_{i}\right) }, \end{aligned}$$

(15)

where $\widetilde{Var}\left( \hat{\theta }_{i}\right) $ denotes the estimator of $Var\left( \hat{\theta }_{i}\right) $ which is defined in Eq. (3) and $z_{1-\alpha /2}$ denotes the $(1-\alpha /2)$-th quantile of the standard normal distribution.

According to Thangjai et al. [14] and Thangjai and Niwitpong [17], the estimator of the common mean with unknown CVs is a weighted average of the estimators $\hat{\theta }_{i}$ based on the k individual samples. It has the following form:

$$\begin{aligned} \hat{\theta } = {\sum \limits ^{k}_{i=1}\frac{\hat{\theta }_{i}}{\widehat{Var}\left( \hat{\theta }_{i}\right) }}\Bigg /{\sum \limits ^{k}_{i=1}\frac{1}{\widehat{Var}\left( \hat{\theta }_{i}\right) }}, \end{aligned}$$

(16)

where $\hat{\theta }_{i}$ is defined in Eq. (4), $\widehat{Var}\left( \hat{\theta }_{i}\right) =\frac{1}{2}(\frac{(\hat{\theta }_{i}-l_{i})^{2}}{z^{2}_{\alpha /2}}+\frac{(u_{i}-\hat{\theta }_{i})^{2}}{z^{2}_{\alpha /2}})$, and $l_{i}$ and $u_{i}$ are defined in Eqs. (14) and (15), respectively.

Therefore, the $100(1-\alpha )\%$ two-sided confidence interval for the common Gaussian mean with unknown CVs based on the adjusted MOVER approach is

$$\begin{aligned} CI_{AM}= & {} [L_{AM},U_{AM}] \nonumber \\= & {} [\hat{\theta }-z_{1-\alpha /2}\sqrt{1\Bigg /\sum \limits ^{k}_{i=1}\frac{z^{2}_{\alpha /2}}{(\hat{\theta }_{i}-l_{i})^{2}}},\hat{\theta }+z_{1-\alpha /2}\sqrt{1\Bigg /\sum \limits ^{k}_{i=1}\frac{z^{2}_{\alpha /2}}{(u_{i}-\hat{\theta }_{i})^{2}}}], \end{aligned}$$

(17)

where $\hat{\theta }$ is defined in Eq. (16), and $z_{\alpha /2}$ and $z_{1-\alpha /2}$ denote the $(\alpha /2)$-th and $(1-\alpha /2)$-th quantiles of the standard normal distribution, respectively.
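The steps in Eqs. (14) to (17) can be sketched from summary statistics as follows. Function names and inputs are illustrative assumptions; note that $z^{2}_{\alpha /2} = z^{2}_{1-\alpha /2}$, so a single quantile suffices.

```python
from math import sqrt
from statistics import NormalDist

def var_theta(mu, sigma2, n):
    """Variance formula transcribed from Eq. (3)."""
    a = n * mu**2 + sigma2
    c = (2 * sigma2**2 + 4 * n * mu**2 * sigma2) / a**2
    shrink = mu / (1 + (sigma2 / a) * (1 + c))
    b = n * sigma2 / a
    ratio = b**2 * (2 / n + c) / (n + b * (1 + c))**2
    return shrink**2 * (sigma2 / (n * mu**2) + ratio)

def adjusted_mover_interval(xbar, s2, n, alpha=0.05):
    """Adjusted MOVER interval of Eq. (17) from per-sample summary statistics."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    num = den = 0.0
    lower_sum = upper_sum = 0.0
    for xb, v, m in zip(xbar, s2, n):
        theta_i = m * xb / (m + v / xb**2)       # Eq. (4)
        half_i = z * sqrt(var_theta(xb, v, m))
        l_i, u_i = xb - half_i, xb + half_i      # Eqs. (14) and (15)
        var_i = 0.5 * ((theta_i - l_i)**2 + (u_i - theta_i)**2) / z**2
        num += theta_i / var_i
        den += 1.0 / var_i
        lower_sum += z**2 / (theta_i - l_i)**2
        upper_sum += z**2 / (u_i - theta_i)**2
    theta = num / den                            # Eq. (16)
    return theta - z * sqrt(1.0 / lower_sum), theta + z * sqrt(1.0 / upper_sum)

# Made-up summary statistics for k = 2 samples (not from the paper).
print(adjusted_mover_interval(xbar=[10.0, 10.2], s2=[4.0, 9.0], n=[30, 40]))
```

Because $\hat{\theta }_{i}$ generally differs from $\bar{x}_{i}$, the distances to $l_{i}$ and $u_{i}$ are unequal, so the resulting interval need not be symmetric about $\hat{\theta }$.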

2.4 SB Confidence Interval

Let $X^{*}_{i} = (X^{*}_{i1},X^{*}_{i2},...,X^{*}_{in_{i}})$ be a bootstrap sample with replacement from $X_{i} = (X_{i1},X_{i2},...,X_{in_{i}})$ and let $\bar{X}^{*}_{i}$ and $S^{2*}_{i}$ be mean and variance of $X^{*}_{i}$, respectively. Let $x^{*}_{i} = (x^{*}_{i1},x^{*}_{i2},...,x^{*}_{in_{i}})$ be an observed value of $X^{*}_{i} = (X^{*}_{i1},X^{*}_{i2},...,X^{*}_{in_{i}})$ and let $\bar{x}^{*}_{i}$ and $s^{2*}_{i}$ be mean and variance of $x^{*}_{i}$, respectively. The estimates of $\hat{\theta }^{*}_{i}$ and $Var(\hat{\theta }^{*}_{i})$ are

$$\begin{aligned} \hat{\theta }^{*}_{i} = \frac{n_{i}\bar{X}^{*}_{i}}{n_{i}+\frac{S^{2*}_{i}}{({\bar{X}^{*}_{i})^{2}}}} \end{aligned}$$

(18)

and

$$\begin{aligned} Var\left( \hat{\theta }^{*}_{i}\right) =&\left( \frac{\mu ^{*}_{i}}{1+\left( \frac{\sigma ^{2*}_{i}}{n_{i}(\mu ^{*}_{i})^{2}+\sigma ^{2*}_{i}}\right) \left( 1+\frac{2\sigma ^{4*}_{i}+4n_{i}(\mu ^{*}_{i})^{2}\sigma ^{2*}_{i}}{\left( n_{i}(\mu ^{*}_{i})^{2}+\sigma ^{2*}_{i}\right) ^{2} }\right) }\right) ^{2}\nonumber \\ *&\left( \frac{\sigma ^{2*}_{i}}{n_{i}(\mu ^{*}_{i})^{2}}+\frac{\left( \frac{n_{i}\sigma ^{2*}_{i}}{n_{i}(\mu ^{*}_{i})^{2}+\sigma ^{2*}_{i}}\right) ^{2}\left( \frac{2}{n_{i}}+\frac{2\sigma ^{4*}_{i}+4n_{i}(\mu ^{*}_{i})^{2}\sigma ^{2*}_{i}}{\left( n_{i}(\mu ^{*}_{i})^{2}+\sigma ^{2*}_{i}\right) ^{2}}\right) }{\left( n_{i}+\left( \frac{n_{i}\sigma ^{2*}_{i}}{n_{i}(\mu ^{*}_{i})^{2}+\sigma ^{2*}_{i}}\right) \left( 1+\frac{2\sigma ^{4*}_{i}+4n_{i}(\mu ^{*}_{i})^{2}\sigma ^{2*}_{i}}{\left( n_{i}(\mu ^{*}_{i})^{2}+\sigma ^{2*}_{i}\right) ^{2}}\right) \right) ^{2}}\right) . \end{aligned}$$

(19)

According to Graybill and Deal [22], the estimator of the common Gaussian mean with unknown CVs is a pooled estimator based on the k individual samples. For the bootstrap samples, it is defined by

$$\begin{aligned} \hat{\theta }^{*} = {\sum \limits ^{k}_{i=1}\frac{\hat{\theta }^{*}_{i}}{\widetilde{Var}\left( \hat{\theta }^{*}_{i}\right) }}\Bigg /{\sum \limits ^{k}_{i=1}\frac{1}{\widetilde{Var}\left( \hat{\theta }^{*}_{i}\right) }}, \end{aligned}$$

(20)

where $\hat{\theta }^{*}_{i}$ is defined in Eq. (18) and $\widetilde{Var}\left( \hat{\theta }^{*}_{i}\right) $ is the estimator of $Var\left( \hat{\theta }^{*}_{i}\right) $ which is defined in Eq. (19) with $\mu _{i}^{*}$ and $\sigma ^{2*}_{i}$ replaced by $\bar{x}^{*}_{i}$ and $s^{2*}_{i}$, respectively.

The B bootstrap statistics are used to construct the sampling distribution for estimating the confidence interval for the common Gaussian mean with unknown CVs. Therefore, the $100(1-\alpha )\%$ two-sided confidence interval for the common Gaussian mean with unknown CVs based on the SB approach is

$$\begin{aligned} CI_{SB} = [L_{SB},U_{SB}] = [\bar{\hat{\theta ^*}}-z_{1-\alpha /2}S_{\hat{\theta }^{*}},\bar{\hat{\theta ^*}}+z_{1-\alpha /2}S_{\hat{\theta }^{*}}], \end{aligned}$$

(21)

where $\bar{\hat{\theta ^*}}$ and $S_{\hat{\theta }^{*}}$ are the mean and standard deviation of $\hat{\theta }^{*}$ defined in Eq. (20) and $z_{1-\alpha /2}$ denotes the $100(1-\alpha /2)$-th percentile of the standard normal distribution.
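A minimal sketch of the SB procedure is given below: each of the k samples is resampled with replacement, the pooled statistic of Eq. (20) is recomputed, and Eq. (21) is applied to the B bootstrap statistics. The function names, B = 1000, the seeds, and the simulated data are assumptions for illustration only.

```python
import random
from statistics import NormalDist, fmean, stdev, variance

def var_theta(mu, sigma2, n):
    """Variance formula transcribed from Eq. (3) / Eq. (19)."""
    a = n * mu**2 + sigma2
    c = (2 * sigma2**2 + 4 * n * mu**2 * sigma2) / a**2
    shrink = mu / (1 + (sigma2 / a) * (1 + c))
    b = n * sigma2 / a
    ratio = b**2 * (2 / n + c) / (n + b * (1 + c))**2
    return shrink**2 * (sigma2 / (n * mu**2) + ratio)

def pooled_theta(samples):
    """Pooled estimate of Eq. (20) computed from raw samples."""
    num = den = 0.0
    for x in samples:
        m, xb, v = len(x), fmean(x), variance(x)
        w = 1.0 / var_theta(xb, v, m)
        num += w * m * xb / (m + v / xb**2)  # Eq. (18)
        den += w
    return num / den

def sb_interval(samples, alpha=0.05, B=1000, seed=1):
    """Standard bootstrap interval of Eq. (21)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    stats = []
    for _ in range(B):
        resampled = [rng.choices(x, k=len(x)) for x in samples]  # with replacement
        stats.append(pooled_theta(resampled))
    centre, spread = fmean(stats), stdev(stats)
    return centre - z * spread, centre + z * spread

# Simulated data for illustration (not the paper's data).
data_rng = random.Random(0)
samples = [[data_rng.gauss(10, 2) for _ in range(30)],
           [data_rng.gauss(10, 3) for _ in range(40)]]
print(sb_interval(samples))
```

Resampling is done independently within each sample, so the sample sizes $n_{i}$ are preserved in every bootstrap replicate.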

Next, we briefly review the GCI of Lin and Lee [11] for the common mean of Gaussian distributions. The generalized pivotal quantity based on the best linear unbiased estimator for the common Gaussian mean $\mu $ is

$$\begin{aligned} R_{\mu } = \frac{\sum \limits ^{k}_{i=1} \frac{n_{i}\bar{x}_{i}U_{i}}{v_{i}}-Z\sqrt{\sum \limits ^{k}_{i=1} \frac{n_{i}U_{i}}{v_{i}}}}{\sum \limits ^{k}_{i=1} \frac{n_{i}U_{i}}{v_{i}}}, \end{aligned}$$

(22)

where Z denotes the standard normal distribution, $U_{i}$ denotes a chi-squared distribution with $n_{i}-1$ degrees of freedom, and $v_{i}=(n_{i}-1)s^{2}_{i}$.

Therefore, the $100(1-\alpha )\%$ two-sided confidence interval for the common Gaussian mean based on the GCI approach of Lin and Lee [11] is

$$\begin{aligned} CI_{LL} = [L_{LL},U_{LL}]=[R_{\mu }(\alpha /2),R_{\mu }(1-\alpha /2)], \end{aligned}$$

(23)

where $R_{\mu }(\alpha /2)$ and $R_{\mu }(1-\alpha /2)$ denote the $100(\alpha /2)$-th and $100(1-\alpha /2)$-th percentiles of $R_{\mu }$, respectively.
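The pivotal quantity in Eq. (22) can likewise be simulated. In the sketch below a single Z is shared by all k samples within each draw, as the equation indicates; the function names, draw count, seed, and inputs are assumptions for illustration.

```python
import random
from math import sqrt

def percentile(sorted_vals, p):
    """Linear-interpolation percentile of a sorted list, with 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def lin_lee_gci(xbar, s2, n, alpha=0.05, draws=5000, seed=1):
    """Monte Carlo percentiles of R_mu in Eq. (22), giving the interval of Eq. (23)."""
    rng = random.Random(seed)
    r_mu = []
    for _ in range(draws):
        Z = rng.gauss(0.0, 1.0)  # one standard normal draw shared across samples
        num = den = 0.0
        for xb, v, m in zip(xbar, s2, n):
            U = rng.gammavariate((m - 1) / 2, 2.0)  # chi-squared, m - 1 df
            w = m * U / ((m - 1) * v)               # n_i U_i / v_i
            num += w * xb
            den += w
        r_mu.append((num - Z * sqrt(den)) / den)
    r_mu.sort()
    return percentile(r_mu, alpha / 2), percentile(r_mu, 1 - alpha / 2)

# Made-up summary statistics for k = 2 samples (not from the paper).
print(lin_lee_gci(xbar=[10.0, 10.2], s2=[4.0, 9.0], n=[30, 40]))
```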

3 Simulation Studies

Monte Carlo simulation was used to estimate the coverage probabilities (CPs) and the average lengths (ALs) of all confidence intervals; those constructed via the GCI, LS, adjusted MOVER, and SB approaches are denoted as $CI_{GCI}$, $CI_{LS}$, $CI_{AM}$, and $CI_{SB}$, respectively, while the GCI of Lin and Lee [11] is denoted as $CI_{LL}$. With M simulation runs, the estimated CP of a $100(1-\alpha )\%$ confidence interval is expected to lie within $c\pm z_{\alpha /2}\sqrt{\frac{c(1-c)}{M}}$, where c is the nominal confidence level. At the 95% confidence level with M = 5,000, the best-performing confidence interval has a CP in the range [0.9440, 0.9560] and the shortest AL.
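The acceptance range for the CP follows from the binomial standard error of an estimated proportion. A short check, with the hypothetical function name `cp_range` and M = 5,000 as used in the simulation:

```python
from math import sqrt
from statistics import NormalDist

def cp_range(c=0.95, M=5000, alpha=0.05):
    """Range c +/- z * sqrt(c * (1 - c) / M) for an estimated coverage probability."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half = z * sqrt(c * (1 - c) / M)
    return c - half, c + half

low, high = cp_range()
print(f"[{low:.4f}, {high:.4f}]")  # [0.9440, 0.9560]
```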

Each confidence interval was evaluated at the nominal confidence level of 0.95. The number of populations was $k = 2$, and the sample sizes $n_{1}$ and $n_{2}$ within each population are given in Table 1. Without loss of generality (Thangjai et al. [8]), the common mean of the Gaussian data within each population was $\mu = 1.0$. The population standard deviations were set at $\sigma _{1} =$ 0.5, 1.0, 1.5, 2.0 and $\sigma _{2} =$ 1.0. The CVs were computed by $\tau _{i}=\sigma _{i}/\mu $, where $i = 1,2$. Hence, the ratio of $\tau _{1}$ to $\tau _{2}$ reduces to $\sigma _{1}/\sigma _{2}$.
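The simulation design can be sketched for one of the intervals. The code below estimates the CP and AL of the LS interval for k = 2 and $\mu = 1.0$; it is an assumed, simplified version of the procedure (hypothetical function names, a reduced number of runs, an arbitrary seed) and is not expected to reproduce the paper's table entries.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

def var_theta(mu, sigma2, n):
    """Variance formula transcribed from Eq. (3)."""
    a = n * mu**2 + sigma2
    c = (2 * sigma2**2 + 4 * n * mu**2 * sigma2) / a**2
    shrink = mu / (1 + (sigma2 / a) * (1 + c))
    b = n * sigma2 / a
    ratio = b**2 * (2 / n + c) / (n + b * (1 + c))**2
    return shrink**2 * (sigma2 / (n * mu**2) + ratio)

def ls_interval(samples, alpha=0.05):
    """Large-sample interval of Eq. (11) computed from raw samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    num = den = 0.0
    for x in samples:
        m, xb, v = len(x), fmean(x), variance(x)
        w = 1.0 / var_theta(xb, v, m)
        num += w * m * xb / (m + v / xb**2)
        den += w
    half = z * sqrt(1.0 / den)
    return num / den - half, num / den + half

def simulate(n, sigma, mu=1.0, M=5000, seed=1):
    """Return (coverage probability, average length) over M simulated datasets."""
    rng = random.Random(seed)
    hits, total_len = 0, 0.0
    for _ in range(M):
        samples = [[rng.gauss(mu, s) for _ in range(m)] for m, s in zip(n, sigma)]
        lo, hi = ls_interval(samples)
        hits += lo <= mu <= hi
        total_len += hi - lo
    return hits / M, total_len / M

# Reduced run count to keep the example fast; the paper uses M = 5,000.
cp, al = simulate(n=(100, 100), sigma=(0.5, 1.0), M=1000)
print(cp, al)
```

The same loop applies to the other intervals by swapping the interval function, at the cost of longer run times for the simulation-based GCI and SB approaches.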

The results of the simulations with $M =$ 5,000 runs are reported in Table 1. Only $CI_{GCI}$ obtained CPs greater than 0.95 in all cases, whereas those of $CI_{LS}$, $CI_{AM}$, $CI_{SB}$, and $CI_{LL}$ were under 0.95. However, the CPs of $CI_{LS}$, $CI_{AM}$, and $CI_{SB}$ increased and approached 0.95 as the sample size increased. For $n_{i} \le 50$, the CPs of $CI_{LS}$, $CI_{AM}$, and $CI_{SB}$ tended to decrease when $\sigma _{1}/\sigma _{2}$ increased. Moreover, the CPs of $CI_{GCI}$ did not change when $\sigma _{1}/\sigma _{2}$ was varied. Hence, $CI_{GCI}$ is preferable in most cases, while $CI_{AM}$ and $CI_{SB}$, which are easy to use in practice, can be used when the sample size is large ($n_{i} \ge 100$).

As the number of samples (k) increased, $CI_{GCI}$ remained preferable when the sample size was small. For large sample sizes, $CI_{GCI}$, $CI_{AM}$, $CI_{SB}$, and $CI_{LL}$ performed similarly in terms of CP, but the ALs of $CI_{AM}$, $CI_{SB}$, and $CI_{LL}$ were shorter than that of $CI_{GCI}$.

Table 1. The CPs and ALs of 95% two-sided confidence intervals for the common mean of Gaussian distributions with unknown CVs: 2 sample cases.

4 Empirical Application

In this section, the proposed confidence intervals are applied to real data and compared with $CI_{LL}$.

The dataset reported by Fung and Tsang [16] and Tian [12] and used here comprises hemoglobin, red blood cell, mean corpuscular volume, hematocrit, white blood cell, and platelet values in normal and abnormal blood samples collected by the Hong Kong Medical Technology Association in 1995 and 1996. The summary statistics for 1995 are $\bar{x}_{1} =$ 84.1300, $s^{2}_{1} =$ 3.3900, and $n_{1} =$ 63, and those for 1996 are $\bar{x}_{2} =$ 85.6800, $s^{2}_{2} =$ 2.9460, and $n_{2} =$ 72. The estimated means of the Gaussian distributions with unknown CVs are $\hat{\theta }_{1} =$ 84.1294 and $\hat{\theta }_{2} =$ 85.6795 for 1995 and 1996, respectively, while the estimated common mean of the Gaussian distributions with unknown CVs is $\hat{\theta } =$ 85.1962.
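The two per-year estimates follow from Eq. (4) applied to the reported summary statistics, as this short check shows. The function name is a hypothetical helper; the check covers only $\hat{\theta }_{1}$ and $\hat{\theta }_{2}$, not the pooled value.

```python
def theta_hat_from_summary(n, xbar, s2):
    """Eq. (4) evaluated from summary statistics."""
    return n * xbar / (n + s2 / xbar**2)

theta_1995 = theta_hat_from_summary(63, 84.13, 3.39)
theta_1996 = theta_hat_from_summary(72, 85.68, 2.946)
print(round(theta_1995, 4), round(theta_1996, 4))  # 84.1294 85.6795
```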

The two datasets fit Gaussian distributions. The 95% two-sided confidence intervals $CI_{GCI}$, $CI_{LS}$, $CI_{AM}$, and $CI_{SB}$ were [84.1099, 85.8884], [61.4262, 108.9661], [60.9972, 109.3992], and [84.4604, 85.6013], with interval lengths of 1.7785, 47.5399, 48.4020, and 1.1409, respectively. For comparison, $CI_{LL}$ was [84.6502, 85.3635], with an interval length of 0.7133. Thus, $CI_{LL}$ had the shortest interval length, while $CI_{SB}$ had the shortest interval length among the four proposed approaches for $k = 2$.
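The reported interval lengths are simply the differences of the reported limits, which can be checked as follows. The dictionary layout is an illustrative choice; the limits are those quoted in the text.

```python
# Interval limits as reported in the text for the 1995 and 1996 data.
intervals = {
    "GCI": (84.1099, 85.8884),
    "LS": (61.4262, 108.9661),
    "AM": (60.9972, 109.3992),
    "SB": (84.4604, 85.6013),
    "LL": (84.6502, 85.3635),
}

lengths = {name: round(hi - lo, 4) for name, (lo, hi) in intervals.items()}
shortest = min(lengths, key=lengths.get)
print(lengths)
print(shortest)  # LL
```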

These results agree with the simulation study in the previous section in terms of length. In the simulation, the GCI of Lin and Lee [11] has the shortest average length, but its coverage probabilities are less than the nominal confidence level of 0.95. Furthermore, the result in this example is based on only one sample, whereas the coverage probability and average length in the simulation are computed from 5,000 random samples. Therefore, the GCI of Lin and Lee [11] is not recommended for constructing confidence intervals for the common mean of Gaussian distributions with unknown CVs.

5 Discussion and Conclusions

Thangjai et al. [8] proposed confidence intervals for the mean and difference of means of normal distributions with unknown coefficients of variation. In addition, Thangjai et al. [9] presented the Bayesian approach to construct the confidence intervals for means of normal distributions with unknown coefficients of variation. In this paper, we extend the work of Thangjai et al. [8, 9] to construct confidence intervals for the common mean of k Gaussian distributions with unknown CVs.

Herein, the GCI, LS, adjusted MOVER, and SB approaches to constructing interval estimators for the common mean of Gaussian distributions with unknown CVs are presented. Their CPs and ALs were evaluated via Monte Carlo simulation and compared with those of the confidence interval based on the GCI approach of Lin and Lee [11]. The results of the simulation studies indicate that the confidence intervals performed similarly in terms of CP for large sample sizes (i.e., $n_{i} \ge 100$). However, the CP of $CI_{GCI}$ was more satisfactory than those of the other confidence intervals. Moreover, the CPs of $CI_{AM}$, $CI_{SB}$, and $CI_{LL}$ were close to 0.95 and their ALs were slightly shorter than that of $CI_{GCI}$ when the sample size was large (i.e., $n_{i} \ge 100$). Thus, $CI_{AM}$ and $CI_{SB}$ can be considered as alternatives for constructing an interval estimator for the common mean of Gaussian distributions with unknown CVs when the sample size is large, whereas $CI_{LS}$ is not recommended for small sample sizes (i.e., $n_{i} < 100$) as its CP is below 0.95. Further research will be conducted to find other approaches for comparison.

References

1. Gerig, T.M., Sen, A.R.: MLE in two normal samples with equal but unknown population coefficients of variation. J. Am. Stat. Assoc. 75, 704–708 (1980)
2. Searls, D.T.: The utilization of a known coefficient of variation in the estimation procedure. J. Am. Stat. Assoc. 59, 1225–1226 (1964)
3. Niwitpong, S.: Confidence intervals for the normal mean with a known coefficient of variation. Far East J. Math. Sci. 97, 711–727 (2015)
4. Srivastava, V.K.: On the use of coefficient of variation in estimating mean. J. Indian Soc. Agric. Stat. 26, 33–36 (1974)
5. Srivastava, V.K.: A note on the estimation of mean in normal population. Metrika 27(1), 99–102 (1980). https://doi.org/10.1007/BF01893580
6. Srivastava, V.K., Singh, R.S.: Uniformly minimum variance unbiased estimator of efficiency ratio in estimation of normal population mean. Statist. Probab. Lett. 10, 241–245 (1990)
7. Sahai, A.: On an estimator of normal population mean and UMVU estimation of its relative efficiency. Appl. Math. Comput. 152, 701–708 (2004)
8. Thangjai, W., Niwitpong, S., Niwitpong, S.: Confidence intervals for mean and difference of means of normal distributions with unknown coefficients of variation. Mathematics 5, 1–23 (2017)
9. Thangjai, W., Niwitpong, S.-A., Niwitpong, S.: Bayesian confidence intervals for means of normal distributions with unknown coefficients of variation. In: Huynh, V.-N., Entani, T., Jeenanunta, C., Inuiguchi, M., Yenradee, P. (eds.) IUKM 2020. LNCS (LNAI), vol. 12482, pp. 361–371. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62509-2_30
10. Krishnamoorthy, K., Lu, Y.: Inference on the common means of several normal populations based on the generalized variable method. Biometrics 59, 237–247 (2003)
11. Lin, S.H., Lee, J.C.: Generalized inferences on the common mean of several normal populations. J. Stat. Plan. Inference 134, 568–582 (2005)
12. Tian, L.: Inferences on the common coefficient of variation. Stat. Med. 24, 2213–2220 (2005)
13. Tian, L., Wu, J.: Inferences on the common mean of several log-normal populations: the generalized variable approach. Biom. J. 49, 944–951 (2007)
14. Thangjai, W., Niwitpong, S., Niwitpong, S.: Confidence intervals for the common mean of several normal populations. Stud. Computat. Intell. 692, 321–331 (2017)
15. Jordan, S.J., Krishnamoorthy, K.: Exact confidence intervals for the common mean of several normal populations. Biometrics 52, 77–86 (1996)
16. Fung, W.K., Tsang, T.S.: A simulation study comparing tests for the equality of coefficients of variation. Stat. Med. 17, 2003–2014 (1998)
17. Thangjai, W., Niwitpong, S.: Confidence intervals for the weighted coefficients of variation of two-parameter exponential distributions. Cogent Math. 4, 1–16 (2017)
18. Weerahandi, S.: Generalized confidence intervals. J. Am. Stat. Assoc. 88, 899–905 (1993)
19. Ye, R.D., Ma, T.F., Wang, S.G.: Inferences on the common mean of several inverse Gaussian populations. Comput. Stat. Data Anal. 54, 906–915 (2010)
20. Zou, G.Y., Donner, A.: Construction of confidence limits about effect measures: a general approach. Stat. Med. 27, 1693–1702 (2008)
21. Zou, G.Y., Taleban, J., Hao, C.Y.: Confidence interval estimation for lognormal data with application to health economics. Comput. Stat. Data Anal. 53, 3755–3764 (2009)
22. Graybill, F.A., Deal, R.B.: Combining unbiased estimators. Biometrics 15, 543–550 (1959)

Acknowledgments

This research was funded by Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok. No. 651118.

Author information

Authors and Affiliations

Department of Statistics, Faculty of Science, Ramkhamhaeng University, Bangkok, 10240, Thailand
Warisa Thangjai
Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, 10800, Thailand
Sa-Aat Niwitpong & Suparat Niwitpong

Corresponding author

Correspondence to Suparat Niwitpong.

Editor information

Editors and Affiliations

Osaka Prefecture University, Sakai, Osaka, Japan
Katsuhiro Honda
University of Hyogo, Kobe, Japan
Tomoe Entani
Osaka Prefecture University, Sakai, Japan
Seiki Ubukata
Japan Advanced Institute of Science and Technology, Nomi, Japan
Van-Nam Huynh
Osaka University, Toyonaka, Osaka, Japan
Masahiro Inuiguchi

Cite this paper

Thangjai, W., Niwitpong, SA., Niwitpong, S. (2022). Analysis of Medical Data Using Interval Estimators for Common Mean of Gaussian Distributions with Unknown Coefficients of Variation. In: Honda, K., Entani, T., Ubukata, S., Huynh, VN., Inuiguchi, M. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2022. Lecture Notes in Computer Science, vol 13199. Springer, Cham. https://doi.org/10.1007/978-3-030-98018-4_13

DOI: https://doi.org/10.1007/978-3-030-98018-4_13
Published: 04 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98017-7
Online ISBN: 978-3-030-98018-4
eBook Packages: Computer Science, Computer Science (R0)
