1 Introduction

Uncertainty theory, founded by Liu (2007) and refined by Liu (2009), is a branch of mathematics that has been successfully applied in fields such as science and engineering. As an important application of uncertainty theory, uncertain statistics, first discussed by Liu (2010), is a methodology of collecting, analyzing and interpreting data based on uncertainty theory. Up to now, uncertain statistics has four main development fields: estimating uncertainty distribution, uncertain regression analysis, uncertain time series analysis, and parameter estimation in uncertain differential equation.

Estimating an uncertainty distribution aims to use uncertainty theory to fit the uncertainty distribution of an uncertain variable based on expert's experimental data. The first step is to collect the expert's experimental data; for that purpose, Liu (2010) designed a questionnaire survey. The next step is to fit the uncertainty distribution to the collected data. If the functional form of the uncertainty distribution is known but contains some unknown parameters, then in order to estimate those parameters from expert's experimental data, Liu (2010) investigated the principle of least squares, and Wang and Peng (2014) presented the method of moments. If even the functional form of the uncertainty distribution is unknown, then Liu (2010) presented the linear interpolation method, and Chen and Ralescu (2012) explored a series of spline interpolation methods. In addition, the Delphi method (Wang et al. (2012)) was suggested as a process for estimating the uncertainty distribution when multiple experts are available.

Uncertain regression analysis aims to use uncertainty theory to study the relationship between explanatory variables and response variables. Estimating the unknown parameters is a vital topic in uncertain regression analysis, and many approaches have been developed, such as least squares estimation (Yao and Liu (2018)), least absolute deviations estimation (Liu and Yang (2020)), and maximum likelihood estimation (Lio and Liu (2020)). In addition, Lio and Liu (2018) proposed an approach to interval estimation for predicting the response variables. Furthermore, there are many other directions in uncertain regression analysis, including cross-validation (Liu and Jia (2020); Liu (2019)), variable selection (Liu and Yang (2020)), multivariate regression analysis (Song and Fu (2018); Ye and Liu (2020)), and nonparametric regression analysis (Ding and Zhang (2021)).

Uncertain time series analysis aims to use uncertainty theory to predict future values based on previously observed data. As a basic model of uncertain time series, the uncertain autoregressive model was first proposed by Yang and Liu (2019); in this model, the observed data depend linearly on their previous values and an uncertain disturbance term. In order to take multiple uncertain disturbance terms into consideration, Yang and Ni (2020) presented the uncertain moving average model, in which the observed data depend linearly on the current and various past values of a disturbance term.

Parameter estimation in uncertain differential equations aims to use uncertainty theory to estimate unknown parameters in an uncertain differential equation based on observed data. Many estimation methods have been studied. For example, Yao and Liu (2020) investigated moment estimation, Yang et al. (2020) studied minimum cover estimation, Sheng et al. (2021) investigated least squares estimation, Liu (2021) proposed generalized moment estimation, and Liu and Liu (2020) presented maximum likelihood estimation. As another topic, initial value estimation was proposed by Lio and Liu (2021) to estimate the unknown initial value of an uncertain differential equation from observed data.

This paper develops a new direction of uncertain statistics called uncertain hypothesis test, which uses uncertainty theory to decide, according to observed data, whether some hypotheses are correct or not. As an application, we employ the uncertain hypothesis test in uncertain regression analysis to test whether the estimated disturbance term and the fitted regression model are appropriate.

The rest of the paper is organized as follows. Uncertain hypothesis test is introduced in Sect. 2, and is applied in uncertain regression analysis in Sect. 3. Then, some numerical examples are given in Sect. 4. Finally, a brief summary is made in Sect. 5.

2 Uncertain hypothesis test

Let \(\xi \) be a population with uncertainty distribution \(\varPhi _{\theta }\) where \(\theta \) is an unknown parameter with \(\theta \in \varTheta \). A hypothesis testing problem about the unknown parameter \(\theta \) can be formulated as deciding which of the following two statements is true:

$$\begin{aligned} H_0: \theta \in \varTheta _0 \quad \text {versus} \quad H_1: \theta \in \varTheta _1 \end{aligned}$$
(1)

where \(\varTheta _0\) and \(\varTheta _1\) are two disjoint subsets of \(\varTheta \) with \(\varTheta _0 \cup \varTheta _1=\varTheta \). The statement \(H_0\) is called the null hypothesis, and \(H_1\) is called the alternative hypothesis. In particular, the following hypotheses are called two-sided hypotheses:

$$\begin{aligned} H_0: \theta =\theta _0 \quad \text {versus} \quad H_1: \theta \ne \theta _0, \end{aligned}$$

where \(\theta _0\in \varTheta \).

Assume there is a vector of observed data \((z_1,z_2,\cdots ,z_n)\). A rejection region for the null hypothesis \(H_0\) is a set \(W\subset \mathfrak {R}^n\). If the vector of observed data

$$\begin{aligned} (z_1,z_2,\cdots ,z_n)\in W, \end{aligned}$$

then we reject \(H_0\). Otherwise, we accept \(H_0\). A core problem is how to choose a suitable rejection region W for the given hypothesis \(H_0\).

Definition 1

Let \(\xi \) be a population with uncertainty distribution \(\varPhi _{\theta }\) where \(\theta \) is an unknown parameter. A rejection region \(W\subset \mathfrak {R}^n\) is said to be a test for the two-sided hypotheses \(H_0: \theta =\theta _0\) versus \(H_1: \theta \ne \theta _0\) at significance level \(\alpha \) if

  1. (a)

    for any \((z_1,z_2,\cdots ,z_n)\in W\), there are at least \(\alpha \) of indexes i’s with \(1\le i\le n\) such that

    $$\begin{aligned} \text{ M}_{\theta _0}\{\xi>z_i\}\vee \text{ M}_{\theta _0}\{\xi <z_i\}> 1-\frac{\alpha }{2}, \end{aligned}$$
  2. (b)

    for some \(\theta \ne \theta _0\) and some \((z_1,z_2,\cdots ,z_n)\in W\), there are more than \(1-\alpha \) of indexes i’s with \(1\le i\le n\) and at least \(\alpha \) of indexes j’s with \(1\le j\le n\) such that

    $$\begin{aligned} \text{ M}_{\theta }\{\xi>z_i\}\vee \text{ M}_{\theta }\{\xi<z_i\}<\text{ M}_{\theta _0}\{\xi >z_j\}\vee \text{ M}_{\theta _0}\{\xi <z_j\}. \end{aligned}$$

Remark 1

From Definition 1, we can see that the test W depends on the significance level \(\alpha \). How should \(\alpha \) be chosen? Standard values, such as 0.1, 0.05, or 0.01, are often used for convenience.

In order to find a suitable rejection region W satisfying the two conditions in Definition 1, we introduce the concept of a nonembedded uncertainty distribution family.

Definition 2

A regular uncertainty distribution family \(\{ \varPhi _{\theta }: \theta \in \varTheta \}\) is said to be nonembedded for \(\theta _0\in \varTheta \) at level \(\alpha \) if

$$\begin{aligned} \varPhi _{\theta _0}^{-1}(\beta )>\varPhi _{\theta }^{-1}(\beta )\quad \text {or}\quad \varPhi _{\theta }^{-1}(1-\beta )>\varPhi _{\theta _0}^{-1}(1-\beta ) \end{aligned}$$

for some \(\theta \in \varTheta \) and some \(\beta \) with \(0<\beta \le \alpha /2\).

Example 1

The normal uncertainty distribution family \(\{ \text{ N }(e,\sigma ): e\in \mathfrak {R}, \sigma >0\}\) is nonembedded for any \(\theta _0=(e_0,\sigma _0)\in \mathfrak {R}\times (0,+\infty )\) at any level \(\alpha \). Note that the inverse uncertainty distribution of \(\text{ N }(e,\sigma )\) is

$$\begin{aligned} \varPhi ^{-1}(\beta )=e+\frac{\sigma \sqrt{3}}{\pi } \ln \frac{\beta }{1-\beta }. \end{aligned}$$

Take

$$\begin{aligned} \theta _1=(e_1,\sigma _1)=(e_0-1,\sigma _0),\quad \beta =\frac{\alpha }{2}. \end{aligned}$$

Since

$$\begin{aligned} \begin{aligned} \varPhi _{\theta _0}^{-1}(\beta )-\varPhi _{\theta _1}^{-1}(\beta )&=e_0+\frac{\sigma _0\sqrt{3}}{\pi } \ln \frac{\beta }{1-\beta }-\left( e_1+\frac{\sigma _1\sqrt{3}}{\pi } \ln \frac{\beta }{1-\beta }\right) \\&=e_0+\frac{\sigma _0\sqrt{3}}{\pi } \ln \frac{\beta }{1-\beta }-\left( e_0-1+\frac{\sigma _0\sqrt{3}}{\pi } \ln \frac{\beta }{1-\beta }\right) \\&=1>0, \end{aligned} \end{aligned}$$

the normal uncertainty distribution family \(\{ \text{ N }(e,\sigma ): e\in \mathfrak {R}, \sigma >0\}\) is nonembedded for \(\theta _0\) at level \(\alpha \).
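The argument in Example 1 is easy to check numerically. The following Python sketch (with illustrative values of \(e_0\) and \(\sigma _0\) that are not from the paper) evaluates the inverse normal uncertainty distribution and confirms that the quantile gap between \(\theta _0\) and \(\theta _1=(e_0-1,\sigma _0)\) equals 1 for any \(\beta \):

```python
import math

def inv_normal_ud(beta, e, sigma):
    """Inverse of the normal uncertainty distribution N(e, sigma):
    Phi^{-1}(beta) = e + (sigma*sqrt(3)/pi) * ln(beta/(1-beta))."""
    return e + (sigma * math.sqrt(3) / math.pi) * math.log(beta / (1 - beta))

# With theta_1 = (e0 - 1, sigma0), the gap Phi_{theta0}^{-1}(beta) - Phi_{theta1}^{-1}(beta)
# equals 1 for every beta, so the nonembeddedness condition holds at any level alpha.
e0, sigma0, alpha = 2.0, 1.5, 0.05   # illustrative values, not from the paper
beta = alpha / 2
gap = inv_normal_ud(beta, e0, sigma0) - inv_normal_ud(beta, e0 - 1, sigma0)
print(round(gap, 10))  # 1.0
```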

Example 2

The linear uncertainty distribution family \(\{ \text{ L }(a,b): a< b\}\) is nonembedded for any \(\theta _0=(a_0,b_0)\) with \(a_0< b_0\) at any level \(\alpha \). Note that the inverse uncertainty distribution of \(\text{ L }(a,b)\) is

$$\begin{aligned} \varPhi ^{-1}(\beta )=(1-\beta )a+ \beta b. \end{aligned}$$

Take

$$\begin{aligned} \theta _1=(a_1,b_1)=(a_0-1,b_0-1),\quad \beta =\frac{\alpha }{2}. \end{aligned}$$

Since

$$\begin{aligned} \begin{aligned} \varPhi _{\theta _0}^{-1}(\beta )-\varPhi _{\theta _1}^{-1}(\beta )&=(1-\beta )a_0+ \beta b_0-[(1-\beta )a_1+ \beta b_1]\\&=(1-\beta )(a_0-a_1)+ \beta (b_0-b_1)\\&=1-\beta +\beta =1>0, \end{aligned} \end{aligned}$$

the linear uncertainty distribution family \(\{ \text{ L }(a,b): a< b\}\) is nonembedded for \(\theta _0\) at level \(\alpha \).
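The same kind of numerical check works for Example 2, again with illustrative parameter values:

```python
def inv_linear_ud(beta, a, b):
    """Inverse of the linear uncertainty distribution L(a, b):
    Phi^{-1}(beta) = (1 - beta)*a + beta*b."""
    return (1 - beta) * a + beta * b

# With theta_1 = (a0 - 1, b0 - 1), the quantile gap is (1 - beta) + beta = 1
# for every beta, so the nonembeddedness condition holds at any level alpha.
a0, b0, beta = 1.0, 4.0, 0.025   # illustrative values, not from the paper
gap = inv_linear_ud(beta, a0, b0) - inv_linear_ud(beta, a0 - 1, b0 - 1)
print(round(gap, 12))
```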

From Definition 2, we also know that a regular uncertainty distribution family \(\{ \varPhi _{\theta }: \theta \in \varTheta \}\) is embedded for \(\theta _0\in \varTheta \) at level \(\alpha \) if

$$\begin{aligned} \varPhi _{\theta _0}^{-1}(\beta )\le \varPhi _{\theta }^{-1}(\beta )\quad \text {and}\quad \varPhi _{\theta }^{-1}(1-\beta )\le \varPhi _{\theta _0}^{-1}(1-\beta ) \end{aligned}$$
(2)

for any \(\theta \in \varTheta \) and any \(\beta \) with \(0<\beta \le \alpha /2\). It is obvious that (2) is equivalent to

$$\begin{aligned}{}[\varPhi _{\theta }^{-1}(\beta ),\varPhi _{\theta }^{-1}(1-\beta )]\subseteq [\varPhi _{\theta _0}^{-1}(\beta ),\varPhi _{\theta _0}^{-1}(1-\beta )], \end{aligned}$$

which is the reason why \(\{ \varPhi _{\theta }: \theta \in \varTheta \}\) is called an embedded uncertainty distribution family. To illustrate this concept, some examples are given as follows.

Example 3

The uncertainty distribution family

$$\begin{aligned} \left\{ \text{ N }\left( 0,\exp {\left( -(\theta -1)^2\right) }\right) : \theta \in \mathfrak {R}\right\} \end{aligned}$$

is embedded for \(\theta _0=1\) at any level \(\alpha \). Note that the inverse uncertainty distribution of

$$\begin{aligned} \text{ N }\left( 0,\exp {\left( -(\theta -1)^2\right) }\right) \end{aligned}$$

is

$$\begin{aligned} \varPhi _{\theta }^{-1}(\beta )=0+\frac{\sqrt{3}}{\pi }\exp {\left( -(\theta -1)^2\right) } \ln \frac{\beta }{1-\beta }=\frac{\sqrt{3}}{\pi }\exp {\left( -(\theta -1)^2\right) } \ln \frac{\beta }{1-\beta }. \end{aligned}$$

For any \(\theta \in \mathfrak {R}\) and any \(\beta \) with \(0<\beta \le \alpha /2<0.5\), since

$$\begin{aligned} \begin{aligned}&\varPhi _{\theta _0}^{-1}(\beta )-\varPhi _{\theta }^{-1}(\beta )\\&\quad =\frac{\sqrt{3}}{\pi }\exp {\left( -(\theta _0-1)^2\right) } \ln \frac{\beta }{1-\beta }-\frac{\sqrt{3}}{\pi }\exp {\left( -(\theta -1)^2\right) } \ln \frac{\beta }{1-\beta }\\&\quad =\frac{\sqrt{3}}{\pi }\left( \exp {\left( -(\theta _0-1)^2\right) }-\exp {\left( -(\theta -1)^2\right) }\right) \ln \frac{\beta }{1-\beta }\\&\quad =\frac{\sqrt{3}}{\pi }\left( \exp {\left( -(1-1)^2\right) }-\exp {\left( -(\theta -1)^2\right) }\right) \ln \frac{\beta }{1-\beta }\\&\quad =\frac{\sqrt{3}}{\pi }\left( 1-\exp {\left( -(\theta -1)^2\right) }\right) \ln \frac{\beta }{1-\beta }\le 0 \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \varPhi _{\theta _0}^{-1}(1-\beta )-\varPhi _{\theta }^{-1}(1-\beta )=\frac{\sqrt{3}}{\pi }\left( 1-\exp {\left( -(\theta -1)^2\right) }\right) \ln \frac{1-\beta }{\beta }\ge 0, \end{aligned}$$

we have

$$\begin{aligned} \varPhi _{\theta _0}^{-1}(\beta )\le \varPhi _{\theta }^{-1}(\beta ),\quad \varPhi _{\theta }^{-1}(1-\beta )\le \varPhi _{\theta _0}^{-1}(1-\beta ), \end{aligned}$$

which implies the uncertainty distribution family

$$\begin{aligned} \left\{ \text{ N }\left( 0,\exp {\left( -(\theta -1)^2\right) }\right) : \theta \in \mathfrak {R}\right\} \end{aligned}$$

is embedded. A sketch map for ease of understanding is shown in Fig. 1.

Fig. 1
figure 1

The sketch map in Example 3
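The embedding in Example 3 can also be verified numerically; the sketch below checks the interval inclusion (2) for several values of \(\theta \) and \(\beta \):

```python
import math

def inv_phi(beta, theta):
    """Inverse distribution of N(0, exp(-(theta - 1)^2)) from Example 3."""
    sigma = math.exp(-(theta - 1) ** 2)
    return (math.sqrt(3) / math.pi) * sigma * math.log(beta / (1 - beta))

theta0 = 1.0
# For every theta, [inv_phi(beta, theta), inv_phi(1-beta, theta)] lies inside
# the corresponding interval for theta0, i.e., the family is embedded for theta0 = 1.
for theta in (-2.0, 0.0, 0.5, 1.0, 3.0):
    for beta in (0.01, 0.025, 0.1):
        assert inv_phi(beta, theta0) <= inv_phi(beta, theta)
        assert inv_phi(1 - beta, theta) <= inv_phi(1 - beta, theta0)
print("embedded for theta0 = 1")
```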

Example 4

Whether an uncertainty distribution family is nonembedded depends on the value of \(\theta _0\) in Definition 2. For example, the uncertainty distribution family

$$\begin{aligned} \left\{ \text{ N }\left( 0,\exp {\left( -(\theta -1)^2\right) }\right) : \theta \in \mathfrak {R}\right\} \end{aligned}$$

is embedded for \(\theta _0=1\) at any level \(\alpha \), but nonembedded for any \(\theta _0\ne 1\) at any level \(\alpha \).

Example 5

Whether an uncertainty distribution family is nonembedded also depends on the level \(\alpha \) in Definition 2. For example, for each \(\theta \in \mathfrak {R}\), write

$$\begin{aligned} \varPhi _{\theta }(x)= {\left\{ \begin{array}{ll} 0, &{} \text {if} ~x\le -0.2\\ x+0.2, &{} \text {if} ~-0.2<x\le 0\\ 0.3\exp {(\theta ^2)}x+0.2, &{} \text {if} ~0<x\le \exp {(-\theta ^2)}\\ \displaystyle \frac{0.3}{2-\exp {(-\theta ^2)}}(x-2)+0.8, &{} \text {if} ~\exp {(-\theta ^2)}<x\le 2\\ x-1.2, &{} \text {if} ~2<x\le 2.2\\ 1, &{} \text {if} ~x> 2.2. \end{array}\right. } \end{aligned}$$

Then, the uncertainty distribution family \(\{ \varPhi _{\theta }: \theta \in \mathfrak {R}\}\) is nonembedded for any \(\theta _0\in \mathfrak {R}\) at any level \(\alpha \) with \(0.4<\alpha <1\), but embedded for any \(\theta _0\in \mathfrak {R}\) at any level \(\alpha \) with \(0<\alpha \le 0.4\).
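The threshold \(\alpha =0.4\) in Example 5 reflects the fact that the \(\beta \)-quantile of \(\varPhi _{\theta }\) is free of \(\theta \) for \(\beta \le 0.2\) (and likewise for \(\beta \ge 0.8\), where the pieces are also \(\theta \)-free) but depends on \(\theta \) for \(0.2<\beta <0.8\). The sketch below inverts \(\varPhi _{\theta }\) numerically by bisection to illustrate this:

```python
import math

def phi(x, theta):
    """Piecewise uncertainty distribution Phi_theta from Example 5."""
    s = math.exp(-theta ** 2)
    if x <= -0.2:
        return 0.0
    if x <= 0.0:
        return x + 0.2
    if x <= s:
        return 0.3 * x / s + 0.2              # 0.3*exp(theta^2)*x + 0.2
    if x <= 2.0:
        return 0.3 / (2.0 - s) * (x - 2.0) + 0.8
    if x <= 2.2:
        return x - 1.2
    return 1.0

def quantile(beta, theta, lo=-0.2, hi=2.2, tol=1e-10):
    """Invert the strictly increasing part of Phi_theta by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if phi(mid, theta) < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

q_low = abs(quantile(0.1, 0.0) - quantile(0.1, 2.0))   # beta <= 0.2: theta-free
q_mid = abs(quantile(0.3, 0.0) - quantile(0.3, 2.0))   # 0.2 < beta < 0.8: theta-dependent
print(q_low < 1e-6, q_mid > 0.1)  # True True
```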

Theorem 1

Let \(\xi \) be a population with regular uncertainty distribution \(\varPhi _{\theta }\) where \(\theta \) is an unknown parameter with \(\theta \in \varTheta \). If the uncertainty distribution family \(\{ \varPhi _{\theta }: \theta \in \varTheta \}\) is nonembedded for a known parameter \(\theta _0\in \varTheta \) at significance level \(\alpha \), then the test for the two-sided hypotheses \(H_0: \theta =\theta _0\) versus \(H_1: \theta \ne \theta _0\) at significance level \(\alpha \) is

$$\begin{aligned} \begin{aligned} W=\bigg \{(z_1,z_2&,\cdots ,z_n): \text { there are at least } \alpha \text { of indexes } i\text {'s with } 1\le i\le n \\&\text {such that }z_i<\varPhi _{\theta _0}^{-1}\left( \frac{\alpha }{2}\right) \text { or } z_i>\varPhi _{\theta _0}^{-1}\left( 1-\frac{\alpha }{2}\right) \bigg \}. \end{aligned} \end{aligned}$$

Proof

In order to prove that W is a test for the two-sided hypotheses \(H_0: \theta =\theta _0\) versus \(H_1: \theta \ne \theta _0\) at level \(\alpha \), we need to verify that W satisfies the two conditions in Definition 1.

First, we will verify the condition (a) in Definition 1. For any \((z_1,z_2,\cdots ,z_n)\in W\), it follows from the definition of W that there are at least \(\alpha \) of indexes i’s with \(1\le i\le n\) such that

$$\begin{aligned} z_i<\varPhi _{\theta _0}^{-1}\left( \frac{\alpha }{2}\right) \quad \text {or}\quad z_i>\varPhi _{\theta _0}^{-1}\left( 1-\frac{\alpha }{2}\right) , \end{aligned}$$

i.e.,

$$\begin{aligned} \text{ M}_{\theta _0}\{\xi>z_i\}> 1-\frac{\alpha }{2} \quad \text {or}\quad \text{ M}_{\theta _0}\{\xi <z_i\}> 1-\frac{\alpha }{2}. \end{aligned}$$

Therefore W satisfies the condition (a).

Second, we will verify the condition (b). Since the uncertainty distribution family \(\{ \varPhi _{\theta }: \theta \in \varTheta \}\) is nonembedded for \(\theta _0\) at level \(\alpha \), we have

$$\begin{aligned} \varPhi _{\theta _0}^{-1}(\beta )>\varPhi _{\theta }^{-1}(\beta )\quad \text {or}\quad \varPhi _{\theta }^{-1}(1-\beta )>\varPhi _{\theta _0}^{-1}(1-\beta ) \end{aligned}$$

for some \(\theta \in \varTheta \) and some \(\beta \) with \(0<\beta \le \alpha /2\). Take

$$\begin{aligned} z_i = {\left\{ \begin{array}{ll} \varPhi _{\theta }^{-1}(\beta ), &{} \text {if} ~\varPhi _{\theta _0}^{-1}(\beta )>\varPhi _{\theta }^{-1}(\beta ) \text { and } \varPhi _{\theta }^{-1}(1-\beta )\le \varPhi _{\theta _0}^{-1}(1-\beta )\\ \varPhi _{\theta }^{-1}(1-\beta ), &{} \text {if} ~\varPhi _{\theta }^{-1}(1-\beta )>\varPhi _{\theta _0}^{-1}(1-\beta ), \end{array}\right. } \end{aligned}$$

\(i=1,2,\cdots ,n\). It is easy to verify that

$$\begin{aligned} \text{ M}_{\theta }\{\xi >z_i\}\vee \text{ M}_{\theta }\{\xi <z_i\}\le 1-\beta \end{aligned}$$

and

$$\begin{aligned} \text{ M}_{\theta _0}\{\xi>z_i\}\vee \text{ M}_{\theta _0}\{\xi <z_i\}>1-\beta , \end{aligned}$$
(3)

\(i=1,2,\cdots ,n\). Thus,

$$\begin{aligned} \text{ M}_{\theta }\{\xi>z_i\}\vee \text{ M}_{\theta }\{\xi<z_i\}<\text{ M}_{\theta _0}\{\xi >z_j\}\vee \text{ M}_{\theta _0}\{\xi <z_j\},~i, j=1,2,\cdots ,n. \end{aligned}$$

In addition, since \(\beta \le \alpha /2\), it follows from (3) that

$$\begin{aligned} \text{ M}_{\theta _0}\{\xi>z_i\}\vee \text{ M}_{\theta _0}\{\xi <z_i\}>1-\beta \ge 1-\frac{\alpha }{2},~i=1,2,\cdots ,n. \end{aligned}$$

That is, \((z_1,z_2,\cdots ,z_n)\in W\). Therefore W satisfies the condition (b). The theorem is proved. \(\square \)

Remark 2

In order to make it easier to determine whether the vector of observed data \((z_1,z_2,\cdots ,z_n)\) falls into the test W defined in Theorem 1, we introduce the concept of a singular point. For each i with \(1\le i\le n\), if

$$\begin{aligned} z_i<\varPhi _{\theta _0}^{-1}\left( \frac{\alpha }{2}\right) \quad \text {or}\quad z_i>\varPhi _{\theta _0}^{-1}\left( 1-\frac{\alpha }{2}\right) , \end{aligned}$$

then \(z_i\) is called a singular point. It follows from Theorem 1 that \((z_1,z_2,\cdots ,z_n)\in W\) iff the number of singular points is at least \(\alpha n\), and \((z_1,z_2,\cdots ,z_n)\notin W\) iff the number of singular points is less than \(\alpha n\).
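Remark 2 reduces the test of Theorem 1 to counting singular points, which is straightforward to implement. Below is a minimal Python sketch (the data and the parameters are hypothetical); the quantile function \(\varPhi _{\theta _0}^{-1}\) is passed in as an argument:

```python
import math

def two_sided_test(z, inv_phi0, alpha):
    """Test of Theorem 1 via Remark 2: count singular points and
    reject H0 iff their number is at least alpha * n."""
    lo = inv_phi0(alpha / 2)           # Phi_{theta0}^{-1}(alpha/2)
    hi = inv_phi0(1 - alpha / 2)       # Phi_{theta0}^{-1}(1 - alpha/2)
    singular = sum(1 for zi in z if zi < lo or zi > hi)
    return singular >= alpha * len(z)  # True means: reject H0

# Hypothetical example with theta0 = (e0, sigma0) = (0, 1) and alpha = 0.05.
def inv_phi0(b):
    return 0.0 + 1.0 * math.sqrt(3) / math.pi * math.log(b / (1 - b))

rejected = two_sided_test([0.1, -0.4, 0.8, 2.5, -0.3], inv_phi0, 0.05)
print(rejected)  # True: 2.5 is a singular point, and 1 >= 0.05 * 5
```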

Example 6

The condition of nonembedded uncertainty distribution family in Theorem 1 cannot be removed. For example, let \(\xi \) be a population with uncertainty distribution

$$\begin{aligned} \text{ N }\left( 0,\exp {\left( -(\theta -1)^2\right) }\right) \end{aligned}$$

where \(\theta \) is an unknown parameter. Write \(\theta _0=1\). For a given significance level \(\alpha \), take the set

$$\begin{aligned} \begin{aligned} W=\bigg \{(z_1,z_2&,\cdots ,z_n): \text { there are at least } \alpha \text { of indexes } i\text {'s with } 1\le i\le n \\&\text {such that }z_i<\varPhi _{\theta _0}^{-1}\left( \frac{\alpha }{2}\right) \text { or } z_i>\varPhi _{\theta _0}^{-1}\left( 1-\frac{\alpha }{2}\right) \bigg \} \end{aligned} \end{aligned}$$

where \(\varPhi _{\theta _0}^{-1}\) is the inverse uncertainty distribution of

$$\begin{aligned} \text{ N }\left( 0,\exp {\left( -(\theta _0-1)^2\right) }\right) . \end{aligned}$$

It follows from the proof of Theorem 1 that the set W satisfies the condition (a) in Definition 1. However, we claim that W does not satisfy the condition (b) in Definition 1, and we prove this by contradiction. Suppose, on the contrary, that W satisfies the condition (b) in Definition 1. Then for some \(\theta \ne \theta _0\) and some \((z_1,z_2,\cdots ,z_n)\in W\), there are more than \(1-\alpha \) of indexes i’s with \(1\le i\le n\) and at least \(\alpha \) of indexes j’s with \(1\le j\le n\) such that

$$\begin{aligned} \text{ M}_{\theta }\{\xi>z_i\}\vee \text{ M}_{\theta }\{\xi<z_i\}<\text{ M}_{\theta _0}\{\xi >z_j\}\vee \text{ M}_{\theta _0}\{\xi <z_j\}, \end{aligned}$$

i.e.,

$$\begin{aligned} \text{ M}_{\theta }\{\xi>z_i\}\vee \text{ M}_{\theta }\{\xi<z_i\}\le 1-\beta<\text{ M}_{\theta _0}\{\xi >z_j\}\vee \text{ M}_{\theta _0}\{\xi <z_j\} \end{aligned}$$

for some \(\beta \) with \(0<\beta \le \alpha /2\). Thus there exists an index k such that

$$\begin{aligned} \varPhi _{\theta }^{-1}(\beta ) \le z_k \le \varPhi _{\theta }^{-1}(1-\beta ) \end{aligned}$$

and

$$\begin{aligned} z_k<\varPhi _{\theta _0}^{-1}(\beta ) \quad \text {or}\quad z_k>\varPhi _{\theta _0}^{-1}(1-\beta ). \end{aligned}$$

Hence

$$\begin{aligned} \varPhi _{\theta _0}^{-1}(\beta )>\varPhi _{\theta }^{-1}(\beta )\quad \text {or}\quad \varPhi _{\theta }^{-1}(1-\beta )>\varPhi _{\theta _0}^{-1}(1-\beta ), \end{aligned}$$

which indicates that the uncertainty distribution family

$$\begin{aligned} \left\{ \text{ N }\left( 0,\exp {\left( -(\theta -1)^2\right) }\right) : \theta \in \mathfrak {R}\right\} \end{aligned}$$

is nonembedded for \(\theta _0\) at level \(\alpha \). This contradicts the conclusion shown in Example 3, i.e., the uncertainty distribution family

$$\begin{aligned} \left\{ \text{ N }\left( 0,\exp {\left( -(\theta -1)^2\right) }\right) : \theta \in \mathfrak {R}\right\} \end{aligned}$$

is embedded for \(\theta _0\) at level \(\alpha \). Thus W does not satisfy the condition (b) in Definition 1. Therefore the condition of nonembedded uncertainty distribution family cannot be removed.

Corollary 1

Let \(\xi \) be a population that follows a normal uncertainty distribution with unknown expected value e and variance \(\sigma ^2\). Then the test for the two-sided hypotheses

$$\begin{aligned} H_0: e=e_0 \text { and } \sigma =\sigma _0 \text { versus } H_1: e\ne e_0 \text { or } \sigma \ne \sigma _0 \end{aligned}$$
(4)

at significance level \(\alpha \) is

$$\begin{aligned} \begin{aligned} W=\bigg \{(z_1,z_2&,\cdots ,z_n): \text { there are at least } \alpha \text { of indexes } i\text {'s with } 1\le i\le n \\&\text {such that }z_i<\varPhi ^{-1}\left( \frac{\alpha }{2}\right) \text { or } z_i>\varPhi ^{-1}\left( 1-\frac{\alpha }{2}\right) \bigg \} \end{aligned} \end{aligned}$$
(5)

where

$$\begin{aligned} \varPhi ^{-1}(\alpha )=e_0+\frac{\sigma _0 \sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

Proof

Since Example 1 shows that the normal uncertainty distribution family

$$\begin{aligned} \{ \text{ N }(e,\sigma ): e\in \mathfrak {R}, \sigma >0\} \end{aligned}$$

is nonembedded for \((e_0,\sigma _0)\) at any significance level \(\alpha \), it follows from Theorem 1 that the test for hypotheses (4) is W defined in (5). \(\square \)

Example 7

Let \(\xi \) be a population, and let \((z_1,z_2,\cdots ,z_n)\) be a vector of observed data. In order to test whether \(\xi \) follows the normal uncertainty distribution \(\text{ N }(e_0,\sigma _0)\), we may consider the two-sided hypotheses

$$\begin{aligned} H_0: e=e_0 \text { and } \sigma =\sigma _0 \text { versus } H_1: e\ne e_0 \text { or } \sigma \ne \sigma _0. \end{aligned}$$
(6)

Given a significance level \(\alpha \), it follows from Corollary 1 that the test for the hypotheses (6) at level \(\alpha \) is

$$\begin{aligned} \begin{aligned} W=\bigg \{(z_1,z_2&,\cdots ,z_n): \text { there are at least } \alpha \text { of indexes } i\text {'s with } 1\le i\le n \\&\text {such that }z_i<\varPhi ^{-1}\left( \frac{\alpha }{2}\right) \text { or } z_i>\varPhi ^{-1}\left( 1-\frac{\alpha }{2}\right) \bigg \} \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \varPhi ^{-1}(\alpha )=e_0+\frac{\sigma _0 \sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

If \((z_1,z_2,\cdots ,z_n)\in W\), then we reject \(H_0\). Otherwise, we accept \(H_0\).

3 Uncertain regression analysis

In this section, we will apply the uncertain hypothesis test in uncertain regression analysis. Let \((x_1,x_2,\cdots ,x_p)\) be a vector of explanatory variables, and let y be a response variable. Yao and Liu (2018) suggested that the functional relationship between \((x_1,x_2,\cdots ,x_p)\) and y is expressed by an uncertain regression model

$$\begin{aligned} y=f(x_1,x_2,\cdots ,x_p | {{\varvec{\beta }}})+\varepsilon \end{aligned}$$

where \({{\varvec{\beta }}}\) is a vector of parameters, and \(\varepsilon \) is an uncertain disturbance term (uncertain variable).

Suppose there is a set of observed data,

$$\begin{aligned} (x_{i1},x_{i2},\cdots ,x_{ip},y_i),~i=1,2,\cdots ,n. \end{aligned}$$

By employing the least squares method (Yao and Liu (2018)), the least absolute deviations method (Liu and Yang (2020)), or the maximum likelihood method (Lio and Liu (2020)), we can obtain an estimate \(\hat{{{\varvec{\beta }}}}\) of \({{\varvec{\beta }}}\). Then the fitted regression model is determined by

$$\begin{aligned} y=f(x_1,x_2,\cdots ,x_p | \hat{{{\varvec{\beta }}}}). \end{aligned}$$
(7)

For each i \((i=1,2,\cdots ,n)\), the i-th residual is

$$\begin{aligned} \varepsilon _i=y_i-f(x_{i1},x_{i2},\cdots ,x_{ip} | \hat{{{\varvec{\beta }}}}). \end{aligned}$$

The residuals \(\varepsilon _1, \varepsilon _2,\cdots ,\varepsilon _n\) can be regarded as samples of the uncertain disturbance term \(\varepsilon \). Thus, Lio and Liu (2018) suggested that the expected value of the uncertain disturbance term \(\varepsilon \) can be estimated as the average of the residuals, i.e.,

$$\begin{aligned} {\hat{e}}=\frac{1}{n}\sum _{i=1}^n \varepsilon _i \end{aligned}$$

and the variance can be estimated as

$$\begin{aligned} {\hat{\sigma }}^2=\frac{1}{n}\sum _{i=1}^n (\varepsilon _i-{\hat{e}})^2. \end{aligned}$$

Therefore, we may assume the estimated disturbance term \({\hat{\varepsilon }}\) follows the normal uncertainty distribution \(\text{ N }({\hat{e}},{\hat{\sigma }})\). Then the forecast uncertain variable of response variable y with respect to \((x_1,x_2,\cdots ,x_p)\) is determined by

$$\begin{aligned} {\hat{y}}=f(x_1,x_2,\cdots ,x_p | \hat{{{\varvec{\beta }}}})+{\hat{\varepsilon }},~{\hat{\varepsilon }}\sim \text{ N }({\hat{e}},{\hat{\sigma }}). \end{aligned}$$

In order to test whether the estimated disturbance term \({\hat{\varepsilon }}\) is appropriate, we consider the following hypotheses:

$$\begin{aligned} H_0: e={\hat{e}} \text { and } \sigma ={\hat{\sigma }} \text { versus } H_1: e\ne {\hat{e}} \text { or } \sigma \ne {\hat{\sigma }}. \end{aligned}$$
(8)

Given a level of significance \(\alpha \) (e.g. 0.05), it follows from Corollary 1 that the test for the hypotheses (8) is

$$\begin{aligned} \begin{aligned} W=\bigg \{(z_1,z_2&,\cdots ,z_n): \text { there are at least } \alpha \text { of indexes } i\text {'s with } 1\le i\le n \\&\text {such that }z_i<\varPhi ^{-1}\left( \frac{\alpha }{2}\right) \text { or } z_i>\varPhi ^{-1}\left( 1-\frac{\alpha }{2}\right) \bigg \} \end{aligned} \end{aligned}$$
(9)

where

$$\begin{aligned} \varPhi ^{-1}(\alpha )={\hat{e}}+\frac{{\hat{\sigma }} \sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

For each i \((i=1,2,\cdots ,n)\), if

$$\begin{aligned} \varepsilon _i<\varPhi ^{-1}\left( \frac{\alpha }{2}\right) \quad \text {or}\quad \varepsilon _i>\varPhi ^{-1}\left( 1-\frac{\alpha }{2}\right) , \end{aligned}$$

then \((x_{i1},x_{i2},\cdots ,x_{ip},y_i)\) is regarded as an outlier. If the number of outliers is at least \(\alpha n\), i.e.,

$$\begin{aligned} (\varepsilon _1, \varepsilon _2,\cdots ,\varepsilon _n)\in W, \end{aligned}$$

then either the estimated disturbance term \(\text{ N }({\hat{e}},{\hat{\sigma }})\) or the fitted regression model (7) is inappropriate. Otherwise, both the estimated disturbance term \(\text{ N }({\hat{e}},{\hat{\sigma }})\) and the fitted regression model (7) are appropriate.
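The whole procedure of this section, namely fitting by least squares, estimating \(\hat{e}\) and \(\hat{\sigma }\) from the residuals, and counting outliers against the critical values of \(\text{ N }(\hat{e},\hat{\sigma })\), can be sketched as follows. For brevity, the sketch uses a single explanatory variable and made-up data, whereas the paper's examples use three explanatory variables:

```python
import math

def fit_and_test(x, y, alpha=0.05):
    """Fit y = b0 + b1*x + eps by least squares, estimate the disturbance
    term N(e_hat, sigma_hat) from the residuals, and run the test (9).
    Returns (b0, b1, e_hat, sigma_hat, appropriate)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    e_hat = sum(residuals) / n
    sigma_hat = math.sqrt(sum((r - e_hat) ** 2 for r in residuals) / n)
    # critical values Phi^{-1}(alpha/2) and Phi^{-1}(1 - alpha/2) of N(e_hat, sigma_hat)
    half = sigma_hat * math.sqrt(3) / math.pi * math.log((1 - alpha / 2) / (alpha / 2))
    lo, hi = e_hat - half, e_hat + half
    outliers = sum(1 for r in residuals if r < lo or r > hi)
    return b0, b1, e_hat, sigma_hat, outliers < alpha * n

# Made-up data, roughly y = 2x plus small noise.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
b0, b1, e_hat, sigma_hat, ok = fit_and_test(x, y)
print(ok)  # True: fewer than alpha*n residuals fall outside the critical bounds
```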

4 Numerical examples

This section provides two examples to illustrate how to employ the uncertain hypothesis test in uncertain regression analysis to check whether the estimated disturbance term and the fitted regression model are appropriate.

Example 8

Assume there is a set of observed data \((x_{i1},x_{i2},x_{i3},y_i)\), \(i=1,2,\cdots ,30\). See Table 1. In order to fit these observed data, we employ the linear uncertain regression model

$$\begin{aligned} y=\beta _0+\beta _1 x_1+\beta _2 x_2+\beta _3 x_3+\varepsilon \end{aligned}$$

where \(\beta _0,\beta _1,\beta _2,\beta _3\) are some parameters, and \(\varepsilon \) is an uncertain disturbance term (uncertain variable).

Table 1 Observed data in Example 8

Using the observed data in Table 1 and solving the minimization problem

$$\begin{aligned} \min _{\beta _0,\beta _1,\beta _2,\beta _3} \sum _{i=1}^{30} (y_i-\beta _0-\beta _1x_{i1}-\beta _2x_{i2}-\beta _3x_{i3})^2, \end{aligned}$$

we obtain the fitted linear regression model

$$\begin{aligned} y=4.3965+1.3644x_1+1.3130x_2+0.7166x_3. \end{aligned}$$
(10)

From

$$\begin{aligned} \varepsilon _i=y_i-4.3965-1.3644 x_{i1}-1.3130 x_{i2}-0.7166 x_{i3}, ~i=1,2,\cdots ,30, \end{aligned}$$

we obtain 30 residuals \(\varepsilon _1,\varepsilon _2,\cdots ,\varepsilon _{30}\). Thus the expected value of estimated disturbance term \({\hat{\varepsilon }}\) is

$$\begin{aligned} {\hat{e}}=\frac{1}{30}\sum _{i=1}^{30} \varepsilon _i=0.0000, \end{aligned}$$

and the variance is

$$\begin{aligned} {\hat{\sigma }}^2=\frac{1}{30}\sum _{i=1}^{30} (\varepsilon _i-{\hat{e}})^2=2.8529^2. \end{aligned}$$

Therefore, we may assume the estimated disturbance term \({\hat{\varepsilon }}\) follows the normal uncertainty distribution \(\text{ N }(0.0000,2.8529)\). Then the forecast uncertain variable of response variable y with respect to \((x_1,x_2,x_3)\) is determined by

$$\begin{aligned} {\hat{y}}=4.3965+1.3644x_1+1.3130x_2+0.7166x_3+{\hat{\varepsilon }},~{\hat{\varepsilon }}\sim \text{ N }(0.0000,2.8529). \end{aligned}$$

To test whether \(\text{ N }(0.0000,2.8529)\) is appropriate, we consider the following hypotheses:

$$\begin{aligned} H_0: e=0.0000 \text { and } \sigma =2.8529 \text { versus } H_1: e\ne 0.0000 \text { or } \sigma \ne 2.8529. \end{aligned}$$
(11)

Given a significance level \(\alpha =0.05\), we obtain

$$\begin{aligned} \varPhi ^{-1}\left( \frac{\alpha }{2}\right) =-5.7624, \quad \varPhi ^{-1}\left( 1-\frac{\alpha }{2}\right) =5.7624 \end{aligned}$$

where \(\varPhi ^{-1}\) is the inverse uncertainty distribution of \(\text{ N }(0.0000,2.8529)\), i.e.,

$$\begin{aligned} \varPhi ^{-1}(\alpha )=0.0000+\frac{2.8529 \sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \end{aligned}$$
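The critical values above are a direct evaluation of this inverse distribution; a quick check:

```python
import math

def inv(b):
    """Inverse distribution of N(0.0000, 2.8529) from Example 8."""
    return 0.0 + 2.8529 * math.sqrt(3) / math.pi * math.log(b / (1 - b))

print(round(inv(0.025), 4), round(inv(0.975), 4))  # -5.7624 5.7624
```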

Since \(\alpha \times 30=1.5\), it follows from (9) that the test for the hypotheses (11) is

$$\begin{aligned} \begin{aligned} W=\{(z_1,z_2&,\cdots ,z_{30}): \text { there are at least } 2\text { of indexes } i\text {'s with } 1\le i\le 30 \\&\text {such that }z_i<-5.7624 \text { or } z_i> 5.7624\}. \end{aligned} \end{aligned}$$
Fig. 2
figure 2

Residual plot in Example 8

As shown in Fig. 2, only

$$\begin{aligned} \varepsilon _{24} \notin [-5.7624,5.7624]. \end{aligned}$$

Thus \((\varepsilon _1,\varepsilon _2,\cdots ,\varepsilon _{30})\notin W\). Therefore we conclude that both the estimated disturbance term \(\text{ N }(0.0000,2.8529)\) and the fitted linear regression model (10) are appropriate.

Example 9

Assume there is a set of observed data \((x_{i1},x_{i2},x_{i3},y_i)\), \(i=1,2,\cdots ,30\). See Table 2. In order to fit these observed data, we employ the linear uncertain regression model

$$\begin{aligned} y=\beta _0+\beta _1 x_1+\beta _2 x_2+\beta _3 x_3+\varepsilon \end{aligned}$$

where \(\beta _0,\beta _1,\beta _2,\beta _3\) are some parameters, and \(\varepsilon \) is an uncertain disturbance term (uncertain variable).

Table 2 Observed data in Example 9

Using the observed data in Table 2 and solving the minimization problem

$$\begin{aligned} \min _{\beta _0,\beta _1,\beta _2,\beta _3} \sum _{i=1}^{30} (y_i-\beta _0-\beta _1x_{i1}-\beta _2x_{i2}-\beta _3x_{i3})^2, \end{aligned}$$

we obtain the fitted linear regression model

$$\begin{aligned} y=4.5285+1.0549x_1+1.1399x_2+0.9292x_3. \end{aligned}$$
(12)

From

$$\begin{aligned} \varepsilon _i=y_i-4.5285-1.0549 x_{i1}-1.1399 x_{i2}-0.9292 x_{i3}, ~i=1,2,\cdots ,30, \end{aligned}$$

we obtain 30 residuals \(\varepsilon _1,\varepsilon _2,\cdots ,\varepsilon _{30}\). Thus the expected value of estimated disturbance term \({\hat{\varepsilon }}\) is

$$\begin{aligned} {\hat{e}}=\frac{1}{30}\sum _{i=1}^{30} \varepsilon _i=0.0000, \end{aligned}$$

and the variance is

$$\begin{aligned} {\hat{\sigma }}^2=\frac{1}{30}\sum _{i=1}^{30} (\varepsilon _i-{\hat{e}})^2=2.7449^2. \end{aligned}$$

Therefore, we may assume the estimated disturbance term \({\hat{\varepsilon }}\) follows the normal uncertainty distribution \(\text{ N }(0.0000,2.7449)\). Then the forecast uncertain variable of response variable y with respect to \((x_1,x_2,x_3)\) is determined by

$$\begin{aligned} {\hat{y}}=4.5285+1.0549x_1+1.1399x_2+0.9292x_3+{\hat{\varepsilon }},~{\hat{\varepsilon }}\sim \text{ N }(0.0000,2.7449). \end{aligned}$$

To test whether \(\text{ N }(0.0000,2.7449)\) is appropriate, we consider the following hypotheses:

$$\begin{aligned} H_0: e=0.0000 \text { and } \sigma =2.7449 \text { versus } H_1: e\ne 0.0000 \text { or } \sigma \ne 2.7449. \end{aligned}$$
(13)

Given a significance level \(\alpha =0.05\), we obtain

$$\begin{aligned} \varPhi ^{-1}\left( \frac{\alpha }{2}\right) =-5.5443, \quad \varPhi ^{-1}\left( 1-\frac{\alpha }{2}\right) =5.5443, \end{aligned}$$

where \(\varPhi ^{-1}\) is the inverse uncertainty distribution of \(\text{ N }(0.0000,2.7449)\), i.e.,

$$\begin{aligned} \varPhi ^{-1}(\alpha )=0.0000+\frac{2.7449 \sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

Since \(\alpha \times 30=1.5\), it follows from (9) that the test for the hypotheses (13) is

$$\begin{aligned} \begin{aligned} W=\{(z_1,z_2&,\cdots ,z_{30}): \text { there are at least } 2 \text { of indexes } i\text {'s with } 1\le i\le 30 \\&\text {such that }z_i<-5.5443 \text { or } z_i> 5.5443\}. \end{aligned} \end{aligned}$$
Fig. 3
figure 3

Residual plot in Example 9

As shown in Fig. 3,

$$\begin{aligned} \varepsilon _{10}>5.5443, ~\varepsilon _{12}>5.5443, ~\varepsilon _{15}>5.5443. \end{aligned}$$

Thus \((\varepsilon _1,\varepsilon _2,\cdots ,\varepsilon _{30})\in W\). Therefore we conclude that either the estimated disturbance term \(\text{ N }(0.0000,2.7449)\) or the fitted linear regression model (12) is inappropriate.

5 Conclusion

This paper first introduced a mathematical tool of uncertain hypothesis test to decide whether some hypotheses are correct or not, based on observed data. With the help of the concept of nonembedded uncertainty distribution family, the test for two-sided hypotheses was constructed. Then uncertain hypothesis test was employed in uncertain regression analysis to test whether the estimated disturbance term and the fitted regression model are appropriate. Finally, this paper gave some numerical examples to illustrate the test process.

In the future, the uncertain hypothesis test will be applied in other development fields of uncertain statistics like estimating uncertainty distribution, uncertain time series analysis and parameter estimation in uncertain differential equation.