1 Introduction

Rough set theory (RST) is an effective data analysis tool put forward by Pawlak [23, 24]. It is based on the idea that every object in the universe carries corresponding information values. The theory has been widely extended by generalizing equivalence relations or equivalence classes. On the one hand, equivalence relations have been generalized to tolerance relations, dominance relations and reflexive relations; on the other hand, partitions into equivalence classes have been extended to coverings. As an important method for managing uncertainty, RST has the merit of working directly on the original data without requiring preliminary or additional information, which makes it highly reliable. Many applications of RST are based on information systems (ISs) [6, 42, 43, 47]. In addition, some scholars have studied multigranulation rough sets [5, 20, 41, 44].

An IS, a database that displays relationships between objects and attributes, was also put forward by Pawlak to describe large databases and the knowledge discovery process mathematically. It is important to note that an IS may contain missing information values. An IS with missing information values is called an incomplete IS (IIS). A missing information value can be represented by the set of all possible information values. By representing every missing information value of a single-valued IS in this way, the system becomes a set-valued information system (SVIS). In this way, a SVIS can effectively deal with the noise caused by missing information values.

Scholars have paid great attention to SVISs as an important class of ISs. For instance, Yao [46] studied SVISs with upper and lower approximations; Leung et al. [16] found a method to select a minimum feature set for SVISs; Couso et al. [7] looked into the rationality of SVISs from a statistical point of view; Huang et al. [15] obtained probabilistic set-valued ISs by using probability distributions to describe set values; Qian et al. [25] introduced two kinds of set-valued ordered ISs and put forward an attribute reduction method that can simplify set-valued ordered ISs. Liu et al. [21] researched feature selection for a set-valued decision IS from the view of dominance relations. Xie et al. [38] gave uncertainty measures for interval-valued ISs. Chen et al. [4] investigated feature selection in a SVIS according to tolerance relations. A SVIS has been successfully employed in the data analysis of complete ISs, such as distinguishing the dependence between attributes, distinguishing the importance of attributes, and attribute reduction.

Uncertainty is mainly composed of four parts: randomness, fuzziness, incompleteness and inconsistency, and it exists in every field of real life. Uncertainty measurement has become a noticeable problem in many fields such as machine learning [40], image processing [22], medical diagnosis [14], and data mining [10]. With the development of research, some excellent results have been obtained. For example, information entropy, put forward by Shannon [28], has been recognized as a very important method to measure the uncertainty of ISs. Yao [45] studied measures of granularity. Until now, information granularity and information entropy have gradually become two important tools for studying the uncertainty of ISs, and many outstanding scholars have extended and applied them. Wierman [36] discussed granularity measures in RST. D\(\ddot{u}\)ntsch et al. [11] explored the measurement of decision rules in RST based on information entropy. Dai et al. [8] brought up entropy measures and granularity measures to measure the uncertainty of SVISs. Li et al. [21] researched entropy theory in fuzzy relation ISs. Wang et al. [35] applied information entropy and information granularity to the measurement of interval-valued and set-valued ISs. Li et al. [18] analyzed the information structures and uncertainty measurement methods of fuzzy set-valued ISs. Wu et al. [37] proposed a reliable approximation operator based on semi-monolayer covering for set-valued ISs.

Attribute reduction, or feature selection, as an important data processing technology in machine learning, can effectively remove redundant attributes. It can also reduce the computational complexity of high-dimensional data and improve classification accuracy. Many researchers have studied attribute reduction for different types of data. For instance, Tang et al. [31] researched attribute reduction in set-valued decision ISs. Song et al. [30] applied attribute reduction to set-valued decision ISs. Cornelis et al. [6] studied a general definition of a fuzzy decision reduct. Wang et al. [34] presented an iterative reduction algorithm from the view of a variable distance parameter. Giang et al. [13] obtained an algorithm with application to attribute reduction in a dynamic decision table. Qian et al. [26] explored an accelerator algorithm for attribute reduction based on RST. Chen et al. [2] brought up the concept of fuzzy kernel alignment and applied it to attribute reduction for heterogeneous data. Singh et al. [29] introduced a rough-set attribute selection method based on fuzzy similarity in SVISs. Wang et al. [32] proposed four uncertainty measures and, on this basis, designed a greedy algorithm for attribute reduction. Li et al. [19] constructed a new acceleration strategy for general attribute reduction algorithms. Li et al. [17] surveyed existing reduction methods to help researchers better understand and use them to meet their own needs.

Set-valued data are an important kind of data in practical applications. However, in some practical cases, set-valued data may contain missing information values, which can cause some critical information to be lost. An incomplete set-valued information system (ISVIS) is a SVIS with missing information values. Xie et al. [39] introduced a distance between the values of two information functions and applied it to obtain information structures and uncertainty measures of incomplete probability set-valued ISs. Chen et al. [3] obtained some tools to measure the uncertainty of an ISVIS by means of the Gaussian kernel.

This article focuses on the uncertainty measurement of incomplete set-valued data and its attribute reduction. Incomplete set-valued data are treated as an ISVIS. In an ISVIS, objects described by the same information are indiscernible, and the indiscernibility relations produced in this way constitute the mathematical foundation of RST. Therefore, the similarity degree between information values on each attribute of an ISVIS is constructed based on RST. The tolerance relation induced by each subsystem is given, and it is handled by introducing an approximate equality between fuzzy sets. Some tools are then put forward to measure the uncertainty of ISVISs. From the point of view of the incomplete rate of the data, some statistical methods are used to analyze the effectiveness of the proposed measures. Based on two measurement methods (i.e., information granulation and information entropy), two reduction algorithms are given, and their effectiveness under different incomplete rates is analyzed and verified by the k-means clustering algorithm and the Mean Shift clustering algorithm. The work process of the paper is given in Fig. 1.

Fig. 1
figure 1

The work process of the paper

The rest of this paper is organized as follows. Section 2 reviews the basic notions of fuzzy relations and ISVISs. Section 3 obtains the similarity degree and equivalence relations in an ISVIS. Section 4 investigates uncertainty measures for an ISVIS. Section 5 gives an experimental analysis of the proposed measures. Section 6 studies an application to attribute reduction in an ISVIS. Section 7 compares the proposed algorithms with two other algorithms. Section 8 summarizes this paper.

2 Preliminaries

In this section, we recall some basic notions about fuzzy relations and ISVISs.

Throughout this paper, U and A denote two non-empty finite sets, \(2^U\) means the family of all subsets of U, and |X| expresses the cardinality of \(X\in 2^U\).

In this paper, put

$$\begin{aligned} U=\{x_1,x_2,\ldots ,x_n\},\quad A=\{a_1,a_2,\ldots ,a_m\} \end{aligned}$$

2.1 Fuzzy relations

Recall that R is a binary relation on U whenever \(R\subseteq U\times U\). If \((x,y)\in R\), then we denote it by xRy.

Let R be a binary relation on U. Then R is called

(1):

reflexive, if xRx for any \(x\in U\);

(2):

symmetric, if xRy implies yRx for any \(x,y\in U\);

(3):

transitive, if xRy and yRz imply xRz for any \(x,y,z\in U.\)

Let R be a binary relation on U. Then R is called an equivalence relation on U if R is reflexive, symmetric and transitive. Moreover, R is called the universal relation on U if \(R=\delta =U\times U\); R is said to be the identity relation on U if \(R=\triangle =\{(x,x):x\in U\}\).

Recall that F is a fuzzy set on U whenever F is a function \(F:U\rightarrow I\), where \(I=[0,1]\).

In this article, \(I^U\) shows the collection of fuzzy sets on U.

If R is a fuzzy set in \(U\times U\), then R is called a fuzzy relation on U, and R can be expressed by the following matrix

$$\begin{aligned} M(R)=(R(x_i,x_j))_{n\times n}. \end{aligned}$$

In this article, \(I^{U\times U}\) denotes the family of all fuzzy relations on U.

Definition 2.1

([21]) Suppose \(R\in I^{U\times U}\). For any \(x\in U\), define

$$\begin{aligned} S_R(x)(y)=R(x,y),\forall ~y\in U. \end{aligned}$$

Then \(S_R(x)\) is called the fuzzy information granule of the point x with respect to R.

In [33], \(S_R(x)\) is denoted by \([x]_R\).
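As a small illustration (a sketch with made-up membership values), a fuzzy relation on a three-element universe can be stored as the matrix M(R), and the i-th row of M(R) is exactly the fuzzy information granule \(S_R(x_i)\):

```python
import numpy as np

# A fuzzy relation R on U = {x1, x2, x3}, stored as M(R) = (R(x_i, x_j))_{3x3}.
# The membership values below are invented for illustration only.
M_R = np.array([
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.5],
    [0.2, 0.5, 1.0],
])

# The fuzzy information granule S_R(x_i) of Definition 2.1 is the i-th row:
# S_R(x_i)(x_j) = R(x_i, x_j).
S_R_x1 = M_R[0]
print(S_R_x1)        # [1.  0.6 0.2]
```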

2.2 An ISVIS

Definition 2.2

([24]) Let U be an object set and A an attribute set, both finite. Then the pair (UA) is called an information system (IS) if each attribute \(a\in A\) determines an information function \(a:U\rightarrow V_a\), where \(V_a=\{a(x):x\in U\}\).

Let (UA) be an IS. If there is \(a\in A\) such that \(*\in V_a\), here \(*\) means a null or unknown value, then (UA) is called an incomplete information system (IIS).

Let (UA) be an IIS and \(P\subseteq A\). Then a binary relation \(T_P\) on U can be defined as

$$\begin{aligned} (x,y)\in T_P~\Leftrightarrow ~\forall ~a\in P, a(x)=a(y)~or~ a(x)=*~or~a(y)=*. \end{aligned}$$

Clearly, \(T_P\) is a tolerance relation on U. For each \(x\in U\), denote

$$\begin{aligned} T_P(x)=\{y\in U:(x,y)\in T_P\}. \end{aligned}$$

Then, \(T_P(x)\) is called the tolerance class of x under the tolerance relation \(T_P\).

For convenience, \(T_{\{a\}}\) and \(T_{\{a\}}(x)\) are denoted by \(T_a\) and \(T_a(x)\), respectively.

Obviously,

$$\begin{aligned} T_P=\bigcap \limits _{a\in P}T_a,~~T_P(x)=\bigcap \limits _{a\in P}T_a(x). \end{aligned}$$

Let (UA) be an IIS. For each \(a\in A\), denote

$$\begin{aligned} V_a^*=V_a-\{a(x):a(x)=*\}. \end{aligned}$$

Then, \(V_a^*\) means the set of all non-missing information values of the attribute a.
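To make these notions concrete, the following Python sketch computes \(T_P(x)\) and \(V_a^*\) for a hypothetical toy IIS (the table, objects and values are invented for illustration and are not data from this paper):

```python
# Toy IIS: each attribute maps every object to a value, '*' meaning missing.
U = ['x1', 'x2', 'x3', 'x4']
table = {
    'a1': {'x1': 1, 'x2': '*', 'x3': 1, 'x4': 2},
    'a2': {'x1': 'a', 'x2': 'b', 'x3': '*', 'x4': 'b'},
}

def tolerant(a, x, y):
    """(x, y) in T_a  iff  a(x) = a(y) or one of them is missing."""
    return table[a][x] == table[a][y] or table[a][x] == '*' or table[a][y] == '*'

def tolerance_class(P, x):
    """T_P(x) = intersection of the T_a(x) over a in P."""
    return [y for y in U if all(tolerant(a, x, y) for a in P)]

def V_star(a):
    """Set of all non-missing information values of attribute a."""
    return {v for v in table[a].values() if v != '*'}

print(tolerance_class(['a1', 'a2'], 'x2'))   # objects indiscernible from x2 under P = {a1, a2}
print(V_star('a1'))                          # {1, 2}
```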

Definition 2.3

([45]) Suppose that (UA) is an IIS. Then (UA) is referred to as an incomplete set-valued information system (for short, an ISVIS), if for any \(a\in A\) and \(x\in U\), a(x) is a set.

If \(P\subseteq A\), then (UP) is referred to as the subsystem of (UA).

Example 2.4

Table 1 depicts an ISVIS (UA), where \(U=\{x_1,x_2,\ldots ,x_{7}\}\) and \(A=\{a_1,a_2,\ldots ,a_4\}\).

Table 1 An ISVIS (UA)

Example 2.5

(Continued from Example 2.4)

$$\begin{aligned}&V_{a_1}^*=\{\{1,3\},\{1,2\},\{3,4\},\{1,2,3\}\}, V_{a_2}^*=\{\{a,c\},\{a,b\},\{a,b,c\}\},\\&V_{a_3}^*=\{\{2,3\},\{5,6,7\},\{4,5,7\},\{1,2,5\}\}, V_{a_4}^*=\{\{T,F\},\{T,H\},\{H,T,F\}\}. \end{aligned}$$

3 The equivalence relation induced by each subsystem of an ISVIS

In an ISVIS, objects described by the same information are indiscernible. The indiscernibility relation produced in this way constitutes the mathematical foundation of RST. Thus, this section constructs the similarity degree between information values on each attribute in an ISVIS and gives the equivalence relation induced by each subsystem.

Definition 3.1

 Let (UA) be an ISVIS. Then \(\forall ~x,y\in U\), \(a\in A\), the similarity degree between a(x) and a(y) is defined as follows:

$$\begin{aligned}&s(a(x),a(y))\\ & \quad =\left\{ \begin{array}{rcl} 1, &{} &{} x=y;\\ \frac{1}{|V_a^*|^2}, &{} &{} x\ne y,~a(x)=*,~a(y)=*;\\ \frac{1}{|V_a^*|}, &{} &{} x\ne y,~a(x)\ne *,~a(y)=*;\\ \frac{1}{|V_a^*|}, &{} &{} x\ne y,~a(x)=*,~a(y)\ne *;\\ 1, &{} &{} x\ne y,~a(x)\ne *,~a(y)\ne *,~a(x)=a(y); \\ \frac{|a(x)\bigcap a(y)|}{|a(x)\bigcup a(y)|}, &{} &{} x\ne y,~a(x)\ne *,~a(y)\ne *,~a(x)\ne a(y). \end{array} \right. \end{aligned}$$

For the convenience of expression, denote

$$\begin{aligned} s_{ij}^k=s(a_k(x_i),a_k(x_j)). \end{aligned}$$

\(s_{ij}^k\) indicates the similarity degree between \(a_k(x_i)\) and \(a_k(x_j)\). This also expresses the similarity degree between two objects \(x_i\) and \(x_j\) with respect to the attribute \(a_k\).
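The six cases of Definition 3.1 can be coded directly. The sketch below assumes set values are stored as Python frozensets and missing values as the string '*'; the final print statements use the values of \(V_{a_1}^*\) from Example 2.5:

```python
from fractions import Fraction

def similarity(ax, ay, V_star, same_object=False):
    """Similarity degree s(a(x), a(y)) of Definition 3.1.

    ax, ay : set values of attribute a on x and y, or '*' if missing
    V_star : the family of all non-missing values of a, so |V_a^*| = len(V_star)
    """
    if same_object:
        return Fraction(1)
    v = len(V_star)
    if ax == '*' and ay == '*':
        return Fraction(1, v * v)
    if ax == '*' or ay == '*':
        return Fraction(1, v)
    if ax == ay:
        return Fraction(1)
    # both known and different: |a(x) ∩ a(y)| / |a(x) ∪ a(y)|
    return Fraction(len(ax & ay), len(ax | ay))

# Values of V_{a_1}^* from Example 2.5
V1 = [frozenset({1, 3}), frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2, 3})]
print(similarity(frozenset({1, 3}), frozenset({1, 2, 3}), V1))   # 2/3
print(similarity('*', frozenset({1, 2}), V1))                     # 1/4
```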

Example 3.2

(Continued from Example 2.4) By Definition 3.1, then \(s_{ij}^k\) \((i,j=1,\ldots ,7,k=1,\ldots ,4)\) is obtained as follows (see Tables 2, 3, 4, 5).

Table 2 \(s_{ij}^1\)
Table 3 \(s_{ij}^2\)
Table 4 \(s_{ij}^3\)
Table 5 \(s_{ij}^4\)

Let (UA) be an ISVIS. For any \(a\in A\), define

$$\begin{aligned} R_a(x,y)=s(a(x),a(y)). \end{aligned}$$

Then \(R_a\) is a fuzzy relation on U.

Below, we attempt to deal with a fuzzy relation \(R_{a}\) by introducing the approximate equality between fuzzy sets.

Definition 3.3

([49]) Suppose \(k\in N\). Given \(a,b\in [0,1]\). If \(a,b\in [0,\frac{1}{10^k})\) or \(a,b\in [\frac{1}{10^k},\frac{2}{10^k})\) or \(\cdots\) or \(a,b\in [\frac{10^k-1}{10^k},1)\) or \(a=b=1\), then a and b are said to be class-consistent, and k is said to be a threshold value. We denote it by \(a \approx _k b\).

In this paper, we pick \(k=1\).

Definition 3.4

([49]) Suppose \(A,B\in I^U\). Then

$$\begin{aligned} A\approx _1 B\Leftrightarrow ~ \forall ~x\in U, ~A(x) \approx _1 B(x). \end{aligned}$$

Definition 3.5

Let (UA) be an ISVIS. Given \(P\subseteq A\). Define

$$\begin{aligned}&R_a^*=\{(x,y)\in U\times U :S_{R_a}(x)\approx _1 S_{R_a}(y)\},\\&R_P^*=\bigcap \limits _{a\in P}R_a^*. \end{aligned}$$

It is easy to see that \(R_P^*\) is an equivalence relation on U. Then \(R_P^*\) is called the equivalence relation induced by the subsystem (UP). And the partition on U induced by \(R_P^*\) is denoted by \(U/R_P^*\).

For any \(x\in U\), denote

$$\begin{aligned} R_P^*(x)=\{y\in U:(x,y)\in R_P^*\}. \end{aligned}$$
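Computationally, \(R_a^*\) can be obtained from the matrix of \(R_a\): x and y are related exactly when, for every z, the memberships \(S_{R_a}(x)(z)\) and \(S_{R_a}(y)(z)\) fall into the same interval of Definition 3.3 with \(k=1\). A minimal numpy sketch under these assumptions (the helper names are ours, and floating-point boundary cases are ignored):

```python
import numpy as np

def interval_label(v, k=1):
    """Index of the interval of Definition 3.3 containing v (k = 1 gives
    [0, 0.1), [0.1, 0.2), ..., [0.9, 1) and the singleton {1})."""
    step = 10 ** k
    return step if v == 1 else int(v * step)

def rows_class_consistent(R, x, y, k=1):
    """S_R(x) ~_k S_R(y): every pair of memberships lies in the same interval."""
    n = R.shape[0]
    return all(interval_label(R[x, z], k) == interval_label(R[y, z], k) for z in range(n))

def induced_classes(relations, P, k=1):
    """The classes R_P^*(x) of Definition 3.5, where `relations` maps each
    attribute name a in P to the n x n matrix of the fuzzy relation R_a."""
    n = next(iter(relations.values())).shape[0]
    return [[y for y in range(n)
             if all(rows_class_consistent(relations[a], x, y, k) for a in P)]
            for x in range(n)]
```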

Example 3.6

(Continued from Example 3.2) We can obtain \(R_A^*(x_1)=\{x_1\},\) \(R_A^*(x_2)=\{x_2\},\) \(R_A^*(x_3)=\{x_3\},\) \(R_A^*(x_4)=\{x_4\},\) \(R_A^*(x_5)=\{x_5\},\) \(R_A^*(x_6)=\{x_6\},\) \(R_A^*(x_7)=\{x_7\}.\)

4 Measuring uncertainty of an ISVIS

In this section, some tools for measuring uncertainty of an ISVIS are obtained.

4.1 Granulation measure for an ISVIS

Definition 4.1

Suppose that (UA) is an ISVIS. Given \(P \subseteq A\). Then information granulation of the subsystem (UP) is specified as

$$\begin{aligned} G(P) = \frac{1}{n^2}\sum \limits _{i=1}^n|R_P^*(x_i)|. \end{aligned}$$

Proposition 4.2

Let (UA) be an ISVIS. Then for any \(P\subseteq A\),

$$\begin{aligned} \frac{1}{n}\le G(P)\le 1. \end{aligned}$$

Furthermore, if \(R_P^*\) is the identity relation on U, G achieves the minimum value \(\frac{1}{n}\); if \(R_P^*\) is the universal relation on U, G achieves the maximum value 1.

Proof

\(\forall ~i\), \(1\le |R_P^*(x_i)|\le n\), so \(n\le \sum \limits _{i=1}^n|R_P^*(x_i)|\le n^2\). By Definition 4.1,

$$\begin{aligned} \frac{1}{n}\le G(P)\le 1. \end{aligned}$$

If \(R_P^*\) is an identity relation on U, for any i\(|R_P^*(x_i)|=1\), \(G(P)=\frac{1}{n}\).

If \(R_P^*\) is a universal relation on U, for any i\(|R_P^*(x_i)|=n\), \(G(P)=1\). \(\square\)

Proposition 4.3

Let (UA) be an ISVIS. If \(Q\subseteq P\subseteq A\), then \(G(P)\le G(Q)\).

Proof

(1) Since \(Q\subseteq P\subseteq A\), \(\forall ~i\), we have \(R_P^*(x_i)\subseteq R_Q^*(x_i)\). Then \(|R_P^*(x_i)|\le |R_Q^*(x_i)|\). By Definition 4.1,

$$\begin{aligned} G(P) = \frac{1}{n}\sum \limits _{i=1}^n\frac{1}{n}|R_P^*(x_i)|\le \frac{1}{n}\sum \limits _{i=1}^n\frac{1}{n}|R_Q^*(x_i)|=G(Q) . \end{aligned}$$

Thus \(G(P)\le G(Q).\)

\(\square\)

Proposition 4.3 shows that information granulation increases with the coarsening of information and decreases with the refinement of information. This means that the uncertainty of ISVISs can be evaluated by the information granulation introduced in Definition 4.1.

Example 4.4

(Continued from Example 3.6) By Definition 4.1, we can obtain

$$\begin{aligned} G(A) = \frac{1}{7^2}\sum \limits _{i=1}^7|R_A^*(x_i)|=\frac{7}{49}\approx 0.14. \end{aligned}$$

4.2 Entropy measure for an ISVIS

Entropy measures the disorder degree of a system: the higher its value, the more disordered the system is. Shannon [28] applied this concept of entropy to information theory to quantify the uncertainty of a system. Similarly, the information entropy of a given ISVIS is defined as follows.

Definition 4.5

Suppose that (UA) is an ISVIS. Given \(P\subseteq A\). Then information entropy of the subsystem (UP) is defined as

$$\begin{aligned} H(P) = -\sum \limits _{i=1}^n \frac{1}{n}\log _2 \frac{|R_P^*(x_i)|}{n}. \end{aligned}$$

Proposition 4.6

Let (UA) be an ISVIS. If \(Q\subseteq P\subseteq A\), then \(H(Q)\le H(P)\).

Proof

Since \(Q\subseteq P\subseteq A\), similar to the proof of Proposition 4.3, we obtain that \(\forall ~i\), \(1\le |R_P^*(x_i)|\le |R_Q^*(x_i)|.\)

Then \(\forall ~i\), \(-\log _2 \frac{|R_P^*(x_i)|}{n}=\log _2\frac{n}{|R_P^*(x_i)|}\ge \log _2\frac{n}{|R_Q^*(x_i)|}=-\log _2\frac{|R_Q^*(x_i)|}{n}.\)

Consequently, \(H(Q)\le H(P)\). \(\square\)

This proposition clarifies that information entropy increases with the refinement of information and decreases with the coarsening of information. This means that the uncertainty of ISVISs can be evaluated by the information entropy introduced in Definition 4.5.

Example 4.7

(Continued from Example 3.6) By Definition 4.5, we can obtain

$$\begin{aligned}H(A) = -\sum \limits _{i=1}^7 \frac{1}{7}\log _2 \frac{|R_A^*(x_i)|}{7} \approx 2.81. \end{aligned}$$

Rough entropy is used to measure granularity of a given partition. It is also called co-entropy by some scholars.

Similarly, rough entropy of a given ISVIS is put forward in the following definition.

Definition 4.8

Let (UA) be an ISVIS. Given \(P\subseteq A\). Then the rough entropy of the subsystem (UP) is defined as

$$\begin{aligned} E_r(P)=-\sum \limits ^n_{i=1}\frac{1}{n}\log _2\frac{1}{|R_P^*(x_i)|}. \end{aligned}$$

Proposition 4.9

Let (UA) be an ISVIS. Given \(P\subseteq A\). Then

$$\begin{aligned} 0\le E_r(P)\le \log _2 n. \end{aligned}$$

What is more, if \(R_P^*\) is the identity relation on U, then \(E_r(P)\) reaches the minimum value 0; if \(R_P^*\) is the universal relation on U, then \(E_r(P)\) attains the maximum value \(\log _2 n\).

Proof

Note that \(R_P^*\) is an equivalence relation on U. Then \(\forall ~i\), \(x_i\in R_P^*(x_i).\)

Hence \(\forall ~i\), \(1\le |R_P^*(x_i)|\le n\), then \(0\le -\log _2 \frac{1}{|R_P^*(x_i)|}=\log _2 |R_P^*(x_i)|\le \log _2 n\). Therefore, \(0\le -\sum _{i=1}^n\log _2 \frac{1}{|R_P^*(x_i)|}\le n\log _2 n\).

By Definition 4.8, we obtain that

$$\begin{aligned} 0\le E_r(P)\le \log _2 n. \end{aligned}$$

If \(R_P^*\) is the identity relation on U, then \(\forall ~i\), \(|R_P^*(x_i)|=1\). Thus \(E_r(P)=0\).

If \(R_P^*\) is the universal relation on U, then \(\forall ~i\), \(|R_P^*(x_i)|=n\). Thus \(E_r(P)=\log _2 n\).

\(\square\)

Proposition 4.10

Let (UA) be an ISVIS. If \(P\subseteq Q\subseteq A\), then \(E_r(Q)\le E_r(P)\).

Proof

Since \(P\subseteq Q\subseteq A\), similar to the proof of Proposition 4.3, we obtain that \(\forall ~i\),

$$\begin{aligned} 1\le |R_Q^*(x_i)|\le |R_P^*(x_i)|. \end{aligned}$$

Then \(\forall ~i\), \(-\log _2 \frac{1}{|R_P^*(x_i)|}=\log _2 |R_P^*(x_i)|\ge \log _2 |R_Q^*(x_i)|=-\log _2\frac{1}{|R_Q^*(x_i)|}\)

As a result, \(E_r(P)\ge E_r(Q)\). \(\square\)

From Proposition 4.10, it can be found that the more uncertain the available information is, the bigger the rough entropy value becomes. This means that the rough entropy brought forward in Definition 4.8 can be used to evaluate the uncertainty of an ISVIS.

Example 4.11

(Continued from Example 3.6) By Definition 4.8, we can obtain

$$\begin{aligned} E_r(A)=-\sum \limits ^7_{i=1}\frac{1}{7}\log _2\frac{1}{|R_A^*(x_i)|} = 0. \end{aligned}$$

Theorem 4.12

Let (UA) be an ISVIS. Given \(P\subseteq A\). Then

$$\begin{aligned} E_r(P)+H(P)=\log _2 n. \end{aligned}$$

Proof

$$\begin{aligned} \begin{aligned} E_r(P)+H(P)&=-\frac{1}{n}\sum \limits _{i=1}^n\left(\log _2\frac{1}{|R_P^*(x_i)|}+\log _2\frac{|R_P^*(x_i)|}{n}\right)\\&=-\frac{1}{n}\sum \limits _{i=1}^n\log _2\frac{1}{n}=\log _2n.\\ \end{aligned} \end{aligned}$$

\(\square\)

Corollary 4.13

Let (UA) be an ISVIS. Given \(P\subseteq A\). Then

$$\begin{aligned} 0\le H(P)\le \log _2 n. \end{aligned}$$

Proof

By Proposition 4.9, \(0\le E_r(P)\le \log _2 n\).

By Theorem 4.12, \(H(P)=\log _2 n-E_r(P)\). Consequently, \(0\le H(P)\le \log _2 n\). \(\square\)

4.3 Fuzzy information amount in an ISVIS

Similarly, information amount in a given ISVIS is stated in the following definition.

Definition 4.14

Let (UA) be an ISVIS. Given \(P\subseteq A\). Then the information amount of the subsystem (UP) is defined as

$$\begin{aligned} E(P)=\sum \limits ^n_{i=1}\frac{1}{n}\left(1-\frac{|R_P^*(x_i)|}{n}\right). \end{aligned}$$

Proposition 4.15

Let (UA) be an ISVIS. If \(P\subseteq Q\subseteq A\), then \(E(P)\le E(Q)\).

Proof

Since \(P\subseteq Q\subseteq A\), similar to the proof of Proposition 4.3, we get that \(\forall ~i\), \(1\le |R_Q^*(x_i)|\le |R_P^*(x_i)|.\) Then

$$\begin{aligned} E(P)=\sum \limits ^n_{i=1}\frac{1}{n}\left(1-\frac{|R_P^*(x_i)|}{n}\right) \le \sum \limits ^n_{i=1}\frac{1}{n}\left(1-\frac{|R_Q^*(x_i)|}{n}\right)=E(Q). \end{aligned}$$

Hence \(E(P)\le E(Q)\). \(\square\)

It can be found that the less uncertain the available information is, the bigger the information amount value becomes.

Theorem 4.16

Let (UA) be an ISVIS. Given \(P\subseteq A\). Then \(G(P)+E(P)=1.\)

Proof

$$\begin{aligned} \begin{aligned} G(P)+E(P)&=\frac{1}{n^2}\sum \limits _{i=1}^n[|R_P^*(x_i)|+(n-|R_P^*(x_i)|)]=1.\\ \end{aligned} \end{aligned}$$

\(\square\)

Corollary 4.17

Let (UA) be an ISVIS. Given \(P\subseteq A\). Then \(0\le E(P)\le 1-\frac{1}{n}.\)

Proof

By Proposition 4.2, \(\frac{1}{n}\le G(P)\le 1\).

By Theorem 4.16, \(E(P)=1-G(P)\).

Thus \(0\le E(P)\le 1-\frac{1}{n}\). \(\square\)

From Proposition 4.15 and Corollary 4.17, we know that information amount introduced in Definition 4.14 can evaluate the uncertainty of an ISVIS.

Example 4.18

(Continued from Example 3.6) By Definition 4.14, we can obtain

$$\begin{aligned} E(A)=\sum \limits ^7_{i=1}\frac{1}{7}\left(1-\frac{|R_A^*(x_i)|}{7}\right) \approx 0.86. \end{aligned}$$
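Since all four measures of this section depend only on the class sizes \(|R_P^*(x_i)|\), they are easy to compute. The following sketch (using the singleton classes of Example 3.6) reproduces Examples 4.4, 4.7, 4.11 and 4.18 and checks the identities of Theorems 4.12 and 4.16:

```python
import math

def measures(class_sizes):
    """G, H, E_r and E of Definitions 4.1, 4.5, 4.8 and 4.14 from the sizes |R_P^*(x_i)|."""
    n = len(class_sizes)
    G  = sum(class_sizes) / n**2
    H  = -sum(math.log2(c / n) for c in class_sizes) / n
    Er = sum(math.log2(c) for c in class_sizes) / n    # equals -(1/n) * sum log2(1/|R_P^*(x_i)|)
    E  = sum(1 - c / n for c in class_sizes) / n
    return G, H, Er, E

sizes = [1] * 7                        # Example 3.6: every class R_A^*(x_i) is a singleton
G, H, Er, E = measures(sizes)
print(round(G, 2), round(H, 2), Er, round(E, 2))                       # 0.14 2.81 0.0 0.86
print(math.isclose(Er + H, math.log2(7)), math.isclose(G + E, 1.0))    # True True
```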

5 Experiments analysis

In this section, some numerical experiments are designed to evaluate the effectiveness of the proposed measures under different incomplete rates.

5.1 Incomplete rate

In this section, eight incomplete datasets from the UCI repository were selected to test the performance of the proposed measures, as shown in Table 6; for Obesity, 20% of the information values were randomly removed so that it became incomplete.

For an ISVIS (UA), the missing information values are randomly distributed on all attributes, and the incomplete rate of (UA) (denoted by \(\beta\)) is defined as

$$\begin{aligned} \beta = \frac{Number~of~missing~~values}{mn}. \end{aligned}$$

First, we transform the incomplete data into an ISVIS. Then, among all the information values, we randomly delete 2%, 4%, 6%, 8%, 10%, 12%, 14% and 16%, and call the created data ‘\(\beta\)-ISVIS’ respectively, whose \(\beta =0.02k~(k=1,2,\ldots ,8).\) If the incomplete rate of a dataset already exceeds the value of \(\beta\) to be set, randomly selected missing information values are replaced by the set composed of all possible values of the corresponding attribute; if the incomplete rate does not reach the value of \(\beta\) to be set, randomly selected known information values are turned into missing information values until the desired \(\beta\) is achieved.
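The construction of a \(\beta\)-ISVIS just described can be sketched as follows. The helper below is our own illustration (not the code used in the experiments); it assumes the table is stored as an n x m numpy object array with set values as frozensets and missing entries marked by '*':

```python
import numpy as np

def mask_to_rate(data, beta, missing='*', seed=0):
    """Adjust an incomplete table so that (#missing values)/(m*n) equals beta."""
    rng = np.random.default_rng(seed)
    out = data.astype(object).copy()
    n, m = out.shape
    flat = out.ravel()                       # view: writing flat[i] updates out
    target = int(round(beta * n * m))
    miss = np.flatnonzero(flat == missing)
    known = np.flatnonzero(flat != missing)
    # all non-missing values of each attribute, computed once up front
    domains = [frozenset(v for v in out[:, j] if v != missing) for j in range(m)]
    if len(miss) > target:
        # too many missing entries: replace some with the set of all possible values
        for i in rng.choice(miss, len(miss) - target, replace=False):
            flat[i] = domains[i % m]
    else:
        # too few: randomly hide known entries until the target rate is reached
        for i in rng.choice(known, target - len(miss), replace=False):
            flat[i] = missing
    return out
```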

Table 6 Eight data from UCI

First of all, we explore the number of objects with missing information values at \(\beta =0.02k~(k=1,2,\ldots ,8)\) based on the data in Table 6. The results are shown in Fig. 2, where the X-axis represents \(\beta =0.02k~(k=1,2,\ldots ,8)\) and the Y-axis represents the percentage of objects with missing information values. The values in Fig. 2 are the average values of 10 training sets.

Fig. 2
figure 2

Percentage of missing objects under different \(\beta\)

From Fig. 2, we can draw the following conclusions:

(1) As the incomplete rate increases, so does the percentage of objects that contain missing information values; (2) the more attributes a dataset has, the more dispersed the missing information values are, so the rate of missing objects is much higher; (3) when the incomplete rate is only 6%, the percentage of missing objects in all data except Sl is greater than 50%, and when the incomplete rate is only 10%, the percentage of missing objects in all data is greater than 70%; (4) preprocessing methods such as deleting objects with missing values can therefore seriously reduce the available information in the database.

5.2 Numerical experiments

In Aa, pick \(\beta =0.02k\) \((k=1,\ldots ,8)\), \(A_i=\{a_1,\ldots , a_{i}\}\)  \((i=1,\ldots ,20)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(Aa)=\{G^{\beta }(A_1),\ldots ,G^{\beta }(A_{20})\},~~ X^{\beta }_{E}(Aa)=\{E^{\beta }(A_1),\ldots ,E^{\beta }(A_{20})\}, \\&X^{\beta }_{E_r}(Aa)=\{E_r^{\beta }(A_1),\ldots ,E_r^{\beta }(A_{20})\},~~ X^{\beta }_{H}(Aa)=\{H^{\beta }(A_1),\ldots ,H^{\beta }(A_{20})\}. \end{aligned}$$

In Ac, pick \(\beta =0.02k~(k=1,\ldots ,8)\), \(C_i=\{a_1,\ldots , a_{i}\}\), \((i=1,\ldots ,20)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(Ac)=\{G^{\beta }(C_1),\ldots ,G^{\beta }(C_{20})\},~~ X^{\beta }_{E}(Ac)=\{E^{\beta }(C_1),\ldots ,E^{\beta }(C_{20})\}, \\&X^{\beta }_{E_r}(Ac)=\{E_r^{\beta }(C_1),\ldots ,E_r^{\beta }(C_{20})\},~~ X^{\beta }_{H}(Ac)=\{H^{\beta }(C_1),\ldots ,H^{\beta }(C_{20})\}. \end{aligned}$$

In De, pick \(\beta =0.02k~(k=1,\ldots ,8)\), \(D_i=\{a_1,\ldots , a_{i}\}\), \((i=1,\ldots ,34)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(De)=\{G^{\beta }(D_1),\ldots ,G^{\beta }(D_{34})\},~~ X^{\beta }_{E}(De)=\{E^{\beta }(D_1),\ldots ,E^{\beta }(D_{34})\}, \\&X^{\beta }_{E_r}(De)=\{E_r^{\beta }(D_1),\ldots ,E_r^{\beta }(D_{34})\},~~ X^{\beta }_{H}(De)=\{H^{\beta }(D_1),\ldots ,H^{\beta }(D_{34})\}. \end{aligned}$$

In He, pick \(\beta =0.02k~(k=1,\ldots ,8)\), \(H_i=\{a_1,\ldots , a_{i}\}\), \((i=1,\ldots ,19)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(He)=\{G^{\beta }(H_1),\ldots ,G^{\beta }(H_{19})\},~~ X^{\beta }_{E}(He)=\{E^{\beta }(H_1),\ldots ,E^{\beta }(H_{19})\}, \\&X^{\beta }_{E_r}(He)=\{E_r^{\beta }(H_1),\ldots ,E_r^{\beta }(H_{19})\},~~ X^{\beta }_{H}(He)=\{H^{\beta }(H_1),\ldots ,H^{\beta }(H_{19})\}. \end{aligned}$$

In Pc, pick \(\beta =0.02k~(k=1,\ldots ,8)\), \(P_i=\{a_1,\ldots , a_i\}\), \((i=1,\ldots ,13)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(Pc)=\{G^{\beta }(P_1),\ldots ,G^{\beta }(P_{13})\},~~ X^{\beta }_{E}(Pc)=\{E^{\beta }(P_1),\ldots ,E^{\beta }(P_{13})\}, \\&X^{\beta }_{E_r}(Pc)=\{E_r^{\beta }(P_1),\ldots ,E_r^{\beta }(P_{13})\},~~ X^{\beta }_{H}(Pc)=\{H^{\beta }(P_1),\ldots ,H^{\beta }(P_{13})\}. \end{aligned}$$

In Sl, pick \(\beta =0.02k~(k=1,\ldots ,8)\), \(L_i=\{a_1,\ldots , a_{i}\}\), \((i=1,\ldots ,35)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(Sl)=\{G^{\beta }(L_1),\ldots ,G^{\beta }(L_{35})\},~~ X^{\beta }_{E}(Sl)=\{E^{\beta }(L_1),\ldots ,E^{\beta }(L_{35})\}, \\&X^{\beta }_{E_r}(Sl)=\{E_r^{\beta }(L_1),\ldots ,E_r^{\beta }(L_{35})\},~~ X^{\beta }_{H}(Sl)=\{H^{\beta }(L_1),\ldots ,H^{\beta }(L_{35})\}. \end{aligned}$$

In Ob, pick \(\beta =0.02k~(k=1,\ldots ,8)\), \(B_i=\{a_1,\ldots , a_{i}\}\), \((i=1,\ldots ,16)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(Ob)=\{G^{\beta }(B_1),\ldots ,G^{\beta }(B_{16})\},~~ X^{\beta }_{E}(Ob)=\{E^{\beta }(B_1),\ldots ,E^{\beta }(B_{16})\}, \\&X^{\beta }_{E_r}(Ob)=\{E_r^{\beta }(B_1),\ldots ,E_r^{\beta }(B_{16})\},~~ X^{\beta }_{H}(Ob)=\{H^{\beta }(B_1),\ldots ,H^{\beta }(B_{16})\}. \end{aligned}$$

In Gw, pick \(\beta =0.02k~(k=1,\ldots ,8)\), \(G_i=\{a_1,\ldots , a_{i}\}\), \((i=1,\ldots ,14)\). Denote

$$\begin{aligned}&X^{\beta }_{G}(Gw)=\{G^{\beta }(G_1),\ldots ,G^{\beta }(G_{14})\},~~ X^{\beta }_{E}(Gw)=\{E^{\beta }(G_1),\ldots ,E^{\beta }(G_{14})\}, \\&X^{\beta }_{E_r}(Gw)=\{E_r^{\beta }(G_1),\ldots ,E_r^{\beta }(G_{14})\},~~ X^{\beta }_{H}(Gw)=\{H^{\beta }(G_1),\ldots ,H^{\beta }(G_{14})\}. \end{aligned}$$
Fig. 3
figure 3

Values of uncertainty measurement on Aa

Fig. 4
figure 4

Values of uncertainty measurement on Ac

Fig. 5
figure 5

Values of uncertainty measurement on De

Fig. 6
figure 6

Values of uncertainty measurement on He

From Figs. 3, 4, 5, 6, 7, 8, 9 and 10, the following conclusions are obtained:

Fig. 7
figure 7

Values of uncertainty measurement on Pc

Fig. 8
figure 8

Values of uncertainty measurement on Sl

Fig. 9
figure 9

Values of uncertainty measurement on Ob

Fig. 10
figure 10

Values of uncertainty measurement on Gw

Regardless of the incomplete rate, \(E_r\) and G decrease monotonically as the number of attributes in the attribute subset increases, while E and H increase monotonically with the number of attributes in the attribute subset. Therefore, the four measures proposed in this paper can be used to measure the uncertainty of ISVISs.

5.3 Dispersion analysis

The coefficient of variation (CV), also known as the dispersion coefficient, is a statistic that measures the degree of variation of the observations in a dataset and is obtained as the ratio of the standard deviation to the mean value. It can eliminate the influence of differences in units or means when comparing the variation degree of two or more datasets.

Let \(A=\{a_1,a_2,\ldots ,a_n\}\) be a data set. Then the coefficient of variation of A, denoted by CV(A), is defined as follows

$$\begin{aligned} CV(A)=\frac{\sqrt{\frac{1}{n}\sum _{i=1}^n(a_i-\frac{1}{n}\sum _{i=1}^na_i)^2}}{\frac{1}{n}\sum _{i=1}^na_i}. \end{aligned}$$
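For reference, CV can be computed in one line with numpy; the sample values passed below are illustrative only:

```python
import numpy as np

def cv(values):
    """Coefficient of variation: population standard deviation over the mean."""
    a = np.asarray(values, dtype=float)
    return a.std() / a.mean()      # np.std defaults to the population form used above

print(cv([0.9, 0.7, 0.4, 0.2, 0.14]))   # ratio of std to mean of these illustrative values
```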
Fig. 11
figure 11

CV-values of four measure sets of Aa

Fig. 12
figure 12

CV-values of four measure sets of Ac

Fig. 13
figure 13

CV-values of four measure sets of De

Fig. 14
figure 14

CV-values of four measure sets of He

Continuing the above experiment, the CV-values of the four measure sets are compared at different incomplete rates. The results are shown in Figs. 11, 12, 13, 14, 15, 16, 17, 18.

Fig. 15
figure 15

CV-values of four measure sets of Pc

Fig. 16
figure 16

CV-values of four measure sets of Sl

Fig. 17
figure 17

CV-values of four measure sets of Ob

Fig. 18
figure 18

CV-values of four measure sets of Gw

As can be seen from Figs. 11, 12, 13, 14, 15, 16, 17 and 18, some conclusions are obtained:

  1. (1)

    When the data in Table 6 are at different incomplete rates, the CV-values of Er are all greater than 0.5, the CV-values of E and H are all less than 0.5, and the CV-values of E are all smaller than the CV-values of H.

  2. (2)

    When the data in Table 6 are at different incomplete rates, the CV-values of G are greater than 0.5 in all given data except He.

  3. (3)

    For He with different incomplete rates, CV-values of Er are all greater than CV-values of G.

  4. (4)

    For Pc with an incomplete rate 0.14 or 0.08, the CV-values of Er are greater than the CV-values of G, while there is little difference between the CV-values of G and the CV-values of Er at other incomplete rates.

Therefore, among the given data with different incomplete rates, the coefficient of variation of E is the smallest, so E has the best measurement effect on the uncertainty of an ISVIS.

5.4 Correlation analysis

Spearman rank correlation [48], an important statistical analysis method, estimates the correlation between two statistical variables by means of a monotonic function.

Suppose that \(A=\{a_1,a_2,\ldots ,a_n\}\) and \(B=\{b_1,b_2\ldots ,b_n\}\) are two data sets. By sorting A and B (ascending or descending at the same time), two element ranking sets \(R=\{R_1,R_2,\ldots ,R_n\}\) and \(Q=\{Q_1,Q_2,\ldots ,Q_n\}\) are obtained, where \(R_i\) and \(Q_i\) (\(i=1,\ldots ,n\)) are the ranking of \(a_i\) in A and \(b_i\) in B respectively. Spearman rank correlation coefficient between A and B, denoted by \(r_s(A,B)\), is defined as \(r_s(A,B)=1-\frac{6\sum _{i=1}^n{d_i}^2}{n(n^2-1)},\) where \(d_i=R_i-Q_i\). Obviously, \(-1\le r_s(A,B)\le 1.\)
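Assuming no tied values, the rank-difference formula can be coded directly; scipy.stats.spearmanr computes the same coefficient and also handles ties. A minimal sketch:

```python
def spearman(A, B):
    """Spearman rank correlation r_s(A, B) via the rank-difference formula
    (assumes no ties; scipy.stats.spearmanr handles the general case)."""
    n = len(A)
    rank = lambda xs: {x: r + 1 for r, x in enumerate(sorted(xs))}
    RA, RB = rank(A), rank(B)
    d2 = sum((RA[a] - RB[b]) ** 2 for a, b in zip(A, B))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0 (perfectly monotone)
print(spearman([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0
```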

To test the significance of a correlation, we assume that there is no correlation between A and B. In the case of a small sample, that is, when the number of samples is less than 30, this hypothesis can be verified directly by looking up Table 7 [48]. When \(|r_s(A,B)|\) is greater than the threshold value \(r_\alpha\), the assumption is rejected and the correlation between A and B is significant.

Continuing the above experiment, the \(r_s\)-values of the four measure sets on Aa and Ac are compared. For Aa and Ac, the number of elements in each of the four measure sets is 20. Then, from Table 7, we obtain \(r_{0.05}=0.380\). Since every \(|r_s|\) on Aa and Ac exceeds 0.380, we can conclude that the pairwise correlations between these four measures are significant.

Example 5.1

Table 7 Tables of critical values for Spearman rank correlation in which level of significance \(\alpha =0.05\)-one-tailed test

If \(0.7\le r_s(A,B)<1\), \(-1< r_s(A,B)\le -0.7\), \(r_s(A,B)=1\) or \(r_s(A,B)=-1\), then the correlation between A and B is called a high positive correlation (for short, HPC), a high negative correlation (for short, HNC), a completely positive correlation (for short, CPC) or a completely negative correlation (for short, CNC), respectively. The following conclusions are obtained through calculation (see Tables 8, 9).

Table 8 \(r_s\)-values of four measure sets on Aa
Table 9 \(r_s\)-values of four measure sets on Ac

5.5 Friedman test and Nemenyi test

In this subsection, Friedman test [12] and Nemenyi test are used to further evaluate the performance of the proposed measures.

The Friedman test is a nonparametric method to test whether there are significant differences among multiple algorithms by using ranks. Its statistic is defined as

$$\begin{aligned} \chi _{F}^{2}=\frac{12N}{k(k+1)}\left( \sum _{i=1}^{k}r_{i}^{2}-\frac{k(k+1)^{2}}{4}\right) , \end{aligned}$$

where k, N and \(r_{i}\) are respectively the number of algorithms to be evaluated, the number of samples, and the average ranking of the i-th algorithm. This test is too conservative, which is why it is often replaced by the following statistic

$$\begin{aligned} F_F=\frac{(N-1)\chi _{F}^{2}}{N(k-1)-\chi _{F}^{2}}. \end{aligned}$$

When \(F_F\) is greater than the critical value \(F_{\alpha }(k-1,(k-1)(N-1))\), the assumption that “all algorithms have the same performance” is rejected, and the performance of the algorithms is significantly different. The Nemenyi test calculates the critical distance \(CD_{\alpha }\) of the average ranks to judge which algorithm is better. If the difference between the average ranks of two algorithms reaches at least the critical distance, the performance of the two algorithms is significantly different. The critical distance \(CD_{\alpha }\) is given by \(CD_{\alpha }=q_{\alpha }\sqrt{\frac{k(k+1)}{6N}},\) where \(q_{\alpha }\) and \(\alpha\) are the critical tabulated value and significance level of the Nemenyi test, respectively.
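For reference, these statistics can be reproduced from the average ranks. The sketch below uses k = 4 and N = 8 as in this section, but the average ranks passed in are illustrative, not those of Table 10:

```python
import math

def friedman_stats(avg_ranks, N):
    """Friedman chi-square and the adjusted F_F statistic defined above."""
    k = len(avg_ranks)
    chi2 = 12 * N / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    F_F = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return chi2, F_F

def nemenyi_cd(k, N, q_alpha=2.569):
    """Critical distance of the Nemenyi test (q_0.05 = 2.569 for k = 4)."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * N))

chi2, F_F = friedman_stats([1.0, 2.1, 2.9, 4.0], N=8)   # hypothetical average ranks
print(round(F_F, 2))
print(round(nemenyi_cd(4, 8), 3))                       # 1.658, matching the CD value used below
```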

According to Figs. 11, 12, 13, 14, 15, 16, 17 and 18, we obtain the rankings of the CV-values of the four measure sets on the eight datasets (see Tables 10, 11, 12).

Table 10 The ranking of CV-values of the four measure sets on eight datasets with \(\beta =0.02k,k=1,2,3,6,8\)
Table 11 The ranking of CV-values of the four measure sets on eight datasets with \(\beta =0.02k,k=4,7\)
Table 12 The ranking of CV-values of the four measure sets on eight datasets with \(\beta =0.02k,k=5\)

The Friedman test was used to check whether there were significant differences among the four measures obtained in this paper. Since \(k=4\) and \(N=8\), we have \(k-1=3\), \((k-1)(N-1)=21\) and \(F_{0.05}(3, 21)=3.072\). For Table 10, \(F_F \approx 109.29\) and \(F_F>F_{0.05}(3, 21)\); for Table 11, \(F_F\approx 61.67\) and \(F_F>F_{0.05}(3, 21)\); and the results for Table 12 are the same as for Table 11. Therefore, the assumption that “all algorithms have the same performance” is rejected at \(\alpha =0.05\), and the performance of the four measures is significantly different. Next, to further illustrate the significant differences between the four measures, the Nemenyi test was applied. Since \(\alpha =0.05\), we have \(q_{\alpha }=2.569\) and \(CD_\alpha \approx 1.658\). Based on these tests, we obtain Figs. 19, 20, where the dots represent the average rankings and the line segments represent the range of \(CD_\alpha\). If two line segments do not overlap on the X-axis, there is a significant difference between the two uncertainty measures.

Fig. 19
figure 19

Friedman test based on Table 10

Fig. 20
figure 20

Friedman test based on Tables 11 and 12

From Figs. 19, 20, for data with different incomplete rates, the following conclusions can be drawn: (1) as far as performance is concerned, E is superior to G and Er; (2) there are significant differences between E and Er, and between E and G.

6 An application in attribute reduction

In this section, an application of the proposed measures in attribute reduction is presented.

Definition 6.1

Suppose that (UA) is an ISVIS. Then \(P\subseteq A\) is referred to as consistent, if \(R_P^*=R_A^*\).

Definition 6.2

Suppose that (UA) is an ISVIS. Then \(P\subseteq A\) is referred to as a reduct of A, if P is consistent and \(\forall ~a\in P\), \(P-\{a\}\) is not consistent.

In this paper, the family of all consistent subsets (resp., all reducts) of A is denoted by co(A) (resp., red(A)).

Theorem 6.3

Suppose that (UA) is an ISVIS. Given \(P\subseteq A\). Then the following conditions are equivalent:

(1) \(P\in co(A)\); (2) \(G(P)=G(A)\); (3) \(H(P)=H(A)\); (4) \(E_r(P)=E_r(A)\); (5) \(E(P)=E(A)\).

Proof

(1) \(\Rightarrow\) (2). Clearly. (2) \(\Rightarrow\) (1). Suppose \(G(P)=G(A)\). Then \(\frac{1}{n^2}\sum \limits _{i=1}^n|R_P^*(x_i)|=\frac{1}{n^2}\sum \limits _{i=1}^n|R_A^*(x_i)|.\)

So \(\sum \limits _{i=1}^n(|R_P^*(x_i)|-|R_A^*(x_i)|)=0.\)

Note that \(R_A^*\subseteq R_P^*\). Then \(\forall ~i\), \(R_A^*(x_i)\subseteq R_P^*(x_i)\). This implies that

$$\begin{aligned} \forall ~i,~~|R_P^*(x_i)|-|R_A^*(x_i)|\ge 0. \end{aligned}$$

So \(\forall ~i\),  \(|R_P^*(x_i)|-|R_A^*(x_i)|= 0.\) It follows that \(\forall ~i\),  \(R_P^*(x_i)=R_A^*(x_i).\)

Thus \(R_P^*=R_A^*.\) Hence

$$\begin{aligned} P\in co(A). \end{aligned}$$

(2) \(\Leftrightarrow\) (5). It can be obtained by Theorem 4.16.

(1) \(\Rightarrow\) (3). This is clear.

(3) \(\Rightarrow\) (1). Suppose \(H(P)=H(A)\). Then

$$\begin{aligned} -\sum \limits ^n_{i=1}\frac{1}{n}\log _2\frac{|R_P^*(x_i)|}{n}=-\sum \limits ^n_{i=1}\frac{1}{n}\log _2\frac{|R_A^*(x_i)|}{n}. \end{aligned}$$

So

$$\begin{aligned} \sum \limits _{i=1}^n\log _2\frac{|R_P^*(x_i)|}{|R_A^*(x_i)|}=0. \end{aligned}$$

Note that \(R_A^*\subseteq R_P^*\). Then \(\forall ~i\), \(R_A^*(x_i)\subseteq R_P^*(x_i)\). This implies that

$$\begin{aligned} \forall ~i,~~\log _2\frac{|R_P^*(x_i)|}{|R_A^*(x_i)|}\ge 0. \end{aligned}$$

So \(\forall ~i\),  \(\log _2\frac{|R_P^*(x_i)|}{|R_A^*(x_i)|}= 0.\) It follows that \(\forall ~i\),  \(R_P^*(x_i)=R_A^*(x_i).\)

Thus \(R_P^*=R_A^*.\) Hence \(P\in co(A).\)

(3) \(\Leftrightarrow\) (4). It follows from Theorem 4.12. \(\square\)

Corollary 6.4

Suppose that (UA) is an ISVIS. Given \(P\subseteq A\). Then the following conditions are equivalent:

(1):

\(P\in red(A)\);

(2):

\(G(P)=G(A)\) and \(\forall ~a\in P\), \(G(P-\{a\})\ne G(A)\);

(3):

\(H(P)=H(A)\) and \(\forall ~a\in P\), \(H(P-\{a\})\ne H(A)\);

(4):

\(E_r(P)=E_r(A)\) and \(\forall ~a\in P\), \(E_r(P-\{a\})\ne E_r(A)\);

(5):

\(E(P)=E(A)\) and \(\forall ~a\in P\), \(E(P-\{a\})\ne E(A)\).

Proof

It can be proved by Theorem 6.3. \(\square\)

Below, we study reduction algorithms in an ISVIS based on its uncertainty measurement. By Theorems 4.12 and  4.16, we have

$$\begin{aligned} E_r(P)+H(P)=\log _2 n,~G(P)+E(P)=1, \end{aligned}$$

where (UA) is an ISVIS and \(P\subseteq A\). Then, we only need to consider reduction algorithms based on information granulation and information entropy, respectively.

Reduction algorithms based on information granulation and information entropy are given as follows.

figure a
figure b

For Algorithms 1–2, we assume that the number of attributes is m and the number of samples is n. First of all, we need to calculate the similarity matrix \(s_{ij}^k\) for each attribute \(a_k\); its time complexity and space complexity are both \(O(mn^2)\). Algorithm 1 randomly selects an attribute in each loop and judges whether the attribute can be discarded according to G; if not, the loop is terminated and the reduct P is obtained, so the worst-case search for a reduct needs m evaluations. Algorithm 2 selects, in each loop according to H, an attribute that meets the condition and adds it to P; if no attribute meets the condition, the loop is terminated and the reduct P is obtained, so the worst-case search for a reduct needs \((m^2 +m)/2\) evaluations. Hence the overall time complexities of Algorithm 1 and Algorithm 2 are \(O(mn^2+m)\) and \(O(mn^2+m^2 +m)\), respectively. Since the space occupied above can be reused, the total space complexity of both algorithms is \(O(mn^2)\).
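Although the full pseudocode is given in Algorithms 1–2, the two greedy strategies just described can be sketched as follows. The sketch is ours and only illustrates the control flow: G and H are assumed to be callables that evaluate the measures of Section 4 on an attribute subset (with H of the empty set equal to 0), and the random deletion order stands in for Algorithm 1's random attribute selection:

```python
import random

def reduct_by_granulation(attrs, G, seed=0):
    """Deletion-style strategy (Algorithm 1 in spirit): scan attributes in a
    random order and drop an attribute whenever G is unchanged without it."""
    rng = random.Random(seed)
    P = list(attrs)
    target = G(attrs)
    for a in rng.sample(list(attrs), len(attrs)):
        if len(P) > 1 and G([b for b in P if b != a]) == target:
            P.remove(a)
    return P

def reduct_by_entropy(attrs, H):
    """Addition-style strategy (Algorithm 2 in spirit): keep adding the
    attribute that increases H the most until H(P) = H(A)."""
    P, target = [], H(attrs)
    while H(P) < target:
        best = max((a for a in attrs if a not in P), key=lambda a: H(P + [a]))
        P.append(best)
    return P
```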

6.1 Cluster analysis

In this subsection, in order to verify the effectiveness of the proposed algorithms, t-distributed stochastic neighbor embedding, the k-means clustering algorithm and the Mean Shift clustering algorithm are used to cluster and analyze the reducts obtained by the algorithms.

Fig. 21
figure 21

The reduced clustering image of Algorithm 1 on Aa

Fig. 22
figure 22

The reduced clustering image of Algorithm 1 on Ac

Fig. 23
figure 23

The reduced clustering image of Algorithm 1 on De

Fig. 24
figure 24

The reduced clustering image of Algorithm 1 on He

Fig. 25
figure 25

The reduced clustering image of Algorithm 1 on Pc

Fig. 26
figure 26

The reduced clustering image of Algorithm 1 on Sl

In this paper, we use the eight datasets from UCI described in Table 6. Each dataset can be regarded as an ISVIS. In order to verify that the algorithms remain effective under different amounts of missing data, we conducted experiments with Algorithm 1 and Algorithm 2 for incomplete rates from 0.02 to 0.16 with a step size of 0.02. Then, for each algorithm, we obtained 8 reducts for each dataset and list them in Tables 13 and 14, respectively. As can be observed from the reducts in Tables 13 and 14, Algorithms 1–2 can effectively reduce the dimension of incomplete data. Therefore, the results obtained from Algorithms 1–2 are employed for cluster analysis.

Table 13 Reduction results of Algorithm 1 at the incomplete rates of \(\beta =0.02k\) \((k=1,2,\ldots ,8)\)
Table 14 Reduction results of Algorithm 2 at the incomplete rates of \(\beta =0.02k,k=1,2,\ldots ,8\)

6.1.1 k-means cluster

We first cluster the data before and after reduction with the k-means clustering algorithm. Then, we use three indices, namely the silhouette coefficient [27], the Calinski-Harabasz index [1] and the Davies-Bouldin index [9], to evaluate the clustering effect, so as to verify the effectiveness of the proposed algorithms. Among the three indices, the larger the silhouette coefficient and the Calinski-Harabasz index are, the better the clustering effect is, while for the Davies-Bouldin index the opposite holds. In order to make the experiment more reasonable, we set the number of clusters to the number of classes inherent in the data.
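A minimal sklearn sketch of this evaluation step; X_full and X_reduced below are random stand-ins for the numerically encoded data before and after reduction (our own illustration), and n_classes plays the role of the inherent number of classes:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

def evaluate_kmeans(X, n_classes, seed=0):
    """Cluster X with k-means and return the three clustering indices."""
    labels = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit_predict(X)
    return (silhouette_score(X, labels),
            calinski_harabasz_score(X, labels),
            davies_bouldin_score(X, labels))

# hypothetical usage: compare the original data with a reduct
rng = np.random.default_rng(0)
X_full = rng.normal(size=(100, 10))      # stand-in for the encoded dataset
X_reduced = X_full[:, :4]                # stand-in for the reduced dataset
for name, X in [('original', X_full), ('reduct', X_reduced)]:
    s, ch, db = evaluate_kmeans(X, n_classes=3)
    print(name, round(s, 3), round(ch, 1), round(db, 3))
```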

Fig. 27
figure 27

The reduced clustering image of Algorithm 1 on Ob

Fig. 28
figure 28

The reduced clustering image of Algorithm 1 on Gw

Fig. 29
figure 29

The reduced clustering image of Algorithm 2 on Aa

Fig. 30
figure 30

The reduced clustering image of Algorithm 2 on Ac

Fig. 31
figure 31

The reduced clustering image of Algorithm 2 on De

Clustering results are visualized by t-distributed stochastic neighbor embedding, as shown in Figs. 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36. The three indices values corresponding to the clustering results of Algorithms 1–2 are shown in Figs. 37, 38, 39, 40, 41, 42, 43, 44.

Fig. 32
figure 32

The reduced clustering image of Algorithm 2 on He

Fig. 33
figure 33

The reduced clustering image of Algorithm 2 on Pc

Fig. 34
figure 34

The reduced clustering image of Algorithm 2 on Sl

Fig. 35
figure 35

The reduced clustering image of Algorithm 1 on Ob

Fig. 36
figure 36

The reduced clustering image of Algorithm 1 on Gw

6.1.2 Mean Shift

We first cluster the data before and after reduction with the Mean Shift clustering algorithm. Then, we also evaluate the clustering effect with the same three indices to verify the effectiveness of the proposed algorithms.

Fig. 37
figure 37

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Aa

Fig. 38
figure 38

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Ac

Fig. 39
figure 39

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on De

Fig. 40
figure 40

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on He

Fig. 41
figure 41

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Pc

Fig. 42
figure 42

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Sl

Fig. 43
figure 43

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Ob

Fig. 44
figure 44

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Gw

Clustering results are visualized by t-distributed stochastic neighbor embedding, as shown in Figs. 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60. The three indices values corresponding to the clustering results of Algorithms 1–2 are shown in Figs. 61, 62, 63, 64, 65, 66, 67, 68.

Fig. 45
figure 45

The reduced clustering image of Algorithm 1 on Aa

Fig. 46
figure 46

The reduced clustering image of Algorithm 1 on Ac

Fig. 47
figure 47

The reduced clustering image of Algorithm 1 on De

Fig. 48
figure 48

The reduced clustering image of Algorithm 1 on He

6.1.3 Analysis of clustering results

Fig. 49
figure 49

The reduced clustering image of Algorithm 1 on Pc

Fig. 50
figure 50

The reduced clustering image of Algorithm 1 on Sl

Fig. 51
figure 51

The reduced clustering image of Algorithm 1 on Ob

According to Figs. 37-44 and 61-68, the following findings can be made:

  1. (1)

    For Aa and De, the results of the two clustering methods under the three indicators are superior to the original data.

  2. (2)

    For Ac, only the Calinski-Harabasz index under the Mean Shift clustering algorithm shows that the reducts of Algorithm 2 have little difference with the original data. Besides, the other indices indicate that all reducts are superior to the original data.

  3. (3)

    For He, the three indices under the Mean Shift clustering algorithm show that all the reducts are better than the original data. In addition, the silhouette coefficient and Davies-Bouldin index of Algorithm 2’s R4 and R8 under the k-means clustering algorithm are worse than the original data, while the other reducts are superior to the original data.

  4. (4)

    For Pc, the three indices under the k-means clustering algorithm show that all the reducts are better than the original data. The Calinski-Harabasz index under the Mean Shift clustering algorithm shows that the R2–R4, R6–R8 of Algorithm 1 and the R6–R8 of Algorithm 2 are better than the original data, while the other reducts have little difference with the original data. In addition, the other two indices indicate that all the reducts are better than the original data.

  5. (5)

    For Sl, the Davies-Bouldin index under the k-means clustering algorithm indicates that all the reducts are better than the original data. The Calinski-Harabasz index shows that only the Re4 of Algorithm 2 has little difference with the original data, while the other reducts are superior to the original data. The silhouette coefficient under the k-means clustering algorithm shows that the Re4, Re5, Re7 and Re8 of Algorithm 1 and the Re7 and Re8 of Algorithm 2 are better than the original data. The silhouette coefficient and Davies-Bouldin index under the Mean Shift clustering algorithm show that all the reducts are better than the original data, while the remaining index shows little difference with the original data.

  6. (6)

    For Ob, the three indices under the k-means clustering algorithm show that all the reducts are better than the original data. The silhouette coefficient and Davies-Bouldin index under the Mean Shift clustering algorithm show that all the reducts are better than the original data. The Calinski-Harabasz index shows that only the Re4, Re5 and Re6 of Algorithm 1 are better than the original data, while the other reducts are not much different from the original data.

  7. (7)

    For Gw, the three indices under the k-means clustering algorithm show that all the reducts are better than the original data. The silhouette coefficient and Davies-Bouldin index under the Mean Shift clustering algorithm show that all the reducts are better than the original data. The Calinski-Harabasz index indicates that the R2 of Algorithm 1 and the R3, R6 and R7 of Algorithm 2 are worse than the original data.

In a nutshell, the three index values of the reducts are obviously better than those of the original data, so the reducts of Algorithms 1–2 are credible. This also means that the obtained algorithms can effectively perform attribute reduction at different incomplete rates, so it is of great significance to study the attribute reduction of these two algorithms at different incomplete rates. In addition, the three indices show that, under the same incomplete rate, the reduct of Algorithm 1 is better than that of Algorithm 2 in most cases. Therefore, Algorithm 1 can be preferred for attribute reduction to improve efficiency.

Fig. 52
figure 52

The reduced clustering image of Algorithm 1 on Gw

Fig. 53
figure 53

The reduced clustering image of Algorithm 2 on Aa

Fig. 54
figure 54

The reduced clustering image of Algorithm 2 on Ac

Fig. 55
figure 55

The reduced clustering image of Algorithm 2 on De

Fig. 56
figure 56

The reduced clustering image of Algorithm 2 on He

7 Comparison and discussion

Fig. 57
figure 57

The reduced clustering image of Algorithm 2 on Pc

Fig. 58
figure 58

The reduced clustering image of Algorithm 2 on Sl

Fig. 59
figure 59

The reduced clustering image of Algorithm 1 on Ob

Fig. 60
figure 60

The reduced clustering image of Algorithm 1 on Gw

In this section, we evaluate the performance of the proposed method against existing methods by comparing our algorithms with two other algorithms. Dai et al. [8] brought up entropy measures and granularity measures to measure the uncertainty of SVISs; they also explored attribute reduction for SVISs and proposed a representative attribute selection algorithm (FRSM) based on fuzzy rough sets. Liu et al. [21] researched feature selection for a set-valued decision IS from the view of dominance relations and proposed a representative attribute selection algorithm (DRM) based on dominance relations.

Fig. 61
figure 61

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Aa

Fig. 62
figure 62

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Ac

Fig. 63
figure 63

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on De

Fig. 64
figure 64

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on He

Fig. 65
figure 65

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Pc

Table 15 shows the reduction results of FRSM and DRM for the data in Table 6. By comparing the results in Tables 13, 14 and 15, we can see that the reducts obtained by Algorithms 1–2 contain fewer attributes than those of FRSM and DRM in most cases.

Fig. 66
figure 66

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Sl

Fig. 67
figure 67

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Ob

Fig. 68
figure 68

The clustering evaluation coefficient of Algorithm 1 and Algorithm 2 on Gw

Table 15 Reduction results of FRSM and DRM

For Algorithms 1–2, we choose the reducts obtained when the incomplete rate is 8% and compare them with the reducts of FRSM and DRM. Using the Mean Shift clustering algorithm and the three evaluation indices, we obtain Fig. 69.

Fig. 69
figure 69

The clustering evaluation coefficient of four algorithms

From Fig. 69, the following conclusions can be drawn. (1) The silhouette coefficient shows that Algorithms 1–2 are superior to the other two algorithms on Aa, De, He and Ob; on Ac and Gw, Algorithm 1 is superior to the other algorithms; on the other data, the results of the four algorithms differ little. (2) The Calinski-Harabasz index shows that Algorithms 1–2 are superior to the other two algorithms on Ob; on He, Algorithm 1 is superior to the other algorithms; on the other data, the results of the four algorithms differ little. (3) The Davies-Bouldin index shows that Algorithms 1–2 are superior to the other two algorithms on Aa, Ac, De and Ob; on the other data, the results of the four algorithms differ little.

Therefore, Algorithms 1–2 are more effective in data reduction than the FRSM and DRM algorithms, so it is meaningful to explore the applications of ISVISs.

8 Conclusions

In this paper, incomplete set-valued data have been viewed as ISVISs. In this way, the similarity degree between information values on each attribute of incomplete set-valued data has been presented. The tolerance relation induced by each subsystem of an ISVIS has been given, and rough approximations based on this relation have been investigated. To explore the potential information of such data, four methods for measuring the uncertainty of incomplete set-valued data have been studied, namely information granulation, information amount, rough entropy and information entropy. The validity of the proposed measures has been statistically analyzed, and it has been verified that the obtained measures can effectively measure the uncertainty of incomplete set-valued data with different incomplete rates. To study attribute reduction of incomplete set-valued data, Algorithms 1 and 2 have been proposed based on information granulation and information entropy, respectively, and their validity has been analyzed and verified by the k-means clustering algorithm and the Mean Shift clustering algorithm. It is worth mentioning that the incomplete rate has been adopted throughout the analysis. In the future, we will further explore more effective reduction methods for incomplete set-valued data.