1 Introduction

Rough set theory, introduced by Pawlak (1992), is an effective tool for handling the vagueness, imprecision and uncertainty (Zadeh 1965) in data. After more than 30 years of development, it has been widely applied to Feature Selection (Min and Xu 2016; Swiniarski and Skowron 2003; Wang et al. 2018), Pattern Recognition (Dai et al. 2013; Hu et al. 2016), Granular Computing (Huang and Li 2018; Pedrycz and Chen 2011, 2015; Peter et al. 2003; Polkowski and Artiemjew 2015; Wang 2017; Wang et al. 2017; Zhi and Li 2018), Knowledge Discovery (Mi et al. 2004; Wu et al. 2016) and so on. In particular, as pointed out by Chen et al. (2012), attribute reduction (Ju et al. 2017; Xu et al. 2016) is one of the most representative topics in rough set theory and can be distinguished from other Feature Selection techniques. This is mainly because: (1) attribute reductions have clear semantic explanations; (2) many measures developed in rough set theory can be used to design the constraints of attribute reduction.

Presently, attribute reduction is understood as removing redundant attributes subject to a given constraint. Various measures, such as approximate quality (Pawlak and Skowron 2007) and conditional entropy (Hu et al. 2006), have been employed to define constraints. Note that most of these measures are derived from the relationship between the conditional attributes and the decision attribute. Therefore, the values over the decision attribute, i.e., the labels, are required for exploring attribute reduction. From this point of view, most attribute reductions can only be performed over data without missing labels.

In many real-world applications (Chen and Chang 2011; Chen and Chen 2009; Chen and Tanuwijaya 2011; Wang and Chen 2008), acquiring complete data is a difficult task. Generally speaking, the difficulty in labeling samples has two aspects. On the one hand, the correct labels may be unknown. For example, when computer technology is used to aid medical institutions in analyzing medical images, it is practically impossible for doctors or even experienced experts to locate the nidus (Zhou and Li 2010). Consequently, the cases of illness (labels) may be unknown. On the other hand, labeling samples is hard, expensive and time consuming because it requires considerable effort. For instance, in an era abounding with social media and connectivity, web users are increasingly engaged in interacting and sharing through their phones or computers. Along with this process, data become extremely complex, and it is hard for a network supervisor to label all samples. To sum up, data containing both labeled and unlabeled samples can be seen everywhere, and such data are referred to as partially labeled data (Dai et al. 2017; Liu et al. 2018).

Up to now, most previous attribute reductions have failed to consider partially labeled data. Therefore, the motivations of this paper are: (1) how to handle partially labeled data when realizing attribute reduction; (2) how to preserve or even improve the various performances of the reducts derived from partially labeled data. For these reasons, we propose a novel attribute reduction approach for partially labeled data. The main process of our approach includes two steps: (1) considering that approximate quality and the binary relation can be used to evaluate the importance of an attribute over labeled and unlabeled samples, respectively, a new importance can be expressed by combining these two measures; (2) a significance function [also called a fitness function (Yang and Yao 2018)] can be constructed based on this new importance, and correspondingly, a heuristic algorithm can be re-designed for computing a reduct over partially labeled data.

Note that the neighborhood rough set (Hu et al. 2008b) is employed to realize the topic addressed in this paper, mainly because, compared with other rough sets, the neighborhood rough set has obvious advantages. Taking the comparison between the neighborhood rough set and the fuzzy rough set (Dubois and Prade 1990) as an example, it is well known that (1) similar to the classical rough set, the approximations obtained by the neighborhood rough set are crisp, while those obtained by the fuzzy rough set are still fuzzy; (2) the neighborhood rough set provides a framework for handling continuous or even mixed data (Hu et al. 2008a), while the fuzzy rough set is limited to continuous data.

The main contributions of our research include two aspects: (1) a new importance measure for attribute reduction over partially labeled data is proposed, and a heuristic algorithm based on this importance is re-designed for computing a reduct over partially labeled data; (2) by introducing the neighborhood rough set into our approach, the experimental results over several UCI datasets demonstrate that our approach is effective in selecting qualified attributes from partially labeled data.

The rest of this paper is organized as follows. Section 2 reviews some basic notions of rough set theory and the definition of attribute reduction. The new importance for partially labeled data and the corresponding neighborhood attribute reduction approach are presented in Sect. 3. Experiments are conducted and the experimental results are analyzed in Sect. 4. Finally, we conclude with some remarks and perspectives for future work in Sect. 5.

2 Preliminary knowledge

2.1 Neighborhood rough set

As one of the most important extended models of the classical rough set (Dou et al. 2016; Skowron and Stepaniuk 1996; Wojna 2005; Xu et al. 2017; Yang et al. 2011a, b), the neighborhood rough set has received wide attention. Since different radii used in the neighborhood rough set can characterize the similarity between samples at different scales, the neighborhood rough set is more flexible and more adaptive for complex data (Hu et al. 2008a).

In rough set theory, a decision system can be described by a pair \(\text{DS}=\langle U, AT \cup \{d\}\rangle\), in which the universe U is a nonempty and finite set of samples, AT is a nonempty and finite set of conditional attributes, and d is the decision attribute. Furthermore, \(\forall x \in U, d(x)\) indicates the label of sample x.

Given a decision system DS, we assume that the values of the decision attribute are discrete; then an equivalence relation over d can be defined as \(\text{IND}_{d}=\{(x,y)\in U \times U: d(x)=d(y)\}\). By \(\text{IND}_{d}\), a partition \(U/ \text{IND}_{d}=\{X_{1}, X_{2},\ldots , X_{n}\}\) is derived. In rough set theory, \(X_{k} \in U/ \text{IND}_{d}\) is called the k-th decision class. In particular, the decision class which contains sample x is denoted by \([x]_{d}\).

Given a decision system DS, \(\forall x\in U\), \(\forall A \subseteq AT\) and a radius \(\sigma \in [0,1]\), the size of the neighborhood of x related to A can be defined as (Hu et al. 2008b):

$$\begin{aligned} \sigma _{A}(x)&= \min _{y\in U\wedge y\ne x}\triangle _{A}(x,y)+\sigma \cdot \left( \max _{y\in U\wedge y\ne x}\triangle _{A}(x,y)\right.\\&\quad \left. -\min _{y\in U\wedge y\ne x}\triangle _{A}(x,y)\right) , \end{aligned}$$
(1)

where \(\triangle _{A}\) is a distance function; the Euclidean distance is employed to derive \(\triangle _{A}(x,y)\) in this paper. Note that the construction of Eq. (1) aims to avoid an undesirable case: if a small radius were used directly as the distance threshold, the neighborhood of a sample \(x \in U\) might contain only x itself. In that case, any two samples could be distinguished from each other, which makes the learning process meaningless. With Eq. (1), the neighborhood of x always contains at least its nearest neighbor.

Following Eq. (1), the neighborhood relation in DS can be defined as:

$$\begin{aligned} \delta _{A}=\{(x,y)\in U\times U:\triangle _{A}(x,y)\le \sigma _{A}(x)\}. \end{aligned}$$
(2)

Correspondingly, the neighborhood of x related to A can be defined as:

$$\begin{aligned} \delta _{A}(x)=\{y\in U: \triangle _{A}(x,y)\le \sigma _{A}(x)\}. \end{aligned}$$
(3)
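For concreteness, the following Python sketch (illustrative only; the experiments in this paper were implemented in MATLAB) shows one way Eqs. (1)–(3) might be computed with the Euclidean distance. The function name `neighborhoods` and the toy data are ours, not part of the original work.

```python
import numpy as np

def neighborhoods(X, sigma=0.1):
    """Return the index set delta_A(x) for every sample x (Eqs. (1)-(3))."""
    n = X.shape[0]
    # pairwise Euclidean distances Delta_A(x, y)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    nbrs = []
    for i in range(n):
        others = np.delete(dist[i], i)                    # distances to all y != x
        d_min, d_max = others.min(), others.max()
        radius = d_min + sigma * (d_max - d_min)          # sigma_A(x), Eq. (1)
        nbrs.append(set(np.where(dist[i] <= radius)[0]))  # Eq. (3); contains x itself
    return nbrs

# toy usage: four samples described by two attributes
X = np.array([[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.82, 0.88]])
print(neighborhoods(X, sigma=0.1))
```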

Definition 1

Given a decision system DS, \(\forall A \subseteq AT\) and \(\forall X_{k} \in U/{\mathrm{IND}}_{d}\), the neighborhood lower and upper approximations of \(X_{k}\) in terms of A are defined as:

$$\begin{aligned}\underline{X_{k}}_{A}=\{x\in U:\delta _{A}(x)\subseteq X_{k}\}; \end{aligned}$$
(4)
$$\begin{aligned}\overline{X_{k}}_{A}=\{x\in U:\delta _{A}(x)\cap X_{k}\ne {\emptyset }\}. \end{aligned}$$
(5)

The pair \([\underline{X_{k}}_{A},\overline{X_{k}}_{A}]\) is called a neighborhood rough set of \(X_{k}\).
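A minimal sketch of Eqs. (4) and (5), assuming the neighborhoods \(\delta _{A}(x)\) have already been computed as index sets (e.g., by the sketch above); the helper name `approximations` is hypothetical.

```python
def approximations(nbrs, X_k):
    """Neighborhood lower/upper approximations of a decision class X_k (Eqs. (4)-(5)).

    nbrs : list of sets, nbrs[i] = delta_A(x_i) given as sample-index sets
    X_k  : set of sample indices forming the decision class
    """
    lower = {i for i, nb in enumerate(nbrs) if nb <= X_k}   # delta(x) is a subset of X_k
    upper = {i for i, nb in enumerate(nbrs) if nb & X_k}    # delta(x) meets X_k
    return lower, upper

# toy usage with hand-made neighborhoods of four samples
nbrs = [{0, 1}, {0, 1, 2}, {2, 3}, {2, 3}]
print(approximations(nbrs, X_k={0, 1}))   # lower = {0}, upper = {0, 1}
```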

2.2 Some measurements

Up to now, many measures have been proposed to describe the certainty or uncertainty in data from the viewpoint of rough sets. Similar to the classical rough set, approximate quality (Pawlak 1992) can also be used to describe the degree of certainty in the neighborhood rough set. The definition is as follows.

Definition 2

Given a decision system DS, \(\forall A \subseteq AT\), the approximate quality of d related to A is defined as:

$$\begin{aligned} \gamma _{A}(d)=\frac{\left| \bigcup ^{n}_{k=1}\underline{X_{k}}_{A}\right| }{|U|}, \end{aligned}$$
(6)

where |X| denotes the cardinality of set X.

Obviously, \(0\le \gamma _{A}(d) \le 1\) holds. The higher the value of approximate quality, the higher the certainty in the data.
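Since \(x \in \delta _{A}(x)\), a sample lies in \(\bigcup ^{n}_{k=1}\underline{X_{k}}_{A}\) exactly when all of its neighbors share its label. The sketch below uses this observation to compute \(\gamma _{A}(d)\); the function name and toy data are hypothetical.

```python
def approximate_quality(nbrs, labels):
    """gamma_A(d), Eq. (6): fraction of samples lying in the lower
    approximation of their own decision class."""
    pos = sum(
        all(labels[j] == labels[i] for j in nb)    # delta(x) is a subset of [x]_d
        for i, nb in enumerate(nbrs)
    )
    return pos / len(labels)

# toy usage: the same hand-made neighborhoods, with two decision classes
nbrs = [{0, 1}, {0, 1, 2}, {2, 3}, {2, 3}]
print(approximate_quality(nbrs, labels=[0, 0, 1, 1]))   # 0.75
```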

Besides approximate quality, conditional entropy is another widely accepted measure in rough set theory, which can characterize uncertainty. Presently, many definitions of conditional entropy (Hu et al. 2010; Wei et al. 2013; Zhang et al. 2016; Zhu and Wen 2012) have been proposed for different requirements. A typical form of conditional entropy (Hu et al. 2006) is shown in Definition 3.

Definition 3

Given a decision system DS, \(\forall A \subseteq AT\), the conditional entropy of d related to A is defined as:

$$\begin{aligned} {\mathrm{ENT}}_{A}(d)=-\frac{1}{|U|}\sum _{x\in U}\log \frac{|\delta _{A}(x)\cap [x]_{d}|}{|\delta _{A}(x)|}. \end{aligned}$$
(7)

Different from approximate quality, the certainty is higher when the value of conditional entropy is lower.
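A corresponding sketch of Eq. (7); because \(x \in \delta _{A}(x)\), the ratio inside the logarithm is always positive. The natural logarithm is used here (the base only rescales the value), and the function name and toy data are again ours.

```python
import math

def conditional_entropy(nbrs, labels):
    """ENT_A(d), Eq. (7)."""
    total = 0.0
    for i, nb in enumerate(nbrs):
        same = sum(labels[j] == labels[i] for j in nb)   # |delta(x) ∩ [x]_d|, always >= 1
        total += math.log(same / len(nb))
    return -total / len(labels)

# toy usage: only the second sample has a neighbor with a different label
nbrs = [{0, 1}, {0, 1, 2}, {2, 3}, {2, 3}]
print(conditional_entropy(nbrs, labels=[0, 0, 1, 1]))    # about 0.101
```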

2.3 Neighborhood classifier

A classifier can be used to evaluate the generalization performance of attributes (Chen et al. 2001). In the neighborhood rough set, the neighborhood classifier (NEC) proposed by Hu et al. (2008b) is frequently used. Given a test sample, the neighborhood classifier uses the majority rule over the labels of its neighbors to determine the label of the test sample. The detailed process is shown in Algorithm 1.

[Algorithm 1: Neighborhood classifier]
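Algorithm 1 itself appears as a figure in the original paper; the following is only a rough, assumption-laden sketch of a majority-vote neighborhood classifier in that spirit (in particular, computing the radius of a test sample from its distances to the training samples is our simplification).

```python
import numpy as np
from collections import Counter

def nec_predict(X_train, y_train, X_test, sigma=0.1):
    """Majority vote over the training samples falling into the test
    sample's neighborhood (a sketch in the spirit of Algorithm 1)."""
    preds = []
    for x in X_test:
        dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        radius = dist.min() + sigma * (dist.max() - dist.min())  # radius per test sample
        nbr_labels = y_train[dist <= radius]                     # never empty: the nearest sample qualifies
        preds.append(Counter(nbr_labels).most_common(1)[0][0])   # majority rule
    return np.array(preds)

# toy usage
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(nec_predict(X_train, y_train, np.array([[0.15, 0.15], [0.85, 0.85]])))   # [0 1]
```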

2.4 Attribute reduction

In general, the purpose of attribute reduction is to delete redundant or irrelevant attributes such that the remaining attributes still meet a given constraint. Based on this purpose, attribute reduction with respect to approximate quality can be defined as follows.

Definition 4

Given a decision system DS, \(\forall A \subseteq AT\), A is referred to as a \(\gamma\)-reduct if and only if

1. \(\gamma _{A}(d) \ge \gamma _{AT}(d)\);

2. \(\forall B \subset A, \gamma _{B}(d) < \gamma _{A}(d)\).

Based on Definition 4, the approach used to find a reduct is not only worth addressing but also important. Up to now, the heuristic algorithm based on a greedy strategy has been widely used for finding reducts because of its lower time consumption (Wang et al. 2016; Yang et al. 2019). In the framework of a heuristic algorithm, the most significant attribute in each iteration is determined by a significance function (Yang and Yao 2018). For example, the significance function in terms of approximate quality is as follows.

Definition 5

Given a decision system DS, if \(A \subset AT\), then \(\forall a \in AT-A\), its significance with respect to approximate quality is:

$$\begin{aligned} {\mathrm{Sig}}_{\gamma }(a, A, d)=\gamma _{A\cup \{a\}}(d)-\gamma _{A}(d). \end{aligned}$$
(8)

The detailed process of the heuristic approach to computing a reduct in terms of approximate quality is shown in Algorithm 2.

[Algorithm 2: Heuristic algorithm for computing a reduct in terms of approximate quality]
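Algorithm 2 likewise appears as a figure; the sketch below merely illustrates the greedy strategy described above: starting from the empty set, repeatedly add the attribute with the largest significance (Eq. (8)) until the approximate quality of the selected subset reaches that of AT. All names are hypothetical and the implementation is simplified for clarity.

```python
import numpy as np

def gamma(X, labels, attrs, sigma=0.1):
    """Approximate quality gamma_A(d) restricted to the attribute subset `attrs`."""
    sub = X[:, sorted(attrs)]
    dist = np.sqrt(((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=-1))
    pos = 0
    for i in range(len(labels)):
        others = np.delete(dist[i], i)
        r = others.min() + sigma * (others.max() - others.min())
        nb = np.where(dist[i] <= r)[0]
        pos += int(all(labels[j] == labels[i] for j in nb))
    return pos / len(labels)

def gamma_reduct(X, labels, sigma=0.1):
    """Greedy forward selection guided by Sig_gamma (a sketch of Algorithm 2).
    Maximising gamma(A ∪ {a}) is equivalent to maximising Eq. (8),
    since gamma(A) is constant within one iteration."""
    all_attrs = set(range(X.shape[1]))
    target = gamma(X, labels, all_attrs, sigma)
    A, current = set(), 0.0
    while current < target and A != all_attrs:
        best = max(all_attrs - A, key=lambda a: gamma(X, labels, A | {a}, sigma))
        A.add(best)
        current = gamma(X, labels, A, sigma)
    return A

# toy usage: the second attribute alone already separates the two classes
X = np.array([[0.1, 0.1], [0.2, 0.15], [0.15, 0.9], [0.25, 0.95]])
print(gamma_reduct(X, labels=[0, 0, 1, 1], sigma=0.1))   # {1}
```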

3 Attribute reduction for partially labeled data

Many significance functions, such as the one used in Algorithm 2, are derived from labeled samples. It follows that such a significance function can evaluate the significance of each attribute only in terms of labeled samples. However, if unlabeled samples exist in the data, then the expected measure cannot be obtained from them. Moreover, if the unlabeled samples are simply ignored, then the information carried by these samples is wasted. For these reasons, the attribute reduction described in the above section cannot be directly applied to partially labeled data.

Hence, by considering the importance of an attribute based on both labeled and unlabeled samples, we propose a combined importance. In this strategy, the neighborhood approximate quality is used to measure the importance of an attribute in terms of the labeled samples, while the neighborhood relation is employed to measure the importance of an attribute in terms of the unlabeled samples. The general process of constructing the new importance is shown in Fig. 1.

Fig. 1 Process of constructing new importance

Through many experiments, it is easy to observe that as the number of used attributes increases, the value of approximate quality tends to become higher while the cardinality of the neighborhood relation tends to become lower. Therefore, the higher the importance of an attribute, the greater the value of approximate quality and the finer the neighborhood relation. From this point of view, the new importance is defined in Definition 6.

Definition 6

Given a decision system DS, let \(U=l\cup ul\), where l is the set of labeled samples and \(ul\) is the set of unlabeled samples, \(\forall A \subseteq AT\), the importance of A is defined by:

$$\begin{aligned} {\text{IMP}}(A)=\alpha \cdot \frac{\gamma ^{l}_{A}(d)}{\gamma ^{l}_{AT}(d)}+\beta \cdot \frac{|\delta ^{ul}_{AT}|}{|\delta ^{ul}_{A}|}, \end{aligned}$$
(9)

where \(\gamma ^{l}_{A}(d)\) and \(\gamma ^{l}_{AT}(d)\) are computed over the labeled samples in l, while \(|\delta ^{ul}_{AT}|\) and \(|\delta ^{ul}_{A}|\) are obtained over the unlabeled samples in \(ul\). In addition, \(\alpha +\beta =1\) and both values are in [0, 1].

Example 1

Let us consider the example of partially labeled data shown in Table 1, which contains ten samples and four conditional attributes. Among these ten samples, seven are labeled and three are unlabeled.

Suppose that we want to compute \(\text{IMP}(\{a_{1}\})\) in Table 1 by Definition 6 when the radius used in the neighborhood is 0.1.

1. For the first step of computation, we have
   $$\begin{aligned} \gamma ^{l}_{AT}(d) = 0.2857 \quad {\textit{and}}\quad |\delta ^{ul}_{AT}|=6. \end{aligned}$$

2. For the second step of computation, we have
   $$\begin{aligned} \gamma ^{l}_{\{a_{1}\}}(d) = 0\quad {\textit{and}}\quad |\delta ^{ul}_{\{a_{1}\}}|=6. \end{aligned}$$

Therefore, if \(\alpha\) and \(\beta\) are set to 0.8 and 0.2, then \(\text{IMP} (\{a_{1}\}) = 0.8\cdot (0/0.2857) + 0.2\cdot (6/6) = 0.2\) is obtained by Eq. (9).
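A minimal sketch of how Eq. (9) might be computed, assuming (as a simplification) that the neighborhoods of unlabeled samples are formed among the unlabeled samples only and that \(\gamma ^{l}_{AT}(d) > 0\); all function names and the toy data are hypothetical.

```python
import numpy as np

def nbr_sets(X, sigma):
    """delta_A(x) as index sets over the rows of X (Eqs. (1)-(3))."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    out = []
    for i in range(len(X)):
        others = np.delete(dist[i], i)
        r = others.min() + sigma * (others.max() - others.min())
        out.append(set(np.where(dist[i] <= r)[0]))
    return out

def importance(Xl, yl, Xu, A, AT, alpha=0.8, beta=0.2, sigma=0.1):
    """IMP(A), Eq. (9): gamma over labeled samples combined with the size of
    the neighborhood relation over unlabeled samples (gamma^l_AT assumed > 0)."""
    def gamma(attrs):
        nbrs = nbr_sets(Xl[:, sorted(attrs)], sigma)
        return sum(all(yl[j] == yl[i] for j in nb) for i, nb in enumerate(nbrs)) / len(yl)

    def rel_size(attrs):
        # |delta^{ul}_attrs| = number of pairs (x, y) in the neighborhood relation
        return sum(len(nb) for nb in nbr_sets(Xu[:, sorted(attrs)], sigma))

    return alpha * gamma(A) / gamma(AT) + beta * rel_size(AT) / rel_size(A)

# toy usage: five labeled and three unlabeled samples over two attributes
Xl = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8], [0.5, 0.5]])
yl = np.array([0, 0, 1, 1, 0])
Xu = np.array([[0.3, 0.3], [0.6, 0.7], [0.7, 0.6]])
print(importance(Xl, yl, Xu, A={0}, AT={0, 1}))   # IMP of the single-attribute subset {a_1}
```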

By Definition 6, the attribute reduction for partially labeled data with respect to IMP can be defined as follows.

Table 1 A small example of partially labeled data

Definition 7

Given a decision system DS, \(\forall A \subseteq AT\), A is referred to as an IMP-reduct if and only if

1. \({\text{IMP}}(A) \ge {\text{IMP}}(AT)\);

2. \(\forall B \subset A\), \({\text{IMP}}(B) < {\text{IMP}}(A)\).

Immediately, the significance function designed for IMP and the attribute reduction algorithm based on IMP can be constructed as follows. Note that to simplify our algorithm and reduce its time consumption, we focus on a suboptimal solution instead of the optimal solution (Yang et al. 2013, 2014). Therefore, the second condition in Definition 7 is not taken into account in this paper.

Definition 8

Given a decision system DS, if \(A \subset AT\), then \(\forall a \in AT-A\), its significance with respect to IMP is:

$$\begin{aligned} {\text{Sig}}_{\mathrm{IMP}}(a, A)=\text{IMP}(A\cup \{a\})-\text{IMP}(A). \end{aligned}$$
(10)
[Algorithm: Heuristic computation of an IMP-based reduct]
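The concrete reduct-computation algorithm is given as a figure in the original paper; the sketch below only illustrates a greedy search guided by \({\text{Sig}}_{\mathrm{IMP}}\), which keeps adding the most significant attribute until the first condition of Definition 7 is met. The callable `imp` stands in for IMP computed from data (for example, by the sketch after Example 1); all names are hypothetical.

```python
def imp_reduct(attrs_all, imp):
    """Greedy forward selection guided by Sig_IMP, Eq. (10); `imp` maps an
    attribute subset to its IMP value (a sketch of the IMP-based algorithm)."""
    attrs_all = set(attrs_all)
    target = imp(attrs_all)
    A = set()
    while A != attrs_all:
        # attribute with the largest Sig_IMP(a, A) = IMP(A ∪ {a}) - IMP(A)
        best = max(attrs_all - A, key=lambda a: imp(A | {a}))
        A.add(best)
        if imp(A) >= target:            # first condition of Definition 7
            break
    return A

# toy usage with a made-up importance function standing in for IMP
toy_imp = lambda s: len(s) / 4.0
print(imp_reduct(range(4), toy_imp))    # keeps adding attributes until IMP(A) >= IMP(AT)
```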

4 Experiments

4.1 Datasets

To evaluate the various performances of our CIMR (the heuristic algorithm that computes reducts based on IMP), 12 real-world datasets from the UCI machine learning repository have been employed in the following experiments. Table 2 summarizes detailed statistics of the datasets used in our experiments.

Table 2 Characteristics of the experimental datasets

4.2 Experimental setup

All the experiments have been carried out on a personal computer with Windows 7, an Intel Core i5-3337U CPU (1.80 GHz) and 4.00 GB memory. The programming language is MATLAB R2014a.

In our experiments, tenfold cross-validation is employed to evaluate the effectiveness of the different methods. For each training set, we randomly divide the samples into two groups (labeled and unlabeled samples) with ratios of 7:3, 5:5 and 3:7, i.e., three different ratios (30%, 50% and 70%) of missing labels. Moreover, each complete training set without missing labels (0% of missing labels) is retained for CAQR (the heuristic algorithm that computes reducts based on approximate quality). Ten different neighborhood radii are then used: \(\sigma = 0.03, 0.06,\ldots , 0.3.\)
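Purely for illustration (the paper's experiments were carried out in MATLAB), the label-masking step for one training fold might be sketched as follows; the function and parameter names are our own.

```python
import numpy as np

def mask_labels(n_train, missing_ratio, seed=0):
    """Randomly split a training fold into labeled / unlabeled index sets;
    e.g. missing_ratio = 0.3 corresponds to the 7:3 labeled-to-unlabeled split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_train)
    n_unlabeled = int(round(missing_ratio * n_train))
    return idx[n_unlabeled:], idx[:n_unlabeled]      # labeled indices, unlabeled indices

labeled_idx, unlabeled_idx = mask_labels(100, missing_ratio=0.3)
print(len(labeled_idx), len(unlabeled_idx))          # 70 30
```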

To set \(\alpha\) and \(\beta\) for a better performance of our approach, we conduct CIMR with different settings beforehand. Tenfold cross-validation is also employed, and CIMR is performed over each training set. By executing the derived reducts over each test set, the NEC-based classification accuracies are compared. Note that for each setting of \(\alpha\) and \(\beta\), with the three ratios of missing labels, three reducts can be derived by CIMR from each training set. Correspondingly, three groups of classification accuracies can be obtained from each test set, and the averages of these classification accuracies are compared as shown in Table 3. It is not difficult to observe that if \(\alpha\) and \(\beta\) are set to 0.8 and 0.2, then CIMR may be more effective. From this point of view, we set \(\alpha\) and \(\beta\) to 0.8 and 0.2 in our experiments.

Table 3 Classification accuracies among different settings of \(\alpha\) and \(\beta\) (greater values are in bold)

4.3 Experimental results and analyses

In the following, with respect to the three ratios (30%, 50% and 70%) of missing labels, the reducts derived by CAQR are denoted as 30%-AQR, 50%-AQR and 70%-AQR, respectively. Similarly, the reducts obtained by CIMR are denoted as 30%-IMR, 50%-IMR and 70%-IMR, respectively. Note that CAQR is only executed over the labeled samples, and the reduct derived from the complete data with 0% of missing labels is denoted as 0%-AQR.

In our experiments, the lengths, values of approximate quality, values of conditional entropy and NEC-based classification accuracies derived from the reducts are compared. The detailed experimental results are shown in the following.

From an investigation of Table 4, it is not difficult to observe: (1) with the same ratios of missing labels, IMRs are longer than AQRs; (2) even if CAQR is executed over the complete data, the lengths of the 0%-AQRs are still smaller. This is mainly because both the value of approximate quality derived from the labeled samples and the neighborhood relation obtained from the unlabeled samples must be considered simultaneously, i.e., the constraint used in CIMR is stricter than that in CAQR.

Table 4 Comparisons among lengths of reducts (greater values are in bold)
Fig. 2 Comparisons among values of approximate quality derived by reducts

Fig. 3 Comparisons among the values of conditional entropy derived by reducts

Fig. 4 Comparisons among classification accuracies derived by reducts

By Fig. 2, it is not difficult to observe the following.

1. In general, the values of approximate quality derived by IMRs are higher than or equal to those derived by AQRs. As shown in Table 4, IMRs are longer than AQRs. Therefore, we may conclude that a reduct with more attributes may bring a higher value of approximate quality.

2. If the radius is greater, then the derived value of approximate quality tends to be lower. This may be because, with a greater radius, the value of approximate quality derived from the raw data tends to be lower, which leads to looser constraints in both CAQR and CIMR. It follows that the corresponding reducts may offer worse performance in improving the value of approximate quality.

By Fig. 3, the values of conditional entropy derived by IMRs are lower than or equal to those derived by AQRs. Combined with Table 4, we may conclude that a reduct with more attributes may bring a lower value of conditional entropy. Moreover, similar to the results related to the values of approximate quality, it is easy to see that if the used radius is greater, then the value of conditional entropy derived by a reduct tends to be higher.

By Fig. 4, we can observe that, with the same ratios of missing labels, the classification accuracies derived by IMRs are higher than or equal to those derived by AQRs. This may be because, in the process of computing reducts, CAQR ignores the unlabeled samples, so that more samples are used in CIMR. Accordingly, if the complete data are used, then the classification accuracies derived by the 0%-AQRs may be higher.

From a thorough investigation of Figs. 2, 3 and 4, we can observe that, from the viewpoint of the used ratio, when the ratio of missing labels is 30%, the 30%-IMRs offer the best performances. This may imply that although we have found an effective algorithm for dealing with partially labeled data, if the ratio of missing labels is too high, then our algorithm becomes much less effective.

4.4 Statistics comparisons among the results

In the following, to further analyze the previous results from a statistical viewpoint, one-way analysis of variance (one-way ANOVA) (Fisher 1921) is employed to compare the results.

One-way ANOVA can be used to compare the averages of two or more groups of data. In the context of this paper, it is used to test whether the performances of two different types of reducts differ significantly. For example, for a missing-label ratio of 30%, with 10 different neighborhood radii, 10 different 30%-IMRs can be derived, and the corresponding 10 values of approximate quality can be collected into one group. Similarly, the 10 values of approximate quality in terms of the 10 different 30%-AQRs can be collected into the other group. Immediately, the difference between the values of approximate quality in these two groups can be tested by one-way ANOVA.

Note that the p value returned by one-way ANOVA is computed under the null hypothesis that the groups are drawn from populations with the same average. If the p value is higher than the default 5% significance level (0.05), it indicates that the averages of the groups are similar; otherwise, the averages of the groups are significantly different. The detailed p values are shown in Tables 5, 6 and 7.
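Such a comparison can be reproduced with any standard statistics package; for illustration, the sketch below applies SciPy's `f_oneway` to two made-up groups of ten values (the numbers are not results from the paper).

```python
from scipy import stats

# ten values of approximate quality per group, one per neighborhood radius
# (illustrative numbers only, not results from the paper)
imr_30 = [0.91, 0.89, 0.88, 0.87, 0.85, 0.84, 0.82, 0.80, 0.79, 0.77]
aqr_30 = [0.90, 0.88, 0.87, 0.85, 0.84, 0.82, 0.81, 0.79, 0.77, 0.75]

f_stat, p_value = stats.f_oneway(imr_30, aqr_30)
print(p_value > 0.05)   # True: the group averages are not significantly different
```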

Table 5 p values of one-way ANOVA for comparing the values of approximate quality derived by reducts
Table 6 p values of one-way ANOVA for comparing the values of conditional entropy derived by reducts
Table 7 p values of one-way ANOVA for comparing classification accuracies of reducts

Following the results shown in Tables 5, 6 and 7, it is not difficult to observe that in most cases the returned p values are higher than 0.05. Therefore, we may conclude that the performances of the IMRs derived by our approach are not worse than those of the AQRs.

5 Conclusions and future perspectives

In this paper, we have designed a new algorithm for finding reducts over partially labeled data. The main contributions of this paper are: (1) a new importance measure for evaluating attributes is proposed, and this importance can be used to search for suitable attributes over partially labeled data; (2) through various comparative experiments, the final results indicate that our approach is effective in handling partially labeled data. The following topics deserve our further investigation.

1. We have realized our algorithm only with the concept of approximate quality in this paper; some other measures, such as conditional entropy and the neighborhood decision error rate, will be further applied.

2. Attribute reduction can be considered as a preliminary step of data processing; the classification performances of different classifiers based on our reducts will be further explored.