Abstract
Rough set theory is an effective supervised learning model for labeled data. In practice, however, many problems involve both labeled and unlabeled data. This paper studies attribute reduction for partially labeled data and proposes a novel semi-supervised attribute reduction algorithm based on co-training, which exploits the unlabeled data to improve the quality of attribute reducts computed from few labeled data. The algorithm first computes two diverse reducts of the labeled data and uses them to train two base classifiers, which are then co-trained iteratively. In each round, the base classifiers learn from each other on the unlabeled data and enlarge the labeled set, so that better reducts can be computed from the enlarged labeled data and used to construct base classifiers of higher performance. Experimental results on UCI data sets show that the proposed algorithm improves reduct quality.
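The co-training loop described in the abstract can be sketched as follows. This is a hypothetical minimal illustration, not the authors' algorithm: simple nearest-centroid classifiers stand in for the reduct-based base classifiers, and two hand-picked feature subsets (`views`) stand in for the two diverse attribute reducts; in each round, each classifier labels the unlabeled example it is most confident about, and the newly labeled examples enlarge the labeled set for the next round.

```python
import math

def centroid_classifier(rows, labels, view):
    """Fit a nearest-centroid classifier on the feature subset `view`.

    Returns a function mapping an example to (label, confidence),
    where confidence is the negated distance to the nearest centroid.
    """
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        v = [x[i] for i in view]
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(view), 0
        sums[y] = [s + f for s, f in zip(sums[y], v)]
        counts[y] += 1
    cents = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        v = [x[i] for i in view]
        best = min(cents, key=lambda y: math.dist(cents[y], v))
        return best, -math.dist(cents[best], v)
    return predict

def co_train(labeled, labels, unlabeled, views, rounds=3):
    """Iteratively enlarge the labeled set using two feature views."""
    labeled, labels, unlabeled = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Train one base classifier per view on the current labeled set.
        preds = []
        for view in views:
            clf = centroid_classifier(labeled, labels, view)
            preds.append([clf(x) for x in unlabeled])
        # Each classifier nominates the example it is most confident about.
        newly = {}
        for p in preds:
            i = max(range(len(unlabeled)), key=lambda j: p[j][1])
            newly.setdefault(i, p[i][0])
        # Move the nominated examples into the labeled set
        # (pop indices in descending order so they stay valid).
        for i in sorted(newly, reverse=True):
            labeled.append(unlabeled.pop(i))
            labels.append(newly[i])
    return labeled, labels
```

In the paper's setting, the two views would instead be two diverse reducts of the decision table, and the base classifiers would be rebuilt from reducts recomputed on the enlarged labeled data after each round.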
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Zhang, W., Miao, D., Gao, C., Yue, X. (2014). Co-training Based Attribute Reduction for Partially Labeled Data. In: Miao, D., Pedrycz, W., Ślȩzak, D., Peters, G., Hu, Q., Wang, R. (eds.) Rough Sets and Knowledge Technology. RSKT 2014. Lecture Notes in Computer Science, vol. 8818. Springer, Cham. https://doi.org/10.1007/978-3-319-11740-9_8
DOI: https://doi.org/10.1007/978-3-319-11740-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11739-3
Online ISBN: 978-3-319-11740-9
eBook Packages: Computer Science (R0)