Abstract
For any educational project, it is both important and challenging to know, at the moment of enrollment, whether a given student is likely to pass the academic year. The task is far from simple because many factors contribute to college failure. Inferring how likely an enrolled student is to have difficulty being promoted to the next academic year is undoubtedly an interesting challenge for the fields of data mining and education. In this paper, we propose the use of data mining techniques to predict how likely a student is to succeed in the academic year. Normally, more students succeed than fail, which results in an imbalanced data representation. To cope with imbalanced data, we introduce a new algorithm based on probabilistic Rough Set Theory (RST) that builds on two ideas. The first is the use of two different threshold values for the similarity between objects, depending on whether the examples belong to the minority or the majority class. The second combines the original distribution of the data with the probabilities predicted by the RST method. Our experimental analysis shows that the proposed method obtains better results than a range of state-of-the-art algorithms.
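The two ideas above can be illustrated with a small sketch. It is not the authors' algorithm: the similarity function (1 minus the mean absolute difference over features scaled to [0, 1]), the threshold values `eps_min` and `eps_maj`, the names `similarity` and `predict_proba`, and the rule that combines the RST probability with the class distribution (dividing by the class prior and renormalising, a standard prior-correction trick used here as a stand-in) are all hypothetical choices made for illustration.

```python
from collections import Counter


def similarity(a, b):
    """Hypothetical similarity: 1 minus the mean absolute feature difference (features in [0, 1])."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def predict_proba(query, train_X, train_y, minority, eps_min=0.6, eps_maj=0.8):
    """Sketch of a probabilistic rough-set style prediction with two similarity thresholds.

    Idea 1 (assumed form): a training object joins the query's similarity class if its
    similarity reaches eps_min when it is a minority example, or eps_maj otherwise.
    Idea 2 (assumed form): the RST probabilities are divided by the class priors
    taken from the original data distribution and then renormalised.
    """
    classes = sorted(set(train_y))
    priors = {c: n / len(train_y) for c, n in Counter(train_y).items()}

    # Similarity class of the query, using a class-dependent threshold.
    neighbours = [
        label
        for x, label in zip(train_X, train_y)
        if similarity(query, x) >= (eps_min if label == minority else eps_maj)
    ]
    if not neighbours:
        # Empty similarity class: no evidence, so return a uniform distribution.
        return {c: 1.0 / len(classes) for c in classes}

    # Rough-set conditional probability P(c | similarity class of the query).
    counts = Counter(neighbours)
    p_rst = {c: counts[c] / len(neighbours) for c in classes}

    # Hypothetical combination with the original class distribution.
    adjusted = {c: p_rst[c] / priors[c] for c in classes}
    total = sum(adjusted.values())
    return {c: v / total for c, v in adjusted.items()}


# Toy data (invented for illustration): four "pass" students, two "fail" students.
train_X = [[0.9, 0.9], [0.8, 0.9], [0.9, 0.8], [0.7, 0.8], [0.3, 0.2], [0.2, 0.3]]
train_y = ["pass", "pass", "pass", "pass", "fail", "fail"]

p = predict_proba([0.6, 0.6], train_X, train_y, minority="fail")
print(p)
```

In this toy run the query `[0.6, 0.6]` receives a "fail" probability of about 0.8. If a single threshold of 0.8 is used for both classes, the two "fail" examples drop out of its similarity class and the "fail" probability becomes 0, which shows why a looser threshold for the minority class can help on imbalanced data.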
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ramentol, E., Madera, J., Rodríguez, A. (2019). Early Detection of Possible Undergraduate Drop Out Using a New Method Based on Probabilistic Rough Set Theory. In: Bello, R., Falcon, R., Verdegay, J. (eds) Uncertainty Management with Fuzzy and Rough Sets. Studies in Fuzziness and Soft Computing, vol 377. Springer, Cham. https://doi.org/10.1007/978-3-030-10463-4_12
DOI: https://doi.org/10.1007/978-3-030-10463-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10462-7
Online ISBN: 978-3-030-10463-4
eBook Packages: Intelligent Technologies and Robotics (R0)