Early Detection of Possible Undergraduate Drop Out Using a New Method Based on Probabilistic Rough Set Theory

Ramentol, Enislay; Madera, Julio; Rodríguez, Abdel

doi:10.1007/978-3-030-10463-4_12


Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 377)


Abstract

For any educational project, it is important and challenging to know, at the moment of enrollment, whether a given student is likely to pass the academic year. The task is far from simple because many factors contribute to college failure. Inferring how likely an enrolled student is to have promotion problems is an interesting challenge for both data mining and education. In this paper, we propose the use of data mining techniques to predict how likely a student is to succeed in the academic year. Normally, more students succeed than fail, which results in an imbalanced data representation. To cope with imbalanced data, we introduce a new algorithm based on probabilistic Rough Set Theory (RST). Two ideas are introduced. The first is the use of two different threshold values for the similarity between objects, depending on whether minority or majority examples are involved. The second combines the original distribution of the data with the probabilities predicted by the RST method. Our experimental analysis shows that we obtain better results than a range of state-of-the-art algorithms.
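The two ideas in the abstract can be illustrated with a small sketch. This is not the chapter's algorithm, whose details are not given here; it is a toy reading under stated assumptions. The similarity measure, the function names (`similarity`, `similarity_class`, `predict_proba`), the threshold values, and the convex blend controlled by `weight` are all hypothetical choices made for illustration.

```python
from collections import Counter


def similarity(a, b):
    """Assumed similarity in [0, 1] for numeric features already scaled to [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity_class(x, data, labels, thresholds):
    """Indices of training objects similar to x.

    Idea 1 (as assumed here): the threshold that a training object must
    reach depends on that object's class, so minority objects can be
    admitted more easily than majority objects.
    """
    return [
        i
        for i, (row, label) in enumerate(zip(data, labels))
        if similarity(x, row) >= thresholds[label]
    ]


def predict_proba(x, data, labels, thresholds, weight=0.5):
    """Blend class frequencies inside the similarity class with class priors.

    Idea 2 (as assumed here): a convex combination of the rough-set style
    estimate and the original class distribution. The blend rule is a
    guess; the abstract does not state how the two are combined.
    """
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    neighbours = similarity_class(x, data, labels, thresholds)
    if not neighbours:
        # No similar objects: fall back to the original distribution.
        return priors
    counts = Counter(labels[i] for i in neighbours)
    rough = {c: counts.get(c, 0) / len(neighbours) for c in priors}
    return {c: weight * rough[c] + (1 - weight) * priors[c] for c in priors}


# Toy data: three majority ("pass") students and one minority ("fail") student.
data = [[0.1, 0.1], [0.2, 0.2], [0.15, 0.1], [0.9, 0.9]]
labels = ["pass", "pass", "pass", "fail"]

# Hypothetical thresholds: strict for the majority, looser for the minority.
two_thresholds = {"pass": 0.9, "fail": 0.6}
one_threshold = {"pass": 0.9, "fail": 0.9}

new_student = [0.7, 0.7]
with_two = predict_proba(new_student, data, labels, two_thresholds)
with_one = predict_proba(new_student, data, labels, one_threshold)
```

On this toy data, the looser minority threshold lets the single "fail" example enter the similarity class of the new student, so the estimated probability of failing rises above the prior. With one strict threshold for both classes the similarity class is empty and the estimate falls back to the class priors.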



Author information

Authors and Affiliations

Research Institute of Sweden RISE SICS Västerås AB, Stora Gatan 36, SE-722 12, Västerås, Sweden
Enislay Ramentol, Julio Madera & Abdel Rodríguez


Corresponding author

Correspondence to Julio Madera.

Editor information

Editors and Affiliations

Department of Computer Science, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara, Cuba
Rafael Bello
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
Rafael Falcon
Department of Computer Science and Artificial Intelligence, Technical School of Informatics and Telecommunications Engineering, University of Granada, Granada, Spain
José Luis Verdegay


About this chapter

Cite this chapter

Ramentol, E., Madera, J., Rodríguez, A. (2019). Early Detection of Possible Undergraduate Drop Out Using a New Method Based on Probabilistic Rough Set Theory. In: Bello, R., Falcon, R., Verdegay, J. (eds) Uncertainty Management with Fuzzy and Rough Sets. Studies in Fuzziness and Soft Computing, vol 377. Springer, Cham. https://doi.org/10.1007/978-3-030-10463-4_12


DOI: https://doi.org/10.1007/978-3-030-10463-4_12
Published: 23 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10462-7
Online ISBN: 978-3-030-10463-4
eBook Packages: Intelligent Technologies and Robotics (R0)
