Early Detection of Possible Undergraduate Drop Out Using a New Method Based on Probabilistic Rough Set Theory

Uncertainty Management with Fuzzy and Rough Sets

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 377)

Abstract

For any educational project, it is important and challenging to know, at the moment of enrollment, whether a given student is likely to pass the academic year. The task is far from simple, because many factors contribute to college failure. Inferring how likely an enrolled student is to have promotion problems is an interesting challenge for both data mining and education. In this paper, we propose the use of data mining techniques to predict how likely a student is to succeed in the academic year. Normally, more students succeed than fail, which results in an imbalanced class distribution. To cope with imbalanced data, we introduce a new algorithm based on probabilistic Rough Set Theory (RST). Two ideas are introduced. The first is the use of two different threshold values for the similarity between objects, one for minority examples and one for majority examples. The second combines the original distribution of the data with the probabilities predicted by the RST method. Our experimental analysis shows that we obtain better results than a range of state-of-the-art algorithms.
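The two ideas in the abstract can be illustrated with a small Python sketch. It is not the chapter's algorithm, which the abstract does not specify. The similarity function, the threshold values, the toy data and the rule used to combine predicted probabilities with the class distribution (dividing by the class prior and renormalising) are all assumptions made for illustration, and every name is hypothetical.

```python
from collections import Counter

def similarity(a, b):
    # Assumed similarity in [0, 1] for numeric features already scaled to [0, 1].
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def class_priors(y):
    # Original class distribution of the training labels.
    counts = Counter(y)
    return {c: n / len(y) for c, n in counts.items()}

def class_probabilities(query, X, y, thresholds):
    # Rough-set style estimate: the share of each class among the training
    # objects similar to the query. A training object counts as similar when
    # its similarity reaches the threshold assigned to its own class, so
    # minority and majority examples can use different thresholds.
    counts = Counter()
    for features, label in zip(X, y):
        if similarity(query, features) >= thresholds[label]:
            counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        # No similar object found: fall back to the class distribution.
        return class_priors(y)
    return {c: counts[c] / total for c in sorted(set(y))}

def adjust_with_priors(probs, priors):
    # Assumed combination rule: divide each probability by its class prior
    # and renormalise, which gives more weight to the rarer class.
    weighted = {c: probs[c] / priors[c] for c in probs}
    norm = sum(weighted.values())
    return {c: w / norm for c, w in weighted.items()}

# Toy data: four majority ("pass") students and one minority ("fail") student.
X = [[0.10], [0.20], [0.30], [0.40], [0.90]]
y = ["pass", "pass", "pass", "pass", "fail"]

# A looser threshold for the minority class (values chosen arbitrarily).
thresholds = {"pass": 0.85, "fail": 0.55}

probs = class_probabilities([0.5], X, y, thresholds)
adjusted = adjust_with_priors(probs, class_priors(y))
print(probs)
print(adjusted)
```

In this sketch the looser minority threshold lets the single "fail" example enter the neighbourhood of the query, and the prior adjustment then shifts probability toward the rarer class. Whether the chapter uses either mechanism in this form cannot be determined from the abstract.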

Corresponding author

Correspondence to Julio Madera.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ramentol, E., Madera, J., Rodríguez, A. (2019). Early Detection of Possible Undergraduate Drop Out Using a New Method Based on Probabilistic Rough Set Theory. In: Bello, R., Falcon, R., Verdegay, J. (eds) Uncertainty Management with Fuzzy and Rough Sets. Studies in Fuzziness and Soft Computing, vol 377. Springer, Cham. https://doi.org/10.1007/978-3-030-10463-4_12
