Abstract
Learning models used for prediction are mostly developed without considering what size of dataset produces models of high accuracy and good performance. The general belief is that a large dataset is needed to construct a predictive learning model. Whether a dataset counts as large, however, depends on the circumstances and context of the prediction, so the boundary between a big and a small dataset remains open to debate. In this paper, the ability of a predictive model to adapt to a particular size of training data is examined. The study experiments on three different sizes of Moroccan agricultural data, using a variety of statistical and machine learning techniques to create predictive models, with a view to establishing whether the size of the data has any effect on the accuracy of a model. The output of each model is measured using the Mean Absolute Error (MAE) and R-squared, and the results are compared. Training the models on the three dataset partitions shows that the models trained with the smallest and the largest training sets appear to be less accurate, while the models trained with the medium-sized dataset deliver much better results.
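The evaluation protocol described in the abstract (train on several dataset sizes, then compare MAE and R-squared) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic rather than the Moroccan agricultural data, a least-squares line stands in for the statistical and machine learning techniques used in the study, and the training sizes (10, 40, 100) and the helper names `fit_line` and `evaluate_by_size` are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate_by_size(xs, ys, train_sizes, n_test):
    """Train on the first `size` points for each size; score on the last `n_test`."""
    x_test, y_test = xs[-n_test:], ys[-n_test:]
    results = {}
    for size in train_sizes:
        slope, intercept = fit_line(xs[:size], ys[:size])
        preds = [slope * x + intercept for x in x_test]
        results[size] = (mean_absolute_error(y_test, preds),
                         r_squared(y_test, preds))
    return results


# Synthetic univariate data (assumption: not the dataset used in the paper).
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(120)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

# Hypothetical small, medium and large training sizes; the last 20 points are held out.
results = evaluate_by_size(xs, ys, train_sizes=[10, 40, 100], n_test=20)
for size, (mae, r2) in results.items():
    print(f"train size {size:3d}: MAE = {mae:.3f}, R^2 = {r2:.3f}")
```

Scoring every training size on the same held-out points keeps the comparison fair, since differences in MAE and R-squared then reflect the training size rather than the test sample. The sketch makes no claim about which size performs best on real data.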
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ed-daoudi, R., Alaoui, A., Zerouaoui, J., Ettaki, B., Zerouaoui, J. (2023). Evaluating the Impact of Dataset Size on Univariate Prediction Techniques for Moroccan Agriculture. In: Farhaoui, Y., Rocha, A., Brahmia, Z., Bhushab, B. (eds) Artificial Intelligence and Smart Environment. ICAISE 2022. Lecture Notes in Networks and Systems, vol 635. Springer, Cham. https://doi.org/10.1007/978-3-031-26254-8_57
DOI: https://doi.org/10.1007/978-3-031-26254-8_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26253-1
Online ISBN: 978-3-031-26254-8
eBook Packages: Intelligent Technologies and Robotics (R0)