Evaluating the Impact of Dataset Size on Univariate Prediction Techniques for Moroccan Agriculture

  • Conference paper
Artificial Intelligence and Smart Environment (ICAISE 2022)

Abstract

Predictive learning models are mostly developed without considering what dataset size yields high accuracy and good performance, although the general belief is that a large dataset is needed to construct such a model. Whether a dataset counts as large depends on the circumstances and context of the prediction, so what makes a dataset big or small is open to debate. This paper examines the ability of a predictive model to adapt to a particular size of training data. The study experiments on three different sizes of Moroccan agricultural data, using a variety of statistical and machine learning techniques to create predictive models, with a view to establishing whether the size of the data has any effect on the accuracy of a model. The output of each model is measured using the Mean Absolute Error (MAE) and R-squared, and comparisons are made. The results of training the models on the three dataset partitions show that the models trained with the smallest and largest training sets appear to be less accurate, while the models trained with the medium-sized dataset deliver much better results.
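The evaluation protocol described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a synthetic yearly series in place of the Moroccan agricultural data, a simple least-squares trend line in place of the statistical and machine learning models, and hypothetical training sizes of 10, 25 and 50 points. All function names are illustrative. It shows how MAE and R-squared can be computed on a fixed hold-out set while only the amount of training data changes.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear_trend(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def evaluate_by_training_size(xs, ys, train_sizes, test_size):
    """Hold out the last `test_size` points, train on the most recent
    `size` points before them, and report (MAE, R^2) for each size."""
    x_test, y_test = xs[-test_size:], ys[-test_size:]
    x_pool, y_pool = xs[:-test_size], ys[:-test_size]
    results = {}
    for size in train_sizes:
        intercept, slope = fit_linear_trend(x_pool[-size:], y_pool[-size:])
        preds = [intercept + slope * x for x in x_test]
        results[size] = (
            mean_absolute_error(y_test, preds),
            r_squared(y_test, preds),
        )
    return results


# Synthetic stand-in data (an assumption): linear trend plus Gaussian noise.
rng = random.Random(0)
years = list(range(60))
yields = [10.0 + 0.5 * t + rng.gauss(0.0, 2.0) for t in years]

results = evaluate_by_training_size(
    years, yields, train_sizes=[10, 25, 50], test_size=10
)
for size, (mae, r2) in results.items():
    print(f"train size {size:>2}: MAE={mae:.3f}  R^2={r2:.3f}")
```

Keeping the hold-out set fixed across training sizes means that differences in MAE and R-squared reflect the amount of training data and not a change in the test points. On this synthetic series the ranking of sizes need not match the paper's finding.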

Author information

Corresponding author

Correspondence to Rachid Ed-daoudi.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ed-daoudi, R., Alaoui, A., Zerouaoui, J., Ettaki, B., Zerouaoui, J. (2023). Evaluating the Impact of Dataset Size on Univariate Prediction Techniques for Moroccan Agriculture. In: Farhaoui, Y., Rocha, A., Brahmia, Z., Bhushab, B. (eds) Artificial Intelligence and Smart Environment. ICAISE 2022. Lecture Notes in Networks and Systems, vol 635. Springer, Cham. https://doi.org/10.1007/978-3-031-26254-8_57
