Introduction

Rainfall is the oldest and most commonly recorded climate variable and a valuable indicator for studying climate change, water resource management, irrigation scheduling, flood prevention, and the design of hydraulic structures (Tabari and Talaee 2011; Tabari et al. 2012; Kebede et al. 2014; Nkiaka et al. 2016; Melanie and Maria 2018). In addition, the correct estimation of hydrological events such as floods and low flows relies on frequency analysis to predict the rainfall corresponding to given return periods T (quantiles) (Karlsson et al. 2016).

Many hydrological applications rely on knowledge of these events. Unfortunately, rainfall data remain limited in both time and space, which does not always yield reliable estimates (Cantat 2004). Such studies should be based on series free of missing data and heterogeneity (Bigot 2002; Faizah et al. 2016). Since no dataset is perfectly reliable and continuous, some uncertainty will always remain (Cantat 2004). But for series with gaps, how should missing values be reconstructed, and how reliable are the reconstituted series?

Missing data are a common problem in most areas of scientific research and remain a major issue in hydrology and climatology. Gaps may result from various human and material causes. These errors are critical because they affect the continuity of precipitation records and ultimately influence the results of hydrological models that use precipitation as input (Lee and Kang 2015). The problem is more widespread in developing countries than in developed ones, and particularly in Algeria, owing to various causes: (i) frequent failures of measuring equipment, (ii) the permanent closure of some rain gauge stations, and (iii) gaps at the daily or monthly scale, which in turn lead to gaps at the annual scale. The treatment of missing data is therefore an important task in designing hydrological models (Dastorani et al. 2010; Ouarda et al. 2008).

Rubin (1976) defined missing data according to three mechanisms: data are missing completely at random (MCAR) when the probability that an instance (case) has a missing value for a variable depends neither on the observed values nor on the missing data. Data are missing at random (MAR) when that probability may depend on the observed values but not on the value of the missing data itself. Data are missing not at random (MNAR) when the probability that an instance has a missing value for a variable may depend on the value of that variable (Little and Rubin 2002).

Missing data may affect the properties of statistical estimators such as means, variances, or percentages, which leads to a loss of power and seriously misleading conclusions, especially for the prediction of extreme events and quantiles (El Methni 2013). A variety of techniques have been proposed to replace missing values with statistical predictions; this process is usually called “imputation of missing data” (Little and Rubin 2002; Audigier et al. 2015).

Various techniques have been used to estimate missing data, mainly simple imputation and multiple imputation (Presti et al. 2010; Audigier et al. 2016).

The first solutions proposed by researchers to handle the problem of missing data were simple imputation methods (Audigier et al. 2015). Their drawback is that no distinction is made between observed and imputed data. In 1977, Donald Rubin proposed the idea of multiple imputation, and the first theoretical work on it was published in 1987 (Little and Rubin 1987). Since 2005, the scientific community has accepted multiple imputation (Van Buuren 2012), and the number of publications on the subject has grown rapidly. Multiple imputation methods are now numerous and differ mainly in the imputation models they use (Sattari et al. 2017). Most published articles focus on developing new imputation methods (Brock et al. 2008; Luengo et al. 2012), but few studies address the effect of imputing rainfall series on the estimated quantiles.

In this study, we compared and evaluated four variants of simple imputation based on principal component analysis (PCA): probabilistic PCA (PPCA), expectation maximization PCA (EMPCA), regularized PCA (RPCA), and singular value decomposition PCA (SVDPCA), according to four evaluation criteria: root mean square error (RMSE), mean absolute error (MAE), quadratic error (EQR), and correlation coefficient (CC). The objective is not to apply a statistical method to an incomplete table but to evaluate the properties of the four simple imputation methods based on principal component analysis. We therefore focused on the quality of the prediction of missing data and its effect on the quantiles.

Study area and data

Study area

The study area covers the whole northern extent of Algeria, approximately between 34° N and 38° N latitude and between 2° W and 8° E longitude, and spreads over 15 watersheds (Fig. 1) characterized by different climates. Northern Algeria has a Mediterranean climate, with a cold, rainy winter and a hot, dry summer. The mean annual rainfall is 436 mm in the west (Tlemcen), 648 mm in the center (Dar El Beida), 512 mm in the east (Constantine), and 1000 mm on the coast (Jijel).

Fig. 1
figure 1

The stations in the study area

Data

Annual rainfall series from 30 stations, with a record length of 69 years (1936/1937–2004/2005), were obtained from the National Meteorological Office (NMO) and the National Water Resources Agency (NWRH). This period is the maximum common period of recorded precipitation data. Information about the stations is presented in Tables 1 and 2, and the geographical locations of the stations are shown in Fig. 1.

Table 1 Ranges of variables considered in study
Table 2 Geographic characteristics of the selected rainfall stations in Northern Algeria

Methods

The data of the 30 rainfall stations over the 69-year study period were used to generate and impute gaps under the missing completely at random (MCAR) hypothesis, using the missMDA package of the free R software (Josse and Husson 2016).

The R software provides a powerful and comprehensive system for analyzing data; used in conjunction with the R Commander graphical user interface (commonly known as Rcmdr), it is also easy and intuitive to use (Suzuki and Shimodaira 2006).

Gap generation and principle of the analysis

First, gaps were generated missing completely at random (MCAR) with the prodNA algorithm of the missForest package (Stekhoven and Bühlmann 2011), at percentages of 10, 20, 30, and 40% of the observed data, hereafter called the reference data. Starting from the original datasets (without missing values), we thus introduced varying percentages of missing values (from 10 to 40%) under the MCAR assumption. These simulated missing values were imputed using the four methods; the four evaluation criteria (RMSE, MAE, EQR, and CC) were then computed, and the difference between the imputed values and the original true values was evaluated.
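The MCAR gap-generation step can be sketched as follows; `prod_na` is an illustrative Python re-implementation of what missForest's prodNA does (function and variable names are ours, and the rainfall table is a synthetic stand-in, not the study's data):

```python
import numpy as np

def prod_na(x, frac, rng=None):
    """Blank out a fraction of entries completely at random (MCAR).
    Illustrative stand-in for missForest's prodNA; names are assumptions."""
    rng = np.random.default_rng(rng)
    x = x.astype(float).copy()
    n_missing = int(round(frac * x.size))
    # draw flat indices without replacement and set them to NaN
    idx = rng.choice(x.size, size=n_missing, replace=False)
    x.flat[idx] = np.nan
    return x

# synthetic stand-in for the 69-year x 30-station annual rainfall table
rain = np.random.default_rng(0).gamma(shape=4.0, scale=150.0, size=(69, 30))
for frac in (0.10, 0.20, 0.30, 0.40):
    gapped = prod_na(rain, frac, rng=1)
    print(f"{frac:.0%} requested -> {np.isnan(gapped).mean():.0%} missing")
```

Because the deletion probability is independent of both the observed and the missing values, this procedure satisfies the MCAR mechanism defined above.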

Imputation

Four PCA simple imputation methods were selected to cover techniques widely applied in the literature and representative of various statistical strategies.

Expectation maximization PCA

EM is a general algorithmic approach for fitting latent variable models (including mixtures), popular in large part because it is typically highly scalable and easy to implement (Lin 2010).

Probabilistic PCA

PPCA combines an EM approach to PCA with a probabilistic model, based on the assumption that the latent variables as well as the noise are normally distributed. In standard PCA, data far from the training set but close to the principal subspace may have the same reconstruction error. PPCA defines a likelihood function such that the likelihood of data far from the training set is much lower, even when close to the principal subspace, which improves the estimation accuracy. PPCA tolerates about 10 to 15% missing values; beyond that, the algorithm is likely not to converge to a reasonable solution (Stacklies and Redestig 2017).

Regularized PCA

Regularized PCA is based on the regularized iterative algorithm, which yields a point estimate of the parameters and overcomes the major problem of overfitting (Josse et al. 2012).
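The core step of the regularized iterative algorithm can be illustrated as follows: instead of reconstructing the data from the raw k leading singular values, each retained singular value is shrunk toward zero in proportion to an estimated noise variance. This is a minimal sketch; the noise-variance estimator below is a simplification (assumption) relative to the estimator in Josse et al. (2012):

```python
import numpy as np

def regularized_reconstruction(x_filled, k):
    """Rank-k reconstruction with shrunk singular values, the key step of
    regularized iterative PCA. The noise-variance estimate below is a
    simplified assumption, not the published estimator."""
    u, s, vt = np.linalg.svd(x_filled, full_matrices=False)
    # noise variance estimated from the discarded trailing singular values
    sigma2 = np.mean(s[k:] ** 2) / x_filled.shape[1] if s.size > k else 0.0
    # shrink each retained singular value toward zero (never below zero)
    s_shrunk = np.maximum((s[:k] ** 2 - sigma2) / s[:k], 0.0)
    return (u[:, :k] * s_shrunk) @ vt[:k]
```

Within the iterative algorithm, this damped reconstruction replaces the missing entries at each step, which keeps the imputation from fitting noise when the signal-to-noise ratio is low.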

Singular value decomposition PCA

This implements the SVDimpute algorithm proposed by Troyanskaya et al. (2001). The idea is to estimate the missing values as a linear combination of the k most significant eigengenes. The algorithm works iteratively until the change in the estimated solution falls below a given threshold. At each step, the eigengenes of the current estimate are calculated and used to determine a new estimate. An optimal linear combination is found by regressing an incomplete variable against the k most significant eigengenes; if the value at position j is missing, the jth value of the eigengenes is not used when determining the regression coefficients. SVDimpute appears tolerant of relatively high amounts of missing data (> 10%).
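The iterative scheme described above can be sketched as follows. This is a simplified SVDimpute in Python: column means as the initial fill, then repeated rank-k SVD reconstructions of the missing cells until convergence; unlike the original algorithm, this sketch does not refit a separate regression per incomplete row:

```python
import numpy as np

def svd_impute(x, k=2, tol=1e-6, max_iter=500):
    """Simplified iterative SVD imputation: initialize gaps with column
    means, then alternate (1) rank-k SVD reconstruction and (2) restoring
    the observed entries, until the imputed values stop changing."""
    x = np.asarray(x, float)
    miss = np.isnan(x)
    filled = np.where(miss, np.nanmean(x, axis=0), x)
    for _ in range(max_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :k] * s[:k]) @ vt[:k]   # rank-k reconstruction
        new = np.where(miss, approx, x)        # observed values are kept
        if np.sqrt(np.mean((new - filled) ** 2)) < tol:
            return new
        filled = new
    return filled
```

On data that are genuinely close to rank k, the imputed entries converge to the low-rank completion; the observed entries are never altered.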

Results and discussion

Performance of the estimation methods

In this study, the comparison was performed on the real rainfall series with 10, 20, 30, and 40% gaps. The performances of the estimation methods were compared and assessed using four performance measures: RMSE, MAE, EQR, and CC, chosen to cover techniques widely applied in the literature and representative of various statistical strategies (Boke 2017). Each error measure quantifies the difference between the estimated (predicted) values and the corresponding observed values. The four indices are given by the following expressions:

$$ RMSE={\left[\frac{1}{n}\sum \limits_{i=1}^n{\left( PanObs_i- PanPred_i\right)}^2\right]}^{0.5} $$
(1)
$$ MAE=\frac{1}{n}\sum \limits_{i=1}^n\left| PanPred_i- PanObs_i\right| $$
(2)
$$ EQR=\sum \limits_{i=1}^n{\left( PanObs_i- PanPred_i\right)}^2 $$
(3)
$$ CC=\frac{\sum_{i=1}^n\left( PanObs_i-\overline{PanObs}\right)\left( PanPred_i-\overline{PanPred}\right)}{\sqrt{\sum_{i=1}^n{\left( PanObs_i-\overline{PanObs}\right)}^2\;\sum_{i=1}^n{\left( PanPred_i-\overline{PanPred}\right)}^2}} $$
(4)

where \( PanObs_i \) is the observed precipitation, \( PanPred_i \) is the predicted value of precipitation (here, the imputed value), \( \overline{PanObs} \) and \( \overline{PanPred} \) are the means of the observed and predicted precipitation, and n is the number of values compared.
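Expressions (1)–(4) translate directly into code; the following sketch computes all four criteria for a pair of observed/imputed vectors (function and variable names are ours):

```python
import numpy as np

def evaluation_criteria(pan_obs, pan_pred):
    """RMSE, MAE, EQR (sum of squared errors), and CC, as in Eqs. (1)-(4)."""
    pan_obs = np.asarray(pan_obs, float)
    pan_pred = np.asarray(pan_pred, float)
    err = pan_obs - pan_pred
    rmse = np.sqrt(np.mean(err ** 2))           # Eq. (1)
    mae = np.mean(np.abs(err))                  # Eq. (2)
    eqr = np.sum(err ** 2)                      # Eq. (3)
    cc = np.corrcoef(pan_obs, pan_pred)[0, 1]   # Eq. (4), Pearson correlation
    return rmse, mae, eqr, cc
```

In the experiments, these criteria are computed between the imputed values and the reference values at the positions where gaps were generated.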

Table 3 and Fig. 2 present, respectively, the numerical and graphical assessment of the simple imputation methods for the various percentages of missing values, using RMSE, MAE, EQR, and CC as criteria.

Table 3 Comparison of estimation methods based on RMSE, CC, MAE, EQR, and number of principal component (NCP) used with four different percentages of missing values after imputation
Fig. 2
figure 2

Assessment of simple imputation methods for various percentages of missing values using four measures of performance criteria. a RMSE. b MAE. c EQR. d Correlation coefficient (CC)

As the percentage of missing data increases from 10 to 20%, the RMSE, EQR, and MAE values of the four methods (PPCA, EM, regularized, and SVD) tend to decrease, with a corresponding increase in the CC coefficient. From 20 to 40%, the RMSE, EQR, and MAE values tend to increase, with a corresponding decrease in CC. The regularized method is found to be the best of the four estimation methods and the EM method the second best, based on their values of the four error indices from 10 to 40% missing data. The lowest performances are given by the SVD and PPCA methods.

Influence of the imputations on the quantiles

According to the above results, the regularized variant proves to be the best for imputation; nevertheless, after filling a rainfall series, further estimates are needed to predict hydrological events using frequency analysis.

In this context, is it still the best imputation method for quantile estimation?

To answer this question, we turned to the estimation of quantiles.

To avoid performing the calculation for all 30 stations, we carried out a hierarchical classification by Ward's method based on the results of a principal component analysis (Brito et al. 2016), using the FactoMineR package of the free R software (Lê et al. 2008).
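This workflow (PCA, then Ward clustering on the component scores) can be sketched in Python; the data below are synthetic placeholders, not the study's station records:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# placeholder table: 30 stations x 12 mean monthly rainfall values
monthly = rng.gamma(shape=4.0, scale=20.0, size=(30, 12))

# PCA via SVD of the standardized table; keep the leading components
z = (monthly - monthly.mean(axis=0)) / monthly.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)
pca_scores = u[:, :5] * s[:5]

# Ward hierarchical clustering on the PCA scores, cut into 4 classes
tree = linkage(pca_scores, method="ward")
classes = fcluster(tree, t=4, criterion="maxclust")
print(np.bincount(classes)[1:])  # number of stations per class
```

A natural "paragon" for each class is then the station whose score vector lies closest to its class centroid.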

The classification of the individuals (stations) into four classes uses the mean rainfall of the 12 months of the year over the 69-year period as active variables (their values are not reported here). The geographic coordinates (latitude, longitude), the altitude, and the interannual monthly totals are taken as supplementary variables (Fig. 3).

Fig. 3
figure 3

PCA circle of correlations

Each of the four classes is represented by a station called its “Paragon” (Lê et al. 2008), i.e., the individual (station) that best represents, on average, the characteristics of its class.

For this purpose, all subsequent analyses are performed only on the four synoptic stations representative of their classes.

Classification of rainfall stations

A PCA followed by classification of the rainfall stations according to altitude, latitude, and the mean rainfall of the 12 months yielded four clusters.

Clusters 1 and 4 contain 11 and 3 stations, respectively, while clusters 2 and 3 contain 8 stations each, as illustrated in Fig. 4 and Table 4.

Fig. 4
figure 4

Hierarchical cluster analysis

Table 4 Classification of rainfall stations and their paragons

Each cluster is represented by a synoptic station called its “Paragon,” and the quantiles for the four Paragon stations (Mascara, Batna, Blida, and Jijel) were estimated for return periods of 5, 10, 20, 50, 100, 500, and 1000 years, using the normal distribution, for the four PCA imputation variants.
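Under a fitted normal distribution, the quantile for return period T is the value with non-exceedance probability 1 − 1/T. A minimal sketch (the mean and standard deviation below are illustrative, not the paper's fitted parameters):

```python
from statistics import NormalDist

def return_period_quantile(mean, std, t_years):
    """Rainfall quantile for return period T under a normal distribution:
    the value exceeded on average once every T years, i.e. the (1 - 1/T)
    quantile of N(mean, std)."""
    p = 1.0 - 1.0 / t_years
    return NormalDist(mu=mean, sigma=std).inv_cdf(p)

# illustrative annual-rainfall parameters (assumed, not from the study)
for t in (5, 10, 20, 50, 100, 500, 1000):
    print(t, round(return_period_quantile(600.0, 120.0, t), 1))
```

In the study, this computation is repeated once per imputation variant and per percentage of missing values, so that the predicted quantiles can be compared against those of the complete reference series.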

Effect of filling on quantiles

The effect of filling on the observed and predicted quantiles is shown in Table 5, which gives the predicted quantiles according to return period for the four Paragon stations (Mascara, Batna, Blida, and Jijel), based on the simple imputation methods for the various percentages of missing values.

Table 5 Quantiles observed and calculated with PCA methods according to return periods for the fourth station for 10 to 40% of filling, (a) Mascara, (b) Batna, (c) Blida, and (d) Jijel

For the Mascara station, Table 5(a) shows that the EM and regularized methods, for 10, 30, and 40% missing data, give a good estimate of the predicted quantiles compared with the observed ones, with an acceptable positive or negative margin. For 20% missing data, these methods give a good estimation, with predicted quantiles matching the observed values.

For the Batna station, Table 5(b) shows that the EM and regularized methods, for 10 and 30% missing data, give a good estimation, with predicted quantiles matching the observed ones. For 20 and 40% missing data, these methods give a good estimation of the predicted quantiles compared with the observed ones, with an acceptable positive or negative margin.

For Blida station, Table 5(c) shows that EM and regularized methods for 10 to 40% of missing data give a good estimation of predicted values of quantiles compared to observed values with an acceptable positive or negative margin.

For Jijel station, Table 5(d) shows that EM and regularized methods for 10 to 40% of missing data give a good estimation of predicted quantiles compared to observed quantiles with an acceptable positive or negative margin.

Finally, for each percentage of missing data (from 10 to 40%), the regularized method is the best of the four estimation methods and the EM method the second best; the lowest performances are given by the SVD and PPCA methods, based on their values of the two performance criteria, CC and relative error (RE).

CC and RE of observed quantiles with quantiles after filling

The CC values between the observed quantiles and the quantiles after filling, for the annual rainfall series filled with the PCA variants at 10 to 40% missing data, are given in Table 6 for the Paragon stations (Mascara, Batna, Blida, and Jijel). The CC values are acceptable and vary between 0.66 and 0.97 for EM and between 0.74 and 0.97 for regularized PCA.

Table 6 Correlation coefficient of quantiles observed with quantiles after filling for the fourth station

The RE values between the observed quantiles and the quantiles after filling, for the annual rainfall series filled with the PCA variants at 10 to 40% missing data, are given in Table 7 for the Paragon stations (Mascara, Batna, Blida, and Jijel).

Table 7 Percentage (%) of relative error of quantiles observed with quantiles after filling for the fourth station for 10 to 40% of filling, (a) Mascara, (b) Batna, (c) Blida, and (d) Jijel

The RE values for the Mascara station vary between 1.7 and 3.4% for EM and between 0.20 and 3.5% for regularized PCA (Table 7(a)).

The RE values for the Batna station vary between 0.17 and 2.71% for EM and between 0.46 and 3.64% for regularized PCA (Table 7(b)).

The RE values for the Blida station vary between 1.32 and 4.63% for EM and between 3.15 and 4.74% for regularized PCA (Table 7(c)).

The RE values for the Jijel station vary between 0.59 and 3.89% for EM and between 0.91 and 4.19% for regularized PCA (Table 7(d)).

Conclusion

In the present study, four simple imputation methods (probabilistic PCA, expectation maximization PCA, regularized PCA, and singular value decomposition PCA) were compared on a real dataset from rainfall stations in Algeria under the MCAR hypothesis. Since validating the results and choosing the best imputation method is an important step, the prediction performances of the four methods were assessed using several statistical criteria: root mean square error, mean absolute error, quadratic error, and correlation coefficient. The study examined the effect of the simple imputations on the quantiles of the rainfall series of 30 stations in northern Algeria over the 69-year study period. The results of the imputations for four percentages of missing values (PMVs), namely 10, 20, 30, and 40%, suggest that regularized PCA and expectation maximization PCA are the best methods and can be used successfully to fill gaps, whereas singular value decomposition PCA and probabilistic PCA give the lowest performances (Table 3; Fig. 2). Moreover, the regularized PCA and expectation maximization PCA methods are the best at estimating quantiles compared with the observed reference values for the four Paragons determined by the cluster analysis, yielding very good to acceptable predicted quantiles in terms of CC and RE (e.g., CC = 0.97 with 10% of PMV and CC = 0.66 with 40% of PMV; RE = 4.74% with 10% of PMV and RE = 3.82% with 40% of PMV).