1 Introduction

The importance of accurate and reliable modeling, estimation and forecasting of precipitation is becoming increasingly apparent as the rapid worldwide increase in population and water demand puts pressure on limited water resources and dwindling water supplies (Leconte et al. 2013; Popp et al. 2016). Accurate and reliable observations of precipitation are essential to the performance of valid hydrologic studies; yet, many precipitation records are incomplete. Complete records improve the ability of these studies to determine spatial, temporal and quantitative variations in precipitation data, which is crucial to the design of water supply systems. Changes in the water cycle and precipitation patterns, coupled with a warming climate (Hou et al. 2014; Reager and Famiglietti 2009), increase the need for stronger precipitation models (Zhang et al. 2010).

Developments in software technologies in recent decades have allowed traditional hydraulic and data-driven models to support/complement hydrologic models (Solomatine et al. 2008). Data-driven models analyze time series data, but they should not be regarded as computational methods that ignore physical processes. Determining the spatial and temporal interrelationships between precipitation time series data is mathematically equivalent to determining the relationships between the drivers of precipitation. In other words, precipitation is a function of its contributing variables. Thus, the analysis of precipitation time series data comprises the consideration of all variables that contribute to precipitation (though the relationships and variations of the variables are not evaluated); and the success of making accurate estimations of missing data is directly related to the level of understanding of the temporal and quantitative relationships between observed data.

Though precipitation is generally seasonal, the high variability in numerous influencing factors sometimes indicates the existence of a chaotic (Jayawardena and Lai 1994; Sivakumar 2000; Sivakumar et al. 1999) and relatively random behavior. This nonstationary and sometimes erratic behavior results in distinct variations in precipitation across space and time and makes the observation, quantification, estimation and forecasting of precipitation challenging (Wang and Lin 2015). Consequently, although there are a vast number of data-driven modeling studies that estimate hydrologic processes such as streamflow (which generally occur continuously), a very limited number of studies address the data-driven estimation of missing precipitation records. Some prominent studies that have utilized data-driven methods to estimate precipitation have applied artificial neural networks (ANNs), fuzzy rule based systems (FRBSs), genetic algorithms (GAs), support vector machines (SVMs), particle swarm optimization (PSO) and expectation maximization (EM) in the computation of results.

Lack of generality and overfitting are two of the most important problems associated with existing data-driven methods, as discussed in detail by Remesan and Mathew (2015). Both issues result in model failure when the training and testing period ranges change. Unfortunately, most data-driven hydrologic modeling studies neither mention nor test these issues. Another problem associated with existing methods is that time series data are generally regarded as a one-dimensional vector. This results in a failure to acknowledge the variation in behavior seen across the time series. For example, hydrological time series generally exhibit an annual cycle of seasonality, with values observed in the winter months varying greatly from those observed during the summer months. Instead of a one-dimensional time series, a two-dimensional matrix containing a full cycle in each row would express this temporal hydrological variability more comprehensibly and would enable the investigation of the two-dimensional behavior of time series data (Dikbas 2016b). Detailed information about the concepts, approaches, experiences and problems associated with the data-driven modeling of hydrologic variables exists in the literature (Elshorbagy et al. 2010a, b; Maier and Dandy 2000; Maier et al. 2010; Remesan and Mathew 2015; Sikorska et al. 2015; Solomatine et al. 2008; Solomatine 2006; Yozgatligil et al. 2013).

This paper discusses the implementation of the Frequency Based Imputation (FBI) method to analyze observation data from 70 precipitation stations in Turkey. The method was first used to analyze all streamflow observations from 34 stations on the Buyuk Menderes River (Turkey) (Dikbas 2016a). The approach is based on the assumption that an individual observation in a time series is most closely and quantitatively linked to data observed within a short period of time and, if the time series is periodic, to data from the same subsection of other periods (i.e., the same season in different years). The method searches the observed series for matches to the cluster pairs neighboring each missing value, and then estimates the probable range and value of the missing data by utilizing temporal relationships. It is direct and uses all existing raw data to obtain estimates of missing values; it requires no training/testing periods or input parameters to execute the applied procedure.

2 Materials and methods

2.1 Description of the frequency based imputation method

When precipitation observations are arranged in a matrix with months in columns and years in rows, we expect annual fluctuations in the horizontal direction and values similar to each other in the vertical direction. In this setup, the smallest scale representing the temporal and quantitative behavior of precipitation is an adjacent pair of data on the two-dimensional matrix. This micro-statistical reasoning allows the FBI method to extract valuable information based on relationships within the dataset and provides information on the possible range of missing observations.
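The matrix layout described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' Visual Basic implementation; the function name `to_year_month_matrix` and the `first_month` parameter are hypothetical, and missing cells are assumed to be stored as NaN.

```python
import numpy as np

def to_year_month_matrix(series, first_month=1):
    """Arrange a monthly series into a (years x 12) matrix.

    Cells before the first observation and after the last one are NaN.
    first_month is the calendar month (1-12) of the first value in series.
    """
    values = np.asarray(series, dtype=float)
    lead = first_month - 1                 # empty cells before the first value
    total = lead + values.size
    n_years = -(-total // 12)              # ceiling division
    padded = np.full(n_years * 12, np.nan)
    padded[lead:lead + values.size] = values
    return padded.reshape(n_years, 12)

# Toy record starting in August: the first seven cells of year 1 stay empty.
matrix = to_year_month_matrix(np.arange(17.0), first_month=8)
```

Each row then holds one full annual cycle, so horizontal neighbors are adjacent months and vertical neighbors are the same month in adjacent years.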

Figure 1 illustrates the logic behind the FBI method. The blue cell at the center of Fig. 1e (January 1985) is the missing value to be estimated. The method considers that the neighbors within the 7 × 7 matrix surrounding the missing value contain the strongest clues about the expected range of the missing cell. A wider field would add cells with a poorer relationship to the data point in question: values in September or May, for example, are less likely to influence a January value than the temporally closer values from October to April. A narrower field would remove cells with a potential relationship (for example, by ignoring the influences of October and April on the January value). Similarly, expanding the field vertically would bring in observations four or more years before or after the missing value, even though these are less likely to relate to the value in question than the values in closer years. The numbers in each cell in Fig. 1 are cluster values calculated by using Eq. 1 or 2 after the observed series was sorted and divided into range clusters (Appendix 1).

Fig. 1 The missing observation to be estimated (the blue cell) and eight example cluster pairs to be searched in the data matrix (each pair is shown in a different color) (e), matching cluster pairs found in different sections of the data matrix (take careful note of the relative location of the missing value) (a–d, f–i) and the probable values of the missing data (cells with blue borders at the center of a–d, f–i)

After the cluster index values for each cell are determined, the process of generating a cluster frequency table for each missing value begins. To this end, every adjacent cluster pair within the neighborhood of a missing cell is searched for throughout the data matrix. Figure 1e shows eight of the many cluster pairs in the neighborhood of the missing value for January 1985. The remaining subfigures show the locations of the matching cluster pairs. The aim of the search for matching cluster pairs is to deduce the most probable cluster value for the missing cell. This task is accomplished by looking at the cluster values of the blue-bounded cells, which sit at the same position relative to each matched pair as the missing January 1985 cell does relative to the searched pair. These clusters show the probable values for the missing cell in January 1985 by answering the questions constructed using the searched and matched cluster pairs. One of the eight questions illustrated in Fig. 1 is:

“What might the cluster value of the missing cell in January 1985 be when the cluster value in January 1983 is 8 and the cluster value in January 1984 is 10?”

The goal here is to find the third cluster value of three vertically aligned cells when the first value is 8 and the second value is 10. One of the answers to this question is shown in Fig. 1b and is written as follows:

“The cluster value for February 1974 is 9 when the cluster value for February 1972 is 8 and the cluster value for February 1973 is 10”. In other words, the cluster value for January 1985 might be 9 based on previously observed series values.

For all eight cluster pairs in Fig. 1e, the probable cluster values at the relative January 1985 location in the remaining figures are found to be: 12 (2 times), 11 (2 times), 10, 9, 5 and 3. When the search for all pairs in the neighborhood of the missing value is completed, the cluster with the highest frequency is considered to have the highest probability of being the missing value. The estimated precipitation value is calculated by taking the average of the observations that generated the greatest cluster frequency. Details of how the cluster frequencies were determined and generated are provided in Appendix 2.
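The pair search described above can be sketched as follows. This is a simplified reading of the text and not the procedure of Appendix 2: the function name `cluster_frequencies`, the use of 0 as the marker for missing cells, the four pair orientations in `STEPS`, and the flattened time index (which lets a pair run across a year boundary) are all assumptions made for illustration.

```python
from collections import Counter
import numpy as np

# Offsets (years, months) joining the two cells of an adjacent pair:
# horizontal, vertical and the two diagonals (assumed orientations).
STEPS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def cluster_frequencies(clusters, row, col, half=3):
    """Tally the clusters seen at the missing cell's relative position
    wherever a neighboring cluster pair recurs in the matrix.

    clusters: 2-D integer array of cluster indices, 0 marks a missing cell.
    """
    n_rows, n_cols = clusters.shape
    flat = clusters.ravel()
    n = flat.size
    target = row * n_cols + col

    def get(t):
        return flat[t] if 0 <= t < n else 0

    span = range(-half, half + 1)
    freq = Counter()
    for dy in span:
        for dm in span:
            for sy, sm in STEPS:
                ey, em = dy + sy, dm + sm
                if abs(ey) > half or abs(em) > half:
                    continue                      # pair leaves the neighborhood
                if (dy, dm) == (0, 0) or (ey, em) == (0, 0):
                    continue                      # pair would use the missing cell
                a = dy * n_cols + dm              # first cell, relative to target
                b = ey * n_cols + em              # second cell, relative to target
                ca, cb = get(target + a), get(target + b)
                if ca == 0 or cb == 0:
                    continue                      # pair not fully observed
                for t in range(n):                # candidate positions of the answer
                    if t != target and flat[t] != 0 \
                            and get(t + a) == ca and get(t + b) == cb:
                        freq[int(flat[t])] += 1
    return freq

# Toy matrix with a strict 4-month pattern; the cell at (5, 0) is removed.
toy = np.array([[(m % 4) + 1 for m in range(12)] for _ in range(10)])
toy[5, 0] = 0
freq = cluster_frequencies(toy, 5, 0)
```

In the toy matrix every matched pair points to cluster 1, the value that was removed. With real observations the tally spreads over several clusters, and the cluster with the highest count is taken as the most probable one.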

2.2 Study area and data

To test the applicability of the developed method and provided software across various climate zones, a total of 34,330 monthly total precipitation observations from 70 stations across 21 different basins in Turkey were estimated (Fig. 2). Turkey has a moderately dry climate. Average precipitation tends to be high in the coastal regions of Turkey and decreases towards the inland regions. The area around Rize on the coast of the Black Sea receives an average annual precipitation of 2200 mm, while the Salt Lake region receives 250–300 mm. The Aegean and Mediterranean coasts are wet in the winter but dry during the summer. The Black Sea coastline is the only region in Turkey that receives precipitation throughout the year. Figure 2 illustrates the average annual precipitation in Turkey between 1981 and 2010. The selected stations represent the majority of the climate and elevation zones, and cover nearly all hydrological basins in Turkey.

Fig. 2 Map of 1981–2010 average annual precipitation in Turkey, including the locations of the 70 stations used in this study

The General Directorate of State Hydraulic Works of Turkey observes precipitation throughout the country using pluviographs capable of measuring liquid (rainfall) and solid (snow, hail, freezing rain, grain, etc.) precipitation. Therefore, the observations used in this study include liquid precipitation and water equivalents of solid precipitation.

Table 1 outlines the descriptive statistics for all stations, including percentiles and best-fitting distributions. The highest and lowest values (excluding 0.0) are shown in bold in all tables throughout the article. The majority of precipitation series from all stations (48/70) were found to fit the Wakeby distribution. The skewness and excess kurtosis measures indicate that the probability distributions for all stations are positively skewed and leptokurtic (except 21-007). A majority of the stations (67/70) have a minimum monthly precipitation of 0. A total of 59% (41/70) of stations registered zero monthly precipitation for at least 5% of the year, while 36% (25/70) of stations measured zero monthly precipitation more than 10% of the year and 4% (3/70) of stations measured zero monthly precipitation data during more than 25% of the year.

Table 1 Descriptive statistics, percentiles and best-fitting distributions for all stations

A comprehensive explanation of the applied steps for the estimation of the monthly total precipitation is presented below for the observations of station 07-016 in Çivril-Denizli (Turkey). The first seven values from 1962 are missing, and the total number of existing observations at station 07-016 is 521. When 12 observations (a year of data) are removed from the set to test the model’s ability to make estimations, this number decreases to 509, resulting in a missing data rate of 3.6% (Fig. 3).
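The 3.6% figure can be reproduced with a few lines of arithmetic. The sketch below assumes that the seven values at the start of 1962 are the only gaps in the record, so that the matrix holds 521 + 7 = 528 monthly cells; that total is inferred here and is not stated in the text.

```python
observed = 521          # existing observations at station 07-016
initial_gaps = 7        # missing values at the start of 1962
removed_for_test = 12   # one year withheld to test the method

total_cells = observed + initial_gaps        # assumption: no other gaps
remaining = observed - removed_for_test      # observations left for the search
missing_rate = 1 - remaining / total_cells   # equals (7 + 12) / 528
print(f"{remaining} observations remain, {missing_rate:.1%} missing")
```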

Fig. 3 Heat map of monthly precipitation observations for station 07-016

The details of the estimation process are presented using the observed values from 1985. The entire estimation process was repeated for each missing data point. First, the software removed and estimated data for each year between 1962 and 1984. Then, the 1985 values were removed from the set and estimated. The January value was estimated first. Figure 4 shows the observed values for those months and years surrounding January 1985. The October–December columns represent values from the previous calendar year (current water year).
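A window such as the one in Fig. 4 can be extracted as sketched below. The function name `neighborhood` is hypothetical, and wrapping the columns across the year boundary through a flattened time index is an assumption consistent with the October–December columns holding values from the previous calendar year.

```python
import numpy as np

def neighborhood(matrix, row, col, half=3):
    """Return the (2*half+1)-square window centred on matrix[row, col].

    Column offsets wrap across year boundaries, so the window around a
    January holds October-December of the previous calendar year in the
    same window row. Cells outside the record are NaN.
    """
    n_rows, n_cols = matrix.shape
    flat = matrix.ravel()
    size = 2 * half + 1
    window = np.full((size, size), np.nan)
    centre = row * n_cols + col
    for dy in range(-half, half + 1):
        for dm in range(-half, half + 1):
            t = centre + dy * n_cols + dm
            if 0 <= t < flat.size:
                window[dy + half, dm + half] = flat[t]
    return window

# Toy matrix whose cells hold their own time index (10 years x 12 months).
toy_matrix = np.arange(120).reshape(10, 12)
window = neighborhood(toy_matrix, 5, 0)   # window around January of year 5
```

In the toy example the centre cell is index 60, and the leftmost cell of the centre row is index 57, that is, October of the preceding calendar year.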

Fig. 4 Observed values for months and years around January 1985

To assess the quantitative relationships between the observations, the observed series are sorted and divided into 2–12 clusters, as explained in Appendix 1. The greatest number of clusters (12) was chosen based on the length and variability of the time series. The results show that this number was sufficient to generate successful results. Figure 5 shows the cluster values for the field surrounding January 1985 at each clustering step. Lower values are shown in shades of red and higher values are shown in shades of green. When the observed data series is divided into two clusters, the first cluster contains the lower precipitation values (0–25.8 mm) from the sorted observations, and the second cluster contains the higher values (26.0–204.8 mm). Each data point is assigned a cluster index: 1 for the data in the first cluster and 2 for the data in the second cluster, as shown in the first table of Fig. 5.
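One way to assign cluster indices to a sorted series is sketched below. Eq. 1 and 2 of Appendix 1 are not reproduced in this section, so the equal-count split used here is an assumption rather than the authors' formula, and the function name `cluster_indices` is hypothetical. In particular, a series with many zero values would need special handling for the zeros to form a cluster of their own.

```python
import numpy as np

def cluster_indices(values, n_clusters):
    """Assign 1-based cluster indices by cutting the sorted observations
    into n_clusters groups of roughly equal size. Missing values (NaN)
    receive index 0. Equal values always share a cluster.
    """
    values = np.asarray(values, dtype=float)
    observed = np.sort(values[~np.isnan(values)])
    # Smallest value of clusters 2..n_clusters, read from the sorted series.
    cut_positions = [(i * observed.size) // n_clusters
                     for i in range(1, n_clusters)]
    boundaries = observed[cut_positions]
    idx = np.searchsorted(boundaries, values, side="right") + 1
    idx[np.isnan(values)] = 0
    return idx

toy_values = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, float("nan")]
two_clusters = cluster_indices(toy_values, 2)
```

Repeating the call for 2 to 12 clusters gives one index matrix per clustering step, which is the form of input shown in Fig. 5.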

Fig. 5 Cluster numbers for data points near January 1985 at each clustering step

In the remaining cluster divisions (3–12), the January 1985 value (84.9 mm) is always located within the highest range of observations and thus in the last cluster (bounded in blue in Fig. 5). The temporal and quantitative relationships between the horizontally, vertically and diagonally adjacent cluster pairs in the neighborhood of the missing data are determined as explained in Sect. 2.1. Then, the relationships are used to estimate the probable cluster value of the deliberately removed data in the center of the neighborhood.

When the sorted observations of the station are divided into 12 clusters, 496 cluster pairs matching with the adjacent cluster pairs in the neighborhood of January 1985 were found in the data matrix. Eight examples of the searched pairs in the neighborhood of the missing data, and matched pairs from various regions of the data matrix are shown in Fig. 1. The process described above for 12 clusters is repeated for 2–11 clusters, and a cluster frequency table is obtained for each month of 1985 (Fig. 6).

Fig. 6 Cluster frequency tables of the months of 1985

From left to right, each column in each table shows the frequencies obtained after dividing the observed value range for station 07-016 into 2–12 clusters. Each column heading indicates the number of clusters into which the observed data range is divided. Each row heading indicates the cluster index. The Min and Max columns on the right show the cluster ranges when the number of clusters is 12. For example, Cluster 1 includes only zero values, Cluster 2 includes values from 0.1 to 4.5 mm, Cluster 11 includes values from 65.1 to 80.3 mm and Cluster 12 includes the highest values (80.8–204.8 mm).

The frequency table for each month provides information on the possible value of the missing data point in that month. For example, the first column of the frequency table for January 1985 shows the frequency values obtained for the first (the lower values) and the second (the higher values) clusters when the data series is divided into two clusters. The frequency value of the second cluster (10,075) is higher than the frequency value of the first cluster (6623). This shows that the January 1985 data point more probably lies within the range of the second cluster (26.0–204.8 mm).

The division into three clusters yields frequencies of 1388, 2965 and 3397, respectively. The high value of the third cluster indicates that the desired value is most probably within the 45.9–204.8 mm range. Similarly, for the remaining clusters, the higher frequencies trend toward the bottom of the January 1985 cluster frequency table, indicating that the missing data point is most probably in the higher observation range.

The larger the number of clusters, the smaller the data range covered by each cluster. The increase in the number of clusters results in a green path that highlights the highest frequencies generally observed. This green path shows the clusters with the highest probability of representing the missing value range; in contrast, the red cells indicate those clusters with a lower probability of representing the missing value. Months with highly variable observations (like June) result in fuzzy frequency tables, while months with low variability (like August) produce more distinguishable red and green patterns. For 1985, the green trends are more apparent in the January–April and July–September frequency tables.

2.3 Estimation of missing values based on cluster frequencies

The 12th column in the frequency table for January 1985 (Fig. 6) is used to estimate the missing data for that date. The clusters that occur most often provide the most likely ranges of value for the missing data. In the January 1985 example, the highest frequency (70) occurs in cluster 10, which represents the precipitation range between 53.6 and 65.0 mm. The average of the 70 observations (60.04 mm) used to generate this frequency is the most likely estimation of the missing January 1985 value. The obtained estimate will always be within the range of the averaged cluster. In the present example, the actual observed value for the January 1985 data point was 84.9 mm (within the range of the 12th cluster).

The second highest frequency (64) obtained by the example model occurred in cluster 12 (80.8–204.8 mm range). The average of the 64 observations used to generate this frequency is 110.6 mm and is the second most probable estimate for the January 1985 value. The third highest frequency (63) occurred in both cluster 11 and cluster 6, which yield the third and fourth most likely estimates (71.2 and 25.2 mm, respectively). The green path in the January 1985 frequency table indicates that the most likely value will be within the range of clusters 10–12; and, of the first five estimations, the third estimate obtained (cluster 11) is the nearest to the real observed value. This approach is repeated for the five highest total frequency values for each month analyzed, and the five most likely estimates for each month are written to a correlation tables output file by the software. As previously stated, precipitation is relatively chaotic, and the most likely precipitation might not be the experienced precipitation. Therefore, generating multiple precipitation values with a high likelihood of occurrence is very useful to scientists and practitioners who work with precipitation data.
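The ranking step can be sketched as follows, assuming that the observations behind each cluster's frequency have already been collected. The function name `ranked_estimates` and the input layout are hypothetical, the numbers are toy values, and the rule for breaking ties (such as the one between clusters 6 and 11) is not specified in this section, so ties here simply keep their input order.

```python
def ranked_estimates(contributions, top=5):
    """Rank clusters by frequency and average their contributing observations.

    contributions: dict mapping a cluster index to the list of observed
    values (mm) whose matched pairs voted for that cluster.
    Returns a list of (cluster, frequency, mean_estimate) tuples.
    """
    ranked = sorted(contributions.items(),
                    key=lambda item: len(item[1]), reverse=True)
    return [(cluster, len(obs), sum(obs) / len(obs))
            for cluster, obs in ranked[:top] if obs]

# Toy input: three clusters with 3, 2 and 1 contributing observations.
toy_contributions = {10: [60.0, 62.0, 58.0], 12: [100.0, 120.0], 3: [5.0]}
estimates = ranked_estimates(toy_contributions)
```

Because each estimate is the mean of observations belonging to one cluster, it always falls within that cluster's range.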

The three lowest frequencies obtained for the 12 clusters occurred in clusters 1–3, indicating that the range 0.00–11.0 mm is the least likely to represent the total precipitation that occurred in January 1985. The actual 1985 data points to be tested were removed prior to the application of the method and were not known by the software at any stage of the estimation process.

The ability of the FBI method to estimate precipitation values can be compared to estimates generated using the EM and multiple linear regression (MLR) methods, which are also direct methods. EM is an iterative method used to identify the maximum likelihood estimates of parameters in statistical models (Dempster et al. 1977). It also enables parameter estimation in probabilistic models with incomplete data. A good introduction to the mathematical foundations and applications of the EM method is provided by Do and Batzoglou (2008). As with the FBI method, the EM and MLR methods can generate estimates for a series by using existing observations in the series itself; they do not require preprocessing of data, and unlike methods such as ANN, they do not require the adjustment of any input parameters to improve the results. To compare these two methods with the FBI method, all existing station 07-016 observations were estimated using the EM and regression modules in the missing value analysis toolbox of the IBM SPSS software. The same approach used to estimate values in the FBI method was applied: the data from each year were removed and estimated using both methods. Table 2 shows the estimates obtained for the test year using the FBI, EM and regression methods, together with the long-term monthly averages. The correlations obtained using the EM (0.713), regression (0.778) and long-term average (0.733) methods are considerably lower than the correlation found using the FBI method (0.976).

Table 2 Correlations between the observed values and the best estimates from the FBI, EM, regression and long-term average methods for 1985

To test the advantages of generating multiple estimates for a missing value, the increase in correlation with the number of estimates is assessed year by year for all observations from station 07-016. Table 3 shows the correlations between the observed values and the best estimates generated within the first 2, 3, 4 and 5 estimations for each year. Annual correlations over 0.7 occurred between the observed values and the nearest estimates in the first two estimations in 58% of cases (25/43). This rate increased to 91% (39/43) when three estimates were generated and to 100% when four or five estimates were produced. Similarly, the rate of annual correlations over 0.8 was 28, 81, 98 and 100% for the first 2, 3, 4 and 5 estimates, respectively; and the rate of annual correlations over 0.9 was 5, 33, 74 and 98%, respectively. These results indicate that increasing the number of estimates generated increases the likelihood that one of them lies close to the observed value.
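The "best estimate within the first k estimations" used in this comparison can be sketched as below. The function names and the array layout (one row per month, ranked estimates in columns) are assumptions, and the numbers are toy values.

```python
import numpy as np

def best_of_first_k(observed, estimates, k):
    """For each time step, pick the estimate closest to the observation
    among the first k ranked estimates.

    observed:  sequence of n observed values
    estimates: (n x m) array, column j holding the (j+1)-th ranked estimate
    """
    observed = np.asarray(observed, dtype=float)
    candidates = np.asarray(estimates, dtype=float)[:, :k]
    nearest = np.abs(candidates - observed[:, None]).argmin(axis=1)
    return candidates[np.arange(observed.size), nearest]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])

toy_obs = [10.0, 20.0, 30.0]
toy_est = [[5.0, 9.0, 10.0], [25.0, 18.0, 20.0], [50.0, 40.0, 31.0]]
best_two = best_of_first_k(toy_obs, toy_est, 2)
```

Selecting the nearest estimate requires the observed value, so this step serves to evaluate how often the candidate set contains a close value; it cannot be applied when the value is genuinely missing.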

Table 3 Correlations between observed values from station 07-016 and the estimates generated using the clusters with the five highest frequencies

The last column in the table (titled “Whole”) shows the correlations between the entire observed series and the series of best estimates derived from the first 2, 3, 4 and 5 estimations. A correlation value of 0.843 obtained for the first three estimations might be regarded as sufficient to estimate precipitation. Increasing the number of estimates to 4 produces a correlation of 0.912, while increasing the number to 5 yields a correlation of 0.944 for the entire series. These correlations indicate the production of extremely reliable precipitation estimates.

Table 4 presents the correlations between the observed values from station 07-016 and the estimates derived using the FBI, EM and regression methods, as well as the long-term averages for each year. For all years, the correlations between the FBI method estimates and the observed values exceed the correlations between the EM, regression and long-term average values and the observed data. While 98% (42/43) of the annual correlations between the FBI method and the observed values are over 0.9, all annual correlations with the compared methods are under 0.9.

Table 4 Correlations between the observed values for station 07-016 and the estimates from the FBI, EM, regression methods, and the long-term averages

The highest and lowest correlations produced by each method are shown in bold. The obtained results reveal that the estimates produced using the EM method tend to be more similar to the long-term averages than to the observed values. This resulted in similar correlation values for the EM method and the long-term averages across the years. The correlations of the compared methods follow a similar pattern: generally, they increase or decrease together over the years. For example, the lowest annual correlations with the EM method (0.015) and the long-term averages (0.030) occurred in 1972. This year represented the sixth lowest annual correlation for the FBI method (0.904) and the third lowest for the regression method (0.006).

To compare the general performance of the methods used to estimate precipitation values at station 07-016, five statistical measures (correlation (r), Nash–Sutcliffe efficiency coefficient (E), root mean squared error (RMSE), mean absolute error (MAE) and mean bias error (MBE)) were calculated and are presented in Table 5. The FBI method performed best according to all statistical measures except the MBE. The negative E value obtained using the regression method indicates that the observed mean is a better indicator of value than the regression method. The other statistical measures also reveal that the use of long-term averages is preferable to the use of the regression method. As expected, the MBE for long-term averages was zero, while the MAE was lowest for the FBI method, suggesting that the FBI method estimates are closer to the observed values. The MAE and MBE statistics should be considered together because equal averages for estimates and observed values do not generally mean that the estimations are sufficiently close to the observations. The averages may be similar even though there are significant positive and negative differences between the estimates and observed values. These differences can be detected by calculating the MAE, which has advantages over the RMSE and MBE in assessing average model performance (Willmott and Matsuura 2005).
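The five measures can be computed as sketched below using their standard definitions. The function name `scores` is hypothetical, and the sign convention for the MBE (estimate minus observation, so that underestimation gives a negative value) is an assumption.

```python
import numpy as np

def scores(observed, estimated):
    """Return r, Nash-Sutcliffe E, RMSE, MAE and MBE for two series."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    err = est - obs                              # assumed sign convention
    return {
        "r": float(np.corrcoef(obs, est)[0, 1]),
        "E": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MBE": float(np.mean(err)),
    }

# Toy series: errors of +1 and -1 cancel in the MBE but not in the MAE.
toy_scores = scores([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
```

In the toy example the MBE is zero although every estimate is off by 1, which illustrates why the MAE and MBE should be read together.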

Table 5 Statistical measures used to compare all observed values from station 07-016 with the estimates generated using the FBI, EM, and regression methods and the long-term averages

The graphs shown in Fig. 7 compare the observed values from station 07-016 with the estimates produced using the FBI, EM and regression methods. A very good fit is seen between the FBI method estimates and the observed values across the time series, indicating that the method is sensitive to the variations in precipitation. On the other hand, the estimates produced using the EM and regression methods lack generality and sensitivity. Figure 7 also shows that the FBI method underestimates rarely observed high precipitation values, even though it produces better estimates than the EM and regression methods. Low estimations of extreme values occur as a result of the estimation logic behind the FBI method, which considers the frequency of observed values; it is well known that the frequency of extreme precipitation is generally low. The graphs also show that the estimates obtained for extreme values are always higher than the remaining estimates. The underestimation of extremes may be considered a disadvantage of the method; however, its ability to estimate extreme values might be improved by considering observations from nearby stations.

Fig. 7 Comparisons between estimates produced using the FBI, EM and regression methods with observed data from station 07-016

2.4 Application of the FBI method using the remaining 69 precipitation stations

The above discussion was based on estimates and observations for a single station (07-016). A method’s ability to estimate values for a single station is not sufficient to claim that it will be successful in estimating values for other stations. To test the FBI method’s application across multiple stations, we used the above method to estimate precipitation values for 70 stations across 21 different basins in Turkey. Stations were chosen based on location and the variation in observed values. The stations reflect various climates in Turkey, ranging from dry to wet (see the descriptive statistics of the observed series in Table 1). Table 6 presents the statistical measures (r, E, normalized root mean squared error (NRMSE), mean absolute scaled error (MASE), MAE and MBE) generated for each station based on a comparison between the FBI method estimates and the observed values from each station. The number of years for which data were available at each station is also presented in the table.
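The two scaled measures introduced for Table 6 can be sketched as follows. The text does not state which normalization was applied, so both choices below are assumptions: the NRMSE is divided by the range of the observations, and the MASE is scaled by the mean absolute error of a naive estimate that repeats the value observed `lag` steps earlier. The function names are hypothetical.

```python
import numpy as np

def nrmse(observed, estimated):
    """RMSE divided by the observed range (assumed normalization)."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    rmse = np.sqrt(np.mean((est - obs) ** 2))
    return float(rmse / (obs.max() - obs.min()))

def mase(observed, estimated, lag=1):
    """MAE scaled by the MAE of a naive lag-step estimate (assumed scaling)."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    naive_mae = np.mean(np.abs(obs[lag:] - obs[:-lag]))
    return float(np.mean(np.abs(est - obs)) / naive_mae)

toy_obs = [0.0, 10.0, 0.0, 10.0]
toy_est = [1.0, 9.0, 1.0, 9.0]
toy_nrmse = nrmse(toy_obs, toy_est)
toy_mase = mase(toy_obs, toy_est)
```

Both measures are dimensionless, which makes stations with very different precipitation totals comparable. A MASE below 1 means the estimates outperform the naive reference on average.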

Table 6 Statistical measures of the comparisons between the estimated and observed values of the stations

The correlations between the results of the FBI method and the observations exceeded 0.9 for 24% (17/70) of stations and exceeded 0.85 for 79% (55/70) of stations. The minimum correlation was 0.795, and the maximum correlation was 0.944. Eleven of the 15 stations with the highest correlations are located in basins 4, 5, 6, 7, 8 and 9, which are all located within the Eastern Aegean and Eastern Mediterranean regions of Turkey. Similarly, eight of the 15 stations with the lowest correlations are in basins 12, 13, 14, 15, 16 and 18, which are located in the central and northern regions of Turkey. While the lowest 14 correlation values occurred for stations fitting the Wakeby distribution, none of the 10 stations with the highest correlations fit this distribution. Eight of the 15 best correlated stations (including the first 3) instead fit the GEV distribution.

All Nash–Sutcliffe efficiency coefficients exceeded 0.591; 73% (51/70) were over 0.70 and 11% (8/70) were over 0.80. The highest Nash–Sutcliffe efficiency coefficient was 0.889. The highest NRMSE value was 0.114; 83% (58/70) of the NRMSE values fell below 0.10, while the lowest NRMSE value was 0.050. All MASE values fell below 0.40; 90% (63/70) of these values were under 0.35 and 41% (29/70) were under 0.30. The lowest MASE value was 0.237. The MAE values ranged between 6.367 (obtained for 18-003) and 27.571 (obtained for 08-006), suggesting that the high correlation value (0.942) obtained for station 08-006 may be misleading because the MAE and MBE values for the station are higher than those for the remaining stations. MBE values ranged between 0.084 and −17.667, with data from 69 stations generating negative MBE values. This indicates that the precipitation estimates generated using the FBI method have a slight negative bias. Greater bias occurred at stations where extreme values and variations were much higher than at other stations, resulting in greater differences between the estimated and observed values. Future studies might investigate ways to obtain average estimates closer to average observations to eliminate bias errors without increasing MAE. A method to overcome this bias might be to multiply all estimates by the ratio between the averages of the observed values and the estimated values for each station. This intervention should only be made if the MAE between the observed and estimated values also decreases. Furthermore, though this intervention may improve the estimation of higher values, a much larger number of values in the lower ranges might be overestimated. A selected bias correction method will not produce the best results for all data series (Ajaaj et al. 2016); thus, the selection of a bias correction method should be left to the users of the FBI method where necessary.
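The ratio-based correction suggested above, together with the condition that it is kept only if the MAE decreases, can be sketched as follows. The function name `ratio_bias_correction` is hypothetical. The sketch assumes that the observed and estimated series come from a validation run in which known values were withheld, since the ratio cannot be computed for genuinely missing data.

```python
import numpy as np

def ratio_bias_correction(observed, estimated):
    """Scale estimates by mean(observed) / mean(estimated).

    The scaled series is returned only if it lowers the MAE; otherwise
    the original estimates are returned unchanged.
    Returns (series, applied_flag).
    """
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    corrected = est * (obs.mean() / est.mean())
    mae_before = np.mean(np.abs(est - obs))
    mae_after = np.mean(np.abs(corrected - obs))
    if mae_after < mae_before:
        return corrected, True
    return est, False

# Case 1: a uniform 20% underestimate is removed entirely.
fixed, applied = ratio_bias_correction([10.0, 20.0, 30.0], [8.0, 16.0, 24.0])

# Case 2: only one extreme is underestimated; scaling would inflate the
# nine correct low values, so the correction is rejected.
obs_low = [1.0] * 9 + [11.0]
est_low = [1.0] * 9 + [6.0]
kept, applied_low = ratio_bias_correction(obs_low, est_low)
```

The second case reproduces the caveat in the text: removing the bias can overestimate many values in the lower ranges and raise the MAE.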

3 Discussion and conclusions

This article assesses the ability of the FBI method to estimate non-continuous monthly precipitation data without the use of observations from neighboring stations. The goodness of fit measures calculated between the observed and estimated series show that the FBI method is capable of estimating monthly precipitation data obtained from various climatic zones. However, it is impossible to claim that the method will always successfully estimate values for stations in other regions without first applying the method to observations from those stations. The practical experiences in the literature show that no data-driven methodology is perfect enough to provide the best results for all stations or for all variables.

This method may also be used to estimate weekly or daily precipitation data; however, given that the randomness of precipitation generally increases as the observation period shortens, it is anticipated that the success of the method will be lower for precipitation estimates at the weekly or daily scale. The inclusion of observations from highly correlated neighboring stations may improve the estimates at these shorter time scales. Further studies may investigate the influence of neighboring stations on the estimation power of the presented method.

As noted above, the method analyzed in this study may not be suitable for the estimation of extreme observations that occur at a very low frequency. Values that occur with a very low frequency in a data series also have a low occurrence probability and will not occur frequently enough to be determined among the highest possible values. As is valid for most data-driven methods, the length of the data series used may influence the performance of the proposed method. The method may be less useful when applied to short data series, as the estimates produced by the presented method are based on the frequencies of the observed value ranges. The input dataset should have at least seven rows of input data (i.e., 7 years for monthly data) and more data will generally provide more information about the frequencies of the observations, consequently supporting the possibility of better estimations.

Another limitation of the method is that writing software code for its implementation might not be easy for every user. With this in mind, a link to the source code written in Visual Basic is provided to the readers in Appendix 3. This will enable users to implement the FBI method on other datasets or in other research areas. Users of other programming languages or operating systems will need to convert the code.

The FBI method may be applied in many scientific disciplines, as it is a generally applicable, direct analysis method that requires no determination of input parameters, nor does it require any preprocessing of data. While most existing methodologies are one-dimensional, the FBI method is two-dimensional and has been shown to perform better when compared to the EM and MLR methods in the estimation of precipitation.