1 Introduction

Solar radiation data are essential in many disciplines, such as environmental sciences, energy production [2] and climate analysis [11]. The most widely used variable is global horizontal irradiance (G), the total downwelling shortwave irradiance reaching the Earth’s surface on a horizontal plane. Thermopile pyranometers and silicon-based photodiodes are the two types of outdoor sensors for measuring G. Thermopiles typically achieve the highest quality and are based on the thermoelectric effect: a blackened surface absorbs the incoming radiation, creating a thermal gradient that is measured with a thermocouple. Photodiodes are made of small silicon cells and are thus based on the photovoltaic effect. They are a low-cost option that requires less maintenance and has faster response times, but they have higher uncertainty than thermopiles because their responsivity is limited by the spectral response of silicon.

Ground measurements obtained with pyranometers are the most accurate source of G data [12], but they still contain different types of errors that can be broadly divided into operational and equipment errors [5, 15]. Operational errors are related to the particular operating conditions of the station, such as its location and maintenance procedures. Some examples include shading by nearby objects, the accumulation of dust over the sensor and electrical shutdowns. By contrast, equipment errors are related to the intrinsic limitations of the instruments and to inadequate calibration procedures, and are typically more severe in photodiodes than in thermopile pyranometers.

Several quality control (QC) methods have been developed to detect errors in ground records. The majority of QC methods are simple range tests that establish the most probable physical or statistical limits for G values and discard samples outside these ranges [15]. Among these, the most widely used method is the BSRN QC [7]. Alternative methods include interpolation between nearby stations [5], graphical analysis [8], analysis of the symmetry of irradiance profiles [3] and coherence tests of the different irradiance components [7]. All these methods are only able to detect large errors, whereas the most common defects, such as shading and soiling, introduce small deviations in G. Detecting these small but long-lasting errors is not straightforward because the filters cannot be too restrictive, given the wide range of physically possible irradiance values.

We recently presented a new QC algorithm specifically tailored for detecting small errors in ground records [13]. The method flags those samples in which the deviations between several radiation estimations and the ground records are outside the typical range for that region and time of the year. We assume that if the deviations of several independent radiation models are outside the typical ranges, the most likely cause is an error in the ground record. Even though the quality of the estimations is not as high as that of ground records, the algorithm exploits the advances in the stability of solar radiation modeling techniques such as satellite-based products [12] and reanalysis. Besides, filtering out samples in terms of deviations instead of in terms of G produces a more restrictive filter, enabling the detection of low-magnitude defects.

The first goal of this study is to validate the QC algorithm with a heterogeneous dataset comprising all Spanish monitoring stations that measure G (748 stations), including several regional networks where the probability of finding errors is higher. The second objective is to guide potential users by deriving some statistics about the quality of the monitoring networks available in Spain. In addition, we include an enhanced visual decision support tool for the analysis of flagged samples.

2 Methods

2.1 Radiation Data

Ground records were retrieved from all Spanish stations that measure G from 2005 to 2013 at the highest temporal resolution available at no cost. This results in a dataset comprising 748 stations from 9 networks (Fig. 1): six dedicated meteorological networks (BSRN, AEMET, Meteo Navarra, Meteocat, Euskalmet, MeteoGalicia), two networks for agricultural purposes (SIAR and SIAR Rioja) and one network for emergency situations (SOS Rioja). The networks can also be categorized by their spatial coverage into worldwide networks (BSRN), national networks (AEMET and SIAR) and regional networks (the remaining ones). Most meteorological networks use thermopile pyranometers (285 stations), whereas photodiodes are the common sensor in agricultural networks (386 sensors). Thermopile pyranometers were classified from highest to lowest quality according to ISO 9060:1990 [4] as (i) Secondary Standard, (ii) First Class and (iii) Second Class. The description of the sensor was not provided for 77 stations.

Fig. 1. Location and type of pyranometer installed in the stations used in this study.

The QC algorithm uses daily values of global horizontal irradiance (\(G_{d}\)). All night values (sun elevation \({<}0^{\circ }\)) were initially set to 0. In stations with 1-min resolution, 15-min means were calculated (5 valid values required) and subsequently averaged to obtain the hourly means (all four 15-min values required). In stations with time resolutions from 5-min to 30-min, hourly means were obtained directly by averaging the original data (all original values required). Daily values were finally obtained by averaging the hourly values if at least 20 hourly values were available.
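The aggregation rules for 1-min stations can be sketched as follows. This is a minimal illustration in Python rather than the authors' R implementation; the function names `block_means` and `daily_mean_from_1min` are hypothetical, missing values are assumed to be encoded as `None`, and night values are assumed to have been set to 0 beforehand.

```python
from statistics import mean

def block_means(values, block_size, min_valid):
    """Average consecutive blocks of `block_size` values.

    `None` marks a missing value. A block mean is returned only when at
    least `min_valid` values of the block are present; otherwise None.
    """
    out = []
    for start in range(0, len(values), block_size):
        valid = [v for v in values[start:start + block_size] if v is not None]
        out.append(mean(valid) if len(valid) >= min_valid else None)
    return out

def daily_mean_from_1min(minute_values):
    """Daily mean G from the 1440 one-minute values of a single day."""
    # 15-min means: at least 5 valid 1-min values required.
    quarter_hourly = block_means(minute_values, 15, min_valid=5)
    # Hourly means: all four 15-min values required.
    hourly = block_means(quarter_hourly, 4, min_valid=4)
    # Daily mean: at least 20 valid hourly values required.
    valid_hours = [h for h in hourly if h is not None]
    return mean(valid_hours) if len(valid_hours) >= 20 else None

# Synthetic day: 6 h of night, 12 h at 500 W/m2, 6 h of night.
example_day = [0.0] * 360 + [500.0] * 720 + [0.0] * 360
print(daily_mean_from_1min(example_day))  # 250.0
```

A day with five or more missing hours returns `None` under these rules, so it would not enter the QC algorithm.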

2.2 Description of the Quality Control (QC) Algorithm

Step 1: Calculation of the Confidence Intervals (CIs). The first step of the QC algorithm is to find the characteristic values of the daily deviations (\(\delta _{d}\)) between each radiation product p and the ground records. This is done by calculating the confidence intervals (CIs) for the monthly bias of each product (temporal averaging) over a spatial region g with uniform irradiance conditions (spatial averaging). These CIs are defined as the median of this bias (\(\widehat{Bias}\)) plus and minus a multiple of its median absolute deviation (\(MAD_{m', g}^{p}\)). The multiplier is a tuning parameter (n) that adjusts the restriction level of the QC procedure (1).

$$\begin{aligned} CI_{m', g}^{p} = \widehat{Bias}_{m', g}^{p} \pm n \times MAD_{m', g}^{p} \quad m' \in (Jan,...,Dec) , g \in (g_{1}, ..., g_{n}), \ p \in (p_{1}, ..., p_{n})\quad \end{aligned}$$
(1)

where \(m'\) are the different months of the year, p the radiation products used and g the spatial regions defined. The use of median and MAD statistics, along with the spatio-temporal averaging of the bias, increases the robustness of the CIs. The CIs were calculated only with high-quality ground records in order to reduce the probability of including operational and equipment errors in the CIs. Thus, in the present study only records from AEMET secondary standard pyranometers were used, because these are the highest quality radiometers and the maintenance procedures of AEMET are the strictest among all Spanish networks. Besides, we did not define any sub-regions within Spain, and hence the same CIs were used to filter all Spanish stations.
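As a minimal illustration of Eq. (1), the Python sketch below computes the CI for one product, one month and one region from a list of bias values. The function name `confidence_interval` is hypothetical, and the MAD is assumed to be the raw median absolute deviation around the median, without any scaling constant; the original R implementation may differ in these details.

```python
from statistics import median

def confidence_interval(biases, n):
    """Return (lower, upper) = median(bias) -/+ n * MAD(bias).

    `biases` holds the bias values of one product p for one calendar
    month m' and one region g; `n` is the tuning parameter of Eq. (1).
    Assumption: MAD is the unscaled median absolute deviation.
    """
    center = median(biases)
    mad = median([abs(b - center) for b in biases])
    return center - n * mad, center + n * mad

# The outlier (10.0) barely affects the interval, which is the reason
# for preferring median and MAD over mean and standard deviation.
print(confidence_interval([-4.0, -1.0, 0.0, 2.0, 10.0], n=2.0))  # (-4.0, 4.0)
```

A small n narrows the interval and makes the filter more restrictive, whereas a large n widens it.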

Fig. 2. Flowchart for one run of the window function.

Step 2: Flagging Using a Window Function. Once the CIs are calculated, a window function goes through the time series of each individual station, analyzing groups of consecutive days at a time and flagging potentially erroneous samples. The number of consecutive days analyzed by the window function is defined by the window width (w). The distance between two consecutive windows is controlled by the parameter step, which was set to 5 days throughout the experiments. Each analysis of the window function (Fig. 2) starts with the calculation of the number of available samples per product (\(d\_valid\)). Products with fewer than 20% of the samples available within the window are discarded. The percentages of samples above the upper limit (\(d\_over\)) and below the lower limit (\(d\_under\)) of the CIs are subsequently calculated. If at least one of the products covers 80% of the window days, the averages of \(d\_over\) and \(d\_under\) across products are computed. These thresholds were set experimentally to ensure that all products used have a sufficient number of samples, and that at least one of them covers most of the window width. Finally, daily records within the window are flagged if more than 80% of the samples are either over or under the CIs. If estimations from all the products present the same type of unusual deviation (above or below the CIs), we assume that the most likely cause is a defect in the ground records.
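One run of the window function might be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' R code. The name `flag_windows` and its arguments are hypothetical, the CI limits are taken as constant in time for brevity (in the method they depend on the month), and \(d\_over\) and \(d\_under\) are assumed to be fractions of the valid samples of each product.

```python
def flag_windows(deviations, ci, w, step=5,
                 min_valid=0.2, min_cover=0.8, min_outside=0.8):
    """Flag days whose deviations are consistently outside the CIs.

    deviations: dict product -> list of daily deviations (None = missing)
    ci:         dict product -> (lower, upper); constant in time here,
                which is a simplification of the monthly CIs
    Returns a list of booleans, one per day.
    """
    n_days = len(next(iter(deviations.values())))
    flags = [False] * n_days
    if n_days == 0:
        return flags
    # Windows start every `step` days; trailing days not reached by a
    # full window are left unanalyzed in this sketch.
    for start in range(0, max(n_days - w, 0) + 1, step):
        days = range(start, min(start + w, n_days))
        over, under, coverage = [], [], []
        for product, series in deviations.items():
            lower, upper = ci[product]
            valid = [series[d] for d in days if series[d] is not None]
            d_valid = len(valid) / len(days)
            if d_valid < min_valid:
                continue  # product discarded for this window
            coverage.append(d_valid)
            over.append(sum(v > upper for v in valid) / len(valid))
            under.append(sum(v < lower for v in valid) / len(valid))
        if not coverage or max(coverage) < min_cover:
            continue  # no product covers enough of the window
        mean_over = sum(over) / len(over)
        mean_under = sum(under) / len(under)
        if mean_over > min_outside or mean_under > min_outside:
            for d in days:
                flags[d] = True
    return flags
```

Because the average is taken across products, a window is flagged only when the products agree on the direction of the deviation; a single product outside its CI is not enough.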

Three independent radiation products were used in this study: two satellite-based models, SARAH-1 [9] and CLARA-A1 [6], and one reanalysis, ERA-Interim [1]. The two most important tuning parameters of the QC algorithm are w and n. The best configuration was found by varying w within (5, 10, 15, 20, 30, 40, 60, 90, 120) and n from 0.2 to 3.5 in intervals of 0.1. Results were analyzed in terms of the Precision-Recall curve, which plots the precision (2) against the recall (3).

$$\begin{aligned} Precision = \frac{TP}{TP + FP} \end{aligned}$$
(2)
$$\begin{aligned} Recall = \frac{TP}{TP + FN} \end{aligned}$$
(3)

where TP stands for true positives, FP for false positives and FN for false negatives. The analysis of the PR curves was performed with the dataset of European stations described in [13], and it revealed that the best configuration consisted of running the window function twice. One run looked for short-lived defects (\(n = 2.4\), \(w = 20\) days), and the second looked for permanent low-magnitude deviations (\(n = 0.4\), \(w = 90\) days).
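For reference, Eqs. (2) and (3) can be evaluated from paired boolean labels as in the Python sketch below. The function name `precision_recall` is hypothetical, and returning `None` when a denominator is zero is a choice made here for illustration.

```python
def precision_recall(flagged, defective):
    """Precision and recall from paired booleans (Eqs. 2 and 3).

    flagged[i]   -> sample i was flagged by the QC method
    defective[i] -> sample i truly contains a defect
    Returns None for a metric whose denominator is zero.
    """
    pairs = list(zip(flagged, defective))
    tp = sum(1 for f, d in pairs if f and d)
    fp = sum(1 for f, d in pairs if f and not d)
    fn = sum(1 for f, d in pairs if d and not f)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

# Four samples flagged, three of them truly defective, none missed.
flagged = [True, True, True, True, False, False]
defective = [True, True, True, False, False, False]
print(precision_recall(flagged, defective))  # (0.75, 1.0)
```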

Step 3: Visual Decision Support System. Two graphs are generated to facilitate the analysis of flagged samples. This contrasts with the majority of available QC methods for solar radiation, which generally just produce numerical flags and leave the interpretation of those flags to the user. The first plot is the daily deviation plot (Fig. 3A), which depicts the deviations between estimations and ground records. The plot includes a visual flag for each run of the QC algorithm (yellow and orange), shading the periods of daily records flagged. It includes two additional flags for periods with missing data (grey) and for samples that do not pass the BSRN QC (red). The use of the BSRN range tests enables the detection of errors that are masked after aggregating to daily values, e.g. time lags. However, these tests can only be used if sub-daily data are available.

Fig. 3. Example of the two images generated for the graphical analysis of the quality flags. (A) Daily deviation between estimations from radiation products and ground records. (B) Hourly irradiance values of SARAH-1 and the ground sensor. The images correspond to the data recorded during 2007 by the Euskalmet station C064 (Zarautz, Camping). (Color figure online)

The second plot is the hourly irradiance profiles of measured and estimated data overlapped (Fig. 3B). It is only generated for stations with sub-daily time resolution data and it requires at least one product with hourly time resolution, e.g. SARAH-1. Whereas the first plot could be sufficient for detecting false alarms, the second plot provides valuable information for identifying the causes of the defects.

Software. The QC algorithm was implemented in the R programming language using the tidyverse [14] collection of packages: dplyr and tidyr for data manipulation, lubridate to work with time series and ggplot2 to create the plots of the visual decision support system.

3 Results and Discussion

3.1 Setting up the QC Algorithm

Results obtained with each combination of w and n represent one point in the PR space (Fig. 4). Two types of PR curves were calculated. The first one (Fig. 4A) considers that each sample of the PR curve is one daily record of a specific station. Even though this is the most straightforward analysis of the output of the QC method, it is not the most practical approach. The algorithm rarely finds the exact number of days with defects because it flags all the daily records within a window. This is especially evident at the edges of periods with errors and in periods with low radiation values (winter months or high-latitude locations). Moreover, most of these mismatches are corrected by visually inspecting the flagged samples, so this first set of PR curves does not show the real performance of the QC method. As a consequence, the second set of PR curves was generated considering that each sample corresponds to one ground station (Fig. 4B). These curves illustrate whether the QC method is able to detect the presence of a defect in a ground station, regardless of whether it finds the exact daily records where the error occurs.
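The difference between the two sample definitions can be illustrated with a short Python sketch. It assumes, purely for illustration, that a station counts as positive when at least one of its days is marked; the helper name `station_level` and the station labels are hypothetical.

```python
def station_level(daily_by_station):
    """Collapse daily booleans to one boolean per station.

    Assumption: a station is positive if any of its days is True.
    """
    return {station: any(days) for station, days in daily_by_station.items()}

# Station A has a defect on days 2-4, but the window flags days 1-3.
flagged = {"A": [False, True, True, True, False], "B": [False] * 5}
truth = {"A": [False, False, True, True, True], "B": [False] * 5}

# Day level: day 1 is a false positive and day 4 a false negative.
day_mismatches = sum(
    f != t
    for station in truth
    for f, t in zip(flagged[station], truth[station])
)
print(day_mismatches)  # 2

# Station level: the defect in A is detected and B is correctly clean.
print(station_level(flagged) == station_level(truth))  # True
```

Edge effects of the window therefore penalize the day-level curves but not the station-level ones.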

Fig. 4. Precision-Recall (PR) curves obtained for the different combinations of n (tuning parameter of the CIs) and w (window width). (A) One sample corresponds to one daily record of the station. (B) One sample corresponds to one station. The variable n goes from 0.2 (up-pointing triangle) to 3.5 (down-pointing triangle) in 0.1 intervals. The red dot represents the results obtained with the chosen configuration based on two runs of the window function. (Color figure online)

The PR curves show that using narrower CIs by decreasing n leads to a greater recall (more defects detected) but to a smaller precision (more false alarms). The same pattern is observed for decreasing values of w, which reduce the number of days analyzed by the window function. With both parameters, more restrictive conditions (small n, small w) lead to a larger number of defects identified at the expense of a larger number of false alarms. In principle, the best configuration should be an intermediate solution that balances the number of true positives and false alarms, somewhere around \(w = 30\) and \(n = 1.5\). Nonetheless, the selection of the best configuration is also affected by the different characteristics of the defects present in ground sensors. Short-lived defects, such as electronic shutdowns or equipment failures, typically last from a few hours to a few days, but the magnitude of the deviations created is usually large. By contrast, long-lived defects introduce small deviations that can even become permanent, as is the case with shading by surrounding objects. Hence, the types of defects detected with narrow windows (\(w< 20\) days) are not the same as those found with wide ones (\(w > 30\) days), so an intermediate solution is not sufficient to detect all types of defects present in ground sensors.

We found that the best configuration of the QC algorithm was obtained with two runs of the window function. One run looks for short-lived defects (\(w = 20\) days, \(n = 2.4\)), using wide CIs (high n) in order to reduce the number of false alarms. The other run looks for almost permanent defects (\(w = 90\) days, \(n = 0.4\)), using more restrictive CIs (small n) in order to detect low-magnitude semi-permanent defects. The use of a window function, along with the trade-off between w and n, enables the detection of defects not found by traditional QC algorithms. The combination of these two runs leads to a precision and recall of 0.66 and 0.92, respectively, which improves on the configurations based on a single run of the window function (see red dot in Fig. 4B). The parameters for this two-run configuration were tuned prioritizing the attainment of a high recall. From the user's perspective, it is more useful to find all existing defects than to have a low number of false alarms. This is even clearer here because the QC method incorporates a visual inspection tool that speeds up the identification of false alarms.

3.2 Quality Analysis of Spanish Monitoring Stations

Flagged samples were visually inspected using the two plots generated by the QC method (Fig. 3) to detect false alarms and identify the most likely cause of the deviation observed. Errors were classified into the following categories: shading by nearby objects (shading), accumulation of dust over the sensor (soiling), time lags, diurnal periods with irradiance equal to 0 (diurnal G = 0), incorrect leveling of the sensor (leveling), large errors due to major equipment failures (large errors) and errors of unknown cause (unknown cause).

Fig. 5. Number of stations with errors and types of defect detected by the QC method in the different networks. The numeric values represent the percentage of stations with defects in each network. No errors were found in BSRN and Meteo Navarra. The “multiple errors” category represents the stations with more than one type of defect.

The QC algorithm detected errors in 310 out of 748 stations (Fig. 5), whereas the BSRN QC, which is the most common QC procedure for solar data [10], only found time lags (49 stations) and some isolated cases of leveling issues and large errors. The majority of the defects were found in SIAR, which is also the largest network, with 225 defects (47% of SIAR stations). SIAR is an agricultural network created by the Spanish Ministry for irrigation planning. Most SIAR stations were installed in agricultural areas such as the Ebro and Guadalquivir Valleys or the Mediterranean Coast. In some cases the exact placement of the sensor was even influenced by the proximity of other government facilities in order to facilitate the maintenance of the sensors. By contrast, pyranometers should be installed in locations with an obstacle-free horizon and far from potential sources of contamination such as industrial areas, airports or busy roads. This inadequate selection of locations explains the high number of shading defects found (36 stations). Moreover, other variables such as temperature and precipitation are more frequently used for agricultural purposes than incoming solar radiation. This limited use of G data, along with the poor maintenance of the stations, may also explain the presence of defects such as large errors, time lags or diurnal G = 0 in SIAR.

MeteoGalicia and Euskalmet are similar networks that also show a substantial number of defects. They are the regional meteorological agencies of Galicia and Euskadi, respectively, and provide G data with a high time resolution (10 min). Euskalmet records are obtained with high-quality secondary standard pyranometers, whereas MeteoGalicia uses different types of sensors, including a large number of first class pyranometers (Fig. 1). However, the number of defects found in both cases is too high for a meteorological network with high-quality equipment (54% for Euskalmet and 47% for MeteoGalicia). The most common defect is large errors, which could be partly explained by the high time resolution provided by both networks. Large errors are usually short-lived defects that get masked when aggregating the data to hourly or daily values. Nonetheless, some of the defects identified, such as long nocturnal periods with physically impossible values (Euskalmet), are evidence of the lack of quality checks in both meteorological agencies. Shading and soiling are other common defects in both networks, which also calls their maintenance routines into question. As a consequence, although high quality is a priori expected from meteorological agencies, G records from these two networks should generally be avoided.

Ground records from SOS Rioja present the worst quality overall, with defects in 79% of the stations (15 out of 19). The most common defect is diurnal periods with G equal to 0, which in some cases extend over the whole year, indicating a complete lack of maintenance of either the network or the acquisition system. Shading is another frequent defect in SOS Rioja (4 stations) but, unlike in other networks, the shadows appear around solar noon. This excludes the possibility of the shadows being caused by obstacles on the horizon, such as mountains, trees or buildings. SOS Rioja sensors are installed on lattice towers, so the most likely scenario is that the shadows are cast by the structure itself. This is evidence of inadequate planning during the installation of the equipment. In addition, the lack of maintenance and quality checks undermines the quality of the sensors (first class thermopile pyranometers), showing that acquiring high-quality equipment does not guarantee collecting high-quality records.

The networks with the highest quality are AEMET, SIAR Rioja, Meteo Navarra, Meteocat and BSRN, with only 4 defects among all these networks. The good quality of BSRN and AEMET was expected. BSRN is considered the highest quality radiation network worldwide. It even has one dedicated researcher at each station who checks the sensors and the consistency of the data. AEMET is the Spanish national meteorological agency and also uses high-quality sensors with elaborate maintenance procedures. Besides, in both networks the pyranometers are always ventilated, which reduces the accumulation of snow, dust and humidity over the dome of the sensors. BSRN and AEMET data should therefore be preferred in applications that require a small uncertainty in solar radiation data. We conclude that data from Meteocat, Meteo Navarra and SIAR Rioja are of sufficient quality to be used in regional studies in Cataluña, Navarra and La Rioja, respectively.

Fig. 6. Total number of stations in which defects were found but the cause of the error could not be identified, categorized by the type of sensor used.

All errors identified were operational errors related to maintenance routines, the location of the sensor and QC of the data. However, there are 78 stations in which the presence of an error was evident but it was not possible to identify the exact cause of the defect. The classification of these errors by the type of pyranometer of the station (Fig. 6) reveals that the majority of these defects appear in low-quality sensors: photodiodes (62 sensors) and second class thermopiles (6 sensors). Besides, the 8 stations without information about the pyranometer belong to SIAR, where the majority of the sensors are photodiodes as well. The cause of the error was unknown for only two high-quality sensors, one AEMET secondary standard pyranometer and one Meteocat first class pyranometer. Both networks only provide daily data free of charge, which prevents creating the hourly irradiance plot (plot B) for the visual analysis of flagged samples. Hence, the most likely cause here is the existence of an unidentified low-magnitude operational error. In the case of second class sensors, and especially photodiodes, the most likely cause of these deviations is the presence of equipment errors.

Compared to thermopiles, photodiodes are more affected by cosine and temperature errors and, besides, they have a limited spectral response because they are made of silicon detectors. Silicon has a spectral response within 350–1100 nm that includes only about 70–75% of the total incoming shortwave radiation. Hence, the calibration constant does not account for non-linear variations of the solar spectrum outside the bandwidths covered by silicon. This occurs with changes in aerosol or water vapor concentrations and with variations of sun elevation that modify the main atmospheric scattering process. As a consequence, photodiodes need to be carefully calibrated against thermopile instruments. Independent correction factors for cosine errors, temperature dependence and spectral response are required to obtain field accuracies within the ranges specified by the manufacturer. These corrections should be applied individually for each location, taking into consideration the particular conditions of each place and sensor. Therefore, the use of the same correction factors for all SIAR photodiodes, or even the lack of correction factors, may be the cause of the deviations observed. The maintenance procedures of the SIAR network are also called into question by the large number of operational errors found in SIAR stations. Therefore, these small deviations may also be caused by undetected operational errors. It is not easy to identify which cause leads to the deviations observed in each photodiode. Further work is required to gain a better understanding of the limitations of photodiodes by analyzing the half-hourly measurements provided by SIAR stations. Nonetheless, it is clear that there are significant differences in quality between SIAR photodiodes and thermopile sensors. Overall, our QC method was able to detect not only operational errors but also some equipment errors, which are the most difficult to detect due to the low-magnitude deviations introduced.

4 Conclusions

A hybrid QC algorithm for solar radiation data, which is based on the analysis of the deviations between several radiation products (satellite-based models and reanalysis) and ground records, was validated using 748 Spanish ground monitoring stations that measure global horizontal irradiance. The results reveal that the QC algorithm can detect operational and equipment errors that are rarely found by conventional QC methods, such as the BSRN tests. Besides, this study reveals the low quality of some Spanish networks such as SIAR, MeteoGalicia, Euskalmet and SOS Rioja. These networks present defects in roughly half or more of their stations (from 47% to 79%). Most of these defects are operational errors related to an inadequate placement of the sensor, a lack of maintenance and a lack of quality control of the data, but the method was also able to identify potential equipment errors in silicon-based photodiode pyranometers. We conclude that data from these networks should generally be avoided in applications requiring solar radiation data.