Definition

Calibration. The process of quantitatively defining the system responses, under specified conditions, to known, controlled signal inputs. The result of a calibration permits either the assignment of values of measurands to the system output or the determination of corrections with respect to the system output (Joint Committee for Guides in Metrology JCGM (includes ISO) 2008; Randa et al., 2008; CEOS Working Group on Calibration and Validation, 2012).

Validation. The process of assessing, by independent means, the quality of the data products derived from the system outputs. The quality is determined with respect to the specified requirements (Joint Committee for Guides in Metrology JCGM (includes ISO) 2008; Randa et al., 2008; CEOS Working Group on Calibration and Validation, 2012).

Introduction

The value of remotely sensed data products, in the scientific sense in particular, is determined by how well the characteristics of a product are known (e.g., Platt and Sathyendranath, 1988; Wentz and Schabel, 2000; Atlas and Hoffman, 2000; Jung et al., 2010). These characteristics generally include the long- and short-term deviation of the product value from the true value corresponding to the measurement, which is estimated through independent means (e.g., Wehr and Attema, 2001), and the accuracy of the geographic location assigned to the product (e.g., Wolfe et al., 2002; Small et al., 2004). The process of determining these characteristics for a particular remote sensing product is referred to as validation. However, before a data product is validated, it needs to be calibrated. The calibration and validation processes are therefore closely linked, although they are two distinct processes (see the Definition). This entry discusses calibration and validation in terms of characterization against the true value; the geolocation aspect of validation is a separate topic with specific challenges and solutions.

Remote sensing missions have requirements for the data products they are tasked to produce (e.g., Barre et al., 2008). The aim of the calibration and validation process of a particular mission is then to show that it meets its stated requirements (e.g., Delwart et al., 2008). Since the requirements are typically assigned based on expected scientific utilization of the data, the calibration and validation processes are generally regarded as a scientific function. Furthermore, the science community commonly contributes to calibration and validation efforts of data products independently from the missions in their research, due to the importance of knowing the characteristics and quality of the data (e.g., Donlon et al., 2002; Wang and Key, 2003; Mears and Wentz, 2005; Flanner et al., 2010).

The challenges of calibration and validation are specific to the mission and the data product. However, some general challenges concern most remote sensing products. The most common issues are (1) the establishment of accurate reference sites where the true value corresponding to the measurement can be estimated independently and accurately (e.g., Cosh et al., 2004) and (2) the representation of the entire measurement domain, which is often global, with a finite number of these sites (e.g., Morisette et al., 2002). Typically, the calibration and validation effort of a given mission continues throughout the lifetime of the mission and even beyond (e.g., Xu and Ignatov, 2010).

Remote sensing data products can be divided into sensor products and geophysical products. A sensor product is the output of an instrument after the instrument counts have been translated to a desired quantity, such as normalized radar cross section (e.g., Srivastava et al., 1999) or radiance (e.g., Abrams, 2000). Geophysical products contain geophysical parameters, such as wind speed (e.g., Liu et al., 1998) or leaf area index (e.g., Yang et al., 2006), retrieved from the sensor products (usually with some additional ancillary data). The calibration and validation of sensor and geophysical products differ in some aspects and are discussed separately in the subsequent text.

International cooperation plays a key role in satisfying the calibration and validation requirements of typical remote sensing products with very large domains. The Committee on Earth Observation Satellites (CEOS) (CEOS Working Group on Calibration and Validation, 2012) is one international organization which has been active in promoting calibration and validation efforts. The CEOS Working Group on Calibration and Validation has formulated a general approach for calibration and validation of remote sensing products and has established a validation hierarchy based on stages that reflect the extent of the validation effort (see the end of this text).

Historical perspective

At the beginning of the satellite remote sensing era (e.g., Nimbus-1 in 1964 and Landsat-1 in 1972), calibration and validation activities were mostly limited to those carried out directly by the space agencies. The current form of utilization of remote sensing data products started roughly with the launch of NASA’s Nimbus-7 satellite in 1978. The new data policy of this mission enabled the wider science community to engage more rapidly after the launch of the satellite (Goddard Space Flight Center, National Aeronautics and Space Administration 2004). This also contributed to the start of community-wide pre- and postlaunch calibration and validation efforts for remote sensing data products (e.g., Austin, 1980; Hovis, 1982; Stowe, 1982; Bernstein and Chelton, 1985; Stowe et al., 1988). Since then, most NASA Earth observation satellites have followed a similar data policy. However, other space agencies have widely varying policies regarding data dissemination, which directly affects the extent of the calibration and validation activity.

Currently, the launch of each new remote sensing instrument commonly prompts the science community to seek opportunities to participate in the calibration and validation activities. Naturally, the increased number of instruments and the accumulated experience in the calibration and validation of satellite data products are another main reason for the increasing activity on the calibration and validation front. However, the calibration and validation efforts of remote sensing products have retained very similar features since the early days of remote sensing (e.g., compare Hilland et al., 1985, with O’Carroll et al., 2008). At the same time, new instruments and new applications do require new methods for successful calibration and validation of remote sensing products.

Sensor products

Calibration and validation of the sensor products of a mission is a critical part of ensuring the usefulness of the mission data. The quality of the sensor products typically dominates the quality of the geophysical products. Each remote sensing instrument has an algorithm which is used to translate the raw instrument counts to the desired quantity. The complexity of the algorithm depends on the instrument implementation and the properties of the desired quantity. For example, retrieval of normalized radar cross section requires the measurement geometry in addition to instrument parameters (Ulaby et al., 1982), whereas the antenna temperature of a radiometer is independent of the measurement geometry (Ulaby et al., 1981). In principle, the features of the algorithm dictate the requirements for the calibration effort (i.e., the parameters to be adjusted), and the quantity itself determines the requirements for the validation effort (i.e., a proper target representing the quantity).

Instrument calibration usually includes some sort of internal calibration sources (e.g., Xiong and Barnes, 2006; Brown et al., 2007). While these sources can be used to remove the effects of some instrument non-idealities, they do not provide a reference for the full instrument measurement chain (e.g., Butler and Barnes, 1998). There are different approaches for external calibration: use of an onboard reference target (e.g., Yamaguchi et al., 1998; Twarog et al., 2006); measurement of celestial targets, such as the Moon or cosmic microwave background radiation (e.g., Sun et al., 2003; Jones et al., 2006); or establishment of a reference target on the ground. Dedicated efforts to improve the stability of the observations (e.g., Gopalan et al., 2009; Eymard et al., 2005) and studies to correct errors caused by the antenna of an instrument (e.g., Njoku, 1980; McKague et al., 2011) are also typical of the calibration of remote sensing instruments. Intercalibration between remote sensing instruments is an important aspect of extending a data record either in time, to lengthen the time series and/or increase its fidelity, or in space, to increase coverage (e.g., Cavalieri et al., 2012; Xiong et al., 2008).
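As a minimal sketch of how reference measurements can fix the parameters of a counts-to-quantity translation, the example below assumes a purely linear instrument response and two reference targets with known values (a "cold" and a "hot" reference). The linear model, the function names, and the numbers are illustrative assumptions; they do not describe the calibration algorithm of any particular instrument.

```python
def two_point_calibration(counts_cold, counts_hot, value_cold, value_hot):
    """Return (gain, offset) of an assumed linear counts-to-quantity mapping.

    The mapping is quantity = gain * counts + offset, fixed by two
    reference observations with known values.
    """
    if counts_hot == counts_cold:
        raise ValueError("reference counts must differ")
    gain = (value_hot - value_cold) / (counts_hot - counts_cold)
    offset = value_cold - gain * counts_cold
    return gain, offset


def counts_to_quantity(counts, gain, offset):
    """Apply the linear mapping to raw instrument counts."""
    return gain * counts + offset


# Hypothetical numbers: a cold reference at 2.7 K and a hot reference at 300 K.
gain, offset = two_point_calibration(1000.0, 3000.0, 2.7, 300.0)
scene = counts_to_quantity(2000.0, gain, offset)
print(f"gain={gain:.5f} K/count, offset={offset:.2f} K, scene={scene:.2f} K")
```

In this sketch the two references bracket the scene counts, so the scene value is interpolated rather than extrapolated; nonlinearity and effects outside the reference path would need separate treatment.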

There are areas on the surface of the Earth which provide a well-defined response for some types of remote sensing measurements. Therefore, the target on the ground may be a natural scene suitable for calibration purposes (e.g., rain forests for microwave scatterometers (Long and Skouson, 1996)), or it can be a target built for this specific purpose (e.g., a corner reflector for synthetic aperture radar (Shimada et al., 2009)). Some of the natural targets can provide a relatively well-defined absolute reference value; others are more suitable for just tracking the stability of the instrument. Examples of Earth scenes used as vicarious references for spaceborne remote sensing instrument calibration are Antarctic ice sheets (e.g., Macelloni et al., 2007, 2011), Amazon rain forests (e.g., Shimada, 2005), oceans (Ruf et al., 2006), deserts (e.g., Slater et al., 1987), and dry lake beds (e.g., Helder et al., 2010). The utility of man-made structures has also been demonstrated, such as a large asphalt field in Biggar et al. (2003).

Validation of sensor data products is usually done using the on-ground reference targets discussed above, since they represent the relevant measurement plane for the scientific utilization of the measurements. After the sensor product has been calibrated, it is compared against selected targets to establish the uncertainty of the product. This process may lead to further calibration, in which case the residual deviation from the targets becomes the result of the validation. Ideally, of course, the targets used for calibration and validation should be different. However, the use of the same targets is the reason why the nomenclature of the sensor cal/val process sometimes refers only to calibration and does not mention validation, even though it is clearly part of the process.

Geophysical products

The retrieval algorithms of geophysical parameters vary widely in their approach to determining the value of the desired parameter. Regardless of the approach, however, each algorithm requires calibration in order to optimize the correctness of its output. For the calibration process to be successful, the structure and error contributions of the algorithm need to be known (e.g., Pulliainen et al., 1993; Keihm et al., 1995; Wentz, 1997; Njoku et al., 2003; Brando and Dekker, 2003). Several algorithms include forward models that require detailed calibration before they are applied to the inverse processing (e.g., Wigneron et al., 2007). To accomplish the calibration of an algorithm, field measurements and additional remote sensing measurements are typically exploited to determine the parameter values of the algorithm (e.g., Kelly et al., 2003; Njoku et al., 2003).
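The use of field measurements to determine algorithm parameter values can be illustrated with a deliberately simple case. The sketch below assumes a hypothetical retrieval of the form parameter = a + b * observation and estimates a and b by ordinary least squares against field measurements. Real retrieval algorithms are generally more complex; the model form, names, and data here are illustrative assumptions only.

```python
def fit_linear_retrieval(observations, field_values):
    """Ordinary least-squares estimate of (a, b) in field = a + b * observation."""
    n = len(observations)
    if n != len(field_values) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(observations) / n
    mean_y = sum(field_values) / n
    sxx = sum((x - mean_x) ** 2 for x in observations)
    if sxx == 0.0:
        raise ValueError("observations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(observations, field_values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical paired samples: sensor observations and field measurements.
obs = [1.0, 2.0, 3.0, 4.0]
field = [2.1, 3.9, 6.2, 7.8]
a, b = fit_linear_retrieval(obs, field)
print(f"a={a:.3f}, b={b:.3f}")
```

The fitted parameters would then be held fixed while the product is validated against measurements that were not used in the fit.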

The validation process of geophysical products requires knowledge of the true value of the geophysical parameter within the effective measurement area with uncertainty less than the required uncertainty of the product. The following subsection discusses the issues related to scaling the in situ truth measurement to the footprint scale in spatial domain. Even in the absence of the spatial scaling issues, the uncertainty of the actual in situ measurement must be less than the uncertainty requirement of the product (e.g., Emery et al., 2001; Bailey and Werdell, 2006; Henocq et al., 2010). The establishment of these validation sites depends naturally a great deal on the geophysical parameter: The general approach and requirements for wind speed (e.g., Dobson et al., 1987) or chlorophyll a (e.g., Ruiz-Verdu et al., 2008) measurement are completely different from snow water equivalent (e.g., Tedesco and Narvekar, 2010) or leaf area index (e.g., Garrigues et al., 2008) measurements, let alone atmospheric water vapor (e.g., Divakarla et al., 2006), or ozone (e.g., Froidevaux et al., 2008) measurements. The objective in each case is nevertheless the same, to find a representative measurement of the parameter so that it can be compared against the remotely sensed value. After appropriately matching up the remotely sensed product and the in situ measurement, the validation results are typically presented as, for example, root mean square error, correlation, and histograms (e.g., Hooker and McClain, 2000; Bourassa et al., 2003; Hilland et al., 1985).
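The summary statistics mentioned above can be computed directly from matched pairs of remotely sensed and in situ values. The following is a minimal sketch, assuming the matchup has already been done and both lists are in the same units; the function name and the sample values are hypothetical. The bias-removed RMSE is included as a common companion metric, although the text does not name it.

```python
import math


def matchup_statistics(retrieved, in_situ):
    """Return bias, RMSE, bias-removed RMSE, and Pearson correlation."""
    n = len(retrieved)
    if n != len(in_situ) or n < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [r - s for r, s in zip(retrieved, in_situ)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Bias-removed (unbiased) RMSE; max() guards against tiny negative round-off.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    mean_r = sum(retrieved) / n
    mean_s = sum(in_situ) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(retrieved, in_situ))
    var_r = sum((r - mean_r) ** 2 for r in retrieved)
    var_s = sum((s - mean_s) ** 2 for s in in_situ)
    if var_r == 0.0 or var_s == 0.0:
        raise ValueError("correlation is undefined for constant data")
    corr = cov / math.sqrt(var_r * var_s)
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "corr": corr}


# Hypothetical matched pairs (e.g., a volumetric quantity in m3/m3).
retrieved = [0.12, 0.20, 0.31, 0.27]
in_situ = [0.10, 0.22, 0.28, 0.25]
stats = matchup_statistics(retrieved, in_situ)
print(stats)
```

Reporting bias and RMSE together helps to separate a systematic offset, which further calibration might remove, from random scatter.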

Spatial scaling

Remote sensing measurements are based on instrument recordings of the interaction of electromagnetic waves with the target. The instruments have a defined sensing area or volume (i.e., footprint), which depends on the antenna beam shape and on the interaction of the measurement signal with the sensed medium. When it comes to calibrating and validating the measurements, the independent reference measurements, in situ measurements in particular, typically do not have the same features as the remotely sensed signal and do not measure exactly the same domain as the footprint represents. The translation of the reference measurements to the remote sensing footprint is often referred to as spatial scaling, and it is a crucial part of the calibration and validation of remote sensing products. Sensor product calibration and validation efforts usually try to utilize homogeneous regions, where scaling is less of an issue than in the calibration and validation of typical geophysical products.

The challenge of spatial scaling typically depends on the relationship between the heterogeneity of the measured parameter and the size of the footprint. Scaling to even a relatively high-resolution (small) footprint may be challenging for highly heterogeneous parameters (e.g., Liang et al., 2002). Some remote sensing instruments have very low resolution, but if the measured parameter changes slowly over large distances, scaling can be accomplished with relatively few resources within the footprint (e.g., Le Vine et al., 2007). The most challenging cases are, of course, remote sensing measurements of highly heterogeneous parameters with large footprints (e.g., Jackson et al., 2010).

Several techniques have been developed for scaling the value of geophysical parameters up to the footprints of remote sensing measurements. These techniques include, for example, aggregation of in situ measurements (e.g., Jackson et al., 2010), model-based techniques (e.g., Chen et al., 1999), timing of the acquisition so that the heterogeneity effect is minimized (e.g., Wang et al., 2008), and the temporal stability approach, which assumes that a single point in the area represents the footprint average; the challenge then is to find the representative point (Vachaud et al., 1985; Grayson and Western, 1998). An important aspect of the upscaling of in situ measurements is the estimation of the error associated with the upscaled value. Several techniques have been proposed and used for this, for example, investigating the variance of the subscale measurements (e.g., Tian et al., 2002) and combining several observation sources (e.g., Hilland et al., 1985; O’Carroll et al., 2008; Caires and Sterl, 2003; Miralles et al., 2010).
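The temporal stability idea can be sketched with relative differences: for each station, the deviation from the area mean is computed at every sampling time, and a station whose mean relative difference stays close to zero is a candidate for the representative point. The example below is a simplified illustration in that spirit, assuming a positive-valued quantity, stations sampled at the same times, and a selection rule based only on the smallest absolute mean relative difference. The station names and values are hypothetical.

```python
import statistics


def temporal_stability(series):
    """Return {station: (mean relative difference, std of relative difference)}.

    `series` maps a station name to a list of values, one per sampling time;
    all stations must share the same sampling times and the area mean must
    be nonzero at each time.
    """
    stations = sorted(series)
    n_times = len(series[stations[0]])
    if any(len(series[s]) != n_times for s in stations):
        raise ValueError("all stations need the same number of samples")
    # Area (footprint) mean at each sampling time, from simple aggregation.
    area_mean = [statistics.fmean([series[s][j] for s in stations])
                 for j in range(n_times)]
    result = {}
    for s in stations:
        rel = [(series[s][j] - area_mean[j]) / area_mean[j]
               for j in range(n_times)]
        result[s] = (statistics.fmean(rel), statistics.pstdev(rel))
    return result


# Hypothetical stations inside one footprint, three sampling times each.
series = {
    "A": [0.20, 0.30, 0.25],
    "B": [0.22, 0.33, 0.275],
    "C": [0.18, 0.27, 0.225],
}
ts = temporal_stability(series)
# Simple selection rule (an assumption): smallest absolute mean relative difference.
representative = min(ts, key=lambda s: abs(ts[s][0]))
print(representative, ts[representative])
```

A small standard deviation of the relative difference indicates that a station tracks the area mean consistently, so in practice both numbers would be inspected before a representative point is chosen.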

Coverage

Although it is generally accepted that the validation of the geophysical products should be done against in situ measurements, other references are being applied too. The reason for this is that in general, in situ measurements have limited coverage in space and in time, i.e., they do not cover the entire domain, which is often global (the main reason why remote sensing is applied in the first place), and it may not always be possible to make in situ measurements for long periods or with high frequency for a certain location (e.g., consider limitations of radiosondes, dropsondes, and buoys).

Depending on the size of the covered domain, the challenge for the calibration and validation effort is to find a strategy with which it can be claimed that the product is calibrated and validated over the entire domain. Therefore, other remote sensing sources (e.g., Corlett et al., 2006) and models (e.g., Caires and Sterl, 2003) have been used to complement the in situ measurements. It is also typical to divide the entire domain into sub-domains and then find validation sites which represent each sub-domain (e.g., Hilland et al., 1985) or to cover a diversity of conditions with a set of sites which can be claimed to represent the conditions over the entire domain (e.g., Ceccato et al., 2002). Only rarely, however, is a validation accepted without some strategy to reference the measurements back to verifiable in situ acquisitions.

Temporal context

Remote sensing products can be calibrated and validated over a short time period (a few years at most) or a long time period (at least a decade). Calibration and validation of individual missions is usually a short-term effort because the mission requirements are typically set without a long-term requirement and, furthermore, the duration of a single mission is seldom long enough to qualify as long term anyway. Therefore, long-term calibration and validation imply an inter-mission effort to extend the calibrated and validated data record to a time frame of more than a decade (e.g., Gallo et al., 2005). High-quality long-term data records are typically required for climate applications (e.g., Flanner et al., 2011; Behrenfield et al., 2006), but other monitoring and tracking applications also require long-term remote sensing observations (e.g., Lepers et al., 2005). Until recently, the type, quality, and length of remote sensing data records limited the use of remote sensing data for long-term applications.
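One simple way to picture an inter-mission extension of a data record is to estimate the difference between two missions over their overlap period and adjust one record to the other. The sketch below assumes a constant additive offset between the records, which is a strong simplification; actual intercalibration generally has to account for differences in sampling, geometry, and instrument characteristics. The record layout (time key to value), the function names, and the numbers are hypothetical.

```python
def overlap_offset(reference, target):
    """Mean of (target - reference) over the time keys present in both records."""
    common = sorted(set(reference) & set(target))
    if not common:
        raise ValueError("records do not overlap in time")
    return sum(target[t] - reference[t] for t in common) / len(common)


def merge_records(reference, target):
    """Extend `reference` with offset-adjusted values from `target`."""
    offset = overlap_offset(reference, target)
    merged = dict(reference)
    for t, value in target.items():
        if t not in merged:
            # Adjust the newer record to the scale of the reference record.
            merged[t] = value - offset
    return merged, offset


# Hypothetical annual means from two missions overlapping in 2002 and 2003.
older = {2001: 10.0, 2002: 10.2, 2003: 10.1}
newer = {2002: 10.7, 2003: 10.6, 2004: 10.9}
merged, offset = merge_records(older, newer)
print(offset, merged)
```

The length of the overlap limits how well the offset can be estimated, which is one reason why overlapping operation of successive missions matters for long-term records.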

The requirements of the calibration and validation process are affected significantly by the climatic temporal context of multiple decades, usually combined with very high requirements on stability. Therefore, a combination of both retrieval intercomparisons and in situ measurements is often necessary to validate the long-term record (e.g., Takala et al., 2009). For example, the importance of the sea surface temperature (SST) record for climate studies was understood a long time ago (e.g., Harries et al., 1983), and ever since, significant effort has been made to establish a well-calibrated long-term SST record (e.g., Stuart-Menteth et al., 2003).

Prelaunch calibration and validation

Remote sensing missions commonly include calibration and validation activities for both the sensor and the geophysical retrieval algorithms in the prelaunch phase. Essentially, the objective of these activities is to raise the expectation of mission success to a level at which launching a satellite seems worthwhile. Missions set requirements for the instrument performance based on the intended use of the measurements. In the prelaunch phase, this performance is verified through measurements, analysis, and simulations. The calibration strategy of an instrument may require measurement of certain calibration parameters on the ground, and sometimes, when possible, the instrument is calibrated entirely on the ground. These activities are often referred to as prelaunch calibration activities. However, it should be emphasized that the calibration that is eventually applied is almost always conducted in orbit.

The development of geophysical retrieval algorithms starts before mission definition, in research activities which try to identify potential remote sensing measurement concepts. The prelaunch calibration and validation activities include components similar to those of retrieval algorithm research, such as field campaigns and simulations. However, these activities are driven by particular mission characteristics, such as the exact observation configuration, including measurement frequency, footprint, coverage, and instrument performance figures. The prelaunch efforts can only approximate the actual measurements of the mission, and the actual calibration and validation of the retrieval algorithms and products take place only after the launch of the mission.

Validation stages

CEOS (CEOS Working Group on Calibration and Validation, 2012) has put forward a four-stage validation hierarchy which has been adopted by many data providers. The validation stage increases with increasing product maturity and extensiveness of the validation effort.

  • Stage 1 validation: Product accuracy is assessed from a small (typically <30) set of locations and time periods by comparison with in situ or other suitable reference data.

  • Stage 2 validation: Product accuracy is estimated over a significant set of locations and time periods by comparison with reference in situ or other suitable reference data. Spatial and temporal consistency of the product and with similar products has been evaluated over globally representative locations and time periods. Results are published in the peer-reviewed literature.

  • Stage 3 validation: Uncertainties in the product and its associated structure are well quantified from comparison with reference in situ or other suitable reference data. Uncertainties are characterized in a statistically robust way over multiple locations and time periods representing global conditions. Spatial and temporal consistency of the product and with similar products has been evaluated over globally representative locations and periods. Results are published in the peer-reviewed literature.

  • Stage 4 validation: Validation results for stage 3 are systematically updated when new product versions are released and as the time series expands.

Summary

All remote sensing products require calibration and validation, which are an essential part of making remote sensing products meet the requirements of scientific utilization. The main challenges in the calibration and validation of almost any data product are how to make corresponding and representative reference measurements and how to extend the validation over the entire measurement domain. These challenges are overcome by careful design of the reference sites and their in situ measurements, which vary greatly depending on the geophysical parameter and on the footprint of the remote sensing product, and by planning the use of a diverse set of validation sites, augmented by other remote sensing products and models. Intersensor and inter-product calibration and validation are conducted to increase the length and fidelity of the time series or the coverage of the measurement. This is essential for the utilization of remote sensing data in climate change studies.