1 Introduction

There are many examples of geophysical processes for which the physical mechanisms driving changes are only partly understood (e.g., Cooke 1991; Budnitz et al. 1998; Zickfeld et al. 2007; Zickfeld et al. 2010). As a result, projections of future changes cannot rely solely on deterministic models. Classic examples are earthquakes and volcanism, and there is ample experience with the use of expert assessments in those fields. Sea-level rise (SLR) is another such field. Because of its integrated character, involving atmospheric and oceanic processes in interaction with glaciers and ice sheets, the SLR problem is not yet amenable to a fully deterministic approach. Over recent years awareness has increased that changes in large ice sheets are driven not only by changes in atmospheric conditions, which can be determined reasonably well with climate models, but also partly by rapid changes in ice dynamics, which are far less understood (Church et al. 2013). There is increasing evidence that the interaction between oceans and ice also influences the basal melt rates below ice shelves and the production of icebergs (Vaughan and Arthern 2007; Vaughan 2008; Holland et al. 2008; Jacobs et al. 2011), and that the disappearance of those shelves leads to an increased flux of grounded ice towards the ocean, thereby providing an additional source of future sea-level rise (Joughin and Alley 2011; Barrand et al. 2013). During the writing of the fourth IPCC assessment report (AR4) in the 2000s, this led to SLR projections that excluded the dynamical contribution of the major ice sheets (Meehl et al. 2007). This was at a time when many observational studies appeared that found a rapid retreat of ice tongues, in particular in South-East Greenland (Howat et al. 2007; Rignot et al. 2011).
Since then many more observational studies have appeared that lend further support to the view that changes are ongoing, while at the same time major progress has been reported in the development of deterministic models, with improved descriptions of grounding line mechanics (Schoof 2012) and the physics of iceberg calving (Nick et al. 2010; Nick et al. 2013). All these efforts have culminated in the assessment of the dynamical contribution of ice sheets to sea-level change in the fifth IPCC report (Church et al. 2013).

Nevertheless, it is recognised that part of the contribution to SLR, particularly that of ice dynamics, cannot yet be quantified in a fully deterministic sense, leaving ample room for other strategies (e.g. Little et al. 2013). One such alternative is expert judgement elicitation, which aims to make an inventory of the spread in opinions amongst a group of experts. For the SLR problem the need for such approaches has been recognized, for example, by the InterAcademy Council (2010) in its review of the fourth IPCC report. One of the pivotal studies in this respect is the paper by Bamber and Aspinall (2013), hereinafter BA13, who used an expert judgment assessment to estimate SLR over the 21st century with an emphasis on the contributions from the major ice sheets of Greenland (GrIS) and Antarctica (AIS). They distinguished between East and West Antarctica (EAIS and WAIS) because of the different driving mechanisms underlying their possible changes (Davis et al. 2005; Vaughan and Arthern 2007). Horton et al. (2014), hereinafter H14, carried out a similar study, but based on a considerably larger group of experts and on different questions, with less focus on ice sheet dynamics and more direct focus on SLR for different concentration-pathway scenarios. Thus, H14 also included components, such as ocean thermal expansion, which are amenable to a more deterministic approach using coupled atmosphere-ocean climate models. Moreover, whereas BA13 invited only a small number of leading scientists, H14 based their group on a literature survey, inviting any author who had published more than 6 peer-reviewed papers on sea level. A second major difference lies in the weighting of the experts' opinions, which are simply weighted equally in H14 (“all experts are equal”). In contrast, BA13 used a set of core-knowledge questions to test the experts' expertise, resulting in special performance-weighted expert contributions (“some experts are more expert than others”).
Both approaches may have their merits in specific situations, but more importantly, both choices clearly affect the outcome of the assessment a priori. This brings us to the aim of this paper.

Formally an expert elicitation does not expand our scientific knowledge of a given problem. It is still very relevant as it reveals and synthesises general commonalities and the degree of consensus in a way that is easily accessible to policy makers and decision makers. In general, however, the outcome of such elicitation studies will be sensitive to various underlying assumptions: to the selection of the level of expertise, to the way in which the opinions are combined and to further methodological effects. It is crucial for researchers to be aware of such sensitivities and to point out how they influence the final results. This becomes even more important as those final numbers are the ones most often used by policy makers and in public debate.

In the present paper we address some of the above-mentioned effects. We will not further question or discuss the motivation for using expert opinions to assess possible sea-level rise, but rather consider in some detail how the data of a study such as BA13 can also be interpreted, and how methodological choices in the data processing affect the final result. Based on our findings we then explore an alternative approach to interpreting the BA13 data and also apply it to the H14 data.

2 Methodology

In this section we describe how the BA13 results for 21st-century integrated SLR from Greenland and Antarctica are influenced by four methodological choices, particularly at the high end.

2.1 Determining pooled estimates

The starting point for the analysis presented in BA13 was that 26 sea-level experts were asked several questions on future sea-level rise and global warming. Of these 26 experts only 13 responded. Their answers consisted of median (P50, MED), low-end (P05, LO) and high-end (P95, HI) estimates for contributions from the three major ice sheets (GrIS, WAIS and EAIS) to the global rate of SLR in the year 2100. The answers of the experts were weighted using a specialised weighting technique, which involved the self-estimated level of expertise and confidence of the respondents (Cooke 1991).

Table 1 (left column) lists the performance-weighted (hereinafter PerfWts) results of BA13. One can notice that while GrIS has the highest median (P50), its P95 value is the lowest. In particular WAIS has a very high upper bound, but also the difference between the median (P50) and high (P95) values of EAIS is considerable. Note that by employing a (weighted) averaging operation, one essentially discards any information on the expert consensus.

Table 1 Left column: Low-end (LO), median (MED) and high-end (HI) estimates for end-of-century rates of SLR [mm/yr] for the Greenland ice sheet (GrIS) and West- and East-Antarctic ice sheet (WAIS and EAIS), as taken from the “PerfWts” group of BA13.

2.2 Fitting a log-normal distribution

Using the information in Table 1, distributions can be fitted through the LO, MED and HI values of each ice sheet. This brings us to the second choice: what type of distribution (e.g., normal, uniform, exponential) best fits the data? BA13 chose log-normal distributions. This choice seems reasonable given the values in Table 1. For each ice sheet, a normal distribution is fitted through the natural logarithm of the numbers in the respective row of Table 1. However, as negative contributions are not excluded in principle, an offset τ needs to be introduced. We write:

$$ Y = \tau + e^{X}, \tag{1} $$

with \(X\sim \mathcal N(\mu ,\sigma )\) normally distributed with mean μ and standard deviation σ. For a given value of τ, a simple method is to fit a straight line through the quantile-quantile plot of \(\ln (Y-\tau )\) against the same quantiles of a normal distribution. This could be done for each ice sheet. But how should one choose τ? For example, should τ be specified differently for each ice sheet? The more fundamental underlying problem is of course that only three quantiles are available to estimate the entire distribution (which usually involves a number of free parameters). We argue that if τ is not fixed a priori, it should be considered a free parameter. As we then have three parameters (μ, σ and τ) to fit the three quantile values of Y, an exact solution can be found. This solution is shown as the black lines in Fig. 1.
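The exact three-parameter fit can be sketched in a few lines. Because P05 and P95 lie symmetrically around the median in normal-quantile space, \(\ln (m-\tau )\) must be the midpoint of \(\ln (l-\tau )\) and \(\ln (h-\tau )\), which gives \(\tau = (m^{2}-lh)/(2m-l-h)\) for low, median and high values l, m and h. The function names and the example quantiles below are hypothetical and are not the values of Table 1; this is an illustration under those assumptions, not necessarily the procedure used for Fig. 1.

```python
from math import exp, log
from statistics import NormalDist


def fit_shifted_lognormal(lo, med, hi, p=0.95):
    """Exact (mu, sigma, tau) for Y = tau + exp(X), X ~ N(mu, sigma),
    such that Y has quantiles lo (at 1 - p), med (at 0.5) and hi (at p)."""
    denom = 2.0 * med - lo - hi
    if not (lo < med < hi) or denom >= 0.0:
        # A log-normal shape requires a right-skewed triple: hi - med > med - lo.
        raise ValueError("need lo < med < hi with hi - med > med - lo")
    # From (med - tau)^2 = (lo - tau) * (hi - tau).
    tau = (med * med - lo * hi) / denom
    z = NormalDist().inv_cdf(p)
    mu = log(med - tau)
    sigma = log((hi - tau) / (med - tau)) / z
    return mu, sigma, tau


def quantile(mu, sigma, tau, q):
    """Quantile q of the shifted log-normal distribution."""
    return tau + exp(mu + sigma * NormalDist().inv_cdf(q))


# Hypothetical quantiles in mm/yr (NOT taken from Table 1).
mu, sigma, tau = fit_shifted_lognormal(1.0, 2.0, 5.0)
print(f"tau={tau:.3f} mu={mu:.3f} sigma={sigma:.3f}")
print("implied P99:", round(quantile(mu, sigma, tau, 0.99), 2))
```

The fit fails by construction when the three quantiles are symmetric or left-skewed, since no shifted log-normal distribution can then pass through them.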

Fig. 1

Rate of SLR [mm/yr] in 2100 for an assumed log-normal distribution shape and various values of τ (the thick black line denotes the exact solution). The vertical axis denotes the quantiles (e.g., 0.5 equals the median, P50). Red circles indicate the PerfWts data of Table 1. Small x symbols show the raw data, the big X their median. Horizontal boxes indicate the 25–75 % (dark grey) and 5–95 % (light grey) ranges of the experts' answers

What does τ correspond to in physical terms? It is the hypothetical lowest quantile (P00), i.e. the minimum value of the distribution. Importantly, while τ is crucial for fixing the distribution, it has not been obtained directly from the experts' answers. More paradoxically, while τ appears to fix only the minimum of the distribution, it has the largest quantitative consequences at the high quantiles because of the exponential nature of the log-normal distribution. This is illustrated in Fig. 1, where the end-of-century rates are shown for different, arbitrarily chosen values of τ. WAIS and EAIS are especially sensitive to the choice of τ; GrIS is less sensitive.
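The sensitivity of the upper tail to τ can be illustrated with a small sketch. For a prescribed τ, a least-squares straight line is fitted through the three Q-Q points \((z, \ln (q-\tau ))\), and the implied P99 is evaluated. The quantiles and τ values are hypothetical (not those of Table 1 or Fig. 1), and the helper names are introduced here for illustration only.

```python
from math import exp, log
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)


def fit_fixed_tau(lo, med, hi, tau):
    """Least-squares line through (z, ln(q - tau)) for P05, P50 and P95.

    With z = (-Z95, 0, +Z95) the regression has a closed form:
    the intercept is the mean of the logs, the slope follows from the ends.
    """
    if tau >= lo:
        raise ValueError("tau must lie below the low-end quantile")
    logs = [log(lo - tau), log(med - tau), log(hi - tau)]
    mu = sum(logs) / 3.0
    sigma = (logs[2] - logs[0]) / (2.0 * Z95)
    return mu, sigma


def implied_quantile(mu, sigma, tau, q):
    """Quantile q of Y = tau + exp(X) with X ~ N(mu, sigma)."""
    return tau + exp(mu + sigma * NormalDist().inv_cdf(q))


# Hypothetical quantiles in mm/yr (NOT taken from Table 1).
lo, med, hi = 1.0, 2.0, 5.0
for tau in (-3.0, -1.0, 0.0, 0.5):
    mu, sigma = fit_fixed_tau(lo, med, hi, tau)
    p99 = implied_quantile(mu, sigma, tau, 0.99)
    print(f"tau={tau:5.1f}  sigma={sigma:.3f}  implied P99={p99:.2f}")
```

For these hypothetical inputs the implied P99 grows as τ moves up towards the low-end quantile, even though the three fitted quantiles barely change.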

The grey boxes denote the raw 25–75 % and 5–95 % ranges of the data. Clearly, the PerfWts results (red open circles), especially those at the high-end for WAIS and EAIS, are found in the far right tail of the raw data. This means that from a purely statistical point of view these values are strongly influenced by a few outliers. This becomes very clear by comparing the difference with respect to the median values (big X symbols). Nevertheless we continue with the PerfWts results, the log-normal approach, and the exact τ. It is unclear what τ-values BA13 have used.

2.3 Rate of sea-level rise from the ice sheets in 2100

Given the three distributions for the ice-sheet contributions to the SLR rate in 2100, we need to combine them to obtain a distribution for the total ice-sheet related SLR rate. A sampling method is used because a sum of log-normal distributions is not necessarily log-normal itself. The ice-sheet responses are expected to be correlated, which needs to be accounted for. Here we use the same correlations as used in BA13: ρ(GrIS, WAIS) = 0.7, ρ(GrIS, EAIS) = −0.2 and ρ(WAIS, EAIS) = −0.2, where we assumed that these values refer to the correlations of the \(\ln (Y_{i}-\tau _{i})\) (i.e., normally distributed) variables, rather than to the log-normally distributed SLR values themselves. Using the techniques outlined in Appendix A (suppl. mat.), we obtain three random but correlated series \(X_{i}=\ln (Y_{i}-\tau _{i})\) (i=1,2,3) of arbitrary length (we take \(N=10^{8}\)), drawn from the end-of-century pdfs of the three ice sheets. The total ice-sheet related rate of SLR in 2100 is then given by

$$ Z = \sum\limits_{i} Y_{i} = \sum\limits_{i} \left(\tau_{i} + e^{X_{i}}\right). \tag{2} $$

Percentiles of interest are estimated directly from Z.
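A minimal sketch of this sampling step is given below. Appendix A is not reproduced here, so the use of a Cholesky factor of the correlation matrix to generate the correlated normal series is an assumption, as are the (μ, σ, τ) values, the much smaller sample size and the nearest-rank percentile. Only the three correlations are taken from the text.

```python
import random
from math import exp, sqrt


def cholesky(c):
    """Lower-triangular L with L L^T = c (c symmetric positive definite)."""
    n = len(c)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(c[i][i] - s)
            else:
                L[i][j] = (c[i][j] - s) / L[j][j]
    return L


def sample_total(mu, sigma, tau, corr, n, seed=0):
    """Draw n samples of Z = sum_i (tau_i + exp(X_i)) with correlated X_i."""
    rng = random.Random(seed)
    L = cholesky(corr)
    dim = len(mu)
    totals = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlated standard normals g = L e.
        g = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(dim)]
        totals.append(sum(tau[i] + exp(mu[i] + sigma[i] * g[i]) for i in range(dim)))
    return totals


def percentile(values, q):
    """Nearest-rank empirical percentile, 0 <= q < 1."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]


# Order: GrIS, WAIS, EAIS. Correlations as quoted from BA13.
corr = [[1.0, 0.7, -0.2],
        [0.7, 1.0, -0.2],
        [-0.2, -0.2, 1.0]]
# Hypothetical distribution parameters (NOT fitted to Table 1).
mu = [1.0, 0.5, -0.5]
sigma = [0.4, 0.9, 1.0]
tau = [-0.5, -1.0, -1.5]

totals = sample_total(mu, sigma, tau, corr, n=20000)
for q in (0.05, 0.50, 0.95):
    print(f"P{int(q * 100):02d} of Z: {percentile(totals, q):.2f} mm/yr")
```

The Cholesky route requires a positive-definite correlation matrix, so limiting cases such as all correlations equal to +1 need separate treatment (for example, driving all three series with one shared normal variate).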

The SLR rate distributions, including the MED (P50), LO (P05) and HI (P95) estimates, are shown in Fig. 2 (top). While the pdf for GrIS is rather smooth, those of the two Antarctic ice sheets have a rather high peak with long right-sided tails. The obtained values for MED and LO are similar to BA13, but our high-end (P95) value is considerably larger. Based on the same data, the “exact-τ” approach therefore leads to considerably larger high-end extremes. In deriving the results, the correlations between the different ice sheet responses do matter. As the true inter-ice-sheet correlations are not known exactly, they increase the uncertainty. For example, setting all correlations \(\rho _{ij}=+1\) increases the high-end estimate by more than 5 mm/yr, while setting \(\rho _{ij}=-1\) leads to a similar reduction of the SLR rate.

2.4 Integrated sea-level change from the ice sheets over 2010–2100

One final assumption is required to convert the end-of-century SLR rates to a distribution of integrated SLR over the period 2010–2100. This assumption concerns the time path during the 21st century. As in BA13 we assume a linear increase in the rate of SLR with time, from the observed values in 2010 towards their estimated values in 2100. For the observed value we take 0.9 mm/yr in 2010 (BA13). This value is partitioned as 0.0, 0.6 and 0.3 mm/yr for EAIS, WAIS and GrIS, respectively. This choice influences the contributions from the individual ice sheets, but not their total sum (our main quantity of interest). The total SLR is then simply the cumulative sum of the rates over time.
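The conversion from an end-of-century rate to an integrated change can be written down directly. The sketch below treats the cumulative sum as a continuous integral, which is an approximation, and adds a hypothetical exponent `power` describing a time path \(r(t) = r_{0} + (r_{1}-r_{0})(t/T)^{p}\); p=1 is the linear case assumed here, and any other value is purely illustrative.

```python
def integrated_slr_cm(rate_start, rate_end, years=90.0, power=1.0):
    """Integrated SLR in cm for rates in mm/yr over a period of `years`.

    Integrates r(t) = r0 + (r1 - r0) * (t / T) ** power from 0 to T,
    which gives T * (r0 + (r1 - r0) / (power + 1)).
    """
    total_mm = years * (rate_start + (rate_end - rate_start) / (power + 1.0))
    return total_mm / 10.0  # mm -> cm


# Observed 2010 rate of 0.9 mm/yr (BA13); the 2100 rate is hypothetical.
print(integrated_slr_cm(0.9, 5.0))             # linear time path
print(integrated_slr_cm(0.9, 5.0, power=2.0))  # slower start, faster finish
```

For the same end-of-century rate, any path with power greater than 1 stays below the linear path and therefore yields a smaller integrated change.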

Figure 2 (bottom) shows the distributions for integrated SLR resulting from the ice sheets over the period 2010–2100. A median of +29 cm is found, as in BA13. The low-end value (P05) is also similar (+10 cm). However, the exact-τ method gives rise to a high-end (P95) value of +117 cm, more than 30 cm higher than the +84 cm of BA13. As the only difference between our method and that of BA13 is the choice of τ, we conclude that τ strongly influences the high end. For example, if τ=−3 is taken (a rather conservative estimate for all three ice sheets), the fitted rates are still within the range of the experts' answers (except for the lower value of WAIS, see Fig. 1). In that situation the high-end value reduces to +70 cm (suppl. material Fig. S1), with the distributions of the West and East Antarctic ice sheets becoming less peaked. Note that the perhaps most intuitive setting, τ=0, is incompatible with the log-normal assumption for EAIS.

Fig. 2

(top) Rate of SLR in 2100 [mm/yr] resulting from the ice sheets. AIS denotes the sum of WAIS and EAIS. The median (P50), low-end (P05) and high-end (P95) values of the total rate of the three ice sheets together are indicated by vertical lines. (bottom) As top panel but for the total SLR [cm] over the period 2010–2100. The median (P50), low-end (P05) and high-end (P95) values of the total SLR are indicated by vertical lines. Rates are assumed to increase linearly from 2010 to 2100, as in BA13. The initial SLR rate (0.9 mm/yr) is partitioned as 0.0, 0.6 and 0.3 mm/yr between EAIS, WAIS and GrIS

Finally, note that the assumption of a linear increase of the SLR rates to their 2100 values is widely used in literature, but obviously does influence the final results. If SLR rates follow a non-linear time path, increasing slowly initially and more rapidly later, this reduces the total integrated SLR over the 21st century. Investigating this uncertainty is beyond the scope of the present paper.

3 Are there alternatives?

One of the complicating factors in the BA13 paper is that it tries to answer a main question which is different from the ones being asked to the experts. The experts were questioned about the LO, MED and HI values for the individual ice sheet contributions to the rate of sea-level rise in 2100. However, the central aim of BA13 is to seek an answer to a different (but related) question, namely to obtain an estimate for the total integrated sea-level change from all ice sheets together over the period 2010–2100. To answer this latter question one is forced to first reconstruct the individual distributions of end-of-century rates, then to combine them, and finally to compute the integrated SLR. In the previous section we showed that during the process from individual ice-sheet rates to the estimate for high-end integrated SLR, a number of crucial (non-expert based) assumptions has to be made, which may have a large influence on the final outcome.

The question that comes to mind is whether the data can be used at all to reconstruct estimates for integrated SLR during the 21st century without introducing subjective aspects in the subsequent analysis. If one thing becomes clear from examining the raw data (Fig. 1), it is that there is little consensus between experts on the possible upper-bound values of rates of sea-level rise in 2100, especially for WAIS, but also for EAIS. This is a worrying message in itself. If the experts are widely uncertain, a “we do not know” statement is effectively not far from the truth. This lack of consensus receives little attention in BA13 because it is in essence eliminated by the PerfWts method (see Section 2.1). By using special performance-weighted averages of the experts' answers as a starting point, it is indirectly implied that “some experts are more expert than others”.

3.1 Including a measure of consensus

However, we argue that more robust statements on the absolute sea-level changes can be inferred from the expert-opinion data, but that these should incorporate some measure of (the lack of) consensus. To derive a distribution of total SLR changes over 2010–2100 from end-of-century individual ice sheet SLR rates, one cannot circumvent making a number of choices. However, in making these in the end subjective decisions (i.e., not tested in the expert elicitation), one could try to minimise their influence on the end result. The proposed procedure has a number of steps:

  1. Consensus distribution: A level of consensus can only be included if one starts with all data, rather than with, for example, the PerfWts averages. This implies that we start with the premise that “all experts are equal”. We determine a “consensus” distribution for each ice sheet and for each quantile (LO, MED and HI) by fitting a polynomial of degree p through the quantile-quantile plot: \(y\sim {\sum }_{j=0}^{p} a_{j}x^{j}+\epsilon \), with 𝜖 noise, y the quantiles of the raw data, and x the quantiles of a normal distribution. We take p=3 to account for the skewness in the expert opinions. The consensus distributions for the three ice sheets are shown in the top three rows of Fig. S2 (suppl. mat.).

  2. Sample: Draw a random sample from the consensus distributions. This gives for each ice sheet a possible LO, MED and HI value for the SLR rate in 2100.

  3. Rates and integrated changes: Fit a distribution through the three quantiles of each ice sheet. This could be done using a log-normal or another distribution. Because of the sensitivity related to the offset τ in the log-normal distribution, we follow the polynomial approach of step 1 using p=1 (as we have only three quantiles). Sample the SLR rate distributions (\(N=10^{8}\)), using the correlations of BA13. Determine percentiles of interest for the total SLR rate and repeat the procedure of Section 2.4 to obtain the integrated SLR changes.

  4. Repeat steps 2–3: Repeat steps 2–3 a large number of times (\(M=10^{4}\)), using different drawings from the consensus distribution. This then gives a consensus distribution for each percentile of interest. The bottom row in Fig. S2 shows the result for the LO, MED and HI estimates of the total rate of SLR in 2100.
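Steps 1 and 2 of the procedure above can be sketched as follows. The expert answers are invented for illustration, the plotting positions \((i+0.5)/n\) are an assumption (the text does not state how the empirical quantiles are assigned), and the least-squares solver is a plain normal-equations implementation rather than the authors' code.

```python
import random
from statistics import NormalDist, median


def polyfit(x, y, degree):
    """Least-squares coefficients a_0..a_p of sum_j a_j x^j (normal equations)."""
    n = degree + 1
    A = [[sum(xi ** (r + c) for xi in x) for c in range(n)] for r in range(n)]
    b = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, n))) / A[r][r]
    return a


def polyval(a, x):
    """Evaluate the polynomial with coefficients a_0..a_p at x."""
    return sum(aj * x ** j for j, aj in enumerate(a))


def consensus_fit(answers, degree=3):
    """Step 1: fit sorted expert answers against standard-normal quantiles."""
    y = sorted(answers)
    n = len(y)
    # Assumed plotting positions (i + 0.5) / n.
    x = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return polyfit(x, y, degree)


def draw(coeffs, rng):
    """Step 2: one random draw from the fitted consensus distribution."""
    return polyval(coeffs, rng.gauss(0.0, 1.0))


# Invented P95 answers of 13 experts for one ice sheet, in mm/yr.
hi_answers = [0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 4.0, 5.0, 9.0, 15.0]
coeffs = consensus_fit(hi_answers, degree=3)
rng = random.Random(1)
draws = [draw(coeffs, rng) for _ in range(1000)]
print("median of drawn HI values:", round(median(draws), 2))
```

A cubic fit is not guaranteed to be monotonic, and independently drawn LO, MED and HI values need not be ordered, so a full implementation would have to decide how to handle such draws before fitting the distribution in step 3.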

3.2 Results

Figure 3 shows the distributions of SLR rate in 2100 for each of the three ice sheets (and the total) obtained using the new method. Inferred consensus ranges are included as grey bands.

Fig. 3

(top two rows) Distributions for the end-of-century contributions to the rate of SLR (mm/yr) from the large ice sheets. Red circles denote the PerfWts values from BA13, small x-marks the expert data and the big X the median of the expert values. The grey bands denote estimated bands of consensus on the entire distribution. The red + symbol in the bottom right panel indicates the log-normal distribution estimate. (bottom row) Total integrated sea-level change (in cm) over the period 2010–2100. The additional green shaded band indicates a 90 % confidence interval for the line of medians with respect to variations of the underlying statistical model. The white dashed line denotes the line of medians after crude outlier removal. See text for details

The PerfWts values of Table 1 are indicated (red circles), as well as the raw median (big X) and the 5–95 % expert-consensus range (horizontal black lines). Strikingly, for both Antarctic ice sheets the high-end PerfWts estimates are nowhere near the medians, and even lie outside the inter-quartile consensus range (dark grey). Indeed, they are nearly at the right end of the light-grey box (5–95 % consensus). The exact-τ log-normal approach yields even larger rates (red + in Fig. 3d). While we do not want to question the validity of the PerfWts method here, those results are clearly influenced by the answers of a few experts with very high estimates who also happened to have obtained a relatively high weight. The new approach, on the other hand, gives equal weight to all experts.

The total integrated SLR change over the period 2010–2100 is shown in the bottom panel of Fig. 3 and constitutes our most important result. The full black line denotes the median for each quantile (referred to as the ‘line of medians’). The median of the upper P95 percentile is +53 cm. The grey bands denote levels of consensus obtained from the data. Consensus is relatively high (i.e., narrow bands) at low percentiles. However, as we go to the high-end estimates the lack of consensus increases rapidly. At P95, the inter-quartile range (dark grey) reaches from +38 to +77 cm. Both the high-end estimate from BA13 (+84 cm, red circle) and the value obtained by using the exact-τ log-normal approach (+117 cm, + symbol) fall outside the interquartile consensus band.

The ‘line of medians’ is reasonably robust against outliers in the data and to choices in the underlying distributions. Because the true distributions are not known, there is no unambiguous test for outliers. As a very crude test we have simply removed the absolute minimum and maximum values from the data. The effect on the line of medians is shown as the dashed white line (very close to and largely overlapping with the full black line). The median of the upper P95 shifts to +52 cm (from +53 cm). Further robustness to the underlying distributions is tested by varying the degrees of the polynomials used (from p=1 to 5 for the consensus fit, and p=1,2 for fitting the distribution). The green shading shows a 90 % range for the line of medians. It stays within the estimated consensus bands, suggesting that the spread amongst the expert answers is larger than that caused by the subsequent analysis methodology. This is desirable as it implies that the uncertainty added by a posteriori choices is relatively small.

4 Applying the consensus approach to H14

The technique introduced in the previous section can be applied to other data sets. Here we use it to extend H14. As already described in the Introduction, in H14 a large group of experts was questioned on the estimated ranges (17–83 % and 5–95 %) of total integrated SLR in 2100 and 2300 for two different forcing scenarios. One scenario (denoted ‘B’) considers an evolution following Representative Concentration Pathway RCP2.6 (Meinshausen et al. 2011), while the more extreme ‘R’ scenario follows RCP8.5 with persistent global warming over the next centuries. Our methodology allows one to obtain estimates for the entire distribution instead of only the two ranges. Note that the values found in H14 are generally higher than in BA13, because they include contributions from more processes than just the three ice sheets (i.e., glacier melting, ocean expansion, land-water storage changes). Because the questions answered by the experts were already about the integrated SLR, some of the intermediate steps that were required in BA13 are not needed in this case (i.e., deriving distributions for each ice sheet, aggregating). This simplifies the subsequent interpretation and limits the possible effects of methodological choices. One can proceed directly to determining the total distributions. Results are shown in Fig. 4.

Fig. 4

Total sea-level change [cm] in 2100 (top) and 2300 (bottom) for the B (left) and R (right) scenarios of H14. Note the different scale on the horizontal axes. Grey bands indicate consensus bands. The thick and thin horizontal red lines denote the 25–75 % and 5–95 % range of the raw H14 data, with outlying data shown as x-symbols and the median as a big + symbol. Green shading indicates a 90 % confidence interval for the line of medians to variations of the underlying statistical model. See text for details

By construction the approach captures the ranges of the raw data (red lines) quite well. Similar to the results shown in Fig. 3 (bottom), the consensus strongly decreases as one goes to higher quantiles, especially for the R-scenario. The lines of medians (black lines) are reasonably robust to outliers, because the amount of data is considerably larger than in BA13. Robustness against the choice of the shape of the distribution is also tested. In Fig. 4 we used a polynomial of degree p=5 to fit the consensus distributions, and p=3 to fit the SLR distribution. The green shading in Fig. 4 indicates a 90 % confidence band of the line of medians with respect to variations in the underlying distributions (from p=1 to 5 for the consensus fit, and p=1 to 3 for fitting SLR). The green shading stays within the dark shading, implying that the lack of consensus is larger than the uncertainty introduced by the methods. However, even in this case, where post-processing of the data requires only a few steps, additional uncertainty is unavoidably introduced by the methodology.

5 Conclusion and discussion

In this paper we have reexamined two recent papers (BA13 and H14) on sea-level rise (SLR) resulting from changes of the three largest ice sheets on Earth. In the first, BA13 use expert judgment elicitation to arrive at a P50 median estimate of +29 cm SLR over the period 2010–2100, and a high-end (P95) estimate of +84 cm. We have shown that these estimates are sensitive to choices in analysis methodology (Section 2). The high-end estimate is especially sensitive, and we show that values between +70 and +120 cm can be regarded as consistent with the answers of the experts.

One aspect that is very clear in the expert opinion data is their lack of consensus, especially regarding the high-end estimates. We argue that a representative interpretation of expert opinion data should aim to incorporate a level of consensus. The approach taken in BA13 was not suitable for this because of the use of a special weighting technique. In this paper we present an alternative analysis of the same data, which is potentially more robust to outliers. This approach integrally includes a level of consensus and thereby retains the strong decrease in consensus as one moves to the high-end SLR estimates. Using this alternative approach, we obtain a P50 median expected value of +35 cm (+25–51 cm interquartile consensus range) and a P95 value of +53 cm (+38–77 cm), the latter being significantly lower than the +84 cm of BA13. We have subsequently applied our method to data of another recent study on sea-level rise (H14). In that study, expert elicitation was also used, but the setup was different. We were able to recalculate and extend the results of H14 (Fig. 4). Because H14 considered the total integrated SLR change resulting from all contributing processes (including ocean thermal expansion) and not only from the ice sheets, it is difficult to compare H14 and BA13 quantitatively.

The results in this paper may provide some guidelines for analysing expert-judgment elicitations. The first and most important step is to make sure that one stays as close as possible to the questions originally put to the experts. While this may seem obvious, it will limit the number of subsequent analysis steps to be made, and therefore their possible (cumulative) effects on the final result. On the other hand, if multiple steps or choices are required to derive the final result and if these are not unambiguously agreed on in the literature, the possibility arises that the results are influenced substantially by the researcher (see Section 2). Making subjective choices need not be a problem, but the researcher should be aware of them and discuss the robustness of the final result to possible alternatives. This is one of the reasons why the H14 results are more easily reproduced and extended than those of BA13. Finally, an important message of this paper is that, despite all efforts, one should be quite careful in considering high-end SLR estimates (such as P95) as being by any means well-constrained and well-determined. On the contrary, such numbers are highly uncertain, which is reflected in the wide consensus ranges. Unavoidably they are also influenced by the analysis methodology. While this may well be known to the research community and experts in the field, it may be more difficult to convey this message to the general public. It will remain a challenge to communicate transparently to governments and the general public the ranges and types of uncertainties in future sea-level rise projections.