Trend Analysis

The following is by Dennis Shea (NCAR)

The detection, estimation, and prediction of trends, and of their associated statistical and physical significance, are important aspects of climate research. Given a time series of (say) temperatures, the trend is the rate at which temperature changes over the period. The trend may be linear or non-linear; generally, however, the term is synonymous with the linear slope of a line fit to the time series. Simple linear regression is most commonly used to estimate the linear trend (slope) and its statistical significance (via a Student's t-test). The null hypothesis is no trend (i.e., an unchanging climate). The non-parametric (i.e., distribution-free) Mann-Kendall (M-K) test can also be used to assess the significance of a monotonic trend (linear or non-linear); it is much less sensitive to outliers and skewed distributions. (Note: if the distribution of the deviations from the trend line is approximately normal, the M-K test will return essentially the same result as simple linear regression.) The M-K test is often combined with the Theil-Sen robust estimate of the linear trend. Whatever test is used, the user should understand the assumptions underlying both the technique used to estimate the trend and the statistical methods used for testing. For example, the Student's t-test assumes the residuals have zero mean and constant variance. Further, a time series of N values may contain fewer than N independent values due to serial correlation or seasonal effects. The estimated number of independent values is sometimes called the equivalent sample size, and there are established methods for estimating it; it is this value that should be used when assessing statistical significance with (say) the Student's t-test. Alternatively, the series may be pre-whitened or deseasonalized prior to applying the regression or M-K tests.
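To make these estimators concrete, here is a minimal sketch in Python (NumPy/SciPy assumed; the series and all numbers are synthetic and illustrative). Mann-Kendall significance is obtained via Kendall's tau of the series against time, which is the equivalent formulation for a trend-versus-time test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
years = np.arange(1980, 2020)
temps = 0.02 * (years - years[0]) + rng.normal(0.0, 0.15, years.size)  # synthetic warming series

# 1. Ordinary least-squares trend and its Student-t p-value
ols = stats.linregress(years, temps)
print(f"OLS slope = {ols.slope:.4f} deg/yr, p = {ols.pvalue:.4g}")

# 2. Mann-Kendall-type monotonic-trend test: Kendall's tau against time
tau, p_mk = stats.kendalltau(years, temps)
print(f"Kendall tau = {tau:.3f}, p = {p_mk:.4g}")

# 3. Theil-Sen robust slope estimate with a 95% confidence interval
slope, intercept, lo, hi = stats.theilslopes(temps, years)
print(f"Theil-Sen slope = {slope:.4f} deg/yr (95% CI: {lo:.4f} to {hi:.4f})")
```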

There are numerous caveats that should be kept in mind when analyzing trends. Some of these include:

(1) Long-term, observationally based estimates are subject to differing sampling networks. Coarser sampling is likely to result in larger uncertainties. Variables with large spatial autocorrelation (e.g., temperature, sea level pressure) may have smaller sampling errors than (say) precipitation, which generally has lower spatial correlation.

(2) The climate system within which the observations are made is not stationary.

(3) Station, ship, and satellite observations are subject to assorted errors. These may be random, systematic, or external, such as changes in instruments, observation times, or observational environments. Much work has been done on creating time series that take these factors into account.

(4) While reanalysis projects provide an unchanging data assimilation and model framework, the observational mix changes over time. This may introduce discontinuities into the time series that cause a trend to be estimated as significant when it is in fact an artifact of the discontinuities.

(5) Even a long series of random numbers may contain segments with short-term trends. For example, the well-known surface temperature record from the Climatic Research Unit, which spans 1850 to the present, shows an undeniable long-term warming trend; yet negative trends of 10-15 years are embedded within the series, and the rate of warming changes depending on the starting date used.

(6) As noted above, a series of N observations does not necessarily contain N independent values; often there is some temporal correlation. This should be taken into account, for example, when computing the degrees of freedom for the t-test (a code sketch follows this list).
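As an illustration of caveat (6), here is a minimal sketch (again assuming SciPy) that uses the common lag-1 approximation for the equivalent sample size, n_eff = n(1 - r1)/(1 + r1), and inflates the standard error of the OLS slope accordingly. This is one widely used adjustment, not the only one:

```python
import numpy as np
from scipy import stats

def adjusted_trend_test(years, series):
    """OLS trend test corrected with a lag-1 equivalent sample size."""
    years = np.asarray(years, dtype=float)
    series = np.asarray(series, dtype=float)
    res = stats.linregress(years, series)
    resid = series - (res.intercept + res.slope * years)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]       # lag-1 autocorrelation of the residuals
    n = series.size
    n_eff = n * (1 - r1) / (1 + r1) if r1 > 0 else n    # equivalent sample size
    t = res.slope / (res.stderr * np.sqrt(n / n_eff))   # standard error inflated by sqrt(n / n_eff)
    p_value = 2 * stats.t.sf(abs(t), df=max(n_eff - 2, 1))
    return res.slope, n_eff, p_value
```

With strong positive serial correlation, n_eff can be a small fraction of n, and a trend that looks significant under the naive test may no longer be.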

Cite this page

National Center for Atmospheric Research Staff (Eds). Last modified 05 Sep 2014. "The Climate Data Guide: Trend Analysis." Retrieved from https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/trend-analysis.

Acknowledgement of any material taken from this page is appreciated. On behalf of experts who have contributed data, advice, and/or figures, please cite their work as well.


Bringing physical reasoning into statistical practice in climate-change science

Theodore G. Shepherd (ORCID: orcid.org/0000-0002-6631-9968)

Open access | Published: 01 November 2021 | Volume 169, article number 2 (2021)

Abstract

The treatment of uncertainty in climate-change science is dominated by the far-reaching influence of the ‘frequentist’ tradition in statistics, which interprets uncertainty in terms of sampling statistics and emphasizes p-values and statistical significance. This is the normative standard in the journals where most climate-change science is published. Yet a sampling distribution is not always meaningful (there is only one planet Earth). Moreover, scientific statements about climate change are hypotheses, and the frequentist tradition has no way of expressing the uncertainty of a hypothesis. As a result, in climate-change science, there is generally a disconnect between physical reasoning and statistical practice. This paper explores how the frequentist statistical methods used in climate-change science can be embedded within the more general framework of probability theory, which is based on very simple logical principles. In this way, the physical reasoning represented in scientific hypotheses, which underpins climate-change science, can be brought into statistical practice in a transparent and logically rigorous way. The principles are illustrated through three examples of controversial scientific topics: the alleged global warming hiatus, Arctic-midlatitude linkages, and extreme event attribution. These examples show how the principles can be applied, in order to develop better scientific practice.

“La théorie des probabilités n’est que le bon sens réduit au calcul.” (“Probability theory is nothing but common sense reduced to calculation.”) (Pierre-Simon Laplace, Essai philosophique sur les probabilités, 1819).

“It is sometimes considered a paradox that the answer depends not only on the observations but on the question; it should be a platitude.” (Harold Jeffreys, Theory of Probability, 1st edition, 1939).


1 Introduction

As climate change becomes increasingly evident, not only in global indicators but at the local scale and in extreme events, the challenge of developing climate information for decision-making becomes more urgent. It is taken as given that such information should be based on sound science. However, it is far from obvious what that means. As with many other natural sciences, controlled experiments on the real climate system cannot be performed, and climate change is by definition statistically non-stationary. Together this means that scientific hypotheses cannot be tested using traditional scientific methods such as repeated experimentation. (Experiments can be performed on climate simulation models, but the models differ from the real world in important respects, and often disagree with each other.) On the global scale, it is nevertheless possible to make scientific statements with high confidence, and to speak of what can be considered to be effectively climate change ‘facts’ (e.g. the anthropogenic greenhouse effect, the need to go to net-zero greenhouse gas emissions in order to stabilize climate), which are sufficient to justify action on mitigation. This is because the process of spatial aggregation tends to reduce the relevant physical principles to energetic and thermodynamic ones which are anchored in fundamental theory (Shepherd 2019 ), and to beat down much of the climate noise so that the signals of change emerge clearly in the observed record (Sippel et al. 2020 ).

Yet for many aspects of climate-change science, there is no consensus on what constitutes fundamental theory, the signals of change are not unambiguously evident in the observed record, and climate models provide conflicting results. This situation occurs with so-called climate ‘tipping points’, due to uncertainties in particular climate feedbacks (Lenton et al. 2008 ). It also occurs on the space and time scales relevant for climate adaptation, where atmospheric circulation strongly determines climatic conditions, yet there is very little confidence in its response to climate change (Shepherd 2014 ). These uncertainties compound in the adaptation domain, where human and natural systems play a key role (Wilby and Dessai 2010 ).

This situation of ambiguous possible outcomes is illustrated by Fig.  1 , which shows the precipitation response to climate change across the CMIP5 climate models as presented by IPCC ( 2013 ). Stippling indicates where the multi-model mean change is large compared to internal variability, and 90% of the models agree on the sign of change, whilst hatching indicates where the multi-model mean change is small compared to internal variability. These metrics embody the concept of ‘statistical significance’, which underpins the usual approach to uncertainty in climate-change science. Yet they are seen to be agnostic over many populated land regions, including most of the Global South, which are neither hatched nor stippled. Zappa et al. ( 2021 ) have shown that in those regions, the models suggest precipitation responses that are potentially large but are non-robust (i.e. uncertain in sign), and that the same situation holds with the CMIP6 models.

[Figure 1: Projected changes in precipitation (in %) over the twenty-first century from the CMIP5 models under a high climate forcing scenario (RCP8.5). Stippling indicates where the multi-model mean change is large compared with natural internal variability in 20-year means (greater than two standard deviations) and where at least 90% of models agree on the sign of change. Hatching indicates where the multi-model mean change is small compared with internal variability (less than one standard deviation). From the Summary for Policymakers of IPCC (2013).]

It follows that if climate-change science is to be informative for decision-making, it must be able to adequately reflect the considerable uncertainty that can exist in the information. The traditional language of science is usually framed in terms of findings, which for climate change might be explanations of past behaviour (attribution), or predictions of future behaviour (known as ‘projections’ when made conditional on the future climate forcing). To give just one example, the title of Sippel et al. ( 2020 ) is “Climate change now detectable from any single day of weather at global scale”. In the peer-reviewed literature, these findings are generally presented in a definitive, unconditional manner (op. cit., where it is indeed justified); some journals even insist that the titles of their articles are worded that way. Caveats on the findings are invariably provided, but it is not straightforward to convert those to levels of confidence in the finding. When made quantitative, the uncertainties are represented through some kind of error bar, usually around a best estimate. As Stirling ( 2010 ) has argued, such ‘singular, definitive’ representations of knowledge are inappropriate and potentially highly misleading when the state of knowledge is better described as ‘plural, conditional’, as for mean precipitation changes in the unmarked regions in Fig.  1 . There are many methods available for dealing with ‘plural, conditional’ knowledge within a decision framework (Weaver et al. 2013 ; Rosner et al. 2014 ), so there is certainly no requirement for climate information to be expressed in a ‘singular, definitive’ manner in order to be useable.

There are many reasons for this situation, some of which are non-scientific (e.g. the reward system, both for authors and for journals). My goal here is to focus on one of the scientific reasons, namely the statistical practice that characterizes most climate-change science, which is still dominated by procedures that originate from the so-called ‘frequentist’ tradition in statistics. This tradition interprets uncertainty in terms of sampling statistics of a hypothetical population, and places a strong emphasis on p -values and statistical significance. It does not provide a language for expressing the probability of a hypothesis being true, nor does it provide a home for the concept of causality. Yet scientific reasoning is about hypotheses (including the ‘findings’ mentioned earlier), and reasoning under uncertainty is simply a form of extended logic, generalizing the true/false dichotomy of Aristotelian logic to situations where a hypothesis has a probability of being true that lies between 0 and 1 (Jeffreys 1961 ; Jaynes 2003 ). Moreover, the concept of causality is central to physical science, as well as to decision-making since otherwise there is no connection between decisions and consequences, and causality has a logical formulation as well (Pearl and Mackenzie 2018 ; Fenton and Neil 2019 ). Those elements of physical reasoning are part of scientific practice in climate-change science, but are not connected to statistical practice in an explicit way. Thus, it seems crucial to bring these elements into the treatment of uncertainty.

In lay terms, probability is the extent to which something is likely to happen or to be the case. This includes frequency-based (or long-run) probabilities — the frequentist paradigm — as a special case, but it applies to single-outcome situations as well, such as a scientific hypothesis concerning climate change, where probability is interpreted as degree of belief. (For scientists, the word “belief” may cause some discomfort, but we can interpret belief as expert judgement, which is a widely accepted concept in climate-change science, including by the IPCC (Mastrandrea et al. 2011 ).) The two concepts of uncertainty are quite distinct, yet are commonly confused, even by practicing climate scientists. Even the use of frequency-based probabilities requires a degree of belief that they may be appropriately used for the purpose at hand, which is a highly non-trivial point when one is making statements about the real world. Jeffreys ( 1961 ) and Jaynes ( 2003 ) both argue that whilst the frequentist methods generally produce acceptable outcomes in the situations for which they were developed (e.g. agricultural trials, quality control in industry), which are characterized by an abundance of data and little in the way of prior knowledge, they are not founded in rigorous principles of probability (the ‘extended logic’ mentioned above, which is so founded (e.g. Cox 1946 )), and are not appropriate for the opposite situation of an abundance of prior knowledge and little in the way of data. For climate-change science, especially (although not exclusively) in the adaptation context, we are arguably in the latter situation: we have extensive physical knowledge of the workings of the climate system and of the mechanisms involved in climate impacts, and very little data that measures what we are actually trying to predict, let alone under controlled conditions. This motivates a reappraisal of the practice of statistics in climate-change science. In this I draw particularly heavily on Jeffreys ( 1961 ), since he was a geophysicist and thus was grappling with scientific problems that have some commonality with our own.

This paper is aimed at climate scientists. Its goal is to convince them that the frequentist statistical methods that are standard in climate-change science should be embedded within a broader logical framework that can connect physical reasoning to statistical practice in a transparent way. Not only can this help avoid logical errors, it also provides a scientific language for representing physical knowledge even under conditions of deep uncertainty, thereby expanding the set of available scientific tools. In this respect, making explicit and salient the conditionality of any scientific statement is a crucial benefit, especially for adaptation where a variety of societal values come into play (Hulme et al. 2011 ). Note that I am not arguing for the wholesale adoption of Bayesian statistical methods, although these may have their place for particular problems (see further discussion in Sect.  4 ). Rather, I am simply arguing that we should follow Laplace’s dictum and embed our statistical calculations in common sense, so as to combine them with physical reasoning. Section  2 starts by reprising the pitfalls of ‘null hypothesis significance testing’ (NHST); although the pitfalls have been repeatedly pointed out, NHST continues to be widespread in climate-change science, and its dichotomous misinterpretation reinforces the ‘singular, definitive’ representation of knowledge. Section  2 goes on to discuss how the concept of frequency fits within the broader concepts of probability and inference. Section  3 examines a spectrum of case studies: the alleged global warming hiatus, Arctic-midlatitude linkages, and extreme event attribution. Together these illustrate how the principles discussed in Sect.  2 can be applied, in order to improve statistical practice. The paper concludes with a discussion in Sect.  4 .

2 Back to basics

The ubiquitous use of NHST has been widely criticized in the published literature (e.g. Amrhein et al. 2019 ). To quote from the abstract of the psychologist Gerd Gigerenzer’s provocatively titled paper ‘Mindless statistics’ (2004):

Statistical rituals largely eliminate statistical thinking in the social sciences. Rituals are indispensable for identification with social groups, but they should be the subject rather than the procedure of science. What I call the ‘null ritual’ consists of three steps: (1) set up a statistical null hypothesis, but do not specify your own hypothesis nor any alternative hypothesis, (2) use the 5% significance level for rejecting the null and accepting your hypothesis, and (3) always perform this procedure.

Gigerenzer refers to the social sciences, but is it actually any different in climate science (see Footnote 1)? Nicholls (2000) and Ambaum (2010) both provide detailed assessments showing the widespread use of NHST in climate publications. This practice does not appear to have declined since the publication of those papers; indeed, my impression is that it has only increased, exacerbated by the growing dominance of the so-called ‘high-impact’ journals, which enforce the statistical rituals with particular vigour, supposedly in an effort to achieve a high level of scientific rigour. Ambaum (2010) suggests that the practice may have been facilitated by the ready availability of online packages that offer significance tests as a ‘black box’ exercise, even though no serious statistician would argue that the practice of statistics should become a ‘black box’ exercise. I would add that Gigerenzer’s insightful comment about “identification with social groups” may also apply to climate scientists, in that statistical rituals become a working paradigm for certain journals and reviewers. I suspect I am not alone in admitting that most of the statistical tests in my own papers are performed in order to satisfy these rituals, rather than as part of the scientific discovery process itself.

Gigerenzer (2004) shows that NHST, as described above, is a bastardized hybrid of Fisher’s null hypothesis testing and Neyman–Pearson decision theory, and has no basis even in orthodox frequentist statistics. According to Fisher, a null hypothesis test should only be performed in the absence of any prior knowledge, and before one has even looked at the data, neither of which applies to the typical applications in climate science. Violation of these conditions leads to the problem known as ‘multiple testing’. Moreover, failure to reject the null hypothesis does not prove the null hypothesis, nor does rejection of the null hypothesis prove an alternative hypothesis. Yet these inferences are routinely made in climate science, and the oxymoronic phrase “statistically significant trend” is commonplace.

Amrhein et al. ( 2019 ) argue that the main problem lies in the dichotomous interpretation of the result of a NHST — i.e. as the hypothesis being either true or false depending on the p -value — and they argue that the concept of statistical significance should be dropped entirely. (Their comment gathered more than 800 signatories from researchers with statistical expertise.) Instead, they argue that all values of a sampling statistic that are compatible with the data should be considered as plausible; in particular, two studies are not necessarily inconsistent simply because one found a statistically significant effect and the other did not (which, again, is a common misinterpretation in climate science). This aligns with Stirling’s ( 2010 ) warning, mentioned earlier, against ‘singular, definitive’ representations of knowledge when the reality is more complex, and all I can do in this respect is urge climate scientists to become aware of the sweeping revolution against NHST in other areas of science. Instead, I wish to focus here on bringing physical reasoning into statistical practice, which is of particular relevance to climate-change science for the reasons discussed earlier.

Misinterpretation of NHST is rooted in the so-called ‘prosecutor’s fallacy’, which is the transposition of the conditional. The p-value quantifies the probability of observing the data \(D\) under the null hypothesis \(H\) that the apparent effect occurred by chance. This is written \(P(D|H)\), sometimes called the likelihood function, and is a frequentist calculation based on a specified probability model for the null hypothesis, which could be either theoretical or empirical. (As noted earlier, the specification of an appropriate probability model is itself a scientific hypothesis, but let us set that matter aside for the time being.) However, one is actually interested in the probability that the apparent effect occurred by chance, which is \(P(H|D)\). The two quantities are not the same, but are related by Bayes’ theorem:

\[ P\left(H|D\right)=\frac{P\left(D|H\right)\,P\left(H\right)}{P\left(D\right)}. \tag{1} \]
To illustrate the issue, consider the case where \(H\) is not the null hypothesis but is rather the hypothesis that one has a rare illness, having tested positive for the illness (data \(D\) ). Even if the detection power of the test is perfect, i.e. \(P\left(D|H\right)=1,\) a positive test result may nevertheless indicate only a small probability of having the illness, i.e. \(P\left(H|D\right)\) being very small, if the illness is sufficiently rare and there is a non-negligible false alarm rate, such that \(P\left(H\right)\ll P\left(D\right).\) This shows the error that can be incurred from the transposition of the conditional if one does not take proper account of prior probabilities. In psychology, it is known as ‘base rate neglect’ (Gigerenzer and Hoffrage 1995 ).
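In code, with hypothetical numbers (perfect detection, a 5% false-alarm rate, a 1-in-1000 base rate), the effect is stark:

```python
# Hypothetical numbers for the rare-illness example
p_d_given_h = 1.0       # P(D|H): the test always detects the illness when present
p_d_given_not_h = 0.05  # false-alarm rate, P(D|not H)
p_h = 0.001             # base rate: 1 in 1000 people have the illness

# Total probability of a positive test, then Bayes' theorem (Eq. (1))
p_d = p_d_given_h * p_h + p_d_given_not_h * (1.0 - p_h)
p_h_given_d = p_d_given_h * p_h / p_d
print(f"P(H|D) = {p_h_given_d:.3f}")  # ~0.020: a positive test still leaves only ~2% probability of illness
```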

The example considered above of the medical test is expressed entirely in frequentist language, because the probability of the test subject having the illness (given no other information, and before taking the test), \(P\left(H\right),\) is equated to the base rate of the illness within the general population, which is a frequency. However, this interpretation is not applicable to scientific hypotheses, for which the concept of a ‘long run’ frequency is nonsensical. To consider this situation, we return to the case of \(H\) being the null hypothesis and write Bayes’ theorem instead as
\[ P\left(H|D\right)=\frac{P\left(D|H\right)}{P\left(D\right)}\,P\left(H\right). \tag{2} \]
Equation ( 2 ) is mathematically equivalent to Eq. ( 1 ) but has a different interpretation. Now the probability of the apparent effect having occurred by chance, \(P\left(H|D\right),\) is seen to be the prior probability of there being no real effect, \(P\left(H\right),\) multiplied by the factor \(P\left(D|H\right)/P(D)\) . The use of Bayes’ theorem in this way is often criticized for being sensitive to the prior \(P\left(H\right).\) However, expert (prior) knowledge is also used in the formulation ( 1 ) to determine how to control for confounding factors and for other aspects of the statistical analysis, and it is widely used in climate-change science to determine how much weight to place on different pieces of evidence. It is thus a strength, rather than a weakness, of Bayes’ theorem that it makes this aspect of the physical reasoning explicit.

The factor \(P\left(D|H\right)/P(D)\) in Eq. ( 2 ) represents the power of the data for adjusting one’s belief in the null hypothesis. But whilst \(P\left(D|H\right)\) is the p -value, we have another factor, \(P\left(D\right)\) ; how to determine it? This can only be done by considering the alternative hypotheses that could also account for the data \(D\) . We write \(\neg H\) as the complement of \(H\) , so that \(P\left(\neg H\right)=1-P(H)\) . (In practice, \(\neg H\) should be enumerated over all the plausible alternative hypotheses.) From the fundamental rules of probability,
\[ P\left(D\right)=P\left(D|H\right)P\left(H\right)+P\left(D|\neg H\right)P\left(\neg H\right), \tag{3} \]
which can be substituted into Eq. ( 2 ). Thus, we can eliminate \(P\left(D\right)\) , but only at the cost of having to determine \(P\left(D|\neg H\right).\) In that case, it is simpler to divide Eq. ( 2 ) by the same expression with \(H\) replaced by \(\neg H\) , which eliminates \(P\left(D\right)\) and yields the ‘odds’ version of Bayes’ theorem:
\[ \frac{P\left(H|D\right)}{P\left(\neg H|D\right)}=\frac{P\left(D|H\right)}{P\left(D|\neg H\right)}\cdot\frac{P\left(H\right)}{P\left(\neg H\right)}. \tag{4} \]
This states that the odds on the data occurring by chance — the left-hand side of Eq. ( 4 ) — equal the prior odds of the null hypothesis multiplied by the first term on the right-hand side of Eq. ( 4 ), which is known as the Bayes factor (Kass and Raftery 1995 ) and was heavily used by Jeffreys ( 1961 ). The deviation of the Bayes factor from unity represents the power of the data for discriminating between the null hypothesis and its complement. (Note that Eq. ( 4 ) holds for any two hypotheses, but its interpretation is simpler when the two hypotheses are mutually exclusive and exhaustive, as here.) One of the attractive features of the Bayes factor is that it does not depend on the prior odds, and is amenable to frequentist calculation when the alternative hypothesis can be precisely specified.
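Eq. (4) is simple enough to encode directly; the following helper (a sketch, with names of my own choosing) converts a prior probability for the null into odds, applies the Bayes factor, and converts back:

```python
def posterior_prob_null(prior_p_null: float, bayes_factor: float) -> float:
    """P(H|D) via Eq. (4): posterior odds = Bayes factor x prior odds."""
    prior_odds = prior_p_null / (1.0 - prior_p_null)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)
```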

The formulation ( 4 ) represents in a clear way the aphorism that ‘strong claims require strong evidence’: if the prior odds of the null hypothesis are very high, then it requires a very small Bayes factor to reject the null hypothesis. But Eq. ( 4 ) makes equally clear that the power of the data is represented not in the p -value \(P\left(D|H\right)\) but rather in the Bayes factor, and that failure to consider the Bayes factor is a serious error in inference. To quote Jeffreys ( 1961 , p. 58):

We get no evidence for a hypothesis by merely working out its consequences and showing that they agree with some observations, because it may happen that a wide range of other hypotheses would agree with those observations equally well. To get evidence for it we must also examine its various contradictories and show that they do not fit the observations.

Thus, for both reasons, the p -value \(P\left(D|H\right)\) on its own is useless for inferring the probability of the effect occurring by chance, and thus for rejecting the null hypothesis, even though this is standard practice in climate science. Rather, we need to consider both the prior odds of the null hypothesis, and the p -value for the alternative hypothesis, \(P\left(D|\neg H\right)\) . We will discuss the implications of this in more detail in Sect.  3 in the context of specific climate-science examples. Here, we continue with general considerations. With regard to the difference between the p -value \(P\left(D|H\right)\) and the Bayes factor, Bayesian statisticians have ways of estimating \(P\left(D|\neg H\right)\) in general, and the outcome is quite shocking. Nuzzo ( 2014 ), for example, estimates that a p -value of 0.05 generally corresponds to a Bayes factor of only 0.4 or so, almost 10 times larger. The reason why the p -value can differ so much from the Bayes factor is because the latter penalizes imprecise alternative hypotheses, which are prone to overfitting. The difference between the two is called the ‘Ockham factor’ by Jaynes ( 2003 , Chapter 20), in acknowledgement of Ockham’s razor in favour of parsimony: “The onus of proof is always on the advocate of the more complicated hypothesis” (Jeffreys 1961 , p. 343). The fact that such a well-established principle of logic is absent from frequentist statistics is already telling us that the latter is an incomplete language for describing uncertainty.

It follows that in order for a p -value of 0.05 to imply a 5% likelihood of a false alarm (i.e. no real effect) — which is the common misinterpretation — the alternative hypothesis must already be a good bet. For example, Nuzzo ( 2014 ) estimates a 4% likelihood of a false alarm when \(P(H)=0.1\) , i.e. the null hypothesis is already considered to be highly improbable. For a toss-up with \(P(H)=0.5\) , the likelihood of a false alarm given a p -value of 0.05 rises to nearly 30%, and it rises to almost 90% for a long-shot alternative hypothesis with \(P(H)=0.95\) . Yet despite this enormous sensitivity of the inferential power of a p -value to the prior odds of the null hypothesis, nowhere in any climate science publication have I ever seen a discussion of prior odds (or probabilities) entering the statistical interpretation of a p -value.
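Nuzzo's figures are straightforward to reproduce with the posterior_prob_null helper sketched above, taking his estimate that a p-value of 0.05 corresponds to a Bayes factor of roughly 0.4:

```python
for p_null in (0.1, 0.5, 0.95):
    print(f"P(H) = {p_null:.2f} -> false-alarm probability = "
          f"{posterior_prob_null(p_null, bayes_factor=0.4):.2f}")
# P(H) = 0.10 -> false-alarm probability = 0.04
# P(H) = 0.50 -> false-alarm probability = 0.29
# P(H) = 0.95 -> false-alarm probability = 0.88
```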

In fact, in much published climate science, the alternative hypothesis to the null is already a good bet, having been given plausibility by previous research or by physical arguments advanced within the study itself. In other words, the statistical analysis is only confirmatory, and the p -value calculation performed merely as a sanity check. However, it is important to understand the prior knowledge and assumptions that go into this inference. For transparency, they should be made explicit, and a small p -value should in no way be regarded as a ‘proof’ of the result.

There is an exception to the above, when the data does strongly decide between the two hypotheses. This occurs in the case of detection and attribution of anthropogenic global warming, where the observed warming over the instrumental record can be shown to be inconsistent with natural factors, and fully explainable by anthropogenic forcing (e.g. IPCC 2013). In that case, the Bayes factor is very small, and a strong inference is obtained without strong prior assumptions (mainly that all potential explanatory factors have been considered). However, this ‘singular, definitive’ situation is generally restricted to thermodynamic aspects of climate change on sufficiently coarse spatial and temporal scales (Shepherd 2014).

A similar issue arises with confidence intervals. The frequentist confidence interval represents the probability distribution of a sampling statistic of a population parameter. However, it does not represent the likely range of the population parameter, known as the ‘credible interval’ (Spiegelhalter 2018 ). To equate the two, as is commonly done in climate science publications, is to commit the error of the transposed conditional. In particular, it is common to assess whether the confidence interval around a parameter estimate excludes the null hypothesis value for that parameter, as a basis for rejecting the null hypothesis. This too is an inferential error. However, the confidence interval can approximately correspond to the credible interval if a wide range of prior values are considered equally likely (Fenton and Neil 2019 , Chap. 12), which is effectively assuming that the null hypothesis value (which is only one such value) is highly unlikely. Thus, once again, provided we are prepared to acknowledge that we are assuming the null hypothesis to be highly unlikely, the use of a frequentist confidence interval may be acceptable.

There is one more general point that is worth raising here before we go on to the examples. In most climate science, the use of ‘statistical rituals’ means that particular statistical metrics (such as the p -value) are used without question. However, statisticians well appreciate that every statistical metric involves a trade-off, and that the choice will depend on the decision context. For example, in forecasts, there is a trade-off between discrimination and reliability, and in parameter estimation, there is a trade-off between efficiency and bias. There is no objective basis for how to make those trade-offs. Part of better statistical practice in climate-change science is to recognize these trade-offs and acknowledge them in the presentation of the results.

3.1 The alleged global warming hiatus

The alleged global warming ‘hiatus’ was the apparent slowdown in global-mean warming in the early part of the twenty-first century. Looking at it now (Fig.  2 ), it is hard to see why it attracted so much attention. Yet it was the focus of much media interest, and a major challenge for the IPCC during the completion of the AR5 WGI report, in 2013. There are good scientific reasons to try to understand the mechanisms behind natural variability in climate, and there was also a question at the time of whether the climate models were overestimating the warming response to greenhouse gases. However, the media attention (and the challenge for the IPCC) was mostly focused on whether climate change had weakened, or even stopped — as many climate sceptics claimed (Lewandowsky et al. 2016 ). We focus here on that specific question.

[Figure 2: NASA GISTEMP time series of estimated observed annual global mean surface air temperature, expressed as anomalies relative to the 1951–1980 reference period. The red line is a smoothed version of the time series. The grey band (added here) indicates the period 1998–2012, which was defined as the hiatus period in Box TS.3 of IPCC (2013). From https://data.giss.nasa.gov/gistemp/, downloaded 31 May 2021.]

Given that there is high confidence in the basic physics of global warming, a reasonable null hypothesis would have been that the hiatus was just the result of natural variability. Then the logical thing to have done would have been to determine the Bayes factor comparing the hypothesis of continued climate change to that of a cessation to climate change. If the Bayes factor was of order unity, then the data would not have differentiated between the two hypotheses; in other words, the hiatus would have been entirely consistent with natural variability together with continued long-term climate change, and there would have been no need to adjust the prior hypothesis (which would have to have been given an extremely high likelihood, given previous IPCC reports).

Yet such an approach was not taken. Instead, there were many published studies examining the statistical significance of the observed trends, and much attention in the technical summary of the AR5 WGI report (Box TS.3 of IPCC 2013 ) was devoted to the hiatus, which was defined by IPCC to be the period 1998–2012 (grey shading in Fig.  2 ). The fact that small adjustments to the data sets could make the difference between statistical significance or not (Cowtan and Way 2014 ) should have raised alarm bells that this frequentist-based approach to the data analysis, with multiple testing together with transposing the conditional to make inferences about physical hypotheses, was ill-founded. Completely ignoring all the knowledge from previous IPCC reports in the statistical assessment was also somewhat perverse, given that our confidence in the physics of anthropogenic global warming does not rest on observed warming alone, let alone warming over a 14-year period.

Rahmstorf et al. ( 2017 ) revisited the hiatus controversy, and deconstructed many of the published analyses of the hiatus, showing how they fell into many of the pitfalls discussed in Sect.  2 . A particularly egregious one is the selection bias that arises from selectively focusing on a particular period and ignoring the others (also known as the ‘multiple testing problem’), which is apparent by eye from Fig.  2 . They also showed that a more hypothesis-driven approach to the data analysis would have deduced that there was nothing unusual about the hiatus, which is equivalent to saying that the Bayes factor would have been close to unity. (That is even before bringing prior odds into the picture.) An independent and very interesting confirmation of this result is the study of Lewandowsky et al. ( 2016 ), which took the observed global-mean temperature time series (up to 2010), relabelled it as “World Agricultural Output”, and asked non-specialists whether they saw any weakening of the trend. The answer was a resounding no. This appears to show the power of the human brain for separating signal from noise, much more reliably than frequentist-based analysis methods.

I cannot resist pointing out that Fig. 10.6 of the IPCC AR5 WGI report showed clearly that the hiatus was entirely explainable from a combination of ENSO variability and the decline in solar forcing, even in the presence of continued anthropogenic warming. To this day, I still cannot understand why the IPCC chose to ignore this piece of evidence in its discussion of the hiatus, relying instead on purely statistical analyses without the incorporation of the huge amount of knowledge within the WGI report itself. When I asked someone about this, the answer I got was that Fig. 10.6 did not “prove” the case. But that’s not the point. Given all the knowledge that existed, it was surely sufficient to show that no other explanation was needed. To again draw on Jeffreys ( 1961 , p. 342):

Variation is random until a contrary is shown; and new parameters in laws, when they are suggested, must be tested one at a time unless there is specific reason to the contrary.

3.2 Arctic-midlatitude connections

The Arctic amplification of global warming is a robust aspect of climate change, and the observed decline in Arctic sea-ice extent is its poster-child. The sea-ice decline is largest in the summer season, but the additional warmth in that season is absorbed by the colder ocean and released back to the atmosphere during winter, when the atmosphere is colder. Hence, Arctic amplification is mainly manifest in the winter season. Based on observed trends, Francis and Vavrus ( 2012 ) made the claim that Arctic amplification led to a wavier jet stream, causing more extreme winter weather at midlatitudes, including (somewhat counterintuitively in a warming climate) more cold spells. This claim has subsequently generated heated debate within the scientific community, and is an extremely active area of research (e.g. Screen et al. 2018 ; Cohen et al. 2020 ).

In contrast to the example of the hiatus, here, the prior knowledge is not very strong. Much of the evidence that is cited in favour of the claim of Arctic-to-midlatitude influence is from congruent trends in observational time series. However, the waviness of the jet stream will itself induce Arctic winter warming through enhanced poleward heat transport (see Shepherd 2016a ), so any attempt to isolate the causal influence of Arctic warming on midlatitude weather must control for this opposing causal influence. Kretschmer et al. ( 2016 ) used the time lags of the various hypothesized physical processes to do this from observations, using a causal network framework, and inferred a causal role for sea ice loss in the Barents–Kara seas inducing midlatitude atmospheric circulation changes that are conducive to cold spells. This approach builds in prior knowledge to constrain the statistical analysis. Mostly, however, researchers have used frequentist-based methods applied to the change in long-term trends since 1990, when Arctic warming appeared to accelerate. This places a lot of weight on what from a climate perspective are relatively short time series (which is similar to the hiatus situation). Moreover, climate change affects both midlatitude conditions and the Arctic, representing a common driver and thus a confounding factor for any statistical analysis (Kretschmer et al. 2021 ). The theoretical arguments for a wavier jet stream are heuristic, and more dynamically based considerations are inconclusive (Hoskins and Woollings 2015 ). Climate models provide inconsistent responses, and there are certainly good reasons to question the fidelity of climate models to capture the phenomenon, given the fact they are known to struggle with the representation of persistent circulation anomalies such as blocking. Overall, there are certainly sufficient grounds to develop plausible physical hypotheses of Arctic-midlatitude linkages, even if not through the Francis and Vavrus ( 2012 ) mechanism. Indeed, several large funding programmes have been established to explore the question.

Yet with all this uncertainty, it is difficult to understand how the published claims can be so strong, on both sides. Whilst the whiplash of conflicting claims may help generate media attention, it must be very confusing for those who want to follow the science on this issue. Adopting a more ‘plural, conditional’ perspective would surely be helpful, and much more representative of the current state of knowledge. Kretschmer et al. ( 2020 ) examined the previously hypothesized link between Barents–Kara sea-ice loss (where the changes are most dramatic) and changes in the strength of the stratospheric polar vortex — known to be a causal factor in midlatitude cold spells (Kretschmer et al. 2018 ) and a major driver of uncertainty in some key wintertime European climate risks (Zappa and Shepherd 2017 ) — across the CMIP5 models. They found that the link in the models was so weak as to be undetectable in the year-to-year variability, which means that it will be difficult to find Bayes factors between the hypothesis of a causal influence and the null hypothesis of no such influence that are very informative. Yet even such a weak link had major implications for the forced changes, given the large extent of projected Barents–Kara sea-ice loss (essentially 100%) compared to other changes in the climate system. Returning to Eq. ( 4 ), the weakness of the link may help explain why the scientific findings in this subject seem to be so closely linked to scientists’ prior beliefs. There would be nothing wrong with that so long as those beliefs were made explicit, which would happen naturally if scientists also considered all plausible alternative hypotheses, as Eq. ( 4 ) obliges them to do. (The quote given earlier from Jeffreys ( 1961 , p. 58) is relevant here.) Alternatively, one can present the different hypotheses in conditional form, as storylines, allowing the user of the information to impose their own beliefs (Shepherd 2019 ). This is useful for decision-making since within probability theory, beliefs can incorporate consequences (Lindley 2014 ). Once again, there is a relevant quote from Jeffreys ( 1961 , p. 397):

There are cases where there is no positive evidence for a new parameter, but important consequences might follow if it was not zero, and we must remember that [a Bayes factor] > 1 does not prove that it is zero, but merely that it is more likely to be zero than not. Then it is worth while to examine the alternative [hypothesis] further and see what limits can be set to the new parameter, and thence to the consequences of introducing it.

3.3 Extreme event attribution

Since weather and climate extremes have significant societal impacts, it is no surprise that many of the most severe impacts of climate change are expected to occur through changes in extreme events. If climate is understood as the distribution of all possible meteorological states, then the effect of climate change on extreme events is manifest in the changes in that distribution. This is the subject of a large literature. Over the last 20 years, the different topic of extreme event attribution has emerged, which seeks to answer the question of whether, or how, a particular extreme event can be attributed to climate change. In contrast to the two previous examples, which concerned clear climate-science questions, here, it is far from obvious how to even pose the question within a climate-science framework, since every extreme event is unique (NAS 2016 ). This ‘framing’ question of how to define the event raises its own set of issues for statistical practice and scientific reasoning.

The most popular approach, first implemented by Stott et al. ( 2004 ) for the 2003 European heat wave, has been to estimate the probability of an event at least as extreme as the observed one occurring (quantified in a return period), under both present-day and pre-industrial conditions, and attributing the change in probability to climate change. This is done by defining an ‘event class’ (typically univariate, and at most bivariate) which is sufficiently sharp to relate to the event in question, but sufficiently broad to allow a frequency-based calculation of probability. Clearly, there is a trade-off involved here, which will depend on a variety of pragmatic factors. For example, in Stott et al. ( 2004 ), the event was defined by the average temperature over a very large region encompassing Southern Europe, over the entire summer period (June through August), for which the observed extreme was only 2.3 °C relative to preindustrial conditions, and around 1.5 °C relative to the expected temperature in 2003. Such an anomaly was very rare for that highly aggregated statistic, but clearly nobody dies from temperatures that are only 1.5 °C above average. Given that this ‘probabilistic event attribution’ (PEA) is based on a frequentist definition of probability, along with related concepts such as statistical significance, it is worth asking how a widening of the perspective of probability and reasoning under uncertainty, along the lines described in this paper, might enlarge the set of scientific tools that are available to address this important scientific topic.

The first point to make is that from the more general perspective of probability theory discussed in Sect.  2 , there is no imperative to adopt a frequentist interpretation of probability. As Jeffreys says (1961, p. 401), ‘No probability….is simply a frequency’. A frequency is at best a useful mathematical model of unexplained variability. The analogy that is often made of increased risk from climate change is that of loaded dice. But if a die turns up 6, whether loaded or unloaded, it is still a 6. On the other hand, if an extreme temperature threshold is exceeded only very rarely in pre-industrial climate vs quite often in present-day climate, the nature of these exceedances will be different. One is in the extreme tail of the distribution, and the other is not, so they correspond to very different meteorological situations and will be associated with very different temporal persistence, correlation with other fields, and so on. Since pretty much every extreme weather or climate event is a compound event in one way or another, this seems like quite a fundamental point. It is perfectly sensible to talk about the probability of a singular event, so we should not feel obliged to abandon that concept.

The fact is that climate change changes everything ; the scientific question is not whether, but how and by how much. When the null hypothesis is logically false, as here, use of NHST is especially dangerous. Following Ockham’s razor, the more relevant question is whether a particular working hypothesis (which would then be the null hypothesis) is enough to provide a satisfactory answer to the question at hand. As noted, most extreme weather and climate events are associated with unusual dynamical conditions conducive to that event, which we denote generically by \(N\) . The event itself we denote by \(E\) . An example event might be a heat wave, for which \(N\) could be atmospheric blocking conditions; or a drought, for which \(N\) could be the phase of ENSO. The effect of climate change can then be represented as the change in the joint probability \(P(E,N)\) between present-day, or factual (subscript \(f\) ) conditions, and the same conditions without climate change, which are a counter-factual (subscript \(c\) ), expressed as a risk ratio. From NAS ( 2016 ),
\[ \frac{P_{f}\left(E,N\right)}{P_{c}\left(E,N\right)}=\frac{P_{f}\left(E|N\right)}{P_{c}\left(E|N\right)}\cdot\frac{P_{f}\left(N\right)}{P_{c}\left(N\right)}. \tag{5} \]
This simple equation, which is based on the fundamental laws of probability theory, shows that the risk ratio factorizes into the product of two terms. The first is a ratio of conditional probabilities, namely the change in probability for a given dynamical conditioning factor \(N\) . The second expresses how the probability of the conditioning factor might itself change.

The scientific challenge here is that for pretty much any relevant dynamical conditioning factor for extreme events, there is very little confidence in how it will change under climate change (Shepherd 2019 ). This lack of strong prior knowledge arises from a combination of small signal-to-noise ratio in observations, inconsistent projections from climate models, and the lack of any consensus theory. If one insists on a frequentist interpretation of this second factor, as in PEA, then this can easily lead to inconclusive results, and that is indeed what tends to happen for extreme events that are not closely tied to global-mean warming (NAS 2016 ). But there is an alternative. We can instead interpret the second factor on the right-hand side of (5) as a degree of belief — which is far from inappropriate, given that the uncertainty here is mainly epistemic — and consider various hypotheses, or storylines (Shepherd 2016b ). The simplest hypothesis is that the second factor is unity, which can be considered a reasonable null hypothesis. One should of course be open to the possibility that the second factor differs from unity, but in the absence of strong prior knowledge in that respect, that uncertainty would be represented by a prior distribution centred around unity. The advantage of this partitioning is that the first term on the right-hand side of Eq. ( 5 ) is generally much more amenable to frequentist quantification than is the second, and if the dynamical conditioning is sufficiently strong, it tends to focus the calculation on the thermodynamic aspects of climate change about which there is comparatively much greater confidence.
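As a toy illustration of Eq. (5), with invented numbers, under the storyline null hypothesis that the second (dynamic) factor is unity:

```python
# Purely illustrative probabilities: f = factual, c = counterfactual
p_e_given_n_f = 0.30    # P(E|N) with climate change
p_e_given_n_c = 0.10    # P(E|N) without climate change
p_n_f = p_n_c = 0.05    # storyline null: probability of the dynamical situation N unchanged

risk_ratio = (p_e_given_n_f / p_e_given_n_c) * (p_n_f / p_n_c)
print(f"risk ratio = {risk_ratio:.1f}")  # 3.0, coming entirely from the thermodynamic factor
```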

It seems worth noting that this approach is very much in line with the IPCC’s guidance on the treatment of uncertainty (Mastrandrea et al. 2011 ), which only allows a probabilistic quantification of uncertainty when the confidence levels are high.

This approach is actually used implicitly in much PEA. For example, in the analogue method (e.g. Cattiaux et al. 2010 ), atmospheric circulation regimes are used for \(N\) , and when using large-ensemble atmosphere-only models (as in Stott et al. 2004 ), sea-surface temperature anomalies are used for \(N\) . Both methods have been considered as perfectly acceptable within the PEA framework (Stott et al. 2016 ), despite effectively assuming that the second factor in Eq. ( 5 ) is unity. This assumption is very often not even discussed, and if it is, the argument is typically made that there is no strong evidence in favour of a value other than unity (see e.g. van Oldenborgh et al. 2021 for a recent example). Yet for some reason, when exactly the same approach was proposed for the detailed dynamical situation of a highly unusual meteorological configuration (Trenberth et al. 2015 ), it was argued by the PEA community that it was invalid scientific reasoning. For example, Stott et al. ( 2016 , p. 33) say:

By always finding a role for human-induced effects, attribution assessments that only consider thermodynamics could overstate the role of anthropogenic climate change, when its role may be small in comparison with that of natural variability, and do not say anything about how the risk of such events has changed.

There is a lack of logical consistency here. First, since climate change has changed everything, at least to some degree, there is nothing logically wrong with “always finding a role for human-induced effects”. Second, this approach is not biased towards overstating the role of anthropogenic climate change, as it could equally well understate it. As Lloyd and Oreskes ( 2018 ) have argued, whether one is more concerned about possible overstatement or understatement of an effect is not a scientific matter, but one of values and decision context. Third, “small compared with natural variability” can be consistent with an effect of anthropogenic climate change. For example, in van Garderen et al. ( 2021 ), global spectral nudging was used to apply the storyline approach to the 2003 European and 2010 Russian heat waves. The study clearly showed that the anthropogenic warming was small in magnitude compared to the natural variability that induced the heat waves, but the high signal-to-noise ratio achieved through the approach provided a quantitative attribution at very fine temporal and spatial scales, potentially allowing for reliable impact studies (and avoiding the need to choose an arbitrary ‘event class’, which blurs out the event). Finally, whilst it is true that the approach does not say anything about how the risk of such events has changed, I am not aware of a single PEA study that has a definitive attribution of changes in the conditioning factor \(N\) leading to the event, so they are subject to exactly the same criticism. Instead, the attribution in PEA studies is invariably explained in terms of well-understood thermodynamic processes. That seems like a pretty good justification for the storyline approach. In this way, the two approaches can be very complementary (see Table 2 of van Garderen et al. 2021 ). And if there are strong grounds for considering changes in dynamical conditions (as in Schaller et al. 2016 , where the change in flood risk in the Thames Valley changed sign depending on the modelled circulation changes), then probability theory, as in Eq. ( 5 ), provides the appropriate logical framework for considering this question in a hypothesis-driven manner, through storylines of circulation change (Zappa and Shepherd 2017 ). In such cases, a ‘plural, conditional’ perspective is called for.

Yet again, there is a relevant quote from Jeffreys ( 1961 , p. 302):

In induction there is no harm in being occasionally wrong; it is inevitable that we shall be. But there is harm in stating results in such a form that they do not represent the evidence available at the time when they are stated, or make it impossible for future workers to make the best use of that evidence.

4 Discussion

In an application where there is little in the way of prior knowledge, and a lot of data, the Bayes factor rapidly overpowers the influence of the prior knowledge, and the result is largely insensitive to the prior. However, many aspects of climate-change science, especially (although not exclusively) in the adaptation context, are in the opposite situation of having a large amount of prior knowledge, and being comparatively data-poor (in terms of data matching what we are actually trying to predict). In particular, the observed record provides only a very limited sample of what is possible, and is moreover affected by sources of non-stationarity, many of which may be unknown. Larger data sets can be generated from simulations using climate models, but those models have many failings, and it is far from clear which aspects of model simulations contain useful information, and which do not. Physical reasoning is therefore needed at every step. In such a situation, using statistical methods that eschew physical reasoning and prior knowledge — “letting the data speak for itself”, some might say — is a recipe for disaster. Statistical practice in climate-change science simply has to change.

A statistician might at this point argue that the answer is to use Bayesian statistics. Indeed, Bayesian methods are used in particular specialized areas of climate science, such as inverse methods for atmospheric sounding (Rodgers 2000 ) including pollution-source identification (Palmer et al. 2003 ), sea-level and ice-volume variations on palaeoclimate timescales (Lambeck et al. 2014 ), and climate sensitivity (Sherwood et al. 2020 ). Mostly, this involves introducing prior probability distributions on the estimated parameters, but Sherwood et al. ( 2020 ) discuss the constraints on climate sensitivity in terms of the confidence that can be placed in physical hypotheses. There have been brave attempts to employ Bayesian methods more widely, e.g. in the UK climate projections (Sexton et al. 2012 ). The difficulty is that Bayesian calibration for climate-change projections requires knowing the relationship between model bias in present-day climate (which is measurable) and the spread in a particular aspect of model projections. Such a relationship is known as an ‘emergent constraint’ (Hall et al. 2019 ), and it has been recognized from the outset that in order to be predictive, it must be causal. Given the huge number of potential relationships, data mining can easily lead to spurious but apparently statistically significant relationships (Caldwell et al. 2014 ), and correlations can also reflect common drivers. Indeed, several published emergent constraints have subsequently been debunked (by Pithan and Mauritsen 2013 ; Simpson and Polvani 2016 ; Caldwell et al. 2018 ), and the field is something of a Wild West. Hall et al. ( 2019 ) emphasize the crucial importance of anchoring emergent constraints in physical mechanisms, and argue that emergent constraints are most likely to be found when those mechanisms are direct and linear. This may help explain why it has been so challenging to find emergent constraints for circulation aspects of climate change (relevant for adaptation), since there is no consensus on the relevant mechanisms and the circulation responses appear to involve multiple interacting factors, and potential nonlinearity.

For climate information to be useable, its uncertainties must be comprehensible and salient, especially in the face of apparently conflicting sources of information, and the connection between statistical analysis and physical reasoning must be explicit rather than implicit. This argues for bringing the Bayesian spirit of hypothesis testing more explicitly into our scientific reasoning, forgoing the ‘mindless’ performance of statistical rituals as a substitute for reasoning, resisting true/false dichotomization, and being ever vigilant for logical errors such as multiple testing and the transposed conditional. As a recent Nature editorial states (Anonymous 2019), “Looking beyond a much used and abused measure [statistical significance] would make science harder, but better.” Yet we can still use familiar statistical tools, such as p-values and confidence intervals, so long as we remember what they do and do not mean. They are useful heuristics, which researchers have some experience interpreting. And we need to make sure that we are not chasing phantoms.
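As a reminder of what one familiar tool does mean, the sketch below (hypothetical numbers) checks the repeated-sampling property of a 95% confidence interval. The 95% describes the long-run coverage of the procedure, not the probability that any single computed interval contains the truth:

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean, sd, n, trials = 1.5, 2.0, 25, 10_000

covered = 0
for _ in range(trials):
    x = rng.normal(true_mean, sd, size=n)
    half = 1.96 * x.std(ddof=1) / np.sqrt(n)  # normal-approximation CI
    covered += abs(x.mean() - true_mean) <= half

print(f"empirical coverage of nominal 95% CIs: {covered / trials:.3f}")
# ~0.95: the *procedure* covers the fixed true value in about 95% of
# repeated samples. It does not follow that any one interval has a 95%
# probability of containing the truth; that reading is a version of the
# transposed conditional.
```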

Neuroscience has shown that human decision-making cannot proceed from facts alone but involves an emotional element, which provides a narrative within which the facts obtain meaning (Damasio 1994). Narratives are causal accounts, which in the scientific context can be regarded as hypotheses. To connect physical reasoning and statistical practice, these narratives need to run through the entire scientific analysis, not simply be a ‘translation’ device bolted on at the end. To return to the quote from Jeffreys at the beginning of this piece, we need to recognize that data does not speak on its own; there is no answer without a question, and the answer depends not only on the question but also on how it is posed.

Availability of data and materials

Not applicable.

Code availability

Not applicable.

Notes

I sometimes use the term “climate science”, because the points I make are applicable to climate science in general, but use “climate-change science” when I wish to emphasize that particular aspect of climate science.

References

Ambaum MHP (2010) Significance tests in climate science. J Clim 23:5927–5932

Amrhein V, Greenland S, McShane B (2019) Retire statistical significance. Nature 567:305–307

Anonymous (2019) It’s time to talk about ditching statistical significance. Nature 567:283 (online version)

Caldwell PM, Bretherton CS, Zelinka MD, Klein SA, Santer BD, Sanderson BM (2014) Statistical significance of climate sensitivity predictors obtained by data mining. Geophys Res Lett 41:1803–1808

Caldwell PM, Zelinka MD, Klein SA (2018) Evaluating emergent constraints on equilibrium climate sensitivity. J Clim 31:3921–3942

Cattiaux J, Vautard R, Cassou C, Yiou P, Masson-Delmotte V, Codron F (2010) Winter 2010 in Europe: a cold extreme in a warming climate. Geophys Res Lett 37:L20704

Cohen J, Zhang X, Francis J, Jung T, Kwok R, Overland J, Ballinger TJ, Bhatt US, Chen HW, Coumou D, Feldstein S, Gu H, Handorf D, Henderson G, Ionita M, Kretschmer M, Laliberte F, Lee S, Linderholm HW, Maslowski W, Peings Y, Pfeiffer K, Rigor I, Semmler T, Stroeve J, Taylor PC, Vavrus S, Vihma T, Wang S, Wendisch M, Wu Y, Yoon J (2020) Divergent consensuses on Arctic amplification influence on midlatitude severe winter weather. Nature Clim Chang 10:20–29

Cowtan K, Way RG (2014) Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends. Quart J Roy Meteor Soc 140:1935–1944

Cox RT (1946) Probability, frequency and reasonable expectation. Amer J Phys 14:1–13

Damasio A (1994) Descartes’ error. G. P. Putnam’s Sons

Fenton N, Neil M (2019) Risk assessment and decision analysis with Bayesian networks, 2nd edn. CRC Press

Francis JA, Vavrus SJ (2012) Evidence linking Arctic amplification to extreme weather in mid-latitudes. Geophys Res Lett 39:L06801

Gigerenzer G (2004) Mindless statistics. J Socio-Econom 33:587–606

Gigerenzer G, Hoffrage U (1995) How to improve Bayesian reasoning without instructions: frequency formats. Psychol Rev 102:684–704

Hall A, Cox P, Huntingford C, Klein S (2019) Progressing emergent constraints on future climate change. Nature Clim Chang 9:269–278

Hoskins B, Woollings T (2015) Persistent extratropical regimes and climate extremes. Curr Clim Chang Rep 1:115–124

Hulme M, O’Neill SJ, Dessai S (2011) Is weather event attribution necessary for adaptation funding? Science 334:764–765

IPCC (Intergovernmental Panel on Climate Change) (2013) Climate change 2013: the physical science basis. Stocker TF et al. (eds) Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, UK

Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press

Jeffreys H (1961) The theory of probability, 3rd edn. Oxford University Press

Kass RE, Raftery AE (1995) Bayes factors. J Amer Stat Assoc 90:773–795

Kretschmer M, Coumou D, Donges JF, Runge J (2016) Using causal effect networks to analyze different Arctic drivers of midlatitude winter circulation. J Clim 29:4069–4081

Kretschmer M, Coumou D, Agel L, Barlow M, Tziperman E, Cohen J (2018) More-persistent weak stratospheric polar vortex states linked to cold extremes. Bull Amer Meteor Soc 99:49–60

Kretschmer M, Zappa G, Shepherd TG (2020) The role of Barents-Kara sea ice loss in projected polar vortex changes. Wea Clim Dyn 1:715–730

Kretschmer M, Adams SV, Arribas A, Prudden R, Robinson N, Saggioro E, Shepherd TG (2021) Quantifying causal pathways of teleconnections. Bull Amer Meteor Soc, in press. https://doi.org/10.1175/BAMS-D-20-0117.1

Lambeck K, Rouby H, Purcell A, Sun Y, Sambridge M (2014) Sea level and global ice volumes from the Last Glacial Maximum to the Holocene. Proc Natl Acad Sci USA 111:15296–15303

Lenton TM, Held H, Kriegler E, Hall JW, Lucht W, Rahmstorf S, Schellnhuber HJ (2008) Tipping elements in the Earth’s climate system. Proc Natl Acad Sci USA 105:1786–1793

Lewandowsky S, Risbey JS, Oreskes N (2016) The ‘pause’ in global warming: turning a routine fluctuation into a problem for science. Bull Amer Meteor Soc 97:723–733

Lindley DV (2014) Understanding uncertainty, revised edition. Wiley

Lloyd EA, Oreskes N (2018) Climate change attribution: when is it appropriate to accept new methods? Earth’s Future 6:311–325

Mastrandrea MD, Mach KJ, Plattner GK et al (2011) The IPCC AR5 guidance note on consistent treatment of uncertainties: a common approach across the working groups. Clim Chang 108:675–691

NAS (National Academies of Sciences, Engineering and Medicine) (2016) Attribution of extreme weather events in the context of climate change. The National Academies Press, Washington, DC. https://doi.org/10.17226/21852

Nicholls N (2000) The insignificance of significance testing. Bull Amer Meteor Soc 82:981–986

Nuzzo R (2014) Statistical errors. Nature 506:150–152

Palmer PI, Jacob DJ, Jones DBA, Heald CL, Yantosca RM, Logan JA, Sachse GW, Streets DG (2003) Inverting for emissions of carbon monoxide from Asia using aircraft observations over the western Pacific. J Geophys Res 108:8828

Pearl J, Mackenzie D (2018) The book of why. Penguin Random House

Pithan F, Mauritsen T (2013) Comments on “Current GCMs’ unrealistic negative feedback in the Arctic”. J Clim 26:7783–7788

Rahmstorf S, Foster G, Cahill N (2017) Global temperature evolution: recent trends and some pitfalls. Environ Res Lett 12:1–7

Rodgers CD (2000) Inverse methods for atmospheric sounding: theory and practice. World Scientific

Rosner A, Vogel RM, Kirshen PH (2014) A risk-based approach to flood management decisions in a nonstationary world. Water Resources Res 50:1928–1942

Schaller N, Kay AL, Lamb R, Massey NR, van Oldenborgh GJ, Otto FEL, Sparrow SN, Vautard R, Yiou P, Ashpole I, Bowery A, Crooks SM, Haustein K, Huntingford C, Ingram WJ, Jones RG, Legg T, Miller J, Skeggs J, Wallom D, Weisheimer A, Wilson S, Stott PA, Allen MR (2016) Human influence on climate in the 2014 southern England winter floods and their impacts. Nature Clim Chang 6:627–634

Screen JA, Deser C, Smith DM, Zhang X, Blackport R, Kushner PJ, Oudar T, McCusker KE, Sun L (2018) Consistency and discrepancy in the atmospheric response to Arctic sea-ice loss across climate models. Nature Geosci 11:155–163

Sexton DMH, Murphy JM, Collins M, Webb MJ (2012) Multivariate probabilistic projections using imperfect climate models. Part I: Outline of methodology. Clim Dyn 38:2513–2542

Shepherd TG (2014) Atmospheric circulation as a source of uncertainty in climate change projections. Nat Geosci 7:703–708

Shepherd TG (2016a) Effects of a warming Arctic. Science 353:989–990

Shepherd TG (2016b) A common framework for approaches to extreme event attribution. Curr Clim Chang Rep 2:28–38

Shepherd TG (2019) Storyline approach to the construction of regional climate change information. Proc R Soc A 475:20190013

Sherwood S, Webb MJ, Annan JD, Armour KC, Forster PM, Hargreaves JC, Hegerl G, Klein SA, Marvel KD, Rohling EJ, Watanabe M, Andrews T, Braconnot P, Bretherton CS, Foster GL, Hausfather Z, von der Heydt AS, Knutti R, Mauritsen T, Norris JR, Proistosescu C, Rugenstein M, Schmidt GA, Tokarska KB, Zelinka MD (2020) An assessment of Earth’s climate sensitivity using multiple lines of evidence. Rev Geophys 58:e2019RG000678

Simpson IR, Polvani L (2016) Revisiting the relationship between jet position, forced response, and annular mode variability in the southern midlatitudes. Geophys Res Lett 43:2896–2903

Sippel S, Meinshausen N, Fischer EM, Székely E, Knutti R (2020) Climate change now detectable from any single day of weather at global scale. Nature Clim Chang 10:35–41

Spiegelhalter D (2018) The art of statistics: learning from data. Pelican Books

Stirling A (2010) Keep it complex. Nature 468:1029–1031

Stott PA, Stone DA, Allen MR (2004) Human contribution to the European heatwave of 2003. Nature 432:610–614

Stott PA, Christidis N, Otto FEL, Sun Y, Vanderlinden JP, van Oldenborgh GJ, Vautard R, von Storch H, Walton P, Yiou P, Zwiers FW (2016) Attribution of extreme weather and climate-related events. WIREs Clim Chang 7:23–41

Trenberth KE, Fasullo JT, Shepherd TG (2015) Attribution of climate extreme events. Nature Clim Chang 5:725–730

van Garderen L, Feser F, Shepherd TG (2021) A methodology for attributing the role of climate change in extreme events: a global spectrally nudged storyline. Nat Hazards Earth Syst Sci 21:171–186

van Oldenborgh GJ, Krikken F, Lewis S, Leach NJ, Lehner F, Saunders KR, van Weele M, Haustein K, Li S, Wallom D, Sparrow S, Arrighi J, Singh RK, van Aalst MK, Philip SY, Vautard R, Otto FEL (2021) Attribution of the Australian bushfire risk to anthropogenic climate change. Nat Hazards Earth Syst Sci 21:941–960

Weaver CP, Lempert RJ, Brown C, Hall JA, Revell D, Sarewitz D (2013) Improving the contribution of climate model information to decision making: the value and demands of robust decision frameworks. WIREs Clim Chang 4:39–60

Wilby RL, Dessai S (2010) Robust adaptation to climate change. Weather 65:180–185

Zappa G, Shepherd TG (2017) Storylines of atmospheric circulation change for European regional climate impact assessment. J Clim 30:6561–6577

Zappa G, Bevacqua E, Shepherd TG (2021) Communicating potentially large but non-robust changes in multi-model projections of future climate. Int J Clim 41:3657–3669

Acknowledgements

The author acknowledges the support provided through the Grantham Chair in Climate Science at the University of Reading. He is grateful to Michaela Hegglin, Marlene Kretschmer, Michael McIntyre, and Marina Baldisserra Pacchetti, as well as the two reviewers, for comments on an earlier version of this manuscript.

Author information

Authors and Affiliations

Department of Meteorology, University of Reading, Reading, RG6 6BB, UK

Theodore G. Shepherd

Corresponding author

Correspondence to Theodore G. Shepherd.

Ethics declarations

Ethical approval, consent to participate, consent for publication, conflict of interest

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the topical collection “Perspectives on the quality of climate information for adaptation decision support”, edited by Marina Baldissera Pacchetti, Suraje Dessai, David A. Stainforth, Erica Thompson, and James Risbey.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Shepherd, T.G. Bringing physical reasoning into statistical practice in climate-change science. Climatic Change 169, 2 (2021). https://doi.org/10.1007/s10584-021-03226-6

Received: 07 June 2021

Accepted: 16 September 2021

Published: 01 November 2021

DOI: https://doi.org/10.1007/s10584-021-03226-6


Keywords

  • Climate change
  • Uncertainty
  • Bayes factor
  • Bayes theorem

EurekAlert! Science News

The human cause of climate change: Where does the burden of proof lie?

Dr. Kevin Trenberth advocates reversing the 'null hypothesis'

The debate may largely be drawn along political lines, but the human role in climate change remains one of the most controversial questions in 21st-century science. Writing in WIREs Climate Change, Dr Kevin Trenberth, from the National Center for Atmospheric Research, argues that the evidence for anthropogenic climate change is now so clear that the burden of proof should lie with research that seeks to disprove the human role.

In response to Trenberth's argument, a second review, by Dr Judith Curry, focuses on the concept of a 'null hypothesis': the default position that is adopted when research is carried out. Currently, the null hypothesis for climate change attribution research is that humans have no influence.

"Humans are changing our climate. There is no doubt whatsoever," said Trenberth. "Questions remain as to the extent of our collective contribution, but it is clear that the effects are not small and have emerged from the noise of natural variability. So why does the science community continue to do attribution studies and assume that humans have no influence as a null hypothesis?"

To show precedent for his position, Trenberth cites the 2007 report by the Intergovernmental Panel on Climate Change, which states that global warming is "unequivocal" and "very likely" due to human activities.

Trenberth also focused on climate attribution studies that claim to find no human component, suggesting that their assumptions distort results in the direction of finding no human influence, producing misleading statements about the causes of climate change that can grossly underestimate the role of humans in climate events.

"Scientists must challenge misconceptions in the difference between weather and climate while attribution studies must include a human component," concluded Trenberth. "The question should no longer be is there a human component, but what is it?"

In a second paper, Dr Judith Curry, from the Georgia Institute of Technology, questions this position, but argues that the discussion of the null hypothesis serves to highlight the fuzziness surrounding the many hypotheses related to dangerous climate change.

"Regarding attribution studies, rather than trying to reject either hypothesis regardless of which is the null, there should be a debate over the significance of anthropogenic warming relative to forced and unforced natural climate variability," said Curry.

Curry also suggested that the desire to reverse the null hypothesis may have the goal of seeking to marginalise the climate sceptic movement, a vocal group who have challenged the scientific orthodoxy on climate change.

"The proponents of reversing the null hypothesis should be careful of what they wish for," concluded Curry. "One consequence may be that the scientific focus, and therefore funding, would also reverse to attempting to disprove dangerous anthropogenic climate change, which has been a position of many sceptics."

"I doubt Trenberth's suggestion will find much support in the scientific community," said Professor Myles Allen from Oxford University, "but Curry's counter proposal to abandon hypothesis tests is worse. We still have plenty of interesting hypotheses to test: did human influence on climate increase the risk of this event at all? Did it increase it by more than a factor of two?"

All three papers are free online:

Trenberth, K., "Attribution of climate variations and trends to human influences and natural variability": http://doi.wiley.com/10.1002/wcc.142

Curry, J., "Nullifying the climate null hypothesis": http://doi.wiley.com/10.1002/wcc.141

Allen, M., "In defense of the traditional null hypothesis: remarks on the Trenberth and Curry opinion articles": http://doi.wiley.com/10.1002/wcc.145

Is climate change amping up the Pacific Northwest heat wave? Yes — and it’s time to stop asking.

Scientists say the burden of proof has shifted for connecting heat waves and global warming.

[Photo: Portland residents in a cooling center]

There’s no longer any need to ask if heat waves are influenced by climate change.

On Monday, temperatures in Washington and Oregon soared to well over 100 degrees Fahrenheit, crushing records and leaving locals — many of whom don’t have air conditioning — struggling to find shelter from the suffocating heat. In Seattle, fans and air conditioning units were sold out at major retailers as temperatures reached an all-time high of 106 degrees Fahrenheit, marking an unprecedented third straight day of 100-degree heat. In Portland, light rail cables literally melted amid record-smashing temperatures up to 40 degrees Fahrenheit above normal. 

And some scientists are beginning to say that not only is the once-in-a-millennium heat wave broiling the Pacific Northwest linked to climate change — but that it’s safe to assume that all heat waves are being made more severe or more likely as a result of all the carbon emissions pumped into the atmosphere. “Now, if we have an extreme heat wave, the null hypothesis is, ‘Climate change is making that worse,’” said Andrew Dessler, a professor of atmospheric sciences at Texas A&M University. Instead of having to prove that climate change did affect a heat wave, Dessler explained, the burden of proof is now on any scientist to prove that global warming didn’t play a role. 

That’s a far cry from two decades ago, when scientists hesitated to link extreme weather events to climate change at all. When researchers did connect an extreme flood, drought, or heat wave to human-caused warming, their research was often released years later, due to the long process of peer review. But now, as heat extremes (and research) continue to pile up, scientists have grown increasingly confident that climate change plays a role in essentially every one of them. 


“Every heat wave occurring today is made more likely and more intense by human-induced climate change,” tweeted Friederike Otto, a climate scientist and the associate director of the Environmental Change Institute at the University of Oxford.

Over the last decade, researchers have analyzed over 100 heat waves around the world, concluding that climate change made almost all of them more likely or more severe. (In a few studies, the results were inconclusive.) One study found that the 2017 European heat wave nicknamed “Lucifer” was made four times more likely; another found that an exceptionally warm summer in Texas in 2011 was made 10 times more likely.

In a few cases, researchers have calculated that heat waves would have been virtually impossible without the 2 degrees F (1.2 degrees C) that the planet has already warmed since pre-industrial times. Scientists estimated a heat wave in Siberia last year — which brought temperatures in the Arctic Circle to 100.4 degrees F — was made 600 times more likely due to the warming climate. “The analogy that people often use is loading the dice — you have dice and they used to be fair but now we’re loading up the sixes,” Dessler said. “But what’s actually happening is we’re hitting the point where we’ve added another side. Now we’re rolling sevens.”

Many areas of the U.S. simply aren’t prepared for that kind of extreme heat. According to the U.S. Global Change Research Program, major U.S. cities experienced two heat waves a year in the 1960s; now it’s more like six. The heat wave season is also 47 days longer — which helps explain why the Pacific Northwest is getting hit by extreme heat as early as June. 

For Dessler, the scary thing is that all of this is occurring at only 2 degrees F (1.2 degrees C) of warming, and the planet is slated for around 5.4 degrees Fahrenheit (3 degrees C) of warming before the end of the century. And climate change, he pointed out, doesn’t proceed linearly. Things might seem gradual until all hell breaks loose. “It’s going to be a lot worse than three times as bad,” he said.




Null Hypothesis Testing ≠ Scientific Inference: A Critique of the Shaky Premise at the Heart of the Science and Values Debate, and a Defense of Value‐Neutral Risk Assessment

Many philosophers and statisticians argue that risk assessors are morally obligated to evaluate the probabilities and consequences of methodological error, and to base their decisions of whether to adopt a given parameter value, model, or hypothesis on those considerations. This argument is couched within the rubric of null hypothesis testing, which I suggest is a poor descriptive and normative model for risk assessment. Risk regulation is not primarily concerned with evaluating the probability of data conditional upon the null hypothesis, but rather with measuring risks, estimating the consequences of available courses of action and inaction, formally characterizing uncertainty, and deciding what to do based upon explicit values and decision criteria. In turn, I defend an ideal of value‐neutrality, whereby the core inferential tasks of risk assessment—such as weighing evidence, estimating parameters, and model selection—should be guided by the aim of correspondence to reality. This is not to say that value judgments be damned, but rather that they should be accounted for within a structured approach to decision analysis, rather than embedded within risk assessment in an informal manner.

1. SCIENCE, VALUES, AND OBJECTIVITY: IS THERE ANYTHING MORE THAT CAN BE SAID?

Can science be value-free? Is this even desirable? These simple questions have generated long-standing debates in philosophy, and carry significant implications for scientific practice and public policy (Douglas, 2009; Hempel, 1965; Lacey, 2005; Rudner, 1953). The arguments vary. Objectivity is the hallmark of science in many classical accounts, where adherence to the rules of logic and principles of inductive reasoning is synonymous with freedom from bias and the pursuit of truth (Quine, 1955). If science is not objective, then its authority is seemingly undermined, as political values or personal interests may be shaping the knowledge that it generates (Reiss & Sprenger, 2017). Or perhaps disinterested scientific reasoning is ill-suited to respond to the environmental challenges of modernity? Science needs to be normative, on some accounts: for example, how can we characterize risks without an (implicit) judgment that something that we value lies in harm's way (Slovic, 1999)? Indeed, maybe the idea of objectivity is a dangerous illusion (Greenland, 2012). Empirical inquiry is after all a different beast from formal logic, and subjective judgments color all stages of the research process (Polya, 1968). Appeals to objectivity may be little more than rhetoric designed to mask underlying value judgments (Havstad & Brown, 2017). Even the fact-value dichotomy has come under attack, with concepts such as resilience and risk argued to carry both normative and descriptive connotations (Putnam, 2002).

Rather than weigh in on these grand philosophical questions, this article has the more modest aim of critiquing what is known as the “argument from inductive risk” (AfIR), and offering a defense of value‐neutral risk assessment. The latter may seem a rather reactionary and anachronistic stance to take, and is probably at odds with most modern philosophical and social science perspectives of risk (Jasanoff, 1999 ; Krimsky & Golding, 1992 ; Wynne, 2001 ). However, value‐neutrality is likely not a controversial ideal within the risk assessment community, and so a reasonable question to ask is: Why should risk assessors care about what philosophers think of their field? Yet recent years have seen a growing interest in the foundations of risk research from within the discipline (Aven, 2016 ; Borgonovo, Cappelli, Maccheroni, & Marinacci, 2018 ; Cox, 2012a ; Hansson & Aven, 2014 ). The question of the role of values in risk assessment is of particular interest because it bears on long‐standing debates about the separation of risk assessment and management, on the objectivity of risk assessment, and on the question of whether risk assessment is a fully‐fledged science (Hansson & Aven, 2014 ; National Research Council [NRC], 1983 , 2009 ). Nevertheless, one could still ask why risk assessors should care about what outsiders think about their discipline. In practical terms, the views of outsiders matter because they play a role in shaping what publics, policymakers, and institutions think about what risk assessors are and should be doing. Risk assessments conducted within government agencies are significantly shaped by laws, regulations, and conventions, and these are developed by committees typically including members drawn from a broad range of disciplines (i.e., not just practicing risk assessors, but also ethicists, legal scholars, economists, subject‐matter experts, etc.) (Albert, 1994 ; North, 2003 ; NRC, 1983 , 2009 ). Moreover, risk assessment operates within a broader societal context, which includes a range of interest groups that may invoke arguments from arenas including philosophy and statistical theory. Indeed, the process and practices of risk assessment have been a locus of controversy within legal, political, and administrative institutions, and have often been met with suspicion or distrust from citizen groups, NGOs, industrialists, and scientists (Douglas, 2005 ; Slovic, 1999 ). Crucially, this suspicion has often been articulated on grounds relating to values; for example, that risk assessments conceal value judgments, or do not adequately incorporate ethical concerns, or are simply a tool for advancing economic interests at the expense of public and environmental health (Douglas, 2000 ; Slovic, 1999 ). And so a clear articulation of the proper role of values in risk assessment could: advance thinking about the foundations of risk analysis; enhance trust in the discipline in public and political spheres; and inform the regulations and conventions that shape risk assessment practice.

Before proceeding with my core argument, I will set out my scope and introduce key concepts and terms. I am not arguing that the broader endeavor of risk analysis can or should be free of values. I take it as read that such a position would be ludicrous: questions of which risks to prioritize, how they should be framed, which mitigation options should be considered, how consequences should be evaluated, and which decision criteria to adopt are inextricably value‐laden (Pidgeon, Hood, Jones, Turner, & Gibson, 1992 ; Shrader‐Frechette, 2012 ; Slovic, 1999 ). My focus is restricted to core scientific inference within risk assessment, namely, the analysis, synthesis, and interpretation of evidence. I take the view that risk assessments are primarily concerned with making and communicating informative, good predictions. By informative , I mean that they offer outputs that are relevant to some real‐world decision problem. By good predictions , I mean that they seek to produce reliable statements about the world, in the sense that they aim to correspond with an as yet unobserved reality. These statements relate to potentially observable quantities or events, although they may not be testable in practice (Goerlandt & Reniers, 2018 ; Goldstein, 2011 ; Popper, 2005 ). I consider that (1) risk can be defined in terms of a triplet of scenarios, consequences, and probabilities, and that (2) any risk assessment is conditional upon a knowledge base whose uncertainties should be explicitly accounted for (Aven, 2013 ; Kaplan & Garrick, 1981 ). I use the term regulatory science as shorthand for the range of scientific and technical analyses conducted with the aim of informing public policy. I will talk a great deal of hypothesis testing, as this is the conceptual framework within which the AfIR is both defended and critiqued. There are of course multiple approaches to hypothesis testing, most notably Fisher, Neyman–Pearson, and the amalgamation known as null hypothesis significance testing (NHST) (for an overview, see Barnett, 1999 ; Lehmann, 1993 ). I will (somewhat loosely) refer to the AfIR as an extension of the Neyman–Pearson paradigm as it is based on the question of whether one should treat a hypothesis as true, rather than whether one should (provisionally) believe it to be so (Fisher's approach). 1

The article proceeds as follows. I begin by explicating the concept lying at the heart of the science and values debate—underdetermination—before focusing on the influential AfIR. In its simplest form, the argument states that scientists are morally obligated to evaluate the probabilities and consequences of incorrectly accepting or rejecting the hypothesis under examination, and to base (in part) their decision of whether to accept a given hypothesis on those considerations (e.g., via altering significance thresholds) (Douglas, 2000 , 2009 ; Neyman, 1957 ; Rudner, 1953 ; Steel, 2010 ). 2 I then briefly consider prominent rebuttals. The core of the article argues that the AfIR is based on several untenable (implicit) assumptions about the aims and practices of regulatory science. These include the belief that the probabilities and consequences of methodological errors can be estimated and accounted for informally; an overly restrictive framing of policy options; and an at best marginal role afforded to formal decision analysis. I argue that these assumptions stem from conceiving of risk assessment within the rubric of null hypothesis testing. But risk regulation is not primarily concerned with evaluating the probability of data conditional upon the null hypothesis, but rather with measuring risks, estimating the consequences of available courses of action, formally characterizing uncertainty, and deciding what to do based upon explicit values and decision criteria. Doing so in a rigorous and transparent manner requires a value‐neutral approach to risk assessment.

2. UNDERDETERMINATION, THE AfIR, AND PROMINENT REBUTTALS

At the heart of the science and values debate lies the idea that the evidence available to us at any given time is insufficient to fix the beliefs that we should hold in response to it, for example, when evaluating a theory, model, or hypothesis (Stanford, 2017). In a sense, this problem of underdetermination reflects the truism that empirical inquiry cannot proceed by deduction alone. Hypotheses only have empirical implications when conjoined with auxiliary hypotheses or background beliefs (e.g., about measurement techniques), and so a failed prediction does not determine which of our beliefs should be updated or abandoned (Duhem, 1991; Laudan, 1990). Some scholars take this to mean that there is a “gap” between evidence and the beliefs that we should form in response to it, and that this gap might as well be filled by “values.” However, values come in many different stripes, some of which are unthreatening to classical notions of objectivity. Epistemic values—those that promote the acquisition of true beliefs (Goldman, 1999)—include notions such as simplicity, testability, and internal consistency. Defenders of the value-free ideal argue that these principles of inductive reasoning, together with empirical evidence, are sufficient to fix beliefs relating to hypotheses without any need for intrusion by “nonepistemic values” (normative values, such as social or ethical ones) (Norton, 2008). However, there is no consensus on the relative importance of epistemic values, nor on how they should be interpreted, nor indeed on what values can be properly considered epistemic (Douglas, 2013; Kelly, 2011; Kuhn, 1977). Yet this merely shows that there is an unavoidable element of judgment in the application of epistemic values, rather than necessarily undermining the standard account of objectivity. To do the latter, one needs to show that normative judgments, such as social or political values, (should) play a role in scientific reasoning. This brings us to the AfIR.

In brief, the AfIR holds that (1) given that the evaluation of a hypothesis is unsettled by logical or evidential considerations, then (2) there is a nontrivial probability of forming erroneous beliefs in relation to it, and by extension (3) scientists have a moral obligation to take the costs of errors (false positives and false negatives) into account in hypothesis evaluation (Douglas, 2000 , 2009 ; Neyman, 1957 ; Rudner, 1953 ; Steel, 2010 ). On this account, social values play a role in handling uncertainty rather than as reasons for belief formation. That is, they determine the evidentiary thresholds that must be met for a hypothesis to be treated as true, based on the consequences of getting it wrong. There are two types of inductive risks—wrongly rejecting a true hypothesis, and wrongly accepting a false hypothesis—and social values properly determine acceptable risks of error in a particular context, with different contexts legitimately calling for a different balance between the two types of error. While Rudner ( 1953 ) forwarded this argument in relation to both pure and applied science, most modern versions of it are restricted to science that is developed for the purpose of informing public policy (i.e., “regulatory science”) (Douglas, 2000 , 2009 ; Steel, 2010 ).

In a sense the AfIR is a restatement of Neyman and Pearson's view that the consequences of error should play a role in evaluating hypotheses (Neyman, 1957 ; Pearson, 1955 ). Under this viewpoint, the crucial point is not whether a hypothesis is true, but rather whether one should treat it as though it were true. In other words, hypothesis testing is an act of choice rather than pure inference (Neyman, 1957 ). As such, consequences and their associated utilities play a crucial role in this account. However, proponents of the AfIR have had little to say on how nonepistemic values should be used to set inferential thresholds, seemingly advocating an informal approach (Kaivanto & Steel, 2017 ).

Jeffrey's (1956) classic rebuttal to the AfIR is that scientists should neither accept nor reject hypotheses, but attach probabilities to them, sidestepping judgments about whether claims are certain enough to warrant treating them as though they were true. Standard counter-responses to Jeffrey include that the (inevitable) existence of second-order uncertainty—uncertainty about the probability judgments—lets the inductive risk argument return through the back door (Douglas, 2009). This is unpersuasive, at least from a “probabilist” perspective, wherein second-order uncertainty simply collapses into first-order uncertainty (i.e., “any set of base and higher level probabilities can be collapsed into a single set of base level probabilities” [Lehner, Laskey, & Dubois, 1996]). Of course, it is true that risk assessors will have more “confidence” in some probability estimates than in others. However, measures of this confidence can be conveyed to decisionmakers, for example, through assertions of the degree to which assessors expect to update their probabilities in the face of new data (Lehner et al., 1996). This is now standard practice in the IPCC's climate change assessments (Mastrandrea et al., 2011). Another counter-response to Jeffrey is that decisionmakers are notoriously uncomfortable with uncertainty. This implies that were scientists to simply report the relation between evidence and hypotheses to decisionmakers—perhaps in the form of confidence intervals or p-values—they would be sowing confusion (Elliott, 2011). Yet risk communication, however challenging, is arguably a shared responsibility. Risk assessors achieve little if they provide incomprehensible (to decisionmakers) characterizations of uncertainty, but decisionmakers share a responsibility to grapple with uncertainty in appropriate forms. 3 The final counter-response to Jeffrey's argument is that there are numerous methodological choices prior to the appraisal of a hypothesis—for example, model selection, interpreting data, choosing parameter values, weighing evidence, and so on (Douglas, 2000). Unlike with hypothesis appraisal, these decisions cannot as a practical matter be left unmade, and so the AfIR slips back in (Douglas, 2009). A rebuttal to this is that rather than relying upon fixed choices, scientists can use a plurality of models, perform sensitivity analysis to reflect parameter uncertainty, and feed this into a formal decision analysis that would explicitly account for values (e.g., utilities) (Betz, 2013; Frank, 2017; Parker, 2010). I find this argument persuasive and will return to it later.
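The collapse that Lehner et al. describe can be shown numerically. In the sketch below, a hypothetical Beta distribution stands in for second-order uncertainty about an event probability p; for the purpose of assessing the event itself, the hierarchy reduces to its mean:

```python
import numpy as np

rng = np.random.default_rng(3)

# Second-order uncertainty: the event probability p is itself uncertain,
# here p ~ Beta(2, 8), a hypothetical choice with mean 0.2.
a, b = 2.0, 8.0
p_draws = rng.beta(a, b, size=1_000_000)
event = rng.random(p_draws.size) < p_draws  # realize the event given each p

print(f"analytic collapse E[p] = a/(a+b) = {a / (a + b):.3f}")
print(f"Monte Carlo P(event)             = {event.mean():.3f}")
# The two agree: only the base-level probability matters for the event.
# What the hierarchy adds is a statement of how much that probability
# would be expected to move in the face of new data.
```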

A second rebuttal to the AfIR focuses on its technocratic leanings (Betz, 2013). On what moral or political authority are scientists warranted to make judgments about the relative desirability of certain social consequences? Why should scientists’ normative judgments matter, rather than those of the public, stakeholders, or elected representatives? What grounds do we have to believe that they hold expertise in moral reasoning? And should we not be wary of placing such authority in the hands of a group that is distinctly unrepresentative of broader society? Indeed, the importance of distinguishing between risk assessment and risk management has long been emphasized by the risk analysis community (NRC, 1983, 2009). However, is the philosopher king the logical conclusion of the AfIR? Not necessarily, as many of its proponents have written at length on various methods of eliciting or co-producing value judgments, from citizen juries to participatory modeling exercises (e.g., Douglas, 2009). Nevertheless, in practice these kinds of processes tend to be geared more toward questions of framing—for example, what sort of questions are posed, what kind of consequences matter, and what moral calculus should be adopted in decision making—and eliciting explicitly normative values (e.g., in expressed preference surveys, or deliberations over whether equity weights should be adopted in health technology appraisal). They have not focused on engaging publics within the core inferential tasks of risk assessment.

A third rebuttal is that the AfIR overstates the degrees of freedom available to the typical risk assessor. Many regulatory domains are characterized by legal rules or conventions setting out methodological choices to be adopted under conditions of uncertainty (MacGillivray, 2014, 2017; NRC, 1983, 2009). These include guidance on the alpha level to select in hypothesis testing, hierarchies that set out how different lines of evidence should be weighed, the preferred model to be adopted in dose-response assessment, rules for aggregating data, and guidelines on what constitutes valid evidence versus junk science. These heuristics serve to constrain interpretive possibilities, conferring a degree of stability, consistency, and transparency to a process that might otherwise be (seen as) highly sensitive to the choices of individual analysts and thus open to legal attack (Albert, 1994). There are, of course, substantive critiques of rule-bound inference, focusing on the idea that the uncritical adherence to methodological conventions is an impoverished version of objectivity, and one that is at variance with the ideal of truth-seeking (Feyerabend, 1993; Gelman & Hennig, 2017; Greenland, 2017a, 2017b). Nevertheless, the point is that analytical discretion cannot simply be assumed to exist within regulatory science, suggesting that the discussions of “moral obligations” that are central to the AfIR rather miss the point. Even in the absence of (binding) methodological guidelines, conventions may emerge autonomously and wield significant normative force (Franklin, 2016; Saltelli et al., 2008; Thompson, Frigg, & Helgeson, 2016). Still, this rebuttal only suggests that analytical degrees of freedom are more limited than supposed by proponents of the AfIR, rather than nonexistent. Moreover, the rebuttal at best pushes the AfIR to another level: that of the institutions responsible for establishing these default inference rules and conventions, rather than the analysts who apply them. In other words, the moral obligation for considering the costs and benefits of methodological error would lie with risk regulation institutions. This would be a more sensible argument—given that institutions (rather than individual analysts) in principle have the time, expertise, authority, and resources to evaluate the costs and benefits of methodological errors in an explicit and formal manner—and more or less describes the current state of affairs in many jurisdictions. However, problems arise when the assumptions and scope conditions underlying such conventions are not clearly stated, as well as when they are not well supported by empirical or theoretical evidence (MacGillivray, 2014, 2017). A particular example is that the convention of NHST has crept into aspects of regulatory practice where its underlying assumptions are questionable (i.e., no uncontrolled confounders and zero measurement error) (MacGillivray, 2014, 2017).

Having overviewed the terrain of the debate on science, values, and objectivity, below I present my own critique of the AfIR, and defend a value-neutral approach to risk assessment.

3. NULL HYPOTHESIS TESTING IS A POOR DESCRIPTIVE AND NORMATIVE MODEL FOR RISK ASSESSMENT

3.1. Risk Assessment Is Primarily Concerned with Estimation, Not Hypothesis Testing

[N]o analysis of what constitutes the method of science would be satisfactory unless it comprised some assertion to the effect that the scientist as scientist accepts or rejects hypotheses. (Rudner, 1953)

On the Churchman-Braithwaite-Rudner view it is the task of the scientist as such to accept and reject hypotheses in such a way as to maximise the expectation of good for, say, a community for which he is acting. (Jeffrey, 1956)

Proponents (and critics) of the AfIR typically frame regulatory science as being focused on (null) hypothesis testing. Some of the examples they discuss include: Is a chemical carcinogenic or not (Brown, 2013 )? Does a specific level of exposure to a chemical have a toxic effect (Steel, 2010 )? Is a drug currently on the market safe (Biddle, 2013 )? Is a toxic contaminant present in a drug in a lethal quantity (Rudner, 1953 )? Is a vaccine stock free from the active polio virus (Jeffrey, 1956 )? While this model of regulatory science—in which things are either safe or unsafe, and unsafe things should be regulated—was perhaps a reasonable one in Rudner's time, it is now broadly untenable. The most famous example of this categorical or absolutist approach to risk regulation is probably the U.S. Delaney Clause:

No additive shall be deemed to be safe if it is found to induce cancer when ingested by man or animal, or if it is found, after tests which are appropriate for the evaluation of the safety of food additives, to induce cancer in man or animals.

This clause soon became untenable as advances in toxicology and analytical chemistry revealed that there were many more carcinogens present in foodstuffs than initially expected, and that there were marked differences in their potencies (Majone, 2010). As a result, absolutist rules now play a more limited role in risk regulation, replaced by a broad acceptance that decision making should be based on levels of risk and associated cost-benefit considerations (Graham & Wiener, 1995).

This is not to say that hypotheses play no role in risk assessment (MacGillivray, 2017; Spiegelhalter, Abrams, & Myles, 2004; Suter, 1996). For instance, does a spike in hospital mortality rates indicate an underperforming institution (e.g., substandard surgical practices or conditions), or is it random variation? Can a change in weather patterns (e.g., altered frequency or strength of North Atlantic storms) be attributed to anthropogenic causes, or is it within the bounds of natural system behavior? In the context of pharmacovigilance, has a drug-event pair been reported disproportionately? Variants of null hypothesis testing are widely used to structure these kinds of inferences, particularly where data are generated by randomized experiments (in theory ruling out systematic error) and there is limited prior information. Such practices have been subject to the standard critiques of: using arbitrary thresholds to discriminate between signal and noise; adopting strong assumptions of no uncontrolled confounders and zero measurement error (particularly implausible when observational data are used); only indirectly considering statistical aspects that are logically informative for causal inference (e.g., effect sizes in clinical trials); and ignoring priors and utilities (Gigerenzer & Marewski, 2015; Spiegelhalter et al., 2004; Suter, 1996).

A long-established but often overlooked principle is that p-values are not measures of the truth or probability of the null hypothesis, but rather measures of the probability of the data conditional upon the truth of the null hypothesis (and auxiliary assumptions) (Greenland et al., 2016). This measure will not always have direct relevance for policy making. As such, null hypothesis testing is less often used as a strict decision procedure within risk assessment, but rather as one line of evidence among many that contribute to causal inference. Causal inference in practice is typically guided by domain-specific criteria—such as Koch's postulates (Doll, 2002) or Hill's (1965) criteria—rather than the ritualistic application of significance levels (although abuses remain [MacGillivray, 2017; Suter, 1996], perhaps because NHST better fits a desire for absolute safety). Moreover, establishing causation is often the starting rather than the end point of risk assessment, where the fundamental question of interest is what level of risk is posed by a process, product, or activity, rather than simply whether a causal association has been demonstrated.
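The gap between P(data | null) and P(null | data), the transposed conditional, can be made concrete with a small simulation (all settings hypothetical: 80% of tested hypotheses truly null, modest per-study power). Among results reaching p < 0.05, the fraction of true nulls is nowhere near 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

n_studies, n, effect = 20_000, 30, 0.5
true_null = rng.random(n_studies) < 0.8   # 80% of hypotheses are null

p_vals = np.empty(n_studies)
for i in range(n_studies):
    mu = 0.0 if true_null[i] else effect
    x = rng.normal(0.0, 1.0, n)           # control group
    y = rng.normal(mu, 1.0, n)            # treatment group
    p_vals[i] = stats.ttest_ind(x, y).pvalue

sig = p_vals < 0.05
print(f"studies reaching p < 0.05: {sig.sum()}")
print(f"fraction of those with a true null: {true_null[sig].mean():.2f}")
# Each test caps the false-positive rate at 5% *given* the null, yet
# roughly 30% of the 'significant' findings here are false positives:
# P(H0 | p < 0.05) is not 0.05.
```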

3.2. Hypothesis Testing Frameworks Neglect Bias and Focus on Random Error; The AfIR Inherits These Weaknesses

Recall that risk assessment involves numerous unforced methodological choices to which the AfIR putatively applies (Douglas, 2000). The implications are that scientists are morally obliged (within legal constraints) to (1) estimate the probabilities of over- or underestimating a parameter value; (2) estimate the consequences of those errors; (3) evaluate those consequences (in a normative sense); and (4) select an optimal parameter value in light of some (unspecified) decision criteria. And all of this should be repeated for uncertain model choices, questions about how to characterize ambiguous data, disputes over which extrapolation method to use, and so on, seemingly without the aid of formal uncertainty analysis (see also Kaivanto & Steel, 2017). 4 What are the problems with this? Null hypothesis testing frameworks—or p-values more accurately—provide a measure of the probability of obtaining data at least as extreme as that observed, given that the null hypothesis is true (e.g., that a parameter lies within a certain range), and conditional on assumptions of no uncontrolled confounders and zero measurement error (Greenland et al., 2016). The only uncertainty that they formally express is that of random error, leaving uncertainty surrounding the auxiliary assumptions to be handled informally. This is hard to justify when risk assessments frequently rely on noisy data—often from proxy variables rather than the attributes of direct interest—obtained from nonexperimental settings, where random error will typically be a second-order problem compared to measurement error and bias (Greenland, 2005; Lash, Fox, Cooney, Lu, & Forshee, 2016). Formal approaches to uncertainty analysis would help (Betz, 2013; Frank, 2017; Parker, 2010), and indeed are widely applied in (best) practice.
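A small simulation makes the bias-versus-random-error point directly (the bias and all other numbers are hypothetical). Intervals that formally express only random error shrink as the sample grows, while the bias does not, so coverage of the truth collapses rather than improves:

```python
import numpy as np

rng = np.random.default_rng(5)
truth, bias, sd = 10.0, 0.5, 2.0   # a proxy that reads 0.5 units high

for n in (25, 100, 10_000):
    half = 1.96 * sd / np.sqrt(n)  # CI half-width from random error only
    covered = 0
    for _ in range(2_000):
        x = rng.normal(truth + bias, sd, n)  # biased measurements
        covered += abs(x.mean() - truth) <= half
    print(f"n={n:6d}  half-width={half:5.3f}  "
          f"coverage of the truth: {covered / 2_000:.2f}")
# Larger samples buy narrower intervals but do nothing about the bias;
# eventually the interval almost never contains the truth, and the
# 'uncertainty' reported is precisely the part that no longer matters.
```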

For example, parametric uncertainty is widely handled through sensitivity analysis. Conventionally this is done through varying one parameter or input value at a time over an arbitrarily limited space, but this is strictly speaking inadvisable for correlated sources of error (Saltelli et al., 2008). “Global sensitivity analysis” (GSA) is a promising method for covering a fuller (though still incomplete) space of parameter uncertainty and considering correlated sources of errors. Moreover, it does not require the specification of (often arbitrarily chosen) probability distributions. GSA produces a set of outcomes believed to be plausible (or at least not implausible) conditional on underlying model structure. 5 However, it may be computationally demanding to apply (Saltelli et al., 2008). The use of emulators may be a reasonable compromise (Coutts & Yokomizo, 2014), particularly in situations where the underlying model is well-calibrated against observations, and where further observations can be used to benchmark the performance of the emulator itself. 6

Box's (1979) aphorism that all models are wrong (but some useful) implies that characterizations of uncertainty that are conditional on the truth of a model are insufficient, particularly when dealing with tasks of extrapolation or out-of-sample prediction (Greenland, 2005). And so logic trees are sometimes used to convey how the distribution of risk estimates is conditional on unforced methodological choices at multiple stages throughout risk assessment. Directed acyclic graphs together with structural equation modeling can be used to explore the sensitivity of risk estimates to violations of the standard assumptions of no measurement error and no uncontrolled confounding (Pearl, 2009; VanderWeele & Ding, 2017). Alternatively, ensemble methods may be used to provide a (lower-bound) characterization of model uncertainty, most famously in climate science (Knutti, 2010; Morgan & Henrion, 1990). Structured approaches to eliciting and aggregating expert judgments—for example, probability distributions of uncertain parameters that correspond to real-world variables, 7 or the weights that should be applied to alternative models in ensemble forecasting—can provide rigor and transparency to a process that may otherwise be hostage to cognitive biases and groupthink (Aspinall, 2010; Clemen & Winkler, 1999; Morgan, 2014). A particularly challenging type of model uncertainty stems from the omission of physical processes that are thought to be significant, yet are insufficiently understood to allow for formalization. The method of probabilistic inversion—used to combine expert judgments with the outputs from physical models—offers a coherent, reproducible basis for correcting for the biases stemming from such omissions (e.g., through adjusting mean estimates or widening distributions) (Oppenheimer, Little, & Cooke, 2016).

More recent attention has been placed on the conditionality of analysis outputs to arbitrary choices in data processing, given that converting raw observations into data sets amenable to formal analysis often involves many unforced choices (Gelman & Loken, 2014). The robustness (or fragility) of analysis outputs can be clarified by reporting the results for all reasonable data sets, rather than a single data set that arises from unforced data-processing choices (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016). These methods require a degree of sophistication to apply. Good practice, after all, requires hard work.
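To give a flavor of the variance-based logic behind GSA, here is a deliberately crude sketch: a toy multiplicative risk model with hypothetical input distributions, and no specialized library. Inputs are sampled jointly rather than one at a time, and output variance is apportioned to each input via binned conditional means; a production analysis would instead use proper Sobol estimators (e.g., as in Saltelli et al., 2008):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200_000

# Toy risk model with three jointly sampled uncertain inputs (hypothetical).
emission = rng.lognormal(mean=0.0, sigma=0.5, size=N)  # source term
dilution = rng.uniform(0.1, 1.0, size=N)               # transport factor
potency = rng.normal(1.0, 0.3, size=N)                 # dose-response slope
risk = emission * dilution * potency                   # model output

def first_order_index(x, y, bins=50):
    """Crude first-order sensitivity: Var(E[y|x]) / Var(y), estimated by
    binning x into quantile bins. For illustration only."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    means = np.array([y[idx == k].mean() for k in range(bins)])
    weights = np.array([(idx == k).mean() for k in range(bins)])
    return float(np.sum(weights * (means - y.mean()) ** 2) / y.var())

for name, x in [("emission", emission), ("dilution", dilution),
                ("potency", potency)]:
    print(f"S1[{name}] ~ {first_order_index(x, risk):.2f}")
# The indices apportion output variance over the joint input space;
# their sum falling short of 1 signals interaction effects that a
# one-at-a-time analysis would never see.
```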

The general point is that there are formal methods for uncertainty analysis, which to a large extent remove the obligation of the analyst to estimate, through untutored introspection, the probabilities and consequences of methodological error. However, the complexity of the problems that most risk assessments deal with means that we are rarely in a situation where all sources of uncertainty can be fully and formally characterized. Extensive formal uncertainty analysis is resource intensive, and the time required to conduct quantitative risk assessments is already a major concern in some domains. As such, the resources expended on formal uncertainty analyses should be commensurate with the scale of the problem under analysis, and the value of information that such analyses might provide (NRC, 2011).

Many proponents of the AfIR appear motivated by the view that decisionmakers desire precise, definitive analysis outputs and that transparent, rigorous characterizations of uncertainty will only sow confusion. Whether this view is correct is disputed within the risk assessment community. One founder of the domain of carcinogen risk assessment argued that risk managers "do not like uncertainty because it makes it difficult to formulate and defend regulatory action" (Albert, 1994; for similar arguments, see Goldstein, 2011). Others have found decisionmakers to be capable of interpreting probabilistic statements, and surprisingly receptive to risk assessments that clarify underlying assumptions and uncertainties (Stirling, 2010). My own belief is that risk assessments that do not explicitly acknowledge uncertainties and potential biases are misleading, regardless of what decisionmakers' preferences may be (see also Finkel, 1995; Jasanoff, 1989; Morgan, 2018).

3.3. The NHST Framework Neglects Substantive Features of Regulatory Decision Making; Formal Decision Analysis Is a Superior Model

Fundamentally, the AfIR is concerned with whether one should treat a hypothesis (or risk estimate) as though it were true, rather than whether one should believe it to be so, putting it squarely within the realm of decision theory. However, the AfIR is typically discussed within the rubric of null hypothesis testing, wherein questions of consequences, utilities, and decision criteria are addressed informally if at all, and the decision in question is whether to accept or reject the hypothesis under examination. Why does this matter? Proponents of the AfIR consequently tend to adopt a model of regulatory decision making wherein risk assessors are responsible for testing hypotheses (effect vs. no effect) or estimating risk levels, the outputs of which are sufficient for decision making, and where choice is reduced to the question of whether to regulate or not. Does this idealized picture obscure significant details?

To begin with, are there contexts in which estimates of the level of risk are sufficient for decision making? In some domains and jurisdictions, quantitative thresholds8 are indeed used to distinguish between negligible risks and those that require mitigation, based on theoretical grounds (e.g., threshold models of risk, or "tipping points"), pragmatic concerns (e.g., analytical detection limits), or arbitrary conventions (e.g., an upper-bound lifetime incremental cancer risk of 10⁻⁶ has been used as a measure of unacceptable risk by many U.S. regulatory agencies) (MacGillivray, 2017; Rodricks, Brett, & Wrenn, 1987). Although de minimis thresholds can be justified on the grounds that "the law does not concern itself with trifles" (Peterson, 2002), de manifestis thresholds are often on shakier ground, given their typically arbitrary foundations and their explicit neglect of the risk management options on the table, as well as the associated cost-benefit and equity considerations.
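
A small sketch of why bright-line screening can mislead: assuming (hypothetically) a lognormally distributed risk estimate whose median sits below the 10⁻⁶ line, the probability of actually exceeding the line can remain substantial. The point estimate and uncertainty factor below are invented.

```python
import numpy as np

# Probability that an uncertain risk estimate exceeds a bright-line
# threshold. Median risk and geometric standard deviation are invented.

rng = np.random.default_rng(1)
threshold = 1e-6
median_risk = 5e-7            # point estimate: comfortably below the line
gsd = 3.0                     # assumed lognormal uncertainty factor

samples = rng.lognormal(np.log(median_risk), np.log(gsd), size=1_000_000)
print(f"P(risk > 1e-6) = {(samples > threshold).mean():.2f}")  # roughly 0.26
```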

Indeed, even where such thresholds are dictated by law, agencies often circumvent them via creative statutory interpretations, preferring to make their decisions on the basis of (officially forbidden) factors such as costs, benefits, and equity considerations (Coglianese & Marchant, 2004). This is to say that regulators are rarely interested simply in estimates of risk, but rather in the outcomes expected to follow from their (potential) actions, and in how those outcomes might be valued. These are counterfactual questions with an explicit component of valuation, rather than questions of pure inference, and so are surely properly handled within the apparatus of decision analysis, wherein regulatory options are specified, uncertainty (parameter and model) is characterized, consequences are calculated and valued (e.g., by assigning utilities), and options are ranked with respect to some agreed-upon criteria (see also de Finetti, 1951; Jeffrey, 1956).
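
The decision-analytic workflow just described can be sketched schematically as follows; the options, state probabilities, and utilities are placeholder assumptions, and a real analysis would elicit them from decisionmakers and stakeholders.

```python
import numpy as np

# Skeleton of a decision analysis: rank regulatory options by expected
# utility over uncertain states of the world. All inputs are placeholders.

p = np.array([0.2, 0.8])      # characterized uncertainty: P(high), P(low risk)

# Utility of each (option, state) outcome, folding in costs, benefits
# and equity weights (the valuation step).
utility = {
    "strict_limit":   np.array([ 0.9, 0.3]),
    "moderate_limit": np.array([ 0.5, 0.7]),
    "no_action":      np.array([-1.0, 1.0]),
}

for opt in sorted(utility, key=lambda o: -(p @ utility[o])):
    print(f"{opt:15s} expected utility = {p @ utility[opt]:.2f}")
```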

Moreover, a further weakness of the AfIR is that its proponents typically portray false positives (overregulation) as merely imposing economic burdens on the regulated industry, while portraying false negatives (underregulation) as incurring a range of social, environmental, and health impacts (Biddle, 2013; Brown, 2013; Douglas, 2000; Frank, 2017; Hicks, 2018).9 The implication is that in the context of environmental and public health protection, false negatives will generally prove more costly than false positives, and that significance levels should be set accordingly. While these assumptions are not a necessary component of the AfIR, they are pervasive and worth examining.

To begin with, the long-standing (and controversial) argument that wealth = health (e.g., Lutter, Morrall, & Viscusi, 1999) poses a challenge to these assumptions. The rough idea is that imposing (unwarranted) burdens on industry lowers productivity and, by extension, the income levels of the public. Given that wealthier people tend to live longer and healthier lives, it follows that false positives may impose welfare burdens beyond simply economic harm to industry. More simply, many risk reduction measures incur significant direct costs to the state, whether structural measures designed to protect coastal areas from flooding, pharmacovigilance systems designed to monitor for adverse drug outcomes, or the infrastructure of buoys, cables, alarms, simulation models, and shelters that makes up modern tsunami warning systems. In a world of constrained resources, (unnecessary) expenditures divert public funds that could have been used to advance social welfare through other means (Tengs et al., 1995). Yet the problem of tradeoff neglect is broader than this. A defining feature of modernity is that we are engaged in transforming risks rather than solving them, in managing tradeoffs between risks, in substituting one set of risks for another, and in shifting harms from one jurisdiction to the next (Busby, Alcock, & MacGillivray, 2012; Graham & Wiener, 1995; MacGillivray, Alcock, & Busby, 2011; Sunstein, 1990; Viscusi, 1996).10

None of the above is intended to suggest that overregulation is a larger concern than underregulation.11 Instead, the foregoing examples are intended to support a long-standing criticism of the AfIR, dating back to Jeffrey (1956) and de Finetti (1951), that we need to know the specific actions being considered before we can meaningfully estimate the consequences of error. For example, flood risk can be handled by population resettlement, watershed management practices, structural measures such as levees and dams, improved emergency planning and evacuation procedures, or some combination thereof. The consequences of methodological error—for example, of overestimating risk levels—will differ depending on which intervention is under consideration (de Finetti, 1951; Finkel, 2011; Jeffrey, 1956). The AfIR seems to imply that in such contexts, analysts should provide multiple estimates, conditional on each potential intervention, that take the variable consequences of error into account, including tradeoffs and side effects. This is surely cognitively intractable.

A separate challenge is that risk assessment outputs can take on a life of their own, traveling across boundaries and scales to be applied in decision contexts far removed from those originally intended. On this account, regulatory science is not so different from pure science, in that the domains of potential application are heterogeneous (with different loss functions) and cannot reasonably be foreseen in advance, presenting further difficulties for the AfIR. Moreover, even within a given context, there will be multiple audiences with diverse interests and values. Surely the more reasonable alternative is to produce value‐neutral risk assessments and to incorporate societal values, loss functions, and equity considerations within formal decision analysis frameworks where possible.

A natural counterargument to the above is that formal decision-theoretic methods are only applicable to a relatively limited subset of problem types, where states of the world, available choices, and their associated consequences and probabilities are known to the decisionmaker (Savage's [1972] "small worlds"). In "large worlds," characterized by uncertainty relating to these problem dimensions, Gigerenzer and colleagues (e.g., Gigerenzer & Marewski, 2015) have claimed that Savage viewed the application of the full Bayesian apparatus as "utterly ridiculous."12 This, however, misreads or at least overstates Savage's point. Savage argued that in order to apply Bayesian methods to large worlds, we need to make various simplifying assumptions so that they can be analyzed as if they were small worlds. This involves, for example, describing states of the world and the consequences stemming from potential actions at some fixed and idealized level of detail (Shafer, 1986). Without doing so, the application of Bayesian methods would be "utterly ridiculous," as the problem structure would be ill-defined and the task intractable. The basic point is that while probability and decision theory can never solve problems of actual practice, they can solve idealizations of those problems. And so the application of these approaches is valuable to the extent that those idealizations are good ones and can be communicated to interested parties (Jaynes, 2003; Savage, 1972).

Using formal decision-theoretic apparatus to identify a single "optimal" policy in situations of deep uncertainty—where boundary and initial conditions are poorly understood, parameter values are weakly constrained by theory or empirics, and model structures contain significant omissions—is an act of faith rather than of science (Freedman, 1997). In such situations, inexact methods of problem solving may be more defensible (Jaynes, 2003). Precautionary approaches—for example, heuristic decision rules based on feasibility standards or worst-case scenarios (Kysar, 2010; MacGillivray, 2017)—may prove useful where the costs of underregulating are likely to dwarf those of overregulating (e.g., tsunami early warning systems), and minimax or low-regret principles can enhance their rigor and transparency (Heal & Millner, 2014). Another alternative is robust decision making, which involves selecting policies that perform well across a wide range of plausible outcomes. These frameworks have appealing properties in conditions of deep uncertainty, and do not depend on characterizing uncertainty via probability distributions (Cox, 2012b; Heal & Millner, 2014; Lempert & Collins, 2007). A final alternative is sequential strategies, which build flexibility into decision making through staged implementation of mitigation efforts, leaving space for adaptation to changing conditions (Simpson et al., 2016). The general point is that risk assessments are typically not a direct input to the decisionmaker, but rather feed into a broader analysis framework wherein decisionmakers' (or societal) preferences are explicitly incorporated, for example, through assigning utilities, measures of risk aversion and equity, and so forth, or informally incorporated via structured decision processes. The corollary of this is that risk assessors are morally obliged to be value-neutral in their methodological choices and to make clear what those choices are, rather than to impose their own preferences.
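
As a minimal sketch of a probability-free decision rule of the kind mentioned above, the following applies the minimax-regret criterion to a hypothetical option-by-scenario payoff table for flood risk management; all names and numbers are invented.

```python
import numpy as np

# Minimax regret: pick the option whose worst-case shortfall relative to
# the best option in each scenario is smallest. No probabilities over
# scenarios are required. Options, scenarios and payoffs are invented.

options = ["levees", "resettlement", "early_warning"]
# Rows: options. Columns: flood scenarios (mild, severe, extreme).
payoff = np.array([
    [8.0, 5.0, -4.0],
    [2.0, 4.0,  6.0],
    [6.0, 3.0,  1.0],
])

regret = payoff.max(axis=0) - payoff     # shortfall in each scenario
worst = regret.max(axis=1)               # each option's worst case
print(dict(zip(options, worst)), "->", options[int(worst.argmin())])
```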

4. CONCLUSIONS

The AfIR states that risk assessors are morally obligated to evaluate the probabilities and consequences of methodological error, and to base decisions of whether to act as though a given parameter, model, or hypothesis is true on those considerations (e.g., via altering significance thresholds). Proponents of this argument express and defend their claims within the rubric of null hypothesis testing, which I have argued to be a poor descriptive and normative model for risk assessment. It is a poor model because it only indirectly considers effect sizes; rests on typically implausible assumptions of no measurement error and no uncontrolled confounding; answers a question that is often of little substantive interest to policy making; neglects utilities; and restricts the choice-set to whether one should treat the null hypothesis as true or false. I also claimed that the AfIR places unreasonable cognitive demands on risk assessors. Risk assessment involves multiple complex inferences, which may combine in nonintuitive ways, with multidimensional impacts and varied tradeoffs, and so the idea that analysts can reliably foresee the implications of methodological errors seems questionable. The argument also rather misses the point, because the application of formal uncertainty and decision analysis in regulatory contexts already systematizes this practice, in a way that (ideally) rigorously and transparently acknowledges uncertainty, and that ranks regulatory options against agreed-upon decision criteria and explicit evaluations of consequences.

My general normative argument has been that risk assessment should aspire toward value-neutrality. This means that the core inferential tasks of risk assessment—such as weighting data, estimating parameters, and selecting (or combining) models—should be guided by the aim of correspondence to reality. Epistemic pluralism—in the sense of openly accounting for the range of plausible methods, data, parameter values, and models within risk assessment—is fundamental to value-neutrality. Even in the absence of formal uncertainty and decision analysis, value-neutral risk assessments offer the most useful and informative kinds of predictions. This is because they offer a sense of consistency in priority setting, and moreover allow publics and decisionmakers to bring their own interests, values, and decision rules to bear on discussions about how to act given our best understanding of (future) states of the world. None of this is to say that value judgments be damned, but rather that they should be accounted for within a structured approach to decision analysis or an informed governance process, rather than embedded informally within the core inferential tasks of risk assessment.

ACKNOWLEDGMENTS

This research was supported by NERC Grant No. NE/N012240/1, “Resilience to Earthquake‐Induced Landslide Hazard in China.” Sander Greenland, Jerry Busby, and three anonymous reviewers provided helpful comments on previous drafts.

1 The Neyman–Pearson paradigm is concerned with the practical consequences of accepting or rejecting a hypothesis. As such, it is more directly relevant for risk assessment, given that such assessments take place within a broader decision‐making context.

2 We can recast this argument in triplet form, where the scenarios in question are acceptance and rejection of the hypothesis (or more precisely, as I shall argue, the specific actions that would follow acceptance or rejection), the probability relates to the chances of erroneously accepting or rejecting the hypothesis, and the consequences are valuations of the outcomes that stem from acceptance or rejection. Of course, at the risk assessment stage, the specific actions may not yet have been identified. This is part of the reason the AfIR is untenable, as I shall argue later.

3 I owe this point to a reviewer.

4 Although proponents of the AfIR do not explicitly preclude formal uncertainty or decision analysis, these methods do not feature much within their writings (e.g., Douglas, 2000, 2009; Rudner, 1953). Crucially, at no point have proponents of the AfIR proposed a formal method for combining judgments of probabilities and consequences to derive inferential thresholds such as p-values (Kaivanto & Steel, 2017). When they have discussed formal methods of uncertainty or decision analysis, it is typically to express skepticism that these methods can be used to circumvent social or ethical judgments (e.g., Havstad & Brown, 2017).

5 I owe this interpretation to a reviewer.

6 Again, I owe these qualifiers to a reviewer.

7 A reviewer emphasized that many model parameters will not have direct physical interpretations, and in such cases it may be unreasonable to expect experts to have any useful intuitions about their respective distribution functions.

8 De minimis decision rules, based on the notion that "the law does not concern itself with trifles," set out risk thresholds that are deemed negligible (Peterson, 2002), whereas de manifestis rules set out risk thresholds deemed unacceptably high.

9 "Overregulation presents excess costs to the industries that would bear the costs of regulations. Underregulation presents costs to public health and to other areas affected by damage to public health" (Douglas, 2000). "If we wrongly chose the threshold model, many people would get sick and die prematurely of cancer; the moral cost, in this case, is very high, not to mention the economic costs of treating these individuals. On the other hand, if we wrongly chose the no-threshold model, the worst that would happen is that corporate profits would be slightly reduced" (Biddle, 2013). "For HTT [high-throughput toxicology], this would mean taking into account (at least) that some epistemic errors will lead to detrimental effects on human health and the environment (due to underregulation); others will lead to detrimental economic effects (due to overregulation)" (Hicks, 2018). "[Disutilities] associated with outcomes stemming from…not regulating a chemical that is actually toxic (i.e., harms to public health and the environment)….[costs] associated with…regulating a chemical that is non-toxic (i.e., unnecessarily burdensome and costly regulation)" (Frank, 2017). "If you want to be absolutely sure that you do not say that the chemical is safe when it in fact is not (because you value safety, precaution, welfare of potential third parties), you should decrease your rate of type II errors, and thus increase your statistical significance factor and your rate of type I errors. If you want to avoid 'crying wolf' and asserting a link where none exists (because you value economic benefits that come with avoiding overregulation), you should do the reverse" (Brown, 2013).

10 For example, airbags may protect adults but kill children; gas mileage standards may protect the environment at the cost of thousands of lives annually, as they encourage manufacturers to sacrifice sturdiness for fuel efficiency; structural risk mitigation measures may create a "levee effect," whereby a (false) sense of security encourages populations to settle in floodplains and court catastrophe should the levee overtop; drug lags stemming from stringent testing requirements may protect the public from potential adverse effects of untested pharmaceuticals, while diminishing the health of those who urgently need them; bans on carcinogens in food additives may lead consumers to use noncarcinogenic products, which nevertheless carry even greater health risks; and so on (Busby et al., 2012; Graham & Wiener, 1995; MacGillivray et al., 2011; Sunstein, 1990; Viscusi, 1996).

11 Risk management interventions often have ancillary benefits; compliance costs are routinely overestimated in regulatory proceedings; and many risks are ignored when it appears it is simply too challenging to establish the costs of management options.

12 "Savage carefully limited Bayesian decision theory to 'small worlds' in which all alternatives, consequences, and probabilities are known. And he warned that it would be 'utterly ridiculous' to apply Bayesian theory outside a well-defined world—for him, 'to plan a picnic' was already outside because the planners cannot know all consequences in advance" (Gigerenzer & Marewski, 2015).

  • Albert, R. E. (1994). Carcinogen risk assessment in the US Environmental Protection Agency. Critical Reviews in Toxicology, 24(1), 75–85.
  • Aspinall, W. (2010). A route to more tractable expert advice. Nature, 463(7279), 294–295.
  • Aven, T. (2013). Practical implications of the new risk perspectives. Reliability Engineering & System Safety, 115, 136–145.
  • Aven, T. (2016). Risk assessment and risk management: Review of recent advances on their foundation. European Journal of Operational Research, 253(1), 1–13.
  • Barnett, V. (1999). Comparative statistical inference. New York, NY: Wiley.
  • Betz, G. (2013). In defence of the value free ideal. European Journal for Philosophy of Science, 3(2), 207–220.
  • Biddle, J. (2013). State of the field: Transient underdetermination and values in science. Studies in History and Philosophy of Science Part A, 44(1), 124–133.
  • Borgonovo, E., Cappelli, V., Maccheroni, F., & Marinacci, M. (2018). Risk analysis and decision theory: A bridge. European Journal of Operational Research, 264(1), 280–293.
  • Box, G. E. (1979). Robustness in the strategy of scientific model building. In R. L. Launer & G. N. Wilkinson (Eds.), Robustness in statistics (pp. 201–236). Cambridge, MA: Academic Press.
  • Brown, M. J. (2013). Values in science beyond underdetermination and inductive risk. Philosophy of Science, 80(5), 829–839.
  • Busby, J. S., Alcock, R. E., & MacGillivray, B. H. (2012). Types of risk transformation: A case study. Journal of Risk Research, 15(1), 67–84.
  • Clemen, R. T., & Winkler, R. L. (1999). Combining probability distributions from experts in risk analysis. Risk Analysis, 19(2), 187–203.
  • Coglianese, C., & Marchant, G. E. (2004). The EPA's risky reasoning. Regulation, 27, 16–22.
  • Coutts, S. R., & Yokomizo, H. (2014). Meta-models as a straightforward approach to the sensitivity analysis of complex models. Population Ecology, 56(1), 7–19.
  • Cox, L. A., Jr. (2012a). Risk analysis foundations, models, and methods (Vol. 45). Berlin, Germany: Springer Science & Business Media.
  • Cox, L. A. T. (2012b). Confronting deep uncertainties in risk analysis. Risk Analysis, 32(10), 1607–1629.
  • de Finetti, B. (1951). Recent suggestions for the reconciliation of theories of probability. In J. Neyman (Ed.), Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (pp. 217–225). Berkeley, CA: University of California Press.
  • Doll, R. (2002). Proof of causality: Deduction from epidemiological observation. Perspectives in Biology and Medicine, 45(4), 499–515.
  • Douglas, H. (2000). Inductive risk and values in science. Philosophy of Science, 67(4), 559–579.
  • Douglas, H. (2005). Inserting the public into science. In S. Maasen & P. Weingart (Eds.), Democratization of expertise? (pp. 153–169). Dordrecht, The Netherlands: Springer.
  • Douglas, H. (2009). Science, policy, and the value-free ideal. Pittsburgh, PA: University of Pittsburgh Press.
  • Douglas, H. (2013). The value of cognitive values. Philosophy of Science, 80(5), 796–806.
  • Duhem, P. M. M. (1991). The aim and structure of physical theory (Vol. 13). Princeton, NJ: Princeton University Press.
  • Elliott, K. C. (2011). Is a little pollution good for you? Incorporating societal values in environmental research. Oxford, UK: Oxford University Press.
  • Feyerabend, P. (1993). Against method. New York, NY: Verso.
  • Finkel, A. M. (1995). Toward less misleading comparisons of uncertain risks: The example of aflatoxin and alar. Environmental Health Perspectives, 103(4), 376–385.
  • Finkel, A. M. (2011). "Solution-focused risk assessment": A proposal for the fusion of environmental analysis and action. Human and Ecological Risk Assessment, 17(4), 754–787.
  • Frank, D. M. (2017). Making uncertainties explicit. In K. C. Elliott & T. Richards (Eds.), Exploring inductive risk: Case studies of values in science (pp. 79–100). Oxford, UK: Oxford University Press.
  • Franklin, A. (2016). What makes a good experiment? Reasons and roles in science. Pittsburgh, PA: University of Pittsburgh Press.
  • Freedman, D. (1997). Some issues in the foundation of statistics. In B. C. van Fraassen (Ed.), Topics in the foundation of statistics (pp. 19–39). Dordrecht, The Netherlands: Springer.
  • Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4), 967–1033.
  • Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465.
  • Gigerenzer, G., & Marewski, J. N. (2015). Surrogate science: The idol of a universal method for scientific inference. Journal of Management, 41(2), 421–440.
  • Goerlandt, F., & Reniers, G. (2018). Prediction in a risk analysis context: Implications for selecting a risk perspective in practical applications. Safety Science, 101, 344–351.
  • Goldman, A. I. (1999). Knowledge in a social world (Vol. 281). Oxford, UK: Clarendon Press.
  • Goldstein, B. D. (2011). Risk assessment of environmental chemicals: If it ain't broke. Risk Analysis, 31(9), 1356–1362.
  • Graham, J. D., & Wiener, J. B. (Eds.) (1995). Risk vs. risk. Cambridge, MA: Harvard University Press.
  • Greenland, S. (2005). Multiple-bias modelling for analysis of observational data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 168(2), 267–306.
  • Greenland, S. (2012). Transparency and disclosure, neutrality and balance: Shared values or just shared words? Journal of Epidemiology and Community Health, 66, 967–970.
  • Greenland, S. (2017a). For and against methodologies: Some perspectives on recent causal and statistical inference debates. European Journal of Epidemiology, 32(1), 3–20.
  • Greenland, S. (2017b). Invited commentary: The need for cognitive science in methodology. American Journal of Epidemiology, 186(6), 639–645.
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
  • Hansson, S. O., & Aven, T. (2014). Is risk analysis scientific? Risk Analysis, 34(7), 1173–1183.
  • Havstad, J. C., & Brown, M. J. (2017). Inductive risk, deferred decisions, and climate science advising. In K. C. Elliott & T. Richards (Eds.), Exploring inductive risk: Case studies of values in science (pp. 101–126). Oxford, UK: Oxford University Press.
  • Heal, G., & Millner, A. (2014). Reflections: Uncertainty and decision making in climate change economics. Review of Environmental Economics and Policy, 8(1), 120–137.
  • Hempel, C. G. (1965). Science and human values. In C. G. Hempel (Ed.), Aspects of scientific explanation and other essays in the philosophy of science (pp. 81–96). New York, NY: Free Press.
  • Hicks, D. J. (2018). Inductive risk and regulatory toxicology: A comment on de Melo-Martín and Intemann. Philosophy of Science, 85(1), 164–174.
  • Hill, A. B. (1965). The environment and disease: Association or causation? Journal of the Royal Society of Medicine, 58(5), 295–300.
  • Jasanoff, S. (1989). Norms for evaluating regulatory science. Risk Analysis, 9(3), 271–273.
  • Jasanoff, S. (1999). The songlines of risk. Environmental Values, 8(2), 135–152.
  • Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge, UK: Cambridge University Press.
  • Jeffrey, R. C. (1956). Valuation and acceptance of scientific hypotheses. Philosophy of Science, 23(3), 237–246.
  • Kaivanto, K., & Steel, D. (2017). Adjusting inferential thresholds to reflect non-epistemic values. Philosophy of Science. https://doi.org/10.13140/RG.2.2.21481.70247
  • Kaplan, S., & Garrick, B. J. (1981). On the quantitative definition of risk. Risk Analysis, 1(1), 11–27.
  • Kelly, K. T. (2011). Simplicity, truth, and probability. In P. S. Bandyopadhyay & M. Forster (Eds.), Philosophy of statistics (pp. 983–1024). Dordrecht, The Netherlands: Elsevier.
  • Knutti, R. (2010). The end of model democracy? Climatic Change, 102, 395–404.
  • Krimsky, S., & Golding, D. (1992). Social theories of risk. Westport, CT: Praeger.
  • Kuhn, T. (1977). Objectivity, value judgment, and theory choice. In A. Bird & J. Ladyman (Eds.), Arguing about science (pp. 74–86). Abingdon, UK: Routledge.
  • Kysar, D. A. (2010). Regulating from nowhere: Environmental law and the search for objectivity. New Haven, CT: Yale University Press.
  • Lacey, H. (2005). Is science value free? Values and scientific understanding. London, UK: Psychology Press.
  • Lash, T. L., Fox, M. P., Cooney, D., Lu, Y., & Forshee, R. A. (2016). Quantitative bias analysis in regulatory settings. American Journal of Public Health, 106(7), 1227–1230.
  • Laudan, L. (1990). Demystifying underdetermination. In C. W. Savage (Ed.), Scientific theories (pp. 267–297). Minneapolis, MN: University of Minnesota Press.
  • Lehmann, E. L. (1993). The Fisher, Neyman–Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.
  • Lehner, P. E., Laskey, K. B., & Dubois, D. (1996). An introduction to issues in higher order uncertainty. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, 26(3), 289–293.
  • Lempert, R. J., & Collins, M. T. (2007). Managing the risk of uncertain threshold responses: Comparison of robust, optimum, and precautionary approaches. Risk Analysis, 27(4), 1009–1026.
  • Lutter, R., Morrall, J. F., & Viscusi, W. K. (1999). The cost-per-life-saved cutoff for safety-enhancing regulations. Economic Inquiry, 37(4), 599–608.
  • MacGillivray, B. H. (2014). Heuristics structure and pervade formal risk assessment. Risk Analysis, 34(4), 771–787.
  • MacGillivray, B. H. (2017). Characterising bias in regulatory risk and decision analysis: An analysis of heuristics applied in health technology appraisal, chemicals regulation, and climate change governance. Environment International, 105, 20–33.
  • MacGillivray, B. H., Alcock, R. E., & Busby, J. (2011). Is risk-based regulation feasible? The case of polybrominated diphenyl ethers (PBDEs). Risk Analysis, 31(2), 266–281.
  • Majone, G. (2010). Foundations of risk regulation: Science, decision-making, policy learning and institutional reform. European Journal of Risk Regulation, 1(1), 5–19.
  • Mastrandrea, M. D., Mach, K. J., Plattner, G. K., Edenhofer, O., Stocker, T. F., Field, C. B., … Matschoss, P. R. (2011). The IPCC AR5 guidance note on consistent treatment of uncertainties: A common approach across the working groups. Climatic Change, 108(4), 675–691.
  • Morgan, M. G. (2014). Use (and abuse) of expert elicitation in support of decision making for public policy. Proceedings of the National Academy of Sciences, 111(20), 7176–7184.
  • Morgan, M. G. (2018). Uncertainty in long-run forecasts of quantities such as per capita gross domestic product. Proceedings of the National Academy of Sciences, 115(21), 5314–5316.
  • Morgan, M. G., & Henrion, M. (1990). Uncertainty: A guide to dealing with uncertainty in quantitative risk and policy analysis. Cambridge, UK: Cambridge University Press.
  • National Research Council (NRC). (1983). Risk assessment in the federal government: Managing the process. Washington, DC: National Academies Press.
  • National Research Council (NRC). (2009). Science and decisions: Advancing risk assessment. Washington, DC: National Academies Press.
  • National Research Council (NRC). (2011). Sustainability and the US EPA. Washington, DC: National Academies Press.
  • Neyman, J. (1957). "Inductive behavior" as a basic concept of philosophy of science. Revue de l'Institut International de Statistique, 25(1/3), 7–22.
  • North, D. W. (2003). Reflections on the Red/mis-read Book, 20 years later. Human and Ecological Risk Assessment, 9(5), 1145–1154.
  • Norton, J. (2008). Must evidence underdetermine theory? In M. Carrier, D. Howard, & J. Kourany (Eds.), The challenge of the social and the pressure of practice: Science and values revisited (pp. 17–44). Pittsburgh, PA: University of Pittsburgh Press.
  • Oppenheimer, M., Little, C. M., & Cooke, R. M. (2016). Expert judgement and uncertainty quantification for climate change. Nature Climate Change, 6(5), 445–451.
  • Parker, W. S. (2010). Whose probabilities? Predicting climate change with ensembles of models. Philosophy of Science, 77(5), 985–997.
  • Pearl, J. (2009). Causality. Cambridge, UK: Cambridge University Press.
  • Pearson, E. S. (1955). Statistical concepts in the relation to reality. Journal of the Royal Statistical Society: Series B (Methodological), 17(2), 204–207.
  • Peterson, M. (2002). What is a de minimis risk? Risk Management, 4(2), 47–55.
  • Pidgeon, N., Hood, C., Jones, D., Turner, B., & Gibson, R. (1992). Risk perception. In Royal Society Study Group (Ed.), Risk: Analysis, perception and management (pp. 89–134). London, UK: The Royal Society.
  • Polya, G. (1968). Mathematics and plausible reasoning: Patterns of plausible inference (Vol. 2). Princeton, NJ: Princeton University Press.
  • Popper, K. (2005). The logic of scientific discovery. Abingdon, UK: Routledge.
  • Putnam, H. (2002). The collapse of the fact/value dichotomy and other essays. Cambridge, MA: Harvard University Press.
  • Quine, W. V. (1955). Posits and reality. In W. V. Quine (Ed.), The ways of paradox and other essays (pp. 246–254). Cambridge, MA: Harvard University Press.
  • Reiss, T., & Sprenger, J. (2017). Scientific objectivity. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved from https://plato.stanford.edu/archives/sum2017/entries/scientific-objectivity/
  • Rodricks, J. V., Brett, S. M., & Wrenn, G. C. (1987). Significant risk decisions in federal regulatory agencies. Regulatory Toxicology and Pharmacology, 7(3), 307–320.
  • Rudner, R. (1953). The scientist qua scientist makes value judgments. Philosophy of Science, 20(1), 1–6.
  • Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., … Tarantola, S. (2008). Global sensitivity analysis: The primer. Hoboken, NJ: John Wiley & Sons.
  • Savage, L. J. (1972). The foundations of statistics. New York, NY: Dover Publications.
  • Shafer, G. (1986). Savage revisited. Statistical Science, 1, 463–485.
  • Shrader-Frechette, K. (2012). Risk analysis and scientific method: Methodological and ethical problems with evaluating societal hazards. Berlin, Germany: Springer Science & Business Media.
  • Simpson, M., James, R., Hall, J. W., Borgomeo, E., Ives, M. C., Almeida, S., … Wagener, T. (2016). Decision analysis for management of natural hazards. Annual Review of Environment and Resources, 41, 489–516.
  • Slovic, P. (1999). Trust, emotion, sex, politics, and science: Surveying the risk-assessment battlefield. Risk Analysis, 19(4), 689–701.
  • Spiegelhalter, D. J., Abrams, K. R., & Myles, J. P. (2004). Bayesian approaches to clinical trials and health-care evaluation (Vol. 13). Hoboken, NJ: John Wiley & Sons.
  • Stanford, K. (2017). Underdetermination of scientific theory. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved from https://plato.stanford.edu/archives/win2017/entries/scientific-underdetermination/
  • Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712.
  • Steel, D. (2010). Epistemic values and the argument from inductive risk. Philosophy of Science, 77(1), 14–34.
  • Stirling, A. (2010). Keep it complex. Nature, 468(7327), 1029–1031.
  • Sunstein, C. R. (1990). Paradoxes of the regulatory state. University of Chicago Law Review, 57(2), 407–441.
  • Suter, G. W. (1996). Abuse of hypothesis testing statistics in ecological risk assessment. Human and Ecological Risk Assessment, 2(2), 331–347.
  • Tengs, T. O., Adams, M. E., Pliskin, J. S., Safran, D. G., Siegel, J. E., Weinstein, M. C., & Graham, J. D. (1995). Five-hundred life-saving interventions and their cost-effectiveness. Risk Analysis, 15(3), 369–390.
  • Thompson, E., Frigg, R., & Helgeson, C. (2016). Expert judgment for climate change adaptation. Philosophy of Science, 83(5), 1110–1121.
  • VanderWeele, T. J., & Ding, P. (2017). Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4), 268–274.
  • Viscusi, W. K. (1996). Regulating the regulators. University of Chicago Law Review, 63, 1423–1461.
  • Wynne, B. (2001). Creating public alienation: Expert cultures of risk and ethics on GMOs. Science as Culture, 10(4), 445–481.


Open access | Published: 17 April 2024

The economic commitment of climate change

Maximilian Kotz, Anders Levermann & Leonie Wenz

Nature, volume 628, pages 551–557 (2024)


Subjects: Environmental economics; Environmental health; Interdisciplinary studies; Projection and prediction

Global projections of macroeconomic climate-change damages typically consider impacts from average annual and national temperatures over long time horizons1,2,3,4,5,6. Here we use recent empirical findings from more than 1,600 regions worldwide over the past 40 years to project sub-national damages from temperature and precipitation, including daily variability and extremes7,8. Using an empirical approach that provides a robust lower bound on the persistence of impacts on economic growth, we find that the world economy is committed to an income reduction of 19% within the next 26 years independent of future emission choices (relative to a baseline without climate impacts, likely range of 11–29% accounting for physical climate and empirical uncertainty). These damages already outweigh the mitigation costs required to limit global warming to 2 °C by sixfold over this near-term time frame and thereafter diverge strongly dependent on emission choices. Committed damages arise predominantly through changes in average temperature, but accounting for further climatic components raises estimates by approximately 50% and leads to stronger regional heterogeneity. Committed losses are projected for all regions except those at very high latitudes, at which reductions in temperature variability bring benefits. The largest losses are committed at lower latitudes in regions with lower cumulative historical emissions and lower present-day income.


Projections of the macroeconomic damage caused by future climate change are crucial to informing public and policy debates about adaptation, mitigation and climate justice. On the one hand, adaptation against climate impacts must be justified and planned on the basis of an understanding of their future magnitude and spatial distribution9. This is also of importance in the context of climate justice10, as well as to key societal actors, including governments, central banks and private businesses, which increasingly require the inclusion of climate risks in their macroeconomic forecasts to aid adaptive decision-making11,12. On the other hand, climate mitigation policy such as the Paris Climate Agreement is often evaluated by balancing the costs of its implementation against the benefits of avoiding projected physical damages. This evaluation occurs both formally, through cost–benefit analyses1,4,5,6, and informally, through public perception of mitigation and damage costs13.

Projections of future damages meet challenges when informing these debates, in particular the human biases relating to uncertainty and remoteness that are raised by long-term perspectives14. Here we aim to overcome such challenges by assessing the extent of economic damages from climate change to which the world is already committed by historical emissions and socio-economic inertia (the range of future emission scenarios that are considered socio-economically plausible15). Such a focus on the near term limits the large uncertainties about diverging future emission trajectories, the resulting long-term climate response and the validity of applying historically observed climate–economic relations over long timescales during which socio-technical conditions may change considerably. As such, this focus aims to simplify the communication and maximize the credibility of projected economic damages from future climate change.

In projecting the future economic damages from climate change, we make use of recent advances in climate econometrics that provide evidence for impacts on sub-national economic growth from numerous components of the distribution of daily temperature and precipitation3,7,8. Using fixed-effects panel regression models to control for potential confounders, these studies exploit within-region variation in local temperature and precipitation in a panel of more than 1,600 regions worldwide, comprising climate and income data over the past 40 years, to identify the plausibly causal effects of changes in several climate variables on economic productivity16,17. Specifically, macroeconomic impacts have been identified from changing daily temperature variability, total annual precipitation, the annual number of wet days and extreme daily rainfall that occur in addition to those already identified from changing average temperature2,3,18. Moreover, regional heterogeneity in these effects based on the prevailing local climatic conditions has been found using interaction terms. The selection of these climate variables follows micro-level evidence for mechanisms related to the impacts of average temperatures on labour and agricultural productivity2, of temperature variability on agricultural productivity and health7, as well as of precipitation on agricultural productivity, labour outcomes and flood damages8 (see Extended Data Table 1 for an overview, including more detailed references). References 7,8 contain a more detailed motivation for the use of these particular climate variables and provide extensive empirical tests about the robustness and nature of their effects on economic output, which are summarized in Methods. By accounting for these extra climatic variables at the sub-national level, we aim for a more comprehensive description of climate impacts with greater detail across both time and space.

Constraining the persistence of impacts

A key determinant and source of discrepancy in estimates of the magnitude of future climate damages is the extent to which the impact of a climate variable on economic growth rates persists. The two extreme cases in which these impacts persist indefinitely or only instantaneously are commonly referred to as growth or level effects19,20 (see Methods section 'Empirical model specification: fixed-effects distributed lag models' for mathematical definitions). Recent work shows that future damages from climate change depend strongly on whether growth or level effects are assumed20. Following refs. 2,18, we provide constraints on this persistence by using distributed lag models to test the significance of delayed effects separately for each climate variable. Notably, and in contrast to refs. 2,18, we use climate variables in their first-differenced form following ref. 3, implying a dependence of the growth rate on a change in climate variables. This choice means that a baseline specification without any lags constitutes a model prior of purely level effects, in which a permanent change in the climate has only an instantaneous effect on the growth rate3,19,21. By including lags, one can then test whether any effects may persist further. This is in contrast to the specification used by refs. 2,18, in which climate variables are used without taking the first difference, implying a dependence of the growth rate on the level of climate variables. In this alternative case, the baseline specification without any lags constitutes a model prior of pure growth effects, in which a change in climate has an infinitely persistent effect on the growth rate. Consequently, including further lags in this alternative case tests whether the initial growth impact is recovered18,19,21. Both of these specifications suffer from the limiting possibility that, if too few lags are included, one might falsely accept the model prior. The limitations of including a very large number of lags, including loss of data and increasing statistical uncertainty with an increasing number of parameters, mean that such a possibility is likely. By choosing a specification in which the model prior is one of level effects, our approach is therefore conservative by design, avoiding assumptions of infinite persistence of climate impacts on growth and instead providing a lower bound on this persistence based on what is observable empirically (see Methods section 'Empirical model specification: fixed-effects distributed lag models' for further exposition of this framework). The conservative nature of such a choice is probably the reason that ref. 19 finds much greater consistency between the impacts projected by models that use the first difference of climate variables, as opposed to their levels.
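
In schematic form, and using our own notation rather than the paper's (the Methods section cited above gives the authoritative specification), the two model priors can be written as fixed-effects distributed lag panel models with region effects \mu_i and year effects \gamma_t:

```latex
% First-difference specification (model prior of level effects):
\Delta y_{i,t} = \mu_i + \gamma_t + \sum_{l=0}^{L} \beta_l \, \Delta C_{i,t-l} + \varepsilon_{i,t}

% Levels specification (model prior of growth effects):
\Delta y_{i,t} = \mu_i + \gamma_t + \sum_{l=0}^{L} \tilde{\beta}_l \, C_{i,t-l} + \varepsilon_{i,t}
```

Here \Delta y_{i,t} is the growth rate of income per capita in region i and year t, and C is a climate variable. With L = 0, the first form implies that a permanent change in C moves the growth rate only in the year of the change (a level effect on income), whereas the second implies a permanently altered growth rate (a growth effect); adding lags then tests how far each impact persists beyond its prior.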

We begin our empirical analysis of the persistence of climate impacts on growth using ten lags of the first-differenced climate variables in fixed-effects distributed lag models. We detect substantial effects on economic growth at time lags of up to approximately 8–10 years for the temperature terms and up to approximately 4 years for the precipitation terms (Extended Data Fig. 1 and Extended Data Table 2). Furthermore, evaluation by means of information criteria indicates that the inclusion of all five climate variables and the use of these numbers of lags provide a preferable trade-off between best-fitting the data and including further terms that could cause overfitting, in comparison with model specifications excluding climate variables or including more or fewer lags (Extended Data Fig. 3, Supplementary Methods Section 1 and Supplementary Table 1). We therefore remove statistically insignificant terms at later lags (Supplementary Figs. 1–3 and Supplementary Tables 2–4). Further tests using Monte Carlo simulations demonstrate that the empirical models are robust to autocorrelation in the lagged climate variables (Supplementary Methods Section 2 and Supplementary Figs. 4 and 5), that information criteria provide an effective indicator for lag selection (Supplementary Methods Section 2 and Supplementary Fig. 6), that the results are robust to concerns of imperfect multicollinearity between climate variables and that including several climate variables is actually necessary to isolate their separate effects (Supplementary Methods Section 3 and Supplementary Fig. 7). We provide a further robustness check using a restricted distributed lag model to limit oscillations in the lagged parameter estimates that may result from autocorrelation, finding that it provides similar estimates of cumulative marginal effects to the unrestricted model (Supplementary Methods Section 4 and Supplementary Figs. 8 and 9). Finally, to explicitly account for any outstanding uncertainty arising from the precise choice of the number of lags, we include empirical models with marginally different numbers of lags in the error-sampling procedure of our projection of future damages. On the basis of the lag-selection procedure (the significance of lagged terms in Extended Data Fig. 1 and Extended Data Table 2, as well as information criteria in Extended Data Fig. 3), we sample from models with eight to ten lags for temperature and four for precipitation (models shown in Supplementary Figs. 1–3 and Supplementary Tables 2–4). In summary, this empirical approach to constrain the persistence of climate impacts on economic growth rates is conservative by design in avoiding assumptions of infinite persistence, but nevertheless provides a lower bound on the extent of impact persistence that is robust to the numerous tests outlined above.
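
The following is a minimal sketch, on synthetic data, of the general technique described here: a fixed-effects distributed lag regression on first-differenced climate variables, with an information criterion (BIC) used to select the lag length. The variable names, the data-generating process and the use of statsmodels are our own assumptions; the paper's actual estimation is considerably more involved.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Sketch of a fixed-effects distributed lag model on synthetic panel
# data, with BIC used to choose the lag length. The data-generating
# process (growth responds to lags 0-2 of the temperature change) and
# all names are invented for illustration.

rng = np.random.default_rng(0)
rows = [{"region": r, "year": t, "temp": 14 + 0.03 * t + rng.normal(0, 1)}
        for r in range(50) for t in range(40)]
df = pd.DataFrame(rows)
df["d_temp"] = df.groupby("region")["temp"].diff()   # first difference
for l in range(6):                                   # lagged regressors
    df[f"d_temp_l{l}"] = df.groupby("region")["d_temp"].shift(l)

true_betas = {0: -0.5, 1: -0.3, 2: -0.2}
df["growth"] = sum(b * df[f"d_temp_l{l}"] for l, b in true_betas.items())
df["growth"] += rng.normal(0, 0.5, len(df))
df = df.dropna()   # common sample across all candidate lag lengths

for n_lags in range(1, 6):
    terms = " + ".join(f"d_temp_l{l}" for l in range(n_lags + 1))
    # Region and year fixed effects absorb confounders, as in the text.
    fit = smf.ols(f"growth ~ {terms} + C(region) + C(year)", data=df).fit()
    print(f"lags={n_lags}  BIC={fit.bic:.0f}")   # should bottom out near 2
```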

Committed damages until mid-century

We combine these empirical economic response functions (Supplementary Figs. 1–3 and Supplementary Tables 2–4) with an ensemble of 21 climate models (see Supplementary Table 5) from the Coupled Model Intercomparison Project Phase 6 (CMIP-6)22 to project the macroeconomic damages from these components of physical climate change (see Methods for further details). Bias-adjusted climate models that provide a highly accurate reproduction of observed climatological patterns with limited uncertainty (Supplementary Table 6) are used to avoid introducing biases in the projections. Following a well-developed literature2,3,19, these projections do not aim to provide a prediction of future economic growth. Instead, they are a projection of the exogenous impact of future climate conditions on the economy relative to the baselines specified by socio-economic projections, based on the plausibly causal relationships inferred by the empirical models and assuming ceteris paribus. Other exogenous factors relevant for the prediction of economic output are purposefully assumed constant.

A Monte Carlo procedure that samples from climate model projections, empirical models with different numbers of lags and model parameter estimates (obtained by 1,000 block-bootstrap resamples of each of the regressions in Supplementary Figs. 1–3 and Supplementary Tables 2–4) is used to estimate the combined uncertainty from these sources. Given these uncertainty distributions, we find that projected global damages are statistically indistinguishable across the two most extreme emission scenarios until 2049 (at the 5% significance level; Fig. 1). As such, the climate damages occurring before this time constitute those to which the world is already committed owing to the combination of past emissions and the range of future emission scenarios that are considered socio-economically plausible15. These committed damages comprise a permanent income reduction of 19% on average globally (population-weighted average) in comparison with a baseline without climate-change impacts (with a likely range of 11–29%, following the likelihood classification adopted by the Intergovernmental Panel on Climate Change (IPCC); see caption of Fig. 1). Even though levels of income per capita generally still increase relative to those of today, this constitutes a permanent income reduction for most regions, including North America and Europe (each with median income reductions of approximately 11%) and with South Asia and Africa being the most strongly affected (each with median income reductions of approximately 22%; Fig. 1). Under a middle-of-the-road scenario of future income development (SSP2, in which SSP stands for Shared Socio-economic Pathway), this corresponds to global annual damages in 2049 of 38 trillion in 2005 international dollars (likely range of 19–59 trillion 2005 international dollars). Compared with empirical specifications that assume pure growth or pure level effects, our preferred specification that provides a robust lower bound on the extent of climate impact persistence produces damages between these two extreme assumptions (Extended Data Fig. 3).
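
The block-bootstrap step can be illustrated schematically: resample whole regions (the blocks) with replacement, re-estimate the response on each resample, and read uncertainty bands off the resulting distribution. This is a generic sketch of the technique, not the paper's exact procedure; it assumes a panel DataFrame like the synthetic one built in the previous sketch.

```python
import numpy as np
import pandas as pd

# Schematic block bootstrap: resample whole regions (the blocks) with
# replacement, re-estimate the response, and collect the distribution.
# Assumes a panel DataFrame `df` with columns region, d_temp and growth,
# e.g. the synthetic panel from the previous sketch.

def estimate(sample: pd.DataFrame) -> float:
    # Stand-in for the full regression: slope of growth on d_temp.
    return float(np.polyfit(sample["d_temp"], sample["growth"], 1)[0])

def block_bootstrap(df: pd.DataFrame, n_boot: int = 1000, seed: int = 0):
    rng = np.random.default_rng(seed)
    groups = {r: g for r, g in df.groupby("region")}
    regions = list(groups)
    draws = []
    for _ in range(n_boot):
        picked = rng.choice(regions, size=len(regions), replace=True)
        draws.append(estimate(pd.concat([groups[r] for r in picked])))
    return np.percentile(draws, [5, 50, 95])   # e.g. a 90% band

# low, median, high = block_bootstrap(df)
```

Resampling whole regions rather than individual region-years preserves the within-region serial dependence that an ordinary bootstrap would destroy.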

Figure 1: Estimates of the projected reduction in income per capita from changes in all climate variables based on empirical models of climate impacts on economic output with a robust lower bound on their persistence (Extended Data Fig. 1) under a low-emission scenario compatible with the 2 °C warming target and a high-emission scenario (SSP2-RCP2.6 and SSP5-RCP8.5, respectively) are shown in purple and orange, respectively. Shading represents the 34% and 10% confidence intervals reflecting the likely and very likely ranges, respectively (following the likelihood classification adopted by the IPCC), having estimated uncertainty from a Monte Carlo procedure, which samples the uncertainty from the choice of physical climate models, empirical models with different numbers of lags and bootstrapped estimates of the regression parameters shown in Supplementary Figs. 1–3. Vertical dashed lines show the time at which the climate damages of the two emission scenarios diverge at the 5% and 1% significance levels based on the distribution of differences between emission scenarios arising from the uncertainty sampling discussed above. Note that uncertainty in the difference of the two scenarios is smaller than the combined uncertainty of the two respective scenarios because samples of the uncertainty (climate model and empirical model choice, as well as model parameter bootstrap) are consistent across the two emission scenarios, hence the divergence of damages occurs while the uncertainty bounds of the two separate damage scenarios still overlap. Estimates of global mitigation costs from the three IAMs that provide results for the SSP2 baseline and SSP2-RCP2.6 scenario are shown in light green in the top panel, with the median of these estimates shown in bold.

Damages already outweigh mitigation costs

We compare the damages to which the world is committed over the next 25 years to estimates of the mitigation costs required to achieve the Paris Climate Agreement. Taking estimates of mitigation costs from the three integrated assessment models (IAMs) in the IPCC AR6 database 23 that provide results under comparable scenarios (SSP2 baseline and SSP2-RCP2.6, in which RCP stands for Representative Concentration Pathway), we find that the median committed climate damages are larger than the median mitigation costs in 2050 (six trillion in 2005 international dollars) by a factor of approximately six (note that estimates of mitigation costs are only provided every 10 years by the IAMs and so a comparison in 2049 is not possible). This comparison simply sets the magnitude of future damages against mitigation costs, rather than conducting a formal cost–benefit analysis of transitioning from one emission path to another. Formal cost–benefit analyses typically find that the net benefits of mitigation only emerge after 2050 (ref. 5), which may lead some to conclude that physical damages from climate change are simply not large enough to outweigh mitigation costs until the second half of the century. Our simple comparison of their magnitudes makes clear that damages are actually already considerably larger than mitigation costs and that the delayed emergence of net mitigation benefits results primarily from the fact that damages across different emission paths are indistinguishable until mid-century (Fig. 1).

Although these near-term damages constitute those to which the world is already committed, we note that damage estimates diverge strongly across emission scenarios after 2049, conveying the clear benefits of mitigation from a purely economic point of view that have been emphasized in previous studies 4 , 24 . As well as the uncertainties assessed in Fig. 1 , these conclusions are robust to structural choices, such as the timescale with which changes in the moderating variables of the empirical models are estimated (Supplementary Figs. 10 and 11 ), as well as the order in which one accounts for the intertemporal and international components of currency comparison (Supplementary Fig. 12 ; see Methods for further details).

Damages from variability and extremes

Committed damages primarily arise through changes in average temperature (Fig. 2). This reflects the fact that projected changes in average temperature are larger than those in other climate variables when expressed as a function of their historical interannual variability (Extended Data Fig. 4). Because the historical variability is that on which the empirical models are estimated, larger projected changes in comparison with this variability probably lead to larger future impacts in a purely statistical sense. From a mechanistic perspective, one may plausibly interpret this result as implying that future changes in average temperature are the most unprecedented from the perspective of the historical fluctuations to which the economy is accustomed and will therefore cause the most damage. This insight may prove useful in guiding adaptation measures towards the sources of greatest damage.

Figure 2

Estimates of the median projected reduction in sub-national income per capita across emission scenarios (SSP2-RCP2.6 and SSP5-RCP8.5), as well as across climate model, empirical model and model parameter uncertainty, in the year in which climate damages diverge at the 5% level (2049, as identified in Fig. 1). a, Impacts arising from all climate variables. b–f, Impacts arising separately from changes in annual mean temperature (b), daily temperature variability (c), total annual precipitation (d), the annual number of wet days (>1 mm) (e) and extreme daily rainfall (f) (see Methods for further definitions). Data on national administrative boundaries are obtained from the GADM database version 3.6 and are freely available for academic use ( https://gadm.org/ ).

Nevertheless, future damages based on empirical models that consider changes in annual average temperature only and exclude the other climate variables constitute income reductions of only 13% in 2049 (Extended Data Fig. 5a , likely range 5–21%). This suggests that accounting for the other components of the distribution of temperature and precipitation raises net damages by nearly 50%. This increase arises through the further damages that these climatic components cause, but also because their inclusion reveals a stronger negative economic response to average temperatures (Extended Data Fig. 5b ). The latter finding is consistent with our Monte Carlo simulations, which suggest that the magnitude of the effect of average temperature on economic growth is underestimated unless accounting for the impacts of other correlated climate variables (Supplementary Fig. 7 ).

In terms of the relative contributions of the different climatic components to overall damages, we find that accounting for daily temperature variability causes the largest increase in overall damages relative to empirical frameworks that only consider changes in annual average temperature (4.9 percentage points, likely range 2.4–8.7 percentage points, equivalent to approximately 10 trillion international dollars). Accounting for precipitation causes smaller increases in overall damages, which are nevertheless equivalent to approximately 1.2 trillion international dollars: 0.01 percentage points (likely range −0.37 to 0.33 percentage points), 0.34 percentage points (likely range 0.07–0.90 percentage points) and 0.36 percentage points (likely range 0.13–0.65 percentage points) from total annual precipitation, the number of wet days and extreme daily precipitation, respectively. Moreover, climate models seem to underestimate future changes in temperature variability 25 and extreme precipitation 26 , 27 in response to anthropogenic forcing as compared with that observed historically, suggesting that the true impacts from these variables may be larger.

The distribution of committed damages

The spatial distribution of committed damages (Fig. 2a ) reflects a complex interplay between the patterns of future change in several climatic components and those of historical economic vulnerability to changes in those variables. Damages resulting from increasing annual mean temperature (Fig. 2b ) are negative almost everywhere globally, and larger at lower latitudes in regions in which temperatures are already higher and economic vulnerability to temperature increases is greatest (see the response heterogeneity to mean temperature embodied in Extended Data Fig. 1a ). This occurs despite the amplified warming projected at higher latitudes 28 , suggesting that regional heterogeneity in economic vulnerability to temperature changes outweighs heterogeneity in the magnitude of future warming (Supplementary Fig. 13a ). Economic damages owing to daily temperature variability (Fig. 2c ) exhibit a strong latitudinal polarization, primarily reflecting the physical response of daily variability to greenhouse forcing in which increases in variability across lower latitudes (and Europe) contrast with decreases at high latitudes 25 (Supplementary Fig. 13b ). These two temperature terms are the dominant determinants of the pattern of overall damages (Fig. 2a ), which exhibits a strong polarity with damages across most of the globe except at the highest northern latitudes. Future changes in total annual precipitation mainly bring economic benefits except in regions of drying, such as the Mediterranean and central South America (Fig. 2d and Supplementary Fig. 13c ), but these benefits are opposed by changes in the number of wet days, which produce damages with a similar pattern of opposite sign (Fig. 2e and Supplementary Fig. 13d ). By contrast, changes in extreme daily rainfall produce damages in all regions, reflecting the intensification of daily rainfall extremes over global land areas 29 , 30 (Fig. 2f and Supplementary Fig. 13e ).

The spatial distribution of committed damages implies considerable injustice along two dimensions: culpability for the historical emissions that have caused climate change and pre-existing levels of socio-economic welfare. Spearman’s rank correlations indicate that committed damages are significantly larger in countries with smaller historical cumulative emissions, as well as in regions with lower current income per capita (Fig. 3 ). This implies that those countries that will suffer the most from the damages already committed are those that are least responsible for climate change and which also have the least resources to adapt to it.

Figure 3

Estimates of the median projected change in national income per capita across emission scenarios (RCP2.6 and RCP8.5), as well as across climate model, empirical model and model parameter uncertainty, in the year in which climate damages diverge at the 5% level (2049, as identified in Fig. 1 ) are plotted against cumulative national emissions per capita in 2020 (from the Global Carbon Project) and coloured by national income per capita in 2020 (from the World Bank) in a , and vice versa in b . In each panel, the size of each scatter point is weighted by the national population in 2020 (from the World Bank). Inset numbers indicate the Spearman's rank correlation ρ and P values for tests of the null hypothesis of no correlation, as well as the Spearman's rank correlation weighted by national population.

To further quantify this heterogeneity, we assess the difference in committed damages between the upper and lower quartiles of regions when ranked by present income levels and historical cumulative emissions (using a population weighting to both define the quartiles and estimate the group averages). On average, the quartile of countries with the lowest incomes is committed to an income loss that is 8.9 percentage points (or 61%) greater than that of the upper quartile (Extended Data Fig. 6 ), with a likely range of 3.8–14.7 percentage points across the uncertainty sampling of our damage projections (following the likelihood classification adopted by the IPCC). Similarly, the quartile of countries with the lowest historical cumulative emissions is committed to an income loss that is 6.9 percentage points (or 40%) greater than that of the upper quartile, with a likely range of 0.27–12 percentage points. These patterns reemphasize the prevalence of injustice in climate impacts 31 , 32 , 33 in the context of the damages to which the world is already committed by historical emissions and socio-economic inertia.

Contextualizing the magnitude of damages

The magnitude of projected economic damages exceeds previous estimates in the literature 2 , 3 , owing to several developments on previous approaches. Our estimates are larger than those of ref. 2 (see first row of Extended Data Table 3 ), primarily because sub-national estimates typically show a steeper temperature response (see also refs. 3 , 34 ) and because accounting for other climatic components raises damage estimates (Extended Data Fig. 5 ). However, we note that our empirical approach using first-differenced climate variables is conservative compared with that of ref. 2 in regard to the persistence of climate impacts on growth (see introduction and Methods section 'Empirical model specification: fixed-effects distributed lag models'), an important determinant of the magnitude of long-term damages 19 , 21 . Using a similar empirical specification to ref. 2 , which assumes infinite persistence while maintaining the rest of our approach (sub-national data and further climate variables), produces considerably larger damages (purple curve of Extended Data Fig. 3 ). Compared with studies that do take the first difference of climate variables 3 , 35 , our estimates are also larger (see second and third rows of Extended Data Table 3 ). The inclusion of further climate variables (Extended Data Fig. 5 ) and of a sufficient number of lags to more adequately capture the extent of impact persistence (Extended Data Figs. 1 and 2 ) are the main sources of this difference, as is the use of specifications that capture nonlinearities in the temperature response when compared with ref. 35 . In summary, our estimates develop on previous studies by incorporating the latest data and empirical insights 7 , 8 , as well as by providing a robust empirical lower bound on the persistence of impacts on economic growth, which constitutes a middle ground between the extremes of the growth-versus-levels debate 19 , 21 (Extended Data Fig. 3 ).

Compared with the fraction of variance explained by the empirical models historically (<5%), the projected reduction in income of 19% may seem large. This arises because projected changes in climatic conditions are much larger than those experienced historically, particularly for changes in average temperature (Extended Data Fig. 4 ). As such, any assessment of future climate-change impacts necessarily requires an extrapolation outside the range of the historical data on which the empirical impact models were estimated. Nevertheless, these models constitute the state of the art for inferring plausibly causal climate impacts from observed data. Moreover, we take explicit steps to limit out-of-sample extrapolation by capping the moderating variables of the interaction terms at the 95th percentile of the historical distribution (see Methods ). This avoids extrapolating the marginal effects outside what was observed historically. Given the nonlinear response of economic output to annual mean temperature (Extended Data Fig. 1 and Extended Data Table 2 ), this is a conservative choice that limits the magnitude of damages that we project. Furthermore, back-of-the-envelope calculations indicate that the projected damages are consistent with the magnitude and patterns of historical economic development (see Supplementary Discussion Section 5 ).

Missing impacts and spatial spillovers

Despite assessing several climatic components from which economic impacts have recently been identified 3 , 7 , 8 , this assessment of aggregate climate damages should not be considered comprehensive. Important channels such as impacts from heatwaves 31 , sea-level rise 36 , tropical cyclones 37 and tipping points 38 , 39 , as well as non-market damages such as those to ecosystems 40 and human health 41 , are not considered in these estimates. Sea-level rise is unlikely to be feasibly incorporated into empirical assessments such as this one because historical sea-level variability is mostly small. Non-market damages are inherently intractable within our estimates of impacts on aggregate monetary output, and estimates of these impacts could arguably be considered additional to those identified here. Recent empirical work suggests that accounting for these channels would probably raise estimates of these committed damages, with larger damages continuing to arise in the global south 31 , 36 , 37 , 38 , 39 , 40 , 41 , 42 .

Moreover, our main empirical analysis does not explicitly evaluate the potential for impacts in local regions to produce effects that 'spill over' into other regions. Such effects may further mitigate or amplify the impacts we estimate, for example, if companies relocate production from one affected region to another or if impacts propagate along supply chains. The current literature indicates that trade plays a substantial role in propagating spillover effects 43 , 44 , making their assessment at the sub-national level challenging without available data on sub-national trade dependencies. Studies accounting for only spatially adjacent neighbours indicate that negative impacts in one region induce further negative impacts in neighbouring regions 45 , 46 , 47 , 48 , suggesting that our projected damages are probably conservative by excluding these effects. In Supplementary Fig. 14 , we assess spillovers from neighbouring regions using a spatial-lag model. For simplicity, this analysis excludes temporal lags, focusing only on contemporaneous effects. The results show that accounting for spatial spillovers can amplify the overall magnitude, and also the heterogeneity, of impacts. Consistent with previous literature, this indicates that the overall magnitude (Fig. 1 ) and heterogeneity (Fig. 3 ) of damages that we project in our main specification may be conservative without explicitly accounting for spillovers. We note that further analysis addressing both spatially and trade-connected spillovers, while also accounting for delayed impacts using temporal lags, would be necessary to address this question fully. These approaches offer fruitful avenues for further research but are beyond the scope of this manuscript, which primarily aims to explore the impacts of different climate conditions and their persistence.

Policy implications

We find that the economic damages resulting from climate change until 2049 are those to which the world economy is already committed and that these greatly outweigh the costs required to mitigate emissions in line with the 2 °C target of the Paris Climate Agreement (Fig. 1 ). This assessment is complementary to formal analyses of the net costs and benefits associated with moving from one emission path to another, which typically find that net benefits of mitigation only emerge in the second half of the century 5 . Our simple comparison of the magnitude of damages and mitigation costs makes clear that this is primarily because damages are indistinguishable across emissions scenarios—that is, committed—until mid-century (Fig. 1 ) and that they are actually already much larger than mitigation costs. For simplicity, and owing to the availability of data, we compare damages to mitigation costs at the global level. Regional estimates of mitigation costs may shed further light on the national incentives for mitigation at which our results already hint, of relevance for international climate policy. Although these damages are committed from a mitigation perspective, adaptation may provide an opportunity to reduce them. Moreover, the strong divergence of damages after mid-century reemphasizes the clear benefits of mitigation from a purely economic perspective, as highlighted in previous studies 1 , 4 , 6 , 24 .

Historical climate data

Historical daily 2-m temperature and precipitation totals (in mm) are obtained for the period 1979–2019 from the W5E5 database. The W5E5 dataset is based on ERA-5, a state-of-the-art reanalysis of historical observations, but has been bias-adjusted by applying version 2.0 of the WATCH Forcing Data methodology to the ERA-5 reanalysis, together with precipitation data from version 2.3 of the Global Precipitation Climatology Project, to better reflect ground-based measurements 49 , 50 , 51 . We obtain these data on a 0.5° × 0.5° grid from the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) database. Notably, these historical data have been used to bias-adjust future climate projections from CMIP-6 (see the following section), ensuring consistency between the distribution of historical daily weather on which our empirical models were estimated and the climate projections used to estimate future damages. These data are publicly available from the ISIMIP database. See refs. 7 , 8 for robustness tests of the empirical models to the choice of climate data reanalysis products.

Future climate data

Daily 2-m temperature and precipitation totals (in mm) are taken from 21 climate models participating in CMIP-6 under a high (RCP8.5) and a low (RCP2.6) greenhouse gas emission scenario from 2015 to 2100. The data have been bias-adjusted and statistically downscaled to a common half-degree grid to reflect the historical distribution of daily temperature and precipitation of the W5E5 dataset using the trend-preserving method developed by the ISIMIP 50 , 52 . As such, the climate model data reproduce observed climatological patterns exceptionally well (Supplementary Table 5 ). Gridded data are publicly available from the ISIMIP database.

Historical economic data

Historical economic data come from the DOSE database of sub-national economic output 53 . We use a recent revision to the DOSE dataset that provides data for 1,660 sub-national regions across 83 countries, with varying temporal coverage from 1960 to 2019. Sub-national units constitute the first administrative division below the national level, for example, states in the USA and provinces in China. Data come from measures of gross regional product per capita (GRPpc) or income per capita in local currencies, reflecting the values reported by national statistical agencies, statistical yearbooks and, in some cases, the academic literature. We follow previous literature 3 , 7 , 8 , 54 and assess real sub-national output per capita by first converting values from local currencies to US dollars, to account for diverging national inflationary tendencies, and then accounting for US inflation using a US deflator. Alternatively, one might first account for national inflation and then convert between currencies. Supplementary Fig. 12 demonstrates that our conclusions are consistent when accounting for price changes in the reversed order, although the magnitude of estimated damages varies. See the documentation of the DOSE dataset for further discussion of these choices. Conversions between currencies are conducted using exchange rates from the FRED database of the Federal Reserve Bank of St. Louis 55 and the national deflators from the World Bank 56 .
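To illustrate the order of operations described above, a minimal sketch in R with hypothetical column names (grp_lc, fx and us_defl are illustrative labels, not fields of the DOSE dataset):

```r
# grp_lc: nominal GRPpc in local currency; fx: local currency units per US
# dollar (from FRED); us_defl: US GDP deflator (World Bank), rebased so 2005 = 1.
df$grp_usd  <- df$grp_lc / df$fx        # first exchange into nominal US dollars ...
df$grp_2005 <- df$grp_usd / df$us_defl  # ... then deflate to real 2005 US dollars

# Supplementary Fig. 12 assesses the reversed order: deflate with the national
# deflator first, then convert with a fixed base-year exchange rate.
```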

Future socio-economic data

Baseline gridded gross domestic product (GDP) and population data for the period 2015–2100 are taken from the middle-of-the-road scenario SSP2 (ref. 15 ). Population data have been downscaled to a half-degree grid by the ISIMIP following the methodologies of refs. 57 , 58 , which we then aggregate to the sub-national level of our economic data using the spatial aggregation procedure described below. Because current methodologies for downscaling SSP GDP rely on downscaled population, per-capita estimates of GDP with a realistic distribution at the sub-national level are not readily available for the SSPs. We therefore use national-level GDP per capita (GDPpc) projections for all sub-national regions of a given country, assuming homogeneity within countries in terms of baseline GDPpc. Here we use projections that have been updated to account for the impact of the COVID-19 pandemic on the trajectory of future income, while remaining consistent with the long-term development of the SSPs 59 . The choice of baseline SSP alters the magnitude of projected climate damages in monetary terms, but when assessed in terms of percentage change from the baseline, the choice of socio-economic scenario is inconsequential. Gridded SSP population data and national-level GDPpc data are publicly available from the ISIMIP database. Sub-national estimates as used in this study are available in the code and data replication files.

Climate variables

Following recent literature 3 , 7 , 8 , we calculate an array of climate variables for which substantial impacts on macroeconomic output have been identified empirically, supported by further evidence at the micro level for plausible underlying mechanisms. See refs. 7 , 8 for an extensive motivation for the use of these particular climate variables and for detailed empirical tests on the nature and robustness of their effects on economic output. To summarize, these studies have found evidence for independent impacts on economic growth rates from annual average temperature, daily temperature variability, total annual precipitation, the annual number of wet days and extreme daily rainfall. Assessments of daily temperature variability were motivated by evidence of impacts on agricultural output and human health, as well as macroeconomic literature on the impacts of volatility on growth when manifest in different dimensions, such as government spending, exchange rates and even output itself 7 . Assessments of precipitation impacts were motivated by evidence of impacts on agricultural productivity, metropolitan labour outcomes and conflict, as well as damages caused by flash flooding 8 . See Extended Data Table 1 for detailed references to empirical studies of these physical mechanisms. Marked impacts of daily temperature variability, total annual precipitation, the number of wet days and extreme daily rainfall on macroeconomic output were identified robustly across different climate datasets, spatial aggregation schemes, specifications of regional time trends and error-clustering approaches. They were also found to be robust to the consideration of temperature extremes 7 , 8 . Furthermore, these climate variables were identified as having independent effects on economic output 7 , 8 . We verify this further here using Monte Carlo simulations that demonstrate the robustness of the results to concerns of imperfect multicollinearity between climate variables (Supplementary Methods Section 2 ), and by using information criteria (Supplementary Table 1 ) to demonstrate that including several lagged climate variables provides a preferable trade-off between optimally describing the data and limiting the possibility of overfitting.

We calculate these variables from the distribution of daily temperature, \(T_{x,d}\), and precipitation, \(P_{x,d}\), at the grid-cell level, \(x\), for both the historical and future climate data. As well as annual mean temperature, \({\bar{T}}_{x,y}\), and annual total precipitation, \(P_{x,y}\), we calculate annual, \(y\), measures of daily temperature variability, \({\widetilde{T}}_{x,y}\):

$$\widetilde{T}_{x,y}=\frac{1}{12}\sum_{m=1}^{12}\sqrt{\frac{1}{D_m}\sum_{d=1}^{D_m}\left(T_{x,d,m,y}-\bar{T}_{x,m,y}\right)^{2}}\qquad(1)$$

the number of wet days, \(\mathrm{Pwd}_{x,y}\):

$$\mathrm{Pwd}_{x,y}=\sum_{d=1}^{D_y}H\left(P_{x,d}-1\,\mathrm{mm}\right)\qquad(2)$$

and extreme daily rainfall:

$$\mathrm{Pext}_{x,y}=\sum_{d=1}^{D_y}H\left(P_{x,d}-P99.9_{x}\right)P_{x,d}\qquad(3)$$

in which \(T_{x,d,m,y}\) is the grid-cell-specific daily temperature in month \(m\) and year \(y\), \({\bar{T}}_{x,m,y}\) is the year- and grid-cell-specific monthly mean temperature, \(D_m\) and \(D_y\) are the number of days in a given month \(m\) or year \(y\), respectively, \(H\) is the Heaviside step function, 1 mm is the threshold used to define wet days and \(P99.9_{x}\) is the 99.9th percentile of historical (1979–2019) daily precipitation at the grid-cell level. Units of the climate measures are degrees Celsius for annual mean temperature and daily temperature variability, millimetres for total annual precipitation and extreme daily precipitation, and simply the number of days for the annual number of wet days.
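For a single grid cell, these annual measures can be computed from daily series along the following lines (a sketch only; the vector names are hypothetical, and sd() uses the n − 1 denominator whereas equation ( 1 ) uses n, a negligible difference here):

```r
# temp, precip: daily 2-m temperature (deg C) and precipitation (mm) for one
# grid cell; month, year: integer indices of the same length; p999: the cell's
# 99.9th percentile of daily precipitation over 1979-2019,
# for example p999 <- quantile(precip_hist, 0.999).
annual_measures <- function(temp, precip, month, year, p999) {
  data.frame(
    year   = sort(unique(year)),
    Tbar   = tapply(temp, year, mean),                      # annual mean temperature
    Ttilde = tapply(seq_along(temp), year, function(i) {    # daily variability: mean over
      mean(tapply(temp[i], month[i], sd))                   # months of the within-month
    }),                                                     # standard deviation of days
    P      = tapply(precip, year, sum),                     # total annual precipitation
    Pwd    = tapply(precip > 1, year, sum),                 # number of wet days (>1 mm)
    Pext   = tapply(precip * (precip > p999), year, sum)    # rainfall on days above the
  )                                                         # 99.9th percentile
}
```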

We also calculated weighted standard deviations of monthly rainfall totals, as used in ref. 8 , but do not include them in our projections because, when accounting for delayed effects, their effect becomes statistically indistinguishable from zero and is better captured by changes in total annual rainfall.

Spatial aggregation

We aggregate grid-cell-level historical and future climate measures, as well as grid-cell-level future GDPpc and population, to the level of the first administrative unit below national level of the GADM database, using an area-weighting algorithm that estimates the portion of each grid cell falling within an administrative boundary. We use this as our baseline specification following previous findings that the effect of area or population weighting at the sub-national level is negligible 7 , 8 .
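One way to implement such area weighting is with the exactextractr package, which computes the fraction of each raster cell covered by a polygon; this sketch is an illustration (with hypothetical file names) rather than the authors' implementation:

```r
library(terra)          # gridded climate data
library(sf)             # administrative boundaries
library(exactextractr)  # coverage-fraction-weighted extraction

clim   <- rast("climate_measure_0p5deg.tif")  # hypothetical half-degree grid
admin1 <- st_read("gadm36_level1.gpkg")       # hypothetical GADM level-1 file

# 'mean' weights each grid cell by the portion of it inside each polygon
admin1$clim_mean <- exact_extract(clim, admin1, "mean")
```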

Empirical model specification: fixed-effects distributed lag models

Following a wide range of climate econometric literature 16 , 60 , we use panel regression models with a selection of fixed effects and time trends to isolate plausibly exogenous variation with which to maximize confidence in a causal interpretation of the effects of climate on economic growth rates. The use of region fixed effects, μ r , accounts for unobserved time-invariant differences between regions, such as prevailing climatic norms and growth rates owing to historical and geopolitical factors. The use of yearly fixed effects, η y , accounts for regionally invariant annual shocks to the global climate or economy such as the El Niño–Southern Oscillation or global recessions. In our baseline specification, we also include region-specific linear time trends, k r y , to exclude the possibility of spurious correlations resulting from common slow-moving trends in climate and growth.

The persistence of climate impacts on economic growth rates is a key determinant of the long-term magnitude of damages. Methods for inferring the extent of persistence in impacts on growth rates have typically used lagged climate variables to evaluate the presence of delayed effects or catch-up dynamics 2 , 18 . For example, consider starting from a model in which a climate condition, \(C_{r,y}\) (for example, annual mean temperature), affects the growth rate, \(\Delta\mathrm{lgrp}_{r,y}\) (the first difference of the logarithm of gross regional product), of region \(r\) in year \(y\):

$$\Delta\mathrm{lgrp}_{r,y}=\mu_r+\eta_y+\alpha\,C_{r,y}+\varepsilon_{r,y}\qquad(4)$$

which we refer to as a 'pure growth effects' model in the main text. Typically, further lags are included,

$$\Delta\mathrm{lgrp}_{r,y}=\mu_r+\eta_y+\sum_{L=0}^{\mathrm{NL}}\alpha_L\,C_{r,y-L}+\varepsilon_{r,y}\qquad(5)$$

and the cumulative effect of all lagged terms is evaluated to assess the extent to which climate impacts on growth rates persist. Following ref. 18 , in the case that

$$\sum_{L=0}^{\mathrm{NL}}\alpha_L\neq 0\qquad(6)$$

the implication is that impacts on the growth rate persist up to NL years after the initial shock (possibly to a weaker or a stronger extent), whereas if

$$\sum_{L=0}^{\mathrm{NL}}\alpha_L=0\qquad(7)$$

then the initial impact on the growth rate is recovered after NL years and the effect is only one on the level of output. However, we note that such approaches are limited by the fact that, when including an insufficient number of lags to detect a recovery of the growth rates, one may find equation ( 6 ) to be satisfied and incorrectly assume that a change in climatic conditions affects the growth rate indefinitely. In practice, given a limited record of historical data, it is likely that too few lags are included to detect such a recovery, leading to an incorrect conclusion of infinitely persistent impacts on the growth rate, particularly over the long timescales over which future climate damages are often projected 2 , 24 . To avoid this issue, we instead begin our analysis with a model in which the level of output, \(\mathrm{lgrp}_{r,y}\), depends on the level of a climate variable, \(\mathrm{lgrp}_{r,y}=\mu_r+\eta_y+\alpha\,C_{r,y}+\varepsilon_{r,y}\). Given the non-stationarity of the level of output, we follow the literature 19 and estimate such an equation in first-differenced form as

$$\Delta\mathrm{lgrp}_{r,y}=\mu_r+\eta_y+\alpha\,\Delta C_{r,y}+\varepsilon_{r,y}\qquad(8)$$

which we refer to as a model of 'pure level effects' in the main text. This model constitutes a baseline specification in which a permanent change in the climate variable produces an instantaneous impact on the growth rate and a permanent effect only on the level of output. By including lagged variables in this specification,

$$\Delta\mathrm{lgrp}_{r,y}=\mu_r+\eta_y+\sum_{L=0}^{\mathrm{NL}}\alpha_L\,\Delta C_{r,y-L}+\varepsilon_{r,y}\qquad(9)$$

we are able to test whether the impacts on the growth rate persist any further than instantaneously by evaluating whether the coefficients \(\alpha_L\) for \(L>0\) are statistically significantly different from zero. Even though this framework is also limited by the possibility of including too few lags, the choice of a baseline model specification in which impacts on the growth rate do not persist means that, when too few lags are included, the framework reverts to the baseline specification of level effects. As such, this framework is conservative with respect to the persistence of impacts and the magnitude of future damages. It naturally avoids assumptions of infinite persistence and we are able to interpret any persistence that we identify with equation ( 9 ) as a lower bound on the extent of climate impact persistence on growth rates. See the main text for further discussion of this specification choice, in particular about its conservative nature compared with previous literature estimates, such as refs. 2 , 18 .

We allow the response to climatic changes to vary across regions, using interactions of the climate variables with historical average (1979–2019) climatic conditions, reflecting the heterogeneous effects identified in previous work 7 , 8 . Following this previous work, the moderating variable of each interaction term is the historical average of the variable itself, except in the cases of daily temperature variability and extreme daily rainfall, for which it is the historical seasonal temperature difference, \({\hat{T}}_{r}\) (ref. 7 ), and the historical annual mean temperature, \({\bar{T}}_{r}\) (ref. 8 ), respectively.

The resulting regression equation, with \(N\) and \(M\) lagged variables for the temperature and precipitation variables, respectively, reads:

$$\begin{aligned}\Delta\mathrm{lgrp}_{r,y}=\;&\mu_r+\eta_y+k_r\,y\\&+\sum_{L=0}^{N}\Big[\alpha_{1,L}\,\Delta\bar{T}_{r,y-L}+\alpha_{2,L}\,\Delta\bar{T}_{r,y-L}\cdot\bar{T}_{r}+\alpha_{3,L}\,\Delta\widetilde{T}_{r,y-L}+\alpha_{4,L}\,\Delta\widetilde{T}_{r,y-L}\cdot\hat{T}_{r}\Big]\\&+\sum_{L=0}^{M}\Big[\alpha_{5,L}\,\Delta P_{r,y-L}+\alpha_{6,L}\,\Delta P_{r,y-L}\cdot\bar{P}_{r}+\alpha_{7,L}\,\Delta\mathrm{Pwd}_{r,y-L}+\alpha_{8,L}\,\Delta\mathrm{Pwd}_{r,y-L}\cdot\overline{\mathrm{Pwd}}_{r}\\&\qquad\;+\alpha_{9,L}\,\Delta\mathrm{Pext}_{r,y-L}+\alpha_{10,L}\,\Delta\mathrm{Pext}_{r,y-L}\cdot\bar{T}_{r}\Big]+\varepsilon_{r,y}\end{aligned}\qquad(10)$$

in which \(\Delta\mathrm{lgrp}_{r,y}\) is the annual, regional GRPpc growth rate, measured as the first difference of the logarithm of real GRPpc, following previous work 2 , 3 , 7 , 8 , 18 , 19 . Fixed-effects regressions were run using the fixest package in R (ref. 61 ).
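A schematic of how such a specification can be estimated with fixest is given below; the variable names are illustrative, l() is fixest's lag operator and region[year] adds region-specific linear trends alongside the region fixed effects. Only the two temperature terms are written out; the precipitation terms enter analogously.

```r
library(fixest)

# panel: one row per region-year with the growth rate (dlgrp), first-differenced
# climate variables and time-invariant historical moderators (names illustrative).
panel$dTbar_int <- panel$dTbar * panel$Tbar_hist   # interactions precomputed, as the
panel$dTvar_int <- panel$dTvar * panel$Tseas_hist  # moderators are constant in time

mod <- feols(
  dlgrp ~ l(dTbar, 0:8) + l(dTbar_int, 0:8) +  # annual mean temperature terms
          l(dTvar, 0:8) + l(dTvar_int, 0:8)    # daily variability terms
        | region[year] + year,                 # region FE + region trend, year FE
  panel.id = ~ region + year,
  cluster  = ~ region,                         # errors clustered by region
  data     = panel
)
summary(mod)
```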

Estimates of the coefficients of interest \(\alpha_{i,L}\) are shown in Extended Data Fig. 1 for \(N=M=10\) lags and for our preferred choice of the number of lags in Supplementary Figs. 1 – 3 . In Extended Data Fig. 1 , errors are shown clustered at the regional level, but for the construction of damage projections, we block-bootstrap the regressions by region 1,000 times to provide a range of parameter estimates with which to sample the projection uncertainty (following refs. 2 , 31 ).
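The block bootstrap can be sketched as resampling whole regions with replacement and re-estimating the model of the previous sketch on the stacked sample (illustrative only; in practice, the coefficient draws are stored for the projection step):

```r
set.seed(1)
regions <- unique(panel$region)

boot_coefs <- replicate(1000, {
  draw <- sample(regions, length(regions), replace = TRUE)
  # stack the full time series of each sampled region, relabelling duplicates
  # so that repeated regions enter as distinct panel units
  boot_data <- do.call(rbind, lapply(seq_along(draw), function(i) {
    block <- panel[panel$region == draw[i], ]
    block$region <- paste0(draw[i], "_", i)
    block
  }))
  coef(feols(dlgrp ~ l(dTbar, 0:8) + l(dTbar_int, 0:8) | region[year] + year,
             panel.id = ~ region + year, data = boot_data))
})
```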

Spatial-lag model

In Supplementary Fig. 14 , we present the results from a spatial-lag model that explores the potential for climate impacts to 'spill over' into spatially neighbouring regions. We measure the distance between the centroids of each pair of sub-national regions and construct spatial lags that take the average of the first-differenced climate variables and their interaction terms over neighbouring regions at distances of 0–500, 500–1,000, 1,000–1,500 and 1,500–2,000 km (spatial lags, 'SL', 1 to 4). For simplicity, we then estimate a spatial-lag model without temporal lags to assess spatial spillovers of contemporaneous climate impacts. Writing \(\Delta C_{v,r,y}\) for each of the first-differenced climate variables of equation ( 10 ) and \(\bar{C}_{v,r}\) for its moderating variable, this model takes the form:

$$\Delta\mathrm{lgrp}_{r,y}=\mu_r+\eta_y+k_r\,y+\sum_{v}\Big[\alpha_v\,\Delta C_{v,r,y}+\alpha_v^{I}\,\Delta C_{v,r,y}\cdot\bar{C}_{v,r}+\sum_{i=1}^{4}\big(\beta_{v,i}\,\mathrm{SL}_i(\Delta C_{v})_{r,y}+\beta_{v,i}^{I}\,\mathrm{SL}_i(\Delta C_{v}\cdot\bar{C}_{v})_{r,y}\big)\Big]+\varepsilon_{r,y}$$

in which \(\mathrm{SL}_i\) indicates the spatial lag of each climate variable and interaction term over the \(i\)th distance band. In Supplementary Fig. 14 , we plot the cumulative marginal effect of each climate variable at different baseline climate conditions by summing the coefficients for each climate variable and interaction term, for example, for average temperature impacts as:

$$\frac{\partial\,\Delta\mathrm{lgrp}_{r,y}}{\partial\,\Delta\bar{T}}=\alpha_{\bar{T}}+\alpha_{\bar{T}}^{I}\,\bar{T}_{r}+\sum_{i=1}^{4}\big(\beta_{\bar{T},i}+\beta_{\bar{T},i}^{I}\,\bar{T}_{r}\big)$$

These cumulative marginal effects can be regarded as the overall spatially dependent impact on an individual region given a one-unit shock to a climate variable in that region and in all neighbouring regions, at a given value of the moderating variable of the interaction term.
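The distance-band spatial lags themselves amount to averaging each first-differenced variable over the regions whose centroids fall within each band. A sketch with the sf package (dC, a region × year matrix of one first-differenced climate variable, is a hypothetical input):

```r
library(sf)

cents <- st_centroid(st_geometry(admin1))   # region centroids
D <- units::drop_units(st_distance(cents))  # pairwise centroid distances (m)
bands <- list(c(0, 500e3), c(500e3, 1000e3),
              c(1000e3, 1500e3), c(1500e3, 2000e3))

# SL[[i]]: region x year matrix of band-i neighbour averages of dC
SL <- lapply(bands, function(b) {
  W <- (D > b[1] & D <= b[2]) * 1
  diag(W) <- 0                   # a region is not its own neighbour
  W <- W / pmax(rowSums(W), 1)   # row-normalize, guarding against empty bands
  W %*% dC
})
```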

Constructing projections of economic damage from future climate change

We construct projections of future climate damages by applying the coefficients estimated in equation ( 10 ) and shown in Supplementary Tables 2 – 4 (when including only lags with statistically significant effects in specifications that limit overfitting; see Supplementary Methods Section 1 ) to projections of future climate change from the CMIP-6 models. Year-on-year changes in each primary climate variable of interest are calculated to reflect the year-to-year variations used in the empirical models. Thirty-year moving averages of the moderating variables of the interaction terms are calculated to reflect the long-term average of climatic conditions that were used for the moderating variables in the empirical models. By using moving averages in the projections, we account for the changing vulnerability to climate shocks based on the evolving long-term conditions (Supplementary Figs. 10 and 11 show that the results are robust to the precise choice of the window of this moving average). Although these climate variables are not differenced, the bias-adjusted climate models reproduce observed climatological patterns across regions for these moderating variables very accurately (Supplementary Table 6 ), with limited spread across models (<3%), which precludes the possibility that any considerable bias or uncertainty is introduced by this methodological choice. However, we impose caps on these moderating variables at the 95th percentile of their observed historical distribution to prevent extrapolation of the marginal effects outside the range in which the regressions were estimated. This is a conservative choice that limits the magnitude of our damage projections.
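The moving average and the cap can be expressed compactly; in this sketch, Tbar_proj is a projected annual series of one moderating variable for one region and Tbar_hist its 1979–2019 values (names illustrative):

```r
# 30-year trailing moving average of the projected moderating variable
mov_avg <- stats::filter(Tbar_proj, rep(1/30, 30), sides = 1)

# cap at the 95th percentile of the historical distribution, so marginal
# effects are never evaluated outside the estimation range
cap <- quantile(Tbar_hist, 0.95)
moderator <- pmin(as.numeric(mov_avg), cap)
```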

Time series of primary climate variables and moderating climate variables are then combined with estimates of the empirical model parameters to evaluate the right-hand side of equation ( 10 ), producing a time series of annual GRPpc growth-rate reductions for a given emission scenario, climate model and set of empirical model parameters. The resulting time series of growth-rate impacts reflects those occurring owing to future climate change. By contrast, a future scenario with no climate change would be one in which climate variables do not change (other than with random year-to-year fluctuations) and hence the time-averaged evaluation of equation ( 10 ) would be zero. Our approach therefore implicitly compares the future climate-change scenario to this no-climate-change baseline scenario.

The time series of growth-rate impacts owing to future climate change in region \(r\) and year \(y\), \(\delta_{r,y}\), are then added to the future baseline growth rates, \(\pi_{r,y}\) (in log-diff form), obtained from the SSP2 scenario to yield trajectories of damaged GRPpc growth rates, \(\rho_{r,y}=\pi_{r,y}+\delta_{r,y}\). These trajectories are aggregated over time to estimate the future trajectory of GRPpc with future climate impacts:

$$\mathrm{GRPpc}_{r,y}=\mathrm{GRPpc}_{r,y=2020}+\sum_{y'=2021}^{y}\rho_{r,y'}$$

in which \(\mathrm{GRPpc}_{r,y=2020}\) is the initial log level of GRPpc. We begin damage estimates in 2020 to reflect the damages occurring since the end of the period for which we estimate the empirical models (1979–2019) and to match the timing of mitigation-cost estimates from most IAMs (see below).
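Accumulating the damaged growth rates into a log-level trajectory is then a cumulative sum (a sketch for one region; pi_ry and delta_ry are hypothetical annual vectors from 2021 onwards):

```r
rho    <- pi_ry + delta_ry        # damaged growth rates (log-diff form)
lgrp   <- lgrp_2020 + cumsum(rho) # log GRPpc trajectory from 2021 onwards
grp_pc <- exp(lgrp)               # GRPpc level including climate impacts
```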

For each emission scenario, this procedure is repeated 1,000 times while randomly sampling from the selection of climate models, the selection of empirical models with different numbers of lags (shown in Supplementary Figs. 1 – 3 and Supplementary Tables 2 – 4 ) and bootstrapped estimates of the regression parameters. The result is an ensemble of future GRPpc trajectories that reflect uncertainty from both physical climate change and the structural and sampling uncertainty of the empirical models.
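Schematically, each iteration draws one element from every uncertainty source and holds that draw fixed across both scenarios, so that scenario differences are paired (project_damages() and the sampling containers are placeholders, not functions or objects from the authors' code):

```r
set.seed(1)
ensemble <- vector("list", 1000)
for (k in seq_along(ensemble)) {
  gcm  <- sample(climate_models, 1)      # one of the 21 CMIP-6 models
  spec <- sample(lag_specifications, 1)  # empirical model (number of lags)
  beta <- boot_coefs[[spec]][, sample(1000, 1)]  # one bootstrapped parameter draw
  # the same draw is applied to both emission scenarios
  ensemble[[k]] <- list(
    rcp26 = project_damages(gcm, spec, beta, scenario = "rcp26"),
    rcp85 = project_damages(gcm, spec, beta, scenario = "rcp85")
  )
}
```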

Estimates of mitigation costs

We obtain IPCC estimates of the aggregate costs of emission mitigation from the AR6 Scenario Explorer and Database hosted by IIASA 23 . Specifically, we search the AR6 Scenarios Database World v1.1 for IAMs that provided estimates of global GDP and population under both a SSP2 baseline and a SSP2-RCP2.6 scenario to maintain consistency with the socio-economic and emission scenarios of the climate damage projections. We find five IAMs that provide data for these scenarios, namely, MESSAGE-GLOBIOM 1.0, REMIND-MAgPIE 1.5, AIM/CGE 2.0, GCAM 4.2 and WITCH-GLOBIOM 3.1. Of these five IAMs, we use results only from the first three, which passed the IPCC vetting procedure for reproducing historical emission and climate trajectories. We then estimate global mitigation costs as the percentage difference in global per capita GDP between the SSP2 baseline and the SSP2-RCP2.6 emission scenario. Estimates of mitigation costs begin in 2020 for one of these IAMs and in 2010 for the other two. The mitigation-cost estimates before 2020 in the latter two IAMs are mostly negligible, and our choice to begin the comparison with damage estimates in 2020 is conservative with respect to the relative weight of climate damages compared with mitigation costs for these two IAMs.

Data availability

Data on economic production and ERA-5 climate data are publicly available at https://doi.org/10.5281/zenodo.4681306 (ref. 62 ) and https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5 , respectively. Data on mitigation costs are publicly available at https://data.ene.iiasa.ac.at/ar6/#/downloads . Processed climate and economic data, as well as all other necessary data for reproduction of the results, are available at the public repository https://doi.org/10.5281/zenodo.10562951  (ref. 63 ).

Code availability

All code necessary for reproduction of the results is available at the public repository https://doi.org/10.5281/zenodo.10562951  (ref. 63 ).

Glanemann, N., Willner, S. N. & Levermann, A. Paris Climate Agreement passes the cost-benefit test. Nat. Commun. 11 , 110 (2020).


Burke, M., Hsiang, S. M. & Miguel, E. Global non-linear effect of temperature on economic production. Nature 527 , 235–239 (2015).


Kalkuhl, M. & Wenz, L. The impact of climate conditions on economic production. Evidence from a global panel of regions. J. Environ. Econ. Manag. 103 , 102360 (2020).


Moore, F. C. & Diaz, D. B. Temperature impacts on economic growth warrant stringent mitigation policy. Nat. Clim. Change 5 , 127–131 (2015).


Drouet, L., Bosetti, V. & Tavoni, M. Net economic benefits of well-below 2°C scenarios and associated uncertainties. Oxf. Open Clim. Change 2 , kgac003 (2022).

Ueckerdt, F. et al. The economically optimal warming limit of the planet. Earth Syst. Dyn. 10 , 741–763 (2019).

Kotz, M., Wenz, L., Stechemesser, A., Kalkuhl, M. & Levermann, A. Day-to-day temperature variability reduces economic growth. Nat. Clim. Change 11 , 319–325 (2021).

Kotz, M., Levermann, A. & Wenz, L. The effect of rainfall changes on economic production. Nature 601 , 223–227 (2022).

Kousky, C. Informing climate adaptation: a review of the economic costs of natural disasters. Energy Econ. 46 , 576–592 (2014).

Harlan, S. L. et al. in Climate Change and Society: Sociological Perspectives (eds Dunlap, R. E. & Brulle, R. J.) 127–163 (Oxford Univ. Press, 2015).

Bolton, P. et al. The Green Swan (BIS Books, 2020).

Alogoskoufis, S. et al. ECB Economy-wide Climate Stress Test: Methodology and Results (European Central Bank, 2021).

Weber, E. U. What shapes perceptions of climate change? Wiley Interdiscip. Rev. Clim. Change 1 , 332–342 (2010).

Markowitz, E. M. & Shariff, A. F. Climate change and moral judgement. Nat. Clim. Change 2 , 243–247 (2012).

Riahi, K. et al. The shared socioeconomic pathways and their energy, land use, and greenhouse gas emissions implications: an overview. Glob. Environ. Change 42 , 153–168 (2017).

Auffhammer, M., Hsiang, S. M., Schlenker, W. & Sobel, A. Using weather data and climate model output in economic analyses of climate change. Rev. Environ. Econ. Policy 7 , 181–198 (2013).

Kolstad, C. D. & Moore, F. C. Estimating the economic impacts of climate change using weather observations. Rev. Environ. Econ. Policy 14 , 1–24 (2020).

Dell, M., Jones, B. F. & Olken, B. A. Temperature shocks and economic growth: evidence from the last half century. Am. Econ. J. Macroecon. 4 , 66–95 (2012).

Newell, R. G., Prest, B. C. & Sexton, S. E. The GDP-temperature relationship: implications for climate change damages. J. Environ. Econ. Manag. 108 , 102445 (2021).

Kikstra, J. S. et al. The social cost of carbon dioxide under climate-economy feedbacks and temperature variability. Environ. Res. Lett. 16 , 094037 (2021).


Bastien-Olvera, B. & Moore, F. Persistent effect of temperature on GDP identified from lower frequency temperature variability. Environ. Res. Lett. 17 , 084038 (2022).

Eyring, V. et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev. 9 , 1937–1958 (2016).

Byers, E. et al. AR6 scenarios database. Zenodo https://zenodo.org/records/7197970 (2022).

Burke, M., Davis, W. M. & Diffenbaugh, N. S. Large potential reduction in economic damages under UN mitigation targets. Nature 557 , 549–553 (2018).

Kotz, M., Wenz, L. & Levermann, A. Footprint of greenhouse forcing in daily temperature variability. Proc. Natl Acad. Sci. 118 , e2103294118 (2021).


Myhre, G. et al. Frequency of extreme precipitation increases extensively with event rareness under global warming. Sci. Rep. 9 , 16063 (2019).

Min, S.-K., Zhang, X., Zwiers, F. W. & Hegerl, G. C. Human contribution to more-intense precipitation extremes. Nature 470 , 378–381 (2011).

England, M. R., Eisenman, I., Lutsko, N. J. & Wagner, T. J. The recent emergence of Arctic Amplification. Geophys. Res. Lett. 48 , e2021GL094086 (2021).

Fischer, E. M. & Knutti, R. Anthropogenic contribution to global occurrence of heavy-precipitation and high-temperature extremes. Nat. Clim. Change 5 , 560–564 (2015).

Pfahl, S., O’Gorman, P. A. & Fischer, E. M. Understanding the regional pattern of projected future changes in extreme precipitation. Nat. Clim. Change 7 , 423–427 (2017).

Callahan, C. W. & Mankin, J. S. Globally unequal effect of extreme heat on economic growth. Sci. Adv. 8 , eadd3726 (2022).

Diffenbaugh, N. S. & Burke, M. Global warming has increased global economic inequality. Proc. Natl Acad. Sci. 116 , 9808–9813 (2019).

Callahan, C. W. & Mankin, J. S. National attribution of historical climate damages. Clim. Change 172 , 40 (2022).

Burke, M. & Tanutama, V. Climatic constraints on aggregate economic output. National Bureau of Economic Research, Working Paper 25779. https://doi.org/10.3386/w25779 (2019).

Kahn, M. E. et al. Long-term macroeconomic effects of climate change: a cross-country analysis. Energy Econ. 104 , 105624 (2021).

Desmet, K. et al. Evaluating the economic cost of coastal flooding. National Bureau of Economic Research, Working Paper 24918. https://doi.org/10.3386/w24918 (2018).

Hsiang, S. M. & Jina, A. S. The causal effect of environmental catastrophe on long-run economic growth: evidence from 6,700 cyclones. National Bureau of Economic Research, Working Paper 20352. https://doi.org/10.3386/w20352 (2014).

Ritchie, P. D. et al. Shifts in national land use and food production in Great Britain after a climate tipping point. Nat. Food 1 , 76–83 (2020).

Dietz, S., Rising, J., Stoerk, T. & Wagner, G. Economic impacts of tipping points in the climate system. Proc. Natl Acad. Sci. 118 , e2103081118 (2021).

Bastien-Olvera, B. A. & Moore, F. C. Use and non-use value of nature and the social cost of carbon. Nat. Sustain. 4 , 101–108 (2021).

Carleton, T. et al. Valuing the global mortality consequences of climate change accounting for adaptation costs and benefits. Q. J. Econ. 137 , 2037–2105 (2022).

Bastien-Olvera, B. A. et al. Unequal climate impacts on global values of natural capital. Nature 625 , 722–727 (2024).

Malik, A. et al. Impacts of climate change and extreme weather on food supply chains cascade across sectors and regions in Australia. Nat. Food 3 , 631–643 (2022).


Kuhla, K., Willner, S. N., Otto, C., Geiger, T. & Levermann, A. Ripple resonance amplifies economic welfare loss from weather extremes. Environ. Res. Lett. 16 , 114010 (2021).

Schleypen, J. R., Mistry, M. N., Saeed, F. & Dasgupta, S. Sharing the burden: quantifying climate change spillovers in the European Union under the Paris Agreement. Spat. Econ. Anal. 17 , 67–82 (2022).

Dasgupta, S., Bosello, F., De Cian, E. & Mistry, M. Global temperature effects on economic activity and equity: a spatial analysis. European Institute on Economics and the Environment, Working Paper 22-1 (2022).

Neal, T. The importance of external weather effects in projecting the macroeconomic impacts of climate change. UNSW Economics Working Paper 2023-09 (2023).

Deryugina, T. & Hsiang, S. M. Does the environment still matter? Daily temperature and income in the United States. National Bureau of Economic Research, Working Paper 20750. https://doi.org/10.3386/w20750 (2014).

Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146 , 1999–2049 (2020).

Cucchi, M. et al. WFDE5: bias-adjusted ERA5 reanalysis data for impact studies. Earth Syst. Sci. Data 12 , 2097–2120 (2020).

Adler, R. et al. The New Version 2.3 of the Global Precipitation Climatology Project (GPCP) Monthly Analysis Product 1072–1084 (University of Maryland, 2016).

Lange, S. Trend-preserving bias adjustment and statistical downscaling with ISIMIP3BASD (v1.0). Geosci. Model Dev. 12 , 3055–3070 (2019).

Wenz, L., Carr, R. D., Kögel, N., Kotz, M. & Kalkuhl, M. DOSE – global data set of reported sub-national economic output. Sci. Data 10 , 425 (2023).


Gennaioli, N., La Porta, R., Lopez De Silanes, F. & Shleifer, A. Growth in regions. J. Econ. Growth 19 , 259–309 (2014).

Board of Governors of the Federal Reserve System (US). U.S. dollars to euro spot exchange rate. https://fred.stlouisfed.org/series/AEXUSEU (2022).

World Bank. GDP deflator. https://data.worldbank.org/indicator/NY.GDP.DEFL.ZS (2022).

Jones, B. & O’Neill, B. C. Spatially explicit global population scenarios consistent with the Shared Socioeconomic Pathways. Environ. Res. Lett. 11 , 084003 (2016).

Murakami, D. & Yamagata, Y. Estimation of gridded population and GDP scenarios with spatially explicit statistical downscaling. Sustainability 11 , 2106 (2019).

Koch, J. & Leimbach, M. Update of SSP GDP projections: capturing recent changes in national accounting, PPP conversion and Covid 19 impacts. Ecol. Econ. 206 (2023).

Carleton, T. A. & Hsiang, S. M. Social and economic impacts of climate. Science 353 , aad9837 (2016).


Bergé, L. Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm. DEM Discussion Paper Series 18-13 (2018).

Kalkuhl, M., Kotz, M. & Wenz, L. DOSE - The MCC-PIK Database Of Subnational Economic output. Zenodo https://zenodo.org/doi/10.5281/zenodo.4681305 (2021).

Kotz, M., Wenz, L. & Levermann, A. Data and code for “The economic commitment of climate change”. Zenodo https://zenodo.org/doi/10.5281/zenodo.10562951 (2024).

Dasgupta, S. et al. Effects of climate change on combined labour productivity and supply: an empirical, multi-model study. Lancet Planet. Health 5 , e455–e465 (2021).

Lobell, D. B. et al. The critical role of extreme heat for maize production in the United States. Nat. Clim. Change 3 , 497–501 (2013).

Zhao, C. et al. Temperature increase reduces global yields of major crops in four independent estimates. Proc. Natl Acad. Sci. 114 , 9326–9331 (2017).

Wheeler, T. R., Craufurd, P. Q., Ellis, R. H., Porter, J. R. & Prasad, P. V. Temperature variability and the yield of annual crops. Agric. Ecosyst. Environ. 82 , 159–167 (2000).

Rowhani, P., Lobell, D. B., Linderman, M. & Ramankutty, N. Climate variability and crop production in Tanzania. Agric. For. Meteorol. 151 , 449–460 (2011).

Ceglar, A., Toreti, A., Lecerf, R., Van der Velde, M. & Dentener, F. Impact of meteorological drivers on regional inter-annual crop yield variability in France. Agric. For. Meteorol. 216 , 58–67 (2016).

Shi, L., Kloog, I., Zanobetti, A., Liu, P. & Schwartz, J. D. Impacts of temperature and its variability on mortality in New England. Nat. Clim. Change 5 , 988–991 (2015).

Xue, T., Zhu, T., Zheng, Y. & Zhang, Q. Declines in mental health associated with air pollution and temperature variability in China. Nat. Commun. 10 , 2165 (2019).


Liang, X.-Z. et al. Determining climate effects on US total agricultural productivity. Proc. Natl Acad. Sci. 114 , E2285–E2292 (2017).

Desbureaux, S. & Rodella, A.-S. Drought in the city: the economic impact of water scarcity in Latin American metropolitan areas. World Dev. 114 , 13–27 (2019).

Damania, R. The economics of water scarcity and variability. Oxf. Rev. Econ. Policy 36 , 24–44 (2020).

Davenport, F. V., Burke, M. & Diffenbaugh, N. S. Contribution of historical precipitation change to US flood damages. Proc. Natl Acad. Sci. 118 , e2017524118 (2021).

Dave, R., Subramanian, S. S. & Bhatia, U. Extreme precipitation induced concurrent events trigger prolonged disruptions in regional road networks. Environ. Res. Lett. 16 , 104050 (2021).


Acknowledgements

We gratefully acknowledge financing from the Volkswagen Foundation and the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH on behalf of the Government of the Federal Republic of Germany and Federal Ministry for Economic Cooperation and Development (BMZ).

Open access funding provided by Potsdam-Institut für Klimafolgenforschung (PIK) e.V.

Author information

Authors and affiliations

Research Domain IV, Potsdam Institute for Climate Impact Research, Potsdam, Germany

Maximilian Kotz, Anders Levermann & Leonie Wenz

Institute of Physics, Potsdam University, Potsdam, Germany

Maximilian Kotz & Anders Levermann

Mercator Research Institute on Global Commons and Climate Change, Berlin, Germany

Leonie Wenz


Contributions

All authors contributed to the design of the analysis. M.K. conducted the analysis and produced the figures. All authors contributed to the interpretation and presentation of the results. M.K. and L.W. wrote the manuscript.

Corresponding author

Correspondence to Leonie Wenz .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Xin-Zhong Liang, Chad Thackeray and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Constraining the persistence of historical climate impacts on economic growth rates.

The results of a panel-based fixed-effects distributed lag model for the effects of annual mean temperature ( a ), daily temperature variability ( b ), total annual precipitation ( c ), the number of wet days ( d ) and extreme daily precipitation ( e ) on sub-national economic growth rates. Point estimates show the effects of a 1 °C or one standard deviation increase (for temperature and precipitation variables, respectively) at the lower quartile, median and upper quartile of the relevant moderating variable (green, orange and purple, respectively) at different lagged periods after the initial shock (note that these are not cumulative effects). Climate variables are used in their first-differenced form (see main text for discussion) and the moderating climate variables are the annual mean temperature, seasonal temperature difference, total annual precipitation, number of wet days and annual mean temperature, respectively, in panels a – e (see Methods for further discussion). Error bars show the 95% confidence intervals having clustered standard errors by region. The within-region R 2 , Bayesian and Akaike information criteria for the model are shown at the top of the figure. This figure shows results with ten lags for each variable to demonstrate the observed levels of persistence, but our preferred specifications remove later lags based on the statistical significance of terms shown above and the information criteria shown in Extended Data Fig. 2 . The resulting models without later lags are shown in Supplementary Figs. 1 – 3 .

Extended Data Fig. 2 Incremental lag-selection procedure using information criteria and within-region R 2 .

Starting from a panel-based fixed-effects distributed lag model estimating the effects of climate on economic growth using the real historical data (as in equation ( 10 )) with ten lags for all climate variables (as shown in Extended Data Fig. 1 ), lags are incrementally removed for one climate variable at a time. The resulting Bayesian and Akaike information criteria are shown in a – e and f – j , respectively, and the within-region R 2 and number of observations in k – o and p – t , respectively. Different rows show the results when removing lags from different climate variables, ordered from top to bottom as annual mean temperature, daily temperature variability, total annual precipitation, the number of wet days and extreme daily precipitation. Information criteria show minima at approximately four lags for the precipitation variables and eight to ten lags for the temperature variables, indicating that including these numbers of lags does not lead to overfitting. See Supplementary Table 1 for an assessment using information criteria to determine whether including further climate variables causes overfitting.

Extended Data Fig. 3 Damages in our preferred specification that provides a robust lower bound on the persistence of climate impacts on economic growth versus damages in specifications of pure growth or pure level effects.

Estimates of future damages as shown in Fig. 1 but under the emission scenario RCP8.5, for three separate empirical specifications: in orange, our preferred specification, which provides an empirical lower bound on the persistence of climate impacts on economic growth rates while avoiding assumptions of infinite persistence (see main text for further discussion); in purple, a specification of ‘pure growth effects’ in which the first difference of climate variables is not taken and no lagged climate variables are included (the baseline specification of ref. 2); and in pink, a specification of ‘pure level effects’ in which the first difference of climate variables is taken but no lagged terms are included.
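
To make the distinction between the three specifications concrete, the sketch below constructs the climate regressor implied by each; the temperature series and column names are invented for illustration. Which transform of the climate variable enters the growth regression encodes the assumed persistence of a shock.

```python
# Toy illustration of the three regressor constructions named above.
import numpy as np
import pandas as pd

T = pd.Series(np.random.default_rng(1).normal(15.0, 1.0, 40))

# 'Pure growth effects': the level of T enters with no lags, so a permanent
# shock to T permanently alters the growth rate (infinite persistence).
X_growth = pd.DataFrame({'T': T})

# 'Pure level effects': only the first difference enters, with no lags, so a
# shock shifts the level of output once and growth recovers immediately.
X_level = pd.DataFrame({'dT': T.diff()})

# Preferred specification: first differences plus lagged terms, letting the
# data bound how long a shock persists in growth rates.
NLAGS = 4
X_pref = pd.DataFrame({f'dT_lag{k}': T.diff().shift(k)
                       for k in range(NLAGS + 1)})
print(X_growth.shape, X_level.shape, X_pref.dropna().shape)
```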

Extended Data Fig. 4 Climate changes in different variables as a function of historical interannual variability.

Changes in each climate variable of interest from 1979–2019 to 2035–2065 under the high-emission scenario SSP5-RCP8.5, expressed as a percentage of the historical variability of each measure. Historical variability is estimated as the standard deviation of each detrended climate variable over the period 1979–2019 during which the empirical models were identified (detrending is appropriate because of the inclusion of region-specific linear time trends in the empirical models). See Supplementary Fig. 13 for changes expressed in standard units. Data on national administrative boundaries are obtained from the GADM database version 3.6 and are freely available for academic use (https://gadm.org/).
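
The normalization in this caption amounts to dividing a projected change by the standard deviation of the detrended historical series. A minimal sketch for a single region, on invented data:

```python
# Sketch: express a projected change as a percentage of historical
# interannual variability, detrending before taking the standard deviation
# (mirroring the caption's point about region-specific linear trends).
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(2)
hist = 15.0 + 0.02 * np.arange(41) + rng.normal(0.0, 0.5, 41)  # 1979-2019
future = 17.0 + rng.normal(0.0, 0.5, 31)                       # 2035-2065

sigma_hist = detrend(hist).std(ddof=1)   # detrended interannual variability
change = future.mean() - hist.mean()
print(f"change = {100 * change / sigma_hist:.0f}% of historical variability")
```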

Extended Data Fig. 5 Contribution of different climate variables to overall committed damages.

a, Climate damages in 2049 when using empirical models that account for all climate variables, changes in annual mean temperature only, or changes in both annual mean temperature and one other climate variable (daily temperature variability, total annual precipitation, the number of wet days and extreme daily precipitation, respectively). b, The cumulative marginal effects of an increase in annual mean temperature of 1 °C, at different baseline temperatures, estimated from empirical models including all climate variables or annual mean temperature only. Estimates and uncertainty bars represent the median and 95% confidence intervals obtained from 1,000 block-bootstrap resamples from each of three different empirical models using eight, nine or ten lags of temperature terms.
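
The uncertainty procedure here is a block bootstrap at the region level. The sketch below shows the resampling pattern: draw whole regions with replacement, refit, and summarize the draws by their median and 95% interval. `fit_effect` is a hypothetical stand-in for refitting the full empirical model, and `df` is the toy panel from the first sketch.

```python
# Sketch of a region-level block bootstrap.
import numpy as np
import pandas as pd

def fit_effect(panel: pd.DataFrame) -> float:
    # Stand-in estimate: slope of growth on the temperature shock.
    return np.polyfit(panel['dT'], panel['growth'], 1)[0]

def block_bootstrap(df, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    groups = dict(tuple(df.groupby('region')))      # region -> sub-panel
    regions = np.array(list(groups))
    draws = []
    for _ in range(n_boot):
        sample = rng.choice(regions, size=regions.size, replace=True)
        boot = pd.concat([groups[r] for r in sample], ignore_index=True)
        draws.append(fit_effect(boot))
    return np.percentile(draws, [2.5, 50.0, 97.5])  # 95% CI and median

# lo, med, hi = block_bootstrap(df)
```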

Extended Data Fig. 6 The difference in committed damages between the upper and lower quartiles of countries when ranked by GDP and cumulative historical emissions.

Quartiles are defined using a population weighting, as are the average committed damages across each quartile group. The violin plots indicate the distribution of differences between quartiles across the two extreme emission scenarios (RCP2.6 and RCP8.5) and the uncertainty-sampling procedure outlined in Methods, which accounts for uncertainty arising from the choice of lags in the empirical models, uncertainty in the empirical model parameter estimates and uncertainty in the climate model projections. Bars indicate the median, the 10th and 90th percentiles, and the upper and lower sixths of the distribution, reflecting the very likely and likely ranges following the likelihood classification adopted by the IPCC.
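
Population-weighted quartiles can be computed by splitting the ranked countries at cumulative population shares of 25% and 75%. A sketch on invented data, with hypothetical column names ('gdp', 'population', 'damages'):

```python
# Sketch: population-weighted quartiles and the gap between them.
import numpy as np
import pandas as pd

def weighted_quartile_gap(df, rank_col='gdp', w='population', v='damages'):
    d = df.sort_values(rank_col)
    cum = d[w].cumsum() / d[w].sum()             # cumulative population share
    lower = d[cum.shift(fill_value=0.0) < 0.25]  # bottom population quartile
    upper = d[cum > 0.75]                        # top population quartile
    wmean = lambda g: np.average(g[v], weights=g[w])
    return wmean(upper) - wmean(lower)           # gap in weighted damages

rng = np.random.default_rng(3)
toy = pd.DataFrame({'gdp': rng.lognormal(9.0, 1.0, 36),
                    'population': rng.lognormal(16.0, 1.0, 36),
                    'damages': rng.normal(-10.0, 5.0, 36)})
print(weighted_quartile_gap(toy))
```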

Supplementary information

The online version of the article contains the supplementary information and peer review file.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Kotz, M., Levermann, A. & Wenz, L. The economic commitment of climate change. Nature 628, 551–557 (2024). https://doi.org/10.1038/s41586-024-07219-0


Received: 25 January 2023

Accepted: 21 February 2024

Published: 17 April 2024

Issue Date: 18 April 2024

DOI: https://doi.org/10.1038/s41586-024-07219-0


