

Sensitivity analysis in clinical trials: three criteria for a valid sensitivity analysis

  • Sameer Parpia 1,2,
  • Tim P. Morris 3,
  • Mark R. Phillips (ORCID: orcid.org/0000-0003-0923-261X) 2,
  • Charles C. Wykoff 4,5,
  • David H. Steel (ORCID: orcid.org/0000-0001-8734-3089) 6,7,
  • Lehana Thabane (ORCID: orcid.org/0000-0003-0355-9734) 2,8,
  • Mohit Bhandari (ORCID: orcid.org/0000-0001-9608-4808) 2,9 &
  • Varun Chaudhary (ORCID: orcid.org/0000-0002-9988-4146) 2,9

for the Retina Evidence Trials InterNational Alliance (R.E.T.I.N.A.) Study Group

Eye volume 36, pages 2073–2074 (2022)


Subject: Outcomes research

What is a sensitivity analysis?

Randomized clinical trials are a tool to generate high-quality evidence of efficacy and safety for new interventions. The statistical analysis plan (SAP) of a trial is generally pre-specified and documented prior to seeing outcome data, and it is encouraged that researchers follow the pre-specified analysis plan. The process of pre-specification of the primary analysis involves making assumptions about methods, models, and data that may not be supported by the final trial data. Sensitivity analysis examines the robustness of the result by conducting the analyses under a range of plausible assumptions about the methods, models, or data that differ from the assumptions used in the pre-specified primary analysis. If the results of the sensitivity analyses are consistent with the primary results, researchers can be confident that the assumptions made for the primary analysis have had little impact on the results, giving strength to the trial findings. Recent guidance documents for statistical principles have emphasized the importance of sensitivity analysis in clinical trials to ensure a robust assessment of the observed results [ 1 ].

When is a sensitivity analysis valid?

While the importance of conducting sensitivity analysis has been widely acknowledged, what constitutes a valid sensitivity analysis has been unclear. To address this ambiguity, Morris et al. proposed a framework for conducting such analyses [ 2 ] and suggest that a particular analysis can be classified as a sensitivity analysis if it meets the following criteria: (1) the proposed analysis aims to answer the same question as the primary analysis, (2) there is a possibility that the proposed analysis will lead to conclusions that differ from those of the primary analysis, and (3) there would be uncertainty as to which analysis to believe if the proposed analysis led to different conclusions than the primary analysis. These criteria can guide the conduct of sensitivity analyses and indicate what to consider when interpreting them.

Criterion 1: do the sensitivity and primary analysis answer the same question?

The first criterion aims to ascertain whether the question being answered by the two analyses is the same. If the analysis addresses a different question than the primary question, then it should be referred to as a supplementary (or secondary) analysis. This may seem obvious, but it is important to consider: if the questions being answered are different, divergent results could create unwarranted uncertainty regarding the robustness of the primary conclusions.

This misconception is commonly observed in trials where a primary analysis according to the intention-to-treat (ITT) principle is followed by a per-protocol (PP) analysis, which many consider a sensitivity analysis. The ITT analysis considers the effect of a decision to treat regardless of whether the treatment was received, while the PP analysis considers the effect of actually receiving treatment as intended. While the results of the PP analysis may be of value to certain stakeholders, the PP analysis is not a sensitivity analysis to a primary ITT analysis. Because the analyses address two distinct questions, it would not be surprising if the results differ. However, failure to appreciate that they ask different questions could lead to confusion over the robustness of the primary conclusions.

Criterion 2: could the sensitivity analysis yield different results than the primary analysis?

The second criterion relates to the assumptions made for the sensitivity analysis; if these assumptions will always lead to conclusions that are equivalent to the primary analysis, then we have learned nothing about the true sensitivity of the trial conclusion. Thus, a sensitivity analysis must be designed under a reasonable assumption that the findings could potentially differ from the primary analysis.

Consider the sensitivity analysis utilized in the LEAVO trial, which assessed the effect of aflibercept or bevacizumab versus ranibizumab for patients with macular oedema secondary to central retinal vein occlusion [ 3 ]. The primary outcome of this study was the change in best-corrected visual acuity (BCVA) from baseline for aflibercept, or bevacizumab, versus ranibizumab. At the end of the study, the primary outcome of the trial, BCVA score, was missing in some patients. For the purposes of imputation of the missing data, the investigators considered a range of values (from −20 to 20) as assumed values for the mean difference in BCVA scores between patients with observed and missing data. An example of this criterion not being met would be if a mean difference of 0 were used to impute BCVA scores for the missing patients, as it would be equivalent to re-running the primary analysis, leading to conclusions similar to those of the primary analysis. This would provide a misleading belief in the robustness of the results, as the “sensitivity” analysis conducted would not actually fulfill the appropriate criterion to be labeled as such.

On the other hand, modifying the assumptions to differ from the primary analysis by varying the mean difference from −20 to 20 provides a useful analysis to assess the sensitivity of the primary analysis under a range of possible values that the missing participants may have had. One could reasonably postulate that assuming a mean change in BCVA scores of −20 to 20 to impute missing data could impact the primary analysis findings, as these values range from what one might consider a “best” to a “worst” case scenario for the results observed among participants with missing data. In the LEAVO trial, the authors demonstrated that, under these scenarios, the results of the sensitivity analysis supported the primary conclusions of the trial.
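For readers who want to see the mechanics, the following sketch shows how a delta-adjusted (tipping-point) sensitivity analysis of this kind can be set up: missing outcomes are imputed as the observed arm mean plus an assumed offset delta, and the comparison is re-run for each delta from −20 to 20. The simulated data, the 15% missingness rate and the two-sample t-test standing in for the trial's primary analysis model are illustrative assumptions, not the LEAVO analysis itself.

```python
"""Minimal sketch of a delta-adjusted (tipping-point) sensitivity analysis
for a missing continuous outcome. All data are simulated; the t-test stands
in for whatever primary analysis model a real trial would pre-specify."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200                                    # hypothetical patients per arm
treat = rng.normal(6.0, 12.0, n)           # BCVA change, treatment arm
ctrl = rng.normal(4.0, 12.0, n)            # BCVA change, control arm
miss_t = rng.random(n) < 0.15              # ~15% missing outcomes per arm
miss_c = rng.random(n) < 0.15

for delta in range(-20, 21, 5):
    t, c = treat.copy(), ctrl.copy()
    # impute missing values as the observed arm mean plus the assumed offset
    # delta (applied to both arms here for simplicity)
    t[miss_t] = treat[~miss_t].mean() + delta
    c[miss_c] = ctrl[~miss_c].mean() + delta
    est = t.mean() - c.mean()
    p = stats.ttest_ind(t, c).pvalue       # treats imputed values as observed
    print(f"delta={delta:+3d}  effect={est:5.2f}  p={p:.3f}")
```

The smallest value of delta (if any) at which the conclusion changes is the tipping point; judging whether that value is clinically plausible is exactly the kind of question criterion 2 asks of the analyst.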

Criterion 3: what should one believe if the sensitivity and primary analyses differ?

The third criterion assesses whether there would be uncertainty as to which analysis is to be believed if the proposed analysis leads to a different conclusion than the primary analysis. If one analysis will always be believed over another, then it is not worthwhile performing the analysis that will not be believed, as it is impossible for that analysis to change our understanding of the outcome. Consider a trial in which an individual is randomized to intervention or control, and the primary outcome is measured for each eye. Because the results from each eye within a given patient are not independent, if researchers perform analyses both accounting for and not accounting for this dependence, it is clear that the analysis accounting for the dependence will be preferred. This is not a proper sensitivity analysis. In this situation, the analysis accounting for the dependence should be the primary analysis, and the analysis not accounting for the dependence should either not be performed or be designated a supplementary analysis.

Conclusions

Sensitivity analyses are important to perform in order to assess the robustness of the conclusions of a trial. It is critical to distinguish between sensitivity analyses and supplementary or other analyses, and the above three criteria can inform an understanding of what constitutes a sensitivity analysis. Often, sensitivity analyses are underreported in published reports, making it difficult to assess whether appropriate sensitivity analyses were performed. We recommend that sensitivity analysis be considered a key part of any clinical trial SAP and be consistently and clearly reported with trial outcomes.

Food and Drug Administration. E9 (R1) statistical principles for clinical trials: addendum: estimands and sensitivity analysis in clinical trials. Guidance for Industry. May 2021.

Morris TP, Kahan BC, White IR. Choosing sensitivity analyses for randomised trials: principles. BMC Med Res Methodol. 2014;14:1–5. https://doi.org/10.1186/1471-2288-14-11 .


Hykin P, Prevost AT, Vasconcelos JC, Murphy C, Kelly J, Ramu J, et al. Clinical effectiveness of intravitreal therapy with ranibizumab vs aflibercept vs bevacizumab for macular edema secondary to central retinal vein occlusion: a randomized clinical trial. JAMA Ophthalmol. 2019;137:1256–64. https://doi.org/10.1001/jamaophthalmol.2019.3305 .



Author information

Authors and Affiliations

Department of Oncology, McMaster University, Hamilton, ON, Canada

Sameer Parpia

Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada

Sameer Parpia, Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

MRC Clinical Trials Unit, University College London, London, UK

Tim P. Morris

Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

Charles C. Wykoff

Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

Sunderland Eye Infirmary, Sunderland, UK

David H. Steel

Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK

Biostatistics Unit, St. Joseph’s Healthcare-Hamilton, Hamilton, ON, Canada

Lehana Thabane

Department of Surgery, McMaster University, Hamilton, ON, Canada

Mohit Bhandari & Varun Chaudhary

NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

Sobha Sivaprasad

Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Peter Kaiser

Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

David Sarraf

Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

Sophie J. Bakri

The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

Sunir J. Garg

Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Rishi P. Singh

Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

Department of Ophthalmology, University of Bonn, Bonn, Germany

Frank G. Holz

Singapore Eye Research Institute, Singapore, Singapore

Tien Y. Wong

Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia

Robyn H. Guymer

Department of Surgery (Ophthalmology), The University of Melbourne, Melbourne, VIC, Australia


  • Varun Chaudhary
  • Mohit Bhandari
  • Charles C. Wykoff
  • Sobha Sivaprasad
  • Lehana Thabane
  • Peter Kaiser
  • David Sarraf
  • Sophie J. Bakri
  • Sunir J. Garg
  • Rishi P. Singh
  • Frank G. Holz
  • Tien Y. Wong
  • Robyn H. Guymer

Contributions

SP was responsible for writing, critical review, and feedback on the manuscript. TPM was responsible for writing, critical review, and feedback on the manuscript. MRP was responsible for conception of idea, writing, critical review, and feedback on the manuscript. CCW was responsible for critical review and feedback on the manuscript. DHS was responsible for critical review and feedback on the manuscript. LT was responsible for critical review and feedback on the manuscript. MB was responsible for conception of idea, critical review, and feedback on the manuscript. VC was responsible for conception of idea, critical review, and feedback on the manuscript.

Corresponding author

Correspondence to Varun Chaudhary .

Ethics declarations

Competing interests

SP: nothing to disclose. TPM: nothing to disclose. MRP: nothing to disclose. CCW: consultant: Acuela, Adverum Biotechnologies, Inc, Aerpio, Alimera Sciences, Allegro Ophthalmics, LLC, Allergan, Apellis Pharmaceuticals, Bayer AG, Chengdu Kanghong Pharmaceuticals Group Co, Ltd, Clearside Biomedical, DORC (Dutch Ophthalmic Research Center), EyePoint Pharmaceuticals, Gentech/Roche, GyroscopeTx, IVERIC bio, Kodiak Sciences Inc, Novartis AG, ONL Therapeutics, Oxurion NV, PolyPhotonix, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Santen Pharmaceutical Co, Ltd, and Takeda Pharmaceutical Company Limited; research funds: Adverum Biotechnologies, Inc, Aerie Pharmaceuticals, Inc, Aerpio, Alimera Sciences, Allergan, Apellis Pharmaceuticals, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, Gemini Therapeutics, Genentech/Roche, Graybug Vision, Inc, GyroscopeTx, Ionis Pharmaceuticals, IVERIC bio, Kodiak Sciences Inc, Neurotech LLC, Novartis AG, Opthea, Outlook Therapeutics, Inc, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Samsung Pharm Co, Ltd, Santen Pharmaceutical Co, Ltd, and Xbrane Biopharma AB—unrelated to this study. DHS: consultant: Gyroscope, Roche, Alcon, BVI; research funding for IIS: Alcon, Bayer, DORC, Gyroscope, Boehringer-Ingelheim—unrelated to this study. LT: nothing to disclose. MB: research funds: Pendopharm, Bioventus, Acumed—unrelated to this study. VC: advisory board member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis—unrelated to this study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Parpia, S., Morris, T.P., Phillips, M.R. et al. Sensitivity analysis in clinical trials: three criteria for a valid sensitivity analysis. Eye 36 , 2073–2074 (2022). https://doi.org/10.1038/s41433-022-02108-0


Received: 06 May 2022
Revised: 09 May 2022
Accepted: 12 May 2022
Published: 18 May 2022
Issue Date: November 2022
DOI: https://doi.org/10.1038/s41433-022-02108-0



9.7   Sensitivity analyses

The process of undertaking a systematic review involves a sequence of decisions. Whilst many of these decisions are clearly objective and non-contentious, some will be somewhat arbitrary or unclear. For instance, if inclusion criteria involve a numerical value, the choice of value is usually arbitrary: for example, defining groups of older people may reasonably have lower limits of 60, 65, 70 or 75 years, or any value in between. Other decisions may be unclear because a study report fails to include the required information. Some decisions are unclear because the included studies themselves never obtained the information required: for example, the outcomes of those who unfortunately were lost to follow-up. Further decisions are unclear because there is no consensus on the best statistical method to use for a particular problem.

It is desirable to prove that the findings from a systematic review are not dependent on such arbitrary or unclear decisions. A sensitivity analysis is a repeat of the primary analysis or meta-analysis, substituting alternative decisions or ranges of values for decisions that were arbitrary or unclear. For example, if the eligibility of some studies in the meta-analysis is dubious because they do not contain full details, sensitivity analysis may involve undertaking the meta-analysis twice: first, including all studies and second, only including those that are definitely known to be eligible. A sensitivity analysis asks the question, “Are the findings robust to the decisions made in the process of obtaining them?”.
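As a minimal sketch of the eligibility example just described, the code below pools hypothetical study effects with a fixed-effect inverse-variance model twice: once using all studies and once restricted to those whose eligibility is certain. The effect sizes, standard errors and eligibility flags are invented purely for illustration.

```python
"""Sketch of a sensitivity analysis for study eligibility in a meta-analysis:
the pooled effect is computed with all studies and again restricted to the
studies whose eligibility is certain. All inputs are made up."""
import numpy as np

# log risk ratios and standard errors for 6 hypothetical trials
effects = np.array([-0.30, -0.10, -0.25, 0.05, -0.40, -0.15])
se      = np.array([ 0.12,  0.20,  0.15, 0.25,  0.18,  0.22])
definitely_eligible = np.array([True, True, False, True, False, True])

def fixed_effect(y, s):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    w = 1.0 / s**2
    return np.sum(w * y) / np.sum(w), np.sqrt(1.0 / np.sum(w))

for label, mask in [("all studies", np.ones_like(definitely_eligible)),
                    ("definitely eligible only", definitely_eligible)]:
    est, pooled_se = fixed_effect(effects[mask], se[mask])
    lo, hi = est - 1.96 * pooled_se, est + 1.96 * pooled_se
    print(f"{label:25s} pooled log-RR {est:+.3f}  95% CI ({lo:+.3f}, {hi:+.3f})")
```

If the two pooled estimates and their confidence intervals tell the same story, the eligibility decision is unlikely to be driving the review's conclusions.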

There are many decision nodes within the systematic review process which can generate a need for a sensitivity analysis. Examples include:

Searching for studies:

Should abstracts whose results cannot be confirmed in subsequent publications be included in the review?

Eligibility criteria:

Characteristics of participants: where a majority but not all people in a study meet an age range, should the study be included?

Characteristics of the intervention: what range of doses should be included in the meta-analysis?

Characteristics of the comparator: what criteria are required to define usual care to be used as a comparator group?

Characteristics of the outcome: what time-point or range of time-points are eligible for inclusion?

Study design: should blinded and unblinded outcome assessment be included, or should study inclusion be restricted by other aspects of methodological criteria?

What data should be analysed?

Time-to-event data: what assumptions of the distribution of censored data should be made?

Continuous data: where standard deviations are missing, when and how should they be imputed? Should analyses be based on change scores or on final values?

Ordinal scales: what cut-point should be used to dichotomize short ordinal scales into two groups?

Cluster-randomized trials: what values of the intraclass correlation coefficient should be used when trial analyses have not been adjusted for clustering?

Cross-over trials: what values of the within-subject correlation coefficient should be used when this is not available in primary reports?

All analyses: what assumptions should be made about missing outcomes to facilitate intention-to-treat analyses? Should adjusted or unadjusted estimates of treatment effects be used?

Analysis methods:

Should fixed-effect or random-effects methods be used for the analysis?

For dichotomous outcomes, should odds ratios, risk ratios or risk differences be used?

And for continuous outcomes, where several scales have assessed the same dimension, should results be analysed as a standardized mean difference across all scales or as mean differences individually for each scale?

Some sensitivity analyses can be pre-specified in the study protocol, but many issues suitable for sensitivity analysis are only identified during the review process, when the individual peculiarities of the studies under investigation become apparent. When sensitivity analyses show that the overall result and conclusions are not affected by the different decisions that could be made during the review process, the results of the review can be regarded with a higher degree of certainty. Where sensitivity analyses identify particular decisions or missing information that greatly influence the findings of the review, greater resources can be deployed to try to resolve uncertainties and obtain extra information, possibly through contacting trial authors and obtaining individual patient data. If this cannot be achieved, the results must be interpreted with an appropriate degree of caution. Such findings may generate proposals for further investigations and future research.

Reporting of sensitivity analyses in a systematic review may best be done by producing a summary table. Rarely is it informative to produce individual forest plots for each sensitivity analysis undertaken.

Sensitivity analyses are sometimes confused with subgroup analysis. Although some sensitivity analyses involve restricting the analysis to a subset of the totality of studies, the two methods differ in two ways. First, sensitivity analyses do not attempt to estimate the effect of the intervention in the group of studies removed from the analysis, whereas in subgroup analyses, estimates are produced for each subgroup. Second, in sensitivity analyses, informal comparisons are made between different ways of estimating the same thing, whereas in subgroup analyses, formal statistical comparisons are made across the subgroups.

  • Open access
  • Published: 16 July 2013

A tutorial on sensitivity analyses in clinical trials: the what, why, when and how

  • Lehana Thabane 1 , 2 , 3 , 4 , 5 ,
  • Lawrence Mbuagbaw 1 , 4 ,
  • Shiyuan Zhang 1 , 4 ,
  • Zainab Samaan 1 , 6 , 7 ,
  • Maura Marcucci 1 , 4 ,
  • Chenglin Ye 1 , 4 ,
  • Marroon Thabane 1 , 8 ,
  • Lora Giangregorio 9 ,
  • Brittany Dennis 1 , 4 ,
  • Daisy Kosa 1 , 4 , 10 ,
  • Victoria Borg Debono 1 , 4 ,
  • Rejane Dillenburg 11 ,
  • Vincent Fruci 12 ,
  • Monica Bawor 13 ,
  • Juneyoung Lee 14 ,
  • George Wells 15 &
  • Charles H Goldsmith 1 , 4 , 16  

BMC Medical Research Methodology volume 13, Article number: 92 (2013)


Sensitivity analyses play a crucial role in assessing the robustness of the findings or conclusions based on primary analyses of data in clinical trials. They are a critical way to assess the impact, effect or influence of key assumptions or variations—such as different methods of analysis, definitions of outcomes, protocol deviations, missing data, and outliers—on the overall conclusions of a study.

The current paper is the second in a series of tutorial-type manuscripts intended to discuss and clarify aspects related to key methodological issues in the design and analysis of clinical trials.

In this paper we will provide a detailed exploration of the key aspects of sensitivity analyses including: 1) what sensitivity analyses are, why they are needed, and how often they are used in practice; 2) the different types of sensitivity analyses that one can do, with examples from the literature; 3) some frequently asked questions about sensitivity analyses; and 4) some suggestions on how to report the results of sensitivity analyses in clinical trials.

When reporting on a clinical trial, we recommend including planned or post hoc sensitivity analyses, the corresponding rationale and results, along with a discussion of the consequences of these analyses for the overall findings of the study.


The credibility or interpretation of the results of clinical trials relies on the validity of the methods of analysis or models used and their corresponding assumptions. An astute researcher or reader may be less confident in the findings of a study if they believe that the analysis or assumptions made were not appropriate. For a primary analysis of data from a prospective randomized controlled trial (RCT), the key questions for investigators (and for readers) include:

How confident can I be about the results?

Will the results change if I change the definition of the outcome (e.g., using different cut-off points)?

Will the results change if I change the method of analysis?

Will the results change if we take missing data into account? Will the method of handling missing data lead to different conclusions?

How much influence will minor protocol deviations have on the conclusions?

How will ignoring the serial correlation of measurements within a patient impact the results?

What if the data were assumed to have a non-Normal distribution or there were outliers?

Will the results change if one looks at subgroups of patients?

Will the results change if the full intervention is received (i.e. degree of compliance)?

The above questions can be addressed by performing sensitivity analyses—testing the effect of these “changes” on the observed results. If, after performing sensitivity analyses, the findings are consistent with those from the primary analysis and would lead to similar conclusions about the treatment effect, the researcher is reassured that the underlying factor(s) had little or no influence or impact on the primary conclusions. In this situation, the results or the conclusions are said to be “robust”.

The objectives of this paper are to provide an overview of how to approach sensitivity analyses in clinical trials. This is the second in a series of tutorial-type manuscripts intended to discuss and clarify aspects related to some key methodological issues in the design and analysis of clinical trials. The first was on pilot studies [ 1 ]. We start by describing what sensitivity analysis is, why it is needed and how often it is done in practice. We then describe the different types of sensitivity analyses that one can do, with examples from the literature. We also address some of the commonly asked questions about sensitivity analysis and provide some guidance on how to report sensitivity analyses.

Sensitivity Analysis

What is a sensitivity analysis in clinical research?

Sensitivity Analysis (SA) is defined as “a method to determine the robustness of an assessment by examining the extent to which results are affected by changes in methods, models, values of unmeasured variables, or assumptions” with the aim of identifying “results that are most dependent on questionable or unsupported assumptions” [ 2 ]. It has also been defined as “a series of analyses of a data set to assess whether altering any of the assumptions made leads to different final interpretations or conclusions” [ 3 ]. Essentially, SA addresses the “what-if-the-key-inputs-or-assumptions-changed”-type of question. If we want to know whether the results change when something about the way we approach the data analysis changes, we can make the change in our analysis approach and document the changes in the results or conclusions. For more detailed coverage of SA, we refer the reader to these references [ 4 – 7 ].

Why is sensitivity analysis necessary?

The design and analysis of clinical trials often rely on assumptions that may have some effect, influence or impact on the conclusions if they are not met. It is important to assess these effects through sensitivity analyses. Consistency between the results of primary analysis and the results of sensitivity analysis may strengthen the conclusions or credibility of the findings. However, it is important to note that the definition of consistency may depend in part on the area of investigation, the outcome of interest or even the implications of the findings or results.

It is equally important to assess robustness to ensure appropriate interpretation of the results, taking into account the factors that may have an impact on them. Thus, it is imperative for every analytic plan to have some sensitivity analyses built into it.

The United States (US) Food and Drug Administration (FDA) and the European Medicines Agency (EMA), which offer guidance on Statistical Principles for Clinical Trials, state that “it is important to evaluate the robustness of the results and primary conclusions of the trial.” Robustness refers to “the sensitivity of the overall conclusions to various limitations of the data, assumptions, and analytic approaches to data analysis” [ 8 ]. The United Kingdom (UK) National Institute for Health and Clinical Excellence (NICE) also recommends the use of sensitivity analysis in “exploring alternative scenarios and the uncertainty in cost-effectiveness results” [ 9 ].

How often is sensitivity analysis reported in practice?

To evaluate how often sensitivity analyses are used in medical and health research, we surveyed the January 2012 editions of major medical journals (British Medical Journal, New England Journal of Medicine, the Lancet, Journal of the American Medical Association and the Canadian Medical Association Journal) and major health economics journals (PharmacoEconomics, Medical Decision Making, European Journal of Health Economics, Health Economics and the Journal of Health Economics). From every article that included some form of statistical analysis, we evaluated: i) the percentage of published articles that reported results of some sensitivity analyses; and ii) the types of sensitivity analyses that were performed. Table 1 provides a summary of the findings. Overall, the point prevalence of reported sensitivity analyses was about 26.7% (36/135), which seems very low. A higher percentage of papers published in health economics journals than in medical journals (30.8% vs. 20.3%) reported some sensitivity analyses. Among the papers in medical journals, 18 (28.1%) were RCTs, of which only 3 (16.6%) reported sensitivity analyses. Assessing robustness of the findings to different methods of analysis was the most common type of sensitivity analysis reported in both types of journals. Therefore, despite their importance, sensitivity analyses are under-used in practice. Further, sensitivity analyses are more common in health economics research—for example in conducting cost-effectiveness analyses, cost-utility analyses or budget-impact analyses—than in other areas of health or medical research.

Types of sensitivity analyses

In this section, we describe scenarios that may require sensitivity analyses, and how one could use sensitivity analyses to assess the robustness of the statistical analyses or findings of RCTs. These are not meant to be exhaustive, but rather to illustrate common situations where sensitivity analyses might be useful to consider (Table  2 ). In each case, we provide examples of actual studies where sensitivity analyses were performed, and the implications of these sensitivity analyses.

Impact of outliers

An outlier is an observation that is numerically distant from the rest of the data. It deviates markedly from the rest of the sample from which it comes [ 14 , 15 ]. Outliers are usually exceptional cases in a sample. The problem with outliers is that they can deflate or inflate the mean of a sample and therefore influence any estimates of treatment effect or association that are derived from the mean. To assess the potential impact of outliers, one would first assess whether or not any observations meet the definition of an outlier—using either a boxplot or z-scores [ 16 ]. Second, one could perform a sensitivity analysis with and without the outliers.
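A minimal sketch of this two-step approach, under illustrative assumptions (simulated data, a |z| > 3 rule for flagging outliers, and a Welch t-test as the comparison), is shown below: the treatment comparison is run once with and once without the flagged observations.

```python
"""Sketch of an outlier sensitivity analysis: flag observations with |z| > 3
and compare the treatment-effect estimate with and without them. The data
and the 3-SD rule are illustrative assumptions."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
treat = np.append(rng.normal(5, 2, 60), [25.0, 31.0])   # two extreme values
ctrl = rng.normal(4, 2, 60)

def trim_outliers(x, z_cut=3.0):
    """Drop observations whose z-score exceeds the chosen cut-off."""
    z = (x - x.mean()) / x.std(ddof=1)
    return x[np.abs(z) <= z_cut]

for label, t in [("with outliers", treat), ("without outliers", trim_outliers(treat))]:
    diff = t.mean() - ctrl.mean()
    p = stats.ttest_ind(t, ctrl, equal_var=False).pvalue   # Welch t-test
    print(f"{label:17s} mean difference {diff:5.2f}  p={p:.3f}")
```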

In a cost–utility analysis of a practice-based osteopathy clinic for subacute spinal pain, Williams et al. reported lower costs per quality of life year ratios when they excluded outliers [ 17 ]. In other words, there were certain participants in the trial whose costs were very high, and were making the average costs look higher than they probably were in reality. The observed cost per quality of life year was not robust to the exclusion of outliers, and changed when they were excluded.

In a trial of 218 patients hospitalized with angina and followed for 6 months, a primary analysis based on the intention-to-treat principle showed no statistically significant difference in reducing depression between a nurse-led cognitive self-help intervention program and standard care. Sensitivity analyses performed by excluding participants with high baseline levels of depression (outliers) showed a statistically significant reduction in depression in the intervention group compared with the control group. This implies that the results of the primary analysis were affected by the presence of patients with high baseline depression [ 18 ].

Impact of non-compliance or protocol deviations

In clinical trials some participants may not adhere to the intervention they were allocated to receive or comply with the scheduled treatment visits. Non-adherence or non-compliance is a form of protocol deviation. Other types of protocol deviations include switching between intervention and control arms (i.e. treatment switching or crossovers) [ 19 , 20 ], or not implementing the intervention as prescribed (i.e. intervention fidelity) [ 21 , 22 ].

Protocol deviations are very common in interventional research [ 23 – 25 ]. The potential impact of protocol deviations is the dilution of the treatment effect [ 26 , 27 ]. Therefore, it is crucial to determine the robustness of the results to the inclusion of data from participants who deviate from the protocol. Typically, for RCTs the primary analysis is based on an intention-to-treat (ITT) principle—in which participants are analyzed according to the arm to which they were randomized, irrespective of whether they actually received the treatment or completed the prescribed regimen [ 28 , 29 ]. Two common types of sensitivity analyses can be performed to assess the robustness of the results to protocol deviations: 1) per-protocol (PP) analysis—in which participants who violate the protocol are excluded from the analysis [ 30 ]; and 2) as-treated (AT) analysis—in which participants are analyzed according to the treatment they actually received [ 30 ]. The PP analysis provides the ideal scenario in which all the participants comply, and is more likely to show an effect; whereas the ITT analysis provides a “real life” scenario, in which some participants do not comply. It is more conservative, and less likely to show that the intervention is effective. For trials with repeated measures, some protocol violations which lead to missing data can be dealt with alternatively. This is covered in more detail in the next section.
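The toy example below contrasts ITT, per-protocol and as-treated estimates of a risk difference. The simulated trial, the 20% crossover rate and the simple risk-difference summary are assumptions made purely for illustration.

```python
"""Toy comparison of intention-to-treat, per-protocol and as-treated
estimates for a binary outcome. Everything here is simulated."""
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({"assigned": rng.integers(0, 2, n)})
# some participants assigned to treatment never actually receive it
df["received"] = np.where((df.assigned == 1) & (rng.random(n) < 0.2), 0, df.assigned)
# outcome depends on the treatment actually received
df["success"] = rng.random(n) < np.where(df.received == 1, 0.55, 0.40)

def risk_difference(d, arm_col):
    p1 = d.loc[d[arm_col] == 1, "success"].mean()
    p0 = d.loc[d[arm_col] == 0, "success"].mean()
    return p1 - p0

itt = risk_difference(df, "assigned")                              # as randomized
pp = risk_difference(df[df.assigned == df.received], "assigned")   # deviators excluded
at = risk_difference(df, "received")                               # as treated
print(f"ITT {itt:.3f}   per-protocol {pp:.3f}   as-treated {at:.3f}")
```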

A trial was designed to investigate the effects of an electronic screening and brief intervention to change risky drinking behaviour in university students. The results of the ITT analysis (on all 2336 participants who answered the follow-up survey) showed that the intervention had no significant effect. However, a sensitivity analysis based on the PP analysis (including only those with risky drinking at baseline and who answered the follow-up survey; n = 408) suggested a small beneficial effect on weekly alcohol consumption [ 31 ]. A reader might be less confident in the findings of the trial because of the inconsistency between the ITT and PP analyses—the ITT was not robust to sensitivity analyses. A researcher might choose to explore differences in the characteristics of the participants who were included in the ITT versus the PP analyses.

A study compared the long-term effects of surgical versus non-surgical management of chronic back pain. Both the ITT and AT analyses showed no significant difference between the two management strategies [ 32 ]. A reader would be more confident in the findings because the ITT and AT analyses were consistent—the ITT was robust to sensitivity analyses.

Impact of missing data

Missing data are common in every research study. This is a problem that can be broadly defined as “missing some information on the phenomena in which we are interested” [ 33 ]. Data can be missing for different reasons, including (1) non-response in surveys due to lack of interest, lack of time, nonsensical responses, and coding errors in data entry/transfer; (2) incompleteness of data in large registries due to missed appointments, incomplete capture of all eligible patients, and incomplete records; and (3) missingness in prospective studies as a result of loss to follow-up, dropouts, non-adherence, missed doses, and data entry errors.

The choice of how to deal with missing data would depend on the mechanisms of missingness. In this regard, data can be missing at random (MAR), missing not at random (MNAR), or missing completely at random (MCAR). When data are MAR, the missing data are dependent on some other observed variables rather than any unobserved one. For example, consider a trial to investigate the effect of pre-pregnancy calcium supplementation on hypertensive disorders in pregnancy. Missing data on the hypertensive disorders is dependent (conditional) on being pregnant in the first place. When data are MCAR, the cases with missing data may be considered a random sample drawn from all the cases. In other words, there is no “cause” of missingness. Consider the example of a trial comparing a new cancer treatment to standard treatment in which participants are followed at 4, 8, 12 and 16 months. If a participant misses the follow up at the 8th and 16th months and these are unrelated to the outcome of interest, in this case mortality, then this missing data is MCAR. Reasons such as a clinic staff being ill or equipment failure are often unrelated to the outcome of interest. However, the MCAR assumption is often challenging to prove because the reason data is missing may not be known and therefore it is difficult to determine if it is related to the outcome of interest. When data are MNAR, missingness is dependent on some unobserved data. For example, in the case above, if the participant missed the 8th month appointment because he was feeling worse or the 16th month appointment because he was dead, the missingness is dependent on the data not observed because the participant was absent. When data are MAR or MCAR, they are often referred to as ignorable (provided the cause of MAR is taken into account). MNAR on the other hand, is nonignorable missingness. Ignoring the missingness in such data leads to biased parameter estimates [ 34 ]. Ignoring missing data in analyses can have implications on the reliability, validity and generalizability of research findings.

The best way to deal with missing data is prevention, by steps taken in the design and data collection stages, some of which have been described by Little et al. [ 35 ]. But this is difficult to achieve in most cases. There are two main approaches to handling missing data: i) ignore them—and use complete case analysis; and ii) impute them—using either single or multiple imputation techniques. Imputation is one of the most commonly used approaches to handling missing data. Examples of single imputation methods include hot deck, cold deck method, mean imputation, regression technique, last observation carried forward (LOCF) and composite methods—which uses a combination of the above methods to impute missing values. Single imputation methods often lead to biased estimates and under-estimation of the true variability in the data. Multiple imputation (MI) technique is currently the best available method of dealing with missing data under the assumption that data are missing at random (MAR) [ 33 , 36 – 38 ]. MI addresses the limitations of single imputation by using multiple imputed datasets which yield unbiased estimates, and also accounts for the within- and between-dataset variability. Bayesian methods using statistical models that assume a prior distribution for the missing data can also be used to impute data [ 35 ].

It is important to note that ignoring missing data in the analysis would be implicitly assuming that the data are MCAR, an assumption that is often hard to verify in reality.

There are some statistical approaches to dealing with missing data that do not necessarily require formal imputation methods. For example, in studies using continuous outcomes, linear mixed models for repeated measures are used for analyzing outcomes measured repeatedly over time [ 39 , 40 ]. For categorical responses or count data, generalized estimating equations (GEE) and random-effects generalized linear mixed models (GLMM) may be used [ 41 , 42 ]. These models assume that missing data are MAR. If this assumption is valid, then a complete-case analysis that includes the predictors of missingness will provide consistent estimates of the parameters.

The choice of whether to ignore or impute missing data, and how to impute it, may affect the findings of the trial. Although one approach (ignore or impute, and if the latter, how to impute) should be made a priori, a sensitivity analysis can be done with a different approach to see how “robust” the primary analysis is to the chosen method for handling missing data.
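The sketch below compares a complete-case estimate with a deliberately simplified, hand-rolled multiple imputation (stochastic regression imputation pooled with Rubin's rules); a real analysis would normally use a dedicated MI routine that also draws the imputation-model parameters from their posterior. The simulated data and the MAR mechanism are illustrative assumptions.

```python
"""Sketch of a missing-data sensitivity analysis: complete-case estimate vs.
a simplified multiple imputation (stochastic regression imputation pooled
with Rubin's rules). Data, missingness mechanism and model are illustrative."""
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 400
arm = rng.integers(0, 2, n)
baseline = rng.normal(50, 10, n)
y = 2.0 * arm + 0.5 * baseline + rng.normal(0, 5, n)
# outcome missing more often for low baseline values (MAR given baseline)
missing = rng.random(n) < 1 / (1 + np.exp((baseline - 45) / 5))
y_obs = np.where(missing, np.nan, y)

X = sm.add_constant(np.column_stack([arm, baseline]))

# complete-case analysis
cc = sm.OLS(y_obs[~missing], X[~missing]).fit()

# simplified MI: draw missing y from the fitted regression plus residual noise
m, effects, variances = 20, [], []
sigma = np.sqrt(cc.scale)
for _ in range(m):
    y_imp = y_obs.copy()
    y_imp[missing] = X[missing] @ cc.params + rng.normal(0, sigma, missing.sum())
    fit = sm.OLS(y_imp, X).fit()
    effects.append(fit.params[1]); variances.append(fit.bse[1] ** 2)
pooled = np.mean(effects)
pooled_var = np.mean(variances) + (1 + 1 / m) * np.var(effects, ddof=1)  # Rubin's rules
print(f"complete-case arm effect {cc.params[1]:.2f} (SE {cc.bse[1]:.2f})")
print(f"multiple-imputation arm effect {pooled:.2f} (SE {np.sqrt(pooled_var):.2f})")
```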

A 2011 paper reported the sensitivity analyses of different strategies for imputing missing data in cluster RCTs with a binary outcome using the community hypertension assessment trial (CHAT) as an example. They found that variance in the treatment effect was underestimated when the amount of missing data was large and the imputation strategy did not take into account the intra-cluster correlation. However, the effects of the intervention under various methods of imputation were similar. The CHAT intervention was not superior to usual care [ 43 ].

In a trial comparing methotrexate with placebo in the treatment of psoriatic arthritis, the authors reported both an intention-to-treat analysis (using multiple imputation techniques to account for missing data) and a complete case analysis (ignoring the missing data). The complete case analysis, which is less conservative, showed some borderline improvement in the primary outcome (psoriatic arthritis response criteria), while the intention-to-treat analysis did not [ 44 ]. A reader would be less confident about the effects of methotrexate on psoriatic arthritis, due to the discrepancy between the results with imputed data (ITT) and the complete case analysis.

Impact of different definitions of outcomes (e.g. different cut-off points for binary outcomes)

Often, an outcome is defined by achieving or not achieving a certain level or threshold of a measure. For example in a study measuring adherence rates to medication, levels of adherence can be dichotomized as achieving or not achieving at least 80%, 85% or 90% of pills taken. The choice of the level a participant has to achieve can affect the outcome—it might be harder to achieve 90% adherence than 80%. Therefore, a sensitivity analysis could be performed to see how redefining the threshold changes the observed effect of a given intervention.
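A brief sketch of this idea, using made-up adherence data: the outcome is dichotomized at 80%, 85% and 90%, and the between-arm comparison is recomputed at each threshold. The thresholds and the use of Fisher's exact test are illustrative choices.

```python
"""Sketch of re-defining a dichotomized outcome at several thresholds and
recomputing the between-arm comparison each time. Data are simulated."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
adherence_t = rng.beta(9, 2, 150)      # hypothetical adherence proportions, treatment
adherence_c = rng.beta(8, 2, 150)      # hypothetical adherence proportions, control

for cut in (0.80, 0.85, 0.90):
    a = (adherence_t >= cut).sum(); b = len(adherence_t) - a
    c = (adherence_c >= cut).sum(); d = len(adherence_c) - c
    rd = a / len(adherence_t) - c / len(adherence_c)     # risk difference
    p = stats.fisher_exact([[a, b], [c, d]])[1]
    print(f"cut-off {cut:.0%}: risk difference {rd:+.3f}, Fisher p={p:.3f}")
```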

In a trial comparing caspofungin to amphotericin B for febrile neutropoenic patients, a sensitivity analysis was conducted to investigate the impact of different definitions of fever resolution as part of a composite endpoint which included: resolution of any baseline invasive fungal infection, no breakthrough invasive fungal infection, survival, no premature discontinuation of study drug, and fever resolution for 48 hours during the period of neutropenia. They found that response rates were higher when less stringent fever resolution definitions were used, especially in low-risk patients. The modified definitions of fever resolution were: no fever for 24 hours before the resolution of neutropenia; no fever at the 7-day post-therapy follow-up visit; and removal of fever resolution completely from the composite endpoint. This implies that the efficacy of both medications depends somewhat on the definition of the outcomes [ 45 ].

In a phase II trial comparing minocycline and creatine to placebo for Parkinson’s disease, a sensitivity analysis was conducted based on another definition (threshold) for futility. In the primary analysis, a predetermined futility threshold was set at a 30% reduction in mean change in Unified Parkinson’s Disease Rating Scale (UPDRS) score, derived from historical control data. If minocycline or creatine did not bring about at least a 30% reduction in UPDRS score, they would be considered futile and no further testing would be conducted. Based on the data derived from the current control (placebo) group, a new, more stringent threshold of 32.4% was used for the sensitivity analysis. The findings from the primary analysis and the sensitivity analysis both confirmed that neither creatine nor minocycline could be rejected as futile and that both should be tested in Phase III trials [ 46 ]. A reader would be more confident of these robust findings.

Impact of different methods of analysis to account for clustering or correlation

Interventions can be administered to individuals, but they can also be administered to clusters of individuals, or naturally occurring groups. For example, one might give an intervention to students in one class and compare their outcomes to students in another class – the class is the cluster. Clusters can also be patients treated by the same physician, physicians in the same practice center or hospital, or participants living in the same community. Likewise, in the same trial, participants may be recruited from multiple sites or centers, each of which represents a cluster. Patients or elements within a cluster often have some appreciable degree of homogeneity compared to patients between clusters. In other words, members of the same cluster are more likely to be similar to each other than they are to members of another cluster, and this similarity is then reflected in the correlation between their outcomes on the measure of interest.

There are several methods of accounting or adjusting for similarities within clusters, or “clustering” in studies where this phenomenon is expected or exists as part of the design (e.g., in cluster randomization trials). Therefore, in assessing the impact of clustering one can build into the analytic plans two forms of sensitivity analyses: i) analysis with and without taking clustering into account—comparing the analysis that ignores clustering (i.e. assumes that the data are independent) to one primary method chosen to account for clustering; ii) analysis that compares several methods of accounting for clustering.
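The first of these two forms is sketched below on a simulated cluster-randomized trial: an ordinary logistic regression that ignores clustering is compared with a GEE fit using an exchangeable working correlation. The simulated trial (20 clusters with a cluster-level random effect) and the choice of statsmodels are assumptions for illustration only.

```python
"""Sketch comparing an analysis that ignores clustering (ordinary logistic
regression) with a GEE analysis using an exchangeable working correlation.
The simulated cluster trial is purely illustrative."""
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
clusters, per_cluster = 20, 30
cluster_id = np.repeat(np.arange(clusters), per_cluster)
arm = np.repeat(rng.integers(0, 2, clusters), per_cluster)   # cluster randomization
u = np.repeat(rng.normal(0, 0.8, clusters), per_cluster)     # cluster-level effect
p = 1 / (1 + np.exp(-(-0.5 + 0.4 * arm + u)))
y = rng.binomial(1, p)
X = sm.add_constant(arm)

naive = sm.GLM(y, X, family=sm.families.Binomial()).fit()    # ignores clustering
gee = sm.GEE(y, X, groups=cluster_id, family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
print(f"ignoring clustering: OR {np.exp(naive.params[1]):.2f} (SE {naive.bse[1]:.3f})")
print(f"GEE (exchangeable):  OR {np.exp(gee.params[1]):.2f} (SE {gee.bse[1]:.3f})")
```

The point estimates are often similar, but the standard error from the analysis that ignores clustering is typically too small, which is why the clustered analysis is preferred as the primary analysis.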

Correlated data may also occur in longitudinal studies through repeat or multiple measurements from the same patient, taken over time or based on multiple responses in a single survey. Ignoring the potential correlation between several measurements from an individual can lead to inaccurate conclusions [ 47 ].

Below are a few examples of studies that compared the results obtained when different methods were or were not used to account for clustering. It is noteworthy that the analytical approaches for cluster RCTs and multi-site RCTs are similar.

Ma et al. performed sensitivity analyses of different methods of analysing cluster RCTs [ 48 ]. In this paper, they compared three cluster-level methods (un-weighted linear regression, weighted linear regression and random-effects meta-regression) to six individual-level analysis methods (standard logistic regression, robust standard errors approach, GEE, random-effects meta-analytic approach, random-effects logistic regression and Bayesian random-effects regression). Using data from the CHAT trial, all nine methods provided similar results, reinforcing the finding that the CHAT intervention was not superior to usual care.

Peters et al. conducted sensitivity analyses to compare different methods—three cluster-level methods (un-weighted regression of practice log odds, regression of log odds weighted by their inverse variance, and random-effects meta-regression of log odds with cluster as a random effect) and five individual-level methods (standard logistic regression ignoring clustering, robust standard errors, GEE, random-effects logistic regression and Bayesian random-effects logistic regression)—for analyzing cluster randomized trials, using an example involving a factorial design [ 13 ]. In this analysis, they demonstrated that the methods used in the analysis of cluster randomized trials could give varying results, with standard logistic regression ignoring clustering being the least conservative.

Cheng et al. used sensitivity analyses to compare different methods (six models for clustered binary outcomes and three models for clustered nominal outcomes) of analysing correlated data in discrete choice surveys [ 49 ]. The results were robust to various statistical models, but showed more variability in the presence of a larger cluster effect (higher within-patient correlation).

A trial evaluated the effects of lansoprazole on gastro-esophageal reflux disease in children with asthma recruited from 19 clinics. The primary analysis was based on GEE to determine the effect of lansoprazole in reducing asthma symptoms. Subsequently, they performed a sensitivity analysis by including the study site as a covariate. Their finding that lansoprazole did not significantly improve symptoms was robust to this sensitivity analysis [ 50 ].

In addition to comparing the performance of different methods to estimate treatment effects on a continuous outcome in simulated multicenter randomized controlled trials [ 12 ], the authors used data from the Computerization of Medical Practices for the Enhancement of Therapeutic Effectiveness (COMPETE) II [ 51 ] to assess the robustness of the primary results (based on GEE to adjust for clustering by provider of care) under different methods of adjusting for clustering. The results, which showed that a shared electronic decision support system improved care and outcomes in diabetic patients, were robust under different methods of analysis.

Impact of competing risks in analysis of trials with composite outcomes

A competing risk event happens in situations where multiple events are likely to occur in a way that the occurrence of one event may prevent other events from being observed [ 48 ]. For example, in a trial using a composite of death, myocardial infarction or stroke, if someone dies, they cannot experience a subsequent event, or stroke or myocardial infarction—death can be a competing risk event. Similarly, death can be a competing risk in trials of patients with malignant diseases where thrombotic events are important. There are several options for dealing with competing risks in survival analyses: (1) to perform a survival analysis for each event separately, where the other competing event(s) is/are treated as censored; the common representation of survival curves using the Kaplan-Meier estimator is in this context replaced by the cumulative incidence function (CIF) which offers a better interpretation of the incidence curve for one risk, regardless of whether the competing risks are independent; (2) to use a proportional sub-distribution hazard model (Fine & Grey approach) in which subjects that experience other competing events are kept in the risk set for the event of interest (i.e. as if they could later experience the event); (3) to fit one model, rather than separate models, taking into account all the competing risks together (Lunn-McNeill approach) [ 13 ]. Therefore, the best approach to assessing the influence of a competing risk would be to plan for sensitivity analysis that adjusts for the competing risk event.
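To illustrate why the cumulative incidence function (CIF) is preferred over a Kaplan-Meier-style estimate that simply censors the competing event, the hand-rolled sketch below computes both on simulated data at an arbitrary horizon of 12 time units. The times, event codes and horizon are illustrative; in practice an established package (or the Fine and Gray model mentioned above) would be used rather than this simple estimator.

```python
"""Sketch contrasting a naive Kaplan-Meier-based estimate (competing events
treated as censoring) with a nonparametric cumulative incidence function
(CIF) for the event of interest. All inputs are simulated."""
import numpy as np

rng = np.random.default_rng(4)
n = 300
t_event = rng.exponential(10, n)        # time to event of interest (e.g. VTE)
t_death = rng.exponential(15, n)        # time to competing event (death)
t_cens = rng.uniform(0, 20, n)          # administrative censoring
time = np.minimum.reduce([t_event, t_death, t_cens])
status = np.select([t_event <= np.minimum(t_death, t_cens),
                    t_death <= np.minimum(t_event, t_cens)], [1, 2], default=0)

order = np.argsort(time)
time, status = time[order], status[order]

surv, naive_surv, cif = 1.0, 1.0, 0.0
horizon = 12.0
for i, t in enumerate(time):
    if t > horizon:
        break
    at_risk = n - i
    d1 = status[i] == 1          # event of interest
    d_any = status[i] in (1, 2)  # any event
    cif += surv * d1 / at_risk           # CIF increment uses overall survival
    surv *= 1 - d_any / at_risk          # overall event-free survival
    naive_surv *= 1 - d1 / at_risk       # KM treating death as censoring
print(f"naive 1-KM estimate at t={horizon}: {1 - naive_surv:.3f}")
print(f"CIF estimate at t={horizon}:        {cif:.3f}")
```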

A previously-reported trial compared low molecular weight heparin (LMWH) with oral anticoagulant therapy for the prevention of recurrent venous thromboembolism (VTE) in patients with advanced cancer, and a subsequent study presented sensitivity analyses comparing the results from standard survival analysis (Kaplan-Meier method) with those from competing risk methods—namely, the cumulative incidence function (CIF) and Gray's test [ 52 ]. The results using both methods were similar. This strengthened their confidence in the conclusion that LMWH reduced the risk of recurrent VTE.

For patients at increased risk of end stage renal disease (ESRD) but also of premature death not related to ESRD, such as patients with diabetes or with vascular disease, analyses considering the two events as different outcomes may be misleading if the possibility of dying before the development of ESRD is not taken into account [ 49 ]. Different studies performing sensitivity analyses demonstrated that the results on predictors of ESRD and death for any cause were dependent on whether the competing risks were taken into account or not [ 53 , 54 ], and on which competing risk method was used [ 55 ]. These studies further highlight the need for a sensitivity analysis of competing risks when they are present in trials.

Impact of baseline imbalance in RCTs

In RCTs, randomization is used to balance the expected distribution of the baseline or prognostic characteristics of the patients across treatment arms. Therefore, the primary analysis is typically based on the ITT approach, unadjusted for baseline characteristics. However, some residual imbalance can still occur by chance. One can perform a sensitivity analysis using a multivariable analysis to adjust for hypothesized residual baseline imbalances and assess their impact on the effect estimates.
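A minimal sketch of such an adjustment is shown below: the unadjusted logistic-regression treatment effect is compared with an estimate adjusted for a prognostic baseline covariate that is imbalanced between arms. The simulated data (with the imbalance built in to mimic a chance imbalance) and the single-covariate logistic model are illustrative assumptions.

```python
"""Sketch of a covariate-adjustment sensitivity analysis: unadjusted vs.
adjusted logistic-regression treatment effect. Data are simulated."""
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 150                                      # small trial: chance imbalance likely
arm = rng.integers(0, 2, n)
# prognostic factor; the dependence on arm is built in to mimic a chance imbalance
severity = rng.normal(0, 1, n) + 0.3 * arm
p = 1 / (1 + np.exp(-(-0.2 + 0.5 * arm + 1.0 * severity)))
y = rng.binomial(1, p)

unadj = sm.Logit(y, sm.add_constant(arm)).fit(disp=0)
adj = sm.Logit(y, sm.add_constant(np.column_stack([arm, severity]))).fit(disp=0)
print(f"unadjusted OR {np.exp(unadj.params[1]):.2f}")
print(f"adjusted OR   {np.exp(adj.params[1]):.2f}")
```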

A paper presented a simulation study where the risk of the outcome, effect of the treatment, power and prevalence of the prognostic factors, and sample size were all varied to evaluate their effects on the treatment estimates. Logistic regression models were compared with and without adjustment for the prognostic factors. The study concluded that the probability of prognostic imbalance in small trials could be substantial. Also, covariate adjustment improved estimation accuracy and statistical power [ 56 ].

In a trial testing the effectiveness of enhanced communication therapy for aphasia and dysarthria after stroke, the authors conducted a sensitivity analysis to adjust for baseline imbalances. Both primary and sensitivity analysis showed that enhanced communication therapy had no additional benefit [ 57 ].

Impact of distributional assumptions

Most statistical analyses rely on distributional assumptions for observed data (e.g. the Normal distribution for continuous outcomes, the Poisson distribution for count data, or the binomial distribution for binary outcome data). It is important not only to test the goodness-of-fit of these distributions, but also to plan for sensitivity analyses using other suitable distributions. For example, for continuous data, one can redo the analysis assuming a Student-t distribution—a symmetric, bell-shaped distribution like the Normal distribution, but with thicker tails; for count data, one can use the negative binomial distribution—which is useful for assessing the robustness of the results when over-dispersion is accounted for [ 52 ]. Bayesian analyses routinely include sensitivity analyses to assess the robustness of findings under different models for the data and prior distributions [ 58 ]. Analyses based on parametric methods—which often rely on strong distributional assumptions—may also need to be evaluated for robustness using non-parametric methods. The latter often make less stringent distributional assumptions. However, it is essential to note that, in general, non-parametric methods are less efficient (i.e. have less statistical power) than their parametric counterparts if the data are Normally distributed.
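As a small illustration of the count-data case, the sketch below fits the same treatment comparison with a Poisson model and then with a negative binomial model that allows over-dispersion; the simulated over-dispersed data and the statsmodels models are assumptions made for the example only.

```python
"""Sketch of a distributional-assumption sensitivity analysis for count data:
the same comparison is fitted with a Poisson model and refitted with a
negative binomial model. Data are simulated with over-dispersion."""
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 400
arm = rng.integers(0, 2, n)
mu = np.exp(0.8 - 0.3 * arm)
# over-dispersed counts with mean mu (negative binomial draws)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))

X = sm.add_constant(arm)
pois = sm.Poisson(y, X).fit(disp=0)
nb = sm.NegativeBinomial(y, X).fit(disp=0)
print(f"Poisson rate ratio      {np.exp(pois.params[1]):.2f} (SE {pois.bse[1]:.3f})")
print(f"Neg-binomial rate ratio {np.exp(nb.params[1]):.2f} (SE {nb.bse[1]:.3f})")
```

If the two models give similar rate ratios but different standard errors, the conclusion about the treatment effect may hinge on how over-dispersion is handled, which is exactly what this kind of sensitivity analysis is meant to reveal.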

Ma et al. performed sensitivity analyses based on Bayesian and classical methods for analysing cluster RCTs with a binary outcome in the CHAT trial. The similarities in the results after using the different methods confirmed the results of the primary analysis: the CHAT intervention was not superior to usual care [ 10 ].

A negative binomial regression model was used [ 52 ] to analyze discrete outcome data from a clinical trial designed to evaluate the effectiveness of a pre-habilitation program in preventing functional decline among physically frail, community-living older persons. The negative binomial model provided a better fit to the data than the Poisson regression model, and it offers an alternative approach for analyzing discrete data where over-dispersion is a problem [ 59 ].

Commonly asked questions about sensitivity analyses

Q: Do I need to adjust the overall level of significance for performing sensitivity analyses?

A: No. Sensitivity analysis is typically a re-analysis of either the same outcome using different approaches, or different definitions of the outcome—with the primary goal of assessing how these changes impact the conclusions. Essentially everything else, including the criterion for statistical significance, needs to be kept constant so that any change in the results can be attributed to the factor varied in the sensitivity analysis.

Q: Do I have to report all the results of the sensitivity analyses?

A: Yes, especially if the results are different or lead to a different conclusion from the original results whose sensitivity was being assessed. However, if the results remain robust (i.e. unchanged), then a brief statement to this effect may suffice.

Q: Can I perform sensitivity analyses posthoc?

A: It is desirable to document all planned analyses, including sensitivity analyses, in the protocol a priori. However, one cannot always anticipate all the challenges that may occur during the conduct of a study and that may require additional sensitivity analyses. In that case, one needs to incorporate the anticipated sensitivity analyses in the statistical analysis plan (SAP), which needs to be completed before analyzing the data. A clear rationale is needed for every sensitivity analysis. Sensitivity analyses may also be conducted post hoc.

Q: How do I choose between the results of different sensitivity analyses? (i.e. which results are the best?)

A: The goal of sensitivity analyses is not to select the “best” results. Rather, the aim is to assess the robustness or consistency of the results under different methods, subgroups, definitions, assumptions and so on. The assessment of robustness is often based on the magnitude, direction or statistical significance of the estimates. You cannot use a sensitivity analysis to choose an alternative conclusion for your study. Rather, you should state the conclusion based on your primary analysis and present your sensitivity analyses as an indication of how confident you can be that it represents the truth. If the sensitivity analyses suggest that the primary analysis is not robust, this may point to the need for future research to address the source of the inconsistency. Your study cannot answer the question of which results are best. To answer the question of which method is best and under what conditions, simulation studies comparing the different approaches on the basis of bias, precision, coverage or efficiency may be necessary.

Q: When should one perform sensitivity analysis?

A: The default position should be to plan for sensitivity analysis in every clinical trial. Thus, all studies need to include some sensitivity analysis to check the robustness of the primary findings. All statistical methods used to analyze data from clinical trials rely on assumptions, which need to be tested whenever possible and the robustness of the results to those assumptions assessed through some sensitivity analyses. Similarly, missing data and protocol deviations are common occurrences in many trials, and their impact on inferences needs to be assessed.

Q: How many sensitivity analyses can one perform for a single primary analysis?

A: The number is not an important factor in determining what sensitivity analyses to perform; the most important factor is the rationale for doing any sensitivity analysis. Understanding the nature of the data and having some content expertise are useful in determining which, and how many, sensitivity analyses to perform. For example, varying the ways of dealing with missing data is unlikely to change the results if only 1% of the data are missing. Likewise, understanding the distribution of certain variables can help to determine which cut points would be relevant. Typically, it is advisable to limit sensitivity analyses to the primary outcome; conducting multiple sensitivity analyses on all outcomes is often neither practical nor necessary.

Q: How many factors can I vary in performing sensitivity analyses?

A: Ideally, one can study the impact of all key elements using a factorial design, which allows assessment of the impact of individual factors as well as their joint effects. Alternatively, one can vary one factor at a time in order to assess whether that factor is responsible for any resulting impact. For example, a sensitivity analysis to assess the impact of the Normality assumption (an analysis assuming Normality, e.g. a t-test, vs. an analysis not assuming Normality, e.g. one based on a sign test) and of an outlier (analysis with and without the outlier) can be achieved through a 2 × 2 factorial design.
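
As a minimal, hypothetical illustration of such a 2 × 2 grid (not code from the tutorial), the following Python/SciPy sketch crosses the analysis method (t-test vs. sign test) with the handling of a single extreme observation, for a one-sample comparison of simulated paired differences against zero.

```python
# Illustrative 2 x 2 sensitivity grid: (t-test vs. sign test) x (with vs. without outlier).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
diffs = rng.normal(loc=0.4, scale=1.0, size=30)   # simulated paired differences
diffs_with_outlier = np.append(diffs, 8.0)        # add one hypothetical extreme value

def t_test_p(x):
    return stats.ttest_1samp(x, popmean=0.0).pvalue

def sign_test_p(x):
    # Under H0 (median difference = 0) the count of positive differences is Binomial(n, 0.5).
    positives = int(np.sum(x > 0))
    return stats.binomtest(positives, n=len(x), p=0.5).pvalue

for data_label, data in [("without outlier", diffs), ("with outlier", diffs_with_outlier)]:
    for test_label, test in [("t-test", t_test_p), ("sign test", sign_test_p)]:
        print(f"{test_label:9s} | {data_label:16s} | p = {test(data):.3f}")
```

Agreement across the four cells supports robustness to both the distributional assumption and the outlier; disagreement indicates which factor is driving the difference.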

Q: What is the difference between secondary analyses and sensitivity analyses?

A: Secondary analyses are typically analyses of secondary outcomes. Like primary analyses, which deal with the primary outcome(s), such analyses need to be documented in the protocol or SAP. In most studies such analyses are exploratory, because most studies are not powered for secondary outcomes. They serve to provide supporting evidence that the effects reported for the primary outcome are consistent with the underlying biology. They are different from sensitivity analyses as described above.

Q: What is the difference between subgroup analyses and sensitivity analyses?

A: Subgroup analyses are intended to assess whether the effect is similar across specified groups of patients or is modified by certain patient characteristics [ 60 ]. If the primary results are statistically significant, subgroup analyses are intended to assess whether the observed effect is consistent across the underlying patient subgroups, which may be viewed as a form of sensitivity analysis. In general, for subgroup analyses one is interested in the results for each subgroup, whereas in subgroup “sensitivity” analyses one is interested in the similarity of results across subgroups (i.e. robustness across subgroups). Typically, subgroup analyses require specification of the subgroup hypothesis and rationale, and they are performed through the inclusion of an interaction term (i.e. subgroup variable × main exposure variable) in the regression model. They may also require adjustment of alpha, the overall level of significance. Furthermore, most studies are not usually powered for subgroup analyses.
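
As a hypothetical sketch of that interaction-term approach (simulated data; not taken from the tutorial), a logistic regression with a treatment × subgroup term can be fitted as follows in Python/statsmodels.

```python
# Illustrative only: testing effect modification via a treatment x subgroup interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 1000
treat = rng.integers(0, 2, size=n)
subgroup = rng.integers(0, 2, size=n)          # e.g. 1 = diabetic, 0 = non-diabetic
logit_p = -0.5 - 0.4 * treat + 0.2 * subgroup  # no true interaction in this simulation
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))
df = pd.DataFrame({"y": y, "treat": treat, "subgroup": subgroup})

# "treat * subgroup" expands to both main effects plus the interaction term
fit = smf.logit("y ~ treat * subgroup", data=df).fit(disp=False)
print(fit.params)
print(f"Interaction p-value: {fit.pvalues['treat:subgroup']:.3f}")
```

A small interaction p-value would suggest the treatment effect differs across subgroups; a large one is consistent with robustness of the effect across subgroups, bearing in mind that such tests are usually underpowered.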

Reporting of sensitivity analyses

There has been considerable attention paid to enhancing the transparency of reporting of clinical trials. This has led to several reporting guidelines, starting with the CONSORT Statement [ 61 ] in 1996 and its extensions [ http://www.equator-network.org ]. None of these guidelines specifically addresses how sensitivity analyses should be reported. On the other hand, there is some guidance on how sensitivity analyses should be reported in economic analyses [ 62 ], which may partly explain the differential rates of reporting of sensitivity analyses shown in Table  1 . We strongly encourage modifications of all reporting guidelines to include items on sensitivity analyses, as a way to enhance their use and reporting. The proposed reporting changes are as follows:

In Methods Section: Report the planned or post hoc sensitivity analyses and the rationale for each.

In Results Section: Report whether the results or conclusions of the sensitivity analyses are similar to those based on the primary analysis. If similar, a brief statement that the results or conclusions remain robust is sufficient. If different, report the results of the sensitivity analyses alongside the primary results.

In Discussion Section: Discuss the key limitations and the implications of the results of the sensitivity analyses for the conclusions or findings. This can be done by describing what changes the sensitivity analyses bring to the interpretation of the data, and whether the sensitivity analyses are more stringent or more relaxed than the primary analysis.

Some concluding remarks

Sensitivity analyses play an important role in checking the robustness of the conclusions from clinical trials, and they are important in interpreting or establishing the credibility of the findings. If the results remain robust under different assumptions, methods or scenarios, this strengthens their credibility. The results of our brief survey of the January 2012 editions of major medical and health economics journals show that their use is very low. We recommend that some sensitivity analysis should be the default plan in the statistical or economic analysis of any clinical trial. Investigators need to identify any key assumptions, variations, or methods that may impact or influence the findings, and plan to conduct some sensitivity analyses as part of their analytic strategy. The final report should document the planned or post hoc sensitivity analyses, their rationale, the corresponding results, and a discussion of their consequences for the overall findings.

Abbreviations

  • SA: Sensitivity analysis
  • US: United States
  • FDA: Food and Drug Administration
  • EMA: European Medicines Association
  • UK: United Kingdom
  • NICE: National Institute of Health and Clinical Excellence
  • RCT: Randomized controlled trial
  • ITT: Intention-to-treat
  • PP: Per-protocol
  • LOCF: Last observation carried forward
  • MI: Multiple imputation
  • MAR: Missing at random
  • GEE: Generalized estimating equations
  • GLMM: Generalized linear mixed models
  • CHAT: Community hypertension assessment trial
  • PSA: Prostate specific antigen
  • CIF: Cumulative incidence function
  • ESRD: End stage renal disease
  • IV: Instrumental variable
  • ANCOVA: Analysis of covariance
  • SAP: Statistical analysis plan
  • CONSORT: Consolidated Standards of Reporting Trials

References

Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, Robson R, Thabane M, Giangregorio L, Goldsmith CH: A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol. 2010, 10: 1-10.1186/1471-2288-10-1.

Schneeweiss S: Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol Drug Saf. 2006, 15 (5): 291-303. 10.1002/pds.1200.

Viel JF, Pobel D, Carre A: Incidence of leukaemia in young people around the La Hague nuclear waste reprocessing plant: a sensitivity analysis. Stat Med. 1995, 14 (21–22): 2459-2472.

Goldsmith CH, Gafni A, Drummond MF, Torrance GW, Stoddart GL: Sensitivity Analysis and Experimental Design: The Case of Economic Evaluation of Health Care Programmes. Proceedings of the Third Canadian Conference on Health Economics 1986. 1987, Winnipeg MB: The University of Manitoba Press

Saltelli A, Tarantola S, Campolongo F, Ratto M: Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. 2004, New York, NY: Willey

Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, Saisana M, Tarantola S: Global Sensitivity Analysis: The Primer. 2008, New York, NY: Wiley-Interscience

Hunink MGM, Glasziou PP, Siegel JE, Weeks JC, Pliskin JS, Elstein AS, Weinstein MC: Decision Making in Health and Medicine: Integrating Evidence and Values. 2001, Cambridge: Cambridge University Press

USFDA: International Conference on Harmonisation; Guidance on Statistical Principles for Clinical Trials. Guideline E9. Statistical principles for clinical trials. Federal Register, 16 September 1998, Vol. 63, No. 179, p. 49583. [ http://www.fda.gov/downloads/RegulatoryInformation/Guidances/UCM129505.pdf ],

NICE: Guide to the methods of technology appraisal. [ http://www.nice.org.uk/media/b52/a7/tamethodsguideupdatedjune2008.pdf ],

Ma J, Thabane L, Kaczorowski J, Chambers L, Dolovich L, Karwalajtys T, Levitt C: Comparison of Bayesian and classical methods in the analysis of cluster randomized controlled trials with a binary outcome: the Community Hypertension Assessment Trial (CHAT). BMC Med Res Methodol. 2009, 9: 37-10.1186/1471-2288-9-37.

Peters TJ, Richards SH, Bankhead CR, Ades AE, Sterne JA: Comparison of methods for analysing cluster randomized trials: an example involving a factorial design. Int J Epidemiol. 2003, 32 (5): 840-846. 10.1093/ije/dyg228.

Chu R, Thabane L, Ma J, Holbrook A, Pullenayegum E, Devereaux PJ: Comparing methods to estimate treatment effects on a continuous outcome in multicentre randomized controlled trials: a simulation study. BMC Med Res Methodol. 2011, 11: 21-10.1186/1471-2288-11-21.

Kleinbaum DG, Klein M: Survival Analysis – A-Self Learning Text. 2012, Springer, 3

Barnett V, Lewis T: Outliers in Statistical Data. 1994, John Wiley & Sons, 3

Grubbs FE: Procedures for detecting outlying observations in samples. Technometrics. 1969, 11: 1-21. 10.1080/00401706.1969.10490657.

Thabane L, Akhtar-Danesh N: Guidelines for reporting descriptive statistics in health research. Nurse Res. 2008, 15 (2): 72-81.

Williams NH, Edwards RT, Linck P, Muntz R, Hibbs R, Wilkinson C, Russell I, Russell D, Hounsome B: Cost-utility analysis of osteopathy in primary care: results from a pragmatic randomized controlled trial. Fam Pract. 2004, 21 (6): 643-650. 10.1093/fampra/cmh612.

Zetta S, Smith K, Jones M, Allcoat P, Sullivan F: Evaluating the Angina Plan in Patients Admitted to Hospital with Angina: A Randomized Controlled Trial. Cardiovascular Therapeutics. 2011, 29 (2): 112-124. 10.1111/j.1755-5922.2009.00109.x.

Morden JP, Lambert PC, Latimer N, Abrams KR, Wailoo AJ: Assessing methods for dealing with treatment switching in randomised controlled trials: a simulation study. BMC Med Res Methodol. 2011, 11: 4-10.1186/1471-2288-11-4.

White IR, Walker S, Babiker AG, Darbyshire JH: Impact of treatment changes on the interpretation of the Concorde trial. AIDS. 1997, 11 (8): 999-1006. 10.1097/00002030-199708000-00008.

Borrelli B: The assessment, monitoring, and enhancement of treatment fidelity in public health clinical trials. J Public Health Dent. 2011, 71 (Suppl 1): S52-S63.

Lawton J, Jenkins N, Darbyshire JL, Holman RR, Farmer AJ, Hallowell N: Challenges of maintaining research protocol fidelity in a clinical care setting: a qualitative study of the experiences and views of patients and staff participating in a randomized controlled trial. Trials. 2011, 12: 108-10.1186/1745-6215-12-108.

Ye C, Giangregorio L, Holbrook A, Pullenayegum E, Goldsmith CH, Thabane L: Data withdrawal in randomized controlled trials: Defining the problem and proposing solutions: a commentary. Contemp Clin Trials. 2011, 32 (3): 318-322. 10.1016/j.cct.2011.01.016.

Horwitz RI, Horwitz SM: Adherence to treatment and health outcomes. Arch Intern Med. 1993, 153 (16): 1863-1868. 10.1001/archinte.1993.00410160017001.

Peduzzi P, Wittes J, Detre K, Holford T: Analysis as-randomized and the problem of non-adherence: an example from the Veterans Affairs Randomized Trial of Coronary Artery Bypass Surgery. Stat Med. 1993, 12 (13): 1185-1195. 10.1002/sim.4780121302.

Montori VM, Guyatt GH: Intention-to-treat principle. CMAJ. 2001, 165 (10): 1339-1341.

Gibaldi M, Sullivan S: Intention-to-treat analysis in randomized trials: who gets counted?. J Clin Pharmacol. 1997, 37 (8): 667-672. 10.1002/j.1552-4604.1997.tb04353.x.

Porta M: A dictionary of epidemiology. 2008, Oxford: Oxford University Press, Inc, 5

Everitt B: Medical statistics from A to Z. 2006, Cambridge: Cambridge University Press, 2

Sainani KL: Making sense of intention-to-treat. PM R. 2010, 2 (3): 209-213. 10.1016/j.pmrj.2010.01.004.

Bendtsen P, McCambridge J, Bendtsen M, Karlsson N, Nilsen P: Effectiveness of a proactive mail-based alcohol internet intervention for university students: dismantling the assessment and feedback components in a randomized controlled trial. J Med Internet Res. 2012, 14 (5): e142-10.2196/jmir.2062.

Brox JI, Nygaard OP, Holm I, Keller A, Ingebrigtsen T, Reikeras O: Four-year follow-up of surgical versus non-surgical therapy for chronic low back pain. Ann Rheum Dis. 2010, 69 (9): 1643-1648. 10.1136/ard.2009.108902.

McKnight PE, McKnight KM, Sidani S, Figueredo AJ: Missing Data: A Gentle Introduction. 2007, New York, NY: Guilford

Graham JW: Missing data analysis: making it work in the real world. Annu Rev Psychol. 2009, 60: 549-576. 10.1146/annurev.psych.58.110405.085530.

Little RJ, D'Agostino R, Cohen ML, Dickersin K, Emerson SS, Farrar JT, Frangakis C, Hogan JW, Molenberghs G, Murphy SA, et al: The Prevention and Treatment of Missing Data in Clinical Trials. New England Journal of Medicine. 2012, 367 (14): 1355-1360. 10.1056/NEJMsr1203730.

Little RJA, Rubin DB: Statistical Analysis with Missing Data. 2002, New York NY: Wiley, 2

Rubin DB: Multiple Imputation for Nonresponse in Surveys. 1987, John Wiley & Sons, Inc: New York NY

Schafer JL: Analysis of Incomplete Multivariate Data. 1997, New York: Chapman and Hall

Son H, Friedmann E, Thomas SA: Application of pattern mixture models to address missing data in longitudinal data analysis using SPSS. Nursing research. 2012, 61 (3): 195-203. 10.1097/NNR.0b013e3182541d8c.

Peters SA, Bots ML, den Ruijter HM, Palmer MK, Grobbee DE, Crouse JR, O'Leary DH, Evans GW, Raichlen JS, Moons KG, et al: Multiple imputation of missing repeated outcome measurements did not add to linear mixed-effects models. J Clin Epidemiol. 2012, 65 (6): 686-695. 10.1016/j.jclinepi.2011.11.012.

Zhang H, Paik MC: Handling missing responses in generalized linear mixed model without specifying missing mechanism. J Biopharm Stat. 2009, 19 (6): 1001-1017. 10.1080/10543400903242761.

Chen HY, Gao S: Estimation of average treatment effect with incompletely observed longitudinal data: application to a smoking cessation study. Statistics in medicine. 2009, 28 (19): 2451-2472. 10.1002/sim.3617.

Ma J, Akhtar-Danesh N, Dolovich L, Thabane L: Imputation strategies for missing binary outcomes in cluster randomized trials. BMC Med Res Methodol. 2011, 11: 18-10.1186/1471-2288-11-18.

Kingsley GH, Kowalczyk A, Taylor H, Ibrahim F, Packham JC, McHugh NJ, Mulherin DM, Kitas GD, Chakravarty K, Tom BD, et al: A randomized placebo-controlled trial of methotrexate in psoriatic arthritis. Rheumatology (Oxford). 2012, 51 (8): 1368-1377. 10.1093/rheumatology/kes001.

de Pauw BE, Sable CA, Walsh TJ, Lupinacci RJ, Bourque MR, Wise BA, Nguyen BY, DiNubile MJ, Teppler H: Impact of alternate definitions of fever resolution on the composite endpoint in clinical trials of empirical antifungal therapy for neutropenic patients with persistent fever: analysis of results from the Caspofungin Empirical Therapy Study. Transpl Infect Dis. 2006, 8 (1): 31-37. 10.1111/j.1399-3062.2006.00127.x.

A randomized, double-blind, futility clinical trial of creatine and minocycline in early Parkinson disease. Neurology. 2006, 66 (5)): 664-671.

Song P-K: Correlated Data Analysis: Modeling, Analytics and Applications. 2007, New York, NY: Springer Verlag

Pintilie M: Competing Risks: A Practical Perspective. 2006, New York, NY: John Wiley

Tai BC, Grundy R, Machin D: On the importance of accounting for competing risks in pediatric brain cancer: II. Regression modeling and sample size. Int J Radiat Oncol Biol Phys. 2011, 79 (4): 1139-1146. 10.1016/j.ijrobp.2009.12.024.

Holbrook JT, Wise RA, Gold BD, Blake K, Brown ED, Castro M, Dozor AJ, Lima JJ, Mastronarde JG, Sockrider MM, et al: Lansoprazole for children with poorly controlled asthma: a randomized controlled trial. JAMA. 2012, 307 (4): 373-381.

Holbrook A, Thabane L, Keshavjee K, Dolovich L, Bernstein B, Chan D, Troyan S, Foster G, Gerstein H: Individualized electronic decision support and reminders to improve diabetes care in the community: COMPETE II randomized trial. CMAJ: Canadian Medical Association journal = journal de l’Association medicale canadienne. 2009, 181 (1–2): 37-44.

Hilbe JM: Negative Binomial Regression. 2011, Cambridge: Cambridge University Press, 2

Forsblom C, Harjutsalo V, Thorn LM, Waden J, Tolonen N, Saraheimo M, Gordin D, Moran JL, Thomas MC, Groop PH: Competing-risk analysis of ESRD and death among patients with type 1 diabetes and macroalbuminuria. J Am Soc Nephrol. 2011, 22 (3): 537-544. 10.1681/ASN.2010020194.

Grams ME, Coresh J, Segev DL, Kucirka LM, Tighiouart H, Sarnak MJ: Vascular disease, ESRD, and death: interpreting competing risk analyses. Clin J Am Soc Nephrol. 2012, 7 (10): 1606-1614. 10.2215/CJN.03460412.

Lim HJ, Zhang X, Dyck R, Osgood N: Methods of competing risks analysis of end-stage renal disease and mortality among people with diabetes. BMC Med Res Methodol. 2010, 10: 97-10.1186/1471-2288-10-97.

Chu R, Walter SD, Guyatt G, Devereaux PJ, Walsh M, Thorlund K, Thabane L: Assessment and implication of prognostic imbalance in randomized controlled trials with a binary outcome–a simulation study. PLoS One. 2012, 7 (5): e36677-10.1371/journal.pone.0036677.

Bowen A, Hesketh A, Patchick E, Young A, Davies L, Vail A, Long AF, Watkins C, Wilkinson M, Pearl G, et al: Effectiveness of enhanced communication therapy in the first four months after stroke for aphasia and dysarthria: a randomised controlled trial. BMJ. 2012, 345: e4407-10.1136/bmj.e4407.

Spiegelhalter DJ, Best NG, Lunn D, Thomas A: Bayesian Analysis using BUGS: A Practical Introduction. 2009, New York, NY: Chapman and Hall

Byers AL, Allore H, Gill TM, Peduzzi PN: Application of negative binomial modeling for discrete outcomes: a case study in aging research. J Clin Epidemiol. 2003, 56 (6): 559-564. 10.1016/S0895-4356(03)00028-3.

Yusuf S, Wittes J, Probstfield J, Tyroler HA: Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA: the journal of the American Medical Association. 1991, 266 (1): 93-98. 10.1001/jama.1991.03470010097038.

Altman DG: Better reporting of randomised controlled trials: the CONSORT statement. BMJ. 1996, 313 (7057): 570-571. 10.1136/bmj.313.7057.570.

Mauskopf JA, Sullivan SD, Annemans L, Caro J, Mullins CD, Nuijten M, Orlewska E, Watkins J, Trueman P: Principles of good practice for budget impact analysis: report of the ISPOR Task Force on good research practices–budget impact analysis. Value Health. 2007, 10 (5): 336-347. 10.1111/j.1524-4733.2007.00187.x.

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/13/92/prepub

Acknowledgements

This work was supported in part by funds from the CANNeCTIN programme.

Author information

Authors and Affiliations

Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada

Lehana Thabane, Lawrence Mbuagbaw, Shiyuan Zhang, Zainab Samaan, Maura Marcucci, Chenglin Ye, Marroon Thabane, Brittany Dennis, Daisy Kosa, Victoria Borg Debono & Charles H Goldsmith

Departments of Pediatrics and Anesthesia, McMaster University, Hamilton, ON, Canada

Lehana Thabane

Center for Evaluation of Medicine, St Joseph’s Healthcare Hamilton, Hamilton, ON, Canada

Biostatistics Unit, Father Sean O’Sullivan Research Center, St Joseph’s Healthcare Hamilton, Hamilton, ON, Canada

Lehana Thabane, Lawrence Mbuagbaw, Shiyuan Zhang, Maura Marcucci, Chenglin Ye, Brittany Dennis, Daisy Kosa, Victoria Borg Debono & Charles H Goldsmith

Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada

Department of Psychiatry and Behavioral Neurosciences, McMaster University, Hamilton, ON, Canada

Zainab Samaan

Population Genomics Program, McMaster University, Hamilton, ON, Canada

GSK, Mississauga, ON, Canada

Marroon Thabane

Department of Kinesiology, University of Waterloo, Waterloo, ON, Canada

Lora Giangregorio

Department of Nephrology, Toronto General Hospital, Toronto, ON, Canada

Department of Pediatrics, McMaster University, Hamilton, ON, Canada

Rejane Dillenburg

Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada

Vincent Fruci

McMaster Integrative Neuroscience Discovery & Study (MiNDS) Program, McMaster University, Hamilton, ON, Canada

Monica Bawor

Department of Biostatistics, Korea University, Seoul, South Korea

Juneyoung Lee

Department of Clinical Epidemiology, University of Ottawa, Ottawa, ON, Canada

George Wells

Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada

Charles H Goldsmith

Corresponding author

Correspondence to Lehana Thabane .

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors’ contributions

LT conceived the idea and drafted the outline and paper. GW, CHG and MT commented on the idea and draft outline. LM and SZ performed literature search and data abstraction. ZS, LG and CY edited and formatted the manuscript. MM, BD, DK, VBD, RD, VF, MB, JL reviewed and revised draft versions of the manuscript. All authors reviewed several draft versions of the manuscript and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article.

Thabane, L., Mbuagbaw, L., Zhang, S. et al. A tutorial on sensitivity analyses in clinical trials: the what, why, when and how. BMC Med Res Methodol 13 , 92 (2013). https://doi.org/10.1186/1471-2288-13-92

Received : 11 December 2012

Accepted : 10 July 2013

Published : 16 July 2013

DOI : https://doi.org/10.1186/1471-2288-13-92

  • Clinical trials


Introduction to Sensitivity Analysis

  • Living reference work entry
  • First Online: 01 January 2016

  • Bertrand Iooss 4 , 5 &
  • Andrea Saltelli 6 , 7  

2029 Accesses

13 Citations

Sensitivity analysis provides users of mathematical and simulation models with tools to appreciate the dependency of the model output on the model inputs and to investigate how important each model input is in determining the output. All application areas are concerned, from theoretical physics to engineering and socio-economics. This introductory paper sets out the aims and objectives of sensitivity analysis in order to explain the composition of the overall “Sensitivity Analysis” chapter of the Springer Handbook. It also describes the basic principles of sensitivity analysis, some classification grids to understand the application ranges of each method, a useful software package, and the notations used in the chapter papers. This section also offers a succinct description of sensitivity auditing, a new discipline that tests the entire inferential chain, including model development, implicit assumptions, and normative issues, and which is recommended when the inference provided by the model needs to feed into a regulatory or policy process. For the “Sensitivity Analysis” chapter, in addition to this introduction, eight papers have been written by around twenty practitioners from different fields of application. They cover the most widely used methods for this subject: deterministic methods such as local sensitivity analysis, experimental design strategies, the sampling-based and variance-based methods developed from the 1980s, and the new importance measures and metamodel-based techniques established and studied since the 2000s. In each paper, toy examples or industrial applications illustrate the relevance and usefulness of the methods.

Author information

Authors and Affiliations

Industrial Risk Management Department, EDF R&D, Chatou, France

Bertrand Iooss

Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France

Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB), Bergen, Norway

Andrea Saltelli

Institut de Ciència i Tecnologia Ambientals (ICTA), Universitat Autonoma de Barcelona (UAB), Barcelona, Spain

Corresponding author

Correspondence to Bertrand Iooss .

Editor information

Editors and Affiliations

Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA

Roger Ghanem

Los Alamos National Laboratory, Los Alamos, New Mexico, USA

David Higdon

California Institute of Technology , Pasadena, California, USA

Houman Owhadi

Copyright information

© 2015 Springer International Publishing Switzerland

About this entry

Cite this entry.

Iooss, B., Saltelli, A. (2015). Introduction to Sensitivity Analysis. In: Ghanem, R., Higdon, D., Owhadi, H. (eds) Handbook of Uncertainty Quantification. Springer, Cham. https://doi.org/10.1007/978-3-319-11259-6_31-1

DOI : https://doi.org/10.1007/978-3-319-11259-6_31-1

Received : 20 December 2014

Accepted : 24 June 2015

Published : 26 March 2016

Publisher Name : Springer, Cham

Online ISBN : 978-3-319-11259-6


Sensitivity Analysis in Observational Research: Introducing the E-Value

Affiliation.

  • 1 From Harvard T.H. Chan School of Public Health, Boston, Massachusetts, and University of California, Berkeley, Berkeley, California.
  • PMID: 28693043
  • DOI: 10.7326/M16-2607

Sensitivity analysis is useful in assessing how robust an association is to potential unmeasured or uncontrolled confounding. This article introduces a new measure called the "E-value," which is related to the evidence for causality in observational studies that are potentially subject to confounding. The E-value is defined as the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates. A large E-value implies that considerable unmeasured confounding would be needed to explain away an effect estimate. A small E-value implies little unmeasured confounding would be needed to explain away an effect estimate. The authors propose that in all observational studies intended to produce evidence for causality, the E-value be reported or some other sensitivity analysis be used. They suggest calculating the E-value for both the observed association estimate (after adjustments for measured confounders) and the limit of the confidence interval closest to the null. If this were to become standard practice, the ability of the scientific community to assess evidence from observational studies would improve considerably, and ultimately, science would be strengthened.
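
As a small, hypothetical illustration (not part of the abstract above), the E-value can be computed directly from the published formula, E = RR + sqrt(RR × (RR − 1)), applied to the risk ratio and to the confidence limit closest to the null; the numbers below are made up.

```python
# Illustrative sketch of the E-value calculation (VanderWeele & Ding, 2017).
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr                      # invert protective estimates first
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_ci(point: float, lo: float, hi: float) -> float:
    """E-value for the confidence limit closest to the null (RR = 1)."""
    limit = lo if point >= 1 else hi
    if (point >= 1 and limit <= 1) or (point < 1 and limit >= 1):
        return 1.0                         # interval already includes the null
    return e_value(limit)

# Hypothetical example: observed RR = 3.9 with 95% CI 1.8 to 8.7
print(round(e_value(3.9), 2))               # about 7.26
print(round(e_value_ci(3.9, 1.8, 8.7), 2))  # about 3.0
```

In this hypothetical case, unmeasured confounding would need to be associated with both treatment and outcome by a risk ratio of roughly 7.3 to fully explain away the point estimate, and of roughly 3.0 to shift the confidence interval to include the null.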

  • Confounding Factors, Epidemiologic
  • Epidemiologic Research Design
  • Observational Studies as Topic / statistics & numerical data*
  • Sensitivity and Specificity*

Sensitivity Analysis Explained: Definitions, Formulas and Examples

Sensitivity analysis is an indispensable tool utilized in corporate finance and business analysis to comprehend how the variability in key input variables influences the performance of a business. By methodically adjusting the inputs and observing the ensuing effect on outputs, analysts can discern which variables have the most profound impact on the bottom line. This enables companies to concentrate on managing the most sensitive factors to enhance profitability and mitigate risk.

Article Contents

  • What is a sensitivity analysis?
  • Sensitivity analysis formula
  • How to do a sensitivity analysis in Excel
  • Sensitivity analysis methods
  • Advantages and disadvantages of sensitivity analysis
  • Exercises and examples for sensitivity analysis
  • Key takeaways

A sensitivity analysis measures how susceptible the output of a model is to alterations in the value of the inputs. It aids in identifying which input variables drive most of the variation in the output. For example, in a financial model measuring a company’s profitability, key inputs typically encompass sales growth, cost of goods sold, operating expenses, interest rates, inflation and tax rates. By increasing and decreasing each of these inputs and observing the impact on profits, you can determine which inputs are most sensitive – where minor changes instigate major swings in profits.

While there isn’t a single formula for sensitivity analysis, the general approach is to select an input, modify it by a specified amount, and ascertain the impact on the output. Analysts typically vary inputs up and down by a fixed percentage, such as 10%, to assess sensitivity. The simplified formula is:

New Output = Base Output x (1 + Change in Input)

For instance, if revenue is amplified by 10% from $100 to $110, the formula is:

New Profit = Base Profit x (1 + 10%) = Base Profit x 1.10

Note: This formula represents a straightforward scenario and actual scenarios may exhibit more complex relationships between input changes and output results.
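
As a minimal Python sketch of the same one-at-a-time idea (all figures below are hypothetical, and the simple profit model is an assumption made only for illustration):

```python
# Hypothetical toy model: flex each input by +/-10% one at a time and record the profit impact.
base = {"units": 10_000, "price": 50.0, "unit_cost": 30.0, "fixed_costs": 120_000.0}

def profit(units, price, unit_cost, fixed_costs):
    return units * (price - unit_cost) - fixed_costs

base_profit = profit(**base)
print(f"Base profit: {base_profit:,.0f}")

for name in base:
    for change in (-0.10, 0.10):
        scenario = dict(base, **{name: base[name] * (1 + change)})
        delta = profit(**scenario) - base_profit
        print(f"{name:11s} {change:+.0%}: profit change {delta:+,.0f}")
```

Ranking the inputs by the size of the resulting swings shows which variables the model is most sensitive to.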

Diagram of formula for Sensitivity Analysis

Typically, when reviewing client forecasts as a credit analyst, the “base case” provided by the client will show steady growth in sales and margins. The analyst will typically sensitise this by creating a no-growth, no-margin-improvement case to see whether debt can still be serviced satisfactorily. A separate combined downside will also typically be modelled, in which the company is assumed to experience difficult trading such as might occur in a recession.

Data services like S&P Capital IQ and FactSet allow analysts to look back and see exactly how variable sales and margins have been in previous recessions. This can provide a very concrete and rational basis for designing a “downside/recession” scenario.

Excel is a practical tool for conducting sensitivity analysis. Here are the general steps:

  • Build a financial model to calculate the baseline output, such as net income.
  • Create input variables for the major value drivers, like unit sales, price per unit, variable costs per unit, fixed costs, tax rate, etc.
  • Save a copy of the baseline model. Then change one input variable at a time by a fixed amount, like 10%. Recalculate the new output.
  • Repeat step 3 for each input variable. Record the new output values each time.
  • Compare the range of outputs to determine which inputs had the greatest impact. Produce charts in Excel to visualize the sensitivity analysis.
  • Optionally, automate the process using Excel Data Tables.
  • More complex inputs can be modelled in Excel using functions like INDEX or CHOOSE together with data validation, or VBA tools such as combo boxes.

Below, we’ve created an example of a sensitivity analysis for an operating income statement, using Excel’s What-If Analysis Data Table feature to perform the analysis:

Excel example of a sensitivity analysis performed on an operating income statement

To implement the sensitivity analysis DATA TABLE:

  • Enter a cell reference to the operating income (=D14) as the starting value for the table (D17), and enter your sensitivity variance factors below it (C18 to C21).
  • Select your sensitivity factors and operating income column (C17:D21)
  • On the Excel ribbon, go to Data, What-If Analysis, Data Table, and you will see the following dialog box.

Excel window requesting data entry for sensitivity analysis table

  • Input the cell for your initial Sensitivity Factor (D9) into the “Column Input cell box”. Press OK.

Excel will then perform your sensitivity analysis: it will take your sensitivity factors (C18 to C21) one by one, enter each into your given sensitivity factor cell (D9), and return the corresponding result from D17 (the cell at the top of the table), writing each result into the cell next to the input tested. You can verify the results individually by typing each factor into D9 in the original model.
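
For readers who prefer code, here is a minimal Python analogue of the one-variable data table above; the underlying operating model is an assumption (the D9/D14/D17 cell references belong to the screenshot), with the sensitivity factor taken to scale revenue.

```python
# Hypothetical operating model standing in for the Excel example above.
def operating_income(factor, revenue=1_000_000, variable_cost_pct=0.55, fixed_costs=300_000):
    scaled_revenue = revenue * factor              # the sensitivity factor scales revenue
    return scaled_revenue * (1 - variable_cost_pct) - fixed_costs

factors = [0.90, 0.95, 1.00, 1.05, 1.10]           # the "column input" values
for f in factors:                                  # what the data table produces row by row
    print(f"factor {f:.2f} -> operating income {operating_income(f):,.0f}")
```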

There are several common methods and techniques for performing sensitivity analysis:

  • One-at-a-time (OAT) analysis: Alter one input variable while maintaining others constant. This method is straightforward but can miss interactive effects between variables.
  • Differential analysis: Calculate the rate of change in output based on minute changes in input, thereby allowing ranking of sensitivity.
  • Scenario analysis: Adjust multiple inputs simultaneously to model various scenarios, like worst-case and best-case, which offers a spectrum of possible outcomes.
  • Monte Carlo simulation: Utilize repeated random sampling of input variables to generate a probability distribution of potential outcomes. This is especially useful for models incorporating uncertainty (a minimal sketch follows this list).
  • Tornado diagrams: Graphically illustrate the sensitivity ranking of inputs. The wider the bar, the larger the impact.
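
The following Python sketch illustrates the Monte Carlo approach mentioned above on a hypothetical profit model; the input distributions are assumptions chosen purely for illustration.

```python
# Hypothetical Monte Carlo sensitivity sketch: sample uncertain inputs, propagate
# them through a simple profit model, and summarise the outcome distribution.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
units = rng.normal(10_000, 1_500, n)        # uncertain demand
price = rng.normal(50, 3, n)                # uncertain selling price
unit_cost = rng.normal(30, 2, n)            # uncertain variable cost
fixed_costs = 120_000

profit = units * (price - unit_cost) - fixed_costs
print(f"Mean profit: {profit.mean():,.0f}")
print(f"5th to 95th percentile: {np.percentile(profit, 5):,.0f} to {np.percentile(profit, 95):,.0f}")
print(f"Probability of a loss: {(profit < 0).mean():.1%}")

# Crude sensitivity ranking: correlation of each input with the output
for name, draws in [("units", units), ("price", price), ("unit_cost", unit_cost)]:
    print(f"{name:9s} correlation with profit: {np.corrcoef(draws, profit)[0, 1]:+.2f}")
```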

Advantages:

  • Identifies pivotal value drivers upon which to focus management attention.
  • Helps in quantifying the risk in a project or forecast.
  • Guides decisions and mitigates risk.
  • Explores scenarios and formulates contingency plans.
  • Enhances comprehension of the nature of the key success variables.

Disadvantages:

  • Can be time-consuming when testing numerous scenarios.
  • Necessitates resources and specialized skills.
  • Does not optimize inputs.
  • Static analysis might overlook dynamic interactions between variables.
  • Limited to model inputs, even if the model itself is incomplete or inaccurate.

Here are some examples to practice conducting sensitivity analysis:

  • A company has fixed costs of $100,000, unit variable costs of $50, and projected unit sales of 5,000. Calculate the sensitivity of operating income to a 5%, 10%, and 15% variation in units sold.
  • A loan has a principal of $500,000, an interest rate of 6%, and a term of 10 years. Calculate the sensitivity of total repayments to a 0.5%, 1%, and 1.5% change in the interest rate (a worked sketch of this exercise follows the list).
  • An oil company’s net income is based on revenue of $2 million, operating costs of $1.2 million, and a tax rate of 40%. Test sensitivity to 10% changes in revenue, costs, and tax rate.
  • For a capital budgeting project with: NPV = -$1250, Investment = $5000, Lifespan = 5 years, and Discount Rate = 15%, determine the sensitivity of NPV to changes in each input.
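
Here is a worked sketch for the loan exercise above; it assumes a standard fully amortising loan with monthly payments, since the exercise does not state the repayment schedule.

```python
# Worked sketch of the loan repayment exercise (assumes monthly amortisation).
def total_repayment(principal, annual_rate, years):
    r = annual_rate / 12                              # monthly interest rate
    n = years * 12                                    # number of payments
    payment = principal * r / (1 - (1 + r) ** -n)     # standard annuity payment formula
    return payment * n

base = total_repayment(500_000, 0.06, 10)
print(f"Base total repayment: {base:,.0f}")
for bump in (0.005, 0.010, 0.015):
    new = total_repayment(500_000, 0.06 + bump, 10)
    print(f"Rate +{bump:.1%}: total repayment {new:,.0f} ({new / base - 1:+.1%})")
```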

Sensitivity analysis is a critical financial modelling technique in the sphere of corporate finance. By discerning which inputs have the most substantial impact on outcomes, companies can hone their efforts on the value drivers that matter most. Performing sensitivity analysis leads to better-informed, data-driven decisions, providing a structured approach towards understanding financial variability and risk.

Sensitivity Analysis FAQs

What is an example of a sensitivity analysis?

A sensitivity analysis is a technique used to determine how changes in the values of input variables affect the output or outcome of a model or decision. A common example is varying the interest rate assumptions in a financial model to see how it impacts the net present value or internal rate of return.

How do you conduct a sensitivity analysis?

To conduct a sensitivity analysis, you typically:

  • Identify the key input variables that have the greatest impact on the output.
  • Determine the likely range of values for those input variables.
  • Systematically change the values of the input variables within their ranges and observe the resulting changes in the output.
  • Analyze the sensitivity of the output to changes in each input variable.

What is a sensitivity analysis for P&L?

A sensitivity analysis for a profit and loss (P&L) statement involves examining how changes in revenue, expenses, or other key factors would impact the overall profitability of a business. This can help identify the most critical drivers of financial performance and inform strategic decision-making.

What is DSS sensitivity analysis?

DSS stands for Decision Support System. A DSS sensitivity analysis is the process of evaluating how changes in the input variables of a decision support system model affect the outputs or recommended decisions. This helps quantify the uncertainty and risk associated with the model’s recommendations, allowing decision-makers to make more informed choices.

How Is Sensitivity Analysis Used?

J.B. Maverick is an active trader, commodity futures broker, and stock market analyst with 17+ years of experience, in addition to 10+ years of experience as a finance writer and book editor.

Pete Rathburn is a copy editor and fact-checker with expertise in economics and personal finance and over twenty years of experience in the classroom.

Sensitivity analysis is an analysis method that is used to identify how much variations in the input values for a given variable will impact the results for a mathematical model. Sensitivity analysis can be applied in several different disciplines, including business analysis, investing, environmental studies, engineering, physics, and chemistry.

Key Takeaways

  • Sensitivity analysis is used to identify how much variations in the input values for a given variable impact the results for a mathematical model.
  • Sensitivity analysis can identify the best data to be collected for analyses to evaluate a project's return on investment (ROI).
  • Sensitivity analysis helps engineers create more reliable, robust designs by assessing points of uncertainty in the design's structure.

Understanding Sensitivity Analysis

Sensitivity analysis is concerned with the uncertainty inherent in mathematical models where the values for the inputs used in the model can vary. It is the companion analytical tool to uncertainty analysis, and the two are often used together. All of the models composed and studies executed to draw conclusions or inferences for policy decisions are based on assumptions regarding the validity of the inputs used in the calculations.

For example, in equity valuation, the return on assets (ROA) ratio assumes that a valid, accurate calculation of a company's assets can be figured out and that it is reasonable to analyze profits, or returns, relative to assets as a means of evaluating a company for investment purposes.

The conclusions drawn from studies or mathematical calculations can be significantly altered, depending on such things as how a certain variable is defined or the parameters chosen for a study. When the results of a study or computation do not significantly change due to variations in underlying assumptions, they are considered to be robust. If variations in foundational inputs or assumptions significantly change outcomes, sensitivity analysis can be employed to determine how changes in inputs, definitions, or modeling can improve the accuracy or robustness of any results.

How Sensitivity Analysis Is Used

Sensitivity analysis can be helpful in various situations, including forecasting or predicting as well as identifying where improvements or adjustments need to be made in a process. However, the use of historical data can sometimes lead to inaccurate results when forecasting since past results don't necessarily lead to future outcomes. Below are a few common applications of sensitivity analysis.

Return on Investment

In a business context, sensitivity analysis can be used to improve decisions based on certain calculations or modeling. A company can use sensitivity analysis to identify the inputs that have the biggest impact on the return on the company's investment (ROI). The inputs that have the greatest effect on returns should then be considered more carefully. Sensitivity analysis can also be used to allocate assets and resources.

One simple example of sensitivity analysis used in business is an analysis of the effect of including a certain piece of information in a company's advertising, comparing sales results from ads that differ only in whether or not they include the specific piece of information.

Climate Models

Computer models are commonly used in weather, environmental, and climate change forecasting. Sensitivity analysis can be used to improve such models by analyzing how various systematic sampling methods, inputs, and model parameters affect the accuracy of results or conclusions obtained from the computer models.

Scientific Research

The disciplines of physics and chemistry often employ sensitivity analysis to evaluate results and conclusions. Sensitivity analysis has proven particularly useful in the evaluation and adjustment of kinetic models that involve using several differential equations. The importance of various inputs and the effects of variance in the inputs on model outcomes can be analyzed.

Engineering

It is standard practice in engineering to use computer models to test the design of structures before they are built. Sensitivity analysis helps engineers create more reliable, robust designs by assessing points of uncertainty or wide variations in possible inputs and their corresponding effects on the viability of the model. Refinement of computer models can significantly impact the accuracy of evaluations of such things as a bridge's ability to withstand stress or tunneling risks.


Sensitivity, Specificity, and Predictive Values: Foundations, Pliabilities, and Pitfalls in Research and Practice

Robert Trevethan

1 Independent academic researcher and author, Albury, NSW, Australia

Within the context of screening tests, it is important to avoid misconceptions about sensitivity, specificity, and predictive values. In this article, therefore, foundations are first established concerning these metrics along with the first of several aspects of pliability that should be recognized in relation to those metrics. Clarification is then provided about the definitions of sensitivity, specificity, and predictive values and why researchers and clinicians can misunderstand and misrepresent them. Arguments are made that sensitivity and specificity should usually be applied only in the context of describing a screening test’s attributes relative to a reference standard; that predictive values are more appropriate and informative in actual screening contexts, but that sensitivity and specificity can be used for screening decisions about individual people if they are extremely high; that predictive values need not always be high and might be used to advantage by adjusting the sensitivity and specificity of screening tests; that, in screening contexts, researchers should provide information about all four metrics and how they were derived; and that, where necessary, consumers of health research should have the skills to interpret those metrics effectively for maximum benefit to clients and the healthcare system.

Introduction

There are arguably two kinds of tests used for assessing people’s health: diagnostic tests and screening tests. Diagnostic tests are regarded as providing definitive information about the presence or absence of a target disease or condition. By contrast, screening tests—which are the focus of this article—typically have advantages over diagnostic tests such as placing fewer demands on the healthcare system and being more accessible as well as less invasive, less dangerous, less expensive, less time-consuming, and less physically and psychologically discomforting for clients. Screening tests are also, however, well-known for being imperfect and they are sometimes ambiguous. It is, therefore, important to determine the extent to which these tests are able to identify the likely presence or absence of a condition of interest so that their findings encourage appropriate decision making.

If practitioners are confident when using screening tests, but their confidence is not justified, the consequences could be serious for both individuals and the healthcare system ( 1 , 2 ). It is important, therefore, that confusion should be avoided with regard to how the adequacy and usefulness of screening tests are determined and described. In this article, an attempt is made to identify why confusion can exist, how it might be resolved, and how, once resolved, improvements could be made with regard to the description and use of screening tests. The focus is on the sensitivity, specificity, and predictive values of those tests.

Determining Sensitivity, Specificity, and Predictive Values

When the adequacy, also known as the predictive power or predictive validity, of a screening test is being established, the outcomes yielded by that screening test are initially inspected to see whether they correspond to what is regarded as a definitive indicator, often referred to as a gold standard, of the same target condition. The analyses are typically characterized in the way shown in Figure 1. There it can be seen from the two columns under the heading Status of person according to “gold standard” that people are categorized as either having, or as not having, the target condition. The words “gold standard” suggest that this initial categorization is made on the basis of a test that provides authoritative, and presumably indisputable, evidence that a condition does or does not exist. Because there can be concerns about the validity of these so-called gold standards ( 3 , 4 ), they have increasingly been referred to less glowingly as reference standards ( 5 ), thus removing what seemed to be unreserved endorsement. That wording (i.e., reference standard) will be used for the remainder of this article.

Figure 1. Diagram demonstrating the basis for deriving sensitivity, specificity, and positive and negative predictive values.

Independent of the categorization established on the basis of the reference standard, people are also assessed on the screening test of interest. That test might comprise a natural dichotomy or it might be based on whether the test outcomes fall below or above a specified cutoff point on a continuum. It might also comprise a battery of tests that, together, are regarded as a single test ( 6 – 8 ).

Based on their reference standard and screening test results, people are assigned to one of the four cells labeled a through d in Figure 1 depending on whether they are definitely regarded as having or as not having the target condition based on the reference standard, and whether the screening test yielded a positive result (the person appears to have the condition) or a negative result (the person appears not to have the condition). What are referred to as sensitivity, specificity, and predictive values can then be calculated from the numbers of people in each of the four cells, and, if expressed as percentages, are based on the following formulas:
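Sensitivity = a / (a + c) × 100
Specificity = d / (b + d) × 100
Positive predictive value (PPV) = a / (a + b) × 100
Negative predictive value (NPV) = d / (c + d) × 100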

These are the metrics that are cited—i.e., often as percentages, although sometimes as decimal fractions, and preferably with accompanying 95% confidence intervals—when researchers and clinicians refer to sensitivity, specificity, and predictive values to describe the characteristics of a screening test. The simplicity, and even familiarity, of these four metrics can mask the existence of a number of complexities that sometimes appear to be underappreciated, however. Deficiencies in either the reference standard or the screening test, or in both, can exist. Furthermore, the four metrics should not be regarded as unquestionably valid and fixed attributes of a screening test: the values that are entered into the cells of Figure 1 depend on how stringent the screening test is and the prevalence of the target condition in the sample of people used in the analysis.
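As a minimal sketch, with purely illustrative counts assumed for cells a through d (none of these numbers come from the article), the four metrics can be computed directly from the 2 × 2 table:

```python
# Minimal sketch: sensitivity, specificity, PPV, and NPV from the four cells of
# Figure 1. The counts below are hypothetical, chosen only for illustration.
a, b, c, d = 90, 40, 10, 860   # a = true positives, b = false positives,
                               # c = false negatives, d = true negatives

sensitivity = a / (a + c)      # among people who have the condition
specificity = d / (b + d)      # among people who do not have the condition
ppv = a / (a + b)              # among people with a positive screening result
npv = d / (c + d)              # among people with a negative screening result

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")
```

With these particular counts the test has high sensitivity (90.0%) and specificity (95.6%) but a more modest PPV (69.2%), which foreshadows the distinctions drawn in the sections that follow.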

Because of these complexities, it is sometimes necessary to examine the validity of measurement procedures within both the reference standard and the screening test ( 3 , 8 ). It might also be necessary to question the stringency of the screening test and to ensure that there is a match between the samples that were used for assessing a screening test and the people subsequently being screened ( 2 , 3 , 9 – 11 ).

It is also important to recognize that there are sometimes noticeable tradeoffs between sensitivity and specificity, as well as between positive predictive values (PPVs) and negative predictive values (NPVs). This is demonstrated in the first four rows of entries in Table 1. Furthermore, as also illustrated in Table 1, there is little or no consistency regarding either size or pattern of sensitivity, specificity, and predictive values in different contexts, so it is not possible to determine one of them merely from information about any of the others. In that sense, they are pliable in relation to each other. This indicates that it is necessary to appreciate the foundations of, distinctions between, and uses and misuses of each of these metrics, and that it is necessary to provide information about all of them, as well as the reference standard and the sample on which they are based, to characterize a screening test adequately.

Table 1. Five sets of sensitivity, specificity, and predictive values demonstrating differing patterns of results. (PPV, positive predictive value; NPV, negative predictive value.)

Because sensitivity seems often to be confused with PPV, and specificity seems often to be confused with NPV, unambiguous definitions for each pair are necessary. These are provided below.

Definitions

Defining Sensitivity and PPV

The sensitivity of a screening test can be described in a variety of ways, typically such as sensitivity being the ability of a screening test to detect a true positive, being based on the true positive rate, reflecting a test’s ability to correctly identify all people who have a condition, or, if 100%, identifying all people with a condition of interest by those people testing positive on the test.

Each of these definitions is incontestably accurate, but they can all be easily misinterpreted because none of them sufficiently emphasizes an important distinction between two essentially different contexts. In the first context, only those people who obtain positive results on the reference standard are assessed in terms of whether they obtained positive or negative results on the screening test. This determines the test’s sensitivity. In the second context, the focus changes from people who tested positive on the reference standard to people who tested positive on the screening test . Here, an attempt is made to establish whether people who tested positive on the screening test do or do not actually have the condition of interest. This refers to the screening test’s PPV. Expressed differently, the first context is the screening test being assessed on the basis of its performance relative to a reference standard, which focuses on whether the foundations of the screening test are satisfactory; the second context is people being assessed on the basis of a screening test, which focuses on the practical usefulness of the test in clinical practice.

By way of further explanation, sensitivity is based solely on the cells labeled a and c in Figure 1 and, therefore, requires that all people in the analysis are diagnosed according to the reference standard as definitely having the target condition. The determination of sensitivity does not take into account any people who, according to the reference standard, do not have the condition of interest (who are in cells b and d). Confidence in a screening test’s ability, when it returns a positive result, to differentiate successfully between people who have a condition and those who do not, is another matter. As indicated above, it is the test’s PPV, and is based on the cells labeled a and b, which refer solely to the accuracy of positive results produced by the screening test. Those cells do not include any people who, according to results from the screening test, do not have the condition (who are in cells c and d).

Therefore, a clear definition of sensitivity—with italics for supportive emphasis—would be a screening test’s probability of correctly identifying, solely from among people who are known to have a condition , all those who do indeed have that condition (i.e., identifying true positives), and, at the same time, not categorizing other people as not having the condition when in fact they do have it (i.e., avoiding false negatives). Less elaborated, but perhaps also less helpfully explicit, definitions are possible, for example, that sensitivity is the proportion of people with a condition who are correctly identified by a screening test as indeed having that condition.

It follows that a clear definition of PPV would be a screening test’s probability, when returning a positive result, of correctly identifying, from among people who might or might not have a condition, all people who do actually have that condition (i.e., identifying true positives), and, at the same time, not categorizing some people as having the condition when in fact they do not (i.e., avoiding false positives). Expressed differently and more economically, PPV is the probability that people with a positive screening test result indeed do have the condition of interest.

Inspection of Figure 1 supports the above definitions and those that are provided within the next subsection.

Defining Specificity and NPV

The specificity of a test is defined in a variety of ways, typically such as specificity being the ability of a screening test to detect a true negative, being based on the true negative rate, correctly identifying people who do not have a condition, or, if 100%, identifying all patients who do not have the condition of interest by those people testing negative on the test.

As with the definitions often offered for sensitivity, these definitions are accurate but can easily be misinterpreted because they do not sufficiently indicate the distinction between two different contexts that parallel those identified for sensitivity. Specificity is based on the cells labeled b and d in Figure 1 and, therefore, requires that all the people in the analysis are diagnosed, according to a reference standard, as not having the target condition. Specificity does not take into account any people who, according to the reference standard, do have the condition (as pointed out above, those people, in the cells labeled a and c, were taken into account when determining sensitivity). Confidence in a screening test’s ability, when it returns a negative result, to differentiate between people who have a condition and those who do not, is another matter. That is the test’s NPV and is based on the cells labeled c and d, which refer solely to the accuracy of negative results produced by the screening test. Those cells do not include any people who, according to the screening test, do have the condition (who are located in cells a and b).

Therefore, a clear definition of specificity, again with italics for supportive emphasis, would be a screening test’s probability of correctly identifying, solely from among people who are known not to have a condition , all those who do indeed not have that condition (i.e., identifying true negatives), and, at the same time, not categorizing some people as having the condition when in fact they do not have it (i.e., avoiding false positives). Less elaborated, but perhaps also less helpfully explicit, definitions are possible, for example, that specificity is the proportion of people without a condition who are correctly identified by a screening test as indeed not having the condition.

It follows that a clear definition of NPV would be a screening test’s probability, when returning a negative result, of correctly identifying, from among people who might or might not have a condition, all people who indeed do not have that condition (i.e., identifying true negatives), and, at the same time, not categorizing some people as not having the condition when in fact they do (i.e., avoiding false negatives). Expressed differently and more economically, NPV is the probability that people with a negative screening test result indeed do not have the condition of interest.

Summary Regarding Definitions

Sensitivity and specificity are concerned with the accuracy of a screening test relative to a reference standard. The focus is the adequacy of the screening test , or its fundamental “credentials.” The main question is: do the results on the screening test correspond to the results on the reference standard? Here, the screening test is being assessed. By contrast, for PPV and NPV, people are being assessed. There are two main questions of relevance in that second situation. First, if a person’s screening test yields a positive result, what is the probability that that person has the relevant condition (PPV)? Second, if the screening test yields a negative result, what is the probability that the person does not have the condition (NPV)?

In order to sharpen the distinction, it could be said that sensitivity and specificity indicate the effectiveness of a test with respect to a trusted “outside” referent, while PPV and NPV indicate the effectiveness of a test for categorizing people as having or not having a target condition. More precisely, sensitivity and specificity indicate the concordance of a test with respect to a chosen referent, while PPV and NPV, respectively, indicate the likelihood that a test can successfully identify whether people do or do not have a target condition, based on their test results.

The two contexts (i.e., the context that relates to sensitivity and specificity, versus the context that relates to the two predictive values) should not be confused with each other. Of particular importance, although it is desirable to have tests with high sensitivity and specificity, the values for those two metrics should not be relied on when making decisions about individual people in screening situations. In that second context, use of PPVs and NPVs is more appropriate. The lack of correspondence between sensitivity, specificity, and predictive values is illustrated by the inconsistent pattern of entries in Table 1 and should become more obvious in the next section.

Uses and Misuses of Sensitivity and Specificity

Because the pairs of categories into which people are placed when sensitivity and specificity values are calculated are not the same as the pairs of categories that pertain in a screening context, there are not only important distinctions between sensitivity and PPV, and between specificity and NPV, but there are also distinct limitations on sensitivity and specificity for screening purposes. Akobeng [( 9 ), p. 340] has gone so far as to write that “both sensitivity and specificity … are of no practical use when it comes to helping the clinician estimate the probability of disease in individual patients.”

Sensitivity does not provide the basis for informed decisions following positive screening test results because those positive test results could contain many false positive outcomes that appear in the cell labeled b in Figure 1. Those outcomes are ignored in determining sensitivity (cells a and c are used for determining sensitivity). Therefore, of itself, a positive result on a screening test, even if that test has high sensitivity, is not at all useful for definitely regarding a condition as being present in a particular person. Conversely, specificity does not provide an accurate indication about a negative screening test result because negative outcomes from a screening test could contain many false negative results that appear in the cell labeled c, which are ignored in determining specificity (cells b and d are used for determining specificity). Therefore, of itself, a negative result on a screening test with high specificity is not at all useful for definitely ruling out disease in a particular person.

Failing to appreciate the above major constraints on sensitivity and specificity arises from what is known in formal logic as confusion of the inverse ( 16 ). An example of this with regard to sensitivity, consciously chosen in a form that makes the problem clear, would be converting the logical proposition This animal is a dog; therefore it is likely to have four legs into the illogical proposition This animal has four legs; therefore it is likely to be a dog . A parallel confusion of the inverse can occur with specificity. An example of this would be converting the logical proposition This person is not a young adult; therefore this person is not likely to be a university undergraduate into the illogical proposition This person is not a university undergraduate; therefore this person is not likely to be a young adult .

These examples demonstrate the flaws in believing that a positive result on a highly sensitive test indicates the presence of a condition and that a negative result on a highly specific test indicates the absence of a condition. Instead, it should be emphasized that a highly sensitive test, when yielding a positive result, by no means indicates that a condition is present (many animals with four legs are not dogs), and a highly specific test, when yielding a negative result, by no means indicates that a condition is absent (many young people are not university undergraduates).

Despite the above reservations concerning sensitivity and specificity in a screening situation, sensitivity and specificity can be useful in two circumstances but only if they are extremely high. First, because a highly sensitive screening test is unlikely to produce false negative outcomes (there will be few entries in cell c of Figure 1), people who test negative on that kind of screening test (i.e., a test with high sensitivity) are very unlikely to have the target condition. Expressed differently, high sensitivity permits people to be confidently regarded as not having a condition if their screening test yields a negative result. They can be “ruled out.” This has led to the mnemonic snout (sensitive, negative, out—in which it is useful to regard the n in snout as referring to the n in sensitive as well as the n in negative ) concerning high sensitivity in screening.

Second, because a highly specific screening test is unlikely to produce false positive results (there will be few entries in cell b in Figure 1), people are very unlikely to be categorized as having a condition if they indeed do not have it. Expressed differently, high specificity permits people to be confidently regarded as having a condition if their screening test yields a positive result. They can be “ruled in”—and, thus, the mnemonic spin (specific, positive, in—in which it is useful to regard the p in spin as referring to the p in specific as well as the p in positive ) concerning high specificity in screening.

The mnemonics snout and spin , it must be emphasized, pertain only when sensitivity and specificity are high. Their pliability, therefore, has some strong limitations. Furthermore, these mnemonics are applied in a way that might seem counterintuitive. A screening test with high sensitivity is not necessarily useful for “picking things up.” It is useful only for deciding that a negative screening test outcome is so unusual that it strongly indicates the absence of the target condition. Conversely, a screening test with high specificity is not so “choosy” that it is effective in ignoring a condition if that condition is not present; rather, a highly specific test is useful only for deciding that a positive screening test outcome is so unusual that it strongly indicates the presence of the target condition. In addition, Pewsner et al. ( 2 ) have pointed out that effective use of snout and spin is “eroded” when highly sensitive tests are not sufficiently specific or highly specific tests are not sufficiently sensitive—and for many screening tests, unfortunately, either sensitivity or specificity is low despite the other being high, or neither sensitivity nor specificity is high. As a consequence, both sensitivity and specificity remain unhelpful for making decisions about individual people in most screening contexts, and PPV and NPV should be retained as the metrics of choice in those contexts.

Assessing Desirable Predictive Values and Consequences for Sensitivity and Specificity

When assessing the desirability of specific PPVs and NPVs, a variety of costs and benefits need to be considered ( 1 ). These include the immediate and long-term burdens on the healthcare system, the treatability of a particular condition, and the psychological effect on clients as well as clients’ health status. Considerations might also include over- versus under-application of diagnostic procedures as well as the possibility of premature versus inappropriately delayed application of diagnostic procedures. Input from clinicians and policymakers is likely to be particularly informative in any deliberations.

Decisions about desirable PPVs and NPVs can be approached from two related and complementary, but different, directions. One approach involves the extent to which true positive and true negative results are desirable on a screening test. The other approach involves the extent to which false positive and false negative results are tolerable or even acceptable.

A high PPV is desirable, implying that false positive outcomes are minimized, under a variety of circumstances. Some of these are when, relative to potential benefits, the costs (including costs associated with finances, time, and personnel for health services, as well as inconvenience, discomfort, and anxiety for clients) are high. A high PPV, with its concomitant few false positive screening test results, is also desirable when the risk of harm from follow-up diagnosis or therapy (including hemorrhaging and infection) is high despite the benefits from treatment also being high, or when the target condition is not life-threatening or progresses slowly. Under these circumstances, false positive outcomes can be associated with overtreatment, unnecessary costs, and the prospect of iatrogenic complications. False positive outcomes may also be annoying and distressing for both the providers and the recipients of health care.

A moderate PPV (with its greater proportion of false positive screening test outcomes) might be acceptable under a number of circumstances, most of which are the opposite of the situations in which a high PPV is desirable. For example, a certain percentage of false positive outcomes might not be objectionable if follow-up tests are inexpensive, easily and quickly performed, and not stressful for clients. In addition, false positive screening outcomes might be quite permissible if no harm is likely to be done to clients in protecting them against a target condition even if that condition is not present. For example, people who are mistakenly told that they have peripheral artery disease, despite not actually having it, are likely to benefit from adopting advice to exercise appropriately, improve their diet, and discontinue smoking.

A high NPV is desirable, implying that false negatives are minimized, under a different set of circumstances. Some of these are a condition being serious, largely asymptomatic, or contagious, or if treatment for a condition is advisable early in its course, particularly if the condition can be treated effectively and is likely to progress quickly. Under these circumstances, it would be highly undesirable if a screening test indicated that people did not have a condition when in fact they did. A moderate NPV—with its greater proportion of false negative screening test outcomes—might be acceptable under other circumstances, however, and most of those circumstances are the opposite of those that make a high NPV desirable. For example, the false negative outcomes associated with moderate NPVs might not be problematic if the target condition is not serious or contagious, or if a condition does not progress quickly or benefit from early treatment. Moderate NPVs might also be acceptable if diagnosis at low levels of a condition is known to be ambiguous and subsequent screening tests can easily be scheduled and performed, or if, given time, a condition is likely to resolve itself satisfactorily without treatment.

If, for a variety of reasons, the PPVs and NPVs on a screening test were deemed to be either too high or too low, they could be adjusted by altering the stringency of the screening test (for example, by raising or lowering cutpoints on a continuous variable or by changing the components that comprise a screening test), by altering the sample of people on whom the analyses were based (for example, by identifying people who are regarded as having more pertinent demographic or health status variables), or by altering the nature of the reference standard. Those strategies would almost inevitably result in changes to the sensitivity and specificity values, and those revised values would simply need to be reported as applying to the particular new level of stringency on the screening test, the applicable population, and the reference standard when that test was being described. This reveals, yet again, that pliability can be associated with sensitivity, specificity, and predictive values.
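As a rough sketch of the first of those strategies (raising or lowering a cutpoint), using a synthetic dataset with an assumed 10% prevalence and arbitrary cutoff values, none of which come from the article:

```python
# Rough sketch (synthetic data, hypothetical cutoffs): moving the cutpoint of a
# continuous screening measure changes sensitivity, specificity, PPV, and NPV
# at the same time.
import random

random.seed(1)
people = []
for _ in range(10_000):
    has_condition = random.random() < 0.10                  # assumed 10% prevalence
    score = random.gauss(60 if has_condition else 50, 8)    # continuous screening score
    people.append((has_condition, score))

for cutoff in (52, 56, 60):                                  # increasingly stringent
    tp = sum(1 for cond, s in people if cond and s >= cutoff)
    fn = sum(1 for cond, s in people if cond and s < cutoff)
    fp = sum(1 for cond, s in people if not cond and s >= cutoff)
    tn = sum(1 for cond, s in people if not cond and s < cutoff)
    print(f"cutoff {cutoff}: sensitivity {tp / (tp + fn):.2f}, "
          f"specificity {tn / (tn + fp):.2f}, "
          f"PPV {tp / (tp + fp):.2f}, NPV {tn / (tn + fn):.2f}")
```

Raising the cutoff makes the test more stringent, so specificity and PPV rise while sensitivity falls; whichever cutoff is adopted, the corresponding values are the ones that would need to be reported.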

The Importance of Full Disclosure of Information in Research

When describing screening tests, many researchers provide information about their reference standard; the prevalence of the target condition in their research sample(s); the criteria that had been used to indicate presence or absence of a condition according to the screening test; and the sensitivity, specificity, and predictive values they obtained ( 6 , 7 , 15 , 17 , 18 ). The research results are not always impressive or what the researchers might have hoped for, but at least it is possible to draw informed conclusions from those results.

Sometimes only partial information is provided, and that limits the usefulness of research. For example, in a systematic review concerning the toe–brachial index in screening for peripheral artery disease, Tehan et al. ( 19 ) were evidently unable to find predictive values in so many of the final seven studies they reviewed that they did not provide any information about those values—despite those metrics being of fundamental importance for screening.

In one of the more informative articles reviewed by Tehan et al. ( 19 ), Okamoto et al. ( 13 ) did include information about sensitivity, specificity, and predictive values of several screening tests. However, they provided insufficient interpretation at times. For example, they reported an unusually low sensitivity value of 45.2% for the toe–brachial index in detecting peripheral artery disease. That value occurred in the presence of 100% specificity, indicating that the cutoff point might have been too stringent and that sensitivity had been sacrificed in the interest of obtaining high specificity, but the researchers did not draw attention to that or provide any explanation for their strategy. Information in a receiver operating characteristic analysis within their article suggests that more appropriate sensitivity and specificity values would have both been approximately 73% and therefore, incidentally, similar to the values obtained by other researchers ( 15 , 20 ).

Deficiencies in provision of information can be even more problematic. In a recently published article, Jönelid et al. ( 21 ) investigated the usefulness of the ankle–brachial index for identifying polyvascular disease. Although they reported a specificity of 92.4% and a PPV of 68.4%, they did not provide results concerning either sensitivity or NPV. From information in their article, those unrevealed values can be calculated as both being 100%. That these values are so high in a screening context raises suspicions. When following those suspicions through, it becomes evident that the researchers used the ABI as a component of the reference standard as well as being the sole variable that comprised the screening test. Failure to sufficiently disclose this circular situation (the inevitability of something being highly related to something that is partly itself) permitted the authors to claim that the “ABI is a useful … measurement that appears predictive of widespread atherosclerosis” in their patients. That this statement is invalid becomes apparent only through an awareness of how researchers’ data should conform to entries in Figure 1 and how reference standards and screening tests are conceptualized.

The above examples illustrate the importance of research consumers being provided with complete information when screening tests are being described, and consumers being able to interpret that information appropriately—sometimes with at least a modicum of skepticism. Having a healthy level of skepticism as well as clarity concerning the nature and appropriate interpretations and uses of sensitivity, specificity, and predictive values, can be seen as important for educators, researchers, and clinicians in public health.

  • Sensitivity and specificity should be emphasized as having different origins, and different purposes, from PPVs and NPVs, and all four metrics should be regarded as important when describing and assessing a screening test’s adequacy and usefulness.
  • Researchers and clinicians should avoid confusion of the inverse when considering the application of sensitivity and specificity to screening tests.
  • Predictive values are more relevant than are sensitivity and specificity when people are being screened.
  • Predictive values on screening tests need to be determined on the basis of careful clinical deliberation and might be used in a reverse process that would result in adjustments to sensitivity and specificity values.
  • Researchers should provide information about sensitivity, specificity, and predictive values when describing screening test results, and that information should include how those metrics were derived as well as appropriate interpretations.

Author Contributions

RT conceived of, conducted the research for, and wrote the complete manuscript.

Conflict of Interest Statement

The author declares that this research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

Rod Pope and Peta Tehan provided valuable feedback on earlier drafts of this manuscript. Professor Pope also drew my attention to literature that I had not been aware of, and Dr. Tehan generously shared computer output of receiver operating characteristic analyses that provided confirming insights about sensitivity and specificity.

Sensitivity Analysis - Explained

What is Sensitivity Analysis?


Written by Jason Gordon

Updated at June 22nd, 2023


Sensitivity analysis, also referred to as simulation analysis, is a technique employed in financial modeling to determine how different values of a set of independent variables can influence a particular dependent variable under certain specific conditions and assumptions. It is used to ascertain how the overall uncertainty in the output of a mathematical model is affected by the various sources of uncertainty in its inputs. The application of sensitivity analysis spans a wide range of fields such as engineering, biology, environmental studies, social sciences, chemistry and economics. It is most often used in mathematical models where the output is an opaque function (i.e., one whose inner workings cannot be analyzed directly) of several inputs.


How is Sensitivity Analysis Used?

Sensitivity analysis is popularly known as what-if analysis, since the technique is used to measure varying outcomes under alternative assumptions and conditions across a range of independent variables. At the risk of oversimplification, sensitivity analysis can be said to observe how a model's output changes in response to each change made to its inputs. Sensitivity analysis can be either local or global. There are certain parameters that analysts need to be mindful of when undertaking such an analysis. First, it is essential to determine the input variables whose values will be altered during the analysis. Second, it is necessary to ascertain how many variables will be varied at any given point in time. Third, maximum and minimum values need to be assigned to all pertinent variables before the analysis commences. Lastly, analysts must scrutinize correlations between inputs and outputs and assign values to the combinations accordingly. The values that can be altered include technical parameters, the number of activities and constraints, and the overall objective with respect to both the assumed risk and the expected profits. The observations to record include the value of the objective under a given strategy, the values of the various decision variables, and the difference in the objective function between two adopted strategies.

Steps in Sensitivity Analysis

Once the values of the input variables have been determined, sensitivity analysis can be performed in the following steps (a short worked sketch in code follows the list):

  • Defining the base case output : The first step is to define the corresponding base case output for the base case input value for which the sensitivity is to be measured.
  • Determining the new output value : During this step, we determine the value of the output for a new value of the input, given that the values of all other inputs are constant.
  • Calculating the change : We then compute the percentage changes in the output as well as the input.
  • Calculating sensitivity : This final step calculates sensitivity by dividing the percentage change in output by the percentage change in input.
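A minimal worked sketch of these four steps, using hypothetical price, volume, and cost figures:

```python
# Minimal sketch of the four steps above with hypothetical numbers:
# profit depends on price and sales volume; measure the sensitivity of profit
# to price while holding volume constant.

def profit(price, volume, unit_cost=6.0):
    return (price - unit_cost) * volume

# Step 1: define the base case input and output.
base_price, base_volume = 10.0, 1_000
base_profit = profit(base_price, base_volume)                   # 4,000

# Step 2: determine the new output for a new input value, all else constant.
new_price = 11.0
new_profit = profit(new_price, base_volume)                     # 5,000

# Step 3: calculate the percentage change in output and in input.
pct_output_change = (new_profit - base_profit) / base_profit    # +25%
pct_input_change = (new_price - base_price) / base_price        # +10%

# Step 4: sensitivity = % change in output / % change in input.
print(f"Sensitivity of profit to price: {pct_output_change / pct_input_change:.2f}")
```

Here the sensitivity works out to 2.5, meaning that, under these assumed figures, a 1% change in price produces roughly a 2.5% change in profit.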


Applications of Sensitivity Analysis

Sensitivity analysis has a wide variety of applications from something as trivial as planning a road trip to developing business models. Below are some of its most common applications.

  • Sensitivity analysis is used in the study of Black Box Processes , which are processes that can be analyzed on the basis of their inputs and outputs, without having to determine the complexities of their inner workings.
  • It is employed in Robust decision-making (RDM) frameworks in order to assess the robustness of a model's results under conditions of epistemic uncertainty.
  • It is used in the development of evolved models by identifying and analyzing correlations between observations, inputs and forecasts.
  • It is utilized in reducing uncertainty in models by identifying and omitting inputs that bring about significant uncertainty in the output.
  • Sensitivity analysis is also commonly used as a tool for model simplification by identifying and omitting inputs that are redundant or do not have any significant effect on the output.
  • It is used to generate sustainable, coherent, and compelling recommendations that enhance communication between modelers and decision makers.

Sensitivity analysis has become an integral part of Policy Impact Assessments (IAs) conducted by both national as well as international agencies.

Illustration of Sensitivity Analysis

Let us assume that a company C1 is involved in the manufacture and sale of snow plows. Joe, a sales analyst at the company, is trying to understand the impact of an early advent of winter on total sales of snow plows. Company analysts have already determined that sales volume typically peaks during the last quarter of the year, i.e. during the months October through December. This increase in sales is driven by the anticipation of snowfall during late December, January and early February. However, Joe has determined from historical sales figures that when early winter has been forecast, snow plow sales have also peaked earlier. In calendar years in which snowfall arrived 15 days earlier than usual, there has been a five percent rise in total sales volume. Based on this simple relationship, Joe is able to construct a financial model and perform sensitivity analysis using various what-if scenarios. According to Joe's sensitivity analysis, whenever snowfall precedes the norm by 21, 15 and nine days, the total snow plow sales of C1 can be expected to increase by seven, five and three percent respectively.
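A minimal what-if sketch of Joe's scenarios, assuming a hypothetical base sales figure and using the uplift percentages described above:

```python
# What-if sketch of Joe's scenarios. The base sales figure is hypothetical; the
# uplift percentages follow the relationship described in the text.
base_sales = 2_000_000                        # assumed annual snow plow sales, in dollars

scenarios = {21: 0.07, 15: 0.05, 9: 0.03}     # days winter arrives early -> expected uplift

for days_early, uplift in scenarios.items():
    projected = base_sales * (1 + uplift)
    print(f"Winter {days_early} days early: projected sales ${projected:,.0f} "
          f"({uplift:.0%} above the base case)")
```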


What are sensitivity and specificity?
This article has a correction. Please see:

  • Correction: What are sensitivity and specificity? - April 01, 2022


  • http://orcid.org/0000-0001-5632-4926 Amelia Swift 1 ,
  • Roberta Heale 2 ,
  • http://orcid.org/0000-0003-1130-5603 Alison Twycross 3
  • 1 Nursing , University of Birmingham , Birmingham , UK
  • 2 School of Nursing , Laurentian University , Sudbury , Ontario , Canada
  • 3 Independent Consultant in Nurse Education and Research , Aylesbury , Buckinghamshire , UK
  • Correspondence to Dr Amelia Swift, Nursing, university of birmingham, Birmingham B15 2TT, UK; meliswift{at}gmail.com

https://doi.org/10.1136/ebnurs-2019-103225


Whenever we create a test to screen for a disease, to detect an abnormality or to measure a physiological parameter such as blood pressure (BP), we must determine how valid that test is—does it measure what it sets out to measure accurately? There are lots of factors that combine to describe how valid a test is: sensitivity and specificity are two such factors. We often think of sensitivity and specificity as being ways to indicate the accuracy of the test or measure.

In the clinical setting, screening is used to decide which patients are more likely to have a condition. There is often a ‘gold-standard’ screening test—one that is considered the best to use because it is the most accurate. The gold standard test, when compared with other options, is most likely to correctly identify people with the disease (it is sensitive), and correctly identify those who do not have the disease (it is specific). When a test has a sensitivity of 0.8 or 80% it can correctly identify 80% of people who have the disease, but it misses 20%. This smaller group of people have the disease, but the test failed to detect them—this is known as a false negative. A test that has an 80% specificity can correctly identify 80% of people in a group that do not have a disease, but it will misidentify 20% of people. That group of 20% will be identified as having the disease when they do not, this is known as a false positive. See box 1 for definitions of common terms used when describing sensitivity and specificity.

Common terms

Sensitivity: the ability of a test to correctly identify patients with a disease.

Specificity: the ability of a test to correctly identify people without the disease.

True positive: the person has the disease and the test is positive.

True negative: the person does not have the disease and the test is negative.

False positive: the person does not have the disease and the test is positive.

False negative: the person has the disease and the test is negative.

Prevalence: the percentage of people in a population who have the condition of interest.

These terms are easier to visualise. In our first example Disease D is present in 30% of the population ( figure 1 ).


Figure 1. Prevalence of 30% (figure adapted from Loong 13 ). Each square represents a person. The red circle represents a person with Disease D. A blank circle represents a person without Disease D.

We want a screening test that will pick out as many of the people with Disease D as possible—we want the test to have high sensitivity. Figure 2 illustrates a test result.

Figure 2. Test result.

Sensitivity is calculated based on how many people have the disease (not the whole population). It can be calculated using the equation: sensitivity=number of true positives/(number of true positives+number of false negatives). Specificity is calculated based on how many people do not have the disease. It can be calculated using the equation: specificity=number of true negatives/(number of true negatives+number of false positives). If you are mathematically minded you will notice that we are calculating a ratio comparing the number of correct results with the total number of tests done. An example is provided in box 2 .

Calculation of sensitivity and specificity from figure 2 test result

In our example ( figure 2 ):

Sensitivity=18/(18+12)=0.6

Specificity=58/(58+12)≈0.83

Because percentages are easy to understand we multiply sensitivity and specificity figures by 100. We can then discuss sensitivity and specificity as percentages. So, in our example, the sensitivity is 60% and the specificity is about 83%. This test will correctly identify 60% of the people who have Disease D, but it will fail to identify the other 40%. The test will correctly identify 83% of those who do not have the disease, but it will also identify 17% of people as having the disease when they do not. These are reasonable numbers when we compare them with some screening tests that have high-stakes outcomes. A good example of this is screening for cervical cell changes that might indicate a high likelihood of cancer.

Meta-analysis suggests that the cervical smear or pap test has a sensitivity of between 30% and 87% and a specificity of between 86% and 100%. 1 This means that up to 70% of women who have a cervical abnormality will not be detected by this screening test. This is a poorly performing test and has led to a suggestion that we add in, or switch instead to, screening for high-risk variants of the human papilloma virus, which has a higher sensitivity. 2 However, low sensitivity can be compensated for by frequent screening, which is why most cervical screening policies rely on women attending every three to five years.

There is a risk that a test with high sensitivity will capture some people who do not have Disease D ( figure 3 ). The screening test in figure 3 will capture all those who have the disease but also many who do not. This will cause anxiety and unnecessary follow-up for well people. This phenomenon is currently a concern in medicine, discussed as over-detection, over-diagnosis and over-treatment—together these could be described as over-medicalisation. Over-detection is the identification of an abnormality that causes concern but, if left untreated, is unlikely to cause harm. Mammography, the radiographic detection of potential breast tumours, is thought to have an over-detection rate of between 7% and 32%. 3 The emotional and economic costs of this have led to the development of decision aids to help women make an informed decision about undergoing screening. 4

Figure 3. High sensitivity.

Let us consider some further examples. Imagine that you have 100 patients in your emergency department (ED) waiting room who have all presented with an acute ankle injury. Ankle injuries are very common, but fractures are only present in approximately 15% of cases. 5 The gold standard test for an ankle fracture is an X-ray but because so few ankle injuries are fractures it is considered inappropriate to X-ray everyone. Doing so would result in unnecessary exposure to X-rays, lengthy waits for patients, and added expense. However, it is important to be able to identify fractures so that the most appropriate management strategy can be applied. Therefore, we need a way to determine who is most likely to have a fracture, and then we can send only those patients for X-ray confirmation. In 1992 a group of Canadian physicians created a set of rules, called the Ottawa ankle rules, 6 which can be used by the clinician to decide who needs an X-ray and have been incorporated into national guidance in many countries. 7

The Canadian group examined many features associated with ankle injury to see which were most predictive of fracture and determined that just four were required, relating to tenderness in particular areas and an inability to weight-bear. When these rules are applied clinically, they have been shown (in a systematic review) to correctly identify approximately 96% of people who have a fracture (sensitivity) and to correctly rule out between 10% and 70% of those who do not have a fracture (specificity). 8 The wide range of specificity is likely to be due to differences in the education of the clinicians involved in the studies from which those results derive. We can use our 100 patients waiting in the ED to show how these figures are calculated. We know from the research that approximately 15 people out of the 100 waiting will have an ankle fracture; the rest will have various strains and sprains. A sensitivity of 96% means that when the rules are applied almost everyone who has a fracture will be selected for an X-ray, which can be used to confirm the fracture and direct treatment. We can show this through a calculation. The prevalence of ankle fracture is 15%, so 15 of the 100 people in the ED have a fracture. Applying the sensitivity of 96% to those 15 people tells us how many false negatives to expect—people who have an ankle fracture that these rules would miss. The expected number of true positives is 0.96 × 15 ≈ 14.4, so the expected number of false negatives is about 0.6, or fewer than 1 in 100 patients. A specificity of 10%–70% means that the rules will correctly rule out between 10% and 70% of the 85 people who do not have a fracture. Using the same approach, we can work out how many false positives there might be—people who are thought to have a fracture but do not. At the lower specificity, 90% of the 85 people without a fracture (about 77) might be sent for an unnecessary X-ray; at the higher specificity, only 30% of them (about 26) would be. This illustrates something key about sensitivity and specificity: it is rare that a test achieves high scores for both, and it is important that the test is used accurately and consistently.
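A small sketch of that arithmetic, using the figures quoted above (100 patients, 15% prevalence of fracture, 96% sensitivity, and specificity ranging from 10% to 70%):

```python
# Expected screening outcomes for the waiting-room example: 100 patients,
# 15% prevalence of fracture, sensitivity of 0.96, specificity from 0.10 to 0.70.
n, prevalence, sensitivity = 100, 0.15, 0.96
with_fracture = n * prevalence               # 15 people
without_fracture = n - with_fracture         # 85 people

missed = with_fracture * (1 - sensitivity)   # expected false negatives
print(f"Fractures missed: about {missed:.1f} per {n} patients")

for specificity in (0.10, 0.70):
    unnecessary_xrays = without_fracture * (1 - specificity)   # expected false positives
    print(f"Specificity {specificity:.0%}: {unnecessary_xrays:.1f} of the "
          f"{without_fracture:.0f} people without a fracture sent for X-ray")
```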

It is important to know and understand the clinical implications of the sensitivity and specificity of diagnostic tests. The prostate-specific antigen (PSA) test is one example. This test has a sensitivity of 86%, meaning it is good at detecting prostate cancer, but a specificity of only 33%, which means there are many false positive results. A PSA may be elevated for several reasons, including when there is an increased prostate volume, such as in benign prostatic hyperplasia. Two-thirds of men who have an elevated PSA do not have prostate cancer. Many countries have national guidelines to help providers identify men who would most benefit from a PSA, given the inaccuracy of the test. 9 However, it can be difficult for men who qualify to decide whether or not to have the test, and this requires health promotion counselling by their healthcare provider.
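As a hedged sketch of why a positive PSA result is so often a false positive: with the sensitivity and specificity quoted above, a positive predictive value of about one-third corresponds to a prevalence of roughly 28% among the men actually tested. That prevalence figure is chosen here purely to illustrate the calculation and is not taken from the article.

```python
# Sketch: positive predictive value (PPV) from sensitivity, specificity, and an
# assumed prevalence. Sensitivity and specificity are the figures quoted above;
# the prevalence is an illustrative assumption, not a published estimate.
sensitivity, specificity, prevalence = 0.86, 0.33, 0.28

true_positives = sensitivity * prevalence
false_positives = (1 - specificity) * (1 - prevalence)
ppv = true_positives / (true_positives + false_positives)

print(f"PPV at {prevalence:.0%} prevalence: {ppv:.1%}")
print(f"Roughly {1 - ppv:.0%} of positive results would be false positives.")
```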

It is also important to know and account for the sensitivity and specificity of a diagnostic test, or examination, when one is included in a research study. For example, researchers conducting studies where one variable is the measurement of BP must understand that the sensitivity and specificity vary considerably. Measurements of BP for patients with hypertension in clinics have sensitivity rates between 34% and 69% and specificity between 73% and 92%. Home measurements for hypertensive patients have sensitivity of 81%–88% and specificity of 55%–64%. 10 These wide variations mean that single measurements of BP have little diagnostic value, 11 and using them to determine the effectiveness of a research intervention, or to allocate a patient to a treatment group in a research study, would be misleading. Justice et al 12 articulate the issues succinctly:

If symptoms are to be recognized and effectively addressed in clinical research, they must be collected using sensitive, specific, reliable, and clinically meaningful methods.

In summary, an understanding of the sensitivity and specificity of diagnostic and physical assessment tests is important from both a clinical and a research perspective. This knowledge puts healthcare providers in a better position to counsel patients about screening, results and treatment. The constructs are not the easiest to understand or to communicate to others. However, patient-centred care and the ethical requirement for autonomy demand that we support patients to make good decisions about whether to undergo screening, what the results might mean, the importance of regular attendance to maximise the chance of detection, and the probability of the result being incorrect. Fallibility is not failure or an indicator of poor care, but failing to equip patients with complete information is a failure to support informed consent.

  • de Kok IMCM ,
  • van Rosmalen J ,
  • Dillner J , et al
  • Jørgensen KJ ,
  • Mæhlen J , et al
  • Barratt A ,
  • Jansen J , et al
  • Stiell IG ,
  • Greenberg GH ,
  • McKnight RD , et al
  • National Institute for Health and Care Excellence
  • Bachmann LM
  • Prostate Cancer UK
  • Martin U , et al
  • Muntner P ,
  • Carey RM , et al
  • Justice AC ,
  • Rabeneck L ,
  • Hays RD , et al

Twitter @nurseswift, @robertaheale, @alitwy

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient consent for publication Not required.

Provenance and peer review Not commissioned; internally peer reviewed.

Linked Articles

  • Miscellaneous Correction: What are sensitivity and specificity? BMJ Publishing Group Ltd and RCN Publishing Company Ltd Evidence-Based Nursing 2022; 25 e1-e1 Published Online First: 22 Mar 2022. doi: 10.1136/ebnurs-2019-103225corr1




  12. Sensitivity analysis: A review of recent advances

    The solution of several operations research problems requires the creation of a quantitative model. Sensitivity analysis is a crucial step in the model building and result communication process. Through sensitivity analysis we gain essential insights on model behavior, on its structure and on its response to changes in the model inputs.

  13. Introduction to Sensitivity Analysis

    Abstract. Sensitivity analysis provides users of mathematical and simulation models with tools to appreciate the dependency of the model output from model input and to investigate how important is each model input in determining its output. All application areas are concerned, from theoretical physics to engineering and socio-economics.

  14. Sensitivity Analysis: A Method to Promote Certainty and Transparency in

    Sensitivity analysis is a method used to evaluate the influence of alternative assumptions or analyses on the pre-specified research questions proposed (Deeks et al., 2021; Schneeweiss, 2006; Thabane et al., 2013).In other words, a sensitivity analysis is purposed to evaluate the validity and certainty of the primary methodological or analytic strategy.

  15. Sensitivity Analysis in Observational Research: Introducing the E-Value

    Sensitivity analysis is useful in assessing how robust an association is to potential unmeasured or uncontrolled confounding. This article introduces a new measure called the "E-value," which is related to the evidence for causality in observational studies that are potentially subject to confounding. The E-value is defined as the minimum ...

  16. Sensitivity Analysis Explained: Definitions, Formulas and Examples

    Sensitivity analysis is an indispensable tool utilized in corporate finance and business analysis to comprehend how the variability in key input variables influences the performance of a business. By methodically adjusting the inputs and observing the ensuing effect on outputs, analysts can discern which variables have the most profound impact ...

  17. How Is Sensitivity Analysis Used?

    Sensitivity analysis is an analysis method that is used to identify how much variations in the input values for a given variable will impact the results for a mathematical model.

  18. What Is a Sensitivity Analysis? Definition and Examples

    What is a sensitivity analysis? A sensitivity analysis, also referred to as a what-if analysis, is a mathematical tool used in scientific and financial modeling to study how uncertainties in a model affect that model's overall uncertainty. It's a way to determine what different values for an independent variable can do to affect a specific ...

  19. Sensitivity, Specificity, and Predictive Values: Foundations

    Determining Sensitivity, Specificity, and Predictive Values. When the adequacy, also known as the predictive power or predictive validity, of a screening test is being established, the outcomes yielded by that screening test are initially inspected to see whether they correspond to what is regarded as a definitive indicator, often referred to as a gold standard, of the same target condition.

  20. Sensitivity Analysis

    Sensitivity analysis, also referred to as simulation analysis, is a technique employed in financial modeling to determine how different values of a set of independent variables can influence a particular dependent variable under certain specific conditions and assumptions. It is used to ascertain how the overall uncertainty in the output of a ...

  21. Sensitivity Analysis

    A sensitivity analysis can be referred to as the "what if" analysis. Sensitivity analysis is used as a way to assess risk and pinpoint important business components. It provides a way to form a ...

  22. What are sensitivity and specificity?

    Common terms. Sensitivity: the ability of a test to correctly identify patients with a disease. Specificity: the ability of a test to correctly identify people without the disease. True positive: the person has the disease and the test is positive. True negative: the person does not have the disease and the test is negative.