CASP Checklists

Critical Appraisal Checklists

We offer a number of free, downloadable checklists to help you perform critical appraisal more easily and accurately across a range of study types.

The CASP checklists are easy to understand, but if you need further guidance on how they are structured, take a look at our guide on how to use our CASP checklists.

  • CASP Randomised Controlled Trial Checklist
  • CASP Systematic Review Checklist
  • CASP Qualitative Studies Checklist
  • CASP Cohort Study Checklist
  • CASP Diagnostic Study Checklist
  • CASP Case Control Study Checklist
  • CASP Economic Evaluation Checklist
  • CASP Clinical Prediction Rule Checklist

Checklist Archive

  • CASP Randomised Controlled Trial Checklist 2018 fillable form
  • CASP Randomised Controlled Trial Checklist 2018



Study Quality Assessment Tools

In 2013, NHLBI developed a set of tailored quality assessment tools to assist reviewers in focusing on concepts that are key to a study’s internal validity. The tools were specific to certain study designs and tested for potential flaws in study methods or implementation. Experts used the tools during the systematic evidence review process to update existing clinical guidelines, such as those on cholesterol, blood pressure, and obesity. Their findings are outlined in the following reports:

  • Assessing Cardiovascular Risk: Systematic Evidence Review from the Risk Assessment Work Group
  • Management of Blood Cholesterol in Adults: Systematic Evidence Review from the Cholesterol Expert Panel
  • Management of Blood Pressure in Adults: Systematic Evidence Review from the Blood Pressure Expert Panel
  • Managing Overweight and Obesity in Adults: Systematic Evidence Review from the Obesity Expert Panel

While these tools have not been independently published and would not be considered standardized, they may be useful to the research community. These reports describe how experts used the tools for the project. Researchers may want to use the tools for their own projects; however, they would need to determine their own parameters for making judgments. Details about the design and application of the tools are included in Appendix A of the reports.

Quality Assessment of Controlled Intervention Studies

Abbreviations used in the tool: CD, cannot determine; NA, not applicable; NR, not reported.

Guidance for Assessing the Quality of Controlled Intervention Studies

The guidance document below is organized by question number from the tool for quality assessment of controlled intervention studies.

Question 1. Described as randomized

Was the study described as randomized? A study does not satisfy quality criteria as randomized simply because the authors call it randomized; however, this is a first step in determining whether a study is randomized.

Questions 2 and 3. Treatment allocation–two interrelated pieces

Adequate randomization: Randomization is adequate if it occurred according to the play of chance (e.g., computer-generated sequence in more recent studies, or random number table in older studies).

Inadequate randomization: Randomization is inadequate if there is a preset plan (e.g., alternation, where every other subject is assigned to the treatment arm, or another method of allocation is used, such as time or day of hospital admission or clinic visit, ZIP Code, phone number, etc.). In fact, this is not randomization at all–it is another method of assignment to groups. If assignment is not by the play of chance, then the answer to this question is no.

There may be some tricky scenarios that will need to be read carefully and considered for the role of chance in assignment. For example, randomization may occur at the site level, where all individuals at a particular site are assigned to receive treatment or no treatment. This scenario is used for group-randomized trials, which can be truly randomized, but often are "quasi-experimental" studies with comparison groups rather than true control groups. (Few, if any, group-randomized trials are anticipated for this evidence review.)

Allocation concealment: This means that one does not know in advance, or cannot guess accurately, to what group the next person eligible for randomization will be assigned. Methods include sequentially numbered opaque sealed envelopes, numbered or coded containers, central randomization by a coordinating center, computer-generated randomization that is not revealed ahead of time, etc.

Questions 4 and 5. Blinding

Blinding means that one does not know to which group–intervention or control–the participant is assigned. It is also sometimes called "masking." The reviewer assessed whether each of the following was blinded to knowledge of treatment assignment: (1) the person assessing the primary outcome(s) for the study (e.g., taking the measurements such as blood pressure, examining health records for events such as myocardial infarction, reviewing and interpreting test results such as x ray or cardiac catheterization findings); (2) the person receiving the intervention (e.g., the patient or other study participant); and (3) the person providing the intervention (e.g., the physician, nurse, pharmacist, dietitian, or behavioral interventionist).

Generally placebo-controlled medication studies are blinded to patient, provider, and outcome assessors; behavioral, lifestyle, and surgical studies are examples of studies that are frequently blinded only to the outcome assessors because blinding of the persons providing and receiving the interventions is difficult in these situations. Sometimes the individual providing the intervention is the same person performing the outcome assessment. This was noted when it occurred.

Question 6. Similarity of groups at baseline

This question relates to whether the intervention and control groups have similar baseline characteristics on average, especially those characteristics that may affect the intervention or outcomes. The point of randomized trials is to create groups that are as similar as possible except for the intervention(s) being studied, in order to compare the effects of the interventions between groups. When reviewers abstracted baseline characteristics, they noted when there was a significant difference between groups. Baseline characteristics for intervention groups are usually presented in a table in the article (often Table 1).

Groups can differ at baseline without raising red flags if: (1) the differences would not be expected to have any bearing on the interventions and outcomes; or (2) the differences are not statistically significant. When concerned about baseline difference in groups, reviewers recorded them in the comments section and considered them in their overall determination of the study quality.

Questions 7 and 8. Dropout

"Dropouts" in a clinical trial are individuals for whom there are no end point measurements, often because they dropped out of the study and were lost to followup.

Generally, an acceptable overall dropout rate is considered 20 percent or less of participants who were randomized or allocated into each group. An acceptable differential dropout rate is an absolute difference between groups of 15 percentage points at most (the dropout rate of one group minus the dropout rate of the other). However, these are general guidelines. Lower overall dropout rates are expected in shorter studies, whereas higher overall dropout rates may be acceptable for studies of longer duration. For example, a 6-month study of weight loss interventions should be expected to have nearly 100 percent followup (almost no dropouts–nearly everybody gets their weight measured regardless of whether or not they actually received the intervention), whereas a 10-year study testing the effects of intensive blood pressure lowering on heart attacks may be acceptable if it has a 20-25 percent dropout rate, especially if the dropout rate between groups was similar. The panels for the NHLBI systematic reviews may set different dropout caps.

Differential dropout rates, in contrast, are not flexible; there is a cap of 15 percentage points. If the differential dropout rate between arms is 15 percentage points or greater, then there is a serious potential for bias. This constitutes a fatal flaw, resulting in a poor quality rating for the study.
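To make these thresholds concrete, here is a minimal Python sketch (illustrative only, not part of the NHLBI tool; the group sizes are made up) that computes overall and differential dropout rates and applies the 20 percent and 15 percentage point guidelines:

```python
# Illustrative sketch, not part of the NHLBI tool: apply the 20 percent
# overall and 15 percentage point differential dropout guidelines.

def dropout_rates(randomized_a, completed_a, randomized_b, completed_b):
    """Return (overall, differential) dropout rates in percent."""
    dropped_a = randomized_a - completed_a
    dropped_b = randomized_b - completed_b
    rate_a = 100 * dropped_a / randomized_a
    rate_b = 100 * dropped_b / randomized_b
    overall = 100 * (dropped_a + dropped_b) / (randomized_a + randomized_b)
    return overall, abs(rate_a - rate_b)

overall, differential = dropout_rates(randomized_a=200, completed_a=170,
                                      randomized_b=200, completed_b=130)
print(f"Overall dropout: {overall:.1f}%")                  # 25.0%
print(f"Differential dropout: {differential:.1f} points")  # 20.0 points

# Per the guidance: a differential of 15 points or more is a fatal flaw;
# high overall dropout is weighed against study duration.
if differential >= 15:
    print("Potential fatal flaw: high differential dropout")
elif overall > 20:
    print("High overall dropout: weigh against study duration")
```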

Question 9. Adherence

Did participants in each treatment group adhere to the protocols for assigned interventions? For example, if Group 1 was assigned to 10 mg/day of Drug A, did most of them take 10 mg/day of Drug A? Another example is a study evaluating the difference between a 30-pound weight loss and a 10-pound weight loss on specific clinical outcomes (e.g., heart attacks), but the 30-pound weight loss group did not achieve its intended weight loss target (e.g., the group only lost 14 pounds on average). A third example is whether a large percentage of participants assigned to one group "crossed over" and got the intervention provided to the other group. A final example is when one group that was assigned to receive a particular drug at a particular dose had a large percentage of participants who did not end up taking the drug or the dose as designed in the protocol.

Question 10. Avoid other interventions

Changes that occur in the study outcomes being assessed should be attributable to the interventions being compared in the study. If study participants receive interventions that are not part of the study protocol and could affect the outcomes being assessed, and they receive these interventions differentially, then there is cause for concern because these interventions could bias results. The following scenario is another example of how bias can occur. In a study comparing two different dietary interventions on serum cholesterol, one group had a significantly higher percentage of participants taking statin drugs than the other group. In this situation, it would be impossible to know if a difference in outcome was due to the dietary intervention or the drugs.

Question 11. Outcome measures assessment

What tools or methods were used to measure the outcomes in the study? Were the tools and methods accurate and reliable–for example, have they been validated, or are they objective? This is important as it indicates the confidence you can have in the reported outcomes. Perhaps even more important is ascertaining that outcomes were assessed in the same manner within and between groups. One example of differing methods is self-report of dietary salt intake versus urine testing for sodium content (a more reliable and valid assessment method). Another example is using BP measurements taken by practitioners who use their usual methods versus using BP measurements done by individuals trained in a standard approach. Such an approach may include using the same instrument each time and taking an individual's BP multiple times. In each of these cases, the answer to this assessment question would be "no" for the former scenario and "yes" for the latter. In addition, a study in which an intervention group was seen more frequently than the control group, enabling more opportunities to report clinical events, would not be considered reliable and valid.

Question 12. Power calculation

Generally, a study's methods section will address the sample size needed to detect differences in primary outcomes. The current standard is at least 80 percent power to detect a clinically relevant difference in an outcome using a two-sided alpha of 0.05. Often, however, older studies will not report on power.
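As a rough illustration of what such a power calculation involves, the following sketch uses the standard normal approximation for comparing two means; the 5 mmHg difference and 15 mmHg standard deviation are assumed values for the example, not from any study discussed here:

```python
# Sketch of the sample-size arithmetic behind "80 percent power,
# two-sided alpha of 0.05," using the normal approximation for a
# two-group comparison of means. Numbers are illustrative.
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants per arm to detect a mean difference
    `delta` given outcome standard deviation `sigma` (two-sided test)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # 0.84 for 80% power
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

# e.g., detecting a 5 mmHg difference in systolic BP, SD of 15 mmHg
print(round(n_per_group(delta=5, sigma=15)))  # roughly 141 per group
```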

Question 13. Prespecified outcomes

Investigators should prespecify outcomes reported in a study for hypothesis testing–which is the reason for conducting an RCT. Without prespecified outcomes, the study may be reporting ad hoc analyses, simply looking for differences supporting desired findings. Investigators also should prespecify subgroups being examined. Most RCTs conduct numerous post hoc analyses as a way of exploring findings and generating additional hypotheses. The intent of this question is to give more weight to reports that are not simply exploratory in nature.

Question 14. Intention-to-treat analysis

Intention-to-treat (ITT) means everybody who was randomized is analyzed according to the original group to which they were assigned. This is an extremely important concept because conducting an ITT analysis preserves the whole reason for doing a randomized trial; that is, to compare groups that differ only in the intervention being tested. When the ITT philosophy is not followed, the groups being compared may no longer be the same. In this situation, the study would likely be rated poor. However, if an investigator used another type of analysis that could be viewed as valid, this would be explained in the "other" box on the quality assessment form. Some researchers use a completers analysis (an analysis of only the participants who completed the intervention and the study), which introduces significant potential for bias. Characteristics of participants who do not complete the study are unlikely to be the same as those who do. The likely impact of participants withdrawing from a study treatment must be considered carefully. ITT analysis provides a more conservative (potentially less biased) estimate of effectiveness.
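The following toy simulation (entirely illustrative, not from the NHLBI reports) shows how a completers-only analysis can bias the estimate when sicker participants drop out of one arm, while the ITT comparison recovers the true effect:

```python
# Illustrative simulation: sicker participants drop out of the control
# arm, so a completers-only comparison flatters the control group and
# biases the estimated treatment effect toward the null.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
severity = rng.normal(size=n)                    # baseline severity
arm = rng.integers(0, 2, size=n)                 # 1 = treatment, 0 = control
outcome = severity - 0.5 * arm + rng.normal(scale=0.5, size=n)  # true effect -0.5

# Untreated participants with worse severity are more likely to drop out
dropout = (arm == 0) & (severity > 0.5)

itt = outcome[arm == 1].mean() - outcome[arm == 0].mean()
completers = (outcome[(arm == 1) & ~dropout].mean()
              - outcome[(arm == 0) & ~dropout].mean())
print(f"ITT estimate:        {itt:+.2f}")        # near the true -0.5
print(f"Completers estimate: {completers:+.2f}") # biased toward the null
```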

General Guidance for Determining the Overall Quality Rating of Controlled Intervention Studies

The questions on the assessment tool were designed to help reviewers focus on the key concepts for evaluating a study's internal validity. They are not intended to create a list that is simply tallied up to arrive at a summary judgment of quality.

Internal validity is the extent to which the results (effects) reported in a study can truly be attributed to the intervention being evaluated and not to flaws in the design or conduct of the study–in other words, the ability for the study to make causal conclusions about the effects of the intervention being tested. Such flaws can increase the risk of bias. Critical appraisal involves considering the risk of potential for allocation bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues addressed in the questions above. High risk of bias translates to a rating of poor quality. Low risk of bias translates to a rating of good quality.

Fatal flaws: If a study has a "fatal flaw," then risk of bias is significant, and the study is of poor quality. Examples of fatal flaws in RCTs include high dropout rates, high differential dropout rates, and the absence of an ITT analysis or use of another unsuitable statistical analysis (e.g., a completers-only analysis).

Generally, when evaluating a study, one will not see a "fatal flaw;" however, one will find some risk of bias. During training, reviewers were instructed to look for the potential for bias in studies by focusing on the concepts underlying the questions in the tool. For any box checked "no," reviewers were told to ask: "What is the potential risk of bias that may be introduced by this flaw?" That is, does this factor cause one to doubt the results that were reported in the study?

NHLBI staff provided reviewers with background reading on critical appraisal, while emphasizing that the best approach to use is to think about the questions in the tool in determining the potential for bias in a study. The staff also emphasized that each study has specific nuances; therefore, reviewers should familiarize themselves with the key concepts.

Quality Assessment of Systematic Reviews and Meta-Analyses

Guidance for Quality Assessment Tool for Systematic Reviews and Meta-Analyses

A systematic review is a study that attempts to answer a question by synthesizing the results of primary studies while using strategies to limit bias and random error.[424] These strategies include a comprehensive search of all potentially relevant articles and the use of explicit, reproducible criteria in the selection of articles included in the review. Research designs and study characteristics are appraised, data are synthesized, and results are interpreted using a predefined systematic approach that adheres to evidence-based methodological principles.

Systematic reviews can be qualitative or quantitative. A qualitative systematic review summarizes the results of the primary studies but does not combine the results statistically. A quantitative systematic review, or meta-analysis, is a type of systematic review that employs statistical techniques to combine the results of the different studies into a single pooled estimate of effect, often given as an odds ratio. The guidance document below is organized by question number from the tool for quality assessment of systematic reviews and meta-analyses.
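As a sketch of what "a single pooled estimate" means in practice, the following illustrates fixed-effect inverse-variance pooling of log odds ratios; the three studies and their standard errors are invented for the example:

```python
# Minimal sketch of fixed-effect inverse-variance pooling of study
# odds ratios into a single pooled estimate. Data are illustrative.
import math

# (odds ratio, standard error of the log OR) for three invented studies
studies = [(0.80, 0.20), (0.70, 0.15), (0.90, 0.25)]

weights = [1 / se**2 for _, se in studies]        # inverse-variance weights
log_ors = [math.log(or_) for or_, _ in studies]
pooled_log_or = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

lo = math.exp(pooled_log_or - 1.96 * pooled_se)
hi = math.exp(pooled_log_or + 1.96 * pooled_se)
print(f"Pooled OR = {math.exp(pooled_log_or):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```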

Question 1. Focused question

The review should be based on a question that is clearly stated and well-formulated. An example would be a question that uses the PICO (population, intervention, comparator, outcome) format, with all components clearly described.

Question 2. Eligibility criteria

The eligibility criteria used to determine whether studies were included or excluded should be clearly specified and predefined. It should be clear to the reader why studies were included or excluded.

Question 3. Literature search

The search strategy should employ a comprehensive, systematic approach in order to capture all of the evidence possible that pertains to the question of interest. At a minimum, a comprehensive review has the following attributes:

  • Electronic searches conducted using multiple scientific literature databases, such as MEDLINE, EMBASE, the Cochrane Central Register of Controlled Trials, PsychLit, and others as appropriate for the subject matter.
  • Manual searches of references found in articles and textbooks to supplement the electronic searches.

Additional search strategies that may be used to improve the yield include the following:

  • Searches for studies published in other countries
  • Searches for studies published in languages other than English
  • Identification by experts in the field of studies and articles that may have been missed
  • Search of grey literature, including technical reports and other papers from government agencies or scientific groups or committees; presentations and posters from scientific meetings, conference proceedings, unpublished manuscripts; and others. Searching the grey literature is important (whenever feasible) because sometimes only positive studies with significant findings are published in the peer-reviewed literature, which can bias the results of a review.

In their reviews, researchers described the literature search strategy clearly and ascertained that it could be reproduced by others with similar results.

Question 4. Dual review for determining which studies to include and exclude

Titles, abstracts, and full-text articles (when indicated) should be reviewed by two independent reviewers to determine which studies to include and exclude in the review. Reviewers resolved disagreements through discussion and consensus or with third parties. They clearly stated the review process, including methods for settling disagreements.

Question 5. Quality appraisal for internal validity

Each included study should be appraised for internal validity (study quality assessment) using a standardized approach for rating the quality of the individual studies. Ideally, at least two independent reviewers should have appraised each study for internal validity. However, there is no one commonly accepted, standardized tool for rating the quality of studies. So, in the research papers, reviewers looked for an assessment of the quality of each study and a clear description of the process used.

Question 6. List and describe included studies

All included studies were listed in the review, along with descriptions of their key characteristics. This was presented either in narrative or table format.

Question 7. Publication bias

Publication bias is a term used when studies with positive results have a higher likelihood of being published, being published rapidly, being published in higher impact journals, being published in English, being published more than once, or being cited by others.[425,426] Publication bias can be linked to favorable or unfavorable treatment of research findings due to investigators, editors, industry, commercial interests, or peer reviewers. To minimize the potential for publication bias, researchers can conduct a comprehensive literature search that includes the strategies discussed in Question 3.

A funnel plot–a scatter plot of component studies in a meta-analysis–is a commonly used graphical method for detecting publication bias. If there is no significant publication bias, the graph looks like a symmetrical inverted funnel.
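A funnel plot can be sketched in a few lines; the code below uses synthetic, bias-free data, so the resulting scatter should be roughly symmetrical around the true effect:

```python
# Sketch of a funnel plot: effect estimates against standard error
# (inverted, so the most precise studies sit at the top). Synthetic,
# bias-free data should scatter symmetrically around the true effect.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
true_effect = 0.0
ses = rng.uniform(0.05, 0.5, size=40)      # study standard errors
effects = rng.normal(true_effect, ses)     # observed study effects

plt.scatter(effects, ses)
plt.gca().invert_yaxis()                   # most precise studies on top
plt.axvline(true_effect, linestyle="--")
plt.xlabel("Effect estimate (e.g., log odds ratio)")
plt.ylabel("Standard error")
plt.title("Funnel plot (symmetry suggests little publication bias)")
plt.show()
```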

Reviewers assessed and clearly described the likelihood of publication bias.

Question 8. Heterogeneity

Heterogeneity is used to describe important differences in studies included in a meta-analysis that may make it inappropriate to combine the studies.[427] Heterogeneity can be clinical (e.g., important differences between study participants, baseline disease severity, and interventions); methodological (e.g., important differences in the design and conduct of the study); or statistical (e.g., important differences in the quantitative results or reported effects).

Researchers usually assess clinical or methodological heterogeneity qualitatively by determining whether it makes sense to combine studies. For example:

  • Should a study evaluating the effects of an intervention on CVD risk that involves elderly male smokers with hypertension be combined with a study that involves healthy adults ages 18 to 40? (Clinical Heterogeneity)
  • Should a study that uses a randomized controlled trial (RCT) design be combined with a study that uses a case-control study design? (Methodological Heterogeneity)

Statistical heterogeneity describes the degree of variation in the effect estimates from a set of studies; it is assessed quantitatively. The two most common methods used to assess statistical heterogeneity are the Q test (also known as the χ² or chi-square test) and the I² statistic.
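The following sketch shows the arithmetic behind both statistics, using invented effect estimates and variances; Q is compared to a chi-square distribution with k-1 degrees of freedom, and I² expresses the share of variation beyond chance:

```python
# Worked sketch of Cochran's Q and I². Effect estimates and variances
# are invented for illustration.
from scipy.stats import chi2

effects = [0.10, 0.25, -0.05, 0.40]     # study effects (e.g., log ORs)
variances = [0.04, 0.02, 0.05, 0.03]

weights = [1 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1                   # Q ~ chi-square(k-1) under homogeneity
p_value = chi2.sf(q, df)
i2 = max(0.0, (q - df) / q) * 100       # percent of variation beyond chance

print(f"Q = {q:.2f} (p = {p_value:.3f}), I^2 = {i2:.0f}%")
```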

Reviewers examined studies to determine if an assessment for heterogeneity was conducted and clearly described. If the studies are found to be heterogeneous, the investigators should explore and explain the causes of the heterogeneity, and determine what influence, if any, the study differences had on overall study results.

Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies

Guidance for Assessing the Quality of Observational Cohort and Cross-Sectional Studies

The guidance document below is organized by question number from the tool for quality assessment of observational cohort and cross-sectional studies.

Question 1. Research question

Did the authors describe their goal in conducting this research? Is it easy to understand what they were looking to find? This issue is important for any scientific paper of any type. Higher quality scientific research explicitly defines a research question.

Questions 2 and 3. Study population

Did the authors describe the group of people from which the study participants were selected or recruited, using demographics, location, and time period? If you were to conduct this study again, would you know who to recruit, from where, and from what time period? Is the cohort population free of the outcomes of interest at the time they were recruited?

An example would be men over 40 years old with type 2 diabetes who began seeking medical care at Phoenix Good Samaritan Hospital between January 1, 1990 and December 31, 1994. In this example, the population is clearly described as: (1) who (men over 40 years old with type 2 diabetes); (2) where (Phoenix Good Samaritan Hospital); and (3) when (between January 1, 1990 and December 31, 1994). Another example is women ages 34 to 59 in 1980 who were in the nursing profession and had no known coronary disease, stroke, cancer, hypercholesterolemia, or diabetes, and were recruited from the 11 most populous States, with contact information obtained from State nursing boards.

In cohort studies, it is crucial that the population at baseline is free of the outcome of interest. For example, the nurses' population above would be an appropriate group in which to study incident coronary disease. This information is usually found either in descriptions of population recruitment, definitions of variables, or inclusion/exclusion criteria.

You may need to look at prior papers on methods in order to make the assessment for this question. Those papers are usually in the reference list.

If fewer than 50% of eligible persons participated in the study, then there is concern that the study population does not adequately represent the target population. This increases the risk of bias.

Question 4. Groups recruited from the same population and uniform eligibility criteria

Were the inclusion and exclusion criteria developed prior to recruitment or selection of the study population? Were the same underlying criteria used for all of the subjects involved? This issue is related to the description of the study population, above, and you may find the information for both of these questions in the same section of the paper.

Most cohort studies begin with the selection of the cohort; participants in this cohort are then measured or evaluated to determine their exposure status. However, some cohort studies may recruit or select exposed participants in a different time or place than unexposed participants, especially retrospective cohort studies, in which data are obtained from the past (retrospectively) but the analysis still examines exposures prior to outcomes. For example, one research question could be whether diabetic men with clinical depression are at higher risk for cardiovascular disease than those without clinical depression. So, diabetic men with depression might be selected from a mental health clinic, while diabetic men without depression might be selected from an internal medicine or endocrinology clinic. This study recruits groups from different clinic populations, so this example would get a "no."

However, the women nurses described in the question above were selected based on the same inclusion/exclusion criteria, so that example would get a "yes."

Question 5. Sample size justification

Did the authors present their reasons for selecting or recruiting the number of people included or analyzed? Do they note or discuss the statistical power of the study? This question is about whether or not the study had enough participants to detect an association if one truly existed.

A paragraph in the methods section of the article may explain the sample size needed to detect a hypothesized difference in outcomes. You may also find a discussion of power in the discussion section (such as the study had 85 percent power to detect a 20 percent increase in the rate of an outcome of interest, with a 2-sided alpha of 0.05). Sometimes estimates of variance and/or estimates of effect size are given, instead of sample size calculations. In any of these cases, the answer would be "yes."

However, observational cohort studies often do not report anything about power or sample sizes because the analyses are exploratory in nature. In this case, the answer would be "no." This is not a "fatal flaw." It just may indicate that attention was not paid to whether the study was sufficiently sized to answer a prespecified question–i.e., it may have been an exploratory, hypothesis-generating study.

Question 6. Exposure assessed prior to outcome measurement

This question is important because, in order to determine whether an exposure causes an outcome, the exposure must come before the outcome.

For some prospective cohort studies, the investigator enrolls the cohort and then determines the exposure status of various members of the cohort (large epidemiological studies like Framingham used this approach). However, for other cohort studies, the cohort is selected based on its exposure status, as in the example above of depressed diabetic men (the exposure being depression). Other examples include a cohort identified by its exposure to fluoridated drinking water and then compared to a cohort living in an area without fluoridated water, or a cohort of military personnel exposed to combat in the Gulf War compared to a cohort of military personnel not deployed in a combat zone.

With either of these types of cohort studies, the cohort is followed forward in time (i.e., prospectively) to assess the outcomes that occurred in the exposed members compared to nonexposed members of the cohort. Therefore, you begin the study in the present by looking at groups that were exposed (or not) to some biological or behavioral factor, intervention, etc., and then you follow them forward in time to examine outcomes. If a cohort study is conducted properly, the answer to this question should be "yes," since the exposure status of members of the cohort was determined at the beginning of the study before the outcomes occurred.

For retrospective cohort studies, the same principle applies. The difference is that, rather than identifying a cohort in the present and following them forward in time, the investigators go back in time (i.e., retrospectively) and select a cohort based on their exposure status in the past and then follow them forward to assess the outcomes that occurred in the exposed and nonexposed cohort members. Because in retrospective cohort studies the exposure and outcomes may have already occurred (it depends on how long they follow the cohort), it is important to make sure that the exposure preceded the outcome.

Sometimes cross-sectional studies are conducted (or cross-sectional analyses of cohort-study data), where the exposures and outcomes are measured during the same timeframe. As a result, cross-sectional analyses provide weaker evidence than regular cohort studies regarding a potential causal relationship between exposures and outcomes. For cross-sectional analyses, the answer to Question 6 should be "no."

Question 7. Sufficient timeframe to see an effect

Did the study allow enough time for a sufficient number of outcomes to occur or be observed, or enough time for an exposure to have a biological effect on an outcome? In the examples given above, if clinical depression has a biological effect on increasing risk for CVD, such an effect may take years. As another example, if higher dietary sodium increases BP, a short timeframe may be sufficient to assess its association with BP, but a longer timeframe would be needed to examine its association with heart attacks.

The issue of timeframe is important to enable meaningful analysis of the relationships between exposures and outcomes to be conducted. This often requires at least several years, especially when looking at health outcomes, but it depends on the research question and outcomes being examined.

Cross-sectional analyses allow no time to see an effect, since the exposures and outcomes are assessed at the same time, so those would get a "no" response.

Question 8. Different levels of the exposure of interest

If the exposure can be defined as a range (examples: drug dosage, amount of physical activity, amount of sodium consumed), were multiple categories of that exposure assessed? (for example, for drugs: not on the medication, on a low dose, medium dose, high dose; for dietary sodium, higher than average U.S. consumption, lower than recommended consumption, between the two). Sometimes discrete categories of exposure are not used, but instead exposures are measured as continuous variables (for example, mg/day of dietary sodium or BP values).

In any case, studying different levels of exposure (where possible) enables investigators to assess trends or dose-response relationships between exposures and outcomes–e.g., the higher the exposure, the greater the rate of the health outcome. The presence of trends or dose-response relationships lends credibility to the hypothesis of causality between exposure and outcome.

For some exposures, however, this question may not be applicable (e.g., the exposure may be a dichotomous variable like living in a rural setting versus an urban setting, or vaccinated/not vaccinated with a one-time vaccine). If there are only two possible exposures (yes/no), then this question should be given an "NA," and it should not count negatively towards the quality rating.
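When multiple exposure levels are available, one common way to assess a dose-response relationship is to regress the outcome on an ordinal exposure score and examine the trend coefficient. The sketch below simulates such data; the exposure coding (0 = none through 3 = high) and the effect size are assumptions for illustration:

```python
# Illustrative dose-response check: regress a binary outcome on an
# ordinal exposure score and look for a monotonic trend. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
exposure = rng.integers(0, 4, size=500)              # four exposure levels
p = 1 / (1 + np.exp(-(-2.0 + 0.5 * exposure)))       # true trend: +0.5 log-odds/level
outcome = rng.binomial(1, p)

X = sm.add_constant(exposure.astype(float))
fit = sm.Logit(outcome, X).fit(disp=False)
print(fit.params)   # slope near 0.5 supports a dose-response relationship
```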

Question 9. Exposure measures and assessment

Were the exposure measures defined in detail? Were the tools or methods used to measure exposure accurate and reliable–for example, have they been validated or are they objective? This issue is important as it influences confidence in the reported exposures. When exposures are measured with less accuracy or validity, it is harder to see an association between exposure and outcome even if one exists. Equally important is whether the exposures were assessed in the same manner within groups and between groups; if not, bias may result.

For example, retrospective self-report of dietary salt intake is not as valid and reliable as prospectively using a standardized dietary log plus testing participants' urine for sodium content. Another example is measurement of BP, where there may be quite a difference between usual care, where clinicians measure BP however it is done in their practice setting (which can vary considerably), and use of trained BP assessors using standardized equipment (e.g., the same BP device which has been tested and calibrated) and a standardized protocol (e.g., patient is seated for 5 minutes with feet flat on the floor, BP is taken twice in each arm, and all four measurements are averaged). In each of these cases, the former would get a "no" and the latter a "yes."

Here is a final example that illustrates the point about why it is important to assess exposures consistently across all groups: If people with higher BP (exposed cohort) are seen by their providers more frequently than those without elevated BP (nonexposed group), it also increases the chances of detecting and documenting changes in health outcomes, including CVD-related events. Therefore, it may lead to the conclusion that higher BP leads to more CVD events. This may be true, but it could also be due to the fact that the subjects with higher BP were seen more often; thus, more CVD-related events were detected and documented simply because they had more encounters with the health care system. Thus, it could bias the results and lead to an erroneous conclusion.

Question 10. Repeated exposure assessment

Was the exposure for each person measured more than once during the course of the study period? Multiple measurements with the same result increase our confidence that the exposure status was correctly classified. Also, multiple measurements enable investigators to look at changes in exposure over time, for example, people who ate high dietary sodium throughout the followup period, compared to those who started out high then reduced their intake, compared to those who ate low sodium throughout. Once again, this may not be applicable in all cases. In many older studies, exposure was measured only at baseline. However, multiple exposure measurements do result in a stronger study design.

Question 11. Outcome measures

Were the outcomes defined in detail? Were the tools or methods for measuring outcomes accurate and reliable–for example, have they been validated or are they objective? This issue is important because it influences confidence in the validity of study results. Also important is whether the outcomes were assessed in the same manner within groups and between groups.

An example of an outcome measure that is objective, accurate, and reliable is death–the outcome measured with more accuracy than any other. But even with a measure as objective as death, there can be differences in the accuracy and reliability of how death was assessed by the investigators. Did they base it on an autopsy report, death certificate, death registry, or report from a family member? Another example is a study of whether dietary fat intake is related to blood cholesterol level (cholesterol level being the outcome), and the cholesterol level is measured from fasting blood samples that are all sent to the same laboratory. These examples would get a "yes." An example of a "no" would be self-report by subjects that they had a heart attack, or self-report of how much they weigh (if body weight is the outcome of interest).

Similar to the example in Question 9, results may be biased if one group (e.g., people with high BP) is seen more frequently than another group (people with normal BP) because more frequent encounters with the health care system increases the chances of outcomes being detected and documented.

Question 12. Blinding of outcome assessors

Blinding means that outcome assessors did not know whether the participant was exposed or unexposed. It is also sometimes called "masking." The objective is to look for evidence in the article that the person(s) assessing the outcome(s) for the study (for example, examining medical records to determine the outcomes that occurred in the exposed and comparison groups) is masked to the exposure status of the participant. Sometimes the person measuring the exposure is the same person conducting the outcome assessment. In this case, the outcome assessor would most likely not be blinded to exposure status because they also took measurements of exposures. If so, make a note of that in the comments section.

As you assess this criterion, think about whether it is likely that the person(s) doing the outcome assessment would know (or be able to figure out) the exposure status of the study participants. If the answer is no, then blinding is adequate. An example of adequate blinding of the outcome assessors is to create a separate committee, whose members were not involved in the care of the patient and had no information about the study participants' exposure status. The committee would then be provided with copies of participants' medical records, which had been stripped of any potential exposure information or personally identifiable information. The committee would then review the records for prespecified outcomes according to the study protocol. If blinding was not possible, which is sometimes the case, mark "NA" and explain the potential for bias.

Question 13. Followup rate

Higher overall followup rates are always better than lower followup rates, even though higher rates are expected in shorter studies, whereas lower overall followup rates are often seen in studies of longer duration. Usually, an acceptable overall followup rate is considered 80 percent or more of participants whose exposures were measured at baseline. However, this is just a general guideline. For example, a 6-month cohort study examining the relationship between dietary sodium intake and BP level may have over 90 percent followup, but a 20-year cohort study examining effects of sodium intake on stroke may have only a 65 percent followup rate.

Question 14. Statistical analyses

Were key potential confounding variables measured and adjusted for, such as by statistical adjustment for baseline differences? Logistic regression or other regression methods are often used to account for the influence of variables not of interest.

This is a key issue in cohort studies, because statistical analyses need to control for potential confounders, in contrast to an RCT, where the randomization process controls for potential confounders. All key factors that may be associated both with the exposure of interest and the outcome–that are not of interest to the research question–should be controlled for in the analyses.

For example, in a study of the relationship between cardiorespiratory fitness and CVD events (heart attacks and strokes), the study should control for age, BP, blood cholesterol, and body weight, because all of these factors are associated both with low fitness and with CVD events. Well-done cohort studies control for multiple potential confounders.
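As an illustration of such statistical adjustment, the sketch below fits a logistic model of events on fitness together with age and systolic BP; all data are simulated, and a real analysis would also include cholesterol and body weight as the text suggests:

```python
# Illustrative sketch of confounder adjustment: a logistic model of CVD
# events on fitness that also includes age and systolic BP, so the
# fitness coefficient is adjusted for those confounders. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
age = rng.normal(55, 10, n)
fitness = -0.05 * (age - 55) + rng.normal(0, 1, n)   # fitness declines with age
sbp = 120 + 0.5 * (age - 55) + rng.normal(0, 10, n)
logit = -3 + 0.04 * (age - 55) + 0.02 * (sbp - 120) - 0.4 * fitness
event = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([fitness, age, sbp]))
fit = sm.Logit(event, X).fit(disp=False)
# The fitness coefficient (true value -0.4) is now adjusted for age and
# SBP; omitting them would confound the fitness-event association.
print(dict(zip(["const", "fitness", "age", "sbp"], fit.params.round(2))))
```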

General Guidance for Determining the Overall Quality Rating of Observational Cohort and Cross-Sectional Studies

The questions on the form are designed to help you focus on the key concepts for evaluating the internal validity of a study. They are not intended to create a list that you simply tally up to arrive at a summary judgment of quality.

Internal validity for cohort studies is the extent to which the results reported in the study can truly be attributed to the exposure being evaluated and not to flaws in the design or conduct of the study–in other words, the ability of the study to draw associative conclusions about the effects of the exposures being studied on outcomes. Any such flaws can increase the risk of bias.

Critical appraisal involves considering the risk of potential for selection bias, information bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues throughout the questions above. High risk of bias translates to a rating of poor quality. Low risk of bias translates to a rating of good quality. (Thus, the greater the risk of bias, the lower the quality rating of the study.)

In addition, the more attention in the study design to issues that can help determine whether there is a causal relationship between the exposure and outcome, the higher quality the study. These include exposures occurring prior to outcomes, evaluation of a dose-response gradient, accuracy of measurement of both exposure and outcome, sufficient timeframe to see an effect, and appropriate control for confounding–all concepts reflected in the tool.

Generally, when you evaluate a study, you will not see a "fatal flaw," but you will find some risk of bias. By focusing on the concepts underlying the questions in the quality assessment tool, you should ask yourself about the potential for bias in the study you are critically appraising. For any box where you check "no" you should ask, "What is the potential risk of bias resulting from this flaw in study design or execution?" That is, does this factor cause you to doubt the results that are reported in the study or doubt the ability of the study to accurately assess an association between exposure and outcome?

The best approach is to think about the questions in the tool and how each one tells you something about the potential for bias in a study. The more you familiarize yourself with the key concepts, the more comfortable you will be with critical appraisal. Examples of studies rated good, fair, and poor are useful, but each study must be assessed on its own based on the details that are reported and consideration of the concepts for minimizing bias.

Quality Assessment of Case-Control Studies

Guidance for Assessing the Quality of Case-Control Studies

The guidance document below is organized by question number from the tool for quality assessment of case-control studies.

Question 1. Research question

Did the authors describe their goal in conducting this research? Is it easy to understand what they were looking to find? This issue is important for any scientific paper of any type. High-quality scientific research explicitly defines a research question.

Question 2. Study population

Did the authors describe the group of individuals from which the cases and controls were selected or recruited, using demographics, location, and time period? If the investigators conducted this study again, would they know exactly who to recruit, from where, and from what time period?

Investigators identify case-control study populations by location, time period, and inclusion criteria for cases (individuals with the disease, condition, or problem) and controls (individuals without the disease, condition, or problem). For example, the population for a study of lung cancer and chemical exposure would be all incident cases of lung cancer diagnosed in patients ages 35 to 79, from January 1, 2003 to December 31, 2008, living in Texas during that entire time period, as well as controls without lung cancer recruited from the same population during the same time period. The population is clearly described as: (1) who (men and women ages 35 to 79 with (cases) and without (controls) incident lung cancer); (2) where (living in Texas); and (3) when (between January 1, 2003 and December 31, 2008).

Other studies may use disease registries or data from cohort studies to identify cases. In these cases, the populations are individuals who live in the area covered by the disease registry or who are included in a cohort study (i.e., nested case-control or case-cohort). For example, a study of the relationship between vitamin D intake and myocardial infarction might use patients identified via the GRACE registry, a database of heart attack patients.

NHLBI staff encouraged reviewers to examine prior papers on methods (listed in the reference list) to make this assessment, if necessary.

Question 3. Target population and case representation

In order for a study to truly address the research question, the target population–the population from which the study population is drawn and to which study results are believed to apply–should be carefully defined. Some authors may compare characteristics of the study cases to characteristics of cases in the target population, either in text or in a table. When study cases are shown to be representative of cases in the appropriate target population, it increases the likelihood that the study was well-designed per the research question.

However, because these statistics are frequently difficult or impossible to measure, publications should not be penalized if case representation is not shown. For most papers, the response to question 3 will be "NR." Those subquestions are combined because the answer to the second subquestion–case representation–determines the response to this item. However, it cannot be determined without considering the response to the first subquestion. For example, if the answer to the first subquestion is "yes," and the second, "CD," then the response for item 3 is "CD."

Question 4. Sample size justification

Did the authors discuss their reasons for selecting or recruiting the number of individuals included? Did they discuss the statistical power of the study and provide a sample size calculation to ensure that the study is adequately powered to detect an association (if one exists)? This question does not refer to a description of the manner in which different groups were included or excluded using the inclusion/exclusion criteria (e.g., "Final study size was 1,378 participants after exclusion of 461 patients with missing data" is not considered a sample size justification for the purposes of this question).

An article's methods section usually contains information on the sample size, the size needed to detect differences in exposures, and the statistical power.

Question 5. Groups recruited from the same population

To determine whether cases and controls were recruited from the same population, one can ask hypothetically, "If a control was to develop the outcome of interest (the condition that was used to select cases), would that person have been eligible to become a case?" Case-control studies begin with the selection of the cases (those with the outcome of interest, e.g., lung cancer) and controls (those in whom the outcome is absent). Cases and controls are then evaluated and categorized by their exposure status. For the lung cancer example, cases and controls were recruited from hospitals in a given region. One may reasonably assume that controls in the catchment area for the hospitals, or those already in the hospitals for a different reason, would attend those hospitals if they became a case; therefore, the controls are drawn from the same population as the cases. If the controls were recruited or selected from a different region (e.g., a State other than Texas) or time period (e.g., 1991-2000), then the cases and controls were recruited from different populations, and the answer to this question would be "no."

The following example further explores selection of controls. In a study, eligible cases were men and women, ages 18 to 39, who were diagnosed with atherosclerosis at hospitals in Perth, Australia, between July 1, 2000 and December 31, 2007. Appropriate controls for these cases might be sampled using voter registration information for men and women ages 18 to 39, living in Perth (population-based controls); they also could be sampled from patients without atherosclerosis at the same hospitals (hospital-based controls). As long as the controls are individuals who would have been eligible to be included in the study as cases (if they had been diagnosed with atherosclerosis), then the controls were selected appropriately from the same source population as cases.

In a prospective case-control study, investigators may enroll individuals as cases at the time they are found to have the outcome of interest; the number of cases usually increases as time progresses. At this same time, they may recruit or select controls from the population without the outcome of interest. One way to identify or recruit cases is through a surveillance system. In turn, investigators can select controls from the population covered by that system. This is an example of population-based controls. Investigators also may identify and select cases from a cohort study population and identify controls from outcome-free individuals in the same cohort study. This is known as a nested case-control study.

Question 6. Inclusion and exclusion criteria prespecified and applied uniformly

Were the inclusion and exclusion criteria developed prior to recruitment or selection of the study population? Were the same underlying criteria used for all of the groups involved? To answer this question, reviewers determined whether the investigators developed inclusion/exclusion criteria prior to recruitment or selection of the study population and whether they used the same underlying criteria for all groups. The investigators should have used the same selection criteria, except for study participants who had the disease or condition, which would differ for cases and controls by definition. Therefore, the investigators use the same age (or age range), gender, race, and other characteristics to select cases and controls. Information on this topic is usually found in a paper's section on the description of the study population.

Question 7. Case and control definitions

For this question, reviewers looked for descriptions of the validity of case and control definitions and the processes or tools used to identify study participants as such. Was a specific description of "case" and "control" provided? Is there a discussion of the validity of the case and control definitions and the processes or tools used to identify study participants as such? They determined if the tools or methods were accurate, reliable, and objective. For example, cases might be identified as "adult patients admitted to a VA hospital from January 1, 2000 to December 31, 2009, with an ICD-9 discharge diagnosis code of acute myocardial infarction and at least one of two confirmatory findings in their medical records: at least 2 mm of ST elevation changes in two or more ECG leads or an elevated troponin level." Investigators might also use ICD-9 or CPT codes to identify patients. All cases should be identified using the same methods. Unless the distinction between cases and controls is accurate and reliable, investigators cannot use study results to draw valid conclusions.

Question 8. Random selection of study participants

If a case-control study did not use 100 percent of eligible cases and/or controls (e.g., not all disease-free participants were included as controls), did the authors indicate that random sampling was used to select controls? When it is possible to identify the source population fairly explicitly (e.g., in a nested case-control study, or in a registry-based study), then random sampling of controls is preferred. When investigators used consecutive sampling, which is frequently done for cases in prospective studies, then study participants are not considered randomly selected. In this case, the reviewers would answer "no" to Question 8. However, this would not be considered a fatal flaw.

If investigators included all eligible cases and controls as study participants, then reviewers marked "NA" in the tool. If 100 percent of cases were included (e.g., NA for cases) but only 50 percent of eligible controls, then the response would be "yes" if the controls were randomly selected, and "no" if they were not. If this cannot be determined, the appropriate response is "CD."
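A minimal sketch of random selection of controls (the IDs, pool size, and 2:1 control-to-case ratio are all illustrative):

```python
# Illustrative sketch: draw controls at random from the eligible,
# outcome-free pool rather than taking a convenience sample.
import random

random.seed(42)
eligible_controls = [f"ctrl-{i:04d}" for i in range(5000)]  # outcome-free pool
n_cases = 250
controls = random.sample(eligible_controls, k=2 * n_cases)  # 2 controls per case
print(len(controls), controls[:3])
```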

Question 9. Concurrent controls

A concurrent control is a control selected at the time another person became a case, usually on the same day. This means that one or more controls are recruited or selected from the population without the outcome of interest at the time a case is diagnosed. Investigators can use this method in both prospective case-control studies and retrospective case-control studies. For example, in a retrospective study of adenocarcinoma of the colon using data from hospital records, if hospital records indicate that Person A was diagnosed with adenocarcinoma of the colon on June 22, 2002, then investigators would select one or more controls from the population of patients without adenocarcinoma of the colon on that same day. This assumes they conducted the study retrospectively, using data from hospital records. The investigators could have also conducted this study using patient records from a cohort study, in which case it would be a nested case-control study.

Investigators can use concurrent controls in the presence or absence of matching and vice versa. A study that uses matching does not necessarily mean that concurrent controls were used.

Question 10. Exposure assessed prior to outcome measurement

In case-control studies, investigators first determine case or control status (based on the presence or absence of the outcome of interest) and then assess the exposure history of each case or control; reviewers therefore ascertained that the exposure itself occurred before the outcome. For example, if the investigators used tissue samples to determine exposure, did they collect them from patients prior to their diagnosis? If hospital records were used, did investigators verify that the date a patient was exposed (e.g., received medication for atherosclerosis) occurred prior to the date they became a case (e.g., was diagnosed with type 2 diabetes)? For an association between an exposure and an outcome to be considered causal, the exposure must have occurred prior to the outcome.

Question 11. Exposure measures and assessment

Were the exposure measures defined in detail? Were the tools or methods used to measure exposure accurate and reliable–for example, have they been validated or are they objective? This is important, as it influences confidence in the reported exposures. Equally important is whether the exposures were assessed in the same manner within groups and between groups. This question pertains to bias resulting from exposure misclassification (i.e., exposure ascertainment).

For example, a retrospective self-report of dietary salt intake is not as valid and reliable as prospectively using a standardized dietary log plus testing participants' urine for sodium content because participants' retrospective recall of dietary salt intake may be inaccurate and result in misclassification of exposure status. Similarly, BP results from practices that use an established protocol for measuring BP would be considered more valid and reliable than results from practices that did not use standard protocols. A protocol may include using trained BP assessors, standardized equipment (e.g., the same BP device which has been tested and calibrated), and a standardized procedure (e.g., patient is seated for 5 minutes with feet flat on the floor, BP is taken twice in each arm, and all four measurements are averaged).

Question 12. Blinding of exposure assessors

Blinding or masking means that exposure assessors did not know whether participants were cases or controls. To answer this question, reviewers examined articles for evidence that the exposure assessor(s) was masked to the case or control status of the research participants. An exposure assessor, for example, may examine medical records to determine the exposure history of cases and controls. Sometimes the person determining case or control status is the same person conducting the exposure assessment. In this case, the exposure assessor would most likely not be blinded to case status. A reviewer would note such a finding in the comments section of the assessment tool.

One way to ensure good blinding of exposure assessment is to have a separate committee, whose members have no information about the study participants' status as cases or controls, review research participants' records. To help answer the question above, reviewers determined if it was likely that the exposure assessor knew whether the study participant was a case or control. If this was unlikely, blinding was considered adequate and reviewers marked "yes" to Question 12. Exposure assessors who used medical records should not have been directly involved in the study participants' care, since they probably would have known about their patients' conditions. If the medical records contained information on the patient's condition that identified him/her as a case (which is likely), that information would have had to be removed before the exposure assessors reviewed the records.

If blinding was not possible, which sometimes happens, the reviewers marked "NA" in the assessment tool and explained the potential for bias.

Question 13. Statistical analysis

Were key potential confounding variables measured and adjusted for, such as by statistical adjustment for baseline differences? Investigators often use logistic regression or other regression methods to account for the influence of variables not of interest.

This is a key issue in case-control studies; statistical analyses need to control for potential confounders, in contrast to RCTs, in which the randomization process controls for potential confounders. In the analysis, investigators need to control for all key factors that may be associated with both the exposure of interest and the outcome and are not of interest to the research question.

A study of the relationship between smoking and CVD events illustrates this point. Such a study needs to control for age, gender, and body weight; all are associated with smoking and CVD events. Well-done case-control studies control for multiple potential confounders.

Matching is a technique used to improve study efficiency and control for known confounders. For example, in the study of smoking and CVD events, an investigator might identify cases that have had a heart attack or stroke and then select controls of similar age, gender, and body weight to the cases. For case-control studies, it is important that if matching was performed during the selection or recruitment process, the variables used as matching criteria (e.g., age, gender, race) should be controlled for in the analysis.
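To make the adjustment concrete, here is a minimal, hypothetical sketch (the file name, variable names, and model are illustrative only, not from any study discussed here) of a logistic regression that estimates the exposure's adjusted odds ratio while controlling for the matching variables:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical case-control dataset: 'case' is 1/0, 'smoker' is the exposure,
# and age, sex, and weight are confounders (also used as matching criteria).
df = pd.read_csv("case_control.csv")

model = smf.logit("case ~ smoker + age + C(sex) + weight", data=df).fit()

# Adjusted odds ratio for the exposure, with its 95% confidence interval
or_smoker = np.exp(model.params["smoker"])
ci_low, ci_high = np.exp(model.conf_int().loc["smoker"])
print(f"Adjusted OR for smoking: {or_smoker:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Exponentiating the coefficient for the exposure converts the log-odds estimate into the adjusted odds ratio that such studies typically report.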

General Guidance for Determining the Overall Quality Rating of Case-Control Studies

NHLBI designed the questions in the assessment tool to help reviewers focus on the key concepts for evaluating a study's internal validity, not to use as a list from which to add up items to judge a study's quality.

Internal validity for case-control studies is the extent to which the associations between disease and exposure reported in the study can truly be attributed to the exposure being evaluated rather than to flaws in the design or conduct of the study. In other words, what is the ability of the study to draw associative conclusions about the effects of the exposures on outcomes? Any such flaws can increase the risk of bias.

In critically appraising a study, the following factors need to be considered: the risk of selection bias, information bias, measurement bias, and confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues addressed in the questions above. High risk of bias translates to a poor quality rating; low risk of bias translates to a good quality rating. Again, the greater the risk of bias, the lower the quality rating of the study.

In addition, the more attention in the study design to issues that can help determine whether there is a causal relationship between the outcome and the exposure, the higher the quality of the study. These include exposures occurring prior to outcomes, evaluation of a dose-response gradient, accuracy of measurement of both exposure and outcome, sufficient timeframe to see an effect, and appropriate control for confounding–all concepts reflected in the tool.

If a study has a "fatal flaw," then risk of bias is significant; therefore, the study is deemed to be of poor quality. An example of a fatal flaw in case-control studies is a lack of a consistent standard process used to identify cases and controls.

Generally, when reviewers evaluated a study, they did not see a "fatal flaw," but instead found some risk of bias. By focusing on the concepts underlying the questions in the quality assessment tool, reviewers examined the potential for bias in the study. For any box checked "no," reviewers asked, "What is the potential risk of bias resulting from this flaw in study design or execution?" That is, did this factor lead to doubt about the results reported in the study or the ability of the study to accurately assess an association between exposure and outcome?

By examining questions in the assessment tool, reviewers were best able to assess the potential for bias in a study. Specific rules were not useful, as each study had specific nuances. In addition, being familiar with the key concepts helped reviewers assess the studies. Examples of studies rated good, fair, and poor were useful, yet each study had to be assessed on its own.

Quality Assessment Tool for Before-After (Pre-Post) Studies With No Control Group

Guidance for Assessing the Quality of Before-After (Pre-Post) Studies With No Control Group

Question 1. Study question

Question 2. Eligibility criteria and study population

Did the authors describe the eligibility criteria applied to the individuals from whom the study participants were selected or recruited? In other words, if the investigators were to conduct this study again, would they know whom to recruit, from where, and from what time period?

Here is a sample description of a study population: men over age 40 with type 2 diabetes, who began seeking medical care at Phoenix Good Samaritan Hospital, between January 1, 2005 and December 31, 2007. The population is clearly described as: (1) who (men over age 40 with type 2 diabetes); (2) where (Phoenix Good Samaritan Hospital); and (3) when (between January 1, 2005 and December 31, 2007). Another sample description is women who were in the nursing profession, who were ages 34 to 59 in 1995, had no known CHD, stroke, cancer, hypercholesterolemia, or diabetes, and were recruited from the 11 most populous States, with contact information obtained from State nursing boards.

To assess this question, reviewers examined prior papers on study methods (listed in reference list) when necessary.

Question 3. Study participants representative of clinical populations of interest

The participants in the study should be generally representative of the population in which the intervention will be broadly applied. Studies on small demographic subgroups may raise concerns about how the intervention will affect broader populations of interest. For example, interventions that focus on very young or very old individuals may affect middle-aged adults differently. Similarly, researchers may not be able to extrapolate study results from patients with severe chronic diseases to healthy populations.

Question 4. All eligible participants enrolled

To further explore this question, reviewers may need to ask: Did the investigators develop the I/E criteria prior to recruiting or selecting study participants? Were the same underlying I/E criteria used for all research participants? Were all subjects who met the I/E criteria enrolled in the study?

Question 5. Sample size

Did the authors present their reasons for selecting or recruiting the number of individuals included or analyzed? Did they note or discuss the statistical power of the study? This question addresses whether there was a sufficient sample size to detect an association, if one did exist.

An article's methods section may provide information on the sample size needed to detect a hypothesized difference in outcomes and a discussion on statistical power (such as, the study had 85 percent power to detect a 20 percent increase in the rate of an outcome of interest, with a 2-sided alpha of 0.05). Sometimes estimates of variance and/or estimates of effect size are given, instead of sample size calculations. In any case, if the reviewers determined that the power was sufficient to detect the effects of interest, then they would answer "yes" to Question 5.
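Reviewers who want to sanity-check a reported power claim can reproduce the arithmetic. The sketch below is a generic example (the effect size and targets are hypothetical, not tied to any particular study) using statsmodels' power calculator for a two-group comparison:

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical check: participants per group needed to detect a standardized
# effect size of 0.5 with 85% power at a two-sided alpha of 0.05.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, alpha=0.05, power=0.85, alternative="two-sided"
)
print(round(n_per_group))  # roughly 73 per group
```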

Question 6. Intervention clearly described

Another pertinent question regarding interventions is: Was the intervention clearly defined in detail in the study? Did the authors indicate that the intervention was consistently applied to the subjects? Did the research participants have a high level of adherence to the requirements of the intervention? For example, if the investigators assigned a group to 10 mg/day of Drug A, did most participants in this group take the specific dosage of Drug A? Or did a large percentage of participants end up not taking the specific dose of Drug A indicated in the study protocol?

Reviewers ascertained that changes in study outcomes could be attributed to study interventions. If participants received interventions that were not part of the study protocol and could affect the outcomes being assessed, the results could be biased.

Question 7. Outcome measures clearly described, valid, and reliable

Were the outcomes defined in detail? Were the tools or methods for measuring outcomes accurate and reliable–for example, have they been validated or are they objective? This question is important because the answer influences confidence in the validity of study results.

An example of an outcome measure that is objective, accurate, and reliable is death–the outcome measured with more accuracy than any other. But even with a measure as objective as death, differences can exist in the accuracy and reliability of how investigators assessed death. For example, did they base it on an autopsy report, death certificate, death registry, or report from a family member? Another example of valid and reliable outcome measurement is a study whose objective is to determine whether dietary fat intake affects blood cholesterol level (cholesterol level being the outcome), in which cholesterol is measured from fasting blood samples that are all sent to the same laboratory. These examples would get a "yes."

An example of a "no" would be self-report by subjects that they had a heart attack, or self-report of how much they weigh (if body weight is the outcome of interest).

Question 8. Blinding of outcome assessors

Blinding or masking means that the outcome assessors did not know whether the participants received the intervention or were exposed to the factor under study. To answer the question above, the reviewers examined articles for evidence that the person(s) assessing the outcome(s) was masked to the participants' intervention or exposure status. An outcome assessor, for example, may examine medical records to determine the outcomes that occurred in the exposed and comparison groups. Sometimes the person applying the intervention or measuring the exposure is the same person conducting the outcome assessment. In this case, the outcome assessor would not likely be blinded to the intervention or exposure status. A reviewer would note such a finding in the comments section of the assessment tool.

In assessing this criterion, the reviewers determined whether it was likely that the person(s) conducting the outcome assessment knew the exposure status of the study participants. If not, then blinding was adequate. An example of adequate blinding of the outcome assessors is to create a separate committee whose members were not involved in the care of the patient and had no information about the study participants' exposure status. Using a study protocol, committee members would review copies of participants' medical records, which would be stripped of any potential exposure information or personally identifiable information, for prespecified outcomes.

Question 9. Followup rate

Higher overall followup rates are always preferable to lower followup rates, although higher rates are expected in shorter studies, and lower overall followup rates are often seen in longer studies. Usually an acceptable overall followup rate is considered 80 percent or more of participants whose interventions or exposures were measured at baseline. However, this is a general guideline.

In accounting for those lost to followup, in the analysis, investigators may have imputed values of the outcome for those lost to followup or used other methods. For example, they may carry forward the baseline value or the last observed value of the outcome measure and use these as imputed values for the final outcome measure for research participants lost to followup.
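A minimal sketch of the last-observation-carried-forward approach described above, assuming outcomes are stored in wide format with one column per visit (data and column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical outcome measured at three visits; NaN marks loss to followup.
outcomes = pd.DataFrame({
    "baseline": [100.0, 95.0, 110.0],
    "month_3": [98.0, np.nan, 105.0],
    "month_6": [97.0, np.nan, np.nan],
})

# Carry each participant's last observed value forward across later visits.
imputed = outcomes.ffill(axis=1)
print(imputed)
```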

Question 10. Statistical analysis

Were formal statistical tests used to assess the significance of the changes in the outcome measures between the before and after time periods? The reported study results should present values for statistical tests, such as p values, to document the statistical significance (or lack thereof) for the changes in the outcome measures found in the study.
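For a simple before-after design, the formal test is typically a paired comparison. A minimal sketch with made-up measurements:

```python
from scipy import stats

# Hypothetical systolic BP for the same eight participants before and after
# an intervention; a paired t-test assesses the significance of the change.
before = [142, 150, 138, 160, 147, 155, 149, 161]
after = [135, 144, 137, 151, 140, 150, 146, 152]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```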

Question 11. Multiple outcome measures

Were the outcome measures for each person measured more than once during the course of the before and after study periods? Multiple measurements with the same result increase confidence that the outcomes were accurately measured.

Question 12. Group-level interventions and individual-level outcome efforts

Group-level interventions are usually not relevant for clinical interventions such as bariatric surgery, in which the interventions are applied at the individual patient level. In those cases, the questions were coded as "NA" in the assessment tool.

General Guidance for Determining the Overall Quality Rating of Before-After Studies

The questions in the quality assessment tool were designed to help reviewers focus on the key concepts for evaluating the internal validity of a study. They are not intended to create a list from which to add up items to judge a study's quality.

Internal validity is the extent to which the outcome results reported in the study can truly be attributed to the intervention or exposure being evaluated, and not to biases, measurement errors, or other confounding factors that may result from flaws in the design or conduct of the study. In other words, what is the ability of the study to draw associative conclusions about the effects of the interventions or exposures on outcomes?

Critical appraisal of a study involves considering the risk of potential for selection bias, information bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues throughout the questions above. High risk of bias translates to a rating of poor quality; low risk of bias translates to a rating of good quality. Again, the greater the risk of bias, the lower the quality rating of the study.

In addition, the more attention in the study design to issues that can help determine if there is a causal relationship between the exposure and outcome, the higher quality the study. These issues include exposures occurring prior to outcomes, evaluation of a dose-response gradient, accuracy of measurement of both exposure and outcome, and sufficient timeframe to see an effect.

Generally, when reviewers evaluate a study, they will not see a "fatal flaw," but instead will find some risk of bias. By focusing on the concepts underlying the questions in the quality assessment tool, reviewers should ask themselves about the potential for bias in the study they are critically appraising. For any box checked "no" reviewers should ask, "What is the potential risk of bias resulting from this flaw in study design or execution?" That is, does this factor lead to doubt about the results reported in the study or doubt about the ability of the study to accurately assess an association between the intervention or exposure and the outcome?

The best approach is to think about the questions in the assessment tool and how each one reveals something about the potential for bias in a study. Specific rules are not useful, as each study has specific nuances. In addition, being familiar with the key concepts will help reviewers be more comfortable with critical appraisal. Examples of studies rated good, fair, and poor are useful, but each study must be assessed on its own.

Quality Assessment Tool for Case Series Studies


Methodological quality of case series studies: an introduction to the JBI critical appraisal tool

  • PMID: 33038125
  • DOI: 10.11124/JBISRIR-D-19-00099

Introduction: Systematic reviews provide a rigorous synthesis of the best available evidence regarding a certain question. Where high-quality evidence is lacking, systematic reviewers may choose to rely on case series studies to provide information in relation to their question. However, to date there has been limited guidance on how to incorporate case series studies within systematic reviews assessing the effectiveness of an intervention, particularly with reference to assessing the methodological quality or risk of bias of these studies.

Methods: An international working group was formed to review the methodological literature regarding case series as a form of evidence for inclusion in systematic reviews. The group then developed a critical appraisal tool based on the epidemiological literature relating to bias within these studies. This was then piloted, reviewed, and approved by JBI's international Scientific Committee.

Results: The JBI critical appraisal tool for case series studies includes 10 questions addressing the internal validity and risk of bias of case series designs, particularly confounding, selection, and information bias, in addition to the importance of clear reporting.

Conclusion: In certain situations, case series designs may represent the best available evidence to inform clinical practice. The JBI critical appraisal tool for case series offers systematic reviewers an approved method to assess the methodological quality of these studies.



Nuffield Department of Primary Care Health Sciences, University of Oxford

Critical Appraisal tools

Critical appraisal worksheets to help you appraise the reliability, importance and applicability of clinical evidence.

Critical appraisal is the systematic evaluation of clinical research papers in order to establish:

  • Does this study address a  clearly focused question ?
  • Did the study use valid methods to address this question?
  • Are the valid results of this study important?
  • Are these valid, important results applicable to my patient or population?

If the answer to any of these questions is “no”, you can save yourself the trouble of reading the rest of it.

This section contains useful tools and downloads for the critical appraisal of different types of medical evidence. Example appraisal sheets are provided together with several helpful examples.

Critical Appraisal Worksheets

  • Systematic Reviews  Critical Appraisal Sheet
  • Diagnostics  Critical Appraisal Sheet
  • Prognosis  Critical Appraisal Sheet
  • Randomised Controlled Trials  (RCT) Critical Appraisal Sheet
  • Critical Appraisal of Qualitative Studies  Sheet
  • IPD Review  Sheet

Chinese - translated by Chung-Han Yang and Shih-Chieh Shao

  • Systematic Reviews  Critical Appraisal Sheet
  • Diagnostic Study  Critical Appraisal Sheet
  • Prognostic Critical Appraisal Sheet
  • RCT  Critical Appraisal Sheet
  • IPD reviews Critical Appraisal Sheet
  • Qualitative Studies Critical Appraisal Sheet 

German - translated by Johannes Pohl and Martin Sadilek

  • Systematic Review  Critical Appraisal Sheet
  • Diagnosis Critical Appraisal Sheet
  • Prognosis Critical Appraisal Sheet
  • Therapy / RCT Critical Appraisal Sheet

Lithuanian - translated by Tumas Beinortas

  • Systematic review appraisal Lithuanian (PDF)
  • Diagnostic accuracy appraisal Lithuanian  (PDF)
  • Prognostic study appraisal Lithuanian  (PDF)
  • RCT appraisal sheets Lithuanian  (PDF)

Portuguese - translated by Enderson Miranda, Rachel Riera and Luis Eduardo Fontes

  • Portuguese – Systematic Review Study Appraisal Worksheet
  • Portuguese – Diagnostic Study Appraisal Worksheet
  • Portuguese – Prognostic Study Appraisal Worksheet
  • Portuguese – RCT Study Appraisal Worksheet
  • Portuguese – Systematic Review Evaluation of Individual Participant Data Worksheet
  • Portuguese – Qualitative Studies Evaluation Worksheet

Spanish - translated by Ana Cristina Castro

  • Systematic Review  (PDF)
  • Diagnosis  (PDF)
  • Prognosis  Spanish Translation (PDF)
  • Therapy / RCT  Spanish Translation (PDF)

Persian - translated by Ahmad Sofi Mahmudi

  • Prognosis  (PDF)
  • PICO  Critical Appraisal Sheet (PDF)
  • PICO Critical Appraisal Sheet (MS-Word)
  • Educational Prescription  Critical Appraisal Sheet (PDF)

Explanations & Examples

  • Pre-test probability
  • SpPin and SnNout
  • Likelihood Ratios


The development of a critical appraisal tool for use in systematic reviews addressing questions of prevalence

Background: Recently there has been a significant increase in the number of systematic reviews addressing questions of prevalence. Key features of a systematic review include the creation of an a priori protocol, clear inclusion criteria, a structured and systematic search process, critical appraisal of studies, and a formal process of data extraction followed by methods to synthesize, or combine, this data. Currently there exists no standard method for conducting critical appraisal of studies in systematic reviews of prevalence data.

Methods: A working group was created to assess current critical appraisal tools for studies reporting prevalence data and develop a new tool for these studies in systematic reviews of prevalence. Following the development of this tool it was piloted amongst an experienced group of sixteen healthcare researchers.

Results: The results of the pilot found that this tool was a valid approach to assessing the methodological quality of studies reporting prevalence data to be included in systematic reviews. Participants found the tool acceptable and easy to use. Some comments were provided which helped refine the criteria.

Conclusion: The results of this pilot study found that this tool was well-accepted by users and further refinements have been made to the tool based on their feedback. We now put forward this tool for use by authors conducting prevalence systematic reviews.

Introduction

The prevalence of a disease indicates the number of people in a population that have the disease at a given point in time ( 1 ). The accurate measurement of disease burden among populations, whether at a local, national, or global level, is of critical importance for governments, policy-makers, health professionals and the general population to inform the development, delivery and use of health services. For example, accurate information regarding measures of disease can assist in planning management of disease services (by ensuring sufficient resources are available to cope with the burden of disease), set priorities regarding public health initiatives, and evaluate changes and trends in diseases over time. However, policy-makers are often faced with conflicting reports of disease prevalence in the literature.

The systematic review of evidence has been proposed and is now well-accepted as the ideal method to summarize the literature relating to a certain social or healthcare topic ( 2 , 3 ). The systematic review can provide a reliable summary of the literature to inform decision-makers in a timely fashion. Key features of a systematic review include the creation of an a priori protocol, clear inclusion criteria, a comprehensive and systematic search process, the critical appraisal of studies, and a formal process of data extraction followed by methods to synthesize, or combine, this data ( 4 ). In this way, systematic reviews extend beyond the subjective, narrative reporting characteristics of a traditional literature review to provide a comprehensive, rigorous, and transparent synthesis of the literature on a certain topic.

Historically, systematic reviews have predominantly focused on the synthesis of quantitative evidence to establish the effects of interventions. In the last five years, there has been a substantial increase in the number of systematic reviews addressing questions of prevalence ( Figure 1 ). However, currently there does not appear to be any formal guidance for authors wishing to conduct a review of prevalence. Consequently, there is significant variability in the methods used to conduct these reviews.

Figure 1. Number of systematic reviews of prevalence by year of publication identified in a PubMed search

The Joanna Briggs Institute (JBI) and the Cochrane Collaboration are evidence-based organizations that were formed to develop methodologies and guidance on the process of conducting systematic reviews ( 2 , 5 – 8 ). In 2012, a working group was formed within the Joanna Briggs Institute to evaluate systematic reviews of prevalence and develop guidance for researchers wishing to conduct such reviews.

The group identified that the major area where prevalence reviews were disparate was in their conduct of critical appraisal or quality assessment of included studies. For example, whilst some reviews used instruments that were appropriate for reviews of prevalence data ( 9 , 10 ), others used instruments or criteria not designed to critically appraise studies reporting prevalence (such as reporting guidelines, study design specific tools, or self-developed criteria for their review question) ( 11 – 13 ), or refrained from conducting a formal quality assessment altogether ( 14 , 15 ). Therefore, the working group sought to address this gap by developing and testing a critical appraisal form that could be used for studies included in systematic reviews of prevalence data.

Materials and methods

Developing the tool

The working party began by conducting a search for systematic reviews of prevalence data to determine how the methodological quality of studies included in these reviews were assessed. The group then searched for critical appraisal tools that have been used to assess studies reporting on prevalence data. A number of tools were identified including the Joanna Briggs Institute’s Descriptive/Case series critical appraisal tool. A non-exhaustive list is shown in Table 1 . Critical appraisal tools from the Cochrane Collaboration and the Critical Appraisal Skills Program (CASP) were also identified.

Although many of these checklists identified important criteria, it was felt by the group that none of these tools were complete and ideal for use during assessment of quality during the systematic review process. Based on a review of these criteria and our own knowledge and research we developed a tool specifically for use in systematic reviews of prevalence data. This tool was initially trialed by the working group and refined until it was deemed ready for further external review. Details on how to answer each question in the tool are available in Appendix 1 and the Joanna Briggs Institute guidance on conducting prevalence and incidence reviews ( Table 2 ) ( 21 ).

Pilot testing

A pilot of the tool was conducted during the 2013 Joanna Briggs Institute convention in Adelaide during October of that year. A workshop was held on systematic reviews of prevalence and incidence where attendees were given a cross-sectional study ( 22 ) to appraise with the new tool, along with a short survey that was developed to establish the face validity, ease of use, acceptability and timeliness (i.e. time taken to complete) of the tool, and feedback on areas for improvement ( 23 ). The questions asked and how they were measured is reported in Table 3 .

Sixteen workshop participants completed the critical appraisal task and survey. Of the 16, 13 participants stated they had an academic/research background, 2 said they had a health background, and one said they had both. The average time spent working in research was 11 years, with the minimum being 2.5 years and the maximum experience being 30 years.

For ease of use of the critical appraisal tool, the mean score on the 5-point Likert Scale was 3.63, with the majority (75%) of participants providing a rating of 4, corresponding to 'easy'. For the acceptability of the tool, the mean score was 4.33, with all participants giving either a ranking of 4 (acceptable) or 5 (very acceptable). For timeliness, the mean score was 3.94, with 88% providing a score of 4 (acceptable). All but one participant viewed the tool as a valid quality appraisal checklist for prevalence data ( Table 4 ).

There were a number of suggestions provided for refinement and improvement of the tool. These comments resulted in some changes in the order of the questions of the tool and the supporting information used to assist in judging criteria, although no changes were made to the individual questions.

Discussion

Systematic reviews of prevalence and incidence data are becoming increasingly important as decision-makers realize their usefulness in informing policy and practice. These reviews have the potential to better support healthcare professionals, policy-makers, and consumers in making evidence-based decisions that effectively target and address burden of disease issues both now and into the future.

The conduct of a systematic review is a scientific exercise that produces results which may influence healthcare decisions. As such, reviews are required to have the same rigor expected of all research. The quality of a review, and any recommendations for practice that may arise, depends on the extent to which scientific review methods are followed to minimize the risk of error and bias. The explicit and rigorous methods of the process distinguish systematic reviews from traditional reviews of the literature ( 2 ).

Systematic reviews normally rely on the use of critical appraisal checklists that are tailored to assess the quality of a particular study design. For example, there may be separate checklists used to appraise randomized controlled trials, cohort studies, cross-sectional studies and so on. Prevalence data can be sourced from various study designs, even randomized controlled trials ( 11 ); however, critical appraisal tools directed at assessing the risk of bias of randomized controlled trials are aimed at assessing biases related to causal effects and hence are not appropriate for reviews examining the prevalence of a condition. For example, criteria regarding the use of an intention-to-treat analysis as often seen in critical appraisal checklists for randomized controlled trials are not a true quality indicator for questions of prevalence.

Due to this, a new tool assessing validity and quality indicators specific to issues of prevalence has been developed. This checklist addresses critical issues of internal and external validity that must be considered when assessing the validity of prevalence data, and it can be used across different study designs (not just cross-sectional studies but all studies that might report prevalence data). The criteria address the following issues:

  • Ensuring a representative sample.
  • Ensuring appropriate recruitment.
  • Ensuring an adequate sample size.
  • Ensuring appropriate description and reporting of study subjects and setting.
  • Ensuring data coverage of the identified sample is adequate.
  • Ensuring the condition was measured reliably and objectively.
  • Ensuring appropriate statistical analysis.
  • Ensuring confounding factors/subgroups/differences are identified and accounted for.

A pilot test of this tool amongst a group of experienced healthcare professionals and researchers found that this tool had face validity and high levels of acceptability, ease of use and timeliness to complete. The initial results of this pilot testing are encouraging. This tool now needs to be tested further in a larger scale study to assess its other clinimetric properties, particularly its construct validity and inter-rater reliability.

We have developed this tool as we did not feel that any of the current checklists identified from our search sufficiently addressed important quality issues in prevalence studies. Some of the tools [most notably the tool refined by Hoy et al. ( 20 )] contain similar questions to our tool but there are important differences. For example, we provide a criterion regarding sample size which is not included in the Hoy et al. checklist. Our tool also has the advantage of being simple, easy, and quick, as shown during the pilot testing. This tool will now be incorporated into the next version of the Joanna Briggs Institute's systematic review package.

Critical appraisal is a pivotal step in the process of systematic reviews. As reviews of questions addressing prevalence become more well-known, critical appraisal tools addressing studies reporting prevalence data are needed. Following a search of the literature a new tool has been proposed that can be used for studies reporting prevalence data, developed by a working party within the Joanna Briggs Institute. The results of this pilot study found that this tool was well-accepted by users and further refinements have been made to the tool based on their feedback. We now put forward this tool for use by authors conducting prevalence systematic reviews.

Acknowledgements

The authors would like to acknowledge the participants of the workshop and the wider staff of the Joanna Briggs Institute and its collaboration for their feedback.

Ethical issues

No ethical issues are raised.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

ZM led the methodological group and drafted the paper. SM, DR, and KL were members of the working group and provided substantial input regarding its development and testing.

Key messages

Implications for policy-makers

  • Until now there has been substantial variability in how studies reporting prevalence data are critically appraised. The tool proposed within this paper can be considered as a valid option for researchers and policy-makers when conducting systematic reviews of prevalence.
  • This tool guides the assessment of internal and external validity of studies reporting prevalence data.

Implications for public

Systematic reviews are of great importance to provide a critical summary of the research and inform evidence-based practice. Prevalence systematic reviews are becoming increasingly popular within the research community. This article proposes a new tool that can be used during the conduct of these types of systematic reviews to critically appraise studies to ensure that their results are valid.

Prevalence Critical Appraisal Instrument

The 10 criteria used to assess the methodological quality of studies reporting prevalence data are described below, each with an explanation. Each question can be answered with yes, no, unclear, or not applicable.

1. Was the sample representative of the target population?

This question relies upon knowledge of the broader characteristics of the population of interest. If the study is of women with breast cancer, knowledge of at least the characteristics, demographics, and medical history is needed. The term “target population” should not be taken to infer every individual from everywhere or with similar disease or exposure characteristics. Instead, give consideration to specific population characteristics in the study, including age range, gender, morbidities, medications, and other potentially influential factors. For example, a sample may not be representative of the target population if a certain group has been used (such as those working for one organisation, or one profession) and the results then inferred to the target population (i.e. working adults).

2. Were study participants recruited in an appropriate way?

Recruitment is the calling or advertising strategy for gaining interest in the study, and is not the same as sampling. Studies may report random sampling from a population, and the methods section should report how sampling was performed. What source of data were study participants recruited from? Was the sampling frame appropriate? For example, census data is a good example of appropriate recruitment as a good census will identify everybody. Was everybody included who should have been included? Were any groups of persons excluded? Was the whole population of interest surveyed? If not, was random sampling from a defined subset of the population employed? Was stratified random sampling with eligibility criteria used to ensure the sample was representative of the population that the researchers were generalizing to?

3. Was the sample size adequate?

An adequate sample size is important to ensure good precision of the final estimate. Ideally we are looking for evidence that the authors conducted a sample size calculation to determine an adequate sample size. This will estimate how many subjects are needed to produce a reliable estimate of the measure(s) of interest. For conditions with a low prevalence, a larger sample size is needed. Also consider sample sizes for subgroup (or characteristics) analyses, and whether these are appropriate. Sometimes, the study will be large enough (as in large national surveys) whereby a sample size calculation is not required. In these cases, sample size can be considered adequate.

When there is no sample size calculation and it is not a large national survey, the reviewers may consider conducting their own sample size analysis using the following formula ( 24 , 25 ):

n = Z²P(1 − P)/d²

where:

n = sample size

Z = Z statistic for a level of confidence

P = expected prevalence or proportion (in proportion of one; if 20%, P = 0.2)

d = precision (in proportion of one; if 5%, d = 0.05)
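A minimal Python sketch of this calculation (the 20% prevalence and 5% precision values echo the definitions above):

```python
import math
from scipy.stats import norm

def prevalence_sample_size(p: float, d: float, confidence: float = 0.95) -> int:
    """Required sample size: n = Z^2 * P * (1 - P) / d^2, rounded up."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided Z statistic
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Expected prevalence of 20% estimated to within +/-5% at 95% confidence
print(prevalence_sample_size(0.20, 0.05))  # 246
```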

4. Were the study subjects and setting described in detail?

Certain diseases or conditions vary in prevalence across different geographic regions and populations (e.g. women vs. men, socio-demographic variables between countries). Has the study sample been described in sufficient detail so that other researchers can determine if it is comparable to the population of interest to them?

5. Is the data analysis conducted with sufficient coverage of the identified sample?

A large number of dropouts, refusals or “not founds” amongst selected subjects may diminish a study’s validity, as can low response rates for survey studies.

- Did the authors describe the reasons for non-response and compare persons in the study to those not in the study, particularly with regards to their socio-demographic characteristics?

- Could the not-responders have led to an underestimate of prevalence of the disease or condition under investigation?

- If reasons for non-response appear to be unrelated to the outcome measured and the characteristics of non-responders are comparable to those in the study, the researchers may be able to justify a more modest response rate.

- Did the means of assessment or measurement negatively affect the response rate? (Measurement should be easily accessible, conveniently timed for participants, acceptable in length, and suitable in content.)

6. Were objective, standard criteria used for measurement of the condition?

Here we are looking for measurement or classification bias. Many health problems are not easily diagnosed or defined and some measures may not be capable of including or excluding appropriate levels or stages of the health problem. If the outcomes were assessed based on existing definitions or diagnostic criteria, then the answer to this question is likely to be yes. If the outcomes were assessed using observer reported, or self-reported scales, the risk of over- or under-reporting is increased, and objectivity is compromised. Importantly, determine if the measurement tools used were validated instruments as this has a significant impact on outcome assessment validity.

7. Was the condition measured reliably?

Considerable judgment is required to determine the presence of some health outcomes. Having established the objectivity of the outcome measurement instrument (see item 6 of this scale), it is important to establish how the measurement was conducted.

- Were those involved in collecting data trained or educated in the use of the instrument/s?

- If there was more than one data collector, were they similar in terms of level of education, clinical or research experience, or level of responsibility in the piece of research being appraised?

- Has the researcher justified the methods chosen?

- Has the researcher made the methods explicit? (For the interview method, how were interviews conducted?)

8. Was there appropriate statistical analysis?

As with any consideration of statistical analysis, consideration should be given to whether there was a more appropriate alternate statistical method that could have been used. The methods section should be detailed enough for reviewers to identify the analytical technique used and how specific variables were measured. Additionally, it is also important to assess the appropriateness of the analytical strategy in terms of the assumptions associated with the approach, as differing methods of analysis are based on differing assumptions about the data and how it will respond. Prevalence rates found in studies only provide estimates of the true prevalence of a problem in the larger population; because these estimates are imprecise, particularly for small subgroups, 95% confidence intervals should be reported.
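As an illustration, the sketch below (with hypothetical counts) computes a prevalence estimate and its 95% confidence interval using statsmodels:

```python
from statsmodels.stats.proportion import proportion_confint

cases, n = 37, 412  # hypothetical: 37 of 412 sampled individuals have the condition
prevalence = cases / n
low, high = proportion_confint(cases, n, alpha=0.05, method="wilson")
print(f"Prevalence {prevalence:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wilson method is one of several interval options; reviewers mainly need to check that some appropriate interval accompanies the point estimate.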

9. Are all important confounding factors/ subgroups/differences identified and accounted for?

Incidence and prevalence studies often draw or report findings regarding the differences between groups. It is important that authors of these studies identify all important confounding factors, subgroups and differences and account for these.

10. Were subpopulations identified using objective criteria?

Objective criteria should also be used where possible to identify subgroups (refer to question 6).

Citation: Munn Z, Moola S, Riitano D, Lisy K. The development of a critical appraisal tool for use in systematic reviews addressing questions of prevalence. Int J Health Policy Manag 2014; 3: 123–128. doi: 10.15171/ijhpm.2014.71


The University of Texas MD Anderson Cancer Center


Systematic Reviews: Appraising Studies


You need to evaluate the quality of each study you include in your systematic review. This critical appraisal is often called "quality assessment" or "risk of bias assessment."  Your evaluations are shown in a table format in your article. There are several tools to help you evaluate various types of studies and create a table.  

Evaluation Tools for Controlled Trials

Cochrane's RoB 2 Risk of Bias Tool evaluates randomized controlled trials across five domains: the randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. Its predecessor, the original Cochrane Risk of Bias tool, assessed seven items:

  • Random sequence generation
  • Allocation concealment
  • Blinding of participants and personnel
  • Blinding of outcome assessment
  • Incomplete outcome data
  • Selective reporting
  • Other possible sources of bias

For more information on risk of bias, see the  Cochrane Handbook,  Chapter 8: Assessing risk of bias in included  studies. 

Jadad Scale: assigns trials a score of 0-5 based on the quality of randomization, blinding, and withdrawals

Robvis : a free online tool to create high-quality tables that summarize the risk of bias across studies with red, green, and yellow circles. 

Evaluation Tools for Non-Randomized Controlled Trials

There are dozens of Risk of Bias Tools for various types of studies. For a comprehensive list, download the Excel spreadsheet at the Risk of Bias Tool Repository .  A few of the more common ones are:

CASP Checklists    appraise  RCTs and non-RCTs such as cohort, case control, economic evaluations, diagnostic studies, qualitative studies, and clinical prediction rules. 

Cochrane's  ROBINS-I tool  evaluates cohort studies, quasi-randomized trials, case-control studies, cross-sectional studies, interrupted time series, and controlled before-after studies. 

Newcastle-Ottawa Scale:    appraises case-control or cohort studies. 

NIH Study Quality Assessment Tools  evaluate controlled intervention studies, systematic reviews, meta-analyses, observational cohort and cross-sectional studies, case-control studies, and before-after (pre-post) studies with no control groups.

JBI Critical Appraisal Tools: evaluate cross-sectional studies, case-control studies, case reports, case series, cohort studies, diagnostic test accuracy studies, economic evaluations, prevalence studies, qualitative studies, quasi-experimental studies, and systematic reviews.

QuADS Quality Appraisal for Diverse Studies:   an appraisal tool for systematic reviews which include a variety of mixed and multi-method studies

Guidelines for Assessing Studies

Assessing the Risk of Bias in Systematic Reviews of Health Care Interventions ( from AHRQ  Methods Guide for Comparative Effectiveness Reviews)


Systematic Review | Open access | Published: 13 April 2024

Risk of childhood neoplasms related to neonatal phototherapy- a systematic review and meta-analysis

  • Ilari Kuitunen   ORCID: orcid.org/0000-0001-8178-9610 1 , 2 ,
  • Atte Nikkilä 3 , 4 ,
  • Panu Kiviranta 1 , 2 , 5 ,
  • Johanna Jääskeläinen 1 &
  • Anssi Auvinen 6  

Pediatric Research (2024)

Background

Observational studies have shown conflicting results as to whether exposure to neonatal phototherapy is associated with increased rates of childhood cancer.

Objective

To describe the rates of childhood neoplasms and cancer after neonatal phototherapy.

Data sources

The CENTRAL, PubMed, Scopus, and Web of Science databases.

Study selection

Observational studies regardless of design were included.

Data extraction

The data were extracted by one author and validated by another. The risk-of-bias assessment was performed using the ROBINS-E and Joanna Briggs Institute critical appraisal tools.

Results

Six cohort and 10 case-control studies were included. The overall risk of bias was high in seven and low in nine studies. In cohort studies, the odds ratio (OR) was increased for hematopoietic cancer (1.44; confidence interval [CI]: 1.16–1.80) and solid tumors (OR: 1.18; CI: 1.00–1.40). In case-control studies, the OR was 1.63 (CI: 0.99–2.67) for hematopoietic cancers and 1.18 (CI: 1.04–1.34) for solid tumors.

Conclusions

Children with a history of neonatal phototherapy had increased risk of hematopoietic cancer and solid tumors. The evidence quality was limited due to the high risk of bias and potential residual confounding.

Impact statement

Exposure to neonatal phototherapy increased later risk of hematopoietic cancer and solid tumors.

This is the most comprehensive study on the association between phototherapy and cancer, but the evidence quality was limited due to risk of bias and residual confounding.

Future large-scale, well-conducted studies are still needed to better estimate the association.

Introduction

Neonatal jaundice is a common condition during the first month of life, as approximately 70% of neonates have some level of jaundice, and 5% to 10% require phototherapy for treatment of unconjugated hyperbilirubinemia. 1 , 2 , 3 Phototherapy is commonly used to decrease bilirubin levels in order to avoid the neurotoxic effects of high bilirubin levels. Some of the known risk factors for unconjugated hyperbilirubinemia requiring phototherapy are maternal red blood cell antibodies, prematurity, birth injuries, hereditary factors (ethnicity and a history of phototherapy in older siblings), and maternal obesity. 3 , 4 , 5

Phototherapy has been associated with some short-term adverse events, such as rash, dehydration, and difficulties with breastfeeding, 6 , 7 as well as with long-term risks, such as allergies and seizure disorders. 8 , 9 , 10 Phototherapy has been suggested to cause DNA damage and promote reactive oxygen species and proinflammatory cytokines, which could lead to an increased cancer risk. 11 In addition, phototherapy has been associated with increased incidence of café-au-lait macules in children but not with melanocytic nevi. 12 , 13 Previous studies have shown conflicting results regarding the possible increased incidence of childhood cancers following neonatal phototherapy. In some cohort studies, children exposed to phototherapy had an increased risk of all childhood cancers, 14 , 15 whereas no such excess was reported in other studies. 16 , 17 It has also been speculated that there may be an association between hyperbilirubinemia and malignancies; the association with phototherapy may therefore reflect higher bilirubin levels or other maternal/neonatal factors that increase the risk for both hyperbilirubinemia and neoplasms. As phototherapy is an effective and frequently used therapy for neonatal unconjugated hyperbilirubinemia, 18 evidence summaries on possible long-term risk are of clinical relevance. A recent meta-analysis reported an increased risk for solid cancers among children treated with phototherapy, but the authors included benign nevi in their analysis and pooled case-control and cohort studies together, which caused notable heterogeneity in their results. 19 The aim of this systematic review was to provide a systematic assessment of the incidence of cancer and neoplasms after neonatal phototherapy.

Methods

Search process

The literature search was performed on June 28, 2022. We searched the PubMed (MEDLINE), Web of Science, CENTRAL, and Scopus databases for these search terms: (neonat* OR newborn* OR infant*) AND (phototherapy OR hyperbilirubinemia OR jaundice) AND (cancer or malign* OR leukemia OR leukaemia OR lymphoma* OR tumor* or neoplasm*). Additional articles were included if found in the references of the included articles and assessed suitable for the review and analysis. We did not search other sources and decided not to include gray literature. The full search strategy is presented in the appendix (Supplementary file  1 ).

Inclusion criteria

We included only human studies published in peer-reviewed journals in English. Retrospective and prospective observational studies with control groups, regardless of the design (cohort, case-control, etc.) were included. Studies focusing on benign and/or malignant neoplasms, leukemia, and lymphomas were included.

Exclusion criteria

We excluded studies focusing only on nevi or other benign tumors (including hemangiomas). All animal studies were also excluded. Studies without original data or reported in languages other than English were excluded as well.

Main outcome

Our main outcome was neoplasm and cancer risk estimates stratified by anatomic site and cell type of the neoplasm. We also aimed to collect data on cancer mortality.

Two authors screened the abstracts and full texts using Covidence software. 20 A third party was consulted in cases of disagreement if mutual consensus was not achieved. Data extraction was performed by one author and validated by another. The following information was extracted into a pre-designed spreadsheet: authors, year of publication, country where the study was conducted, study period, study design, original inclusion criteria, exposure and control, total number of people included in the study, number of exposed and unexposed or number of cases and controls (depending on the study design), follow-up duration, and overall person-years of follow-up. The effect estimates from both adjusted and unadjusted analyses (hazard ratios [HRs], incidence rates, odds ratios [ORs], and risk ratios [RRs]) with uncertainty estimates (95% confidence intervals [CIs]) were abstracted as well.
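
For concreteness, the extraction fields listed above can be laid out as one row of such a spreadsheet. The sketch below uses R (the review's analysis language); every field name and value is invented for illustration and is not taken from the actual extraction sheet.

```r
# One hypothetical row of a pre-designed extraction spreadsheet
extraction <- data.frame(
  authors         = "Example et al.",
  year            = 2020,
  country         = "Finland",
  study_period    = "1990-2010",
  design          = "retrospective cohort",
  n_total         = 100000,
  n_exposed       = 5000,
  n_unexposed     = 95000,
  followup_years  = 10,
  person_years    = 1e6,
  effect_measure  = "HR",    # HR, OR, RR, or incidence rate
  effect_adjusted = 1.20,    # adjusted estimate, if reported
  ci_low          = 0.90,
  ci_high         = 1.60
)
```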

Risk-of-bias assessment

Risk of bias was assessed for all the included studies. We used the Risk of Bias in Non-randomized Studies of Exposures (ROBINS-E) tool to assess risk of bias. 21 If the study did not attempt to adjust for confounding, it was immediately labeled as high risk for bias, and other domains were not assessed. The scale used in the judgment was low , some concerns , and high . We also utilized a secondary risk-of-bias assessment strategy. We analyzed the cohort studies’ risk of bias according to the Joanna Briggs Institute critical appraisal tool for cohort studies and the case-control studies’ according to the Joanna Briggs Institute critical appraisal tool for case-control studies. 22 These were labeled as with concerns or no concerns . We decided not to exclude any reports from the synthesis due to risk of bias but performed sensitivity analyses where these were excluded.

Statistical methods

RevMan version 5.4 and R statistical software version 4.2.2 (metafor package) were used for the meta-analysis. Data analysis was performed according to the guidelines of the Cochrane Handbook for Systematic Reviews of Interventions. Forest plots are presented for all outcomes.

We decided not to pool case-control studies with cohort studies, as these have different inclusion strategies and are thus problematic to combine. Overall, we expected heterogeneity in the populations between the studies, and therefore we decided to use the random-effects Mantel-Haenszel model. 23 Pooled ORs with 95% CIs were calculated with the Mantel-Haenszel method for cohort and case-control studies. The inconsistency index I² for statistical heterogeneity was calculated, but it was not used to choose between fixed-effect and random-effects models. Some of the studies contained outcomes that could not be pooled for quantitative analysis, and these outcomes have been reported according to the Synthesis Without Meta-analysis (SWiM) guideline. 24 For example, the adjusted effect estimates in the included studies were reported too heterogeneously to pool (in confounder selection, statistical method, and choice of effect estimate measure [OR, RR, HR]), and thus we did not force them into a single estimate but present them in a table. We assessed publication bias with Egger's test and the trim and fill method and provide the funnel plots. 25
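
To make the pooling and bias diagnostics concrete, the following is a minimal sketch in R using the metafor package named above. The 2×2 counts are hypothetical, and the DerSimonian-Laird random-effects weighting shown here is an assumption: the exact weights RevMan applies in its random-effects Mantel-Haenszel option may differ.

```r
library(metafor)

# Hypothetical 2x2 counts for three cohort studies
# ai/bi = exposed with/without cancer; ci/di = unexposed with/without cancer
dat <- data.frame(
  study = c("Study A", "Study B", "Study C"),
  ai = c(14, 70, 9),
  bi = c(9986, 49930, 4991),
  ci = c(100, 500, 40),
  di = c(99900, 499500, 29960)
)

# Random-effects pooled OR (DerSimonian-Laird estimator for tau^2)
res <- rma(measure = "OR", ai = ai, bi = bi, ci = ci, di = di,
           data = dat, method = "DL")
summary(res)                 # pooled log OR with 95% CI, tau^2, I^2
predict(res, transf = exp)   # pooled OR on the natural scale

# Publication-bias diagnostics used in this review
regtest(res)   # Egger's regression test
trimfill(res)  # trim-and-fill adjusted estimate
funnel(res)    # funnel plot
```

With only three hypothetical studies the diagnostics are badly underpowered; the sketch is shown purely to illustrate the workflow, not to reproduce the review's results.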

We report our meta-analysis according to the Meta-analysis of Observational Studies in Epidemiology (MOOSE) and Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and provide the checklists in the appendix. 26 , 27

Protocol registration

We registered our protocol in Prospero (ID CRD42022342273), and it can be accessed online: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022342273 .

We initially screened 2,325 abstracts and assessed 31 full reports. After exclusions (19 studies) and inclusions from hand searches (4 studies), a total of 16 studies were included in the systematic review and meta-analysis (Fig.  1 ). 14 , 15 , 16 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 Six were retrospective cohort studies, and 10 were case-control studies (Table  1 ). Eight of the studies were from Europe, five from North America, and three from the Middle East. The study periods ranged from the 1960s to the 2010s. The main outcome used was the odds or risk of any cancer. The number of participants varied between 150 and 0.9 million (Table  2 ). Six studies did not adjust their analyses, and only five studies described a rationale for the selection of covariates for adjustment (Table  2 ).

Risk of bias and publication bias

Risk of bias was assessed by ROBINS-E; nine studies were judged to have a low risk of bias, and seven studies had a high risk of bias due to lack of adjustment for potential confounders (Table  3 ). Concerns were found in nine studies with the Joanna Briggs Institute critical appraisal tool. Most issues were in confounder identification and strategies to address incomplete follow-up in cohort studies. In case-control studies, most issues were in measuring the exposure and appropriate statistical analysis (Table  3 ). We did not detect publication bias visually in funnel plots, and Egger’s test confirmed this. The trim and fill method was utilized and showed no obvious asymmetry (Fig. S 1 ).

Figure 1. PRISMA flowchart of the study selection process.

Cancer and tumor incidence in cohort studies

Six cohort studies with a combined follow-up of 16 million person-years were analyzed and pooled for all cancer incidence estimates. In the analysis by cancer type, the risk of hematopoietic cancers was increased (OR: 1.44; CI: 1.16–1.80), whereas rates of solid tumors (OR: 1.18; CI: 1.00–1.40) and skin cancers did not show clear evidence of a difference in crude analysis (Fig.  2 ). In sensitivity analyses, in which studies with a high risk of bias were excluded, the OR changed only for skin cancers, and the risk remained highly imprecise (OR: 1.78; CI: 0.70–7.97) (Fig. S 2 ).

Figure 2. Forest plot of cancer incidence in the phototherapy-exposed versus unexposed cohorts, stratified by cancer type.

In adjusted analyses of the cohort studies, statistically significant associations with all-cancer incidence were detected in two studies (Table  4 ). In stratified analyses, one study found an increased adjusted overall hazard of hematopoietic cancers, and one found an increased adjusted OR (aOR) for acute myeloid leukemia. One study further reported an increased aOR for kidney cancer but not for any other type of solid cancer.

Cancer and tumors in case-control studies

Ten case-control studies were included in a pooled analysis with 10,799 cancer cases, of whom 734 (7.0%) had been exposed to phototherapy. The control group consisted of 219,364 children, of whom 11,262 (5.1%) were exposed to phototherapy. In the analysis by tumor type, solid tumors were the only group with an increased risk associated with phototherapy (OR: 1.18; CI: 1.04–1.34) (Fig.  3 ). This estimate remained unchanged in sensitivity analysis (Fig. S 3 ). The OR for hematopoietic cancers was 1.63 (CI: 0.99–2.67). In the sensitivity analysis, restricted to studies with a low risk of bias, the OR for hematopoietic cancers was 1.70 (CI: 1.14–2.55), indicating increased odds (Fig. S 3 ).
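
As a rough arithmetic check on these totals (not an analysis from the paper, whose pooled estimates are stratified by study and outcome), a crude aggregate OR can be computed directly from the combined counts:

```r
# Crude aggregate OR from the combined case-control counts above.
# Illustrative only: this collapses study-level stratification and mixes
# tumor types, so it differs from the reported pooled estimates.
a  <- 734              # exposed cases
b  <- 10799 - 734      # unexposed cases
cc <- 11262            # exposed controls
dd <- 219364 - 11262   # unexposed controls

crude_or <- (a / b) / (cc / dd)             # about 1.35
se_log   <- sqrt(1/a + 1/b + 1/cc + 1/dd)   # SE of log(OR)
ci_95    <- exp(log(crude_or) + c(-1.96, 1.96) * se_log)  # about 1.25-1.46
```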

Figure 3. Forest plot of crude overall cancer/tumor rates among exposed and unexposed participants in the case-control studies, stratified by tumor/cancer type.

Four case-control studies presented adjusted analyses. In the adjusted analyses, the aOR was statistically significant in one study and for only one outcome. The acute lymphatic leukemia aOR was 1.69 (CI: 1.37–2.08). Other adjusted estimates had CIs overlapping 1 (Table  4 ).

Main findings

Based on this systematic review and meta-analysis, children with a history of neonatal phototherapy have a 1.2- to 1.6-fold increased risk of hematopoietic cancers and solid tumors. However, several factors need to be considered in interpretation, including issues with the quality of reporting in the original studies, potential causal pathways, and confounding factors.

Some studies have speculated that the increased cancer risk could be at least partly attributable to hyperbilirubinemia instead of phototherapy, i.e., confounding by indication. This could be related to oxidative stress caused by bilirubin at the cellular level, which could promote carcinogenesis. 41 This is consistent with findings showing that cancer incidence among children with hyperbilirubinemia who did not receive phototherapy was between that of children without hyperbilirubinemia and that of those treated with phototherapy. 15 , 40

We originally intended to analyze cancer risk by duration and intensity of phototherapy, as it could be hypothesized that a longer treatment duration leads to a higher risk. However, most studies did not report the duration of phototherapy.

Prematurity has been associated with both phototherapy and cancer risk. One of the included studies analyzed term and preterm infants separately and found that incidence did not differ between the treated and non-treated individuals who were born prematurely, whereas among full-term infants phototherapy was associated with a slightly increased risk of hematopoietic cancers. 33

Comparison to previous meta-analyses

During our initial search, we identified one previous meta-analysis, and another was identified later. 19 , 42 Their results were generally similar to ours, but the previous meta-analyses had some key differences and issues. Both pooled case-control and cohort studies and reported combined results. Although this is technically possible, it increases variability in the study populations and adds to heterogeneity. The meta-analysis by Hemati et al. also included benign nevus count as an outcome and did not present any sensitivity analysis to assess the impact of risk of bias or the reasons for the high heterogeneity. Furthermore, we were able to include one additional study compared with the meta-analysis by Abdellatif et al.

We performed our systematic review according to a pre-registered protocol without major deviations. In contrast to previous studies, we did not pool results from case-control and cohort studies, which reduces the heterogeneity in our reporting. The results from case-control studies exhibited high variability, including both increased and decreased odds, and the measured inconsistency was high. The effect estimates from cohort studies had lower heterogeneity, which was also reflected in greater statistical consistency. It must be noted that, based on the wide CIs, nearly all the included studies seemed to be underpowered to detect meaningful risk increases.

Limitations

Most of the limitations of this work stem from the limitations of the included studies. Several studies had a high risk of bias due to a lack of adjustment for possible confounders. The studies that did adjust for confounders rarely presented the rationale for the covariate selection. None of the studies discussed causal pathways or visualized them, e.g., as directed acyclic graphs. To address this issue, we have visualized the potential causal pathways in Figure S 4 to better illustrate the possible causality and the alternative backdoor paths biasing the estimates.

We were unable to perform two analyses planned in the protocol: mortality and exposure-outcome gradient (dose dependency). As the studies did not report mortality, we were unable to assess it. Furthermore, we aimed to examine the exposure gradient (higher risk with higher exposure level) in the potential association, as it could have strengthened the plausibility of a potential effect. Dose dependency would have been addressed by examining the duration and intensity of the phototherapy, but only two studies presented information on duration and none on intensity (number of lamps). Furthermore, we were unable to find information on phototherapy practices in the included countries during the study periods, and there may have been variation in the bilirubin thresholds for initiating and ending phototherapy. This causes additional heterogeneity in our estimates.

Implications for clinical practice and future research

Future studies are still needed. Although our systematic review identified 16 studies, the overall quality had clear limitations. Furthermore, due to the rare outcome, estimates in our meta-analysis have notable imprecision, and further large-scale studies are needed. Future studies should focus more on potential causal pathways in selecting the covariates for their analyses. We have illustrated the potential causal pathways and modifiers, which could partly explain the observed differences (Fig. S 4 ). Some maternal and neonatal conditions, such as prematurity, congenital anomalies, hereditary syndromes, and intrauterine growth restriction, may increase the rates of phototherapy and cancers. Inability to control for these creates a potential source of bias due to confounding by indication and shared risk factors. Mortality in cancer patients with and without prior phototherapy would be an interesting topic to address in the future.

While our results suggest that neonatal phototherapy may increase the risk of hematopoietic cancers and solid tumors, they do not justify changes in the use of phototherapy. As high bilirubin levels are neurotoxic, it is important to treat hyperbilirubinemia appropriately. However, guidelines should be followed and unnecessary therapy avoided, as it may have harmful effects. 43 Currently, we cannot conclude whether the phototherapy, high bilirubin, or shared risk factors for prematurity and childhood cancer underlie the observed association with cancer risk.

Neonates receiving phototherapy have a 1.2- to 1.6-fold increased risk of hematopoietic cancers and solid tumors. Quality concerns in the reporting of the original studies limited the evidence. More high-quality studies are needed to further elucidate the observed association between phototherapy and neoplasia and improve understanding of the potential causal pathways.

Data availability

All the data generated during the review process are available from the corresponding author upon request.

van der Geest, B. A. M. et al. Assessment, management, and incidence of neonatal jaundice in healthy neonates cared for in primary care: a prospective cohort study. Sci. Rep. 12 , 14385 (2022).

Kuzniewicz, M. W., Escobar, G. J. & Newman, T. B. Impact of universal bilirubin screening on severe hyperbilirubinemia and phototherapy use. Pediatrics 124 , 1031–1039 (2009).

Mitra, S. & Rennie, J. Neonatal jaundice: aetiology, diagnosis and treatment. Br. J. Hosp. Med. 78 , 699–704 (2017).

Kuitunen, I., Huttunen, T. T., Ponkilainen, V. T. & Kekki, M. Incidence of obese parturients and the outcomes of their pregnancies: A nationwide register study in Finland. Eur. J. Obstet. Gynecol. Reprod. Biol. 274 , 62–67 (2022).

Egami, N., Muta, R., Korematsu, T. & Koga, H. Mild neonatal complications following guideline-compliant vacuum-assisted delivery in Japan: improvements still needed. J. Matern. Fetal Neonatal Med. 35 , 3400–3406 (2022).

Faulhaber, F. R. S., Procianoy, R. S. & Silveira, R. C. Side Effects of Phototherapy on Neonates. Am. J. Perinatol. 36 , 252–257 (2019).

Maisels, M. J. & McDonagh, A. F. Phototherapy for neonatal jaundice. N. Engl. J. Med. 358 , 920–928 (2008).

Kuniyoshi, Y., Tsujimoto, Y., Banno, M., Taito, S. & Ariie, T. Neonatal jaundice, phototherapy and childhood allergic diseases: an updated systematic review and meta-analysis. Pediatr. Allergy Immunol. 32 , 690–701 (2021).

Newman, T. B., Wu, Y. W., Kuzniewicz, M. W., Grimes, B. A. & McCulloch, C. E. Childhood Seizures After Phototherapy. Pediatrics 142 , e20180648 (2018).

Maimburg, R. D., Olsen, J. & Sun, Y. Neonatal hyperbilirubinemia and the risk of febrile seizures and childhood epilepsy. Epilepsy Res 124 , 67–72 (2016).

Oláh, J., Tóth-Molnár, E., Kemény, L. & Csoma, Z. Long-term hazards of neonatal blue-light phototherapy. Br. J. Dermatol. 169 , 243–249 (2013).

Lai, Y. C. & Yew, Y. W. Neonatal Blue Light Phototherapy and Melanocytic Nevus Count in Children: A Systematic Review and Meta-Analysis of Observational Studies. Pediatr. Dermatol 33 , 62–68 (2016).

Wintermeier, K. et al. Neonatal blue light phototherapy increases café-au-lait macules in preschool children. Eur. J. Pediatr. 173 , 1519–1525 (2014).

Wickremasinghe, A. C., Kuzniewicz, M. W., Grimes, B. A., McCulloch, C. E. & Newman, T. B. Neonatal phototherapy and infantile cancer. Pediatrics 137 (2016). https://doi.org/10.1542/peds.2015-1353

Auger, N., Laverdière, C., Ayoub, A., Lo, E. & Luu, T. M. Neonatal phototherapy and future risk of childhood cancer. Int. J. Cancer 145 , 2061–2069 (2019).

Digitale, J. C., Kim, M. O., Kuzniewicz, M. W. & Newman, T. B. Update on phototherapy and childhood cancer in a Northern California cohort. Pediatrics 148 (2021). https://doi.org/10.1542/peds.2021-051033

Newman, T. B. et al. Retrospective cohort study of phototherapy and childhood cancer in Northern California. Pediatrics 137 (2016). https://doi.org/10.1542/peds.2015-1354

Kumar, P., Chawla, D. & Deorari, A. Light‐emitting diode phototherapy for unconjugated hyperbilirubinaemia in neonates. Cochrane Database Syst. Rev. 2011 , CD007969 (2011).

Hemati, Z., Keikha, M., Khoshhali, M. & Kelishadi, R. Phototherapy and risk of childhood cancer: A systematic review and meta-analysis. J. Neonatal Nurs. 28 , 219–228 (2022).

Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med Res Methodol. 20 , 7 (2020).

ROBINS-E Development Group. Risk of bias tools: ROBINS-E tool. Published June 1, 2022. Accessed September 19, 2022. https://www.riskofbias.info/welcome/robins-e-tool

Zeng, X. et al. The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: a systematic review. J. Evid.-Based Med. 8 , 2–10 (2015).

Greenland, S. & Robins, J. M. Estimation of a common effect parameter from sparse follow-up data. Biometrics 41 , 55–68 (1985).

Campbell, M. et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ 368 , l6890 (2020). https://doi.org/10.1136/bmj.l6890

Lin, L. & Chu, H. Quantifying publication bias in meta-analysis. Biometrics 74 , 785–794 (2018).

Page, M. J. et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ 372 , n160 (2021). https://doi.org/10.1136/bmj.n160

Stroup, D. F. et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA 283 , 2008–2012 (2000).

Auger, N., Ayoub, A., Lo, E. & Luu, T. M. Increased risk of hemangioma after exposure to neonatal phototherapy in infants with predisposing risk factors. Acta Paediatr. 108 , 1447–1452 (2019).

Berg, P. & Lindelöf, B. Is phototherapy in neonates a risk factor for malignant melanoma development? Arch. Pediatr. Adolesc. Med 151 , 1185–1187 (1997).

Brewster, D. H. et al. Risk of skin cancer after neonatal phototherapy: Retrospective cohort study. Arch. Dis. Child 95 , 826–831 (2010).

Bugaiski-Shaked, A., Shany, E., Mesner, O., Sergienko, R. & Wainstock, T. Association between neonatal phototherapy exposure and childhood neoplasm. J. Pediatr. (2022). https://doi.org/10.1016/j.jpeds.2022.01.046

Cnattingius, S. et al. Prenatal and Neonatal Risk Factors for Childhood Myeloid Leukemia. Cancer Epidemiol. Biomark. Prev. 4 , 441–445 (1995).

Heck, J. E. et al. Phototherapy and childhood cancer: shared risk factors. Int. J. Cancer 146 , 2059–2062 (2020).

Kadivar, M., Sangsari, R., Saeedi, M. & Tehrani, S. G. Association between neonatal phototherapy and cancer during childhood. Iran J. Neonatol. 11 , 104–108 (2020).

Sabzevari, F., Sinaei, R., Bahmanbijari, B., Dehghan Krooki, S. & Dehghani, A. Is neonatal phototherapy associated with a greater risk of childhood cancers? BMC Pediatr. 22 , 356 (2022).

Seppälä, L. K., Vettenranta, K., Leinonen, M. K., Tommiska, V. & Madanat-Harjuoja, L. Preterm birth, neonatal therapies and the risk of childhood cancer. Int. J. Cancer 148 , 2139–2147 (2021).

Cnattingius, S. et al. Prenatal and neonatal risk factors for childhood lymphatic leukemia. J. Natl. Cancer Inst. 87 , 908–914 (1995).

Linet, M. S. et al. Maternal and perinatal risk factors for childhood brain tumors (Sweden). Cancer Causes Control CCC 7 , 437–448 (1996).

Roman, E., Ansell, P. & Bull, D. Leukaemia and non-Hodgkin’s lymphoma in children and young adults: are prenatal and neonatal factors important determinants of disease? Br. J. Cancer 76 , 406–415 (1997).

Podvin, D., Kuehn, C. M., Mueller, B. A. & Williams, M. Maternal and birth characteristics in relation to childhood leukaemia. Paediatr. Perinat. Epidemiol. 20 , 312–322 (2006).

Inoguchi, T., Nohara, Y., Nojiri, C. & Nakashima, N. Association of serum bilirubin levels with risk of cancer development and total death. Sci. Rep. 11 , 13224 (2021).

Abdellatif, M. et al. Association between neonatal phototherapy and future cancer: an updated systematic review and meta-analysis. Eur. J. Pediatr. 182 , 329–341 (2023).

National Institute for Health and Care Excellence (NICE). Jaundice in newborn babies under 28 days (CG98). Published May 19, 2010. Accessed March 6, 2023. https://www.nice.org.uk/guidance/cg98

Olsen, J. H., Hertz, H., Kjaer, S. K., Bautz, A., Mellemkjaer, L. & Boice, J. D. Jr. Childhood leukemia following phototherapy for neonatal hyperbilirubinemia (Denmark). Cancer Causes Control 7 , 411–414 (1996).

This study did not receive specific funding. Open access costs will be covered by University of Eastern Finland library. Open access funding provided by University of Eastern Finland (including Kuopio University Hospital).

Author information

Authors and affiliations

University of Eastern Finland, Institute of Clinical Medicine and Department of Pediatrics, Kuopio, Finland

Ilari Kuitunen, Panu Kiviranta & Johanna Jääskeläinen

Kuopio University Hospital, Department of Pediatrics, Kuopio, Finland

Ilari Kuitunen & Panu Kiviranta

Tampere University, Faculty of Medicine and Health Technologies, Tampere, Finland

Atte Nikkilä

Kanta-Häme Central Hospital, Department of Pediatrics, Hämeenlinna, Finland

The Finnish Medical Society Duodecim, Helsinki, Finland

Panu Kiviranta

Tampere University, Faculty of Social Sciences, Department of Epidemiology, Tampere, Finland

Anssi Auvinen

Contributions

Dr. Ilari Kuitunen had the original idea and conceptualized the study design, participated in the screening and data extraction processes, was in charge of the statistical analyses, and wrote the initial draft. Dr. Atte Nikkilä participated in the conceptualization, provided methodological assistance for the analyses and conducted some parts of them, participated in the screening and data extraction processes, and provided important revisions to the manuscript. Dr. Johanna Jääskeläinen and Dr. Panu Kiviranta both participated in the screening and data extraction processes and provided important revisions to the manuscript. Prof. Anssi Auvinen participated in the conceptualization, supervised the whole process, provided methodological knowledge, and revised the manuscript for important intellectual content. All authors approved the final manuscript as submitted and agreed to be accountable for all aspects of the work.

Corresponding author

Correspondence to Ilari Kuitunen .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

MOOSE checklist

PRISMA 2020 checklist

Supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Kuitunen, I., Nikkilä, A., Kiviranta, P. et al. Risk of childhood neoplasms related to neonatal phototherapy- a systematic review and meta-analysis. Pediatr Res (2024). https://doi.org/10.1038/s41390-024-03191-7

Received : 11 August 2023

Revised : 18 March 2024

Accepted : 24 March 2024

Published : 13 April 2024

DOI : https://doi.org/10.1038/s41390-024-03191-7


  • Open access
  • Published: 08 April 2024

Unveiling the hidden struggle of healthcare students as second victims through a systematic review

  • José Joaquín Mira 1 , 2 ,
  • Valerie Matarredona 1 ,
  • Susanna Tella 3 ,
  • Paulo Sousa 4 ,
  • Vanessa Ribeiro Neves 5 ,
  • Reinhard Strametz 6 &
  • Adriana López-Pineda 2  

BMC Medical Education volume  24 , Article number:  378 ( 2024 )


When healthcare students witness, engage in, or are involved in an adverse event, it often leads to a second victim experience, impacting their mental well-being and influencing their future professional practice. This study aimed to describe the efforts, methods, and outcomes of interventions to help students in healthcare disciplines cope with the emotional experience of being involved in or witnessing a mistake causing harm to a patient during their clerkships or training.

This systematic review followed the PRISMA guidelines and includes the synthesis of eighteen studies, published in diverse languages from 2011 to 2023, identified from the MEDLINE, EMBASE, SCOPUS, and APA PsycInfo databases. The PICO method was used to construct the research question and formulate the eligibility criteria. The selection process was conducted through Rayyan. Titles and abstracts were independently screened by two authors. The critical appraisal tools of the Joanna Briggs Institute were used to assess the risk of bias of the included studies.

A total of 1354 studies were retrieved, and 18 met the eligibility criteria. Most studies were conducted in the USA. Various educational interventions were described, along with training in error prevention and resilience. In some cases, the experience contributed to students' personal growth. Psychological support in the aftermath of adverse events was scattered.

Ensuring healthcare students' resilience should be a fundamental part of their training. Interventions to train them to address the second victim phenomenon during their clerkships are scarce and scattered and do not yield conclusive results on what is most effective and what is not.

Introduction

Students in healthcare disciplines often witness or personally face stressful clinical events during their practical training [ 1 , 2 ], such as unexpected patient deaths, discussions with patients' families or among healthcare team members, violence toward professionals, or inappropriate treatment toward themselves. When this occurs, the majority of students talk to other students about it (approximately 90%), and less frequently, they speak to healthcare team members or mentors (37%) [ 2 ]. This is because they usually believe they will not receive attention, will not be understood, or fear negative consequences in their evaluation [ 1 , 2 ].

A particular case of a stressful clinical event is being involved in an adverse event (AE) or making an honest mistake [ 2 ] due to circumstances beyond the student's control. Approximately three-quarters of nursing or medical students witness some AE during their professional development (clerkships and training in healthcare centers) [ 2 , 3 ] and studies show that 18%-30% of students report committing an error resulting in an AE [ 4 , 5 ]. Some of them may even experience humiliation or verbal abuse for that error [ 6 ]. The vast majority (85%) of these occurrences lead to a second victim experience [ 7 , 8 ]. Consistent with what we know about the second victim experience [ 9 , 10 , 11 ], it is common for students in these cases to experience hypervigilance, acute stress, and doubts about their own ability for this work [ 12 , 13 ]. These emotional disturbances are usually more intense among females than males [ 14 ] and people with high values in the personality trait of neuroticism [ 15 , 16 ].

They also observe the impact of clinical errors on other healthcare professionals, which influences their own response [ 3 ]. All these situations affect their well-being and can shape their future professional practice style [ 17 , 18 ]. For example, they may develop defensive practices more frequently [ 5 , 17 ] or avoid informing patients after an AE in the future [ 4 ]. Educators should not overlook the emotional effects of AEs on students/trainees [ 19 ]. Indeed, patient safety topics, including the second victim phenomenon, mental well-being, and resilience, are neglected in undergraduate medical and nursing curricula in Europe. Furthermore, according to student responses, over half (56%) did not 'speak up' during a critical situation when they felt they could or should have [ 20 ].

Recently, psychological interventions to promote resilience in students facing stressful situations have been reviewed [ 21 ]. These interventions are not widely implemented, and approximately only one-fourth of students report having sufficient resilience training during their educational period [ 2 ]. In the specific case of supporting students who experience the second victim phenomenon, we lack information about the approach, scope, and method of possible interventions. The objective of this systematic review was to describe the efforts, methods, and outcomes of interventions to help students in healthcare disciplines cope with the emotional experience (second victim) of being involved in or witnessing a mistake causing harm to a patient during their clerkships or training.

The review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [ 22 ]. The study protocol was registered at PROSPERO (International prospective register of systematic reviews) [ 23 ] under the registration number CRD42023442014.

Eligibility criteria

The research question and eligibility criteria were constructed using the PICO method as follows (see Supplemental material 1 ):

Population: Students of healthcare disciplines

Intervention: Any method or intervention addressing the second victim phenomenon

Comparator: If applicable, any other method or intervention

Outcomes: Any measure of impact

Eligible studies included those reporting any method or intervention to prevent and address the second victim experience among healthcare students involved in or witnessing a mistake causing adverse events during their clerkships or training. Additionally, studies reporting interventions addressing psychological stress or reinforcing competences to face highly stressful situations, enhancing resilience, or increasing understanding of honest errors in the clinical setting were also included. Regarding the study population, eligible studies included healthcare discipline students (e.g., medical, nursing, pharmacy students) enrolled in any year, level, or course, both in public and private schools or faculties worldwide. All quantitative studies (experimental, quasi-experimental, case–control, cohort, and cross-sectional studies) within the scope of educational activities, as well as all qualitative studies (e.g., focus groups, interviews) conducted to explore intervention outcomes, were included.

The exclusion criteria were interventions and data regarding residents or professionals as trainees; analyses aimed at preparing curriculum content or evaluating academic performance (including regarding patient safety issues); and any type of review study or noncitable article (such as editorials, letters to the editor, comments, book reviews, grey literature, opinion articles, or abstracts). Conference abstracts were included if they contained substantial and original information not found elsewhere.

The search was conducted on August 5, 2023, in the following electronic databases: MEDLINE, EMBASE, SCOPUS and APA PsycInfo. The reference lists of relevant reviews and other selected articles were explored to find additional appropriate articles. Last, recommended websites (grey literature) found during the comprehensive reading of publications were included if they met the inclusion criteria.

Controlled vocabulary and free text were combined using Boolean operators and filters to develop the search strategy (Supplemental Material 1 ). The terminology used in this study was extracted from the literature while respecting the most common usage of the terms prior to initiation of this screening. No limitations were imposed regarding language or the publication date.

Study selection

The selection process was conducted through Rayyan [ 24 ]. After removal of duplicates, two researchers (JM and VM) independently screened the titles and abstracts of all retrieved publications to determine eligibility. Discrepancies were resolved by an arbiter (AL), who made the final decision after debate to obtain consensus. Afterwards, screening of the full texts of the preselected articles was carried out in the same manner.

Data extraction

After final inclusion, the following characteristics of each study were collected by two reviewers: publication details (first author, year of publication), country of the study location, aim/s, study design, setting, type of study participants, and sample size. Separately, the following information from the included studies was collected: the description of the methods, support programs, or interventions used to address the second victim phenomenon; the findings on their effectiveness (competences and attitudes changed); participants' views or experiences, if applicable; and whether the term 'second victim' was used.

Quality appraisal

We used the critical appraisal tools of the Joanna Briggs Institute [ 25 ] to assess the risk of bias of the included studies, according to the study design. Those studies that did not meet at least 60% of the criteria [ 26 ] were considered to have a high risk of bias. The critical appraisal was performed by two independent reviewers, and the overall result was expressed as a percentage of items answered with “yes”. Additionally, the number of citations of each article was collected as a quality measure [ 27 ].
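
As an illustration of this scoring rule, the short sketch below (in R, for consistency with the other examples in this document) tallies hypothetical appraisal answers against the 60% threshold. The item labels and answers are invented, since the actual JBI checklists vary by study design.

```r
# Hypothetical JBI-style appraisal of one study (placeholder items)
answers <- c(item1 = "yes", item2 = "yes", item3 = "no",      item4 = "yes",
             item5 = "unclear", item6 = "yes", item7 = "yes", item8 = "no")

pct_yes        <- mean(answers == "yes") * 100  # 62.5% of items answered "yes"
high_risk_bias <- pct_yes < 60                  # FALSE: not rated high risk
```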

Data synthesis

A descriptive narrative synthesis of the studies (approaches and outcomes) was conducted comparing the type and content of the methods or interventions implemented. Before initiating our literature search, we drafted a thematic framework informed by our research objectives, anticipating potential themes. This framework guided our evidence synthesis, dynamically adapting as we analyzed the included studies. Our approach allowed systematic integration of findings into coherent themes, ensuring our narrative synthesis was both grounded in evidence and reflective of our initial thematic expectations, providing a nuanced understanding of the topic within the existing research context. All data collected from the data extraction were reported and summarized in tables. The main findings were categorized into broad themes: (1) Are students informed about the phenomenon of second victims or how to act in case of making a mistake or witnessing a mistake? (2) What do students learn about an honest mistake, intentional errors, and key elements of safety culture? (3) What kind of support do students value and receive to manage the second victim phenomenon? (4) Strategies for supporting students in coping with the second victim phenomenon after making or witnessing a mistake. We considered the effectiveness (measurement of the achieved change in knowledge, skills, or attitudes) and meaningfulness (individual experience, viewpoints, convictions, and understandings of the participants) of each intervention or support program.

A total of 1622 titles were identified after the initial search. After removing duplicates, 1354 studies were screened. After the title, abstract and full text review, we identified and extracted information from 18 studies. The selection process is shown in the PRISMA flow diagram (Fig.  1 ).

Figure 1. PRISMA 2020 flow diagram of the study selection process (searches of databases, registers, and other sources).

The articles included in this review are shown in Table  1 in alphabetical order of the first author, detailing the characteristics and overall result of the quality assessment (measured as the percentage of compliance with the JBI tool criteria) of each study. Most studies were conducted in the USA ( n  = 7) [ 19 , 21 , 28 , 29 , 30 , 31 , 32 ], followed by Korea ( n  = 2) [ 33 , 34 ] and Australia ( n  = 2) [ 35 , 36 ], and the rest were carried out in Denmark [ 37 ], China [ 38 ], Italy [ 39 ], the United Kingdom [ 40 ], Georgia [ 41 ], Brazil [ 42 ], and Canada [ 43 ] ( n  = 1 each). The included studies cover a publication period that ranges from 2011 to 2023, with four of them being published in 2020. All these investigations were conducted within the academic setting, with the exception of one study, which took place in the Western Sydney Local Health District. Regarding the study participants, eleven studies were exclusively focused on medical students, six specifically targeted nursing students, and one included both medical and nursing students. In terms of study design, quasi-experimental ( n  = 8), cross-sectional ( n  = 2) and qualitative designs ( n  = 6) were used, and two studies used a mixed-methods design.

Supplementary Tables  1 , 2 and 3 show the quality assessment of the quasi-experimental, cross-sectional, and qualitative studies, respectively. Four of the included studies [ 19 , 28 , 41 , 44 ] did not meet at least 60% of the items and were considered to have a high risk of bias. The five studies of highest quality [ 32 , 35 , 37 , 38 , 43 ] met 80% of the items. The study by Le et al. (2022) [ 30 ] did not contain enough information to assess the risk of bias, as it was a conference abstract. The most frequently cited study is that of Hanson et al., conducted in 2020 [ 35 ].

Table 2 shows educational interventions, support strategies and any method reported in the scientific literature to help healthcare students cope with the emotional experience (second victim) of being involved in or witnessing a mistake during their clerkships or training. Due to the heterogeneity of retrieved studies regarding the type of design, the intervention type and outcome measures, a statistical analysis of the dataset was not possible. Thus, the evidence was summarized in broad themes.

Are students informed about the phenomenon of second victims or how to act in case of making a mistake or witnessing a mistake?

Some authors focus on the identification and reporting of errors, assuming that this process helps students cope with the emotional experience after a safety incident. Their studies [ 19 , 33 , 34 , 41 , 44 ] reported on training given to medical or nursing students on how to disclose errors, without addressing the second victim phenomenon specifically. In 2011, Gillies et al. reported that a medical error apology intervention increased students' confidence in providing effective apologies and their comfort in disclosing errors to a faculty member or patient [ 41 ]. It included online content with interactive tasks, small-group tasks and discussion, a standardized patient interview, and anonymous feedback by peers on written apologies. In 2015, Roh et al. showed that understanding, attitudes, and sense of responsibility regarding patient safety improved after a three-day patient safety training. This study involved medical students who were instructed on error causes, error reporting, communication with patients and caregivers, and other concepts of patient safety; it used interactive lectures with demonstrations, small-group practice, role playing, and debriefing [ 34 ]. In 2019, Ryder et al. reported that an interactive Patient Safety Reporting Curriculum (PSRC) seems to improve attitudes toward medical errors and increase comfort with disclosing them [ 19 ]. This curriculum was developed to be integrated into the third-year internal medicine clerkship during an 8-week clinical experience. It aimed to enable students to identify medical errors and report them using a format similar to official reports. Students were instructed in the method of classifying AEs developed by Robert Wachter and in James Reason's Swiss cheese model [ 12 , 45 ]. A 60-min session demonstrated the system model of error through a focused case-based writing assignment and discussion. In 2019, Mohsin et al. showed that clinical error reporting increased after a 4-h workshop in which, among other concepts, the importance of reporting errors was discussed [ 42 ]. Other authors [ 30 , 33 ] focused on students' ability to report these AEs with curricula and syllabi employing methods such as the use of standardized patients, facilitated reflection, feedback, and short didactics for summarization. These studies also reported that this type of education program seems to enhance students' current knowledge [ 36 ] and their ability to disclose medical errors [ 30 , 33 ].

Only the educational intervention reported by Davis & Coviello in 2020, based on the World Health Organization (WHO) Patient Safety Curriculum, addresses the consequences and effects of the second victim phenomenon [ 29 ]. It was a 3-h session consisting of the presentation of an AE in the form of a video or narrative, a discussion of case studies in small groups in which students had the opportunity to share personal experiences of such situations, and a set of practical application measures intended to improve students' knowledge, application skills, and critical thinking.

What do students learn about honest and intentional errors and key elements of safety culture?

Most training for both medical and nursing students focuses on how to identify the occurrence of a medical error. When asked, students show little confidence in their ability to recognize such errors because they have little exposure to clinical procedures during their learning, which makes it difficult for them to differentiate errors from normal practice. In addition to teaching students how to identify AEs, interventions have also focused on how to prevent them before they happen and how to talk about them once they occur [ 29 , 40 , 41 , 44 ]. None of the training described in the studies included in this review incorporated education on honest or intentional errors. However, a patient safety curriculum for medical students designed by Roh et al. (2015) [ 34 ] and a medication safety science curriculum developed by Davis & Coviello (2020) [ 29 ] for nursing students were based on the WHO Patient Safety Curriculum [ 13 ], which includes key aspects such as patient safety awareness, effective communication, teamwork and collaboration, safety culture, and safe medication management.

What kind of support do students value and receive to manage the second victim phenomenon?

Students stated that the greatest support comes from their peers, followed by their mentors and, finally, their families and friends [ 32 , 37 , 38 , 39 , 42 , 43 ]. Most hospitals and some universities have support programs specifically tailored for such situations, offering psychological assistance [ 39 ]. However, as these are mostly voluntary aids, many students do not make use of them, and when they do, the support they receive is usually limited. Mousinho Tavares et al. (2022) found that students did not know about the organizational support or protocols available to those who become second victims of patient safety incidents [ 42 ]. In 2020, in the USA, interactive sessions exploring the professional and personal effects of medical errors were designed to inform medical students about the support resources available to them [ 31 ].

Strategies for supporting students after making or witnessing a mistake

In 2019, Breslin et al. developed a 2.5-h seminar on resilience for fourth-year medical students (in the USA) consisting of an initial group discussion about the psychology of shame and the guilt responses that arise from medical error [ 28 ]. During this first group discussion, students had the opportunity to share experiences related to these concepts encountered during their medical training. Following this, students formed small groups led by previously trained teachers to enhance their confidence in discussing shame and to further explore the topics covered in the group seminar. This training improved confidence in recognizing shame, distinguishing it from guilt, identifying shame reactions, and being willing to seek help from others. In 2020, Musunur et al. showed that an hour-long interactive group session for medical students in the USA increased awareness of available resources for coping with medical errors and self-reported confidence in detecting and coping with medical errors [ 31 ]. A 2022 Italian cross-sectional study on healthcare students and medical residents as second victims found no structured programs within medical residency programs/specialization schools to support residents after the occurrence of an adverse event. The study also suggested that it might be worthwhile to design interventions for posttraumatic stress disorder (PTSD) for these students, as the symptoms of second victims are similar to those of this disorder. Along the same lines, the study proposes a series of potentially useful interventions, such as psychological therapy, self-help programs, and even drug therapies, as these have proven effective in treating PTSD [ 39 ].

Few training interventions exist to help healthcare students cope with the emotional experience of being involved in or witnessing a mistake causing harm to a patient during their clerkships. These interventions are scattered and not widely available. Additionally, there is uncertainty about their effectiveness.

In 2008, Martinez and Lo [ 3 ] highlighted that during students' studies, there are numerous missed opportunities to instruct them on how to respond to and learn from errors. This study seems to confirm this statement. Despite some positive published experiences, the provision of this type of training is limited. Deans, school directors, academic and clinical mentors, along with faculty members, have the opportunity to recognize the needs of their students, helping to prepare them for psychologically challenging situations. Such events occur frequently and are managed by professionals who rely on their own capacity for resilience. These sources of stress are not unknown to us, as they are a regular part of daily practice in healthcare settings. However, they do not always receive the necessary attention, and it is often assumed that they are addressed without difficulty [ 3 ].

Currently, we are aware that students also undergo the second victim experience [ 8 , 37 , 46 ], and it has been emphasized that this experience may impact their future professional careers and personal lives [ 39 ]. There is wide diversity in training programs and local regulations regarding the activities that students in practice can undertake. Although interest is growing and the number of studies has increased since 2019, there are still many topics to address, and the extent of the experiences suggests that these are isolated initiatives whose further development has not been reported in other faculties or schools.

Over one-third of the studies employed quasi-experimental designs with pre-post measures, although most studies relied on qualitative methodologies to explore students' responses to specific issues [ 19 , 28 , 29 , 31 , 33 , 34 , 41 , 44 ]. These investigations allow us to assert that we understand the problem, have quantified it, and have ideas to address it, but we lack a consensus-based, tested framework to ensure students' capacity to confront these situations. Moreover, as with the study of resilience training or of facing the second victim phenomenon among healthcare workers [ 2 , 21 , 28 , 35 , 47 , 48 ], all of the studies have focused on medical and nursing students. Other profiles (such as pharmacy or psychology students) have not been included so far.

The first study on the impact of unintended incidents on students in healthcare disciplines dates back to 2011. Patey et al. (in 2007) identified deficiencies in patient safety training among medical students and designed an additional training module alongside their educational program [ 6 ]. Other experiences have also focused on providing patient safety education [ 6 , 29 , 33 , 35 , 39 ].

The majority of studies included in this review focused on training students in providing information and apologies to patients who have experienced an AE (due to a clinical error). These studies have been conducted on every continent except Africa, and while they have different objectives, they share a similar focus: enhancing disclosure skills and altering defensive or concealment attitudes. Many students had difficulty speaking up about medical errors [ 49 ], which poses a threat to patient safety. The early formative period is the optimal time to address this issue, provide skills, and overcome the traditional and natural barriers to discussing things that go wrong.

Students preparing for highly stressful situations in their future careers face a contrast between the interest in their readiness and the observed figures on clinical errors during practice. A 2010 study in Denmark [ 37 ] reported that practically all students (93% of 229) witnessed medical errors, with 62% contributing to them. In Belgium (thirteen years later), up to 85% of students witnessed mistakes [ 17 ], while studies from the US and Italy (2019–2022) showed lower figures: among 282 American students, only 36% experienced AEs, and Italian nursing students reported up to 37% [ 4 , 8 , 10 ]. Students witness 3.8 incidents every 10 days [ 48 ], although some students do not report witnessing any errors during their clinical placements, indicating difficulties with speaking up. In light of these data, preparing students for emotional responses and for reactions from their environment when an adverse event occurs seems necessary.

Although the information is limited (a total of 125 students were involved), the data provided by Haglund et al. (in 2009) suggest that being involved in highly stressful situations contributes to reinforcing resilience and represents an opportunity for personal growth [ 48 ]. Training to confront these stressful situations, including clinical errors, helps reduce reactive responses, although it does not guarantee maintaining the previous level of emotional well-being among students [ 21 ]. In this sense, the model proposed by Seys et al. [ 50 ], which defines 5 stages, with the first two focused on preventing second victim symptoms and ensuring self-care capacity (at the individual and team levels), could also be applied to students and, by extension, to first-year residents to enhance their capacity to cope with a second victim experience.

AEs are often attributed to professional errors, perpetuating a blame culture in healthcare [ 51 ]. Students may adopt defensive attitudes, risking patient safety. Up to 47% feel unprepared for assigned tasks [ 4 ], and 80% expect more support than they receive [ 39 ]. Emotional responses to AEs include fear, shame, anxiety, stress, loneliness, and moral distress [ 1 , 5 , 14 , 17 , 20 , 21 ]. Students face loss of psychological well-being, self-confidence, skills, and job satisfaction, as well as high hypervigilance [ 10 , 13 , 17 ]. While distress diminishes over time, the impact of mistakes may persist, especially if harm occurs [ 5 ]. Near misses can contribute positively to education by raising awareness of patient safety [ 52 ]. Simulating situations using virtual reality enhances coping abilities and indirectly improves patient safety [ 53 ].

In spite of these data, students are typically not informed about the phenomenon of second victims or how to respond in the event of making or witnessing a mistake, including during their period of training in faculties and schools [ 54 ]. They express a desire for support from their workplace and believe that preparation for these situations should commence during their university education [ 4 ]. Students attribute errors to individual causes rather than factors beyond their control (considering them as intentional rather than honest mistakes). There have been instances of successful experiences demonstrating how this information can be effectively communicated and students can be equipped to deal with these stressful situations. Notably, there are training programs aimed at enhancing disclosure skills among medical and nursing students [ 33 , 36 ]. However, the dissemination of such educational packages in faculties and schools is currently limited. This study was unable to locate research where the concepts of honest or intentional errors were shared with students.

Support interventions for second victims should provide a distinct perspective on addressing safety issues, incorporating the principles of a just culture, and offering emotional support to healthcare professionals and teams, ultimately benefiting patients. These interventions have primarily been developed and implemented within hospital settings [ 55 ]. However, comprehensive studies are lacking, and experiences within schools and faculties, as well as those extending support to students during their clinical placements, appear to be quite limited. Conversely, there exists a body of literature discussing the encounters of residents from various disciplines when they assume the role of second victims [ 38 ]. These experiences should be considered when designing support programs in schools and faculties. In fact, a recent study has described how students seem to cope with mistakes by separating the personal from the professional and seeking support from their social network [ 37 ]. SLIPPS (Shared Learning from Practice to Improve Patient Safety), a tool for collecting learning events associated with patient safety from students and other participants, is one model that could prove beneficial in acquainting students with the concept of the second victim phenomenon. Interventions in progress to support residents when they become second victims from their early training years could be extended to faculties and schools to reduce the emotional impact of witnessing or being involved in a severe clinical error [ 56 ]. However, it is essential not to forget that healthcare professionals work in multidisciplinary teams, and resilience training for high-stress situations should, to align with the reality of everyday healthcare settings, encompass the response of the entire team, not just individual team members. Moreover, to date, the cited studies have focused only on stages 1 and 2 at the individual level. However, we should not rule out the possibility that the other stages may need to be activated at any time to address students' needs.

Recently, Krogh et al. [ 37 ] summarized students' main expectations for dealing with errors in clinical practice, including more knowledge about contributing factors, strategies to tackle them, attention to learning needs, and wishes for the future healthcare system. They identified the severity of the patient's injury and the completely unexpected nature of the AE as triggers of the second victim syndrome.

Implications for trainers and health policy

Collaboration among faculty, mentors, health discipline students, and healthcare institutions is vital for promoting a learning culture that avoids blame, punishment, shame, and fear, which ultimately benefits the quality of care patients receive. This approach makes speaking up easier, allowing continuous improvement in patient safety by instilling a culture of learning from errors. Ensuring safe practices requires close cooperation between universities and healthcare institutions [ 57 ]. Several practical implications of this study are summarized in Supplementary Table 4.

Psychologically traumatizing events, such as life-threatening situations, needle-stick injuries, dramatic deaths, violent and threatening encounters, unfavorable patient evolution, resuscitations, complaints, suicidal tendencies, and harm to patients, are part of healthcare workers' daily routine. Errors occur all too frequently in the daily work of healthcare professionals; this is not just a matter for doctors and nurses but affects all healthcare workers. Ensuring their resilience in these situations should be a fundamental part of their training and one of its key educational objectives, which can be achieved through simulation exercises within the context of clinical practice. Clinical mistakes, in particular, often have a strong emotional impact on professionals, and students (the future professionals) do not appear to be receiving the training necessary to cope with the realities of clinical practice. Furthermore, during their training period, they may be affected by witnessing the consequences of AEs experienced by patients, which can significantly influence how they approach their work (e.g., defensive practices) [ 58 ] and their overall experience (e.g., detachment) [ 59 ]. There are proposals for toolkits that have proven useful [ 31 , 60 ], and the data clearly indicate that educators should not delay any further in including educational content that prepares students to deal with errors and other highly stressful situations in healthcare practice [ 52 ]. Adapting measures within the academic environment and at the healthcare facilities that host students in training programs is a task we should no longer postpone.

Future research directions

Individual differences in reactions to stress can modulate the future performance of current students and condition their capacity for resilience [ 61 ]. This aspect should be studied in more detail, alongside gender bias regarding mistakes made by men and women [ 62 ]. Students' perception of the psychological safety to speak openly with their mentors [ 63 ] is also a crucial aspect of this training phase. Additionally, their conceptualization of human fallibility [ 63 , 64 ] needs to be analyzed to identify the most appropriate educational content.

Both witnessing errors with serious consequences and being involved in them can affect students' subsequent professional development. Analyzing the impact of these incidents in order to prevent inappropriate defensive practices or dropout requires greater attention. Future studies could link these experiences to attitudes towards incident reporting and open disclosure with patients.

Limitations of the study

This review was limited to publications available in the selected databases and might be subject to publication bias. The selection of studies could have been biased by the search strategy (mitigated by using a very broad strategy) or by the databases selected (mitigated by choosing the four most relevant databases). Despite this comprehensive search strategy, relevant studies not indexed in the chosen databases may have been omitted. For three articles, the full text was not available. No language restrictions were applied to the search. On the other hand, selection bias was controlled by having the review carried out by independent reviewers, with a third reviewer resolving discrepancies. Regarding the results, the included studies exhibited considerable variability in design, interventions, and outcomes. This heterogeneity reflects the diverse educational settings and methodologies employed to address the second victim phenomenon, but it limits the generalizability of the findings. In addition, most of the studies were conducted in high-income countries, which may not reflect the experiences or interventions applicable in low- and middle-income settings.

In conclusion, students also undergo the second victim experience, which may impact their future professional careers and personal lives. Interventions aimed at training healthcare discipline students to address the emotional experience of being involved in or witnessing mistakes that harm patients during their clerkships are currently scarce and scattered, and they do not yield conclusive results on their effectiveness. Furthermore, most studies have focused on medical and nursing students, neglecting other healthcare disciplines such as pharmacy or psychology.

Despite some positive experiences, the provision of this type of training remains limited. Greater attention is needed in academic and clinical settings to identify students' needs and adequately prepare them for the psychologically traumatizing events that frequently occur when caring for complex patients.

Efforts to support students in dealing with witnessed errors and highly stressful situations in clinical practice are essential to ensure the resilience and well-being of the future generation of healthcare professionals and to safeguard patient safety.

Availability of data and materials

The authors confirm that the data supporting the conclusions of this study can be found in the article and its supplementary materials. Data regarding the quality assessment process can be obtained from the corresponding author upon reasonable request.

References

Guo L, et al. Impact of unacceptable behavior between healthcare workers on clinical performance and patient outcomes: a systematic review. BMJ Qual Saf. 2022;31(9):679–87. https://doi.org/10.1136/bmjqs-2021-013955 .


Houpy JC, Lee WW, Woodruff JN, Pincavage AT. Medical student resilience and stressful clinical events during clinical training. Med Educ Online. 2017;22(1):1320187. https://doi.org/10.1080/10872981.2017.1320187 .

Martinez W, Lo B. Medical students’ experiences with medical errors: an analysis of medical student essays. Med Educ. 2008;42(7):733–41. https://doi.org/10.1111/j.1365-2923.2008.03109.x .

Kiesewetter J, et al. German undergraduate medical students’ attitudes and needs regarding medical errors and patient safety–a national survey in Germany. Med Teach. 2014;36:505–10. https://doi.org/10.3109/0142159x.2014.891008 .

Panella M, et al. The determinants of defensive medicine in Italian hospitals: The impact of being a second victim. Rev Calid Asist. 2016;31(Suppl. 2):20–5. https://doi.org/10.1016/j.cali.2016.04.010 .

Patey R, et al. Patient safety: helping medical students understand error in healthcare. Qual Saf Healthcare. 2007;16:256–9. https://doi.org/10.1136/qshc.2006.021014 .

Strametz R, et al. Prevalence of second victims, risk factors and support strategies among young German physicians in internal medicine (SeViD-I survey). J Occup Med Toxicol. 2021;16:11. https://doi.org/10.1186/s12995-021-00300-8 .

Van Slambrouck L, et al. Second victims among baccalaureate nursing students in the aftermath of a patient safety incident: An exploratory cross-sectional study. J Prof Nurs. 2021;37:765–70. https://doi.org/10.1016/j.profnurs.2021.04.010 .

Mira JJ, et al. A Spanish-language patient safety questionnaire to measure medical and nursing students’ attitudes and knowledge. Rev Panam Salud Publica. 2015;38:110–9.


Mira JJ, et al. Lessons learned for reducing the negative impact of adverse events on patients, health professionals and healthcare organizations. Int J Qual Healthcare. 2017;29:450–60. https://doi.org/10.1093/intqhc/mzx056 .

Vanhaecht K, et al. An Evidence and Consensus-Based Definition of Second Victim: A Strategic Topic in Healthcare Quality, Patient Safety, Person-Centeredness and Human Resource Management. Int J Environ Res Public Health. 2022;19(24):16869. https://doi.org/10.3390/ijerph192416869 .

Reason J. Human error: models and management. BMJ. 2000;320(7237):768–70. https://doi.org/10.1136/bmj.320.7237.768 .

World Health Organization. Patient safety curriculum guide: multiprofessional edition. 2011. Available at: https://www.who.int/publications/i/item/9789241501958 (accessed 16 Nov 2023).

Seys D, et al. Supporting involved healthcare professionals (second victims) following an adverse health event: a literature review. Int J Nurs Stud. 2013;50:678–87. https://doi.org/10.1016/j.ijnurstu.2012.07.006 .

Marung H, et al. Second Victims among German Emergency Medical Services Physicians (SeViD-III-Study). Int J Environ Res Public Health. 2023;20:4267. https://doi.org/10.3390/ijerph20054267 .

Potura E, et al. Second Victims among Austrian Pediatricians (SeViD-A1 Study). Healthcare. 2023;11(18):2501. https://doi.org/10.3390/healthcare11182501 .

Liukka M. Action after Adverse Events in Healthcare: An Integrative Literature Review. Int J Environ Res Public Health. 2020;17:4717. https://doi.org/10.3390/ijerph17134717 .

Ratanawongsa N, Teherani A, Hauer KE. Third-year medical students' experiences with dying patients during the internal medicine clerkship: a qualitative study of the informal curriculum. Acad Med. 2005;80(7):641–7. https://doi.org/10.1097/00001888-200507000-00006 .

Ryder HF, et al. What Do I Do When Something Goes Wrong? Teaching Medical Students to Identify, Understand, and Engage in Reporting Medical Errors. Acad Med. 2019;94:1910–5. https://doi.org/10.1097/acm.0000000000002872 .

Schwappach D, et al. Speaking up culture of medical students within an academic teaching hospital: Need of faculty working in patient safety. PLoS ONE. 2019;14(9): e0222461. https://doi.org/10.1371/journal.pone.0222461 .

Kunzler AM, et al. Psychological interventions to foster resilience in healthcare students. Cochrane Database Syst Rev. 2020;7(7):CD013684. https://doi.org/10.1002/14651858.cd012527.pub2 .

Page MJ, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. https://doi.org/10.1136/bmj.n71 .

Mira JJ, et al. Health students as second victims: A systematic review of support interventions. PROSPERO. 2023. Available from: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42023442014 (accessed 16 Nov 2023).

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5:210. https://doi.org/10.1186/s13643-016-0384-4 .

Chan ST, Khong PCB, Wang W. Psychological responses, coping and supporting needs of healthcare professionals as second victims. Int Nurs Rev. 2016;64(2):242–62. https://doi.org/10.1111/inr.12317 .

Caon M, Trapp J, Baldock C. Citations are a good way to determine the quality of research. Phys Eng Sci Med. 2020;43:1145–8. https://doi.org/10.1007/s13246-020-00941-9 .

JBI. Critical appraisal tools. 2019. Available at: https://jbi.global/critical-appraisal-tools (accessed 16 Nov 2023).

Breslin A, Moorthy G, Bynum W. Confronting sentinel emotional events: An innovative seminar to build shame resilience after medical error. Ann Emerg Med. 2019;74(4):S31. https://doi.org/10.1016/j.annemergmed.2019.08.084 .

Davis M, Coviello J. Development of a medication safety science curriculum for nursing students. Nurse Educ. 2020;45(5):273–6. https://doi.org/10.1097/nne.0000000000000783 .

Le H, Bord S, Jung J. Development of an educational experience for medical students on coping with medical errors in residency and beyond. West J Emerg Med. 2022;23(4.1). Available at: https://escholarship.org/uc/item/6284s9hp .

Musunur S, et al. When Bad Things Happen: Training Medical Students to Anticipate the Aftermath of Medical Errors. Acad Psychiatry. 2020;44(5):586–95. https://doi.org/10.1007/s40596-020-01278-x .

Noland CM, Carmack HJ. Narrativizing Nursing Students’ Experiences With Medical Errors During Clinicals. Qual Health Res. 2015;25(10):1423–34. https://doi.org/10.1177/1049732314562892 .

Kim CW, Myung SJ, Eo EK, Chang Y. Improving disclosure of medical error through educational program as a first step toward patient safety. BMC Med Educ. 2017;17(1):52. https://doi.org/10.1186/s12909-017-0880-9 .

Roh H, Park SJ, Kim T. Patient safety education to change medical students’ attitudes and sense of responsibility. Med Teach. 2015;37(10):908–14. https://doi.org/10.3109/0142159x.2014.970988 .

Hanson J, et al. “Speaking up for safety”: A graded assertiveness intervention for first-year nursing students in preparation for clinical placement: Thematic analysis. Nurse Educ Today. 2020;84:104252. https://doi.org/10.1016/j.nedt.2019.104252 .

Lane AS, Roberts C. Developing open disclosure strategies to medical error using simulation in final-year medical students: linking mindset and experiential learning to lifelong reflective practice. BMJ Simul Technol Enhanc Learn. 2021;7(5):345–51. https://doi.org/10.1136/bmjstel-2020-000659 .

Krogh TB, et al. Medical students’ experiences, perceptions, and management of second victim: An interview study. BMC Med Educ. 2023;23(1):786. https://doi.org/10.21203/rs.3.rs-2753074/v1 .

Huang H, et al. Experiences and responses of nursing students as second victims of patient safety incidents in a clinical setting: A mixed-methods study. J Nurs Manag. 2020;28(6):1317–25. https://doi.org/10.1111/jonm.13085 .

Rinaldi C, et al. Healthcare Students and Medical Residents as Second Victims: A Cross-Sectional Study. Int J Environ Res Public Health. 2022;19(19):12218. https://doi.org/10.3390/ijerph191912218 .

Thomas I. Student views of stressful simulated ward rounds. Clin Teach. 2015;12(5):364–72. https://doi.org/10.1111/tct.12329 .

Gillies RA, Speers SH, Young SE, Fly CA. Teaching medical error apologies: development of a multi-component intervention. Fam Med. 2011;43(6):400–6.

Mousinho TAP, et al. Support provided to nursing students in the face of patient safety incidents: a qualitative study. Rev Bras Enferm. 2022;75(2):e20220009. https://doi.org/10.1590/0034-7167-2022-0009 .

Zieber MP, Williams B. The Experience of Nursing Students Who Make Mistakes in Clinical. Int J Nurs Educ Scholarsh. 2015;12(1):65–73. https://doi.org/10.1515/ijnes-2014-0070 .

Mohsin SU, Ibrahim Y, Levine D. Teaching medical students to recognise and report errors. BMJ Open Qual. 2019;8(2):e000558. https://doi.org/10.1136/bmjoq-2018-000558 .

Wachter RM. Understanding Patient Safety. J Healthc Qual. 2009;31(2):57–8. https://doi.org/10.1111/j.1945-1474.2009.00020_1.x .

Sahay A, McKenna L. Nurses and nursing students as second victims: A scoping review. Nurs Outlook. 2023;71(4):101992. https://doi.org/10.1016/j.outlook.2023.101992 .

Bynum WE 4th, Uijtdehaage S, Artino AR Jr, Fox JW. The psychology of shame: A resilience seminar for medical students. MedEdPORTAL. 2020;16:11052. https://doi.org/10.15766/mep_2374-8265.11052 .

Haglund ME, et al. Resilience in the third year of medical school: a prospective study of the associations between stressful events occurring during clinical rotations and student well-being. Acad Med. 2009;84(2):258–68. https://doi.org/10.1097/acm.0b013e31819381b1 .

Lee HY, Hahm MI, Lee SG. Undergraduate medical students’ perceptions and intentions regarding patient safety during clinical clerkship. BMC Med Educ. 2018;18(1):66. https://doi.org/10.1186/s12909-018-1180-8 .

Seys D, et al. In search of an international multidimensional action plan for second victim support: a narrative review. BMC Health Serv Res. 2023;23(1):816. https://doi.org/10.1186/s12913-023-09637-8 .

Stevanin S, et al. Adverse events witnessed by nursing students during clinical learning experiences: Findings from a longitudinal study. Nurs Health Sci. 2018;20:438–44. https://doi.org/10.1111/nhs.12430 .

Kiesewetter I, Konings KD, Kager M, Kiesewetter J. Undergraduate medical students’ behavioral intentions toward medical errors and how to handle them: a qualitative vignette study. BMJ Open. 2018;8(3):e019500. https://doi.org/10.1136/bmjopen-2017-019500 .

Peddle M. Participant perceptions of virtual simulation to develop non-technical skills in health professionals. J Res Nurs. 2019;24(3–4):167–80. https://doi.org/10.1177/1744987119835873 .

Sánchez-García A, et al. Patient safety topics, especially the second victim phenomenon, are neglected in undergraduate medical and nursing curricula in Europe: an online observational study. BMC Nurs. 2023;22(1):283. https://doi.org/10.1186/s12912-023-01448-w .

Busch IM, Moretti F, Purgato M, Barbui C, Wu AW, Rimondini M. Promoting the Psychological Well-Being of Healthcare Providers Facing the Burden of Adverse Events: A Systematic Review of Second Victim Support Resources. Int J Environ Res Public Health. 2021;18(10):5080. https://doi.org/10.3390/ijerph18105080 .

Steven A, et al. Development of an International Tool for Students to Record and Reflect on Patient Safety Learning Experiences. Nurse Educ. 2022;47(3):E62–7. https://doi.org/10.1097/nne.0000000000001142 .

Gradišnik M, Fekonja Z, Vrbnjak D. Nursing students’ handling patient safety incidents during clinical practice: A retrospective qualitative study. Nurse Educ Today. 2024;132: 105993. https://doi.org/10.1016/j.nedt.2023.105993 .

Panella M, Rinaldi C, Leigheb F, Knesse S, Donnarumma C, Kul S, et al. Prevalence and costs of defensive medicine: a national survey of Italian physicians. J Health Serv Res Policy. 2017;22(4):211–7. https://doi.org/10.1177/1355819617707224 .

Mira JJ, Carrillo I, Lorenzo S, Ferrús L, Silvestre C, Pérez-Pérez P, et al. The aftermath of adverse events in Spanish primary care and hospital health professionals. BMC Health Serv Res. 2015;15(1):151. https://doi.org/10.1186/s12913-015-0790-7 .

Chung AS, Smart J, Zdradzinski M, et al. Educator Toolkits on Second Victim Syndrome, Mindfulness and Meditation, and Positive Psychology: The 2017 Resident Wellness Consensus Summit. West J Emerg Med. 2018;19(2):327–31. https://doi.org/10.5811/cpcem.2017.11.36179 .

Asensi-Vicente J, Jiménez-Ruiz I, Vizcaya-Moreno MF. Medication Errors Involving Nursing Students. Nurse Educ. 2018;43:E1–5. https://doi.org/10.1097/nne.0000000000000481 .

Mankaka CO, Waeber G, Gachoud D. Female residents experiencing medical errors in general internal medicine: a qualitative study. BMC Med Educ. 2014;14:140. https://doi.org/10.1186/1472-6920-14-140 .

Appelbaum NP, Dow A, Mazmanian PE, Jundt DK, Appelbaum EN. The effects of power, leadership and psychological safety on resident event reporting. Med Educ. 2016;50:343–50. https://doi.org/10.1111/medu.12947 .

Christian H, Johannes B, Helmut H. Research on Human Fallibility and Learning from Errors at Work: Challenges for Theory, Research, and Practice. Human Fallibility. 2012;6:255–65. https://doi.org/10.1007/978-90-481-3941-5_15 .


Acknowledgements

Not applicable.

Funding

This publication is based upon work from COST Action 19113, supported by COST (European Cooperation in Science and Technology), www.cost.eu .

Author information

Authors and affiliations

Atenea Research. FISABIO, Alicante, Spain

José Joaquín Mira & Valerie Matarredona

Universidad Miguel Hernández, Elche, Spain

José Joaquín Mira & Adriana López-Pineda

Faculty of Health and Social Care, LAB University of Applied Sciences, Lappeenranta, Finland

Susanna Tella

NOVA National School of Public Health, Public Health Research Centre, Comprehensive Health Research Center, CHRC, NOVA University Lisbon, Lisbon, Portugal

Paulo Sousa

Escola Paulista de Enfermagem, Universidade Federal de São Paulo, São Paulo, Brasil

Vanessa Ribeiro Neves

Wiesbaden Institute for Healthcare Economics and Patient Safety (WiHelP), RheinMain UAS, Wiesbaden, Germany

Reinhard Strametz


Contributions

Study design: JJM, ALP, ST, VRN, RS, PS. Data collection: JJM, ALP, VM. Data analysis: JJM, ALP, VM. Study supervision: JJM. Manuscript writing: JJM, ALP, VM. Critical revisions for important intellectual content: ST, VRN, RS, PS.

Corresponding author

Correspondence to José Joaquín Mira .

Ethics declarations

Ethics approval and consent to participate

As this systematic review does not use individual data, it is exempt from ethics committee approval.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Mira, J.J., Matarredona, V., Tella, S. et al. Unveiling the hidden struggle of healthcare students as second victims through a systematic review. BMC Med Educ 24 , 378 (2024). https://doi.org/10.1186/s12909-024-05336-y


Received : 25 January 2024

Accepted : 21 March 2024

Published : 08 April 2024

DOI : https://doi.org/10.1186/s12909-024-05336-y


Keywords

  • Adverse events
  • Patient safety
  • Second victims

