CASP Checklists

Critical Appraisal Checklists

We offer a set of free, downloadable checklists to help you perform critical appraisal more easily and accurately across a range of different study types.

The CASP checklists are designed to be easy to understand, but if you need further guidance on how they are structured, take a look at our guide on how to use the CASP checklists.

CASP Checklist: Systematic Reviews with Meta-Analysis of Observational Studies

CASP Checklist: Systematic Reviews with Meta-Analysis of Randomised Controlled Trials (RCTs)

CASP Randomised Controlled Trial Checklist


CASP Systematic Review Checklist

CASP Qualitative Studies Checklist

CASP Cohort Study Checklist

CASP Diagnostic Study Checklist

CASP Case Control Study Checklist

CASP Economic Evaluation Checklist

CASP Clinical Prediction Rule Checklist

Checklist Archive

  • CASP Randomised Controlled Trial Checklist 2018 fillable form
  • CASP Randomised Controlled Trial Checklist 2018


Methodological quality of case series studies: an introduction to the JBI critical appraisal tool

Affiliations.

  • 1 JBI, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia.
  • 2 The George Institute for Global Health, Telangana, India.
  • 3 Australian Institute of Health Innovation, Faculty of Medicine and Health Sciences, Sydney, NSW, Australia.
  • PMID: 33038125
  • DOI: 10.11124/JBISRIR-D-19-00099

Introduction: Systematic reviews provide a rigorous synthesis of the best available evidence regarding a certain question. Where high-quality evidence is lacking, systematic reviewers may choose to rely on case series studies to provide information in relation to their question. However, to date there has been limited guidance on how to incorporate case series studies within systematic reviews assessing the effectiveness of an intervention, particularly with reference to assessing the methodological quality or risk of bias of these studies.

Methods: An international working group was formed to review the methodological literature regarding case series as a form of evidence for inclusion in systematic reviews. The group then developed a critical appraisal tool based on the epidemiological literature relating to bias within these studies. This was then piloted, reviewed, and approved by JBI's international Scientific Committee.

Results: The JBI critical appraisal tool for case series studies includes 10 questions addressing the internal validity and risk of bias of case series designs, particularly confounding, selection, and information bias, in addition to the importance of clear reporting.

Conclusion: In certain situations, case series designs may represent the best available evidence to inform clinical practice. The JBI critical appraisal tool for case series offers systematic reviewers an approved method to assess the methodological quality of these studies.


Can Commun Dis Rep. 2017 Sep 7;43(9).


Scientific writing

Critical Appraisal Toolkit (CAT) for assessing multiple types of evidence

1 Memorial University School of Nursing, St. John’s, NL

2 Centre for Communicable Diseases and Infection Control, Public Health Agency of Canada, Ottawa, ON

Contributor: Jennifer Kruse, Public Health Agency of Canada – Conceptualization and project administration

Healthcare professionals are often expected to critically appraise research evidence in order to make recommendations for practice and policy development. Here we describe the Critical Appraisal Toolkit (CAT) currently used by the Public Health Agency of Canada. The CAT consists of: algorithms to identify the type of study design, three separate tools (for appraisal of analytic studies, descriptive studies and literature reviews), additional tools to support the appraisal process, and guidance for summarizing evidence and drawing conclusions about a body of evidence. Although the toolkit was created to assist in the development of national guidelines related to infection prevention and control, clinicians, policy makers and students can use it to guide appraisal of any health-related quantitative research. Participants in a pilot test completed a total of 101 critical appraisals and found that the CAT was user-friendly and helpful in the process of critical appraisal. Feedback from participants of the pilot test of the CAT informed further revisions prior to its release. The CAT adds to the arsenal of available tools and can be especially useful when the best available evidence comes from non-clinical trials and/or studies with weak designs, where other tools may not be easily applied.

Introduction

Healthcare professionals, researchers and policy makers are often involved in the development of public health policies or guidelines. The most valuable guidelines provide a basis for evidence-based practice with recommendations informed by current, high quality, peer-reviewed scientific evidence. To develop such guidelines, the available evidence needs to be critically appraised so that recommendations are based on the "best" evidence. The ability to critically appraise research is, therefore, an essential skill for health professionals serving on policy or guideline development working groups.

Our experience with working groups developing infection prevention and control guidelines was that the review of relevant evidence went smoothly while the critical appraisal of the evidence posed multiple challenges. Three main issues were identified. First, although working group members had strong expertise in infection prevention and control or other areas relevant to the guideline topic, they had varying levels of expertise in research methods and critical appraisal. Second, the critical appraisal tools in use at that time focused largely on analytic studies (such as clinical trials), and lacked definitions of key terms and explanations of the criteria used in the studies. As a result, the use of these tools by working group members did not result in a consistent way of appraising analytic studies nor did the tools provide a means of assessing descriptive studies and literature reviews. Third, working group members wanted guidance on how to progress from assessing individual studies to summarizing and assessing a body of evidence.

To address these issues, a review of existing critical appraisal tools was conducted. We found that the majority of existing tools were design-specific, with considerable variability in intent, criteria appraised and construction of the tools. A systematic review reported that fewer than half of existing tools had guidelines for use of the tool and interpretation of the items (1). The well-known Grading of Recommendations Assessment, Development and Evaluation (GRADE) rating-of-evidence system and the Cochrane tools for assessing risk of bias were considered for use (2,3). At that time, the guidelines for using these tools were limited, and the tools were focused primarily on randomized controlled trials (RCTs) and non-randomized controlled trials. For feasibility and ethical reasons, clinical trials are rarely available for many common infection prevention and control issues (4,5). For example, there are no intervention studies assessing which practice restrictions, if any, should be placed on healthcare workers who are infected with a blood-borne pathogen. Working group members were concerned that if they used GRADE, all evidence would be rated as very low or as low quality or certainty, and recommendations based on this evidence may be interpreted as unconvincing, even if they were based on the best or only available evidence.

The team therefore decided to develop its own critical appraisal toolkit. A small working group was convened, led by an epidemiologist with expertise in research methodology and critical appraisal, with the goal of developing tools to critically appraise studies informing infection prevention and control recommendations. This article provides an overview of the Critical Appraisal Toolkit (CAT). The full document, entitled Infection Prevention and Control Guidelines Critical Appraisal Tool Kit, is available online (6).

Following a review of existing critical appraisal tools, studies informing infection prevention and control guidelines that were in development were reviewed to identify the types of studies that would need to be appraised using the CAT. A preliminary draft of the CAT was used by various guideline development working groups, and iterative revisions were made over a two-year period. A pilot test of the CAT was then conducted, which led to the final version (6).

The toolkit is set up to guide reviewers through three major phases in the critical appraisal of a body of evidence: appraisal of individual studies; summarizing the results of the appraisals; and appraisal of the body of evidence.

Tools for critically appraising individual studies

The first step in the critical appraisal of an individual study is to identify the study design; this can be surprisingly problematic, since many published research studies are complex. An algorithm was developed to help identify whether a study was an analytic study, a descriptive study or a literature review (see text box for definitions). It is critical to establish the design of the study first, as the criteria for assessment differ depending on the type of study.

Definitions of the types of studies that can be analyzed with the Critical Appraisal Toolkit*

Analytic study: A study designed to identify or measure effects of specific exposures, interventions or risk factors. This design employs the use of an appropriate comparison group to test epidemiologic hypotheses, thus attempting to identify associations or causal relationships.

Descriptive study: A study that describes characteristics of a condition in relation to particular factors or exposure of interest. This design often provides the first important clues about possible determinants of disease and is useful for the formulation of hypotheses that can be subsequently tested using an analytic design.

Literature review: A study that analyzes critical points of a published body of knowledge. This is done through summary, classification and comparison of prior studies. With the exception of meta-analyses, which statistically re-analyze pooled data from several studies, these studies are secondary sources and do not report any new or experimental work.

* Public Health Agency of Canada. Infection Prevention and Control Guidelines Critical Appraisal Tool Kit (6)

Separate algorithms were developed for analytic studies, descriptive studies and literature reviews to help reviewers identify specific designs within those categories. The algorithm below, for example, helps reviewers determine which study design was used within the analytic study category (Figure 1). It is based on key decision points such as number of groups or allocation to group. The legends for the algorithms and supportive tools such as the glossary provide additional detail to further differentiate study designs, such as whether a cohort study was retrospective or prospective.

Figure 1. Algorithm for identifying analytic study designs (image not reproduced here).

Abbreviations: CBA, controlled before-after; ITS, interrupted time series; NRCT, non-randomized controlled trial; RCT, randomized controlled trial; UCBA, uncontrolled before-after
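To illustrate the kind of decision logic such an algorithm encodes, the following is a minimal, hypothetical Python sketch based only on the decision points mentioned above (comparison group, allocation by chance, before/after measurement). It is not the published CAT algorithm; the inputs and labels are illustrative assumptions.

```python
# Hypothetical sketch of the decision logic for classifying analytic study designs.
# NOT the published CAT algorithm: the inputs and labels are illustrative assumptions
# based on the decision points described in the text above.

def classify_analytic_design(has_comparison_group: bool,
                             allocated_by_chance: bool,
                             has_control_group: bool,
                             before_after_measurement: bool) -> str:
    """Return a rough analytic study design label from key decision points."""
    if allocated_by_chance:
        return "RCT (randomized controlled trial)"
    if before_after_measurement:
        return ("CBA (controlled before-after)" if has_control_group
                else "UCBA (uncontrolled before-after)")
    if has_control_group:
        return "NRCT (non-randomized controlled trial)"
    if has_comparison_group:
        return "Observational analytic design (e.g., cohort or case-control)"
    return "Not an analytic design - use the descriptive-study algorithm"


# Example: two groups, not allocated by chance, outcomes measured before and after
print(classify_analytic_design(True, False, True, True))  # -> CBA (controlled before-after)
```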

Separate critical appraisal tools were developed for analytic studies, for descriptive studies and for literature reviews, with relevant criteria in each tool. For example, a summary of the items covered in the analytic study critical appraisal tool is shown in Table 1. This tool is used to appraise trials, observational studies and laboratory-based experiments. A supportive tool for assessing statistical analysis was also provided that describes common statistical tests used in epidemiologic studies.

The descriptive study critical appraisal tool assesses different aspects of sampling, data collection, statistical analysis, and ethical conduct. It is used to appraise cross-sectional studies, outbreak investigations, case series and case reports.

The literature review critical appraisal tool assesses the methodology, results and applicability of narrative reviews, systematic reviews and meta-analyses.

After appraisal of individual items in each type of study, each critical appraisal tool also contains instructions for drawing a conclusion about the overall quality of the evidence from a study, based on the per-item appraisal. Quality is rated as high, medium or low. While an RCT is a strong study design and a survey is a weak design, it is possible to have a poor quality RCT or a high quality survey. As a result, the quality of evidence from a study is distinguished from the strength of a study design when assessing the quality of the overall body of evidence. Definitions of some terms used to evaluate evidence in the CAT are shown in Table 2.

* Considered a strong design if there are at least two control groups and two intervention groups; considered a moderate design if there is only one control group and one intervention group.

Tools for summarizing the evidence

The second phase in the critical appraisal process involves summarizing the results of the critical appraisal of individual studies. Reviewers are instructed to complete a template evidence summary table, with key details about each study and its ratings. Studies are listed in descending order of strength in the table. The table simplifies looking across all studies that make up the body of evidence informing a recommendation and allows for easy comparison of participants, sample size, methods, interventions, magnitude and consistency of results, outcome measures and individual study quality as determined by the critical appraisal. These evidence summary tables are reviewed by the working group to determine the rating for the quality of the overall body of evidence and to facilitate development of recommendations based on evidence.

Rating the quality of the overall body of evidence

The third phase in the critical appraisal process is rating the quality of the overall body of evidence. The overall rating depends on the five items summarized in Table 2: strength of study designs, quality of studies, number of studies, consistency of results and directness of the evidence. The various combinations of these factors lead to an overall rating of the strength of the body of evidence as strong, moderate or weak as summarized in Table 3.

A unique aspect of this toolkit is that recommendations are not graded but are formulated based on the graded body of evidence. Actions are either recommended or not recommended; it is the strength of the available evidence that varies, not the strength of the recommendation. The toolkit does highlight, however, the need to re-evaluate new evidence as it becomes available especially when recommendations are based on weak evidence.

Pilot test of the CAT

Of 34 individuals who indicated an interest in completing the pilot test, 17 completed it. Multiple peer-reviewed studies were selected representing analytic studies, descriptive studies and literature reviews. The same studies were assigned to participants with similar content expertise. Each participant was asked to appraise three analytic studies, two descriptive studies and one literature review, using the appropriate critical appraisal tool as identified by the participant. For each study appraised, one critical appraisal tool and the associated tool-specific feedback form were completed. Each participant also completed a single general feedback form. A total of 101 of 102 critical appraisals were conducted and returned, with 81 tool-specific feedback forms and 14 general feedback forms returned.

The majority of participants (>85%) found the flow of each tool was logical and the length acceptable but noted they still had difficulty identifying the study designs (Table 4).

* Number of tool-specific forms returned for total number of critical appraisals conducted

The vast majority of the feedback forms (86–93%) indicated that the different tools facilitated the critical appraisal process. In the assessment of consistency, however, only four of the ten analytic studies appraised (40%) had complete agreement among participants on the rating of overall study quality; the other six studies had differences, noted as mismatches. Four of the six studies with mismatches were observational studies. The differences were minor: none of the mismatches involved a study rated as high quality by one participant and low quality by another. Based on the comments provided by participants, most mismatches could likely have been resolved through discussion with peers. Mismatched ratings were not an issue for the descriptive studies and literature reviews. In summary, the pilot test provided useful feedback on different aspects of the toolkit. Revisions were made to address the issues identified from the pilot test and thus strengthen the CAT.

The Infection Prevention and Control Guidelines Critical Appraisal Tool Kit was developed in response to the needs of infection control professionals reviewing literature that generally did not include clinical trial evidence. The toolkit was designed to meet the identified needs for training in critical appraisal with extensive instructions and dictionaries, and tools applicable to all three types of studies (analytic studies, descriptive studies and literature reviews). The toolkit provided a method to progress from assessing individual studies to summarizing and assessing the strength of a body of evidence and assigning a grade. Recommendations are then developed based on the graded body of evidence. This grading system has been used by the Public Health Agency of Canada in the development of recent infection prevention and control guidelines (5,7). The toolkit has also been used for conducting critical appraisal for other purposes, such as addressing a practice problem and serving as an educational tool (8,9).

The CAT has a number of strengths. It is applicable to a wide variety of study designs. The criteria that are assessed allow for a comprehensive appraisal of individual studies and facilitate critical appraisal of a body of evidence. The dictionaries provide reviewers with a common language and criteria for discussion and decision making.

The CAT also has a number of limitations. The tools do not address all study designs (e.g., modelling studies) and the toolkit provides limited information on types of bias. Like the majority of critical appraisal tools (10,11), these tools have not been tested for validity and reliability. Nonetheless, the criteria assessed are those indicated as important in textbooks and in the literature (12,13). The grading scale used in this toolkit does not allow for comparison of evidence grading across organizations or internationally, but most reviewers do not need such comparability. It is more important that strong evidence be rated higher than weak evidence, and that reviewers provide rationales for their conclusions; the toolkit enables them to do so.

Overall, the pilot test reinforced that the CAT can help with critical appraisal training and can increase comfort levels for those with limited experience. Further evaluation of the toolkit could assess the effectiveness of revisions made and test its validity and reliability.

A frequent question regarding this toolkit is how it differs from GRADE as both distinguish stronger evidence from weaker evidence and use similar concepts and terminology. The main differences between GRADE and the CAT are presented in Table 5. Key differences include the focus of the CAT on rating the quality of individual studies, and the detailed instructions and supporting tools that assist those with limited experience in critical appraisal. When clinical trials and well controlled intervention studies are or become available, GRADE and related tools from Cochrane would be more appropriate (2,3). When descriptive studies are all that is available, the CAT is very useful.

Abbreviation: GRADE, Grading of Recommendations Assessment, Development and Evaluation

The Infection Prevention and Control Guidelines Critical Appraisal Tool Kit was developed in response to the need for training in critical appraisal, for tools to assess evidence from a wide variety of research designs, and for a method of progressing from assessing individual studies to characterizing the strength of a body of evidence. Clinician researchers, policy makers and students can use these tools for critical appraisal of studies whether they are trying to develop policies, find a potential solution to a practice problem or critique an article for a journal club. The toolkit adds to the arsenal of critical appraisal tools currently available and is especially useful when the best available evidence comes from studies with weaker designs.

Authors’ Statement

DM – Conceptualization, methodology, investigation, data collection and curation and writing – original draft, review and editing

TO – Conceptualization, methodology, investigation, data collection and curation and writing – original draft, review and editing

KD – Conceptualization, review and editing, supervision and project administration

Acknowledgements

We thank the Infection Prevention and Control Expert Working Group of the Public Health Agency of Canada for feedback on the development of the toolkit, Lisa Marie Wasmund for data entry of the pilot test results, Katherine Defalco for review of data and cross-editing of content and technical terminology for the French version of the toolkit, Laurie O’Neil for review and feedback on early versions of the toolkit, Frédéric Bergeron for technical support with the algorithms in the toolkit and the Centre for Communicable Diseases and Infection Control of the Public Health Agency of Canada for review, feedback and ongoing use of the toolkit. We thank Dr. Patricia Huston, Canada Communicable Disease Report Editor-in-Chief, for a thorough review and constructive feedback on the draft manuscript.

Conflict of interest: None.

Funding: This work was supported by the Public Health Agency of Canada.


Systematic Reviews: Critical Appraisal by Study Design

Tools for Critical Appraisal of Studies


“The purpose of critical appraisal is to determine the scientific merit of a research report and its applicability to clinical decision making.” 1 Conducting a critical appraisal of a study is imperative to any well-executed evidence review, but the process can be time consuming and difficult. 2 For the critical appraisal process, “a methodological approach coupled with the right tools and skills to match these methods is essential for finding meaningful results.” 3 In short, critical appraisal is a method of differentiating good research from bad research.

Critical Appraisal by Study Design (featured tools)

  • AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews): The original AMSTAR was developed to assess the risk of bias in systematic reviews that included only randomized controlled trials. AMSTAR 2 was published in 2017 and allows researchers to “identify high quality systematic reviews, including those based on non-randomised studies of healthcare interventions.” 4
  • ROBIS (Risk of Bias in Systematic Reviews): ROBIS is a tool designed specifically to assess the risk of bias in systematic reviews. “The tool is completed in three phases: (1) assess relevance (optional), (2) identify concerns with the review process, and (3) judge risk of bias in the review. Signaling questions are included to help assess specific concerns about potential biases with the review.” 5
  • BMJ Framework for Assessing Systematic Reviews: This framework provides a checklist that is used to evaluate the quality of a systematic review.
  • CASP (Critical Appraisal Skills Programme) Checklist for Systematic Reviews: This CASP checklist is not a scoring system, but rather a method of appraising systematic reviews by considering: 1. Are the results of the study valid? 2. What are the results? 3. Will the results help locally?
  • CEBM (Centre for Evidence-Based Medicine) Systematic Reviews Critical Appraisal Sheet: The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • JBI Critical Appraisal Tools, Checklist for Systematic Reviews: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct and analysis.
  • NHLBI (National Heart, Lung, and Blood Institute) Study Quality Assessment of Systematic Reviews and Meta-Analyses: The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study.
  • RoB 2 (revised tool to assess Risk of Bias in randomized trials): RoB 2 “provides a framework for assessing the risk of bias in a single estimate of an intervention effect reported from a randomized trial,” rather than the entire trial. 6
  • CASP Randomised Controlled Trials Checklist: This CASP checklist considers various aspects of an RCT that require critical appraisal: 1. Is the basic study design valid for a randomized controlled trial? 2. Was the study methodologically sound? 3. What are the results? 4. Will the results help locally?
  • CONSORT (Consolidated Standards of Reporting Trials) Statement: The CONSORT checklist includes 25 items to determine the quality of randomized controlled trials. “Critical appraisal of the quality of clinical trials is possible only if the design, conduct, and analysis of RCTs are thoroughly and accurately described in the report.” 7
  • NHLBI Study Quality Assessment of Controlled Intervention Studies: The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study.
  • JBI Critical Appraisal Tools, Checklist for Randomized Controlled Trials: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct and analysis.
  • ROBINS-I (Risk Of Bias in Non-randomized Studies – of Interventions): ROBINS-I is a “tool for evaluating risk of bias in estimates of the comparative effectiveness… of interventions from studies that did not use randomization to allocate units… to comparison groups.” 8
  • NOS (Newcastle-Ottawa Scale): This tool is used primarily to evaluate and appraise case-control or cohort studies.
  • AXIS (Appraisal tool for Cross-Sectional Studies): Cross-sectional studies are frequently used as an evidence base for diagnostic testing, risk factors for disease, and prevalence studies. “The AXIS tool focuses mainly on the presented [study] methods and results.” 9
  • NHLBI Study Quality Assessment Tools for Non-Randomized Studies: The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study. They include the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies, the Quality Assessment of Case-Control Studies, the Quality Assessment Tool for Before-After (Pre-Post) Studies With No Control Group, and the Quality Assessment Tool for Case Series Studies.
  • Case Series Studies Quality Appraisal Checklist: Developed by the Institute of Health Economics (Canada), the checklist comprises 20 questions to assess “the robustness of the evidence of uncontrolled, [case series] studies.” 10
  • Methodological Quality and Synthesis of Case Series and Case Reports: In this paper, Dr. Murad and colleagues “present a framework for appraisal, synthesis and application of evidence derived from case reports and case series.” 11
  • MINORS (Methodological Index for Non-Randomized Studies): The MINORS instrument contains 12 items and was developed for evaluating the quality of observational or non-randomized studies. 12 This tool may be of particular interest to researchers who would like to critically appraise surgical studies.
  • JBI Critical Appraisal Tools for Non-Randomized Trials: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct and analysis. They include checklists for analytical cross sectional studies, case control studies, case reports, case series and cohort studies.
  • QUADAS-2 (a revised tool for the Quality Assessment of Diagnostic Accuracy Studies): The QUADAS-2 tool “is designed to assess the quality of primary diagnostic accuracy studies… [it] consists of 4 key domains that discuss patient selection, index test, reference standard, and flow of patients through the study and timing of the index tests and reference standard.” 13
  • JBI Critical Appraisal Tools, Checklist for Diagnostic Test Accuracy Studies: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct and analysis.
  • STARD 2015 (Standards for the Reporting of Diagnostic Accuracy Studies): The authors of the standards note that “[e]ssential elements of [diagnostic accuracy] study methods are often poorly described and sometimes completely omitted, making both critical appraisal and replication difficult, if not impossible.” 10 The Standards for the Reporting of Diagnostic Accuracy Studies was developed “to help… improve completeness and transparency in reporting of diagnostic accuracy studies.” 14
  • CASP Diagnostic Study Checklist: This CASP checklist considers various aspects of diagnostic test studies, including: 1. Are the results of the study valid? 2. What were the results? 3. Will the results help locally?
  • CEBM Diagnostic Critical Appraisal Sheet: The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • SYRCLE’s RoB (SYstematic Review Center for Laboratory animal Experimentation’s Risk of Bias): “[I]mplementation of [SYRCLE’s RoB tool] will facilitate and improve critical appraisal of evidence from animal studies. This may… enhance the efficiency of translating animal research into clinical practice and increase awareness of the necessity of improving the methodological quality of animal studies.” 15
  • ARRIVE 2.0 (Animal Research: Reporting of In Vivo Experiments): “The [ARRIVE 2.0] guidelines are a checklist of information to include in a manuscript to ensure that publications [on in vivo animal studies] contain enough information to add to the knowledge base.” 16
  • Critical Appraisal of Studies Using Laboratory Animal Models: This article provides “an approach to critically appraising papers based on the results of laboratory animal experiments,” and discusses various “bias domains” in the literature that critical appraisal can identify. 17
  • CEBM Critical Appraisal of Qualitative Studies Sheet: The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance and applicability of clinical evidence.
  • CASP Qualitative Studies Checklist: This CASP checklist considers various aspects of qualitative research studies, including: 1. Are the results of the study valid? 2. What were the results? 3. Will the results help locally?
  • Quality Assessment and Risk of Bias Tool Repository: Created by librarians at Duke University, this extensive listing contains over 100 commonly used risk of bias tools that may be sorted by study type.
  • Latitudes Network: A library of risk of bias tools for use in evidence syntheses that provides selection help and training videos.

References & Recommended Reading

1. Kolaski K, Logan LR, Ioannidis JP. Guidance to best tools and practices for systematic reviews. British Journal of Pharmacology. 2024;181(1):180-210.

2. Portney LG. Foundations of clinical research: applications to evidence-based practice. 4th ed. Philadelphia: F.A. Davis; 2020.

3. Fowkes FG, Fulton PM. Critical appraisal of published research: introductory guidelines. BMJ (Clinical research ed). 1991;302(6785):1136-1140.

4. Singh S. Critical appraisal skills programme. Journal of Pharmacology and Pharmacotherapeutics. 2013;4(1):76-77.

5. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Clinical research ed). 2017;358:j4008.

6. Whiting P, Savovic J, Higgins JPT, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. Journal of clinical epidemiology. 2016;69:225-234.

7. Sterne JAC, Savovic J, Page MJ, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ (Clinical research ed). 2019;366:l4898.

8. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and Elaboration: Updated guidelines for reporting parallel group randomised trials. Journal of clinical epidemiology. 2010;63(8):e1-37.

9. Sterne JA, Hernan MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ (Clinical research ed). 2016;355:i4919.

10. Downes MJ, Brennan ML, Williams HC, Dean RS. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ open. 2016;6(12):e011458.

11. Guo B, Moga C, Harstall C, Schopflocher D. A principal component analysis is conducted for a case series quality appraisal checklist. Journal of clinical epidemiology. 2016;69:199-207.e192.

12. Murad MH, Sultan S, Haffar S, Bazerbachi F. Methodological quality and synthesis of case series and case reports. BMJ evidence-based medicine. 2018;23(2):60-63.

13. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (MINORS): development and validation of a new instrument. ANZ journal of surgery. 2003;73(9):712-716.

14. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Annals of internal medicine. 2011;155(8):529-536.

15. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ (Clinical research ed). 2015;351:h5527.

16. Hooijmans CR, Rovers MM, de Vries RBM, Leenaars M, Ritskes-Hoitinga M, Langendam MW. SYRCLE's risk of bias tool for animal studies. BMC medical research methodology. 2014;14:43.

17. Percie du Sert N, Ahluwalia A, Alam S, et al. Reporting animal research: Explanation and elaboration for the ARRIVE guidelines 2.0. PLoS biology. 2020;18(7):e3000411.

18. O'Connor AM, Sargeant JM. Critical appraisal of studies using laboratory animal models. ILAR journal. 2014;55(3):405-417.



Nuffield Department of Primary Care Health Sciences, University of Oxford

Critical Appraisal tools

Critical appraisal worksheets to help you appraise the reliability, importance and applicability of clinical evidence.

Critical appraisal is the systematic evaluation of clinical research papers in order to establish:

  • Does this study address a clearly focused question?
  • Did the study use valid methods to address this question?
  • Are the valid results of this study important?
  • Are these valid, important results applicable to my patient or population?

If the answer to any of these questions is “no”, you can save yourself the trouble of reading the rest of it.

This section contains useful tools and downloads for the critical appraisal of different types of medical evidence. Example appraisal sheets are provided together with several helpful examples.

Critical Appraisal Worksheets

  • Systematic Reviews  Critical Appraisal Sheet
  • Diagnostics  Critical Appraisal Sheet
  • Prognosis  Critical Appraisal Sheet
  • Randomised Controlled Trials  (RCT) Critical Appraisal Sheet
  • Critical Appraisal of Qualitative Studies  Sheet
  • IPD Review  Sheet

Chinese - translated by Chung-Han Yang and Shih-Chieh Shao

  • Systematic Reviews  Critical Appraisal Sheet
  • Diagnostic Study  Critical Appraisal Sheet
  • Prognostic Critical Appraisal Sheet
  • RCT  Critical Appraisal Sheet
  • IPD reviews Critical Appraisal Sheet
  • Qualitative Studies Critical Appraisal Sheet 

German - translated by Johannes Pohl and Martin Sadilek

  • Systematic Review  Critical Appraisal Sheet
  • Diagnosis Critical Appraisal Sheet
  • Prognosis Critical Appraisal Sheet
  • Therapy / RCT Critical Appraisal Sheet

Lithuanian - translated by Tumas Beinortas

  • Systematic review appraisal Lithuanian (PDF)
  • Diagnostic accuracy appraisal Lithuanian  (PDF)
  • Prognostic study appraisal Lithuanian  (PDF)
  • RCT appraisal sheets Lithuanian  (PDF)

Portuguese - translated by Enderson Miranda, Rachel Riera and Luis Eduardo Fontes

  • Portuguese – Systematic Review Study Appraisal Worksheet
  • Portuguese – Diagnostic Study Appraisal Worksheet
  • Portuguese – Prognostic Study Appraisal Worksheet
  • Portuguese – RCT Study Appraisal Worksheet
  • Portuguese – Systematic Review Evaluation of Individual Participant Data Worksheet
  • Portuguese – Qualitative Studies Evaluation Worksheet

Spanish - translated by Ana Cristina Castro

  • Systematic Review  (PDF)
  • Diagnosis  (PDF)
  • Prognosis  Spanish Translation (PDF)
  • Therapy / RCT  Spanish Translation (PDF)

Persian - translated by Ahmad Sofi Mahmudi

  • Prognosis  (PDF)
  • PICO  Critical Appraisal Sheet (PDF)
  • PICO Critical Appraisal Sheet (MS-Word)
  • Educational Prescription  Critical Appraisal Sheet (PDF)

Explanations & Examples

  • Pre-test probability
  • SpPin and SnNout
  • Likelihood Ratios



Study Quality Assessment Tools

In 2013, NHLBI developed a set of tailored quality assessment tools to assist reviewers in focusing on concepts that are key to a study’s internal validity. The tools were specific to certain study designs and tested for potential flaws in study methods or implementation. Experts used the tools during the systematic evidence review process to update existing clinical guidelines, such as those on cholesterol, blood pressure, and obesity. Their findings are outlined in the following reports:

  • Assessing Cardiovascular Risk: Systematic Evidence Review from the Risk Assessment Work Group
  • Management of Blood Cholesterol in Adults: Systematic Evidence Review from the Cholesterol Expert Panel
  • Management of Blood Pressure in Adults: Systematic Evidence Review from the Blood Pressure Expert Panel
  • Managing Overweight and Obesity in Adults: Systematic Evidence Review from the Obesity Expert Panel

While these tools have not been independently published and would not be considered standardized, they may be useful to the research community. These reports describe how experts used the tools for the project. Researchers may want to use the tools for their own projects; however, they would need to determine their own parameters for making judgements. Details about the design and application of the tools are included in Appendix A of the reports.

Quality Assessment of Controlled Intervention Studies

*CD, cannot determine; NA, not applicable; NR, not reported

Guidance for Assessing the Quality of Controlled Intervention Studies

The guidance document below is organized by question number from the tool for quality assessment of controlled intervention studies.

Question 1. Described as randomized

Was the study described as randomized? A study does not satisfy quality criteria as randomized simply because the authors call it randomized; however, it is a first step in determining if a study is randomized.

Questions 2 and 3. Treatment allocation–two interrelated pieces

Adequate randomization: Randomization is adequate if it occurred according to the play of chance (e.g., computer generated sequence in more recent studies, or random number table in older studies).

Inadequate randomization: Randomization is inadequate if there is a preset plan (e.g., alternation where every other subject is assigned to treatment arm or another method of allocation is used, such as time or day of hospital admission or clinic visit, ZIP Code, phone number, etc.). In fact, this is not randomization at all–it is another method of assignment to groups. If assignment is not by the play of chance, then the answer to this question is no. There may be some tricky scenarios that will need to be read carefully and considered for the role of chance in assignment. For example, randomization may occur at the site level, where all individuals at a particular site are assigned to receive treatment or no treatment. This scenario is used for group-randomized trials, which can be truly randomized, but often are "quasi-experimental" studies with comparison groups rather than true control groups. (Few, if any, group-randomized trials are anticipated for this evidence review.)

Allocation concealment: This means that one does not know in advance, or cannot guess accurately, to what group the next person eligible for randomization will be assigned. Methods include sequentially numbered opaque sealed envelopes, numbered or coded containers, central randomization by a coordinating center, computer-generated randomization that is not revealed ahead of time, etc.

Questions 4 and 5. Blinding

Blinding means that one does not know to which group–intervention or control–the participant is assigned. It is also sometimes called "masking." The reviewer assessed whether each of the following was blinded to knowledge of treatment assignment: (1) the person assessing the primary outcome(s) for the study (e.g., taking the measurements such as blood pressure, examining health records for events such as myocardial infarction, reviewing and interpreting test results such as x ray or cardiac catheterization findings); (2) the person receiving the intervention (e.g., the patient or other study participant); and (3) the person providing the intervention (e.g., the physician, nurse, pharmacist, dietitian, or behavioral interventionist).

Generally placebo-controlled medication studies are blinded to patient, provider, and outcome assessors; behavioral, lifestyle, and surgical studies are examples of studies that are frequently blinded only to the outcome assessors because blinding of the persons providing and receiving the interventions is difficult in these situations. Sometimes the individual providing the intervention is the same person performing the outcome assessment. This was noted when it occurred.

Question 6. Similarity of groups at baseline

This question relates to whether the intervention and control groups have similar baseline characteristics on average, especially those characteristics that may affect the intervention or outcomes. The point of randomized trials is to create groups that are as similar as possible except for the intervention(s) being studied in order to compare the effects of the interventions between groups. When reviewers abstracted baseline characteristics, they noted when there was a significant difference between groups. Baseline characteristics for intervention groups are usually presented in a table in the article (often Table 1).

Groups can differ at baseline without raising red flags if: (1) the differences would not be expected to have any bearing on the interventions and outcomes; or (2) the differences are not statistically significant. When concerned about baseline difference in groups, reviewers recorded them in the comments section and considered them in their overall determination of the study quality.

Questions 7 and 8. Dropout

"Dropouts" in a clinical trial are individuals for whom there are no end point measurements, often because they dropped out of the study and were lost to followup.

Generally, an acceptable overall dropout rate is considered 20 percent or less of participants who were randomized or allocated into each group. An acceptable differential dropout rate is an absolute difference between groups of 15 percentage points at most (calculated by subtracting the dropout rate of one group from the dropout rate of the other group). However, these are general rates. Lower overall dropout rates are expected in shorter studies, whereas higher overall dropout rates may be acceptable for studies of longer duration. For example, a 6-month study of weight loss interventions should be expected to have nearly 100 percent followup (almost no dropouts–nearly everybody gets their weight measured regardless of whether or not they actually received the intervention), whereas a 10-year study testing the effects of intensive blood pressure lowering on heart attacks may be acceptable if there is a 20-25 percent dropout rate, especially if the dropout rate between groups was similar. The panels for the NHLBI systematic reviews may set different levels of dropout caps.

Conversely, differential dropout rates are not flexible; there should be a cap of 15 percentage points. If there is a differential dropout rate of 15 percentage points or higher between arms, then there is a serious potential for bias. This constitutes a fatal flaw, resulting in a poor quality rating for the study.
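As a quick illustration of the arithmetic described above, here is a minimal Python sketch with hypothetical numbers that computes overall and differential dropout rates and flags the general thresholds stated in this guidance; it is an illustration only, not part of the NHLBI tool.

```python
# Illustrative dropout-rate arithmetic with hypothetical numbers.
# The 20% overall and 15-percentage-point differential thresholds are the
# general guides stated above, not fixed rules from the NHLBI tool.

def dropout_rate(randomized: int, with_endpoint: int) -> float:
    """Proportion of randomized participants without an endpoint measurement."""
    return (randomized - with_endpoint) / randomized

arm_a = dropout_rate(randomized=200, with_endpoint=170)   # 0.15 -> 15% dropout
arm_b = dropout_rate(randomized=200, with_endpoint=130)   # 0.35 -> 35% dropout

overall = ((200 - 170) + (200 - 130)) / (200 + 200)       # 0.25 -> 25% overall dropout
differential = abs(arm_a - arm_b) * 100                   # 20 percentage points

print(f"Overall dropout: {overall:.0%} (concern if over ~20%)")
print(f"Differential dropout: {differential:.0f} percentage points "
      f"(fatal flaw if 15 or more)")
```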

Question 9. Adherence

Did participants in each treatment group adhere to the protocols for assigned interventions? For example, if Group 1 was assigned to 10 mg/day of Drug A, did most of them take 10 mg/day of Drug A? Another example is a study evaluating the difference between a 30-pound weight loss and a 10-pound weight loss on specific clinical outcomes (e.g., heart attacks), but the 30-pound weight loss group did not achieve its intended weight loss target (e.g., the group only lost 14 pounds on average). A third example is whether a large percentage of participants assigned to one group "crossed over" and got the intervention provided to the other group. A final example is when one group that was assigned to receive a particular drug at a particular dose had a large percentage of participants who did not end up taking the drug or the dose as designed in the protocol.

Question 10. Avoid other interventions

Changes that occur in the study outcomes being assessed should be attributable to the interventions being compared in the study. If study participants receive interventions that are not part of the study protocol and could affect the outcomes being assessed, and they receive these interventions differentially, then there is cause for concern because these interventions could bias results. The following scenario is another example of how bias can occur. In a study comparing two different dietary interventions on serum cholesterol, one group had a significantly higher percentage of participants taking statin drugs than the other group. In this situation, it would be impossible to know if a difference in outcome was due to the dietary intervention or the drugs.

Question 11. Outcome measures assessment

What tools or methods were used to measure the outcomes in the study? Were the tools and methods accurate and reliable–for example, have they been validated, or are they objective? This is important as it indicates the confidence you can have in the reported outcomes. Perhaps even more important is ascertaining that outcomes were assessed in the same manner within and between groups. One example of differing methods is self-report of dietary salt intake versus urine testing for sodium content (a more reliable and valid assessment method). Another example is using BP measurements taken by practitioners who use their usual methods versus using BP measurements done by individuals trained in a standard approach. Such an approach may include using the same instrument each time and taking an individual's BP multiple times. In each of these cases, the answer to this assessment question would be "no" for the former scenario and "yes" for the latter. In addition, a study in which an intervention group was seen more frequently than the control group, enabling more opportunities to report clinical events, would not be considered reliable and valid.

Question 12. Power calculation

Generally, a study's methods section will address the sample size needed to detect differences in primary outcomes. The current standard is at least 80 percent power to detect a clinically relevant difference in an outcome using a two-sided alpha of 0.05. Often, however, older studies will not report on power.
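For readers who want to see what such a calculation looks like in practice, below is a minimal sketch using the standard two-sample normal-approximation formula with the conventions mentioned above (80% power, two-sided alpha of 0.05). The standardized effect size of 0.5 is a hypothetical assumption, and exact t-test-based calculations give a slightly larger answer.

```python
# Minimal two-sample sample-size sketch using the normal approximation:
#   n per group = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
# where d is the standardized (Cohen's d) effect size. The effect size of 0.5
# is a hypothetical assumption for illustration.
from math import ceil
from statistics import NormalDist

alpha, power, d = 0.05, 0.80, 0.5

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for a two-sided 0.05 test
z_power = NormalDist().inv_cdf(power)           # ~0.84 for 80% power

n_per_group = ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)
print(f"Approximately {n_per_group} participants per group")  # -> 63 (exact t-test: ~64)
```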

Question 13. Prespecified outcomes

Investigators should prespecify outcomes reported in a study for hypothesis testing–which is the reason for conducting an RCT. Without prespecified outcomes, the study may be reporting ad hoc analyses, simply looking for differences supporting desired findings. Investigators also should prespecify subgroups being examined. Most RCTs conduct numerous post hoc analyses as a way of exploring findings and generating additional hypotheses. The intent of this question is to give more weight to reports that are not simply exploratory in nature.

Question 14. Intention-to-treat analysis

Intention-to-treat (ITT) means everybody who was randomized is analyzed according to the original group to which they are assigned. This is an extremely important concept because conducting an ITT analysis preserves the whole reason for doing a randomized trial; that is, to compare groups that differ only in the intervention being tested. When the ITT philosophy is not followed, groups being compared may no longer be the same. In this situation, the study would likely be rated poor. However, if an investigator used another type of analysis that could be viewed as valid, this would be explained in the "other" box on the quality assessment form. Some researchers use a completers analysis (an analysis of only the participants who completed the intervention and the study), which introduces significant potential for bias. Characteristics of participants who do not complete the study are unlikely to be the same as those who do. The likely impact of participants withdrawing from a study treatment must be considered carefully. ITT analysis provides a more conservative (potentially less biased) estimate of effectiveness.
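The contrast between ITT and a completers-only analysis can be shown with a small numerical sketch. The data below are hypothetical and purely illustrative; the point is that ITT analyzes every randomized participant in the group to which they were originally assigned.

```python
# Hypothetical illustration of intention-to-treat (ITT) versus completers-only analysis.
# Each row: (assigned group, completed the assigned intervention, measured outcome).
participants = [
    ("treatment", True,  12.0),
    ("treatment", True,  10.0),
    ("treatment", False,  4.0),   # stopped the intervention early; outcome still measured
    ("control",   True,   5.0),
    ("control",   True,   6.0),
    ("control",   False,  5.5),
]

def group_mean(rows, group, completers_only=False):
    values = [outcome for g, completed, outcome in rows
              if g == group and (completed or not completers_only)]
    return sum(values) / len(values)

itt_effect = group_mean(participants, "treatment") - group_mean(participants, "control")
completers_effect = (group_mean(participants, "treatment", completers_only=True)
                     - group_mean(participants, "control", completers_only=True))

print(f"ITT estimate of effect: {itt_effect:.2f}")           # ~3.17, more conservative
print(f"Completers-only estimate: {completers_effect:.2f}")  # 5.50, likely biased upward
```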

General Guidance for Determining the Overall Quality Rating of Controlled Intervention Studies

The questions on the assessment tool were designed to help reviewers focus on the key concepts for evaluating a study's internal validity. They are not intended to create a list that is simply tallied up to arrive at a summary judgment of quality.

Internal validity is the extent to which the results (effects) reported in a study can truly be attributed to the intervention being evaluated and not to flaws in the design or conduct of the study–in other words, the ability of the study to support causal conclusions about the effects of the intervention being tested. Such flaws can increase the risk of bias. Critical appraisal involves considering the potential for allocation bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues addressed in the questions above. High risk of bias translates to a rating of poor quality. Low risk of bias translates to a rating of good quality.

Fatal flaws: If a study has a "fatal flaw," then risk of bias is significant, and the study is of poor quality. Examples of fatal flaws in RCTs include high dropout rates, high differential dropout rates, lack of an ITT analysis, or use of an unsuitable statistical analysis (e.g., completers-only analysis).

Generally, when evaluating a study, one will not see a "fatal flaw;" however, one will find some risk of bias. During training, reviewers were instructed to look for the potential for bias in studies by focusing on the concepts underlying the questions in the tool. For any box checked "no," reviewers were told to ask: "What is the potential risk of bias that may be introduced by this flaw?" That is, does this factor cause one to doubt the results that were reported in the study?

NHLBI staff provided reviewers with background reading on critical appraisal, while emphasizing that the best approach to use is to think about the questions in the tool in determining the potential for bias in a study. The staff also emphasized that each study has specific nuances; therefore, reviewers should familiarize themselves with the key concepts.

Quality Assessment of Systematic Reviews and Meta-Analyses

Guidance for Quality Assessment Tool for Systematic Reviews and Meta-Analyses

A systematic review is a study that attempts to answer a question by synthesizing the results of primary studies while using strategies to limit bias and random error.424 These strategies include a comprehensive search of all potentially relevant articles and the use of explicit, reproducible criteria in the selection of articles included in the review. Research designs and study characteristics are appraised, data are synthesized, and results are interpreted using a predefined systematic approach that adheres to evidence-based methodological principles.

Systematic reviews can be qualitative or quantitative. A qualitative systematic review summarizes the results of the primary studies but does not combine the results statistically. A quantitative systematic review, or meta-analysis, is a type of systematic review that employs statistical techniques to combine the results of the different studies into a single pooled estimate of effect, often given as an odds ratio. The guidance document below is organized by question number from the tool for quality assessment of systematic reviews and meta-analyses.

Question 1. Focused question

The review should be based on a question that is clearly stated and well-formulated. An example would be a question that uses the PICO (population, intervention, comparator, outcome) format, with all components clearly described.

Question 2. Eligibility criteria

The eligibility criteria used to determine whether studies were included or excluded should be clearly specified and predefined. It should be clear to the reader why studies were included or excluded.

Question 3. Literature search

The search strategy should employ a comprehensive, systematic approach in order to capture all of the evidence possible that pertains to the question of interest. At a minimum, a comprehensive review has the following attributes:

  • Electronic searches were conducted using multiple scientific literature databases, such as MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials, PsychLit, and others as appropriate for the subject matter.
  • Manual searches of references found in articles and textbooks should supplement the electronic searches.

Additional search strategies that may be used to improve the yield include the following:

  • Studies published in other countries
  • Studies published in languages other than English
  • Identification by experts in the field of studies and articles that may have been missed
  • Search of grey literature, including technical reports and other papers from government agencies or scientific groups or committees; presentations and posters from scientific meetings, conference proceedings, unpublished manuscripts; and others. Searching the grey literature is important (whenever feasible) because sometimes only positive studies with significant findings are published in the peer-reviewed literature, which can bias the results of a review.

Reviewers checked that the literature search strategy was clearly described and could be reproduced by others with similar results.

Question 4. Dual review for determining which studies to include and exclude

Titles, abstracts, and full-text articles (when indicated) should be reviewed by two independent reviewers to determine which studies to include and exclude in the review. Disagreements should be resolved through discussion and consensus or by a third reviewer. The review process, including the methods for settling disagreements, should be clearly stated.

Question 5. Quality appraisal for internal validity

Each included study should be appraised for internal validity (study quality assessment) using a standardized approach for rating the quality of the individual studies. Ideally, at least two independent reviewers should have appraised each study for internal validity. However, there is not one commonly accepted, standardized tool for rating the quality of studies. Therefore, reviewers looked in the research papers for an assessment of the quality of each included study and a clear description of the process used.

Question 6. List and describe included studies

All included studies were listed in the review, along with descriptions of their key characteristics. This was presented either in narrative or table format.

Question 7. Publication bias

Publication bias is a term used when studies with positive results have a higher likelihood of being published, being published rapidly, being published in higher impact journals, being published in English, being published more than once, or being cited by others.425,426 Publication bias can be linked to favorable or unfavorable treatment of research findings due to investigators, editors, industry, commercial interests, or peer reviewers. To minimize the potential for publication bias, researchers can conduct a comprehensive literature search that includes the strategies discussed in Question 3.

A funnel plot–a scatter plot of each component study's effect estimate against a measure of the study's size or precision (typically the standard error)–is a commonly used graphical method for detecting publication bias. If there is no significant publication bias, the graph looks like a symmetrical inverted funnel.
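The sketch below shows one way such a plot might be drawn; the effect estimates, standard errors, and pooled value are made up for illustration, and matplotlib is assumed to be available.

```python
# A minimal sketch of a funnel plot: each point is one study's effect estimate
# (e.g., log odds ratio) plotted against its standard error, with the y-axis
# inverted so the largest, most precise studies sit at the top of the funnel.
import matplotlib.pyplot as plt

log_odds_ratios = [0.10, 0.25, -0.05, 0.40, 0.15, 0.60, -0.20, 0.30]  # hypothetical
standard_errors = [0.05, 0.10,  0.08, 0.20, 0.12, 0.30,  0.25, 0.18]  # hypothetical

plt.scatter(log_odds_ratios, standard_errors)
plt.gca().invert_yaxis()                 # most precise studies at the top
plt.axvline(0.2, linestyle="--")         # e.g., the pooled estimate (assumed value)
plt.xlabel("Effect estimate (log odds ratio)")
plt.ylabel("Standard error")
plt.title("Funnel plot: asymmetry may suggest publication bias")
plt.show()
```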

Reviewers assessed and clearly described the likelihood of publication bias.

Question 8. Heterogeneity

Heterogeneity is used to describe important differences in studies included in a meta-analysis that may make it inappropriate to combine the studies.427 Heterogeneity can be clinical (e.g., important differences between study participants, baseline disease severity, and interventions); methodological (e.g., important differences in the design and conduct of the study); or statistical (e.g., important differences in the quantitative results or reported effects).

Researchers usually assess clinical or methodological heterogeneity qualitatively by determining whether it makes sense to combine studies. For example:

  • Should a study evaluating the effects of an intervention on CVD risk that involves elderly male smokers with hypertension be combined with a study that involves healthy adults ages 18 to 40? (Clinical Heterogeneity)
  • Should a study that uses a randomized controlled trial (RCT) design be combined with a study that uses a case-control study design? (Methodological Heterogeneity)

Statistical heterogeneity describes the degree of variation in the effect estimates from a set of studies; it is assessed quantitatively. The two most common methods used to assess statistical heterogeneity are Cochran's Q test (a chi-square test) and the I² statistic.
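As a rough illustration of how these statistics are computed in a fixed-effect framework, the sketch below uses hypothetical study effects and within-study variances; it is not part of the assessment tool.

```python
# A minimal sketch of Cochran's Q and the I-squared statistic for a set of studies,
# using inverse-variance weights and made-up numbers.
effects   = [0.30, 0.10, 0.45, 0.20, 0.55]   # hypothetical study effect estimates
variances = [0.02, 0.03, 0.05, 0.04, 0.06]   # hypothetical within-study variances

weights = [1 / v for v in variances]                        # inverse-variance weights
pooled  = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

q  = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))   # Cochran's Q
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100    # percent of variation beyond chance

print(f"Pooled estimate: {pooled:.3f}, Q = {q:.2f} on {df} df, I^2 = {i_squared:.1f}%")
```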

Reviewers examined studies to determine if an assessment for heterogeneity was conducted and clearly described. If the studies are found to be heterogeneous, the investigators should explore and explain the causes of the heterogeneity, and determine what influence, if any, the study differences had on overall study results.

Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies

Guidance for Assessing the Quality of Observational Cohort and Cross-Sectional Studies

The guidance document below is organized by question number from the tool for quality assessment of observational cohort and cross-sectional studies.

Question 1. Research question

Did the authors describe their goal in conducting this research? Is it easy to understand what they were looking to find? This issue is important for any scientific paper of any type. Higher quality scientific research explicitly defines a research question.

Questions 2 and 3. Study population

Did the authors describe the group of people from which the study participants were selected or recruited, using demographics, location, and time period? If you were to conduct this study again, would you know who to recruit, from where, and from what time period? Is the cohort population free of the outcomes of interest at the time they were recruited?

An example would be men over 40 years old with type 2 diabetes who began seeking medical care at Phoenix Good Samaritan Hospital between January 1, 1990 and December 31, 1994. In this example, the population is clearly described as: (1) who (men over 40 years old with type 2 diabetes); (2) where (Phoenix Good Samaritan Hospital); and (3) when (between January 1, 1990 and December 31, 1994). Another example is women ages 34 to 59 in 1980 who were in the nursing profession and had no known coronary disease, stroke, cancer, hypercholesterolemia, or diabetes, and were recruited from the 11 most populous States, with contact information obtained from State nursing boards.

In cohort studies, it is crucial that the population at baseline is free of the outcome of interest. For example, the nurses' population above would be an appropriate group in which to study incident coronary disease. This information is usually found either in descriptions of population recruitment, definitions of variables, or inclusion/exclusion criteria.

You may need to look at prior papers on methods in order to make the assessment for this question. Those papers are usually in the reference list.

If fewer than 50% of eligible persons participated in the study, then there is concern that the study population does not adequately represent the target population. This increases the risk of bias.

Question 4. Groups recruited from the same population and uniform eligibility criteria

Were the inclusion and exclusion criteria developed prior to recruitment or selection of the study population? Were the same underlying criteria used for all of the subjects involved? This issue is related to the description of the study population, above, and you may find the information for both of these questions in the same section of the paper.

Most cohort studies begin with the selection of the cohort; participants in this cohort are then measured or evaluated to determine their exposure status. However, some cohort studies may recruit or select exposed participants in a different time or place than unexposed participants, especially retrospective cohort studies–which is when data are obtained from the past (retrospectively), but the analysis examines exposures prior to outcomes. For example, one research question could be whether diabetic men with clinical depression are at higher risk for cardiovascular disease than those without clinical depression. So, diabetic men with depression might be selected from a mental health clinic, while diabetic men without depression might be selected from an internal medicine or endocrinology clinic. This study recruits groups from different clinic populations, so this example would get a "no."

However, the women nurses described in the question above were selected based on the same inclusion/exclusion criteria, so that example would get a "yes."

Question 5. Sample size justification

Did the authors present their reasons for selecting or recruiting the number of people included or analyzed? Do they note or discuss the statistical power of the study? This question is about whether or not the study had enough participants to detect an association if one truly existed.

A paragraph in the methods section of the article may explain the sample size needed to detect a hypothesized difference in outcomes. You may also find a discussion of power in the discussion section (such as the study had 85 percent power to detect a 20 percent increase in the rate of an outcome of interest, with a 2-sided alpha of 0.05). Sometimes estimates of variance and/or estimates of effect size are given, instead of sample size calculations. In any of these cases, the answer would be "yes."

However, observational cohort studies often do not report anything about power or sample sizes because the analyses are exploratory in nature. In this case, the answer would be "no." This is not a "fatal flaw." It just may indicate that attention was not paid to whether the study was sufficiently sized to answer a prespecified question–i.e., it may have been an exploratory, hypothesis-generating study.

Question 6. Exposure assessed prior to outcome measurement

This question is important because, in order to determine whether an exposure causes an outcome, the exposure must come before the outcome.

For some prospective cohort studies, the investigator enrolls the cohort and then determines the exposure status of various members of the cohort (large epidemiological studies like Framingham used this approach). However, for other cohort studies, the cohort is selected based on its exposure status, as in the example above of depressed diabetic men (the exposure being depression). Other examples include a cohort identified by its exposure to fluoridated drinking water and then compared to a cohort living in an area without fluoridated water, or a cohort of military personnel exposed to combat in the Gulf War compared to a cohort of military personnel not deployed in a combat zone.

With either of these types of cohort studies, the cohort is followed forward in time (i.e., prospectively) to assess the outcomes that occurred in the exposed members compared to nonexposed members of the cohort. Therefore, you begin the study in the present by looking at groups that were exposed (or not) to some biological or behavioral factor, intervention, etc., and then you follow them forward in time to examine outcomes. If a cohort study is conducted properly, the answer to this question should be "yes," since the exposure status of members of the cohort was determined at the beginning of the study before the outcomes occurred.

For retrospective cohort studies, the same principle applies. The difference is that, rather than identifying a cohort in the present and following them forward in time, the investigators go back in time (i.e., retrospectively) and select a cohort based on their exposure status in the past and then follow them forward to assess the outcomes that occurred in the exposed and nonexposed cohort members. Because in retrospective cohort studies the exposure and outcomes may have already occurred (it depends on how long they follow the cohort), it is important to make sure that the exposure preceded the outcome.

Sometimes cross-sectional studies are conducted (or cross-sectional analyses of cohort-study data), where the exposures and outcomes are measured during the same timeframe. As a result, cross-sectional analyses provide weaker evidence than regular cohort studies regarding a potential causal relationship between exposures and outcomes. For cross-sectional analyses, the answer to Question 6 should be "no."

Question 7. Sufficient timeframe to see an effect

Did the study allow enough time for a sufficient number of outcomes to occur or be observed, or enough time for an exposure to have a biological effect on an outcome? In the examples given above, if clinical depression has a biological effect on increasing risk for CVD, such an effect may take years. In the other example, if higher dietary sodium increases BP, a short timeframe may be sufficient to assess its association with BP, but a longer timeframe would be needed to examine its association with heart attacks.

The issue of timeframe is important to enable meaningful analysis of the relationships between exposures and outcomes to be conducted. This often requires at least several years, especially when looking at health outcomes, but it depends on the research question and outcomes being examined.

Cross-sectional analyses allow no time to see an effect, since the exposures and outcomes are assessed at the same time, so those would get a "no" response.

Question 8. Different levels of the exposure of interest

If the exposure can be defined as a range (examples: drug dosage, amount of physical activity, amount of sodium consumed), were multiple categories of that exposure assessed? (for example, for drugs: not on the medication, on a low dose, medium dose, high dose; for dietary sodium, higher than average U.S. consumption, lower than recommended consumption, between the two). Sometimes discrete categories of exposure are not used, but instead exposures are measured as continuous variables (for example, mg/day of dietary sodium or BP values).

In any case, studying different levels of exposure (where possible) enables investigators to assess trends or dose-response relationships between exposures and outcomes–e.g., the higher the exposure, the greater the rate of the health outcome. The presence of trends or dose-response relationships lends credibility to the hypothesis of causality between exposure and outcome.

For some exposures, however, this question may not be applicable (e.g., the exposure may be a dichotomous variable like living in a rural setting versus an urban setting, or vaccinated/not vaccinated with a one-time vaccine). If there are only two possible exposures (yes/no), then this question should be given an "NA," and it should not count negatively towards the quality rating.

Question 9. Exposure measures and assessment

Were the exposure measures defined in detail? Were the tools or methods used to measure exposure accurate and reliable–for example, have they been validated or are they objective? This issue is important as it influences confidence in the reported exposures. When exposures are measured with less accuracy or validity, it is harder to see an association between exposure and outcome even if one exists. Also as important is whether the exposures were assessed in the same manner within groups and between groups; if not, bias may result.

For example, retrospective self-report of dietary salt intake is not as valid and reliable as prospectively using a standardized dietary log plus testing participants' urine for sodium content. Another example is measurement of BP, where there may be quite a difference between usual care, where clinicians measure BP however it is done in their practice setting (which can vary considerably), and use of trained BP assessors using standardized equipment (e.g., the same BP device which has been tested and calibrated) and a standardized protocol (e.g., patient is seated for 5 minutes with feet flat on the floor, BP is taken twice in each arm, and all four measurements are averaged). In each of these cases, the former would get a "no" and the latter a "yes."

Here is a final example that illustrates the point about why it is important to assess exposures consistently across all groups: If people with higher BP (exposed cohort) are seen by their providers more frequently than those without elevated BP (nonexposed group), it also increases the chances of detecting and documenting changes in health outcomes, including CVD-related events. Therefore, it may lead to the conclusion that higher BP leads to more CVD events. This may be true, but it could also be due to the fact that the subjects with higher BP were seen more often; thus, more CVD-related events were detected and documented simply because they had more encounters with the health care system. Thus, it could bias the results and lead to an erroneous conclusion.

Question 10. Repeated exposure assessment

Was the exposure for each person measured more than once during the course of the study period? Multiple measurements with the same result increase our confidence that the exposure status was correctly classified. Also, multiple measurements enable investigators to look at changes in exposure over time, for example, people who ate high dietary sodium throughout the followup period, compared to those who started out high then reduced their intake, compared to those who ate low sodium throughout. Once again, this may not be applicable in all cases. In many older studies, exposure was measured only at baseline. However, multiple exposure measurements do result in a stronger study design.

Question 11. Outcome measures

Were the outcomes defined in detail? Were the tools or methods for measuring outcomes accurate and reliable–for example, have they been validated or are they objective? This issue is important because it influences confidence in the validity of study results. Also important is whether the outcomes were assessed in the same manner within groups and between groups.

An example of an outcome measure that is objective, accurate, and reliable is death–the outcome measured with more accuracy than any other. But even with a measure as objective as death, there can be differences in the accuracy and reliability of how death was assessed by the investigators. Did they base it on an autopsy report, death certificate, death registry, or report from a family member? Another example is a study of whether dietary fat intake is related to blood cholesterol level (cholesterol level being the outcome), and the cholesterol level is measured from fasting blood samples that are all sent to the same laboratory. These examples would get a "yes." An example of a "no" would be self-report by subjects that they had a heart attack, or self-report of how much they weigh (if body weight is the outcome of interest).

Similar to the example in Question 9, results may be biased if one group (e.g., people with high BP) is seen more frequently than another group (people with normal BP) because more frequent encounters with the health care system increases the chances of outcomes being detected and documented.

Question 12. Blinding of outcome assessors

Blinding means that outcome assessors did not know whether the participant was exposed or unexposed. It is also sometimes called "masking." The objective is to look for evidence in the article that the person(s) assessing the outcome(s) for the study (for example, examining medical records to determine the outcomes that occurred in the exposed and comparison groups) is masked to the exposure status of the participant. Sometimes the person measuring the exposure is the same person conducting the outcome assessment. In this case, the outcome assessor would most likely not be blinded to exposure status because they also took measurements of exposures. If so, make a note of that in the comments section.

As you assess this criterion, think about whether it is likely that the person(s) doing the outcome assessment would know (or be able to figure out) the exposure status of the study participants. If the answer is no, then blinding is adequate. An example of adequate blinding of the outcome assessors is to create a separate committee, whose members were not involved in the care of the patient and had no information about the study participants' exposure status. The committee would then be provided with copies of participants' medical records, which had been stripped of any potential exposure information or personally identifiable information. The committee would then review the records for prespecified outcomes according to the study protocol. If blinding was not possible, which is sometimes the case, mark "NA" and explain the potential for bias.

Question 13. Followup rate

Higher overall followup rates are always better than lower followup rates, even though higher rates are expected in shorter studies, whereas lower overall followup rates are often seen in studies of longer duration. Usually, an acceptable overall followup rate is considered 80 percent or more of participants whose exposures were measured at baseline. However, this is just a general guideline. For example, a 6-month cohort study examining the relationship between dietary sodium intake and BP level may have over 90 percent followup, but a 20-year cohort study examining effects of sodium intake on stroke may have only a 65 percent followup rate.

Question 14. Statistical analyses

Were key potential confounding variables measured and adjusted for, such as by statistical adjustment for baseline differences? Logistic regression or other regression methods are often used to account for the influence of variables not of interest.

This is a key issue in cohort studies, because statistical analyses need to control for potential confounders, in contrast to an RCT, where the randomization process controls for potential confounders. All key factors that may be associated both with the exposure of interest and the outcome–that are not of interest to the research question–should be controlled for in the analyses.

For example, in a study of the relationship between cardiorespiratory fitness and CVD events (heart attacks and strokes), the study should control for age, BP, blood cholesterol, and body weight, because all of these factors are associated both with low fitness and with CVD events. Well-done cohort studies control for multiple potential confounders.
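The sketch below illustrates, with simulated data, the kind of adjusted analysis described in this example: a logistic regression of CVD events on fitness, controlling for the named confounders. The variable names, simulated values, and use of statsmodels are all assumptions for illustration, not the method of any particular study.

```python
# A minimal sketch of statistical adjustment for confounders in a cohort analysis,
# using simulated data and a logistic regression model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "fitness":     rng.normal(0, 1, n),      # hypothetical fitness score
    "age":         rng.normal(55, 10, n),
    "sbp":         rng.normal(130, 15, n),
    "cholesterol": rng.normal(200, 30, n),
    "weight":      rng.normal(80, 12, n),
})
# Simulated outcome: risk rises with age and BP and falls with fitness (illustration only).
logit_p = -6 + 0.04 * df["age"] + 0.02 * df["sbp"] - 0.5 * df["fitness"]
df["cvd_event"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Logistic regression adjusting for the potential confounders named in the example.
model = smf.logit("cvd_event ~ fitness + age + sbp + cholesterol + weight", data=df).fit(disp=False)
print(np.exp(model.params))   # adjusted odds ratios
```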

Some general guidance for determining the overall quality rating of observational cohort and cross-sectional studies

The questions on the form are designed to help you focus on the key concepts for evaluating the internal validity of a study. They are not intended to create a list that you simply tally up to arrive at a summary judgment of quality.

Internal validity for cohort studies is the extent to which the results reported in the study can truly be attributed to the exposure being evaluated and not to flaws in the design or conduct of the study–in other words, the ability of the study to draw associative conclusions about the effects of the exposures being studied on outcomes. Any such flaws can increase the risk of bias.

Critical appraisal involves considering the risk of potential for selection bias, information bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues throughout the questions above. High risk of bias translates to a rating of poor quality. Low risk of bias translates to a rating of good quality. (Thus, the greater the risk of bias, the lower the quality rating of the study.)

In addition, the more attention in the study design to issues that can help determine whether there is a causal relationship between the exposure and outcome, the higher quality the study. These include exposures occurring prior to outcomes, evaluation of a dose-response gradient, accuracy of measurement of both exposure and outcome, sufficient timeframe to see an effect, and appropriate control for confounding–all concepts reflected in the tool.

Generally, when you evaluate a study, you will not see a "fatal flaw," but you will find some risk of bias. By focusing on the concepts underlying the questions in the quality assessment tool, you should ask yourself about the potential for bias in the study you are critically appraising. For any box where you check "no" you should ask, "What is the potential risk of bias resulting from this flaw in study design or execution?" That is, does this factor cause you to doubt the results that are reported in the study or doubt the ability of the study to accurately assess an association between exposure and outcome?

The best approach is to think about the questions in the tool and how each one tells you something about the potential for bias in a study. The more you familiarize yourself with the key concepts, the more comfortable you will be with critical appraisal. Examples of studies rated good, fair, and poor are useful, but each study must be assessed on its own based on the details that are reported and consideration of the concepts for minimizing bias.

Quality Assessment of Case-Control Studies

Guidance for Assessing the Quality of Case-Control Studies

The guidance document below is organized by question number from the tool for quality assessment of case-control studies.

Question 1. Research question

Did the authors describe their goal in conducting this research? Is it easy to understand what they were looking to find? This issue is important for any scientific paper of any type. High quality scientific research explicitly defines a research question.

Question 2. Study population

Did the authors describe the group of individuals from which the cases and controls were selected or recruited, while using demographics, location, and time period? If the investigators conducted this study again, would they know exactly who to recruit, from where, and from what time period?

Investigators identify case-control study populations by location, time period, and inclusion criteria for cases (individuals with the disease, condition, or problem) and controls (individuals without the disease, condition, or problem). For example, the population for a study of lung cancer and chemical exposure would be all incident cases of lung cancer diagnosed in patients ages 35 to 79, from January 1, 2003 to December 31, 2008, living in Texas during that entire time period, as well as controls without lung cancer recruited from the same population during the same time period. The population is clearly described as: (1) who (men and women ages 35 to 79 with (cases) and without (controls) incident lung cancer); (2) where (living in Texas); and (3) when (between January 1, 2003 and December 31, 2008).

Other studies may use disease registries or data from cohort studies to identify cases. In these cases, the populations are individuals who live in the area covered by the disease registry or included in a cohort study (i.e., nested case-control or case-cohort). For example, a study of the relationship between vitamin D intake and myocardial infarction might use patients identified via the GRACE registry, a database of heart attack patients.

NHLBI staff encouraged reviewers to examine prior papers on methods (listed in the reference list) to make this assessment, if necessary.

Question 3. Target population and case representation

In order for a study to truly address the research question, the target population–the population from which the study population is drawn and to which study results are believed to apply–should be carefully defined. Some authors may compare characteristics of the study cases to characteristics of cases in the target population, either in text or in a table. When study cases are shown to be representative of cases in the appropriate target population, it increases the likelihood that the study was well-designed per the research question.

However, because these statistics are frequently difficult or impossible to measure, publications should not be penalized if case representation is not shown. For most papers, the response to question 3 will be "NR." Those subquestions are combined because the answer to the second subquestion–case representation–determines the response to this item. However, it cannot be determined without considering the response to the first subquestion. For example, if the answer to the first subquestion is "yes," and the second, "CD," then the response for item 3 is "CD."

Question 4. Sample size justification

Did the authors discuss their reasons for selecting or recruiting the number of individuals included? Did they discuss the statistical power of the study and provide a sample size calculation to ensure that the study is adequately powered to detect an association (if one exists)? This question does not refer to a description of the manner in which different groups were included or excluded using the inclusion/exclusion criteria (e.g., "Final study size was 1,378 participants after exclusion of 461 patients with missing data" is not considered a sample size justification for the purposes of this question).

An article's methods section usually contains information on sample size and the size needed to detect differences in exposures and on statistical power.

Question 5. Groups recruited from the same population

To determine whether cases and controls were recruited from the same population, one can ask hypothetically, "If a control was to develop the outcome of interest (the condition that was used to select cases), would that person have been eligible to become a case?" Case-control studies begin with the selection of the cases (those with the outcome of interest, e.g., lung cancer) and controls (those in whom the outcome is absent). Cases and controls are then evaluated and categorized by their exposure status. For the lung cancer example, cases and controls were recruited from hospitals in a given region. One may reasonably assume that controls in the catchment area for the hospitals, or those already in the hospitals for a different reason, would attend those hospitals if they became a case; therefore, the controls are drawn from the same population as the cases. If the controls were recruited or selected from a different region (e.g., a State other than Texas) or time period (e.g., 1991-2000), then the cases and controls were recruited from different populations, and the answer to this question would be "no."

The following example further explores selection of controls. In a study, eligible cases were men and women, ages 18 to 39, who were diagnosed with atherosclerosis at hospitals in Perth, Australia, between July 1, 2000 and December 31, 2007. Appropriate controls for these cases might be sampled using voter registration information for men and women ages 18 to 39, living in Perth (population-based controls); they also could be sampled from patients without atherosclerosis at the same hospitals (hospital-based controls). As long as the controls are individuals who would have been eligible to be included in the study as cases (if they had been diagnosed with atherosclerosis), then the controls were selected appropriately from the same source population as cases.

In a prospective case-control study, investigators may enroll individuals as cases at the time they are found to have the outcome of interest; the number of cases usually increases as time progresses. At this same time, they may recruit or select controls from the population without the outcome of interest. One way to identify or recruit cases is through a surveillance system. In turn, investigators can select controls from the population covered by that system. This is an example of population-based controls. Investigators also may identify and select cases from a cohort study population and identify controls from outcome-free individuals in the same cohort study. This is known as a nested case-control study.

Question 6. Inclusion and exclusion criteria prespecified and applied uniformly

Were the inclusion and exclusion criteria developed prior to recruitment or selection of the study population? Were the same underlying criteria used for all of the groups involved? To answer this question, reviewers determined if the investigators developed I/E criteria prior to recruitment or selection of the study population and if they used the same underlying criteria for all groups. The investigators should have used the same selection criteria, except for study participants who had the disease or condition, which would be different for cases and controls by definition. Therefore, the investigators use the same age (or age range), gender, race, and other characteristics to select cases and controls. Information on this topic is usually found in a paper's section on the description of the study population.

Question 7. Case and control definitions

For this question, reviewers looked for descriptions of the validity of case and control definitions and processes or tools used to identify study participants as such. Was a specific description of "case" and "control" provided? Is there a discussion of the validity of the case and control definitions and the processes or tools used to identify study participants as such? They determined if the tools or methods were accurate, reliable, and objective. For example, cases might be identified as "adult patients admitted to a VA hospital from January 1, 2000 to December 31, 2009, with an ICD-9 discharge diagnosis code of acute myocardial infarction and at least one of the two confirmatory findings in their medical records: at least 2 mm of ST elevation changes in two or more ECG leads and an elevated troponin level." Investigators might also use ICD-9 or CPT codes to identify patients. All cases should be identified using the same methods. Unless the distinction between cases and controls is accurate and reliable, investigators cannot use study results to draw valid conclusions.

Question 8. Random selection of study participants

If a case-control study did not use 100 percent of eligible cases and/or controls (e.g., not all disease-free participants were included as controls), did the authors indicate that random sampling was used to select controls? When it is possible to identify the source population fairly explicitly (e.g., in a nested case-control study, or in a registry-based study), then random sampling of controls is preferred. When investigators used consecutive sampling, which is frequently done for cases in prospective studies, then study participants are not considered randomly selected. In this case, the reviewers would answer "no" to Question 8. However, this would not be considered a fatal flaw.

If investigators included all eligible cases and controls as study participants, then reviewers marked "NA" in the tool. If 100 percent of cases were included (e.g., NA for cases) but only 50 percent of eligible controls, then the response would be "yes" if the controls were randomly selected, and "no" if they were not. If this cannot be determined, the appropriate response is "CD."

Question 9. Concurrent controls

A concurrent control is a control selected at the time another person became a case, usually on the same day. This means that one or more controls are recruited or selected from the population without the outcome of interest at the time a case is diagnosed. Investigators can use this method in both prospective case-control studies and retrospective case-control studies. For example, in a retrospective study of adenocarcinoma of the colon using data from hospital records, if hospital records indicate that Person A was diagnosed with adenocarcinoma of the colon on June 22, 2002, then investigators would select one or more controls from the population of patients without adenocarcinoma of the colon on that same day. This assumes they conducted the study retrospectively, using data from hospital records. The investigators could have also conducted this study using patient records from a cohort study, in which case it would be a nested case-control study.

Investigators can use concurrent controls in the presence or absence of matching and vice versa. A study that uses matching does not necessarily mean that concurrent controls were used.

Question 10. Exposure assessed prior to outcome measurement

Investigators first determine case or control status (based on presence or absence of outcome of interest), and then assess exposure history of the case or control; therefore, reviewers ascertained that the exposure preceded the outcome. For example, if the investigators used tissue samples to determine exposure, did they collect them from patients prior to their diagnosis? If hospital records were used, did investigators verify that the date a patient was exposed (e.g., received medication for atherosclerosis) occurred prior to the date they became a case (e.g., was diagnosed with type 2 diabetes)? For an association between an exposure and an outcome to be considered causal, the exposure must have occurred prior to the outcome.

Question 11. Exposure measures and assessment

Were the exposure measures defined in detail? Were the tools or methods used to measure exposure accurate and reliable–for example, have they been validated or are they objective? This is important, as it influences confidence in the reported exposures. Equally important is whether the exposures were assessed in the same manner within groups and between groups. This question pertains to bias resulting from exposure misclassification (i.e., exposure ascertainment).

For example, a retrospective self-report of dietary salt intake is not as valid and reliable as prospectively using a standardized dietary log plus testing participants' urine for sodium content because participants' retrospective recall of dietary salt intake may be inaccurate and result in misclassification of exposure status. Similarly, BP results from practices that use an established protocol for measuring BP would be considered more valid and reliable than results from practices that did not use standard protocols. A protocol may include using trained BP assessors, standardized equipment (e.g., the same BP device which has been tested and calibrated), and a standardized procedure (e.g., patient is seated for 5 minutes with feet flat on the floor, BP is taken twice in each arm, and all four measurements are averaged).

Question 12. Blinding of exposure assessors

Blinding or masking here means that the person(s) assessing exposure status did not know whether a participant was a case or a control. To answer this question, reviewers examined articles for evidence that the exposure assessor(s) was masked to the case or control status of the research participants. An exposure assessor, for example, may examine medical records to determine the exposures that occurred before the outcome in cases and controls. Sometimes the person determining case or control status is the same person assessing exposure. In this case, the exposure assessor would most likely not be blinded to case or control status. A reviewer would note such a finding in the comments section of the assessment tool.

One way to ensure good blinding of exposure assessment is to have a separate committee, whose members have no information about the study participants' status as cases or controls, review research participants' records. To help answer the question above, reviewers determined whether it was likely that the exposure assessor knew a study participant's case or control status. If it was unlikely, then blinding was considered adequate. Exposure assessors who used medical records should not have been directly involved in the study participants' care, since they probably would have known about their patients' conditions. If the medical records contained information on the patient's condition that identified him/her as a case (which is likely), that information would have had to be removed before the exposure assessors reviewed the records.

If blinding was not possible, which sometimes happens, the reviewers marked "NA" in the assessment tool and explained the potential for bias.

Question 13. Statistical analysis

Were key potential confounding variables measured and adjusted for, such as by statistical adjustment for baseline differences? Investigators often use logistic regression or other regression methods to account for the influence of variables not of interest.

This is a key issue in case-control studies; statistical analyses need to control for potential confounders, in contrast to RCTs in which the randomization process controls for potential confounders. In the analysis, investigators need to control for all key factors that may be associated with both the exposure of interest and the outcome and are not of interest to the research question.

A study of the relationship between smoking and CVD events illustrates this point. Such a study needs to control for age, gender, and body weight; all are associated with smoking and CVD events. Well-done case-control studies control for multiple potential confounders.

Matching is a technique used to improve study efficiency and control for known confounders. For example, in the study of smoking and CVD events, an investigator might identify cases that have had a heart attack or stroke and then select controls of similar age, gender, and body weight to the cases. For case-control studies, it is important that if matching was performed during the selection or recruitment process, the variables used as matching criteria (e.g., age, gender, race) should be controlled for in the analysis.
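As a sketch of how a matched analysis might respect the matching in the analysis stage, the example below fits a conditional logistic regression with made-up data, stratifying on matched pair. The data, variable names, and use of statsmodels' ConditionalLogit are assumptions for illustration, not the method of any particular study.

```python
# A minimal sketch of a conditional logistic regression for a matched case-control
# analysis; conditioning on each matched set accounts for the matching variables.
import numpy as np
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(1)
n_pairs = 50
strata = np.repeat(np.arange(n_pairs), 2)     # matched-pair identifier
case = np.tile([1, 0], n_pairs)               # 1 = case, 0 = matched control
# Hypothetical exposure (smoking), simulated as more common among cases.
smoking = rng.binomial(1, np.where(case == 1, 0.6, 0.4))

result = ConditionalLogit(case, smoking.reshape(-1, 1), groups=strata).fit()
print("Matched odds ratio for smoking:", float(np.exp(result.params)[0]))
```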

General Guidance for Determining the Overall Quality Rating of Case-Control Studies

NHLBI designed the questions in the assessment tool to help reviewers focus on the key concepts for evaluating a study's internal validity, not to use as a list from which to add up items to judge a study's quality.

Internal validity for case-control studies is the extent to which the associations between disease and exposure reported in the study can truly be attributed to the exposure being evaluated rather than to flaws in the design or conduct of the study. In other words, what is the ability of the study to draw associative conclusions about the effects of the exposures on outcomes? Any such flaws can increase the risk of bias.

In critically appraising a study, the following factors need to be considered: risk of potential for selection bias, information bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues addressed in the questions above. High risk of bias translates to a poor quality rating; low risk of bias translates to a good quality rating. Again, the greater the risk of bias, the lower the quality rating of the study.

In addition, the more attention in the study design to issues that can help determine whether there is a causal relationship between the outcome and the exposure, the higher the quality of the study. These include exposures occurring prior to outcomes, evaluation of a dose-response gradient, accuracy of measurement of both exposure and outcome, sufficient timeframe to see an effect, and appropriate control for confounding–all concepts reflected in the tool.

If a study has a "fatal flaw," then risk of bias is significant; therefore, the study is deemed to be of poor quality. An example of a fatal flaw in case-control studies is a lack of a consistent standard process used to identify cases and controls.

Generally, when reviewers evaluated a study, they did not see a "fatal flaw," but instead found some risk of bias. By focusing on the concepts underlying the questions in the quality assessment tool, reviewers examined the potential for bias in the study. For any box checked "no," reviewers asked, "What is the potential risk of bias resulting from this flaw in study design or execution?" That is, did this factor lead to doubt about the results reported in the study or the ability of the study to accurately assess an association between exposure and outcome?

By examining questions in the assessment tool, reviewers were best able to assess the potential for bias in a study. Specific rules were not useful, as each study had specific nuances. In addition, being familiar with the key concepts helped reviewers assess the studies. Examples of studies rated good, fair, and poor were useful, yet each study had to be assessed on its own.

Quality Assessment Tool for Before-After (Pre-Post) Studies With No Control Group

Guidance for Assessing the Quality of Before-After (Pre-Post) Studies With No Control Group

Question 1. Study question

Question 2. Eligibility criteria and study population

Did the authors describe the eligibility criteria applied to the individuals from whom the study participants were selected or recruited? In other words, if the investigators were to conduct this study again, would they know whom to recruit, from where, and from what time period?

Here is a sample description of a study population: men over age 40 with type 2 diabetes, who began seeking medical care at Phoenix Good Samaritan Hospital, between January 1, 2005 and December 31, 2007. The population is clearly described as: (1) who (men over age 40 with type 2 diabetes); (2) where (Phoenix Good Samaritan Hospital); and (3) when (between January 1, 2005 and December 31, 2007). Another sample description is women who were in the nursing profession, who were ages 34 to 59 in 1995, had no known CHD, stroke, cancer, hypercholesterolemia, or diabetes, and were recruited from the 11 most populous States, with contact information obtained from State nursing boards.

To assess this question, reviewers examined prior papers on study methods (listed in reference list) when necessary.

Question 3. Study participants representative of clinical populations of interest

The participants in the study should be generally representative of the population in which the intervention will be broadly applied. Studies on small demographic subgroups may raise concerns about how the intervention will affect broader populations of interest. For example, interventions that focus on very young or very old individuals may affect middle-aged adults differently. Similarly, researchers may not be able to extrapolate study results from patients with severe chronic diseases to healthy populations.

Question 4. All eligible participants enrolled

To further explore this question, reviewers may need to ask: Did the investigators develop the I/E criteria prior to recruiting or selecting study participants? Were the same underlying I/E criteria used for all research participants? Were all subjects who met the I/E criteria enrolled in the study?

Question 5. Sample size

Did the authors present their reasons for selecting or recruiting the number of individuals included or analyzed? Did they note or discuss the statistical power of the study? This question addresses whether there was a sufficient sample size to detect an association, if one did exist.

An article's methods section may provide information on the sample size needed to detect a hypothesized difference in outcomes and a discussion on statistical power (such as, the study had 85 percent power to detect a 20 percent increase in the rate of an outcome of interest, with a 2-sided alpha of 0.05). Sometimes estimates of variance and/or estimates of effect size are given, instead of sample size calculations. In any case, if the reviewers determined that the power was sufficient to detect the effects of interest, then they would answer "yes" to Question 5.

Question 6. Intervention clearly described

Another pertinent question regarding interventions is: Was the intervention clearly defined in detail in the study? Did the authors indicate that the intervention was consistently applied to the subjects? Did the research participants have a high level of adherence to the requirements of the intervention? For example, if the investigators assigned a group to 10 mg/day of Drug A, did most participants in this group take the specific dosage of Drug A? Or did a large percentage of participants end up not taking the specific dose of Drug A indicated in the study protocol?

Reviewers ascertained that changes in study outcomes could be attributed to study interventions. If participants received interventions that were not part of the study protocol and could affect the outcomes being assessed, the results could be biased.

Question 7. Outcome measures clearly described, valid, and reliable

Were the outcomes defined in detail? Were the tools or methods for measuring outcomes accurate and reliable–for example, have they been validated or are they objective? This question is important because the answer influences confidence in the validity of study results.

An example of an outcome measure that is objective, accurate, and reliable is death–the outcome measured with more accuracy than any other. But even with a measure as objective as death, differences can exist in the accuracy and reliability of how investigators assessed death. For example, did they base it on an autopsy report, death certificate, death registry, or report from a family member? Another example of a valid study is one whose objective is to determine if dietary fat intake affects blood cholesterol level (cholesterol level being the outcome) and in which the cholesterol level is measured from fasting blood samples that are all sent to the same laboratory. These examples would get a "yes."

An example of a "no" would be self-report by subjects that they had a heart attack, or self-report of how much they weigh (if body weight is the outcome of interest).

Question 8. Blinding of outcome assessors

Blinding or masking means that the outcome assessors did not know whether the participants received the intervention or were exposed to the factor under study. To answer the question above, the reviewers examined articles for evidence that the person(s) assessing the outcome(s) was masked to the participants' intervention or exposure status. An outcome assessor, for example, may examine medical records to determine the outcomes that occurred in the exposed and comparison groups. Sometimes the person applying the intervention or measuring the exposure is the same person conducting the outcome assessment. In this case, the outcome assessor would not likely be blinded to the intervention or exposure status. A reviewer would note such a finding in the comments section of the assessment tool.

In assessing this criterion, the reviewers determined whether it was likely that the person(s) conducting the outcome assessment knew the exposure status of the study participants. If not, then blinding was adequate. An example of adequate blinding of the outcome assessors is to create a separate committee whose members were not involved in the care of the patient and had no information about the study participants' exposure status. Using a study protocol, committee members would review copies of participants' medical records, which would be stripped of any potential exposure information or personally identifiable information, for prespecified outcomes.

Question 9. Followup rate

Higher overall followup rates are always preferable to lower followup rates, although higher rates are expected in shorter studies, and lower overall followup rates are often seen in longer studies. Usually an acceptable overall followup rate is considered 80 percent or more of participants whose interventions or exposures were measured at baseline. However, this is a general guideline.

To account for those lost to followup in the analysis, investigators may have imputed values of the outcome for those participants or used other methods. For example, they may carry forward the baseline value or the last observed value of the outcome measure and use these as imputed values for the final outcome measure for research participants lost to followup.
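The following is a minimal sketch, with hypothetical repeated measures, of a last-observation-carried-forward (LOCF) approach using pandas; the participant data and column names are invented for illustration.

```python
# A minimal sketch of last-observation-carried-forward (LOCF) imputation:
# ffill carries the last observed value forward within each participant.
import numpy as np
import pandas as pd

visits = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2],
    "visit":       ["baseline", "month6", "month12"] * 2,
    "weight_kg":   [92.0, 88.0, np.nan,           # participant 1 missed the final visit
                    81.0, np.nan, np.nan],        # participant 2 lost after baseline
})

visits["weight_locf"] = visits.groupby("participant")["weight_kg"].ffill()
print(visits)
```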

Question 10. Statistical analysis

Were formal statistical tests used to assess the significance of the changes in the outcome measures between the before and after time periods? The reported study results should present values for statistical tests, such as p values, to document the statistical significance (or lack thereof) for the changes in the outcome measures found in the study.
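As an illustration of the kind of formal test that might be reported, the sketch below runs a paired t-test on made-up before and after values using scipy; a nonparametric alternative (for example, the Wilcoxon signed-rank test) might be reported instead when normality is doubtful.

```python
# A minimal sketch of a formal statistical test for change in a pre-post study
# with no control group: a paired t-test on hypothetical before/after values.
from scipy import stats

before = [148, 152, 139, 160, 145, 150, 142, 155]   # hypothetical baseline systolic BP
after  = [140, 147, 138, 151, 140, 146, 139, 148]   # hypothetical values after intervention

t_stat, p_value = stats.ttest_rel(before, after)
print(f"Paired t = {t_stat:.2f}, p = {p_value:.4f}")
```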

Question 11. Multiple outcome measures

Were the outcome measures for each person measured more than once during the course of the before and after study periods? Multiple measurements with the same result increase confidence that the outcomes were accurately measured.

Question 12. Group-level interventions and individual-level outcome efforts

Group-level interventions are usually not relevant for clinical interventions such as bariatric surgery, in which the interventions are applied at the individual patient level. In those cases, the questions were coded as "NA" in the assessment tool.

General Guidance for Determining the Overall Quality Rating of Before-After Studies

The questions in the quality assessment tool were designed to help reviewers focus on the key concepts for evaluating the internal validity of a study. They are not intended to create a list from which to add up items to judge a study's quality.

Internal validity is the extent to which the outcome results reported in the study can truly be attributed to the intervention or exposure being evaluated, and not to biases, measurement errors, or other confounding factors that may result from flaws in the design or conduct of the study. In other words, what is the ability of the study to draw associative conclusions about the effects of the interventions or exposures on outcomes?

Critical appraisal of a study involves considering the potential for selection bias, information bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other). Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues addressed in the questions above. High risk of bias translates to a rating of poor quality; low risk of bias translates to a rating of good quality. Again, the greater the risk of bias, the lower the quality rating of the study.

In addition, the more attention in the study design to issues that can help determine if there is a causal relationship between the exposure and outcome, the higher quality the study. These issues include exposures occurring prior to outcomes, evaluation of a dose-response gradient, accuracy of measurement of both exposure and outcome, and sufficient timeframe to see an effect.

Generally, when reviewers evaluate a study, they will not see a "fatal flaw," but instead will find some risk of bias. By focusing on the concepts underlying the questions in the quality assessment tool, reviewers should ask themselves about the potential for bias in the study they are critically appraising. For any box checked "no" reviewers should ask, "What is the potential risk of bias resulting from this flaw in study design or execution?" That is, does this factor lead to doubt about the results reported in the study or doubt about the ability of the study to accurately assess an association between the intervention or exposure and the outcome?

The best approach is to think about the questions in the assessment tool and how each one reveals something about the potential for bias in a study. Specific rules are not useful, as each study has specific nuances. In addition, being familiar with the key concepts will help reviewers be more comfortable with critical appraisal. Examples of studies rated good, fair, and poor are useful, but each study must be assessed on its own.


Health Research Reporting Guidelines, Study Execution Manuals, Critical Appraisal, Risk of Bias, & Non-reporting Biases

  • Health Research Reporting Guidelines
  • Study Execution Manuals

Why Critical Appraisal or Study Execution Assessment?

  • Systematic Review Critical Appraisal Tools
  • Scoping & Other Review Types Critical Appraisal
  • Randomized Clinical Trial Critical Appraisal Tools
  • Quasi-Experimental (Non-Randomised) Trials/Studies Critical Appraisal Tools
  • Observational Studies Critical Appraisal Tools
  • Diagnostic Studies Critical Appraisal Tools
  • Prognosis/Prediction Critical Appraisal Tools
  • Economic Evaluations Critical Appraisal Tools
  • Qualitative Studies Critical Appraisal Tools
  • Case Reports & Case Series Critical Appraisal Tools
  • Statistical Methodology Assessment
  • All Critical Appraisal Tools in Alphabetical Order

  • Risk of Bias
  • Reporting Biases
  • Quality of Evidence
  • HSLS Systematic Review LibGuide

For additional information, contact:

Helena VonVille, MLS, MPH

  • HSLS Pitt Public Health Liaison 
  • HSLS Research & Instruction Librarian

What is critical appraisal?

Definitions of critical appraisal revolve around a similar set of criteria:

What is Critical Appraisal (Bandolier)

  • "Critical appraisal is the process of carefully and systematically examining research to judge its trustworthiness, and its value and relevance in a particular context."

Glossary definition (Critical Appraisal Skills Programme (CASP))

  • "Critical Appraisal is the process of assessing and interpreting evidence, by systematically considering its  validity , results and relevance to your own context."

Critical Appraisal Tools (Centre for Evidence-Based Medicine (CEBM))

  • Does this study address a clearly focused question?
  • Did the study use valid methods to address this question?
  • Are the valid results of this study important?
  • Are these valid, important results applicable to my patient or population?

While critical appraisal can highlight bias in a study, the current version of the Cochrane Handbook points out:

"Methodological quality refers to critical appraisal of a study or systematic review and the extent to which study authors conducted and reported their research to the highest possible standard. Bias refers to systematic deviation of results or inferences from the truth. These deviations can occur as a result of flaws in design, conduct, analysis, and/or reporting. It is not always possible to know whether an estimate is biased even if there is a flaw in the study; further, it is difficult to quantify and at times to predict the direction of bias. For these reasons, reviewers refer to ‘risk of bias’ (Chapter 8)." Chapter V: Overviews of Reviews.

A separate page has been created for Risk of Bias .

Why critical appraisal?

Critical appraisal and, more specifically, critical appraisal tools provide us with a mechanism to evaluate the research methodology of a study with a critical, objective, and systematic lens. This appraisal is essential when evaluating a study for a systematic review, for determining new guidelines for patient care, or for choosing appropriate interventions. 

Uses for Critical Appraisal Tools

CA tools can be used in multiple ways and in different settings.

  • Use the appropriate critical appraisal tool (and reporting guideline) to ensure you engage in good study conduct and are prepared to practice clear and transparent reporting.
  • Use the appropriate critical appraisal tool (and reporting guideline) to ensure the author(s) engaged in good study conduct as well as clear and transparent reporting.
  • Use the appropriate critical appraisal tool (and reporting guideline) as a framework for evaluating manuscripts and providing feedback in a clear and objective manner.
  • Critical appraisal of included studies is a necessity when conducting a systematic review, even when the studies are non-randomized or observational. 

Checklist for Systematic Reviews and Research Syntheses

Produced by: Joanna Briggs Institute
Part of: The JBI Critical Appraisal Tools collection

  • The checklist is a Word document.
  • See  Appendix 3.2 for a discussion of JBI Qualitative critical appraisal criteria

AMSTAR 2: A MeaSurement Tool to Assess systematic Reviews  

There were 2 goals in the development of AMSTAR:

  • To create valid, reliable and useable instruments that would help users differentiate between systematic reviews, focusing on their methodological quality and expert consensus.
  • To facilitate the development of high-quality reviews.

Critical Appraisal Tools

Produced by: Centre for Evidence Based Medicine, University of Oxford, UK
Part of: The CEBM Critical Appraisal Tools collection

  • Systematic Reviews Critical Appraisal Sheet
  • Chapter 26 of the Cochrane Handbook describes an IPD review
  • The appraisal worksheets are in PDF format.
  • The appraisal worksheets are in English as well as Chinese, German, Lithuanian, Persian, and Spanish; languages other than English can be found on the home page.

CASP Checklists

Produced by:   Critical Appraisal Skills Programme, UK

CASP  Systematic Review  Checklist

  • Print & Fill

About:  Heise, T.L., Seidler, A., Girbig, M. et al. CAT HPPR: a critical appraisal tool to assess the quality of systematic, rapid, and scoping reviews investigating interventions in health promotion and prevention. BMC Med Res Methodol 22, 334 (2022). https://doi.org/10.1186/s12874-022-01821-4

Supplementary information:

  • File 1 Includes the checklist and user manual
  • File 2 includes methodological supporting documentation

Randomized Controlled Trials

  • See  Appendix 4.2 for a discussion of the JBI appraisal criteria for randomized controlled trials

Study Quality Assessment Tools

Produced by:  US National Heart, Lung, and Blood Institute (NHLBI)

This site has 6 assessment tools covering controlled interventions, systematic reviews and meta-analyses, observational cohort and cross-sectional studies (one combined tool), case control studies, pre-post studies with no control group, and case series.

CASP has 8 critical appraisal tools for SRs, RCTs, cohort studies, case control studies, economic evaluations, diagnostic studies, qualitative studies, and clinical prediction. Each item in the individual checklists provides a series of questions.

Randomised Controlled Trials (RCT) Critical Appraisal Worksheet

Checklist for quasi-experimental studies (non-randomized experimental studies).

This site has an assessment tool for pre-post studies (no control group).

Methodological index for non-randomized studies (MINORS): development and validation of a new instrument (requires U Pitt authentication)  

  • "Background: Because of specific methodological difficulties in conducting randomized trials, surgical research remains dependent predominantly on observational or non‐randomized studies. Few validated instruments are available to determine the methodological quality of such studies either from the reader's perspective or for the purpose of meta‐analysis. The aim of the present study was to develop and validate such an instrument."

Observational studies

  • Checklist for Analytical Cross Sectional Studies
  • Checklist for Case Control Studies
  • Checklist for Cohort Studies
  • Checklist for Prevalence Studies

  • Each checklist is a Word document.
  • This site has assessment tools for: observational cohort studies, cross-sectional studies, and case control studies.

Produced by:  Critical Appraisal Skills Programme, UK

  • CASP has a critical appraisal tool for cohort studies and case control studies. Each item in the individual checklists provides a series of questions.

Newcastle-Ottawa Scale: Case Control Studies & Cohort Studies

Produced by:  University of Newcastle, Australia and University of Ottawa, Canada

  • Instructions for both case control and cohort studies
  • Instruments for both case control and cohort studies

Diagnostic Test Accuracy Studies

CASP has a critical appraisal tool for diagnostic studies. Each item in the individual checklists provides a series of questions.

Diagnostics Critical Appraisal Sheet

CHARMS: critical appraisal and data extraction for systematic reviews of prediction modelling studies.

Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, Reitsma JB, Collins GS. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014 Oct 14;11(10):e1001744. doi: 10.1371/journal.pmed.1001744 . PMID: 25314315 ; PMCID: PMC4196729 .

CHARMS-PF: Checklist for critical appraisal and data extraction for systematic reviews of prognostic factors studies

Found in:  Riley RD, Moons KGM, Snell KIE, Ensor J, Hooft L, Altman DG, Hayden J, Collins GS, Debray TPA. A guide to systematic review and meta-analysis of prognostic factor studies. BMJ. 2019 Jan 30;364:k4597. doi: 10.1136/bmj.k4597 . PMID: 30700442 .

CASP has 8 critical appraisal tools including one for clinical prediction. Each item in the individual checklists provides a series of questions.

Prognosis Critical Appraisal Sheet

Checklist for economic evaluations.

CASP has a critical appraisal tool for economic evaluations. Each item in the individual checklists provides a series of questions.

Checklist for Qualitative Research

  • See Appendix 3.2 for a discussion of the JBI Qualitative Critical Appraisal Checklist.
  • The checklist is in PDF format
  • CASP has a critical appraisal tool for qualitative studies. Each item in the individual checklists provides a series of questions.

Critical Appraisal of Qualitative Studies Sheet 

Evaluation tool for mixed methods studies.

Produced by:  A. Long, U of Leeds

The ‘mixed method’ evaluation tool was developed from the evaluation tools for ‘quantitative’ and ‘qualitative’ studies, themselves created within the context of a project exploring the feasibility of undertaking systematic reviews of research literature on effectiveness and outcomes in social care.

Checklist for Case Reports  &  Checklist for Case Series

  • The checklists are Word documents.

CHAMP (Checklist for statistical Assessment of Medical Papers)  (2021)

About:  "While CHAMP is primarily aimed at editors and peer reviewers during the statistical assessment of a medical paper, we believe it will serve as a useful reference to improve authors' and readers' practice in their use of statistics in medical research."

  • CHecklist for statistical Assessment of Medical Papers: the CHAMP statement
  • A CHecklist for statistical Assessment of Medical Papers (the CHAMP statement): explanation and elaboration

A tool to assess the quality of a meta-analysis (2013) (Not open access)

Higgins JP, Lane PW, Anagnostelis B, Anzures-Cabrera J, Baker NF, Cappelleri JC, Haughie S, Hollis S, Lewis SC, Moneuse P, Whitehead A. A tool to assess the quality of a meta-analysis. Res Synth Methods. 2013 Dec;4(4):351-66. doi: 10.1002/jrsm.1092. Epub 2013 Oct 18. PMID: 26053948.

CHAMP (Checklist for statistical Assessment of Medical Papers) (2021)

"While CHAMP is primarily aimed at editors and peer reviewers during the statistical assessment of a medical paper, we believe it will serve as a useful reference to improve authors' and readers' practice in their use of statistics in medical research."

Produced by:  Joanna Briggs Institute

The link above points to all of the Critical Appraisal Tools from JBI. All are in the format of a Word document.

Produced by:  Centre for Evidence Based Medicine, Oxford, UK

The appraisal worksheets are in English as well as Chinese, German, Lithuanian, Persian, and Spanish; languages can be found on the home page.

  • Systematic Reviews  Critical Appraisal Sheet
  • Diagnostics  Critical Appraisal Sheet
  • Prognosis  Critical Appraisal Sheet
  • Randomised Controlled Trials  (RCT) Critical Appraisal Sheet
  • Critical Appraisal of Qualitative Studies  Sheet
  • Chapter 26  of the Cochrane Handbook describes an IPD review.

Downs & Black: The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions

Items from the Downs & Black checklist can be found in this article. 

Newcastle-Ottawa Scale

Produced by:   University of Newcastle, Australia and University of Ottawa, Canada

NOS was developed to assess the quality of nonrandomised studies with its design, content and ease of use directed to the task of incorporating the quality assessments in the interpretation of meta-analytic results. A 'star system' has been developed in which a study is judged on three broad perspectives: the selection of the study groups; the comparability of the groups; and the ascertainment of either the exposure or outcome of interest for case-control or cohort studies respectively.

Additional Links

  • The link goes to a PDF of the article and the QI checklist which can be found in the supplement.
  • The aim of this WIKI is to enable collaborative work for developing a Mixed Methods Appraisal Tool (MMAT).
  • The MMAT is intended to be used as a checklist for concomitantly appraising and/or describing studies included in systematic mixed studies reviews (reviews including original qualitative, quantitative and mixed methods studies).


  • The MetaQAT is a meta-tool for appraising public health evidence that orients users to the appropriate application of several critical appraisal tools and places them within a larger framework to guide their use. It is one of few critical appraisal tools designed specifically for public health evidence.

The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: a systematic review

  • This study, from 2015, evaluated 21 assessment (critical appraisal) tools and provides recommendations for which tool to use, depending on the study methodology.
  • Note that several of the instruments evaluated have undergone some modifications since 2015.


Critical Appraisal: Critical appraisal full list of checklists and tools

  • What is critical appraisal?
  • Where to start
  • Education and childhood studies
  • Occupational Therapy
  • Physiotherapy
  • Interpreting statistics
  • Further reading and resources

Which checklist or tool should I use?

There are hundreds of critical appraisal checklists and tools you can choose from, which can be very overwhelming. There are so many because there are many kinds of research, knowledge can be communicated in a wide range of ways, and whether something is appropriate to meet your information needs depends on your specific context. 

We have asked for recommendations from lecturers in different academic departments, to give you an idea about which checklists and tools may be the most relevant for you. Please hover over the drop-down menu at the top of the page, underneath 'Critical appraisal checklists and tools' to view the individual subject pages.

Below are lists of as many critical appraisal tools and checklists as we have been able to find. These are split into health sciences and social sciences because the two areas tend to take different approaches to evaluation, for various reasons!

To see a selection of checklists more suitable for your subject, hover over the top tab of this page.  

Critical appraisal checklists and tools for Health Sciences

  • AACODS  Checklist for appraising grey literature
  • AMSTAR 2  critical appraisal tool for systematic reviews that include randomised and non-randomised studies of healthcare interventions or both
  • AOTA Critically Appraised Papers  American Occupational Therapy Association 
  • Bandolier - "Evidence based thinking about healthcare"
  • BestBETS critical appraisal worksheet
  • BMJ critical appraisal checklists
  • CASP  Critical Appraisal Skills Programme includes checklists for case control studies, clinical prediction rule, cohort studies, diagnostic studies, economic evaluation, qualitative studies, RCTs and systematic reviews
  • Centre for Evidence Based Medicine (Oxford) Critical Appraisal Tools  CEBM's worksheets to assess systematic reviews, diagnostic, prognosis, and RCTs
  • Centre for Evidence Based Medicine (Oxford) CATmaker and EBM calculator  CEBM's computer assisted critical appraisal tool CATmaker 
  • CEBM critical appraisal sheets (Centre for Evidence Based Medicine)
  • Cochrane Assessing Risk of Bias in a Randomized Trial
  • Critical appraisal: a checklist from Students for Best Evidence S4BE (student network with simple explanations of difficult concepts)
  • Critical appraisal and statistical skills (Knowledge for Healthcare)
  • Critical appraisal of clinical trials  from Testing Treatments International
  • Critical appraisal of clinical trials (Medicines Learning Portal)
  • Critical appraisal of quantitative research  
  • Critical appraisal of a quantitative paper  from Teesside University
  • Critical appraisal of a qualitative paper  from Teesside University
  • Critical appraisal tools  from the Centre for Evidence-Based Medicine
  • Critical Evaluation of Research Papers – Qualitative Studies from Teesside University
  • Critical Evaluation of Research Papers – RCTs/Experimental Studies from Teesside University
  • Evaluation tool for mixed methods study designs 
  • GRADE - The Grading of Recommendations Assessment, Development and Evaluation working group  guidelines and publications for grading the quality of evidence in healthcare research and policy
  • HCPRDU Evaluation Tool for Mixed Methods Studies  - University of Salford Health Care Practice R&D Unit 
  • HCPRDU Evaluation Tool for Qualitative Studies  - University of Salford Health Care Practice R&D Unit 
  • HCPRDU Evaluation Tool for Quantitative Studies  - University of Salford Health Care Practice R&D Unit 
  • JBI Joanna Briggs Institute critical appraisal tools  checklists for Analytical cross sectional studies, case control studies, case reports, case series, cohort studies, diagnostic test accuracy, economic evaluations, prevalence studies, qualitative research, quasi-experimental (non-randomised) studies, RCTs, systematic reviews and for text and opinion  
  • Knowledge Translation Program  - Toronto based KTP critical appraisal worksheets for systematic reviews, prognosis, diagnosis, harm and therapy
  • MMAT Mixed Methods Appraisal Tool
  • McMaster University Evidence Based Practice Research Group quantitative and qualitative review forms
  • NHLBI (National Heart, Lung, and Blood Institute) study quality assessment tools for case control studies, case series, controlled intervention, observational cohort and cross sectional studies, before-after (pre-post) studies with no control group, systematic reviews and meta analyses 
  • NICE Guidelines, The Manual Appendix H. pp9-24
  • QUADAS-2  tool for evaluating risk of bias and applicability in primary diagnostic accuracy studies, from the University of Bristol
  • PEDro  PEDro (Physiotherapy Evidence Database) Scale - appraisal resources including a tutorial and appraisal tool
  • RoB 2   A revised Cochrane risk-of-bias tool for randomized trials
  • ROBINS-I Risk Of Bias In Non-Randomized Studies of Interventions 
  • ROBIS  Risk of Bias in Systematic Reviews
  • ROB-ME   A tool for assessing Risk Of Bias due to Missing Evidence in a synthesis
  • SIGN  - Critical appraisal notes and checklists for case control studies, cohort studies, diagnostic studies, economic studies, RCTs, meta-analyses and systematic reviews
  • Strength of Recommendation Taxonomy  - the SORT scale for quality, quantity and consistency of evidence in individual studies or bodies of evidence
  • STROBE (Strengthening the Reporting of Observational studies in Epidemiology)  for cohort, case-control, and cross-sectional studies (combined),  cohort, case-control, cross-sectional studies and conference abstracts
  • SURE Case Controlled Studies Critical Appraisal checklist
  • SURE Case Series Studies Critical Appraisal checklist
  • SURE Cohort Studies Critical Appraisal checklist
  • SURE Cross-sectional Studies Critical Appraisal checklist
  • SURE Experimental Studies Critical Appraisal checklist
  • SURE Qualitative Studies Critical Appraisal checklist
  • SURE Systematic Review Critical Appraisal checklist

Critical appraisal checklists and tools for Social Sciences

  • AACODS   Checklist for appraising grey literature
  • CRAAP test to evaluate sources of information 
  • Critical Appraisal of an Article on an Educational Intervention  (variable study design) from the University of Glasgow
  • Educational Interventions Critical Appraisal worksheet  from BestBETs
  • PROMPT  from Open University
  • PROVEN  - tool to evaluate any source of information 

SIFT (The Four Moves)  to help students distinguish between truth and fake news 

Some Guidelines for the Critical Reviewing of Conceptual Papers

  • Open access
  • Published: 24 May 2024

Integration of case-based learning and three-dimensional printing for tetralogy of fallot instruction in clinical medical undergraduates: a randomized controlled trial

  • Jian Zhao 1,
  • Xin Gong 1,
  • Jian Ding 1,
  • Kepin Xiong 2,
  • Kangle Zhuang 3,
  • Rui Huang 1,
  • Shu Li 4 &
  • Huachun Miao 1

BMC Medical Education, volume 24, Article number: 571 (2024)


Case-based learning (CBL) methods have gained prominence in medical education, proving especially effective for preclinical training in undergraduate medical education. Tetralogy of Fallot (TOF) is a congenital heart disease characterized by four malformations, presenting a challenge in medical education due to the complexity of its anatomical pathology. Three-dimensional printing (3DP), generating physical replicas from data, offers a valuable tool for illustrating intricate anatomical structures and spatial relationships in the classroom. This study explores the integration of 3DP with CBL teaching for clinical medical undergraduates.

Sixty senior clinical medical undergraduates were randomly assigned to the CBL group and the CBL-3DP group. Computed tomography imaging data from a typical TOF case were exported, processed, and utilized to create four TOF models with a color 3D printer. The CBL group employed CBL teaching methods, while the CBL-3DP group combined CBL with 3D-printed models. Post-class exams and questionnaires assessed the teaching effectiveness of both groups.

The CBL-3DP group exhibited improved performance in post-class examinations, particularly in pathological anatomy and TOF imaging data analysis ( P  < 0.05). Questionnaire responses from the CBL-3DP group indicated enhanced satisfaction with teaching mode, promotion of diagnostic skills, bolstering of self-assurance in managing TOF cases, and cultivation of critical thinking and clinical reasoning abilities ( P  < 0.05). These findings underscore the potential of 3D printed models to augment the effectiveness of CBL, aiding students in mastering instructional content and bolstering their interest and self-confidence in learning.

The fusion of CBL with 3D printing models is feasible and effective in TOF instruction to clinical medical undergraduates, and worthy of popularization and application in medical education, especially for courses involving intricate anatomical components.


Tetralogy of Fallot (TOF) is the most common cyanotic congenital heart disease (CHD) [ 1 ]. Characterized by four structural anomalies: ventricular septal defect (VSD), pulmonary stenosis (PS), right ventricular hypertrophy (RVH), and overriding aorta (OA), TOF is a focal point and challenge in medical education. Understanding anatomical spatial structures is pivotal for learning and mastering TOF [ 2 ]. Given the constraints of course duration, medical school educators aim to provide students with a comprehensive and intuitive understanding of the disease within a limited timeframe [ 3 ].

The case-based learning (CBL) teaching model incorporates a case-based instructional approach that emphasizes typical clinical cases as a guide in student-centered and teacher-facilitated group discussions [ 4 ]. The CBL instructional methods have garnered widespread attention in medical education as they are particularly appropriate for preclinical training in undergraduate medical education [ 5 , 6 ]. The collection of case data, including medical records and examination results, is essential for case construction [ 7 ]. The anatomical and hemodynamic consequences of TOF can be determined using ultrasonography, computed tomography (CT), and magnetic resonance imaging techniques. However, understanding the anatomical structures from imaging data is a slow and challenging psychological reconstruction process for undergraduate medical students [ 8 ]. Three-dimensional (3D) visualization is valuable for depicting anatomical structures [ 9 ]. 3D printing (3DP), which creates physical replicas based on data, facilitates the demonstration of complex anatomical structures and spatial relationships in the classroom [ 10 ].

During the classroom session, 3D-printed models offer a convenient means for hands-on demonstration and communication, similar to facing a patient, enhancing the efficiency and specificity of intra-team communication and discussion [ 11 ]. In this study, we printed TOF models based on case imaging data, integrated them into CBL teaching, and assessed the effectiveness of classroom instruction.

Research participants

The study employed a prospective, randomized controlled design that received approval from the institutional ethics committee. Senior undergraduate students majoring in clinical medicine at Wannan Medical College were recruited based on predefined inclusion criteria; the researchers recruited participants by contacting the class leaders of target classes they had previously taught. These students were in their third year of medical education and were expected to progress to clinical courses in the fourth year, encompassing Internal Medicine, Surgery, Obstetrics, Gynecology, and Pediatrics. Inclusion criteria were: (1) proficient communication and comprehension abilities, (2) consistent attendance without absenteeism or truancy, (3) no failing grades in prior examinations, and (4) the capability to conscientiously fulfill assigned learning tasks. Exclusion criteria were: (1) absence from lectures, (2) failure to complete pre- and post-tests, and (3) inadequate completion of questionnaires. As an incentive for participation, students were given access to the e-book “Localized Anatomy,” authored by the investigators. Participation was voluntary and anonymous, and participants retained the right to withdraw from the study at any time without providing a reason.

The study was conducted between May 1, 2023, and June 30, 2023, from recruitment to completion of data collection. Drawing on a previous analogous investigation that yielded an effect size of 0.95 [10], the sample size was computed with guidance from a statistical consultant, targeting a power of 0.85, an effect size of 0.8, and a significance level of 0.05. A minimum of 30 participants per group was calculated using G*Power software (ver. 3.1.9.7; Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany), resulting in the recruitment of a total of 60 undergraduate students. Each participant was assigned an identification number, with codes placed in boxes; codes drawn from the boxes determined allocation to either the CBL group, which received instruction using the CBL methodology, or the CBL-3DP group, which received instruction integrating CBL with 3D-printed models.
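For readers who want to reproduce this kind of calculation outside G*Power, the sketch below uses Python's statsmodels with the parameters reported above (two independent groups, effect size d = 0.8, alpha = 0.05, power = 0.85); it is an illustration, not part of the authors' workflow.

```python
from statsmodels.stats.power import TTestIndPower

# Sample-size sketch for comparing two independent groups with a t-test,
# using the reported planning parameters.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.85,
                                   ratio=1.0, alternative="two-sided")
print(f"Required participants per group: {n_per_group:.1f}")  # ~29, rounded up to 30
```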

Printing of TOF models

Figure  1 A shows the printing flowchart of the TOF models. A typical TOF case was collected from the Yijishan Hospital of Wannan Medical College. The CT angiography imaging data of the case was exported. Mimics Research 20.0 software (Mimics Innovation Suite version 20, Materialize, Belgium) was used for data processing. The cardiovascular module of the CT-Heart tool was employed to adjust the threshold range, independently obtain the cardiac chambers and vessels, post-process the chambers and vessels to generate a hollow blood pool, and merge it with the myocardial volume to construct a complete heart model. The file was imported into Magics 24.0 software (version 24.0; Materialize, Belgium) for correction using the Shell tool page. After repairs, the model entered the smoothing page, where tools such as triangular surface simplification, local smoothing, refinement and smoothing, subdivision of components, and mesh painting were utilized to achieve varying degrees of smoothness. Finally, optimized data were obtained and exported as stereolithography (STL) files. An experienced cardiothoracic surgeon validated the anatomical accuracy of the digital model.

The STL files were imported into a 3D printer (J401Pro; Sailner 3D Technology, China) for model printing. This printer can produce full-color medical models using different materials. The models were fabricated using two distinct materials: rigid and flexible. Both materials are suitable for the observational discussion of the teaching objectives outlined in our study. From the perspective of observing pathological changes in the TOF, there is no significant difference between the two materials.
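The authors performed segmentation and repair in Mimics and Magics; as a generic illustration of the kind of sanity check one can run on an exported STL before sending it to a printer, the sketch below uses the open-source trimesh library (the file name is hypothetical and this step is not part of the authors' reported workflow).

```python
import trimesh

# Load an exported STL file (hypothetical path) and run basic printability checks.
mesh = trimesh.load("tof_heart_model.stl")

print(f"Watertight: {mesh.is_watertight}")            # a closed surface is needed for printing
print(f"Extents (mm): {mesh.bounding_box.extents}")   # confirm the model is life-sized
if mesh.is_watertight:
    print(f"Enclosed volume (mm^3): {mesh.volume:.0f}")
```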

Figure 1. Experimental flow chart of this study. A: TOF model printing flow chart. B: The instructional framework.

Teaching implementation

Figure  1 B illustrates the instructional framework employed in this study. One week preceding the class session, all the students were tasked with a 30-minute self-study session, focusing on the theoretical content related to TOF as outlined in the Pediatrics and Surgery textbooks, along with a review of pertinent academic literature. Both groups received co-supervision from two basic medicine lecturers boasting over a decade of teaching experience, alongside a senior cardiothoracic surgeon. Teaching conditions remained consistent across groups, encompassing uniform assessment criteria and adherence to predefined teaching time frames, all conducted in a Project-Based Learning (PBL) classroom at Wannan Medical College. Additionally, a pre-course examination was administered to gauge students’ preparedness for self-study.

In adherence to the curriculum guidelines, the teaching objectives aimed to empower students to master TOF's clinical manifestations, diagnostic modalities, and differential diagnoses, while acquainting them with treatment principles and surgical methodologies. Additionally, the objectives sought to cultivate students' clinical reasoning abilities and problem-solving skills. The duration of instruction for the TOF theory session was standardized to 25 min. The didactic content was integrated with the TOF case study to construct a coherent pedagogical structure.

During the instructional session, both groups underwent teaching utilizing the CBL methodology. Clinical manifestations and case details of TOF cases were presented to stimulate students' interest and curiosity. Subsequently, the theory of TOF, including its etiology, pathogenesis, pathologic anatomy, clinical manifestations, diagnostic methods, and therapeutic interventions, was briefly elucidated. Emphasis was then placed on the case, wherein selected typical TOF cases were explained, guiding students in analysis and discussion. Students were organized into four teams under the instructors' supervision, fostering cooperative learning and communication, thereby deepening their understanding of the disease through continuous inquiry and exploration (Fig. 2L). In the routinely equipped PBL classroom with standard heart models (Fig. 2J, K), all students had prior exposure to human anatomy and were familiar with these models. Both groups were provided with four standard heart models for reference, while the CBL-3DP group received an additional four 3D-printed models depicting TOF anomalies, enriching their learning experience (Fig. 2D, G). After the lesson, summarization and feedback sessions were conducted to consolidate the outcomes of the group discussions, evaluate teaching effectiveness, and assess learning outcomes.

Figure 2. Heart models utilized in instructional sessions. A: External perspective of 3D digital models. B, C: Cross-sectional views following trans-septal sagittal dissection of the 3D digital model (PS: pulmonary stenosis; OA: overriding aorta; VSD: ventricular septal defect; RVH: right ventricular hypertrophy). D: External depiction of the rigid 3D printed model. E, F: Sagittal sections of the rigid 3D printed model. G: External portrayal of the flexible 3D printed model. H, I: Sagittal sections of the flexible 3D printed model. J, K: The normal heart model employed in the instruction of the CBL group. L: Ongoing classroom session.

Teaching effectiveness assessment

Following the instructional session, participants from the two groups underwent a theoretical examination to assess their comprehension of the taught material. This assessment covered domains such as pathological anatomy, clinical manifestations, imaging data interpretation, diagnosis, and treatment relevant to TOF. Additionally, structured questionnaires were administered to evaluate the efficacy of the pedagogical approach employed. The questionnaire consisted of six questions designed to gauge participants’ understanding of the teaching content, enhancement of diagnostic skills, cultivation of critical thinking and clinical reasoning abilities, bolstering of confidence in managing TOF cases, satisfaction with the teaching mode, and satisfaction with the CBL methodology.

The questionnaire employed a 5-point Likert scale to gauge responses, with 5 indicating “strongly satisfied/agree,” 4 for “satisfied/agree,” 3 denoting “neutral,” 2 reflecting “dissatisfied/disagree,” and 1 indicating “strongly dissatisfied/disagree.” It comprised six questions, with the initial two probing participants’ knowledge acquisition, questions 3 and 4 exploring satisfaction regarding enhanced competence, and the final two assessing satisfaction with teaching methods and modes. Additionally, participants were encouraged to provide suggestions at the end of the questionnaire. To ensure the questionnaire’s validity, five esteemed lecturers in basic medical sciences with more than 10 years of experience verified its content and assessed its Content Validity Ratio and Content Validity Index to ensure alignment with the study’s objectives.
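The article does not detail how these indices were computed; as a reference, the sketch below shows the standard item-level calculations (the item-level Content Validity Index as the proportion of experts rating an item relevant, and Lawshe's Content Validity Ratio), using an entirely hypothetical set of ratings from a five-member panel.

```python
# Hypothetical ratings of one questionnaire item by five expert reviewers
# (1 = not relevant ... 4 = highly relevant); scale and values are illustrative only.
ratings = [4, 3, 4, 4, 3]

n_experts = len(ratings)
n_relevant = sum(1 for r in ratings if r >= 3)    # experts rating the item relevant (3 or 4)
n_essential = sum(1 for r in ratings if r == 4)   # experts rating the item essential

# Item-level Content Validity Index: proportion of experts rating the item relevant.
i_cvi = n_relevant / n_experts

# Lawshe's Content Validity Ratio: CVR = (n_e - N/2) / (N/2).
cvr = (n_essential - n_experts / 2) / (n_experts / 2)

print(f"I-CVI = {i_cvi:.2f}, CVR = {cvr:.2f}")
```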

Statistical analysis

Statistical analyses were conducted utilizing GraphPad Prism 9.0 software. Aggregate score data for both groups were presented as mean ± standard deviation (x ± s). The gender comparisons were analyzed with the chi-square (χ2) test, while the other variables were compared using the Mann-Whitney U test. The threshold for determining statistical significance was set at P  < 0.05.
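For illustration, the same analysis plan can be expressed in Python with scipy (the authors used GraphPad Prism). The gender counts below are the baseline counts reported in the next section; the score values are hypothetical.

```python
import numpy as np
from scipy import stats

# Gender: chi-square test on the 2x2 contingency table of males/females per group
# (baseline counts: CBL 14/16, CBL-3DP 17/13).
gender_table = np.array([[14, 16],
                         [17, 13]])
chi2, p_gender, dof, _ = stats.chi2_contingency(gender_table)

# Scores: Mann-Whitney U test between the two independent groups (hypothetical values).
cbl_scores = [72, 80, 65, 78, 74, 69]
cbl_3dp_scores = [85, 79, 88, 83, 90, 77]
u_stat, p_scores = stats.mannwhitneyu(cbl_scores, cbl_3dp_scores, alternative="two-sided")

print(f"Gender: chi2 = {chi2:.2f}, df = {dof}, p = {p_gender:.3f}")
print(f"Scores: U = {u_stat:.1f}, p = {p_scores:.4f}")
```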

Three-dimensional printing models

After configuring the structural colors of each component (Fig. 2A, B, C), we printed four life-sized color TOF models using both rigid and flexible materials. Two color TOF models were created using rigid materials (Fig. 2D, E, F). These models resisted deformation and had a firm texture, a smooth and glossy surface, and good transparency that allowed visibility of the internal structures, making them conducive to teaching and observation. We also fabricated two color TOF models using flexible materials (Fig. 2G, H, I), characterized by a soft texture, opacity, and deformability that allow easy manipulation and cutting. These flexible models have potential utility beyond observation: they can serve as valuable tools for simulating surgical interventions and may be employed to create tomographic anatomical specimens. In this study, models of both materials were suitable for observation in the classroom. The participants were able to discern the four pathological changes characteristic of TOF from surface examination or cross-sectional analysis.

Baseline characteristics of the students

In total, 60 students were included in this study. The CBL group comprised 30 students (14 males and 16 females), with an average age of (21.20 ± 0.76) years. The CBL-3DP group consisted of 30 students (17 males and 13 females) with an average age of 20.96 years. All the students completed the study procedures. There were no significant differences in age, sex ratio, or pre-class exam scores between the two groups ( P  > 0.05), indicating that the baseline scores between the two groups were comparable (Table  1 ).

Theoretical examination results

All students completed the research procedures as planned. The post-class theoretical examination encompassed assessment of pathological anatomy, clinical presentations, imaging data interpretation, diagnosis, and treatment pertinent to TOF. Notably, no statistically significant disparities were observed in the scores on clinical manifestations, diagnosis and treatment components between the cohorts, as delineated in Table  2 . Conversely, discernible distinctions were evident whereby the CBL-3DP group outperformed the CBL group notably in pathological anatomy, imaging data interpretation, and overall aggregate scores ( P  < 0.05).

Results of the questionnaires

All 60 participants submitted the questionnaire. Compared with the CBL group, the CBL-3DP group showed significantly higher scores in several areas: satisfaction with the teaching mode, promotion of diagnostic skills, bolstering of self-assurance in managing TOF cases, and cultivation of critical thinking and clinical reasoning abilities (Fig. 3B, C, D, E; P < 0.05 for the first aspect and P < 0.01 for the rest). However, the two groups did not differ significantly (P > 0.05) in understanding of the teaching content and satisfaction with the CBL methodology (Fig. 3A, F).

Upon completion of the questionnaires, participants were invited to proffer recommendations. Notably, in the CBL group, seven students expressed challenges in comprehending TOF and indicated a need for additional time for consolidation to enhance understanding. Conversely, within the CBL-3DP group, twelve students advocated for the augmentation of model repertoire and the expansion of disease-related data collection to bolster pedagogical efficacy across other didactic domains.

Figure 3. Five-point Likert scores of students' attitudes in the CBL (n = 30) and CBL-3DP (n = 30) groups. A: Understanding of teaching content. B: Promotion of diagnostic skills. C: Cultivation of critical thinking and clinical reasoning abilities. D: Bolstering of self-assurance in managing TOF cases. E: Satisfaction with the teaching mode. F: Satisfaction with the CBL methodology. ns: no significant difference; * p < 0.05, ** p < 0.01, *** p < 0.001.

TOF presents a significant challenge in clinical practice, necessitating a comprehensive understanding for effective diagnosis and treatment [ 12 ]. Traditional teaching methods in medical schools have relied on conventional resources such as textbooks, 2D illustrations, cadaver dissections, and radiographic materials to impart knowledge about complex conditions like TOF [ 13 ]. However, the limitations of these methods in fully engaging students and bridging the gap between theoretical knowledge and practical application have prompted a need for innovative instructional approaches.

CBL has emerged as a valuable tool in medical education, offering students opportunities to engage with authentic clinical cases through group discussions and inquiry-based learning [ 14 ]. By actively involving students in problem-solving and decision-making processes, CBL facilitates the application of theoretical knowledge to real-world scenarios, thus better-preparing students for future clinical practice [ 15 ]. Our investigation revealed that both groups of students exhibited comparable levels of satisfaction with the CBL methodology, devoid of discernible disparities.

CHD presents a formidable challenge due to the intricate nature of anatomical anomalies, the diverse spectrum of conditions, and individual variations [ 16 ]. Utilizing 3D-printed physical models, derived from patient imaging data, can significantly enhance comprehension of complex anatomical structures [ 17 ]. These models have proven invaluable in guiding surgical planning, providing training for junior or inexperienced pediatric residents, and educating healthcare professionals and parents of patients [ 18 ]. Studies indicate that as much as 50% of pediatric surgical decisions can be influenced by the insights gained from 3D printed models [ 19 ]. By providing tangible, anatomically accurate models, 3D printing offers a unique opportunity for people to visualize complex structures and enhance their understanding of anatomical intricacies. Our study utilized full-color, to-scale 3D printed models to illustrate the structural abnormalities associated with TOF, thereby enriching classroom sessions and facilitating a deeper comprehension of the condition.

Comparative analysis between the CBL-3DP group and the CBL group revealed significant improvements in post-test performance, particularly in pathological anatomy and imaging data interpretation. Additionally, questionnaire responses indicated higher levels of satisfaction and confidence among students in the CBL-3DP group, highlighting the positive impact of incorporating 3D printed models into the learning environment, improving the effectiveness of CBL classroom instruction. Despite the merits, our study has limitations. Primarily, participants were exclusively drawn from the same grade level within a single college, possibly engendering bias owing to shared learning backgrounds. Future research could further strengthen these findings by expanding the sample size and including long-term follow-up to assess the retention of knowledge and skills. Additionally, the influence of the 3D models depicting a normal heart on the learning process and its potential to introduce bias into the results warrants consideration, highlighting a need for scrutiny in subsequent studies.

As medical science continues to advance, the need for effective teaching methods becomes increasingly paramount. Our study underscores the potential of combining active learning approaches like CBL with innovative technologies such as 3D printing to enhance teaching effectiveness, improve knowledge acquisition, and foster students’ confidence and enthusiasm in pursuing clinical careers. Moving forward, further research and integration of such methodologies are essential for meeting the evolving demands of medical education, especially in areas involving complex anatomical understanding.

Conclusions

Integrating 3D-printed models with the CBL method is feasible and effective in TOF instruction. The demonstrated success of this method warrants broad implementation in medical education, particularly for complex anatomical topics.

Data availability

All data supporting the conclusions of this research are available upon reasonable request from the corresponding author.

Apitz C, Webb GD, Redington AN. Tetralogy of Fallot. Lancet. 2009;374:1462–71.


Ghosh RM, Jolley MA, Mascio CE, Chen JM, Fuller S, Rome JJ, et al. Clinical 3D modeling to guide pediatric cardiothoracic surgery and intervention using 3D printed anatomic models, computer aided design and virtual reality. 3D Print Med. 2022;8:11.

Chakrabarti R, Wardle K, Wright T, Bennie T, Gishen F. Approaching an undergraduate medical curriculum map: challenges and expectations. BMC Med Educ. 2021;21:341.

Donkin R, Yule H, Fyfe T. Online case-based learning in medical education: a scoping review. BMC Med Educ. 2023;23:564.

Novack JP. Designing cases for case-based immunology teaching in large medical school classes. Front Immunol. 2020;11:995.

Chen HC, Van Den Broek WES, Ten Cate O. The case for use of entrustable professional activities in undergraduate medical education. Acad Med. 2015;90:431–6.

Wang M, Sun Z, Jia M, Wang Y, Wang H, Zhu X, et al. Intelligent virtual case learning system based on real medical records and natural language processing. BMC Med Inf Decis Mak. 2022;22:60.

Yoo S-J, Thabit O, Kim EK, Ide H, Yim D, Dragulescu A, et al. 3D printing in medicine of congenital heart diseases. 3D Print Med. 2015;2:3.

Yammine K, Violato C. A meta-analysis of the educational effectiveness of three-dimensional visualization technologies in teaching anatomy. Anat Sci Educ. 2015;8:525–38.

Miao H, Ding J, Gong X, Zhao J, Li H, Xiong K, et al. Application of 3D-printed pulmonary segment specimens in experimental teaching of sectional anatomy. BMC Surg. 2023;23:109.

Sun Z, Wong YH, Yeong CH. Patient-specific 3D-printed low-cost models in medical education and clinical practice. Micromachines (Basel). 2023;14:464.

Downing TE, Kim YY. Tetralogy of Fallot: general principles of management. Cardiol Clin. 2015;33:531–41. vii–viii.

Jia X, Zeng W, Zhang Q. Combined administration of problem- and lecture-based learning teaching models in medical education in China: a meta-analysis of randomized controlled trials. Med (Baltim). 2018;97:e11366.

McLean SF. Case-based learning and its application in medical and health-care fields: a review of worldwide literature. J Med Educ Curric Dev. 2016;3:JMECD.S20377.

Zeng N, Lu H, Li S, Yang Q, Liu F, Pan H, et al. Application of the combination of CBL teaching method and SEGUE framework to improve the doctor-patient communication skills of resident physicians in otolaryngology department. Bmc Med Educ. 2024;24:201.

Sun Z. Patient-specific 3D-printed models in pediatric congenital heart disease. Children. 2023;10:319.

Meyer-Szary J, Luis MS, Mikulski S, Patel A, Schulz F, Tretiakow D, et al. The role of 3D printing in planning complex medical procedures and training of medical professionals—cross-sectional multispecialty review. IJERPH. 2022;19:3331.

Sun Z, Wee C. 3D printed models in cardiovascular disease: an exciting future to deliver personalized medicine. Micromachines-basel. 2022;13:1575.

Valverde I, Gomez-Ciriza G, Hussain T, Suarez-Mejias C, Velasco-Forte MN, Byrne N, et al. Three-dimensional printed models for surgical planning of complex congenital heart defects: an international multicentre study. Eur J Cardio-thorac. 2017;52:1139–48.


Acknowledgements

We extend our sincere appreciation to the instructors and students whose invaluable participation made this study possible.

This paper received support from the Education Department of Anhui Province, China (Grant Numbers 2022jyxm1693, 2022jyxm1694, 2022xskc103, 2018jyxm1280).

Author information

Jian Zhao and Xin Gong are joint first authors.

Authors and Affiliations

Department of Human Anatomy, Wannan Medical College, No.22 West Wenchang Road, Wuhu, 241002, China

Jian Zhao, Xin Gong, Jian Ding, Rui Huang & Huachun Miao

Department of Cardio-Thoracic Surgery, Yijishan Hospital of Wannan Medical College, Wuhu, China

Kepin Xiong

Zhuhai Sailner 3D Technology Co., Ltd., Zhuhai, China

Kangle Zhuang

School of Basic Medical Sciences, Wannan Medical College, Wuhu, China


Contributions

Jian Zhao and Huachun Miao designed the research. Jian Zhao, Xin Gong, Jian Ding, and Kepin Xiong designed the tests and questionnaires. Kangle Zhuang processed the imaging data and printed the models. Xin Gong and Kepin Xiong implemented the teaching. Jian Zhao and Rui Huang collected the data and performed the statistical analysis. Jian Zhao and Huachun Miao prepared the manuscript. Shu Li and Huachun Miao revised the manuscript. Shu Li acquired the funding. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Shu Li or Huachun Miao .

Ethics declarations

Ethics approval and consent to participate.

This investigation received ethical approval from the Ethical Committee of School of Basic Medical Sciences, Wannan Medical College (ECBMSWMC2022-1-12). All methodologies adhered strictly to established protocols and guidelines. Written informed consent was obtained from the study participants to take part in the study.

Consent for publication

Written informed consent was obtained from the individuals for the publication of any potentially identifiable images or data included in this article.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Zhao, J., Gong, X., Ding, J. et al. Integration of case-based learning and three-dimensional printing for tetralogy of fallot instruction in clinical medical undergraduates: a randomized controlled trial. BMC Med Educ 24 , 571 (2024). https://doi.org/10.1186/s12909-024-05583-z


Received : 03 March 2024

Accepted : 21 May 2024

Published : 24 May 2024

DOI : https://doi.org/10.1186/s12909-024-05583-z


  • Medical education
  • Case-based learning
  • 3D printing
  • Tetralogy of fallot
  • Medical undergraduates

BMC Medical Education

ISSN: 1472-6920


COMMENTS

  1. JBI Critical Appraisal Tools

    JBI's Evidence Synthesis Critical Appraisal Tools Assist in Assessing the Trustworthiness, Relevance and Results of Published Papers ... Barker TH, Moola S, Tufanaru C, Stern C, McArthur A, Stephenson M, Aromataris E. Methodological quality of case series studies: an introduction to the JBI critical appraisal tool. JBI Evidence Synthesis ...

  2. Critical Appraisal Tools and Reporting Guidelines

    Some critical appraisal tools are generic, whereas others are study-design specific (Katrak et al., 2004). Since each tool has strengths and limitations, researchers and practitioners must be cautious when selecting critical appraisal tools. Reviewing the documentation and guidelines about how to use these tools is highly recommended.

  3. CASP Checklists

    Critical Appraisal Checklists. We offer a number of free downloadable checklists to help you more easily and accurately perform critical appraisal across a number of different study types. The CASP checklists are easy to understand but in case you need any further guidance on how they are structured, take a look at our guide on how to use our ...

  4. Methodological quality of case series studies: an introduction to the

    Results: The JBI critical appraisal tool for case series studies includes 10 questions addressing the internal validity and risk of bias of case series designs, particularly confounding, selection, and information bias, in addition to the importance of clear reporting. Conclusion: In certain situations, case series designs may represent the ... (A brief, purely illustrative sketch of recording answers to these 10 questions appears after this list.)

  5. Methodological quality of case series studies: an introducti ...

    An international working group was formed to review the methodological literature regarding case series as a form of evidence for inclusion in systematic reviews. The group then developed a critical appraisal tool based on the epidemiological literature relating to bias within these studies. This was then piloted, reviewed, and approved by JBI ...

  6. Critical Appraisal Tools and Reporting Guidelines

    Critical appraisal tools and reporting guidelines are the two most important instruments available to researchers and practitioners involved in research, evidence-based practice, and policymaking. Each of these instruments has unique characteristics, and both instruments play an essential role in evidence-based practice and decision ...

  7. Scientific writing: Critical Appraisal Toolkit (CAT) for assessing

    The descriptive study critical appraisal tool assesses different aspects of sampling, data collection, statistical analysis, and ethical conduct. It is used to appraise cross-sectional studies, outbreak investigations, case series and case reports. The literature review critical appraisal tool assesses the methodology, results and applicability ...

  8. Systematic Reviews: Critical Appraisal by Study Design

    Tools for Critical Appraisal of Studies. "The purpose of critical appraisal is to determine the scientific merit of a research report and its applicability to clinical decision making." Conducting a critical appraisal of a study is imperative to any well-executed evidence review, but the process can be time-consuming and difficult. The ...

  9. Critical Appraisal tools

    This section contains useful tools and downloads for the critical appraisal of different types of medical evidence. Example appraisal sheets are provided together with several helpful examples. Critical appraisal worksheets to help you appraise the reliability, importance and applicability of clinical evidence.

  10. Study Quality Assessment Tools

    The guidance document below is organized by question number from the tool for quality assessment of case-control studies. Question 1. Research question. ... Critical appraisal of a study involves considering the potential for selection bias, information bias, measurement bias, or confounding (the mixture of exposures that one cannot ...

  11. PDF © Joanna Briggs Institute 2017 Critical Appraisal Checklist for

    JBI Critical Appraisal Tools: All systematic reviews incorporate a process of critique or appraisal of the research evidence. The purpose of this appraisal is to assess the methodological quality of a study and to determine the extent to which a study has addressed the possibility of bias in its design, conduct and analysis. All papers

  12. A guide to critical appraisal of evidence : Nursing2020 Critical Care

    Critical appraisal is the assessment of research studies' worth to clinical practice. Critical appraisal—the heart of evidence-based practice—involves four phases: rapid critical appraisal, evaluation, synthesis, and recommendation. This article reviews each phase and provides examples, tips, and caveats to help evidence appraisers ...

  13. Tools for critically appraising different study designs, systematic

    Summary. A Critical Appraisal Tool (CAT) allows the methodological quality of a study/process to be assessed, which, in turn, influences the reliability of the evidence produced by such a study/process. CATs help to minimise subjectivity in the appraisal and maximise transparency.

  14. (PDF) Critical Appraisal of a Case Report

    Table 1: Critical appraisal of a case report. The discussion of clinical cases is an important tool in learning and improvement of clinical reasoning; the use of this critical evaluation of case ...

  15. Critical Appraisal

    Critical appraisal and, more specifically, critical appraisal tools provide us with a mechanism to evaluate the research methodology of a study with a critical, objective, and systematic lens. This appraisal is essential when evaluating a study for a systematic review, for determining new guidelines for patient care, or for choosing appropriate ...

  16. Full article: Critical appraisal

    What is critical appraisal? Critical appraisal involves a careful and systematic assessment of a study's trustworthiness or rigour (Booth et al., 2016). A well-conducted critical appraisal: (a) is an explicit systematic, rather than an implicit haphazard, process; (b) involves judging a study on its methodological, ethical, and theoretical quality, and (c) is enhanced by a reviewer ...

  17. Critical appraisal full list of checklists and tools

    JBI (Joanna Briggs Institute) critical appraisal tool checklists for analytical cross-sectional studies, case-control studies, case reports, case series, cohort studies, diagnostic test accuracy, economic evaluations, prevalence studies, qualitative research, quasi-experimental (non-randomised) studies, RCTs, systematic reviews and for text and ...

  18. Optimising the value of the critical appraisal skills programme (CASP

    Our novel question is comparable to questions in the JBI critical appraisal tool. Five of 10 questions in the JBI tool prompt the reviewer to consider the congruity between the research methodology and a particular aspect of the ... An evaluation of sensitivity analyses in two case study reviews. Qual Health Res 2012; 22: 1425-1434.

  19. Integration of case-based learning and three-dimensional printing for

    Background: Case-based learning (CBL) methods have gained prominence in medical education, proving especially effective for preclinical training in undergraduate medical education. Tetralogy of Fallot (TOF) is a congenital heart disease characterized by four malformations, presenting a challenge in medical education due to the complexity of its anatomical pathology. Three-dimensional printing ...
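
Illustrative sketch (referenced from entry 4 above). Nothing below comes from JBI, CASP, or any of the sources listed here: the function name, the accepted response categories (Yes / No / Unclear / Not applicable), and the tallying logic are assumptions made only to show how a reviewer might record answers to the 10-question JBI case series checklist in a structured, auditable form. Python is used purely as an example language.

    # Hypothetical helper for recording answers to the 10-question JBI case
    # series checklist. Response categories and tallying are illustrative
    # assumptions, not something prescribed by the published tool.
    from collections import Counter

    JBI_CASE_SERIES_QUESTIONS = 10  # the published tool contains 10 questions

    def summarise_appraisal(answers):
        """Tally responses; 'answers' maps question number -> response text."""
        valid = {"yes", "no", "unclear", "not applicable"}
        if set(answers) != set(range(1, JBI_CASE_SERIES_QUESTIONS + 1)):
            raise ValueError("Expected one answer for each of the 10 questions")
        normalised = {q: a.strip().lower() for q, a in answers.items()}
        unknown = set(normalised.values()) - valid
        if unknown:
            raise ValueError(f"Unrecognised responses: {sorted(unknown)}")
        return Counter(normalised.values())

    # Example: a case series judged 'yes' on eight items and 'unclear' on two.
    example = {q: "yes" for q in range(1, 11)}
    example[6] = "unclear"
    example[9] = "unclear"
    print(summarise_appraisal(example))  # Counter({'yes': 8, 'unclear': 2})

A structured record of this kind makes it easier to report, for example, how many included case series were judged "unclear" on a particular item; the judgement on each question, however, still rests with the reviewer, as the appraisal guidance above emphasises.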