University of Texas

Systematic Reviews & Evidence Synthesis Methods

What is a Systematic Review?

A systematic review gathers, assesses, and synthesizes all available empirical research on a specific question using a comprehensive search method with an aim to minimize bias.

Or, put another way:

A systematic review begins with a specific research question. Authors of the review gather and evaluate all experimental studies that address the question. Bringing together the findings of these separate studies allows the review authors to draw new conclusions from what has been learned.

*The key characteristics of a systematic review are:

  • A clearly stated set of objectives with pre-defined eligibility criteria for studies;
  • An explicit, reproducible methodology;
  • A systematic search that attempts to identify all relevant research;
  • A critical appraisal of the included studies;
  • A clear and objective synthesis and presentation of the characteristics and findings of the included studies.

*Lasserson T, Thomas J, Higgins JPT. Chapter 1: Starting a review. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.

What is the difference between an evidence synthesis and a systematic review?

A systematic review is a type of evidence synthesis. Any literature review is a type of evidence synthesis. For the various types of evidence syntheses/literature reviews, see the Types of Reviews page of this guide.

Systematic reviews are usually done as a team project, requiring cooperation and a commitment of (lots of) time and effort over an extended period. You will need at least 3 people and, depending on the scope of the project and the size of the database result sets, you should plan for 6-24 months from start to completion.

Things to Know Before You Begin . . .

Run exploratory searches on the topic to get a sense of the plausibility of your project.

A systematic review requires a research question that is already well-covered in the primary literature. That is, if there has been little previous work on the topic, there will be little to analyze and conclusions will be hard to draw.

A narrowly focused research question may add little to the knowledge of the field of study.

Make sure that no one else 1) has already written a recent systematic review on your topic or 2) is in the midst of a similar systematic review project. Instructions on how to check.

Team members will need to use research databases for searching the literature. If these databases are not available through library subscriptions or freely available, their use may require payment or travel. Look here for database recommendations.

It is extremely important to develop a protocol for your project. Guidance is provided here.

Tools such as a reference manager and a screening tool will save time.  

Lynn Bostwick : Nursing, Nutrition, Pharmacy, Public Health

Meryl Brodsky : Communication and Information Studies

Hannah Chapman Tripp : Biology, Neuroscience

Carolyn Cunningham : Human Development & Family Sciences, Psychology, Sociology

Larayne Dallas : Engineering

Liz DeHart : Marine Science

Grant Hardaway : Educational Psychology, Kinesiology & Health Education, Social Work

Janelle Hedstrom : Special Education, Curriculum & Instruction, Ed Leadership & Policy

Susan Macicak : Linguistics

Imelda Vetter : Dell Medical School

  • Last Updated: Apr 9, 2024 8:57 PM
  • URL: https://guides.lib.utexas.edu/systematicreviews

Creative Commons License

1.2.2  What is a systematic review?

A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made (Antman 1992, Oxman 1993). The key characteristics of a systematic review are:

a clearly stated set of objectives with pre-defined eligibility criteria for studies;

an explicit, reproducible methodology;

a systematic search that attempts to identify all studies that would meet the eligibility criteria;

an assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias; and

a systematic presentation, and synthesis, of the characteristics and findings of the included studies.

Many systematic reviews contain meta-analyses. Meta-analysis is the use of statistical methods to summarize the results of independent studies (Glass 1976). By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review (see Chapter 9, Section 9.1.3). They also facilitate investigations of the consistency of evidence across studies, and the exploration of differences across studies.
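To make the point about precision concrete, here is a minimal sketch of one common pooling approach, an inverse-variance (fixed-effect) weighted average. The passage above does not say which statistical method to use, so the choice of method, the function name pool_fixed_effect, and the three study values are all illustrative assumptions, not taken from the Handbook.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error.

    Assumption: every study estimates the same underlying effect and
    reports an estimate together with its standard error.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates (e.g. mean differences) from three studies.
estimates = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

In this made-up example the pooled standard error is smaller than that of any single study, which is the sense in which a meta-analysis can be "more precise" than its component studies. A real analysis would also need to consider heterogeneity between studies, which this sketch ignores.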

University of Maryland Libraries

Systematic Review

Introduction to Systematic Review

  • Introduction
  • Types of literature reviews
  • Other Libguides
  • Systematic review as part of a dissertation
  • Tutorials & Guidelines & Examples from non-Medical Disciplines

Depending on your learning style, please explore the resources in various formats on the tabs above.

For additional tutorials, visit the SR Workshop Videos from UNC at Chapel Hill outlining each stage of the systematic review process.

Know the difference! Systematic review vs. literature review

Types of literature reviews along with associated methodologies

JBI Manual for Evidence Synthesis .  Find definitions and methodological guidance.

- Systematic Reviews - Chapters 1-7

- Mixed Methods Systematic Reviews -  Chapter 8

- Diagnostic Test Accuracy Systematic Reviews -  Chapter 9

- Umbrella Reviews -  Chapter 10

- Scoping Reviews -  Chapter 11

- Systematic Reviews of Measurement Properties -  Chapter 12

Systematic reviews vs scoping reviews

Grant, M. J., & Booth, A. (2009). A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information and Libraries Journal, 26 (2), 91–108. https://doi.org/10.1111/j.1471-1842.2009.00848.x

Gough, D., Thomas, J., & Oliver, S. (2012). Clarifying differences between review designs and methods. Systematic Reviews, 1 (28). https://doi.org/10.1186/2046-4053-1-28

Munn, Z., Peters, M., Stern, C., Tufanaru, C., McArthur, A., & Aromataris, E. (2018). Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Medical Research Methodology, 18 (1), 143. https://doi.org/10.1186/s12874-018-0611-x. Also, check out the Libguide from Weill Cornell Medicine for the differences between a systematic review and a scoping review and when to embark on either one of them.

Sutton, A., Clowes, M., Preston, L., & Booth, A. (2019). Meeting the review family: Exploring review types and associated information retrieval requirements. Health Information & Libraries Journal, 36 (3), 202–222. https://doi.org/10.1111/hir.12276

Temple University. Review Types - This guide provides useful descriptions of some of the types of reviews listed in the above article.

UMD Health Sciences and Human Services Library. Review Types - Guide describing Literature Reviews, Scoping Reviews, and Rapid Reviews.

Whittemore, R., Chao, A., Jang, M., Minges, K. E., & Park, C. (2014). Methods for knowledge synthesis: An overview. Heart & Lung: The Journal of Acute and Critical Care, 43 (5), 453–461. https://doi.org/10.1016/j.hrtlng.2014.05.014

Differences between a systematic review and other types of reviews

Armstrong, R., Hall, B. J., Doyle, J., & Waters, E. (2011). ‘Scoping the scope’ of a Cochrane review. Journal of Public Health, 33 (1), 147–150. https://doi.org/10.1093/pubmed/fdr015

Kowalczyk, N., & Truluck, C. (2013). Literature reviews and systematic reviews: What is the difference? Radiologic Technology, 85 (2), 219–222.

White, H., Albers, B., Gaarder, M., Kornør, H., Littell, J., Marshall, Z., Matthew, C., Pigott, T., Snilstveit, B., Waddington, H., & Welch, V. (2020). Guidance for producing a Campbell evidence and gap map. Campbell Systematic Reviews, 16 (4), e1125. https://doi.org/10.1002/cl2.1125. Check also this comparison between evidence and gap maps and systematic reviews.

Rapid Reviews Tutorials

Rapid Review Guidebook by the National Collaborating Centre for Methods and Tools (NCCMT)

Hamel, C., Michaud, A., Thuku, M., Skidmore, B., Stevens, A., Nussbaumer-Streit, B., & Garritty, C. (2021). Defining Rapid Reviews: a systematic scoping review and thematic analysis of definitions and defining characteristics of rapid reviews. Journal of Clinical Epidemiology, 129, 74–85. https://doi.org/10.1016/j.jclinepi.2020.09.041

  • Müller, C., Lautenschläger, S., Meyer, G., & Stephan, A. (2017). Interventions to support people with dementia and their caregivers during the transition from home care to nursing home care: A systematic review . International Journal of Nursing Studies, 71 , 139–152. https://doi.org/10.1016/j.ijnurstu.2017.03.013
  • Bhui, K. S., Aslam, R. W., Palinski, A., McCabe, R., Johnson, M. R. D., Weich, S., … Szczepura, A. (2015). Interventions to improve therapeutic communications between Black and minority ethnic patients and professionals in psychiatric services: Systematic review . The British Journal of Psychiatry, 207 (2), 95–103. https://doi.org/10.1192/bjp.bp.114.158899
  • Rosen, L. J., Noach, M. B., Winickoff, J. P., & Hovell, M. F. (2012). Parental smoking cessation to protect young children: A systematic review and meta-analysis . Pediatrics, 129 (1), 141–152. https://doi.org/10.1542/peds.2010-3209

Scoping Review

  • Hyshka, E., Karekezi, K., Tan, B., Slater, L. G., Jahrig, J., & Wild, T. C. (2017). The role of consumer perspectives in estimating population need for substance use services: A scoping review. BMC Health Services Research, 17, 1-14. https://doi.org/10.1186/s12913-017-2153-z
  • Olson, K., Hewit, J., Slater, L.G., Chambers, T., Hicks, D., Farmer, A., & ... Kolb, B. (2016). Assessing cognitive function in adults during or following chemotherapy: A scoping review . Supportive Care In Cancer, 24 (7), 3223-3234. https://doi.org/10.1007/s00520-016-3215-1
  • Pham, M. T., Rajić, A., Greig, J. D., Sargeant, J. M., Papadopoulos, A., & McEwen, S. A. (2014). A scoping review of scoping reviews: Advancing the approach and enhancing the consistency . Research Synthesis Methods, 5 (4), 371–385. https://doi.org/10.1002/jrsm.1123
  • Scoping Review Tutorial from UNC at Chapel Hill

Qualitative Systematic Review/Meta-Synthesis

  • Lee, H., Tamminen, K. A., Clark, A. M., Slater, L., Spence, J. C., & Holt, N. L. (2015). A meta-study of qualitative research examining determinants of children's independent active free play . International Journal Of Behavioral Nutrition & Physical Activity, 12 (5), 121-12. https://doi.org/10.1186/s12966-015-0165-9

Videos on systematic reviews

Systematic Reviews: What are they? Are they right for my research? - 47 min. video recording with a closed caption option.

More training videos  on systematic reviews:   

Books on Systematic Reviews

Books on Meta-analysis

  • University of Toronto Libraries  - very detailed with good tips on the sensitivity and specificity of searches.
  • Monash University  - includes an interactive case study tutorial. 
  • Dalhousie University Libraries - a comprehensive How-To Guide on conducting a systematic review.

Guidelines for a systematic review as part of the dissertation

  • Guidelines for Systematic Reviews in the Context of Doctoral Education Background  by University of Victoria (PDF)
  • Can I conduct a Systematic Review as my Master’s dissertation or PhD thesis? Yes, It Depends!  by Farhad (blog)
  • What is a Systematic Review Dissertation Like? by the University of Edinburgh (50 min video) 

Further readings on experiences of PhD students and doctoral programs with systematic reviews

Puljak, L., & Sapunar, D. (2017). Acceptance of a systematic review as a thesis: Survey of biomedical doctoral programs in Europe . Systematic Reviews , 6 (1), 253. https://doi.org/10.1186/s13643-017-0653-x

Perry, A., & Hammond, N. (2002). Systematic reviews: The experiences of a PhD Student . Psychology Learning & Teaching , 2 (1), 32–35. https://doi.org/10.2304/plat.2002.2.1.32

Daigneault, P.-M., Jacob, S., & Ouimet, M. (2014). Using systematic review methods within a Ph.D. dissertation in political science: Challenges and lessons learned from practice . International Journal of Social Research Methodology , 17 (3), 267–283. https://doi.org/10.1080/13645579.2012.730704

UMD Doctor of Philosophy Degree Policies

Before you embark on a systematic review research project, check the UMD PhD Policies to make sure you are on the right path. Systematic reviews require a team of at least two reviewers and an information specialist or a librarian. Discuss with your advisor the authorship roles of the involved team members. Keep in mind that the UMD Doctor of Philosophy Degree Policies (scroll down to the section, Inclusion of one's own previously published materials in a dissertation) outline such cases, specifically the following:

"It is recognized that a graduate student may co-author work with faculty members and colleagues that should be included in a dissertation. In such an event, a letter should be sent to the Dean of the Graduate School certifying that the student's examining committee has determined that the student made a substantial contribution to that work. This letter should also note that the inclusion of the work has the approval of the dissertation advisor and the program chair or Graduate Director. The letter should be included with the dissertation at the time of submission. The format of such inclusions must conform to the standard dissertation format. A foreword to the dissertation, as approved by the Dissertation Committee, must state that the student made substantial contributions to the relevant aspects of the jointly authored work included in the dissertation."

  • Cochrane Handbook for Systematic Reviews of Interventions - See Part 2: General methods for Cochrane reviews
  • Systematic Searches - Yale library video tutorial series 
  • Using PubMed's Clinical Queries to Find Systematic Reviews  - From the U.S. National Library of Medicine
  • Systematic reviews and meta-analyses: A step-by-step guide - From the University of Edinburgh, Centre for Cognitive Ageing and Cognitive Epidemiology

Bioinformatics

  • Mariano, D. C., Leite, C., Santos, L. H., Rocha, R. E., & de Melo-Minardi, R. C. (2017). A guide to performing systematic literature reviews in bioinformatics .  arXiv preprint arXiv:1707.05813.

Environmental Sciences

Collaboration for Environmental Evidence. 2018. Guidelines and Standards for Evidence synthesis in Environmental Management. Version 5.0 (AS Pullin, GK Frampton, B Livoreil & G Petrokofsky, Eds) www.environmentalevidence.org/information-for-authors.

Pullin, A. S., & Stewart, G. B. (2006). Guidelines for systematic review in conservation and environmental management. Conservation Biology, 20 (6), 1647–1656. https://doi.org/10.1111/j.1523-1739.2006.00485.x

Engineering Education

  • Borrego, M., Foster, M. J., & Froyd, J. E. (2014). Systematic literature reviews in engineering education and other developing interdisciplinary fields. Journal of Engineering Education, 103 (1), 45–76. https://doi.org/10.1002/jee.20038

Public Health

  • Hannes, K., & Claes, L. (2007). Learn to read and write systematic reviews: The Belgian Campbell Group . Research on Social Work Practice, 17 (6), 748–753. https://doi.org/10.1177/1049731507303106
  • McLeroy, K. R., Northridge, M. E., Balcazar, H., Greenberg, M. R., & Landers, S. J. (2012). Reporting guidelines and the American Journal of Public Health’s adoption of preferred reporting items for systematic reviews and meta-analyses . American Journal of Public Health, 102 (5), 780–784. https://doi.org/10.2105/AJPH.2011.300630
  • Pollock, A., & Berge, E. (2018). How to do a systematic review.   International Journal of Stroke, 13 (2), 138–156. https://doi.org/10.1177/1747493017743796
  • Institute of Medicine. (2011). Finding what works in health care: Standards for systematic reviews . https://doi.org/10.17226/13059
  • Wanden-Berghe, C., & Sanz-Valero, J. (2012). Systematic reviews in nutrition: Standardized methodology . The British Journal of Nutrition, 107 Suppl 2, S3-7. https://doi.org/10.1017/S0007114512001432

Social Sciences

  • Bronson, D., & Davis, T. (2012).  Finding and evaluating evidence: Systematic reviews and evidence-based practice (Pocket guides to social work research methods). Oxford: Oxford University Press.
  • Petticrew, M., & Roberts, H. (2006).  Systematic reviews in the social sciences: A practical guide . Malden, MA: Blackwell Pub.
  • Cornell University Library Guide -  Systematic literature reviews in engineering: Example: Software Engineering
  • Biolchini, J., Mian, P. G., Natali, A. C. C., & Travassos, G. H. (2005). Systematic review in software engineering .  System Engineering and Computer Science Department COPPE/UFRJ, Technical Report ES, 679 (05), 45.
  • Biolchini, J. C., Mian, P. G., Natali, A. C. C., Conte, T. U., & Travassos, G. H. (2007). Scientific research ontology to support systematic review in software engineering . Advanced Engineering Informatics, 21 (2), 133–151.
  • Kitchenham, B. (2007). Guidelines for performing systematic literature reviews in software engineering . [Technical Report]. Keele, UK, Keele University, 33(2004), 1-26.
  • Weidt, F., & Silva, R. (2016). Systematic literature review in computer science: A practical guide .  Relatórios Técnicos do DCC/UFJF ,  1 .
  • Academic Phrasebank - Get some inspiration and find some terms and phrases for writing your research paper
  • Oxford English Dictionary  - Use to locate word variants and proper spelling
  • Last Updated: Apr 19, 2024 12:47 PM
  • URL: https://lib.guides.umd.edu/SR

A Systematic Literature Review of Empirical Research on Epistemic Network Analysis in Education

An overview of methodological approaches in systematic reviews

Prabhakar Veginadu

1 Department of Rural Clinical Sciences, La Trobe Rural Health School, La Trobe University, Bendigo Victoria, Australia

Hanny Calache

2 Lincoln International Institute for Rural Health, University of Lincoln, Brayford Pool, Lincoln UK

Akshaya Pandian

3 Department of Orthodontics, Saveetha Dental College, Chennai Tamil Nadu, India

Mohd Masood

Associated data.

APPENDIX B: List of excluded studies with detailed reasons for exclusion

APPENDIX C: Quality assessment of included reviews using AMSTAR 2

The aim of this overview is to identify and collate evidence from existing published systematic review (SR) articles evaluating various methodological approaches used at each stage of an SR.

The search was conducted in five electronic databases from inception to November 2020 and updated in February 2022: MEDLINE, Embase, Web of Science Core Collection, Cochrane Database of Systematic Reviews, and APA PsycINFO. Title and abstract screening were performed in two stages by one reviewer, supported by a second reviewer. Full‐text screening, data extraction, and quality appraisal were performed by two reviewers independently. The quality of the included SRs was assessed using the AMSTAR 2 checklist.

The search retrieved 41,556 unique citations, of which 9 SRs were deemed eligible for inclusion in final synthesis. Included SRs evaluated 24 unique methodological approaches used for defining the review scope and eligibility, literature search, screening, data extraction, and quality appraisal in the SR process. Limited evidence supports the following (a) searching multiple resources (electronic databases, handsearching, and reference lists) to identify relevant literature; (b) excluding non‐English, gray, and unpublished literature, and (c) use of text‐mining approaches during title and abstract screening.

The overview identified limited SR‐level evidence on various methodological approaches currently employed during five of the seven fundamental steps in the SR process, as well as some methodological modifications currently used in expedited SRs. Overall, findings of this overview highlight the dearth of published SRs focused on SR methodologies and this warrants future work in this area.

1. INTRODUCTION

Evidence synthesis is a prerequisite for knowledge translation. 1 A well‐conducted systematic review (SR), often in conjunction with meta‐analyses (MA) when appropriate, is considered the “gold standard” of methods for synthesizing evidence related to a topic of interest. 2 The central strength of an SR is the transparency of the methods used to systematically search, appraise, and synthesize the available evidence. 3 Several guidelines, developed by various organizations, are available for the conduct of an SR; 4 , 5 , 6 , 7 among these, Cochrane is considered a pioneer in developing rigorous and highly structured methodology for the conduct of SRs. 8 The guidelines developed by these organizations outline seven fundamental steps required in the SR process: defining the scope of the review and eligibility criteria, literature searching and retrieval, selecting eligible studies, extracting relevant data, assessing risk of bias (RoB) in included studies, synthesizing results, and assessing certainty of evidence (CoE) and presenting findings. 4 , 5 , 6 , 7

The methodological rigor involved in an SR can require a significant amount of time and resources, which may not always be available. 9 As a result, there has been a proliferation of modifications made to the traditional SR process, such as refining, shortening, bypassing, or omitting one or more steps, 10 , 11 for example, limits on the number and type of databases searched, limits on publication date, language, and types of studies included, and limiting to one reviewer for screening and selection of studies, as opposed to two or more reviewers. 10 , 11 These methodological modifications are made to accommodate the needs and resource constraints of the reviewers and stakeholders (e.g., organizations, policymakers, health care professionals, and other knowledge users). While such modifications are considered time and resource efficient, they may introduce bias in the review process, reducing their usefulness. 5

Substantial research has been conducted examining various approaches used in the standardized SR methodology and their impact on the validity of SR results. There are a number of published reviews examining the approaches or modifications corresponding to single 12 , 13 or multiple steps 14 involved in an SR. However, there is yet to be a comprehensive summary of the SR‐level evidence for all the seven fundamental steps in an SR. Such a holistic evidence synthesis will provide an empirical basis to confirm the validity of current accepted practices in the conduct of SRs. Furthermore, sometimes there is a balance that needs to be achieved between the resource availability and the need to synthesize the evidence in the best way possible, given the constraints. This evidence base will also inform the choice of modifications to be made to the SR methods, as well as the potential impact of these modifications on the SR results. An overview is considered the choice of approach for summarizing existing evidence on a broad topic, directing the reader to evidence, or highlighting the gaps in evidence, where the evidence is derived exclusively from SRs. 15 Therefore, for this review, an overview approach was used to (a) identify and collate evidence from existing published SR articles evaluating various methodological approaches employed in each of the seven fundamental steps of an SR and (b) highlight both the gaps in the current research and the potential areas for future research on the methods employed in SRs.

2. METHODS

An a priori protocol was developed for this overview but was not registered with the International Prospective Register of Systematic Reviews (PROSPERO), as the review was primarily methodological in nature and did not meet PROSPERO eligibility criteria for registration. The protocol is available from the corresponding author upon reasonable request. This overview was conducted based on the guidelines for the conduct of overviews as outlined in The Cochrane Handbook. 15 Reporting followed the Preferred Reporting Items for Systematic reviews and Meta‐analyses (PRISMA) statement. 3

2.1. Eligibility criteria

Only published SRs, with or without associated MA, were included in this overview. We adopted the defining characteristics of SRs from The Cochrane Handbook. 5 According to The Cochrane Handbook, a review was considered systematic if it satisfied the following criteria: (a) clearly states the objectives and eligibility criteria for study inclusion; (b) provides reproducible methodology; (c) includes a systematic search to identify all eligible studies; (d) reports assessment of validity of findings of included studies (e.g., RoB assessment of the included studies); (e) systematically presents all the characteristics or findings of the included studies. 5 Reviews that did not meet all of the above criteria were not considered an SR for this study and were excluded. MA‐only articles were included if they stated that the MA was based on an SR.

SRs and/or MA of primary studies evaluating methodological approaches used in defining review scope and study eligibility, literature search, study selection, data extraction, RoB assessment, data synthesis, and CoE assessment and reporting were included. The methodological approaches examined in these SRs and/or MA can also be related to the substeps or elements of these steps; for example, applying limits on date or type of publication are elements of the literature search. Included SRs examined or compared various aspects of a method or methods, and the associated factors, including but not limited to: precision or effectiveness; accuracy or reliability; impact on the SR and/or MA results; reproducibility of SR steps or bias introduced; time and/or resource efficiency. SRs assessing the methodological quality of SRs (e.g., adherence to reporting guidelines), evaluating techniques for building search strategies or the use of specific database filters (e.g., use of Boolean operators or search filters for randomized controlled trials), examining various tools used for RoB or CoE assessment (e.g., ROBINS vs. Cochrane RoB tool), or evaluating statistical techniques used in meta‐analyses were excluded. 14

2.2. Search

The search for published SRs was performed in the following scientific databases, initially from inception to the third week of November 2020, and updated in the last week of February 2022: MEDLINE (via Ovid), Embase (via Ovid), Web of Science Core Collection, Cochrane Database of Systematic Reviews, and American Psychological Association (APA) PsycINFO. The search was restricted to English‐language publications. Following the objectives of this study, study design filters within databases were used to restrict the search to SRs and MA, where available. The reference lists of included SRs were also searched for potentially relevant publications.

The search terms included keywords, truncations, and subject headings for the key concepts in the review question: SRs and/or MA, methods, and evaluation. Some of the terms were adopted from the search strategy used in a previous review by Robson et al., which reviewed primary studies on methodological approaches used in the study selection, data extraction, and quality appraisal steps of the SR process. 14 Individual search strategies were developed for the respective databases by combining the search terms using appropriate proximity and Boolean operators, along with the related subject headings, in order to identify SRs and/or MA. 16 , 17 A senior librarian was consulted in the design of the search terms and strategy. Appendix A presents the detailed search strategies for all five databases.
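As a rough illustration of combining search terms with Boolean operators, the sketch below joins synonyms with OR inside each concept and joins the concepts with AND. The three concept groups mirror the key concepts named above, but the individual terms, the build_query helper, and the quoting and truncation style are hypothetical; the authors' actual strategies are in their Appendix A, and real syntax (including proximity operators and subject headings) differs between databases.

```python
def build_query(concepts):
    """Combine term lists into one Boolean search string.

    Synonyms within a concept are joined with OR; concepts are joined
    with AND. Multi-word terms are wrapped in double quotes.
    """
    blocks = []
    for terms in concepts:
        if not terms:
            raise ValueError("each concept needs at least one term")
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical terms for the three key concepts: SR/MA, methods, evaluation.
concepts = [
    ["systematic review*", "meta-analys*"],
    ["method*", "approach*"],
    ["evaluat*", "compar*"],
]

query = build_query(concepts)
print(query)
```

Keeping the concepts as separate lists makes it easier to adapt one strategy to several databases, since only the final formatting step needs to change.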

2.3. Study selection and data extraction

Title and abstract screening of references was performed in three steps. First, one reviewer (PV) screened all the titles and excluded obviously irrelevant citations, for example, articles on topics not related to SRs and non‐SR publications (such as randomized controlled trials, observational studies, scoping reviews, etc.). Next, from the remaining citations, a random sample of 200 titles and abstracts was screened against the predefined eligibility criteria by two reviewers (PV and MM), independently, in duplicate. Discrepancies were discussed and resolved by consensus. This step ensured that the responses of the two reviewers were calibrated for consistency in the application of the eligibility criteria in the screening process. Finally, all the remaining titles and abstracts were reviewed by a single “calibrated” reviewer (PV) to identify potential full‐text records. Full‐text screening was performed by at least two authors independently (PV screened all the records, and duplicate assessment was conducted by MM, HC, or MG), with discrepancies resolved via discussions or by consulting a third reviewer.
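The calibration step described above can be sketched roughly as follows: draw a random sample of 200 citations, collect each reviewer's include/exclude decision, and list the discrepancies for discussion. The helper names, the citation identifiers, and the simulated decisions are hypothetical, and the percent-agreement figure is an added illustration, since the authors do not report an agreement statistic.

```python
import random


def draw_calibration_sample(citation_ids, size=200, seed=0):
    """Draw a reproducible random sample of citations for dual screening."""
    rng = random.Random(seed)
    return rng.sample(citation_ids, min(size, len(citation_ids)))


def find_discrepancies(decisions_a, decisions_b):
    """Return the citation ids on which the two reviewers disagree."""
    return sorted(cid for cid in decisions_a if decisions_a[cid] != decisions_b[cid])


def percent_agreement(decisions_a, decisions_b):
    """Share of citations (in percent) with identical decisions."""
    agreed = sum(1 for cid in decisions_a if decisions_a[cid] == decisions_b[cid])
    return 100.0 * agreed / len(decisions_a)


# Hypothetical pool of citations left after the first title screen.
citation_ids = list(range(1, 1001))
sample = draw_calibration_sample(citation_ids)

# Simulated decisions (True = include); reviewer B differs on four records.
decisions_a = {cid: cid % 10 == 0 for cid in sample}
decisions_b = dict(decisions_a)
for cid in sample[:4]:
    decisions_b[cid] = not decisions_b[cid]

discrepancies = find_discrepancies(decisions_a, decisions_b)
agreement = percent_agreement(decisions_a, decisions_b)
print(f"{len(discrepancies)} discrepancies to discuss, {agreement:.1f}% agreement")
```

Fixing the random seed keeps the sample reproducible, which fits the emphasis on explicit, repeatable methods.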

Data related to review characteristics, results, key findings, and conclusions were extracted by at least two reviewers independently (PV performed data extraction for all the reviews and duplicate extraction was performed by AP, HC, or MG).

2.4. Quality assessment of included reviews

The quality assessment of the included SRs was performed using the AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews). The tool consists of a 16‐item checklist addressing critical and noncritical domains. 18 For the purpose of this study, the domain related to MA was reclassified from critical to noncritical, as SRs with and without MA were included. The other six critical domains were used according to the tool guidelines. 18 Two reviewers (PV and AP) independently responded to each of the 16 items in the checklist with either “yes,” “partial yes,” or “no.” Based on the interpretations of the critical and noncritical domains, the overall quality of the review was rated as high, moderate, low, or critically low. 18 Disagreements were resolved through discussion or by consulting a third reviewer.
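
The rating logic described above can be sketched as follows. The item numbers of the critical domains (2, 4, 7, 9, 11, 13, 15) and the four rating rules come from the published AMSTAR 2 guidance; item 11 (meta‐analysis methods) is treated as noncritical, as in this overview. Counting only a “no” response as a weakness is an assumption of this sketch, and the function name is hypothetical.

```python
# AMSTAR 2 critical items are 2, 4, 7, 9, 11, 13 and 15. Item 11
# (meta-analysis methods) is treated as noncritical here, as in the text.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 13, 15})


def overall_rating(responses):
    """Map responses for items 1..16 to an overall quality rating.

    responses: dict of item number -> "yes", "partial yes" or "no".
    Assumption: only "no" is counted as a weakness.
    """
    if set(responses) != set(range(1, 17)):
        raise ValueError("expected responses for items 1 to 16")
    weak_items = {item for item, answer in responses.items() if answer == "no"}
    critical_flaws = len(weak_items & CRITICAL_ITEMS)
    noncritical_weaknesses = len(weak_items - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Toy usage: a review whose only weakness is item 7
# (no justification for excluding individual studies).
example = {item: "yes" for item in range(1, 17)}
example[7] = "no"
rating = overall_rating(example)
```

Because a single critical flaw is enough to cap the rating at “low,” the choice of which domains count as critical has a large effect on the overall result.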

2.5. Data synthesis

To provide an understandable summary of existing evidence syntheses, characteristics of the methods evaluated in the included SRs were examined and key findings were categorized and presented based on the corresponding step in the SR process. The categories of key elements within each step were discussed and agreed by the authors. Results of the included reviews were tabulated and summarized descriptively, along with a discussion on any overlap in the primary studies. 15 No quantitative analyses of the data were performed.

3. RESULTS

From 41,556 unique citations identified through the literature search, 50 full‐text records were reviewed, and nine systematic reviews 14 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 were deemed eligible for inclusion. The flow of studies through the screening process is presented in Figure 1. A list of excluded studies with reasons can be found in Appendix B.

Figure 1. Study selection flowchart

3.1. Characteristics of included reviews

Table 1 summarizes the characteristics of included SRs. The majority of the included reviews (six of nine) were published after 2010. 14 , 22 , 23 , 24 , 25 , 26 Four of the nine included SRs were Cochrane reviews. 20 , 21 , 22 , 23 The number of databases searched in the reviews ranged from 2 to 14; two reviews searched gray literature sources, 24 , 25 and seven reviews included a supplementary search strategy to identify relevant literature. 14 , 19 , 20 , 21 , 22 , 23 , 26 Three of the included SRs (all Cochrane reviews) included an integrated MA. 20 , 21 , 23

Table 1. Characteristics of included studies

SR = systematic review; MA = meta‐analysis; RCT = randomized controlled trial; CCT = controlled clinical trial; N/R = not reported.

The included SRs evaluated 24 unique methodological approaches (26 in total) used across five steps in the SR process; 8 SRs evaluated 6 approaches, 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 while 1 review evaluated 18 approaches. 14 Exclusion of gray or unpublished literature 21 , 26 and blinding of reviewers for RoB assessment 14 , 23 were evaluated in two reviews each. Included SRs evaluated methods used in five different steps in the SR process, including methods used in defining the scope of review ( n  = 3), literature search ( n  = 3), study selection ( n  = 2), data extraction ( n  = 1), and RoB assessment ( n  = 2) (Table  2 ).

Table 2. Summary of findings from reviews evaluating systematic review methods

There was some overlap in the primary studies evaluated in the included SRs on the same topics: Schmucker et al. 26 and Hopewell et al. 21 ( n  = 4), Hopewell et al. 20 and Crumley et al. 19 ( n  = 30), and Robson et al. 14 and Morissette et al. 23 ( n  = 4). There were no conflicting results between any of the identified SRs on the same topic.

3.2. Methodological quality of included reviews

Overall, the quality of the included reviews was assessed as moderate at best (Table  2 ). The most common critical weakness in the reviews was failure to provide justification for excluding individual studies (four reviews). Detailed quality assessment is provided in Appendix C .

3.3. Evidence on systematic review methods

3.3.1. Methods for defining review scope and eligibility

Two SRs investigated the effect of excluding data obtained from gray or unpublished sources on the pooled effect estimates of MA. 21 , 26 Hopewell et al. 21 reviewed five studies that compared the impact of gray literature on the results of a cohort of MA of RCTs in health care interventions. Gray literature was defined as information published in “print or electronic sources not controlled by commercial or academic publishers.” Findings showed an overall greater treatment effect for published trials than for trials reported in gray literature. In a more recent review, Schmucker et al. 26 addressed similar objectives by investigating gray and unpublished data in medicine. In addition to gray literature, defined similarly to the previous review by Hopewell et al., the authors also evaluated unpublished data, defined as “supplemental unpublished data related to published trials, data obtained from the Food and Drug Administration or other regulatory websites or postmarketing analyses hidden from the public.” The review found that in the majority of the MA, excluding gray literature had little or no effect on the pooled effect estimates. The evidence was too limited to conclude whether the data from gray and unpublished literature had an impact on the conclusions of MA. 26

Morrison et al. 24 examined five studies measuring the effect of excluding non‐English language RCTs on the summary treatment effects of SR‐based MA in various fields of conventional medicine. Although none of the included studies reported a major difference in the treatment effect estimates between English‐only and non‐English‐inclusive MA, the review found inconsistent evidence regarding the methodological and reporting quality of English and non‐English trials. 24 As such, there might be a risk of introducing “language bias” when excluding non‐English language RCTs. The authors also noted that the numbers of non‐English trials vary across medical specialties, as does the impact of these trials on MA results. Based on these findings, Morrison et al. 24 conclude that literature searches must include non‐English studies when resources and time are available, to minimize the risk of introducing “language bias.”

3.3.2. Methods for searching studies

Crumley et al. 19 analyzed recall (also referred to as “sensitivity” by some researchers; defined as “percentage of relevant studies identified by the search”) and precision (defined as “percentage of studies identified by the search that were relevant”) when searching a single resource to identify randomized controlled trials and controlled clinical trials, as opposed to searching multiple resources. The studies included in their review frequently compared a MEDLINE‐only search with a search involving a combination of other resources. The review found low median recall estimates (median values between 24% and 92%) and very low median precisions (median values between 0% and 49%) for most of the electronic databases when searched individually. 19 A between‐database comparison, based on the type of search strategy used, showed better recall and precision for complex and Cochrane Highly Sensitive search strategies (CHSSS). In conclusion, the authors emphasize that literature searches for trials in SRs must include multiple sources. 19
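
The two definitions quoted above translate directly into a calculation. The sketch below is a minimal illustration; the function name and the record identifiers are invented for the example and do not come from any of the reviewed studies.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall %, precision %) for one search.

    recall    = share of relevant studies that the search identified
    precision = share of retrieved records that were relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = 100 * len(hits) / len(relevant) if relevant else float("nan")
    precision = 100 * len(hits) / len(retrieved) if retrieved else float("nan")
    return recall, precision


# Toy usage: a single-database search returns 40 records (IDs 1..40);
# 10 studies are truly relevant, 6 of which are among the 40 retrieved.
retrieved_ids = range(1, 41)
relevant_ids = {1, 2, 3, 4, 5, 6, 101, 102, 103, 104}
recall, precision = recall_and_precision(retrieved_ids, relevant_ids)
# recall is 60.0 and precision is 15.0 for this toy data
```

In this toy case, adding a second source that retrieves the four missed studies would raise recall, which is the reasoning behind the recommendation to search multiple sources.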

In an SR comparing handsearching and electronic database searching, Hopewell et al. 20 found that handsearching retrieved more relevant RCTs (retrieval rate of 92%–100%) than searching in a single electronic database (retrieval rates of 67% for PsycINFO/PsycLIT, 55% for MEDLINE, and 49% for Embase). The retrieval rates varied depending on the quality of handsearching, type of electronic search strategy used (e.g., simple, complex or CHSSS), and type of trial reports searched (e.g., full reports, conference abstracts, etc.). The authors concluded that handsearching was particularly important in identifying full trials published in nonindexed journals and in languages other than English, as well as those published as abstracts and letters. 20

The effectiveness of checking reference lists to retrieve additional relevant studies for an SR was investigated by Horsley et al. 22 The review reported that checking reference lists yielded 2.5%–40% more studies depending on the quality and comprehensiveness of the electronic search used. The authors conclude that there is some evidence, although from poor quality studies, to support use of checking reference lists to supplement database searching. 22

3.3.3. Methods for selecting studies

Three approaches relevant to reviewer characteristics, including number, experience, and blinding of reviewers involved in the screening process, were highlighted in an SR by Robson et al. 14 Based on the retrieved evidence, the authors recommended that two independent, experienced, and unblinded reviewers be involved in study selection. 14 The review authors also suggested a modified approach for when resources are limited, in which one reviewer screens and the other reviewer verifies the list of excluded studies. It should be noted, however, that this suggestion is likely based on the authors’ opinion, as there was no evidence related to this from the studies included in the review.

Robson et al. 14 also reported two methods describing the use of technology for screening studies: use of Google Translate for translating languages (for example, German language articles to English) to facilitate screening was considered a viable method, while using two computer monitors for screening did not increase the screening efficiency in SR. Title‐first screening was found to be more efficient than simultaneous screening of titles and abstracts, although the gain in time with the former method was lesser than the latter. Therefore, considering that the search results are routinely exported as titles and abstracts, Robson et al. 14 recommend screening titles and abstracts simultaneously. However, the authors note that these conclusions were based on a very limited number (in most instances one study per method) of low‐quality studies. 14

3.3.4. Methods for data extraction

Robson et al. 14 examined three approaches for data extraction relevant to reviewer characteristics, including number, experience, and blinding of reviewers (similar to the study selection step). Although based on limited evidence from a small number of studies, the authors recommended use of two experienced and unblinded reviewers for data extraction. The experience of the reviewers was suggested to be especially important when extracting continuous outcomes (or quantitative) data. However, when the resources are limited, data extraction by one reviewer and a verification of the outcomes data by a second reviewer was recommended.

As for the methods involving use of technology, Robson et al. 14 identified limited evidence on the use of two monitors to improve the data extraction efficiency and computer‐assisted programs for graphical data extraction. However, use of Google Translate for data extraction in non‐English articles was not considered to be viable. 14 In the same review, Robson et al. 14 identified evidence supporting contacting authors for obtaining additional relevant data.

3.3.5. Methods for RoB assessment

Two SRs examined the impact of blinding of reviewers for RoB assessments. 14 , 23 Morissette et al. 23 investigated the mean differences between the blinded and unblinded RoB assessment scores and found inconsistent differences among the included studies, which allowed no definitive conclusions. Similar conclusions were drawn in a more recent review by Robson et al., 14 which included four studies on reviewer blinding for RoB assessment that completely overlapped with Morissette et al. 23

Use of experienced reviewers and provision of additional guidance for RoB assessment were examined by Robson et al. 14 The review concluded that providing intensive training and guidance on assessing studies reporting insufficient data to the reviewers improves RoB assessments. 14 Obtaining additional data related to quality assessment by contacting study authors was also found to help the RoB assessments, although based on limited evidence. When assessing qualitative or mixed method reviews, Robson et al. 14 recommend the use of a structured RoB tool as opposed to an unstructured tool. No SRs were identified on the data synthesis and CoE assessment and reporting steps.

4. DISCUSSION

4.1. Summary of findings

Nine SRs examining 24 unique methods used across five steps in the SR process were identified in this overview. The collective evidence supports some current traditional and modified SR practices, while challenging other approaches. However, the quality of the included reviews was assessed to be moderate at best and in the majority of the included SRs, evidence related to the evaluated methods was obtained from very limited numbers of primary studies. As such, the interpretations from these SRs should be made cautiously.

The evidence gathered from the included SRs corroborates a few current SR approaches. 5 For example, it is important to search multiple resources for identifying relevant trials (RCTs and/or CCTs). The resources must include a combination of electronic database searching, handsearching, and reference lists of retrieved articles. 5 However, no SRs have been identified that evaluated the impact of the number of electronic databases searched. A recent study by Halladay et al. 27 found that articles on therapeutic intervention, retrieved by searching databases other than PubMed (including Embase), contributed only a small amount of information to the MA and also had a minimal impact on the MA results. The authors concluded that when resources are limited and when a large number of studies is expected to be retrieved for the SR or MA, a PubMed‐only search can yield reliable results. 27

Findings from the included SRs also reiterate some methodological modifications currently employed to “expedite” the SR process. 10 , 11 For example, excluding non‐English language trials and gray/unpublished trials from MA has been shown to have minimal or no impact on the results of MA. 24 , 26 However, the efficiency of these SR methods, in terms of time and the resources used, has not been evaluated in the included SRs. 24 , 26 Of the SRs included, only two have focused on the aspect of efficiency 14 , 25 ; O'Mara‐Eves et al. 25 report some evidence to support the use of text‐mining approaches for title and abstract screening in order to increase the rate of screening. Moreover, only one included SR 14 considered primary studies that evaluated reliability (inter‐ or intra‐reviewer consistency) and accuracy (validity when compared against a “gold standard” method) of the SR methods. This can be attributed to the limited number of primary studies that evaluated these outcomes when evaluating the SR methods. 14 Lack of outcome measures related to reliability, accuracy, and efficiency precludes making definitive recommendations on the use of these methods/modifications. Future research studies must focus on these outcomes.

Some evaluated methods may be relevant to multiple steps; for example, exclusions based on publication status (gray/unpublished literature) and language of publication (non‐English language studies) can be outlined in the a priori eligibility criteria or can be incorporated as search limits in the search strategy. SRs included in this overview focused on the effect of study exclusions on pooled treatment effect estimates or MA conclusions. Excluding studies from the search results, after conducting a comprehensive search, based on different eligibility criteria may yield different results when compared to the results obtained when limiting the search itself. 28 Further studies are required to examine this aspect.

Although we acknowledge the lack of standardized quality assessment tools for methodological study designs, we adhered to the Cochrane criteria for identifying SRs in this overview. This was done to ensure consistency in the quality of the included evidence. As a result, we excluded three reviews that did not provide any form of discussion on the quality of the included studies. The methods investigated in these reviews concern supplementary search, 29 data extraction, 12 and screening. 13 However, methods reported in two of these three reviews, by Mathes et al. 12 and Waffenschmidt et al., 13 have also been examined in the SR by Robson et al., 14 which was included in this overview; in most instances (with the exception of one study included in Mathes et al. 12 and Waffenschmidt et al. 13 each), the studies examined in these excluded reviews overlapped with those in the SR by Robson et al. 14

One of the key knowledge gaps observed in this overview was the dearth of SRs on the methods used in the data synthesis component of SR. Narrative and quantitative syntheses are the two most commonly used approaches for synthesizing data in evidence synthesis. 5 There are some published studies on the proposed indications and implications of these two approaches. 30 , 31 These studies found that both data synthesis methods produced comparable results and have their own advantages, suggesting that the choice of the method must be based on the purpose of the review. 31 With the increasing number of “expedited” SR approaches (so‐called “rapid reviews”) avoiding MA, 10 , 11 further research studies are warranted in this area to determine the impact of the type of data synthesis on the results of the SR.

4.2. Implications for future research

The findings of this overview highlight several areas of paucity in primary research and evidence synthesis on SR methods. First, no SRs were identified on methods used in two important components of the SR process, namely data synthesis and CoE assessment and reporting. As for the included SRs, a limited number of evaluation studies have been identified for several methods. This indicates that further research is required to corroborate many of the methods recommended in current SR guidelines. 4 , 5 , 6 , 7 Second, some SRs evaluated the impact of methods on the results of quantitative synthesis and MA conclusions. Future research studies must also focus on the interpretations of SR results. 28 , 32 Finally, most of the included SRs were conducted on specific topics related to the field of health care, limiting the generalizability of the findings to other areas. It is important that future research studies evaluating evidence syntheses broaden the objectives and include studies on different topics within the field of health care.

4.3. Strengths and limitations

To our knowledge, this is the first overview summarizing current evidence from SRs and MA on different methodological approaches used in several fundamental steps in SR conduct. The overview methodology followed well‐established guidelines and strict criteria defined for the inclusion of SRs.

There are several limitations related to the nature of the included reviews. Evidence for most of the methods investigated in the included reviews was derived from a limited number of primary studies. Also, the majority of the included SRs may be considered outdated as they were published (or last updated) more than 5 years ago 33 ; only three of the nine SRs have been published in the last 5 years. 14 , 25 , 26 Therefore, important and recent evidence related to these topics may not have been included. A substantial number of the included SRs were conducted in the field of health, which may limit the generalizability of the findings. Some method evaluations in the included SRs focused on quantitative analyses components and MA conclusions only. As such, the applicability of these findings to SR more broadly is still unclear. 28 Considering the methodological nature of our overview, limiting the inclusion of SRs according to the Cochrane criteria might have resulted in missing some relevant evidence from those reviews without a quality assessment component. 12 , 13 , 29 Although the included SRs performed some form of quality appraisal of the included studies, most of them did not use a standardized RoB tool, which may impact the confidence in their conclusions. Due to the type of outcome measures used for the method evaluations in the primary studies and the included SRs, some of the identified methods have not been validated against a reference standard.

Some limitations in the overview process must be noted. While our literature search was exhaustive, covering five bibliographic databases and a supplementary search of reference lists, no gray sources or other evidence resources were searched. Also, the search was primarily conducted in health databases, which might have resulted in missing SRs published in other fields. Moreover, only English language SRs were included for feasibility. As the literature search retrieved a large number of citations (i.e., 41,556), the title and abstract screening was performed by a single reviewer, calibrated for consistency in the screening process by another reviewer, owing to time and resource limitations. These might have potentially resulted in some errors when retrieving and selecting relevant SRs. The SR methods were grouped based on key elements of each recommended SR step, as agreed by the authors. This categorization pertains to the identified set of methods and should be considered subjective.

5. CONCLUSIONS

This overview identified limited SR‐level evidence on various methodological approaches currently employed during five of the seven fundamental steps in the SR process. Limited evidence was also identified on some methodological modifications currently used to expedite the SR process. Overall, findings highlight the dearth of SRs on SR methodologies, warranting further work to confirm several current recommendations on conventional and expedited SR processes.

CONFLICT OF INTEREST

The authors declare no conflicts of interest.

Supporting information

APPENDIX A: Detailed search strategies

ACKNOWLEDGMENTS

The first author is supported by a La Trobe University Full Fee Research Scholarship and a Graduate Research Scholarship.

Open Access Funding provided by La Trobe University.

Veginadu P, Calache H, Gussy M, Pandian A, Masood M. An overview of methodological approaches in systematic reviews. J Evid Based Med. 2022;15:39–54. doi:10.1111/jebm.12468

Validity and reliability of outcome measures to assess dysfunctional breathing: a systematic review

Volume 11, Issue 1

  • Vikram Mohan 1 (http://orcid.org/0000-0002-9067-2817),
  • Chandrasekar Rathinam 2,3 (http://orcid.org/0000-0002-0049-8430),
  • Derick Yates 3,
  • Aatit Paungmali 4 and
  • Christopher Boos 5,6
  • 1 Department of Rehabilitation and Sports Sciences, Faculty of Health and Social Sciences, Bournemouth University, Bournemouth, UK
  • 2 University of Birmingham, Birmingham, UK
  • 3 Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
  • 4 Department of Physical Therapy, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand
  • 5 Cardiology Department, University Hospitals Dorset NHS Foundation Trust, Poole, UK
  • 6 Faculty of Health and Social Sciences, Bournemouth University, Bournemouth, UK
  • Correspondence to Chandrasekar Rathinam; c.rathinam{at}bham.ac.uk

Objective This study aimed to systematically review the psychometric properties of outcome measures that assess dysfunctional breathing (DB) in adults.

Methods Studies on developing and evaluating measurement properties to assess DB were included. The review covered empirical research published between 1990 and February 2022, with an updated search in May 2023, in the Cochrane Library database of systematic reviews and the Cochrane Central Register of Controlled Trials, Ovid Medline (full), the Ovid Excerpta Medica Database, the Ovid Allied and Complementary Medicine Database, the Ebscohost Cumulative Index to Nursing and Allied Health Literature and the Physiotherapy Evidence Database. The included studies’ methodological quality was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) risk of bias checklist. Data analysis and synthesis followed the COSMIN methodology for reviews of outcome measurement instruments.

Results Sixteen studies met the inclusion criteria, and 10 outcome measures were identified. The psychometric properties of these outcome measures were evaluated using COSMIN. The Nijmegen Questionnaire (NQ) is the only outcome measure with ‘sufficient’ ratings for content validity, internal consistency, reliability and construct validity. All other outcome measures did not report characteristics of content validity in the patients’ group.

Discussion The NQ showed high-quality evidence for validity and reliability in assessing DB. Our review suggests that using NQ to evaluate DB in people with bronchial asthma and hyperventilation syndrome is helpful. Further evaluation of the psychometric properties is needed for the remaining outcome measures before considering them for clinical use.

PROSPERO registration number CRD42021274960.

  • Patient Outcome Assessment
  • Respiratory Measurement
  • Physical Examination

Data availability statement

Data are available in a public, open access repository. All the relevant data are available at https://osf.io/49hju/.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjresp-2023-001884

WHAT IS ALREADY KNOWN ON THIS TOPIC

Clinicians commonly use various outcome measures to examine dysfunctional breathing (DB). Currently, no review is available that examines the psychometric properties of these outcome measures.

WHAT THIS STUDY ADDS

The psychometric properties of the available DB outcome measures in adults are reviewed. The Nijmegen Questionnaire (NQ) is the only available outcome measure graded as ‘very high’ quality when evaluated with the COnsensus-based Standards for the selection of health Measurement INstruments tool.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

The existing outcome measures need to establish content validity and other psychometric properties prior to consideration for clinical use. NQ can be used to assess DB in the adult population.

Introduction

The normal breathing pattern consists of thoracic and abdominal cavity expansion during inhalation and retraction during exhalation. 1 Dysfunctional breathing (DB) deviates from the typical biomechanical pattern. 2 3 Barker and Everard (2015) proposed a definition for DB as ‘an alteration in the normal biomechanical patterns of breathing that results in intermittent or chronic symptoms that may be respiratory and/or non-respiratory’. 3 The DB subtypes include thoracic and extrathoracic patterns. 2 3 Thoracic DB is often observed in hyperventilation and extrathoracic DB in patients with paradoxical vocal cord dysfunction. 3 DB has historically been identified under a variety of names; a few examples include thoracoabdominal asynchrony, breathing pattern dysfunction, breathing pattern disorder, unexplained breathlessness, psychological breathlessness, panic breathing, apical breathing, periodic deep sighing, hyperventilation and paradoxical breathing. 3 4 DB has an estimated prevalence of 29% and 8% in people with and without asthma, respectively. 5 This signifies that the general adult population and those with lung disease may experience DB with symptoms that may improve with treatment, contributing to improved quality of life (QoL). 6

Several respiratory disorders, such as bronchial asthma, sleep apnoea and chronic obstructive pulmonary disease, are reported to be linked with DB. 7–9 Breathlessness, chest tightness, anxiety, light-headedness and fatigue can occur in people with these illnesses and DB. 6–9 QoL, anxiety, sense of coherence and asthma control are significantly reduced in patients with DB, and breathing retraining has been shown to improve DB and health-related QoL. 10 11 Even though DB is non-specific in some instances, it can lead directly to misdiagnosing respiratory disease in many situations. 4 Despite the clinical importance of evaluating DB, a consensus on the assessment method still needs to be reached. The potential impacts of DB on constructs like bodily biochemistry, psychological functioning and social aspects must also be considered in a comprehensive evaluation. 6 12–14

Clinical judgement and outcome measures enhance symptom-specific DB evaluation. An outcome measure that examines DB is required to guide suitable treatments. A range of objective evaluation instruments are available, including respiratory movement measuring instruments and respiratory inductive plethysmography. 15 16 These laboratory-based measurement methods offer identification of DB, and they have excellent reliability and validity. 16 17 However, these outcome measures cannot be used in routine clinical practice, especially in the community, because they are time consuming and require expensive equipment and specific clinical environments. Clinicians often use various outcome measures to assess DB. 18–20 These include Hi-Lo breathing, 21 the Manual Assessment of Respiratory Motion (MARM), 21 the Self-Evaluation of Breathing Questionnaire (SEBQ), 22 the Breathing Pattern Assessment Tool (BPAT), 23 the Total Faulty Breathing Scale (TFBS) 24 and the Nijmegen Questionnaire (NQ). 25

The available outcome measures use various methods to detect DB. For example, in MARM, the examiners use the palpation method to detect DB 21 ; Hi-Lo and TFBS assess breathing motion through observation 26 and NQ through self-reported measures. 15 21 25 Before any outcome measure is viable for routine clinical practice, validity and reliability must be established to ensure clinicians’ confidence in the measurement. To determine best practices for the assessment of DB, a systematic review of the existing literature to explore the reliability and validity of outcome measures is imperative. The systematic retrieval and appraisal of all literature about DB with a quantitative synthesis will lead to best practice guidelines for clinicians and researchers. This systematic review aims to provide a synthesis of outcome measures used to evaluate DB and appraise the psychometric properties of these outcome measures.

This study used the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines for systematic reviews of patient-reported outcome measures. 27–29 The methods of this systematic review follow the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) recommendations for systematic reviews and outcome measurement instrument selection, which are currently being piloted. 30 We registered this review protocol on PROSPERO (CRD42021274960) and updated the amendments regularly.

Search strategy

An experienced medical librarian (DY) carried out literature searches in the Cochrane Library database of systematic reviews and the Cochrane Central Register of Controlled Trials, Ovid Medline (full), the Ovid Excerpta Medica Database (Embase), the Ovid Allied and complementary medicines database, the Ebscohost Cumulative Index to Nursing and Allied Health Literature and the Physiotherapy Evidence Database (PEDro).

To perform the literature searches, a construct (DB), instrument (assessment instruments) and outcome (validity and reliability) framework was employed. Following a scoping search, relevant synonyms were found and validated as suitable and informative by the review team’s clinicians and academics. Searches were carried out to identify the relevant subject headings for those databases with a subject thesaurus (MeSH or Emtree) and text words in each database’s title and abstract fields. Proximity operators were used to combine search words in the title and abstract fields to increase search sensitivity. To increase the precision of the results, the review team included a NOT operator in the search strategies to screen out papers related to sleep apnoea at the database search stage.

Searches were run in February 2022 and repeated in May 2023 before study completion to ensure the review considered the most recently published research. Due to the limited search functionality of PEDro, it was searched using separate individual search phrases to identify relevant research on DB. On 22 February 2022, five of those phrases were identified as abstracts, and these were ‘dysfunctional breathing’, ‘breathing disorder’, ‘thoracoabdominal synchrony’, ‘apical breathing’ and ‘respiratory dysfunction’. These phrases were searched again on 11 May 2023. Date limits were applied to screen out papers published before 1990. The rationale for this decision was that the term DB, or breathing pattern dysfunction, only came into existence and began to be used commonly in the medical literature in 1990. A copy of the full search strategy run in Ovid Medline and other databases is available ( online supplemental file S1 ). The references identified by the database searches were uploaded into the EndNote reference management software package to allow for an initial screening.

Study selection.

The following inclusion criteria were applied: (1) studies that investigated the validity and/or reliability of clinician-reported or patient-reported outcome measures of DB in the adult population (18+ years) and (2) full articles and service evaluation reports published in a peer-reviewed journal in English. Studies were excluded if they used laboratory-based outcome measures or were systematic reviews, conference abstracts, research letters, commentaries or letters to the editor.

Data extraction (selection and coding)

Two independent reviewers (VM and CR) screened the titles and abstracts for relevance using the inclusion and exclusion criteria. Reference lists of all included studies were also searched for relevant titles. The authors (VM and CR) retrieved full-text articles that met the study criteria. An article by the first author (VM) on the TFBS was included in this review; to mitigate conflict of interest and reduce bias, only CR assessed the articles related to TFBS. The procedure is depicted in the PRISMA 2020 flow diagram ( figure 1 ). 31

Figure 1. Preferred Reporting Items for Systematic reviews and Meta-Analyses flow chart. AMED, Ovid allied and complementary medicines database; CINAHL, Ebscohost Cumulative Index to Nursing and Allied Health Literature; EMBASE, Ovid Excerpta Medica Database; MEDLINE, Medical Literature Analysis and Retrieval System Online; PEDro, Physiotherapy Evidence Database.

Risk of bias and quality of results

The team used the COSMIN methodology for systematic reviews of patient-reported outcome measures (PROM) and clinician-reported outcome measures to evaluate the psychometric characteristics of outcome measures used in persons with DB. 27–29 The COSMIN PROM recommends using an outcome measure with ‘sufficient’ content validity and internal consistency. 27–29 The reviewers (VM and CR) individually extracted and evaluated the data for the first nine attributes listed in the COSMIN tool.

The COSMIN checklist was used to assess the methodological rigour of each outcome measure across the measurement attributes. These include reliability, validity and other psychometric properties. The methodologies provided for evaluating the measurement properties of all the outcome measures are included in this systematic review. Study quality was assessed separately for each measurement property using a four-point rating system (very good, adequate, doubtful, inadequate or not applicable). 29 The 'worst score counts' principle was used, where the overall rating for each measurement property is given by the lowest rating of any standard in the box. 28 29 The results of individual studies on measurement characteristics were compared with COSMIN criteria for good measurement qualities. Each outcome was graded as sufficient (+), insufficient (−) or indeterminate (?). Relevance, comprehensiveness and comprehensibility criteria were used to grade the quality of the results in research reporting on content validity.
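The 'worst score counts' principle described above can be sketched in a few lines of Python. This is a minimal illustration, not the COSMIN tool itself: the function name, the rating strings and the choice to skip 'not applicable' standards are assumptions made for the example.

```python
# Ratings ordered from best to worst. The ordering follows the four-point
# scale named in the text; the string spellings are assumptions.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(standard_ratings):
    """Return the lowest rating among the applicable standards in one box.

    Standards rated 'not applicable' are ignored (an assumption for this
    sketch); if nothing is applicable, 'not applicable' is returned.
    """
    applicable = [r for r in standard_ratings if r != "not applicable"]
    if not applicable:
        return "not applicable"
    # A higher index in RATING_ORDER means a worse rating.
    return max(applicable, key=RATING_ORDER.index)


# Hypothetical box with four standards, one of them not applicable.
print(worst_score_counts(["very good", "adequate", "very good", "not applicable"]))
```

In this hypothetical box the single 'adequate' standard determines the result, even though the other applicable standards were rated 'very good'.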

The result of each study on a measurement property is rated using the most recent standards for good measurement properties. The total ratings of the study outcomes for each measurement property per outcome measure were summarised as sufficient (+), insufficient (−), indeterminate (?) or inconsistent (±). An overall rating was calculated by summing the scoring of each study; if 75% of the studies had the same scoring, that scoring became the overall rating (+ or −). However, if <75% of the studies had the same scoring, the overall rating would become inconsistent (±). If more than two articles were available, a summary of the overall evidence for measuring the properties of the outcome measures was determined. The lowest and highest results for each measurement property of an outcome measure are displayed to illustrate a set of findings that have been qualitatively aggregated.
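The 75% rule described above can be illustrated with a short Python sketch. It assumes that "75% of the studies" means at least 75%, and that an indeterminate (?) result is counted like any other rating; neither detail is spelled out in the text, and the function name is hypothetical.

```python
from collections import Counter


def overall_rating(study_ratings, threshold=0.75):
    """Summarise per-study ratings ('+', '-', '?') for one measurement property.

    If at least `threshold` of the studies share the same rating, that rating
    becomes the overall rating; otherwise the result is inconsistent ('±').
    """
    if not study_ratings:
        raise ValueError("at least one study rating is required")
    rating, count = Counter(study_ratings).most_common(1)[0]
    if count / len(study_ratings) >= threshold:
        return rating
    return "±"


# Hypothetical examples: three of four studies agree (75%), two of three do not.
print(overall_rating(["+", "+", "+", "-"]))
print(overall_rating(["+", "+", "-"]))
```

With three of four studies rated sufficient, the share is exactly 75% and the overall rating is '+'; with two of three, the share falls below the threshold and the rating becomes '±'.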

The evidence’s quality was rated using a modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) system, with grades of ‘high’, ‘moderate’, ‘low’ or ‘very low’. 27 The quality of the evidence was not rated for studies with an uncertain overall rating. For the quality assessment, two reviewers (VM and CR) independently worked on each stage while taking into account factors including the risk of bias, inconsistency, imprecision and indirectness. Starting with high-quality evidence, the quality of the evidence was reduced while considering all factors for the outcome measures. Disagreements were addressed by discussion and/or consultation with a third reviewer (AP).
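The downgrading logic described above (start at 'high' and reduce the grade for each concern) can be sketched as follows. The number of levels deducted per factor is a reviewer judgement that the text does not specify, so the values in the example are purely hypothetical, as is the function name.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def grade_quality(downgrades):
    """Return the evidence grade after applying downgrades.

    `downgrades` maps each factor (risk of bias, inconsistency, imprecision,
    indirectness) to the number of levels deducted for it (0 = no concern).
    The grade starts at 'high' and cannot fall below 'very low'.
    """
    if any(levels < 0 for levels in downgrades.values()):
        raise ValueError("downgrade levels must be non-negative")
    total = sum(downgrades.values())
    return GRADE_LEVELS[min(total, len(GRADE_LEVELS) - 1)]


# Hypothetical judgement: one level for risk of bias, one for imprecision.
example = {"risk of bias": 1, "inconsistency": 0, "imprecision": 1, "indirectness": 0}
print(grade_quality(example))
```

Two one-level deductions move the hypothetical example from 'high' to 'low'; any further deductions would bottom out at 'very low'.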

Patient and public involvement

Patients were not involved in this review due to the complexity of evaluating the psychometric properties of the DB tools.

Our first search (22 February 2022) yielded 1735 references. After removing duplicates, 1246 references were included for title and abstract screening. In our second search (11 May 2023), we identified 144 references. After removing duplicates, 96 references were included for title and abstract screening. Sixteen papers met inclusion criteria, seven through database searching and nine through searching reference lists of included studies ( figure 1 ).

Overview of outcome measures

Our search identified the following ten outcome measures that have examined reliability and/or validity components: Breathing Vigilance Questionnaire (Breathe-VQ), 32 MARM, 15 21 NQ, 33–37 BPAT, 38 39 Hi-Lo test, 21 clinical assessments of increased work of breathing, 40 Milstein Breathing Pattern Assessment Index (M-BPAI), 41 SEBQ, 14 22 TFBS 24 26 and Dyspnoea-12 (D-12) questionnaire. 38 The Hi-Lo and D-12 scales were not evaluated in this review because they are not the primary scales used to assess DB. 21 38 Of the 16 studies, only nine included participants with DB, and the remaining seven included healthy participants. The COSMIN guidelines recommend testing the measurement properties on the target population. 27 However, the identified studies used these outcome measures in both patients and healthy people. Therefore, the measurement properties for these groups are reported separately ( table 1 ; online supplemental file S2 ).

Characteristics of outcome measures in studies involving patients

Developmental and content validity studies

Developmental studies.

The evidence synthesis of the developmental and content validity of available outcome measures is summarised in table 2 . Of the eight outcome measures, only two were reported to have developmental and content validity properties. 32 33 35 36 A representative patient sample and a cognitive interview are required to develop an outcome measure. A cognitive interview study offers information on the items’ depth, especially their readability as an outcome measure. However, this was only followed in some of the included studies. All four studies involved experts 32 34–36 and three involved patients. 34–36 Concept elucidation was deemed ‘inadequate’ for Breathe-VQ and Korean-NQ because only healthy participants engaged in the studies. 32 33 Other NQ trials were rated ‘very good’ since the patients involved were typical of the target population.

Evidence synthesis of developmental, content validity of Nijmegen Questionnaire and Breathing Vigilance Questionnaire using COnsensus-based Standards for the selection of health Measurement INstruments checklist

Content validity studies

Three of the four articles on the content validity of NQ involved patients 34–36 and all four involved experts. 33–36 Of these three studies, patients’ relevance, comprehensiveness and comprehensibility were evaluated for one study only. 34 A cognitive interview was conducted for the Breathe-VQ, but the quality was ‘inadequate’ as it was not conducted in a patient population. 32 No studies on the development and content validity of TFBS, MARM or SEBQ were found. Only the NQ has been considered for rating, and it was judged as ‘sufficient’.

Risk of bias assessment rating of other measurement properties

The evidence synthesis for all outcome measures and additional measurement properties is summarised in table 3 ( online supplemental file S3 ) and Supplementary file S4— https://osf.io/49hju/ .

Methodological quality and rating of psychometric properties in studies involving patients

Internal structure

Among the included studies, only three reported structural validity. 32 34 37 Two studies explored the structural validity of the NQ measure, 34 37 and one study explored the Breathe-VQ measure. 32 NQ structural validity was examined using the Rasch model and exploratory factor analysis with ‘very good’ and ‘inadequate’ quality. 34 37 For structural validity, the Breathe-VQ study employed exploratory/confirmatory factor analysis of ‘very good’ quality. 32 The internal consistency of the NQ, as measured by Cronbach’s alpha, ranged from >0.70 to 0.92 with a ‘very good’ quality and ‘sufficient’ rating. 34 35 However, Rasch and factor analysis do not apply to other outcome measures, especially clinician-reported outcome measures.

Reliability

In total, 10 studies reported reliability measures. M-BPAI 41 and TFBS 24 26 were rated as having ‘very good’ methodological quality and a ‘sufficient’ rating. 41 The MARM and SEBQ received the same methodological quality and rating. 15 21 22 The only study that reported the reliability of clinical assessment of the work of breathing exhibited ‘adequate’ methodological quality and was rated ‘indeterminate’. 40 The test-retest reliability values for NQ ranged from 0.90 to 0.98, with ‘very good’ to ‘adequate’ methodological quality. 34–36 The NQ’s overall rating was judged ‘sufficient’. In hypothesis testing for construct validity of the NQ, correlation values ranged from 0.81 to 0.82, indicating ‘very good’ quality and a ‘sufficient’ rating. 34 35 A more comprehensive evidence synthesis for these and other outcome measures is available in Supplementary file S4 ( https://osf.io/49hju/ ).

GRADE quality

The reviewers used GRADE to assess the quality of studies that involved participants with respiratory disease since the clinical application would be acceptable in the actual patient population. As a result, only the NQ that included individuals with asthma and hyperventilation syndrome was included in the GRADE quality assessment. The evidence quality is ‘high’ for the NQ in reliability and hypothesis testing for construct validity domains but ‘low’ for cross-cultural and structural validity domains. The GRADE quality assessment cannot be applied to the remaining outcome measures.

This systematic review presented an overview of outcome measures used to assess DB and evaluated the psychometric properties of outcome measures used in healthy and DB populations. NQ is the only outcome measure with sufficient psychometric properties to be considered by clinicians for the DB assessment.

Nijmegen Questionnaire

NQ’s measurement properties have gained much attention due to its long record and frequent use in DB assessment, notably in conditions including bronchial asthma and hyperventilation syndrome. 39 42 The available evidence indicated that the NQ had been evaluated using rigorous methods, and its content validity, internal consistency and reliability were commonly reported. This outcome measure has been translated into other languages, but for one of the translated versions, the PROM development and content validity were not well documented. 34 However, other measurement properties were well established. 34

PROM development and content validity studies were not consistent across the included studies. This is due to variations in the methodological description, and it was the least reported psychometric property, followed by structural validity and hypothesis testing for construct validity. Despite this, the reviewers have used the COSMIN tool to infer the quality of PROM and content validity, and the NQ was found to have most of the measurement properties with ‘sufficient’ quality. This is an area that needs further exploration in future studies. In addition, the language and structure of the items used in the NQ need improvement. For instance, item NQ14 (cold hands or feet) does not fit the structural validity, and similarly, item NQ9 (bloated feeling in the stomach) also does not fit the Rasch model. 37 Since NQ looks at many DB dimensions, these factors could be considered for prospective use.

Breathe-VQ and BPAT

Breathe-VQ is the next potential outcome measure that can be used in the DB population because the measurement properties, such as structural validity, internal consistency, reliability, measurement error, criterion validity and hypothesis testing for construct validity, are well established. 32 The Breathe-VQ is best suited to assess changes related to excessive conscious breathing rather than as an outcome measure for diagnosing the DB disorder. In contrast to the NQ, the Breathe-VQ has only been examined in one study; therefore, more research is required to determine its use in the DB population before considering it for clinical use. 32 It may be helpful to use NQ with Breathe-VQ to identify excessive conscious breathing caused by anxiety. The same comments apply to the BPAT, which has proven criterion validity for patients with asthma, breathing pattern disorder and post-COVID breathless individuals. 38 39 BPAT is more suitable for evaluating breathing irregularities in the DB population. However, BPAT is still in the trial phase, and its clinical utility has yet to be determined.

Other outcome measures

The remaining outcome measures, such as MARM, clinical evaluation of increased effort of breathing, TFBS, SEBQ and M-BPAI, have been examined for only a few psychometric properties. 15 21 22 24 26 40 41 The reviewers can comment on their clinical utility only once the remaining properties have been thoroughly investigated.

Limitations

This review excluded grey literature, conference abstracts, poster abstracts and dissertations; therefore, potential studies could have been missed. A second-order reference check was not performed, so other relevant studies may have been missed. Only English-language studies were considered for this review, which may have excluded potentially eligible studies of the DB population published in other languages. Another limitation is the lack of primary data, which prevented the review team from conducting a meta-analysis. The reviewers had no specific training in the use of COSMIN. Instead, they relied on their clinical and scholarly experience to reach an agreement. This might affect how studies are rated for quality. However, the review team mitigated this by sending the collected data to the corresponding authors of the included studies for verification, comments and triangulation.

Future consideration

Only five papers in our review briefly described the process of developing outcome measures and content validity. Determining whether the outcome measure development process had been rigorously carried out or was selectively reported is challenging. This might imply that the available outcome measures do not satisfy practitioners’ expectations or recognise the researchers’ requirements. An outcome measure with ‘inadequate’ content validity or a lack of evidence of content validity has questionable use in clinical practice. Therefore, particular attention should be given to determining the content validity of those outcome measures that do not possess this property. Detailed information on the outcome measure development process and content validity should be provided in future research. The reviewers recommended addressing these aspects in future studies.

It should be noted that the COSMIN checklist is both comprehensive and rigorous in its quality. Any other outcome measures considered here are unlikely to fulfil the standards. As a result, some of the outcome measures are rated as ‘inadequate’ quality. However, the authors recommend considering these measurement properties when constructing an outcome measure that fulfils the stringent criteria.

Conclusions

This review found 10 outcome measures used to assess DB. The NQ is the only outcome measure that showed evidence quality to be ‘high’ for internal consistency and hypothesis testing for construct validity and reliability. The evidence quality is ‘low’ for NQ structural validity and cross-cultural validity. The measurement properties of NQ are sufficient to recommend its use as part of a clinical application of DB. Most outcome measures have examined only a few psychometric properties; a more comprehensive investigation of all psychometric properties is needed before considering their clinical use. Future research on the existing outcome measure or developing a new outcome measure may follow the COSMIN guidelines.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

Acknowledgments.

We thank Rachel Senior, Specialist Physiotherapist, Dorset Adult Integrated Respiratory Service, Dorset, for her expertise and comments about the outcome measures; Dr Kathryn Collins, Bournemouth University, for her contribution in editing and all the authors of the included studies for extracted data verification and corrections.

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


X @VikramMohan10, @DJY-LIB-EBP

Contributors VM, CR, AP and CB were involved in study conceptualisation. VM and CR were responsible for screening, selecting articles and data entry, data interpreting, reporting and for preparing the final manuscript. DY was responsible for constructing search strategy and conducting searches. VM and CR are acting as guarantors for the work. All authors read, provided feedback and approved the final manuscript.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests VM and AP are authors of some of the included articles. They were not engaged in evaluating the methodological quality of these articles. They have no other competing interests. The first author (VM) article (TFBS) was included in this review; to mitigate conflict of interest and risk of bias, only CR investigated the articles related to TFBS.

Patient and public involvement statement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Correlates of teachers’ classroom management self-efficacy: A systematic review and meta-analysis

  • Meta-Analysis
  • Open access
  • Published: 12 April 2024
  • Volume 36, article number 43 (2024)

  • Siyu Duan 1 , 2 ,
  • Kerry Bissaker   ORCID: orcid.org/0000-0002-8008-3744 2 &
  • Zhan Xu 1  

This meta-analysis examined literature from the last two decades to identify factors that correlate with teachers’ classroom management self-efficacy (CMSE) and to estimate the effect size of these relationships. Online and reference list searches from international and Chinese databases yielded 1085 unique results. However, with a focus on empirical research, the final sample consisted of 87 studies and 22 correlates. The findings cluster the correlates of CMSE into three categories: teacher-level factors (working experience, constructivist beliefs, teacher stress, job satisfaction, teacher commitment, teacher personality, and teacher burnout), classroom-level factors (classroom climate, classroom management, students’ misbehaviour, students’ achievement, classroom interaction, and student-teacher relationship), and school-level factors (principal leadership and school culture). The results of this meta-analysis show small to large correlations between these 15 factors and CMSE. How these factors are associated with teachers’ CMSE and recommendations for future CMSE research are discussed.

Introduction

Classroom management is frequently defined as “the actions teachers take to create an environment that supports and facilitates both academic and social-emotional learning (p.4)” (Evertson & Weinstein, 2006 ). There is a consensus that classroom management no longer simply refers to responses to student misbehaviour; rather, it serves as “an umbrella term for an array of teaching strategies that enhance effective time use in class (p.2)” (Lazarides et al., 2020 ). Effective classroom management is highly related to students' academic, behavioural, and social-emotional outcomes (Korpershoek et al., 2016 ), as well as teachers' wellbeing (Sutton et al., 2009 ). To effectively manage classrooms, teachers must possess professional knowledge, skills, and efficacy beliefs in their classroom management capability (Main & Hammond, 2008 ). Classroom management self-efficacy (CMSE) is a teacher’s belief about his or her capabilities to organize and execute the courses of action required to create a positive learning environment that supports successful student learning outcomes (Lazarides et al., 2018 ).

Self-efficacy is a self-perception of one's capacity to accomplish a certain task (Bandura, 1977 ), which has been well represented in educational research. Teacher self-efficacy (TSE) refers to a teacher’s belief in his or her capabilities to perform a specific teaching task successfully in a particular teaching context (Tschannen-Moran et al., 1998 ), with a growing acknowledgment of its influence on important outcomes for teachers and students (Klassen & Tze, 2014 ). According to Bandura ( 1997 ), an individual’s self-efficacy is reflected and evaluated through interpreting information from four sources, namely mastery experiences , vicarious experiences , social persuasions , and emotional arousal . Tschannen-Moran et al. ( 1998 ) proposed an integrated model of teacher self-efficacy applying these four sources of efficacy information. They suggested that the development of teacher self-efficacy is cyclical, with teachers’ interpretations of efficacy-relevant information affecting teacher self-efficacy. This in turn has an impact on the setting of teaching objectives, teaching effort and persistence in managing challenging situations. The performance of completing a teaching task, either successfully or not, becomes a source of new efficacy information, which may have either positive or negative effects on TSE.

Of the four sources of influence on TSE, mastery experiences, which involve teachers achieving their desired goals, are often viewed as the strongest predictor of TSE (Usher & Pajares, 2008 ). TSE in the school context has been linked to students’ successful academic achievements and positive classroom climate (Klassen & Tze, 2014 ). Vicarious experiences, including the observation of highly effective teachers or noting students’ preferred teachers, provide another source of influence on teachers’ TSE. An individual’s TSE may be influenced either positively or negatively as they reflect on their own teaching competence or relationships with their students in comparison to those of colleagues. Social persuasions, often in the form of feedback from experts or students, are a particularly powerful influence for preservice and novice teachers (individuals with little prior experience), as they can either develop more confidence or doubt their capacities to be a successful teacher (Tschannen-Moran & Hoy, 2007 ). Finally, emotional arousal is elicited by stress, anxiety and tension and can affect an individual’s perceived self-efficacy when coping with tough, demanding situations, for example, novice teachers receiving either positive or negative feedback from leaders or reactions from students and/or parents (Marschall, 2023 ; Morris et al., 2017 ). Together these four sources, being both intrinsic and extrinsic, generate either positive or negative self-efficacy in a specific teaching context.

The specificity of context is important to an individual’s TSE. Although teachers might feel efficacious in one area of instruction or with one group of students, they may also report low confidence in other areas or with other student groups (Tschannen-Moran & Hoy, 2001 ). Acknowledging the context-specific nature of TSE, some researchers began exploring TSE in specific areas, such as science teaching (Riggs & Enochs, 1990 ), language and literacy teaching (Cantrell & Callaway, 2008 ), mathematics teaching (Bardach et al., 2022 ), special/inclusive education (Coladarci & Breton, 1997 ; Woodcock et al., 2022 ), teaching with technology (Alt, 2018 ), and classroom management (Dicke et al., 2014 ; Emmer & Hickman, 1991 ; Hettinger et al., 2021 ; Lazarides et al., 2018 ; Tschannen-Moran & Hoy, 2001 ). Tschannen-Moran et al. ( 1998 ) noted that determining an optimal specificity level of TSE is challenging. TSE measures are most useful and generalizable when measures refer to specific teaching activities and tasks, but their predictive power is limited to specific skills and contexts. Classroom management, a critical teaching skill domain rather than a particular context (e.g., teaching science), has therefore attracted the attention of TSE researchers (O'Neill & Stephenson, 2011 ). Aloe et al. ( 2014 ) examined the relationship between CMSE and teacher burnout, acknowledging that CMSE is a domain-specific construct.

CMSE has already been identified as a distinct dimension of TSE, both for pre-service teachers (Emmer & Hickman, 1991 ) and in-service teachers (Tschannen-Moran & Hoy, 2001 ). To measure self-efficacy for classroom management, some researchers (e.g., Brouwers & Tomic, 2000 ; Hettinger et al., 2023 ) used sub-scales of TSE scales, such as the Teachers’ Sense of Efficacy Scale (TSES) (Tschannen-Moran & Hoy, 2001 ) and the Teacher Efficacy Scale (TES) (Emmer & Hickman, 1991 ). Specific scales investigating CMSE, however, have also been developed. For example, the Behaviour Management Self-Efficacy Scale was designed to measure preservice teachers’ self-efficacy of classroom management (Main & Hammond, 2008 ). The items used to measure CMSE mainly concerned maintaining order and control in classrooms and facilitating student socialisation and cooperation, whereas the other aspects of classroom management (establishing and enforcing rules, gaining and maintaining engagement, and allocating resources) were less represented in CMSE measurements (O'Neill & Stephenson, 2011 ).

Research findings indicate that teachers with high levels of CMSE show more interest in using student-centred strategies to approach problems (Emmer & Hickman, 1991), hold more humanistic classroom management beliefs (Woolfolk & Hoy, 1990), and feel more empowered to help students in social-emotional (Reilly, 2002) and academic areas (Lazarides et al., 2018) as well as in the area of classroom behaviour (Dicke et al., 2014). However, high general TSE does not ensure high levels of CMSE. Understanding the factors that influence teachers’ CMSE is important for education practitioners, policy development, preservice teacher programs and researchers. Therefore, this article examines the factors that shape CMSE and provides an overview of the current research about CMSE and how CMSE could be improved.

To date, there have been multiple reviews of global TSE covering several distinct areas: the measurement of TSE (Tschannen-Moran et al., 1998); key issues surrounding TSE research (Klassen et al., 2011); the effectiveness of interventions on TSE (McArthur & Munn, 2015); and the relationship between TSE and teaching effectiveness (Klassen & Tze, 2014; Tschannen-Moran et al., 1998) and teacher burnout (Brown, 2012). However, limited attention has been paid specifically to reviewing research about CMSE, with O'Neill and Stephenson (2011) conducting a comprehensive review of CMSE items and scales and Aloe et al. (2014) examining the evidence of CMSE in relation to teacher burnout. By reviewing factors that correlate with CMSE, our paper also contributes to TSE theory and clarifies the special features of TSE in the specific area of classroom management.

Conceptualizing the review

To frame our conceptual understanding of CMSE, we draw on the three categories of factors related to TSE as defined by Fackler et al. (2021) and explore in more depth some of these factors as they align to recent research. As presented in Fig. 1, the proposed conceptual framework guided our synthesis of the literature, which indicated that factors associated with CMSE can be divided into three main strands: (1) personal characteristics of teachers (teacher-level factors); (2) characteristics of classroom composition (classroom-level factors); and (3) teachers’ working conditions (school-level factors).

Fig. 1 Conceptual Framework for Synthesizing Empirical Research on CMSE

First, the conceptual framework includes teachers’ personal characteristics, facilitating our understanding of how teacher background (e.g., gender, age, working experience, educational level) influences teachers’ beliefs about classroom management. Previous studies suggest that there is no age effect on teachers’ CMSE (Ford, 2019; Hicks, 2012; Lazarides et al., 2020; Lee & van Vlack, 2018). Unlike age, mixed results were found for the relationship between other demographic variables and CMSE. Some studies found a positive relationship for female teachers (Calkins et al., 2021; Zee et al., 2016), others for male teachers (Hettinger et al., 2021; Tran, 2015), and others found no gender effect (Lazarides et al., 2020). Some found that teacher education level had a positive relationship with CMSE (Hu et al., 2021; Valente et al., 2020), but others found a negative relationship (Fackler et al., 2021). A higher level of CMSE was identified for more experienced teachers (Brouwers & Tomic, 2000; Klassen & Chiu, 2010), whereas no changes in CMSE were found between early-career and mid-career teachers (Lazarides et al., 2020).

In addition to demographic variables, many studies have examined the relationship between teachers’ CMSE and psychometric constructs. Self-efficacy has been viewed as a protective factor for teachers against psychological strain (Lazarides et al., 2020; Schwerdtfeger et al., 2008). Teachers who perceive they have sound ability to manage the classroom are less prone to increased stress levels. As expected, extant research indicates that CMSE is negatively related to psychological strain (teacher stress, teacher burnout) (Brouwers & Tomic, 2000; Eddy et al., 2019; Vidic et al., 2021; Williams, 2012) and positively related to teachers’ wellbeing (job satisfaction, teacher commitment) (Dicke et al., 2018; Klassen & Chiu, 2010; Liu et al., 2018; Miller, 2020; von der Embse et al., 2016).

Personality traits are general behavioural tendencies that may influence how efficacy information is evaluated and/or have an effect on people’s behaviour, and in turn, influence the evaluation of self-efficacy (Baranczuk, 2021). The Big Five, known as the most widely accepted taxonomy of personality traits, consists of neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness (McCrae & Costa, 2003). Several studies have also examined the relationship between CMSE and teachers’ personality as assessed by the Big Five Inventory, suggesting that openness and extraversion have positive relationships with CMSE (Bullock et al., 2015; Rahimi & Saberi, 2014). Teachers’ constructivist beliefs, which refer to “the ways they (teachers) believe students learn best and how they as teachers might facilitate this learning” (OECD, 2014, p. 165), have also shown a positive relationship with CMSE (Berger et al., 2018; Fackler et al., 2021).

Turning to the second category, researchers have examined several characteristics of classroom composition, ranging from class setting to teachers’ behaviour towards students and students’ outcomes. It was suggested that a bigger class size was associated with a high level of teacher global self-efficacy (Raudenbush et al., 1992). However, for teacher self-efficacy in classroom management in particular, a negative association with large class size was found (Kunemund et al., 2020). Self-efficacy beliefs have been suggested to affect individuals’ behaviours at different levels, from the initial choice of behaviour type to the amount of effort and the extent of persistence in the implementation process (Bandura, 1977). CMSE theoretically acts as a mediator between teachers’ knowledge and practice in the area of classroom management. Positive aspects of classroom management (Chen et al., 2020; Lazarides et al., 2020) were positively related to CMSE. On the other hand, the successes that teachers achieve in the classroom (mastery experiences), such as good classroom climate (Guangbao & Timothy, 2021), students’ achievement (Hassan, 2019), high quality classroom interaction (Ryan et al., 2015), and positive student-teacher relationships (Zee et al., 2017), were suggested to inform teachers’ CMSE.

According to social cognitive theory, human behaviour is a product of the interaction between personal processes and the external environment (Bandura, 1986). School-level factors, which represent to some extent the environment and conditions in which teachers work, have increasingly gained the attention of CMSE researchers. As with teacher demographics, the relationship between general teacher self-efficacy and principal demographics (e.g., gender, age, working experience) has been examined. For TSE in classroom management in particular, mixed results were found for principal gender, with a positive association for male principals (Fackler et al., 2021), but also no effect based on principal gender (Ford, 2019). In regard to time-related characteristics like principal age and working experience, a negative association was found (Fackler et al., 2021). Another factor that has been reported by many studies (Bellibas & Liu, 2017; Buentello, 2019; Fackler et al., 2021; Ford, 2019; Holzberger & Prestele, 2021) to have a significant impact on teachers' perceptions of CMSE is principal leadership. For school organizational characteristics, some studies (McLeod, 2012; Öztürk et al., 2021) have examined the relationship between school culture and CMSE, some (Fackler et al., 2021; George et al., 2018) have compared CMSE between teachers in private and public schools, while some (Fackler et al., 2021; Looney, 2004) have examined CMSE for teachers in schools of different sizes (number of student enrolments).

Overall, the conceptual framework depicted in Fig. 1 captures correlates of CMSE that are both theoretically and empirically tested. In light of the ongoing inconsistency of findings in these studies, we then conducted a meta-analysis of CMSE, estimating the extent to which it is correlated with teacher-level, classroom-level and school-level factors. We also used this framework to guide the discussion and to clarify the areas that need further exploration.

Literature Search

This meta-analysis selected empirical research results published in international and Chinese databases between 2000 and 2021. We searched commonly used social science databases (e.g., ERIC, Web of Science, Academic Search Complete, PsycArticles and PsycINFO). Chinese literature was searched mainly in CNKI (China National Knowledge Infrastructure), Wanfang, and Weipu Database. Given the limited number of studies that solely focussed on CMSE, we used the descriptors “self-efficacy” and/or “efficacy” as keywords, and “teacher” and “classroom management” and/or “behaviour management” as abstract search terms. The searches were restricted to peer-reviewed articles and dissertations (conference papers, books and book chapters were excluded as their review process can be non-standard). In addition to searching databases, we also examined the reference lists of the existing reviews of CMSE (Aloe et al., 2014; O'Neill & Stephenson, 2011).

Eligibility Criteria

The primary studies eligible for inclusion in this meta-analysis met the following criteria: (1) the study measured teacher self-efficacy for classroom management; (2) the study examined the relationship between at least one factor and teachers’ CMSE instead of just reporting descriptive data on CMSE; and (3) the study reported statistical data (e.g., Pearson r, sample size) to quantify the relationship between a factor of interest and teachers’ CMSE. Moreover, given evidence that classroom management content differs between general and special education (Stough, 2006), this meta-analysis exclusively examines studies in which (4) the sample comprised in-service teachers in mainstream school settings. Finally, to facilitate the meta-analysis, (5) only correlates examined in two or more studies were included.

Selection and Inclusion of Studies

In the first phase of the literature search, a total of 1,386 Chinese and English articles were identified using the above search strategy, and 301 duplicates were excluded, resulting in a total of 1,085 articles. We went through a three-phase process to screen the primary studies included in this meta-analysis, as illustrated in Fig. 2. First, the 1,085 studies were examined by reviewing the titles and abstracts; 756 studies were qualitative research or not focused on CMSE and were therefore excluded.

Fig. 2 PRISMA flow diagram. Adapted from Moher et al. (2009)

In phase two, 329 articles were left for full-text reading to ensure that studies reported clear, explicit, and complete data on the findings of the research. This examination found that 81 of the studies in the pool were appropriate and 248 were not suitable (n=15, qualitative analysis; n=13, full text not available; n=60, missing data; n=38, reported global TSE; n=17, did not report CMSE; n=76, fewer than two studies focusing on related variables; n=29, miscellaneous reasons).

In phase three, we searched backwards for eligible articles in the reference lists of previous reviews (Aloe et al., 2014; O'Neill & Stephenson, 2011) and included six additional studies. Of these six studies, the abstracts of five (Bumen, 2010; Huk, 2011; Skaalvik & Skaalvik, 2007; Williams, 2012; Yoon, 2002) did not mention the term “classroom management” or “behaviour management” and one (Ozdemir, 2007) was not included in commonly used databases, which explains why these studies were missed by our database retrieval. Hence, the present meta-analysis included a sample of 87 primary studies (see supplementary information for details of studies selected).
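The study-flow counts reported across the three phases can be checked with simple arithmetic. The short sketch below only re-adds the numbers given in the text; the variable names are illustrative and are not part of the original analysis.

```python
# Study-flow counts as reported in the three screening phases.
identified = 1386
duplicates = 301
excluded_on_title_abstract = 756

# Full-text exclusion reasons and counts, as listed in phase two.
full_text_exclusions = {
    "qualitative analysis": 15,
    "full text not available": 13,
    "missing data": 60,
    "reported global TSE": 38,
    "did not report CMSE": 17,
    "fewer than two studies on the variable": 76,
    "miscellaneous reasons": 29,
}
added_from_reference_lists = 6

screened = identified - duplicates
full_text_read = screened - excluded_on_title_abstract
excluded_full_text = sum(full_text_exclusions.values())
eligible_from_databases = full_text_read - excluded_full_text
included = eligible_from_databases + added_from_reference_lists

print(screened, full_text_read, excluded_full_text,
      eligible_from_databases, included)
# prints: 1085 329 248 81 87
```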

Study Coding

Studies that met all inclusion criteria were reviewed and all factors associated with teachers’ CMSE in these studies were coded. Studies were coded according to year of publication, publication type, country of origin, school level, CMSE measurement, sample size and reported effect size. Since the primary goal was to synthesize estimates of the relationship between teachers’ CMSE and various factors, the primary effect size we coded was Pearson r. In order to include as many studies as possible, several formulas were used to compute effect sizes (Pearson r):

(1) If only Spearman’s r was reported in a study, we converted it to Pearson r using the following relationship between the two coefficients: \(r_s=\frac{6}{\pi}\sin^{-1}\frac{r}{2}\) (Xiao et al., 2021).

(2) If only a β coefficient was reported in a study (\(\beta \in (-0.5, 0.5)\)), then the following equation was used: \(r=0.98\beta +0.05\) for \(\beta \ge 0\) and \(r=0.98\beta -0.05\) for \(\beta <0\) (Peterson & Brown, 2005).

(3) If only a t-test value was reported in a study, then the following equation was used: \(r=\sqrt{\frac{t^2}{t^2+ df}}\) (Card, 2012).

(4) If a study conducted an ANOVA between two groups (i.e., \(F_{(1, df)}\)), then the following equation was used: \(r=\sqrt{\frac{F_{(1, df)}}{F_{(1, df)}+ df}}\) (Card, 2012).

(5) If \(\chi^2\) with 1 degree of freedom was reported in a study, then the following equation was used: \(r=\sqrt{\frac{\chi_{(1)}^2}{N}}\) (Card, 2012).
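The five conversion rules above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually run for the study: the function names are hypothetical, rule (1) is implemented by solving the stated relationship for r, and carrying the sign of t over to r in rule (3) is an assumption, since the formula itself yields only a magnitude.

```python
import math


def spearman_to_pearson(r_s: float) -> float:
    """Rule (1): solve r_s = (6/pi) * asin(r/2) for Pearson r."""
    return 2.0 * math.sin(math.pi * r_s / 6.0)


def beta_to_r(beta: float) -> float:
    """Rule (2): impute r from a beta coefficient, as stated in the text."""
    if not -0.5 < beta < 0.5:
        raise ValueError("the rule is stated only for beta in (-0.5, 0.5)")
    return 0.98 * beta + (0.05 if beta >= 0 else -0.05)


def t_to_r(t: float, df: float) -> float:
    """Rule (3): r = sqrt(t^2 / (t^2 + df)).

    Assumption: the sign of t is carried over to r.
    """
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


def f_to_r(f: float, df: float) -> float:
    """Rule (4): two-group ANOVA, F(1, df)."""
    return math.sqrt(f / (f + df))


def chi2_to_r(chi2: float, n: int) -> float:
    """Rule (5): chi-square with 1 degree of freedom and total sample size n."""
    return math.sqrt(chi2 / n)
```

For example, a reported t of 2.0 with 12 degrees of freedom gives `t_to_r(2.0, 12)`, that is, the square root of 4/16, or 0.5.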

Coding for study characteristics and effect sizes was done by the first author. A randomly selected sample of 50% of the included studies (K=44) was coded a second time by the first author to establish intracoder reliability (Wilson, 2019). Estimates of intracoder reliability were recorded for each variable. The agreement rate was higher than 95% for all variables of interest.

In addition, given that more than one correlation value was reported in some studies, two approaches were used to determine which to include in this meta-analysis: (1) if the correlations were independent (e.g., Wettstein et al., 2021), all the correlations were included in the analysis and were treated as independent studies, and (2) if the correlations were dependent (e.g., Lazarides et al., 2020), then the highest correlation value was recorded.

Meta-Analytical Procedure

This meta-analysis was conducted with the aid of Comprehensive Meta-Analysis version 3.0 (Footnote 1). Pearson's correlation coefficient r was used as the effect size in this study. Specifically, r values were first converted to Fisher’s Z scale, \(Z=\frac{1}{2}\ln \left(\frac{1+r}{1-r}\right)\); the transformed values were then used to calculate the aggregated correlation coefficients; and finally the summary Fisher’s Z was converted back to a correlation coefficient r to obtain the final overall effect sizes (Borenstein et al., 2009).

If the original study only reported r coefficients for each dimension of a variable (e.g., Eddy et al., 2019), these coefficients were converted to Fisher’s Z to compute a mean score, which was then converted back to an r value (Yali et al., 2019). A random-effects model was used for this meta-analysis because substantial variation exists across studies in terms of various factors that may correlate with teachers’ CMSE.
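The Fisher’s Z steps described above can be illustrated with a short sketch. The function names are hypothetical, the unweighted mean across dimensions is an assumption (the text does not state whether weights were applied), and the standard error \(1/\sqrt{n-3}\) is the usual large-sample value for a Fisher-transformed correlation.

```python
import math


def fisher_z(r: float) -> float:
    """Transform a correlation to Fisher's Z (equivalent to math.atanh)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def inverse_fisher_z(z: float) -> float:
    """Transform a Fisher's Z value back to a correlation."""
    return math.tanh(z)


def mean_r_via_z(rs):
    """Average dimension-level correlations on the Z scale.

    Assumption: a simple unweighted mean of the transformed values.
    """
    zs = [fisher_z(r) for r in rs]
    return inverse_fisher_z(sum(zs) / len(zs))


def r_confidence_interval(r: float, n: int, z_crit: float = 1.96):
    """95% interval for one correlation, built on the Z scale."""
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    return inverse_fisher_z(z - z_crit * se), inverse_fisher_z(z + z_crit * se)
```

A study reporting, say, three dimension-level correlations would contribute the single value `mean_r_via_z([r1, r2, r3])` to the pooled analysis. Working on the Z scale keeps the back-transformed interval inside (−1, 1), which is the usual motivation for the transformation.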

The following indicators were taken into account in this analysis: k (the number of studies included in the meta-analysis), r (the average effect size, interpreted using Cohen’s index (Cohen, 1988), with values around 0.1 considered a small effect, around 0.25 a medium effect, and 0.4 or higher a large effect), lower-limit and upper-limit effect sizes (the bounds of the 95% confidence interval), Z values (statistical test of the null hypothesis regarding the average effect size), and the indicators of heterogeneity, namely Q and \(I^2\).

The Q test is used to test whether the total heterogeneity of the weighted mean effect sizes is statistically significant. The \(I^2\) index estimates the degree of inconsistency in the observed relationship across studies, and values of 0.25, 0.5, and 0.75 indicate low, medium, and high levels of heterogeneity (Borenstein et al., 2009). To explore sources of heterogeneity, we performed moderator analysis. Given evidence that classroom management content differs across school levels (Evertson & Weinstein, 2006), the moderators tested here included school level. In addition to sample characteristics, the characteristics of the study itself are often also responsible for heterogeneity. Hence, we also included year of publication, publication type, and country of origin as moderators. In the first step, subgroup analysis was used for categorical moderators (publication type, country of origin, and school level), which estimated synthetic effects for each category. Specifically, we used a Q-test based on analysis of variance to compare subgroups. In the second step, for the non-categorical moderator (year of publication), meta-regression analysis was used to test whether the variable was a significant covariate within the meta-regression model.
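As an illustration of how Q, \(I^2\) and the random-effects summary fit together, the sketch below pools correlations on the Fisher’s Z scale. It assumes the DerSimonian-Laird (method-of-moments) estimate of between-study variance and a within-study variance of \(1/(n-3)\); the estimator used by the software is not stated in the text, and the function name and inputs are hypothetical.

```python
import math


def random_effects_summary(rs, ns):
    """Pool correlations on Fisher's Z scale under a random-effects model.

    Assumption: DerSimonian-Laird estimate of the between-study variance.
    """
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]           # within-study variances
    w = [1.0 / v for v in vs]                  # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)

    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(zs) - 1
    # I^2 as a proportion (0 to 1), truncated at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "Q": q,
        "df": df,
        "I2": i2,
        "tau2": tau2,
        "r": math.tanh(z_re),
        "ci": (math.tanh(z_re - 1.96 * se_re), math.tanh(z_re + 1.96 * se_re)),
        "Z": z_re / se_re,
    }
```

When all studies report the same correlation, Q and \(I^2\) are zero and the random-effects result coincides with the fixed-effect one. As the correlations diverge, \(\tau^2\) grows and the weights become more equal across studies.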

Characteristics of Included Studies

Based on the eligibility criteria above, 87 primary studies were included in the meta-analysis (see Table 1). It is worth noting that most (87.36%) of these 87 eligible studies were published between 2010 and 2021 and that more than two-thirds (68.97%) were published in peer-reviewed journals. As can be seen from Table 1, research on teachers’ CMSE included here was most frequently undertaken in the USA (n=32). More than half of the included studies (n=49) were conducted with a sample size between 100 and 500 observations. Additionally, three studies used extensive data with more than 100,000 observations (Bellibas & Liu, 2017; Fackler et al., 2021; Yuan & Jinjie, 2019). As for the school level, most studies were conducted with middle school (19), elementary school (14), high school (14), and pre-kindergarten (4) teachers. There were also 4 studies focused on higher education and one designed for vocational education (Berger et al., 2018). The vast majority of studies (n=61) used the classroom management sub-scale of the Teachers’ Sense of Efficacy Scale (TSES) (Tschannen-Moran & Hoy, 2001) or an adapted version to assess teachers’ CMSE, in either its long form (e.g., Sims et al., 2021) or short form (e.g., Guangbao & Timothy, 2021).

Overall meta-analytic effect sizes

The analysis included 87 samples, 189 correlations, and a total listwise sample size over 480,000. Overall, 22 factors that correlated with CMSE have been generated from these 87 studies (see details of which factors were included in which studies in the  supplementary information ). The results of this meta-analysis show small to large correlations between 15 factors and CMSE (see Tables 2 , 3 , 4 ).

Teacher-level Factors

Table 2 presents the overall effects of the association between various teacher-level factors and teachers’ CMSE. Applying the eligibility criteria, the present study identified four teacher demographic variables (see Panel A of Table 2). Our results showed no evidence that in-service teachers’ sense of self-efficacy in the area of classroom management varied with age, gender, or educational level. We did find that teachers’ CMSE was positively correlated with teachers’ working experience, which indicated that more experienced teachers hold higher levels of CMSE.

In addition to teacher demographic variables, the present study also identified six psychometric constructs of teachers (see Panel B of Table 2). Our results showed that all six factors were significantly correlated with CMSE. Overall, the strongest correlation with CMSE was for personal accomplishment (PA) (r=0.415, CI [0.318, 0.504]). Openness showed the largest correlation among the Big Five personality traits (r=0.220, CI [0.135, 0.303]), followed by extraversion (r=0.212, CI [0.121, 0.298]) and conscientiousness (r=0.121, CI [0.033, 0.207]), whereas agreeableness and neuroticism had no relationship with teachers’ CMSE. Moderate correlations were observed for job satisfaction (r=0.302, CI [0.255, 0.347]), teacher commitment (r=0.371, CI [0.198, 0.522]), emotional exhaustion (EE) (r=−0.289, CI [−0.349, −0.227]), and depersonalisation (DP) (r=−0.281, CI [−0.340, −0.221]). Teachers’ constructivist beliefs (r=0.159, CI [0.010, 0.302]) and teacher stress (r=−0.134, CI [−0.169, −0.098]) were each correlated with CMSE to a small degree.

Classroom-level Factors

Table 3 presents the overall effects of the association between various classroom-level factors and teachers’ CMSE. Our results showed that six classroom-level factors were significantly related to CMSE. There was no significant effect size associated with class size. Among the six factors showing significant effect sizes, classroom climate (r=0.552, CI [0.210, 0.774]) and classroom management practice (r=0.436, CI [0.108, 0.679]) showed large and positive correlations with CMSE. Moderate correlations were observed for students’ misbehaviours (r=−0.297, CI [−0.395, −0.192]) and student achievement (r=0.382, CI [0.307, 0.453]). In terms of classroom interaction, four studies were included in this meta-analysis. Our results show that all dimensions of classroom interaction were significantly related to teachers’ CMSE with a medium-level effect. For student-teacher relationship, we found that conflict (r=−0.381, CI [−0.636, −0.050]) was negatively related to teachers’ CMSE. Conversely, closeness had no relationship with teachers’ CMSE.

School-level factors

Table 4 presents the overall effects of the association between various school-level characteristics and teachers’ CMSE. This category presents new information about associations among the identified factors and CMSE and contains five distinct factors: principal gender, principal leadership, school type, school size, and school culture. Of these factors we found that only principal leadership and school culture were positively related to teachers’ CMSE, both with a low-level effect.

Moderator analysis

We assessed the heterogeneity of the results using the Q statistic and the \(I^2\) index (see Tables 2, 3, 4). The Q tests yielded statistically significant results for a total of 11 factors: teacher gender, teacher working experience, job satisfaction, teacher commitment, teacher burnout, classroom climate, classroom management practice, students’ misbehaviour, student-teacher relationship, principal leadership, and school type, suggesting that these relationships may be influenced by moderators. Of these 11 factors, only seven were included in moderator analysis, as fewer than five studies were available for the remaining factors.

Teacher gender

Meta-regression suggested that publication year (β=0.006, P=0.613) did not moderate the relationship between teacher gender and CMSE. Subgroup analyses suggested that effects did not differ across publication types (\(Q_{bet}\)=1.189, P=0.276) or school levels (\(Q_{bet}\)=5.628, P=0.344). However, subgroup analysis showed that effects varied across countries (\(Q_{bet}\)=53.543, P<0.001).

Teacher working experience

Meta-regression suggested that publication year (β=-0.000, P=0.986) did not moderate the relationship between teacher working experience and CMSE. Subgroup analysis suggested that publication type was not significantly related to the correlation outcomes (\(Q_{bet}\)=0.063, P=0.802). However, subgroup analyses showed that effects varied across countries (\(Q_{bet}\)=11850.865, P<0.001) and school levels (\(Q_{bet}\)=351.484, P<0.001).

Job satisfaction

Meta-regression and subgroup analysis suggested that publication year (β=-0.013, P=0.065) and publication type (\(Q_{bet}\)=0.006, P=0.939) did not moderate the relationship between teacher job satisfaction and CMSE. However, other subgroup analyses showed that country (\(Q_{bet}\)=11.359, P=0.045) and school level (\(Q_{bet}\)=14.649, P=0.002) significantly moderated the relationship between job satisfaction and CMSE.

Teacher burnout

Meta-regression suggested that publication year (EE: β=0.005, P=0.308; DP: β=0.008, P=0.091; PA: β=-0.006, P=0.543) did not moderate the relationship between teacher burnout and CMSE. Subgroup analyses suggested that there were no differences in effect across publication types (EE: \(Q_{bet}\)=4.690, P=0.030; DP: \(Q_{bet}\)=3.530, P=0.060; PA: \(Q_{bet}\)=0.805, P=0.370) or school levels (EE: \(Q_{bet}\)=2.495, P=0.476; DP: \(Q_{bet}\)=2.759, P=0.430; PA: \(Q_{bet}\)=4.661, P=0.198). Country, however, emerged as a significant moderator of the relationships between teacher burnout and CMSE (EE: \(Q_{bet}\)=27.700, P<0.001; DP: \(Q_{bet}\)=25.687, P<0.001; PA: \(Q_{bet}\)=44.212, P<0.001).

Classroom management

Meta-regression and subgroup analysis suggested that publication year (β=0.024, P=0.928) and publication type (\(Q_{bet}\)=0.000, P=1.000) did not moderate the relationship between teachers' classroom management practice and their CMSE. However, other subgroup analyses showed that effects varied across countries (\(Q_{bet}\)=1204.532, P<0.001) and school levels (\(Q_{bet}\)=99.343, P<0.001).

Students’ misbehaviour

Meta-regression and subgroup analysis suggested that publication year (β=0.009, P=0.541), publication type (\(Q_{bet}\)=0.000, P=1.000) and school level (\(Q_{bet}\)=1.376, P=0.241) did not moderate the relationship between students’ misbehaviour and teachers’ CMSE. However, another subgroup analysis showed that effects varied across countries (\(Q_{bet}\)=9.245, P=0.026).

Principal leadership

Meta-regression and subgroup analyses suggested that there were no differences in effect across publication years (β=-0.009, P=0.459), publication types (\(Q_{bet}\)=0.192, P=0.662), or countries (\(Q_{bet}\)=0.221, P=0.895). School level, however, emerged as a significant moderator of the relationship between principal leadership and teachers’ CMSE (\(Q_{bet}\)=12.792, P=0.002).

Publication bias

In general, articles with positive or statistically significant results are more likely to be published, which can lead to publication bias (Rothstein, 2008). Therefore, the present study conducted Egger’s regression test (Egger et al., 1997) to assess publication bias (see Table 5). Given that fewer than three studies included teacher personality, constructivist beliefs, class size, students’ achievement, principal gender, school size, school type and school culture, we did not run publication bias procedures on these effect sizes. For the factors that were analysed, the results indicated no publication bias for most factors, with only studies related to teacher educational level showing such bias.
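For readers unfamiliar with Egger’s regression test, the sketch below shows its core computation under simple assumptions: each study’s effect is divided by its standard error (the standard normal deviate) and regressed on its precision (1/SE) by ordinary least squares, and an intercept far from zero suggests funnel-plot asymmetry. The function name is hypothetical, and the p-value, which requires a t distribution with k − 2 degrees of freedom, is left out to keep the example dependency-free.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, SE of intercept, t statistic) for Egger's test."""
    y = [e / s for e, s in zip(effects, std_errors)]   # standard normal deviates
    x = [1.0 / s for s in std_errors]                  # precisions
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance and the usual OLS standard error of the intercept.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat
```

If every study estimates the same effect regardless of its precision, the fitted line passes through the origin and the intercept is zero. Small, imprecise studies that report systematically larger effects pull the intercept away from zero.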

However, for teacher educational level, the trim and fill procedure (Duval & Tweedie, 2000) signalled no bias (0 trimmed studies). In addition, the classic fail-safe N test (Rosenthal, 1979) was performed to check the robustness of this finding by computing the number of missing studies that would be required to nullify the effect. A larger value of this coefficient indicates that we can be confident in the effect despite the presence of publication bias, whereas a relatively small number of missing studies is cause for concern. The fail-safe N for this analysis is 9975, which means that we would need to include 9975 additional studies to nullify the observed effect. Put another way, publication bias did not pose a threat to the meta-analytic result for teacher educational level.
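Rosenthal’s classic fail-safe N can be written as \(N_{fs}=\frac{\left(\sum Z_i\right)^2}{z_{\alpha}^2}-k\), where \(Z_i\) is the standard normal deviate of study i and k is the number of studies. The sketch below is only illustrative: the study-level Z values and the criterion \(z_{\alpha}\) behind the reported value of 9975 are not given in the text, so the default of 1.645 (one-tailed α = .05) and the function name are assumptions.

```python
def fail_safe_n(z_values, z_alpha=1.645):
    """Number of unpublished null studies needed to make the combined
    result non-significant at the chosen criterion (Rosenthal, 1979)."""
    k = len(z_values)
    return (sum(z_values) ** 2) / (z_alpha ** 2) - k


# Hypothetical example: four studies, each with Z = 2, criterion z = 2.
print(fail_safe_n([2.0, 2.0, 2.0, 2.0], z_alpha=2.0))
# prints: 12.0
```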

Discussion and implications

The primary objective of the systematic review was to identify and review the existing evidence regarding factors that correlate with CMSE. A total of 87 studies were included in the review and 22 correlates were identified from these studies. The systematic review clustered the factors related to CMSE into three themes (teacher-level factors, classroom-level factors, and school-level factors). Given that the number of studies meeting the inclusion criteria for each analysis ranged from 2 to 29, interpretation of the effect sizes derived from the meta-analyses requires caution.

Turning to the first major strand, we found that teacher personal factors have been thoroughly examined in CMSE research, and 10 teacher-level factors related to CMSE were identified. As for teacher demographic characteristics, our results indicated that teacher gender, teacher age, and teacher educational level were not significantly related to CMSE. The only exception was teacher working experience, for which a positive association was found, indicating that more experienced teachers are more likely to hold higher levels of confidence in their classroom management ability. Previous research (Brouwers & Tomic, 2000; Calkins et al., 2021; Fackler et al., 2021; Hettinger et al., 2021; Hu et al., 2021; Klassen & Chiu, 2010; Lazarides et al., 2020; Tran, 2015; Valente et al., 2020; Zee et al., 2016) reported mixed results for the relationship between many teacher demographic variables and CMSE. While our synthesis provides a definitive conclusion in response to these mixed results, caution is warranted because the conclusions are drawn from cross-sectional data, especially for time-related characteristics like teacher age and working experience.

Among the six psychological correlates of CMSE (teacher constructivist beliefs, teacher stress, job satisfaction, teacher commitment, teacher personality and teacher burnout), teacher burnout stood out with a large effect based on a substantial number of studies and a large sample size. Job burnout is a psychological syndrome that develops when individuals are under prolonged stressful work conditions and comprises three dimensions: emotional exhaustion, depersonalization and diminished personal accomplishment (Maslach, 2003). Fernet et al. (2012) reported that many teachers have experienced job burnout. Our results suggest that there is a negative relationship between CMSE and the three dimensions of burnout (i.e., emotional exhaustion, depersonalization, and diminished personal accomplishment), of which the largest effect is between CMSE and diminished personal accomplishment. This is in line with the previous meta-analysis conducted by Aloe et al. (2014). When CMSE increases, the teacher’s feelings of emotional exhaustion and depersonalization decrease, and feelings of personal accomplishment increase.

Our synthesis results also indicated that teacher commitment and job satisfaction had moderate and positive associations with CMSE, whereas teacher stress had a low and negative association with CMSE. These findings were as expected since self-efficacy in classroom management serves as a personal resource and plays an important role in teachers’ stress development or management. Teacher commitment and job satisfaction achieved medium effects (r=0.371 and r=0.302, respectively), which indicated that classroom management plays a significant role in teachers’ wellbeing. Classroom management has been cited as a significant factor contributing to teacher stress and one of the primary causes of teacher turnover (Aloe et al., 2014; Davis, 2018). The small effect for teacher stress might be explained by the fact that teacher stress encompasses multiple dimensions (e.g., workload stress and classroom stress); attention should therefore be paid to the relationship between sub-dimensions of teacher stress and CMSE (Klassen & Chiu, 2010). Although emotional arousal has been viewed as one of the sources of TSE, it was also suggested that self-efficacy could have a dampening effect on psychological stress arousal (Bandura, 1997). The relationship between CMSE and teacher burnout/teacher stress seems clear; however, the directions of these relationships are still unknown. Longitudinal research is highly recommended to clarify the directions.

Lazarides et al. (2020) suggested that TSE functions as a part of teachers’ personal resources. Attention has been paid to the relationship between teacher personality traits and CMSE, which has been examined among pre-service teachers (Senler & Sungur-Vural, 2013; Yingjie & Yan, 2016) and in-service teachers (Bullock et al., 2015; Rahimi & Saberi, 2014). Across these two studies, openness showed the largest correlation among the Big Five personality traits, followed by extraversion and conscientiousness, whereas agreeableness and neuroticism had no relationship with teachers’ CMSE. This finding was partially in line with previous studies (Bullock et al., 2015; Rahimi & Saberi, 2014), where openness and extraversion were significantly correlated with CMSE, while mixed results were found for conscientiousness, agreeableness and neuroticism. People high in openness are receptive to new things, have a wide range of interests, and are imaginative and creative (Xiaoqing, 2013), suggesting that teachers rating higher on this trait may have more opportunities to practice new classroom management approaches and be more likely to persist in stressful situations. Compared with introverted peers, extroverted teachers are more sociable and self-confident. They may be more likely to engage in various activities (Reeve, 2009), more likely to discuss classroom management with other teachers (Bullock et al., 2015), and more likely to seek out opportunities to gain experience and improve their ability to promote classroom management self-efficacy. Conscientiousness refers to dependability and the ability to resist impulsive behavior. Teachers with strong conscientiousness often dampen their negative emotions and enhance their positive emotions in their work (Xiaoxian et al., 2014).
It may appear reasonable to expect neuroticism to correlate with teachers’ CMSE, as teachers higher on this trait are more likely to experience anxiety and stress when facing a disruptive classroom environment. However, we did not find a relationship between the two. Teachers who are more agreeable are more likely to be empathetic and more willing to help others. However, we did not find a correlation between agreeableness and CMSE. This seems to be another indication of the complexity of classroom management, where teachers merely showing care and empathy towards students may not contribute to a well-managed classroom. Considering the limited number of studies included in this meta-analysis and the mixed results found in previous research, we should be cautious about drawing a definitive conclusion. Further research focusing on the relationship between teacher personality traits and CMSE is recommended.

Teachers’ constructivist beliefs about teaching have been viewed as an intrinsic teacher characteristic. Across two studies, our meta-analytic results indicated a small but positive association between teachers’ constructivist beliefs and CMSE. This finding was expected, as teachers who hold stronger constructivist beliefs have been shown to report higher levels of global TSE (Fackler et al., 2021; Fackler & Malmberg, 2016). Teachers who hold strong constructivist beliefs prefer to use student-centred teaching methods, focus on facilitating students’ learning, and tend to believe they are capable of managing their classroom.

Our meta-analysis showed that almost all classroom characteristics were strongly related to teachers’ CMSE, except class size and closeness in teacher-student relationships. Our synthesised result indicated that CMSE was not related to class size; however, previous studies (Fackler et al., 2021; Kunemund et al., 2020) found a significant but negative relationship. This may be explained by the limited number of included studies (only two), which potentially leads to unstable findings.
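To illustrate why a synthesis of only two studies can be unstable, the following minimal sketch pools two correlations with a random-effects model (Fisher's z transform with a DerSimonian-Laird estimate of between-study variance). It is an illustration under stated assumptions and not necessarily the procedure used in this review: the function name `pool_correlations` and the correlations and sample sizes passed to it are hypothetical and are not taken from any included study.

```python
import math


def pool_correlations(rs, ns):
    """Random-effects pooling of correlations (hypothetical helper).

    rs: list of study correlations; ns: list of study sample sizes.
    Returns (pooled r, lower 95% bound, upper 95% bound).
    """
    # Fisher z transform; sampling variance of z is 1 / (n - 3).
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]

    # Fixed-effect estimate and Cochran's Q, needed for tau^2.
    w = [1.0 / v for v in vs]
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))

    # DerSimonian-Laird between-study variance, truncated at zero.
    df = len(rs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled z and its standard error.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Back-transform the estimate and its 95% interval to the r metric.
    low, high = z_re - 1.96 * se, z_re + 1.96 * se
    return math.tanh(z_re), math.tanh(low), math.tanh(high)


# Hypothetical inputs: two studies pointing in opposite directions.
r, low, high = pool_correlations([-0.20, 0.10], [150, 200])
print(f"pooled r = {r:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

With only two studies that disagree in direction, the estimated between-study variance is large and the interval around the pooled correlation spans zero, so a non-significant pooled estimate can coexist with significant findings in individual studies.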

In relation to teachers’ behaviours towards students, one of the most frequently mentioned factors was teachers’ classroom management practice. Our results suggested that CMSE functioned as a personal resource and was positively related to positive aspects of classroom management. This is in line with the theoretical assumption of Bandura’s (1997) self-efficacy theory that self-efficacy acts as a mediating factor between individual behaviour and knowledge. On the other hand, teachers’ appraisals of past performance (e.g., classroom management practice) have been viewed as one of the sources of self-efficacy; however, Morris et al. (2017) found that teachers draw on a variety of sources when they reflect on past performance. Instead, Morris et al. (2017) suggested conceptualizing mastery experiences as teachers’ desired goals, such as classroom climate, student achievement, high-quality classroom interaction, and positive teacher-student relationships.

Classroom climate, which refers to the instructional and social-emotional environments that students experience, showed a positive relationship with CMSE with a large effect size. In a good classroom climate, teachers focus less on individual student behaviours and more on building a positive learning climate. Many studies also paid attention to classroom interaction, as classroom processes have been identified as teacher-student interaction patterns that have a significant impact on students’ outcomes (Mashburn et al., 2008). The classroom interactions framework (Hamre et al., 2013) focuses on teachers’ classroom interactional behaviours in three domains: emotional support, classroom organization, and instructional support. Ryan et al. (2015) found that American elementary and middle school teachers with higher CMSE tend to exhibit better emotional, behavioural, and instructional support. These findings were also noted in a study of Chinese preschool teachers (Hu et al., 2021). Our synthesis results confirmed the moderate and positive relationship between classroom interaction and CMSE across four studies, which indicates that teachers who feel confident in their classroom management skills are more likely to provide higher quality emotional support, classroom organization, and instructional support to their students.

One of the most important goals of classroom management is to establish positive student-teacher relationships (Evertson & Weinstein, 2006). Our results showed that conflict within teacher-student relationships was negatively related to CMSE with a medium effect size, while closeness did not show any relationship. Teachers experiencing higher degrees of conflict in their relationships with students are more likely to be emotionally vulnerable and to perceive professional and personal failure (Spilt et al., 2011), which places them at higher risk of developing unhealthy self-efficacy beliefs in classroom management. However, the positive aspect of student-teacher relationships (closeness) did not show the expected positive relationship with CMSE. This seems to indicate that teachers’ perceived conflict with students leaves them feeling less confident in classroom management, yet being close to students does not make teachers feel empowered in classroom management either. We still need to be cautious about this finding, as only three studies were included in this meta-analysis and one previous study (Zee et al., 2017) found a significant and positive relationship between closeness and CMSE.

In terms of students’ outcomes, many studies found a positive relationship between teacher general self-efficacy and students’ achievement (Fackler & Malmberg, 2016; Malmberg et al., 2014), but the focus of that research was rarely on classroom management. Across two studies, we found a moderate and positive relationship between student achievement and CMSE, which further highlights the importance of self-efficacy in classroom management. Responding to student misbehaviour is a key part of managing classrooms, and our results showed that student misbehaviour had a moderate and negative association with CMSE, which indicates that teachers with higher levels of CMSE are likely to experience fewer problem behaviours in the classroom.

Compared to the teacher- and classroom-level characteristics, few school-level factors have been identified and few studies were included in each meta-analysis (ranging from 2 to 5). Based on two studies, we found that principal gender does not correlate with CMSE; however, one study (Fackler et al., 2021) found a negative association. Our synthesis results also indicated that principal leadership is significantly associated with teachers' perceptions of CMSE. This suggests that principals play an important role in teachers’ sense of confidence in classroom management; however, further research on how principals exert this influence is recommended.

School culture is deeply rooted in people's attitudes, values and skills (Sezgin, 2010). We found two studies focused on school culture (McLeod, 2012; Öztürk et al., 2021), and the synthesised result showed a small but positive association with CMSE. Given that an individual's sense of efficacy is influenced by interaction with the environment (Bandura, 1997), this supports the connection between environmental conditions (like organizational culture) and CMSE. Our results indicated that teachers’ efficacy beliefs towards classroom management did not differ based on school type (i.e., private vs public) or school size; however, Fackler et al. (2021) found that private and larger schools were positively associated with teachers’ CMSE. This may again be due to the limited number of included studies. Further investigation of the relationship between school-level factors and teachers’ CMSE seems worthwhile.

In addition to the overall effect, we conducted moderator analysis, and the results showed that participants’ countries and school levels moderated the relationship between CMSE and many correlates (e.g., teacher working experience, classroom management, job satisfaction). This suggests that future research could focus on differences in CMSE across teaching levels. The country in which research was conducted proved to be a moderating variable, which supports the call for additional cross-cultural/cross-national studies of TSE (Fackler et al., 2020; Vieluf et al., 2013).
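A subgroup moderator analysis of this kind can be sketched as a Q-between test, which asks whether pooled effects differ across subgroups by more than sampling error would predict. The sketch below is a simplified illustration and not necessarily what the review's software computed: it assumes exactly two subgroups and fixed-effect pooling within each subgroup, the helper names `fixed_effect_z` and `q_between_two_groups` are hypothetical, and the correlations and sample sizes are invented for demonstration.

```python
import math


def fixed_effect_z(rs, ns):
    """Inverse-variance pooled Fisher z and its variance for one subgroup."""
    # Weight of each study is 1 / var(z) = n - 3.
    weights = [n - 3 for n in ns]
    z = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return z, 1.0 / sum(weights)


def q_between_two_groups(group_a, group_b):
    """Q-between statistic and its p-value (df = 1) for two subgroups.

    Each group is a (correlations, sample_sizes) pair.
    """
    za, va = fixed_effect_z(*group_a)
    zb, vb = fixed_effect_z(*group_b)
    wa, wb = 1.0 / va, 1.0 / vb
    z_bar = (wa * za + wb * zb) / (wa + wb)
    q = wa * (za - z_bar) ** 2 + wb * (zb - z_bar) ** 2
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p


# Hypothetical subgroups, e.g. two school levels with two studies each.
primary = ([0.45, 0.50], [200, 250])
secondary = ([0.15, 0.20], [220, 180])
q, p = q_between_two_groups(primary, secondary)
print(f"Q_between = {q:.2f}, p = {p:.4g}")
```

A small p-value here would indicate that the subgroup variable (school level in this invented example) moderates the correlation. With more than two subgroups the same statistic is compared against a chi-square distribution with (number of subgroups minus one) degrees of freedom.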

Teacher self-efficacy towards classroom management is an important facet of teacher self-efficacy, yet the mechanisms driving efficacy beliefs toward classroom management remain unclear as a result of inconsistent findings across studies. This meta-analytic review synthesizes the literature over the last two decades to identify factors that correlate with CMSE and to estimate the effect size of these relationships. The findings identified 22 correlates of CMSE and clustered these correlates into three categories: teacher-level factors, classroom-level factors, and school-level factors. Teacher personal factors are thoroughly examined in current CMSE research, while there seems to be a lack of attention to teacher-student interaction and teachers’ working conditions. We identified 10 teacher-level factors including teacher demographic characteristics and psychometric variables. Among teacher demographic characteristics, only teaching experience was related to teachers’ CMSE: a positive association was found between level of CMSE and years of teaching experience. Six psychological correlates of CMSE (teacher constructivist beliefs, teacher stress, job satisfaction, teacher commitment, teacher personality and teacher burnout) were identified and most of them showed medium to large correlations with CMSE. Seven classroom-level factors were identified (classroom climate, classroom management, students’ misbehaviour, students’ achievement, classroom interaction, and student-teacher relationship) and almost all factors were significantly related to CMSE, except class size and closeness in teacher-student relationships. Few studies focused on the relationship between teachers’ working environment and CMSE. Five school-level factors were identified, and only principal leadership and school culture showed a small and positive relationship with CMSE.
In addition, sub-group moderation analysis revealed that most of these effect sizes differed as a function of participants’ countries and school levels. Future work should focus on exploring more classroom- and school-level factors and conducting cross-cultural comparison research in order to contribute to a comprehensive body of literature. Experimental and longitudinal studies should also be a focus of future CMSE research, given the large amount of correlational work currently contributing to the field. Likewise, reviews of TSE in other specific areas, such as student engagement, instructional strategies and/or inclusive education, are needed to help shed light on those areas where self-perceptions of TSE are more diversified or where it is a global trait.

Limitations

There were some limitations to this study. First, we were selective about the studies included in our meta-analysis. To include relevant correlates of CMSE as comprehensively as possible, correlates reported in two or more studies were meta-analysed. The limited number of studies included in some analyses (e.g., school culture, school type, school size) may affect the validity of the synthesis results. A second potential limitation is that we did not place any restriction on the measurement instrument for CMSE or any other psychometric factor. Although a previous meta-analysis reported that effect sizes did not vary across studies using different scales (Madigan & Kim, 2021), the instrument remains, in theory, a potentially important variable. A third potential limitation is that we report only correlates of CMSE and cannot distinguish predictors from outcomes. We recommend conducting longitudinal and quasi-experimental research in the future to explore the directions of these relationships in more depth.

Data availability

Data is available on request.

Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2014). Comprehensive meta-analysis (Version 3.0) [Computer software]. Biostat. https://www.meta-analysis.com/

Aloe, A. M., Amo, L. C., & Shanahan, M. E. (2014). Classroom Management Self-Efficacy and Burnout: A Multivariate Meta-analysis. Educational Psychology Review, 26 (1), 101–126. https://doi.org/10.1007/s10648-013-9244-0

Alt, D. (2018). Science teachers' conceptions of teaching and learning, ICT efficacy, ICT professional development and ICT practices enacted in their classrooms. Teaching and Teacher Education, 73 , 141–150. https://doi.org/10.1016/j.tate.2018.03.020

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84 (2), 191–215. https://doi.org/10.1037/0033-295X.84.2.191

Bandura, A. (1986). Social foundations of thought and action : a social cognitive theory . Prentice-Hall.

Bandura, A. (1997). Self-efficacy : the exercise of control . W.H. Freeman.

Baranczuk, U. (2021). The Five-Factor Model of Personality and Generalized Self Efficacy: A Meta-Analysis. Journal of Individual Differences, 42 (4), 183–193. https://doi.org/10.1027/1614-0001/a000345

Bardach, L., Klassen, R. M., & Perry, N. E. (2022). Teachers’ Psychological Characteristics: Do They Matter for Teacher Effectiveness, Teachers’ Well-being, Retention, and Interpersonal Relations? An Integrative Review. Educational Psychology Review, 34 (1), 259–300. https://doi.org/10.1007/s10648-021-09614-9

Bellibas, M. S., & Liu, Y. (2017). Multilevel analysis of the relationship between principals’ perceived practices of instructional leadership and teachers’ self-efficacy perceptions. Journal of Educational Administration, 55 (1), 49–69. https://doi.org/10.1108/JEA-12-2015-0116

Berger, J.-L., Girardet, C., Vaudroz, C., & Crahay, M. (2018). Teaching Experience, Teachers’ Beliefs, and Self-Reported Classroom Management Practices: A Coherent Network. SAGE Open, 8 (1), 215824401775411. https://doi.org/10.1177/2158244017754119

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis (2nd ed.). John Wiley & Sons. https://doi.org/10.1002/9780470743386

Brouwers, A., & Tomic, W. (2000). A longitudinal study of teacher burnout and perceived self-efficacy in classroom management. Teaching and Teacher Education, 16 (2), 239–253. https://doi.org/10.1016/S0742-051X(99)00057-8

Brown, C. G. (2012). A systematic review of the relationship between self-efficacy and burnout in teachers. Educational and Child Psychology, 29 (4), 47–63 https://www.scopus.com/inward/record.uri?eid=2-s2.0-84883093990&partnerID=40&md5=194582c37a6022dd84eb8d48d7d9d894

Buentello, O. (2019). A Leader's Impact: The Relationship between School Administrators' Full-Range Leadership Styles and Teachers' Sense of Self-Efficacy . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/leaders-impact-relationship-between-school/docview/2323557058/se-2

Bullock, A., Coplan, R. J., & Bosacki, S. (2015). Exploring links between early childhood educators' psychological characteristics and classroom management self-efficacy beliefs. Canadian Journal of Behavioural Science, 47 (2), 175–183. https://doi.org/10.1037/a0038547

Bumen, N. T. (2010). The Relationship between Demographics, Self Efficacy, and Burnout among Teachers. Eurasian Journal of Educational Research (EJER), 40 , 16–35.

Calkins, L., Yoder, P. J., & Wiens, P. (2021). Renewed Purposes for Social Studies Teacher Preparation: An Analysis of Teacher Self-Efficacy and Initial Teacher Education. Journal of Social Studies Education Research, 12 (2), 54–77 https://www.proquest.com/scholarly-journals/renewed-purposes-social-studies-teacher/docview/2580843422/se-2

Cantrell, S. C., & Callaway, P. (2008). High and low implementers of content literacy instruction: Portraits of teacher efficacy. Teaching and Teacher Education, 24 (7), 1739–1750. https://doi.org/10.1016/j.tate.2008.02.020

Card, N. A. (2012). Applied meta-analysis for social science research . Guilford Press.

Chen, R. J.-C., Lin, H.-C., Hsueh, Y.-L., & Hsieh, C.-C. (2020). Which is more influential on teaching practice, classroom management efficacy or instruction efficacy? Evidence from TALIS 2018. Asia Pacific Education Review, 21 (4), 589–599. https://doi.org/10.1007/s12564-020-09656-8

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates. https://doi.org/10.4324/9780203771587

Coladarci, T., & Breton, W. A. (1997). Teacher Efficacy, Supervision, and the Special Education Resource-Room Teacher. The Journal of Educational Research, 90 (4), 230–239. https://doi.org/10.1080/00220671.1997.10544577

Davis, J. R. (2018). Classroom Management in Teacher Education Programs (1st ed.). Springer International Publishing. https://doi.org/10.1007/978-3-319-63850-8

Dicke, T., Parker, P. D., Marsh, H. W., Kunter, M., Schmeck, A., & Leutner, D. (2014). Self-Efficacy in Classroom Management, Classroom Disturbances, and Emotional Exhaustion: A Moderated Mediation Analysis of Teacher Candidates. Journal of Educational Psychology, 106 (2), 569–583. https://doi.org/10.1037/a0035504

Dicke, T., Stebner, F., Linninger, C., Kunter, M., & Leutner, D. (2018). A Longitudinal Study of Teachers' Occupational Well-Being: Applying the Job Demands-Resources Model. Journal of Occupational Health Psychology, 23 (2), 262–277. https://doi.org/10.1037/ocp0000070

Duval, S., & Tweedie, R. (2000). Trim and Fill: A Simple Funnel-Plot–Based Method of Testing and Adjusting for Publication Bias in Meta-Analysis. Biometrics, 56 (2), 455–463. https://doi.org/10.1111/j.0006-341X.2000.00455.x

Eddy, C. L., Herman, K. C., & Reinke, W. M. (2019). Single-item teacher stress and coping measures: Concurrent and predictive validity and sensitivity to change. Journal of School Psychology, 76 , 17–32. https://doi.org/10.1016/j.jsp.2019.05.001

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315 (7109), 629–634. https://doi.org/10.1136/bmj.315.7109.629

Emmer, E. T., & Hickman, J. (1991). Teacher Efficacy in Classroom Management and Discipline. Educational and Psychological Measurement, 51 (3), 755–765. https://doi.org/10.1177/0013164491513027

Evertson, C. M., & Weinstein, C. S. (2006). Handbook of classroom management : research, practice, and contemporary issues . Lawrence Erlbaum Associates.

Fackler, S., & Malmberg, L.-E. (2016). Teachers' self-efficacy in 14 OECD countries: Teacher, student group, school and leadership effects. Teaching and Teacher Education, 56 , 185–195. https://doi.org/10.1016/j.tate.2016.03.002

Fackler, S., Malmberg, L.-E., & Sammons, P. (2021). An international perspective on teacher self-efficacy: Personal, structural and environmental factors. Teaching and Teacher Education, 99 , 103255. https://doi.org/10.1016/j.tate.2020.103255

Fackler, S., Sammons, P., & Malmberg, L. E. (2020). A comparative analysis of predictors of teacher self-efficacy in student engagement, instruction and classroom management in Nordic, Anglo-Saxon and East and South-East Asian countries. Review of Education, 9 (1), 203–239. https://doi.org/10.1002/rev3.3242

Fernet, C., Guay, F., Senécal, C., & Austin, S. (2012). Predicting intraindividual changes in teacher burnout: The role of perceived school environment and motivational factors. Teaching and Teacher Education, 28 (4), 514–525. https://doi.org/10.1016/j.tate.2011.11.013

Ford, L. D. (2019). Comparison of Classroom Management Self-Efficacy of Teachers Based upon Their Certification Type, Principal’s Gender, and Leadership Style: A Quasi-Experimental Vignette Study . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/comparison-classroom-management-self-efficacy/docview/2387272797/se-2

George, S. V., Richardson, P. W., & Watt, H. M. G. (2018). Early career teachers' self-efficacy : A longitudinal study from Australia. The Australian Journal of Education, 62 (2), 217–233. https://doi.org/10.1177/0004944118779601

Guangbao, F., & Timothy, T. (2021). Investigating the Associations of Constructivist Beliefs and Classroom Climate on Teachers' Self-Efficacy Among Australian Secondary Mathematics Teachers. Frontiers in Psychology, 12 , 626271. https://doi.org/10.3389/fpsyg.2021.626271

Hamre, B. K., Pianta, R. C., Downer, J. T., DeCoster, J., Mashburn, A. J., Jones, S. M., Brown, J. L., Cappella, E., Atkins, M., Rivers, S. E., Brackett, M. A., & Hamagami, A. (2013). Teaching through Interactions: Testing a Developmental Framework of Teacher Effectiveness in over 4,000 Classrooms. The Elementary School Journal, 113 (4), 461–487. https://doi.org/10.1086/669616

Hassan, M. U. (2019). Teachers' Self-Efficacy: Effective Indicator towards Students' Success in Medium of Education Perspective. Problems of Education in the 21st Century, 77 (5), 667–679. https://doi.org/10.33225/pec/19.77.667

Hettinger, K., Lazarides, R., Rubach, C., & Schiefele, U. (2021). Teacher classroom management self-efficacy: Longitudinal relations to perceived teaching behaviors and student enjoyment. Teaching and Teacher Education, 103 , 103349. https://doi.org/10.1016/j.tate.2021.103349

Hettinger, K., Lazarides, R., & Schiefele, U. (2023). Longitudinal relations between teacher self-efficacy and student motivation through matching characteristics of perceived teaching practice. European Journal of Psychology of Education , 1–27. https://doi.org/10.1007/s10212-023-00744-y

Hicks, S. D. (2012). Self-efficacy and classroom management: A correlation study regarding the factors that influence classroom management . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/self-efficacy-classroom-management-correlation/docview/1030435909/se-2

Holzberger, D., & Prestele, E. (2021). Teacher self-efficacy and self-reported cognitive activation and classroom management: A multilevel perspective on the role of school characteristics. Learning and Instruction, 76 , 101513. https://doi.org/10.1016/j.learninstruc.2021.101513

Hu, B. Y., Li, Y., Wang, C., Wu, H., & Vitiello, G. (2021). Preschool teachers’ self-efficacy, classroom process quality, and children’s social skills: A multilevel mediation analysis. Early Childhood Research Quarterly, 55 , 242–251. https://doi.org/10.1016/j.ecresq.2020.12.001

Huk, O. (2011). Predicting teacher burnout as a function of school demands and resources and teacher characteristics . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/predicting-teacher-burnout-as-function-school/docview/912193420/se-2

Klassen, R. M., & Chiu, M. M. (2010). Effects on Teachers' Self-Efficacy and Job Satisfaction: Teacher Gender, Years of Experience, and Job Stress. Journal of Educational Psychology, 102 (3), 741–756. https://doi.org/10.1037/a0019237

Klassen, R. M., & Tze, V. M. C. (2014). Teachers’ self-efficacy, personality, and teaching effectiveness: A meta-analysis. Educational Research Review, 12 , 59–76. https://doi.org/10.1016/j.edurev.2014.06.001

Klassen, R. M., Tze, V. M. C., Betts, S. M., & Gordon, K. A. (2011). Teacher Efficacy Research 1998—2009: Signs of Progress or Unfulfilled Promise? Educational Psychology Review, 23 (1), 21–43. https://doi.org/10.1007/s10648-010-9141-8

Korpershoek, H., Harms, T., de Boer, H., van Kuijk, M., & Doolaard, S. (2016). A Meta-Analysis of the Effects of Classroom Management Strategies and Classroom Management Programs on Students' Academic, Behavioral, Emotional, and Motivational Outcomes. Review of Educational Research, 86 (3), 643–680. https://doi.org/10.3102/0034654315626799

Kunemund, R. L., Nemer McCullough, S., Williams, C. D., Miller, C. C., Sutherland, K. S., Conroy, M. A., & Granger, K. (2020). The mediating role of teacher self-efficacy in the relation between teacher–child race mismatch and conflict. Psychology in the Schools, 57 (11), 1757–1770. https://doi.org/10.1002/pits.22419

Lazarides, R., Buchholz, J., & Rubach, C. (2018). Teacher enthusiasm and self-efficacy, student-perceived mastery goal orientation, and student motivation in mathematics classrooms. Teaching and Teacher Education, 69 , 1–10. https://doi.org/10.1016/j.tate.2017.08.017

Lazarides, R., Watt, H. M. G., & Richardson, P. W. (2020). Teachers’ classroom management self-efficacy, perceived classroom management and teaching contexts from beginning until mid-career. Learning and Instruction, 69 , 101346. https://doi.org/10.1016/j.learninstruc.2020.101346

Lee, M., & van Vlack, S. (2018). Teachers’ emotional labour, discrete emotions, and classroom management self-efficacy. Educational Psychology, 38 (5), 669–686. https://doi.org/10.1080/01443410.2017.1399199

Liu, S., Xu, X., & Stronge, J. (2018). The influences of teachers’ perceptions of using student achievement data in evaluation and their self-efficacy on job satisfaction: evidence from China. Asia Pacific Education Review, 19 (4), 493–509. https://doi.org/10.1007/s12564-018-9552-7

Looney, L. (2004). Understanding teachers' efficacy beliefs: The role of professional community . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/understanding-teachers-efficacy-beliefs-role/docview/305178139/se-2

Madigan, D. J., & Kim, L. E. (2021). Towards an understanding of teacher attrition: A meta-analysis of burnout, job satisfaction, and teachers’ intentions to quit. Teaching and Teacher Education, 105 , 103425. https://doi.org/10.1016/j.tate.2021.103425

Main, S., & Hammond, L. (2008). Best practice or most practiced? Pre-service teachers’ beliefs about effective behaviour management strategies and reported self-efficacy. Australian Journal of Teacher Education, 33 (4), 28–39. https://doi.org/10.14221/ajte.2008v33n4.3

Malmberg, L.-E., Hagger, H., & Webster, S. (2014). Teachers' situation-specific mastery experiences: teacher, student group and lesson effects. European Journal of Psychology of Education, 29 (3), 429–451. https://doi.org/10.1007/s10212-013-0206-1

Marschall, G. (2023). Teacher self-efficacy sources during secondary mathematics initial teacher education. Teaching and Teacher Education, 132 , 104203. https://doi.org/10.1016/j.tate.2023.104203

Mashburn, A. J., Pianta, R. C., Hamre, B. K., Downer, J. T., Barbarin, O. A., Bryant, D., Burchinal, M., Early, D. M., & Howes, C. (2008). Measures of Classroom Quality in Prekindergarten and Children’s Development of Academic, Language, and Social Skills. Child Development, 79 (3), 732–749. https://doi.org/10.1111/j.1467-8624.2008.01154.x

Maslach, C. (2003). Job Burnout: New Directions in Research and Intervention. Current Directions in Psychological Science, 12 (5), 189–192. https://doi.org/10.1111/1467-8721.01258

McArthur, L., & Munn, Z. (2015). The effectiveness of interventions on the self-efficacy of clinical teachers: a systematic review protocol. JBI Database of Systematic Reviews and Implementation Reports, 13 (5), 118–130 http://ovidsp.ovid.com/ovidweb.cgi?T=JS&PAGE=reference&D=ovftq&NEWS=N&AN=01938924-201513050-00011

McCrae, R. R., & Costa, P. T. (2003). Personality in adulthood: A five-factor theory perspective (2nd ed.). Guilford Press.

McLeod, R. P. (2012). An Examination of the Relationship between Teachers' Sense of Efficacy and School Culture . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/examination-relationship-between-teachers-sense/docview/1153963088/se-2

Miller, T. M. S. (2020). Teacher Self-Efficacy and Years of Experience: Their Relation to Teacher Commitment and Intention to Leave . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/teacher-self-efficacy-years-experience-their/docview/2488681115/se-2

Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Reprint—Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Physical Therapy, 89 (9), 873–880. https://doi.org/10.1093/ptj/89.9.873

Morris, D. B., Usher, E. L., & Chen, J. A. (2017). Reconceptualizing the Sources of Teaching Self-Efficacy: a Critical Review of Emerging Literature. Educational Psychology Review, 29 (4), 795–833. https://doi.org/10.1007/s10648-016-9378-y

OECD. (2014). TALIS 2013 Results: An international perspective on teaching and learning. OECD Publishing. https://doi.org/10.1787/9789264196261-en

O'Neill, S. C., & Stephenson, J. (2011). The measurement of classroom management self-efficacy: a review of measurement instrument development and influences. Educational Psychology, 31 (3), 261–299. https://doi.org/10.1080/01443410.2010.545344

Ozdemir, Y. (2007). The Role of Classroom Management Efficacy in Predicting Teacher Burnout. World Academy of Science, Engineering and Technology, Open Science Index 11. International Journal of Educational and Pedagogical Sciences, 1 (11), 751–757.

Öztürk, M., Bulut, M. B., & Yildiz, M. (2021). Predictors of Teacher Burnout in Middle Education: School Culture and Self-Efficacy. Studia Psychologica, 63 (1), 5–23. https://doi.org/10.31577/SP.2021.01.811

Peterson, R. A., & Brown, S. P. (2005). On the Use of Beta Coefficients in Meta-Analysis. Journal of Applied Psychology, 90 (1), 175–181. https://doi.org/10.1037/0021-9010.90.1.175

Rahimi, A., & Saberi, M. (2014). The Interface between Iranian EFL Instructors' Personality and Their Self-Efficacy. Advances in Language and Literary Studies, 5 (3), 134–142. https://doi.org/10.7575/aiac.alls.v.5n.3p.134

Raudenbush, S. W., Rowan, B., & Cheong, Y. F. (1992). Contextual Effects on the Self-perceived Efficacy of High School Teachers. Sociology of Education, 65 (2), 150–167. https://doi.org/10.2307/2112680

Reeve, J. (2009). Why Teachers Adopt a Controlling Motivating Style Toward Students and How They Can Become More Autonomy Supportive. Educational Psychologist, 44 (3), 159–175. https://doi.org/10.1080/00461520903028990

Reilly, J. C. (2002). Differentiating the concept of teacher efficacy for academic achievement, classroom management and discipline, and enhancement of social relations . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/differentiating-concept-teacher-efficacy-academic/docview/275767484/se-2

Riggs, I. M., & Enochs, L. G. (1990). Toward the development of an elementary teacher's science teaching efficacy belief instrument. Science Education (Salem, Mass.), 74 (6), 625–637. https://doi.org/10.1002/sce.3730740605

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86 (3), 638–641. https://doi.org/10.1037/0033-2909.86.3.638

Rothstein, H. R. (2008). Publication bias as a threat to the validity of meta-analytic results. Journal of Experimental Criminology, 4 (1), 61–81. https://doi.org/10.1007/s11292-007-9046-9

Ryan, A. M., Kuusinen, C. M., & Bedoya-Skoog, A. (2015). Managing peer relations: A dimension of teacher self-efficacy that varies between elementary and middle school teachers and is associated with observed classroom quality. Contemporary Educational Psychology, 41 , 147–156. https://doi.org/10.1016/j.cedpsych.2015.01.002

Schwerdtfeger, A., Konermann, L., & Schönhofen, K. (2008). Self-efficacy as a health-protective resource in teachers?: A biopsychological approach. Health Psychology, 27 (3), 358–368. https://doi.org/10.1037/0278-6133.27.3.358

Senler, B., & Sungur-Vural, S. (2013). Pre-Service Science Teachers’ Teaching Self-Efficacy in Relation to Personality Traits and Academic Self-Regulation. The Spanish Journal of Psychology, 16 , E12–E12. https://doi.org/10.1017/sjp.2013.22

Sezgin, F. (2010). School Organizational Culture as a Predictor of Teacher Organizational Commitment. Egitim ve Bilim, 35 (156), 142 https://www.proquest.com/scholarly-journals/school-organizational-culture-as-predictor/docview/1009841902/se-2

Sims, W. A., King, K. R., Reinke, W. M., Herman, K., & Riley-Tillman, T. C. (2021). Development and Preliminary Validity Evidence for the Direct Behavior Rating-Classroom Management (DBR-CM). Journal of Educational and Psychological Consultation, 31 (2), 215–245. https://doi.org/10.1080/10474412.2020.1732990

Skaalvik, E. M., & Skaalvik, S. (2007). Dimensions of Teacher Self-Efficacy and Relations With Strain Factors, Perceived Collective Teacher Efficacy, and Teacher Burnout. Journal of Educational Psychology, 99 (3), 611–625. https://doi.org/10.1037/0022-0663.99.3.611

Spilt, J. L., Koomen, H. M. Y., & Thijs, J. T. (2011). Teacher Wellbeing: The Importance of Teacher—Student Relationships. Educational Psychology Review, 23 (4), 457–477. https://doi.org/10.1007/s10648-011-9170-y

Stough, L. M. (2006). The Place of Classroom Management and Standards in Teacher Education. In C. M. Evertson & C. S. Weinstein (Eds.), Handbook of classroom management: research, practice, and contemporary issues. Lawrence Erlbaum Associates.

Sutton, R. E., Mudrey-Camino, R., & Knight, C. C. (2009). Teachers' Emotion Regulation and Classroom Management. Theory Into Practice, 48 (2), 130–137. https://doi.org/10.1080/00405840902776418

Tran, V. D. (2015). Effects of Gender on Teachers’ Perceptions of School Environment, Teaching Efficacy, Stress and Job Satisfaction. International Journal of Higher Education, 4 (4), 147–157. https://doi.org/10.5430/ijhe.v4n4p147

Tschannen-Moran, M., & Hoy, A. W. (2001). Teacher efficacy: capturing an elusive construct. Teaching and Teacher Education, 17 (7), 783–805. https://doi.org/10.1016/S0742-051X(01)00036-1

Tschannen-Moran, M., & Hoy, A. W. (2007). The differential antecedents of self-efficacy beliefs of novice and experienced teachers. Teaching and Teacher Education, 23 (6), 944–956. https://doi.org/10.1016/j.tate.2006.05.003

Tschannen-Moran, M., Hoy, A. W., & Hoy, W. K. (1998). Teacher Efficacy: Its Meaning and Measure. Review of Educational Research, 68 (2), 202–248. https://doi.org/10.3102/00346543068002202

Usher, E. L., & Pajares, F. (2008). Sources of Self-Efficacy in School: Critical Review of the Literature and Future Directions. Review of Educational Research, 78 (4), 751–796. https://doi.org/10.3102/0034654308321456

Valente, S., Lourenço, A. A., Alves, P., & Dominguez-Lara, S. (2020). The role of the teacher's emotional intelligence for efficacy and classroom management. Revista CES Psicología, 13 (2), 18–31. https://doi.org/10.21615/CESP.13.2.2

Vidic, T., Duranovic, M., & Klasnic, I. (2021). Student Misbehaviour, Teacher Self-Efficacy, Burnout And Job Satisfaction: Evidence From Croatia. Problems of Education in the 21st Century, 79 (4), 657–673. https://doi.org/10.33225/pec/21.79.657

Vieluf, S., Kunter, M., & van de Vijver, F. J. R. (2013). Teacher self-efficacy in cross-national perspective. Teaching and Teacher Education, 35 , 92–103. https://doi.org/10.1016/j.tate.2013.05.006

von der Embse, N. P., Sandilos, L. E., Pendergast, L., & Mankin, A. (2016). Teacher stress, teaching-efficacy, and job satisfaction in response to test-based educational accountability policies. Learning And Individual Differences, 50 , 308–317. https://doi.org/10.1016/j.lindif.2016.08.001

Wettstein, A., Ramseier, E., & Scherzinger, M. (2021). Class- and subject teachers’ self-efficacy and emotional stability and students’ perceptions of the teacher–student relationship, classroom management, and classroom disruptions. BMC Psychology, 9 (1), 1–103. https://doi.org/10.1186/s40359-021-00606-6

Williams, A. Y. (2012). Applications of Dweck's model of implicit theories to teachers' self-efficacy and emotional experiences . ProQuest Dissertations Publishing https://www.proquest.com/dissertations-theses/applications-dwecks-model-implicit-theories/docview/1178991009/se-2

Wilson, D. B. (2019). Systematic Coding For Research Synthesis. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 153–172). Russell Sage Foundation. https://doi.org/10.7758/9781610448864.12

Woodcock, S., Sharma, U., Subban, P., & Hitches, E. (2022). Teacher self-efficacy and inclusive education practices: Rethinking teachers’ engagement with inclusive practices. Teaching and Teacher Education, 117 , 103802. https://doi.org/10.1016/j.tate.2022.103802

Woolfolk, A. E., & Hoy, W. K. (1990). Prospective Teachers' Sense of Efficacy and Beliefs About Control. Journal of Educational Psychology, 82 (1), 81–91. https://doi.org/10.1037/0022-0663.82.1.81

Xiao, Y., Lutang, L., & Rong, N. (2021). Evaluation of the Overall Effect of Influencing Factors on Chinese Farmers’ Withdrawal of Homestead. Journal of Northwest A&F University(Social Science Edition), 21 (6), 72–84. https://doi.org/10.13968/j.cnki.1009-9107.2021.06.09

Xiaoqing, M. (2013). The relationship between praise behavior and personality traits in the classroom of preschool teachers . Master, Henan University https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD201401&filename=1013349737.nh

Xiaoxian, L., Wujun, S., Linlin, Y., Yimeng, Z., Kaili, L., & Xiaoyan, L. (2014). A study on the relationship between personality and emotion regulation styles of early childhood teachers. Journal of Educational Development, 08 , 26–29 https://oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLASN2014&filename=YJYE201408006&uniplatform=OVERSEA&v=p0yMd8_Q591V8Rg6MBMcr_qnNXTLLVnlja4X65xpKGGkoAnna2AlbRxnyXAqyVJE

Yali, Z., Sen, L., & Guoliang, Y. (2019). The relationship between self-esteem and social anxiety: A meta-analysis with Chinese students. Advances in Psychological Science, 27 (6), 1005–1018. https://doi.org/10.3724/sp.J.1042.2019.01005

Yingjie, W., & Yan, L. (2016). The Study of Pre-service Kindergarten Teachers’ Classroom Management Efficacy. Studies in Early Childhood Education , (10), 57–66. https://doi.org/10.13861/j.cnki.sece.2016.10.006

Yoon, J. S. (2002). Teacher characteristics as predictors of teacher-student relationships: Stress, negative affect, and self-efficacy. Social Behavior and Personality, 30 (5), 485. https://doi.org/10.2224/sbp.2002.30.5.485

Yuan, G., & Jinjie, X. (2019). Narrow the Professional Gap for Beginning Teacher——Based on the Data Results and Implications of TALIS2018 Survey. Primary & Secondary Schooling Abroad , (12), 61–68 https://oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2020&filename=WGZX201912008&uniplatform=OVERSEA&v=5OH8Xy6uBsS0BSlqEO1odRhgxGAZq0TLG3FPY9uO8O9feLvJES0HIah35_3ON7uz

Zee, M., de Jong, P. F., & Koomen, H. M. Y. (2016). Teachers' Self-Efficacy in Relation to Individual Students With a Variety of Social-Emotional Behaviors: A Multilevel Investigation. Journal of Educational Psychology, 108 (7), 1013–1027. https://doi.org/10.1037/edu0000106

Zee, M., de Jong, P. F., & Koomen, H. M. Y. (2017). From externalizing student behavior to student-specific teacher self-efficacy: The role of teacher-perceived conflict and closeness in the student–teacher relationship. Contemporary Educational Psychology, 51 , 37–50. https://doi.org/10.1016/j.cedpsych.2017.06.009

Open Access funding enabled and organized by CAUL and its Member Institutions. Support for the open access funding was provided by Flinders University.

Author information

Authors and affiliations.

Department of Psychology, Southwest University, Chongqing, China

Siyu Duan & Zhan Xu

College of Education, Psychology and Social Work, Flinders University, Adelaide, Australia

Siyu Duan & Kerry Bissaker

Corresponding author

Correspondence to Kerry Bissaker .

Ethics declarations

Conflict of interests.

The authors declare no competing interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

(DOCX 113 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Duan, S., Bissaker, K. & Xu, Z. Correlates of teachers’ classroom management self-efficacy: A systematic review and meta-analysis. Educ Psychol Rev 36 , 43 (2024). https://doi.org/10.1007/s10648-024-09881-2

Accepted : 20 March 2024

Published : 12 April 2024

DOI : https://doi.org/10.1007/s10648-024-09881-2

  • classroom management self-efficacy
  • teacher self-efficacy
  • meta-analysis
  • Open access
  • Published: 13 December 2023

Arts and creativity interventions for improving health and wellbeing in older adults: a systematic literature review of economic evaluation studies

  • Grainne Crealey 1 ,
  • Laura McQuade 2 ,
  • Roger O’Sullivan 2 &
  • Ciaran O’Neill 3  

BMC Public Health volume 23, Article number: 2496 (2023)

As the population ages, older people account for a larger proportion of the health and social care budget. A significant body of evidence suggests that arts and creativity interventions can improve the physical, mental and social wellbeing of older adults; however, the value and/or cost-effectiveness of such interventions remains unclear.

We systematically reviewed the economic evidence relating to such interventions, reporting our findings according to PRISMA guidelines. We searched bibliographic databases (MEDLINE, EMBASE, EconLit, Web of Science and NHSEED), trial registries and grey literature. No language or temporal restrictions were applied. Two screening rounds were conducted independently by health economists experienced in systematic literature review. Methodological quality was assessed, and key information was extracted and tabulated to provide an overview of the published literature. A narrative synthesis without meta-analysis was conducted.

Only six studies were identified which provided evidence relating to the value or cost-effectiveness of arts and creativity interventions to improve health and wellbeing in older adults. The evidence which was identified was encouraging, with five out of the six studies reporting an acceptable probability of cost-effectiveness or positive return on investment (ranging from £1.20 to over £8 for every £1 of expenditure). However, considerable heterogeneity was observed with respect to study participants, design, and outcomes assessed. Of particular concern were potential biases inherent in social value analyses.

Conclusions

Despite many studies reporting positive health and wellbeing benefits of arts and creativity interventions in this population, we found meagre evidence on their value or cost-effectiveness. Such evidence is costly and time-consuming to generate, but essential if innovative non-pharmacological interventions are to be introduced to minimise the burden of illness in this population and ensure efficient use of public funds. The findings from this review suggest that capturing data on the value and/or cost-effectiveness of such interventions should be prioritised; furthermore, research effort should be directed to developing evaluative methods which move beyond the confines of current health technology assessment frameworks, to capture a broader picture of ‘value’ more applicable to arts and creativity interventions and public health interventions more generally.

PROSPERO registration

CRD42021267944 (14/07/2021).

The number and proportion of older adults in the population has increased in virtually every country in the world over past decades [ 1 ]. In 2015, there were around 901 million people aged 60 years and over worldwide; by 2030, this will have increased to 1.4 billion [ 2 ]. An ageing population is one of the greatest successes of public health, but it has implications for economies in numerous ways: slower labour force growth; greater provisions in welfare payments by working-age people for older people who are no longer economically active; provisions for increased long-term care; and the need for society to adjust to the changing needs, expectations and capabilities of an expanding group of its citizens.

The Covid-19 pandemic shone an uncompromising light on the health and social care sector, highlighting the seriousness of gaps in policies, systems and services. It also focused attention on the physical and mental health consequences of loneliness and social isolation. To foster healthy ageing and improve the lives of older people, their families and communities, sustained and equitable investment in health and wellbeing is required [ 3 ]. The prevailing model of health and social care, which is based ostensibly on formal care provision, is unlikely to be sustainable over the longer term. New models, which promote healthy ageing and recognise the need for increasing reliance on self-care, are required, as is evidence of their effectiveness, cost-effectiveness and scalability.

Arts and creativity interventions (ACIs) can have positive effects on health and well-being, as several reviews have shown [ 4 , 5 ]. For older people, ACIs can enhance wellbeing [ 6 , 7 , 8 , 9 ], quality of life [ 10 , 11 ] and cognitive function [ 12 , 13 , 14 , 15 , 16 ]. They can also foster social cohesion [ 17 , 18 , 19 ] and reduce social disparities and injustices [ 20 ]; promote healthy behaviour; prevent ill health (including enhancing well-being and mental health [ 21 , 22 , 23 , 24 , 25 ], and reducing cognitive decline [ 26 , 27 ], frailty [ 28 , 29 , 30 , 31 , 32 , 33 ] and premature mortality [ 34 , 35 , 36 , 37 , 38 ]); support people with stroke [ 39 , 40 , 41 , 42 ], degenerative neurological disorders and dementias; and support end of life care [ 43 , 44 ]. Moreover, ACIs can benefit not only individuals but also others, for example by supporting the well-being of formal and informal carers, enriching our knowledge of health, and improving clinical skills [ 4 , 5 ].

The benefits of ACIs have also been acknowledged at a governmental level by those responsible for delivering health and care services: The UK All-Party Parliamentary Special Interest group on Arts, Health and Wellbeing produced a comprehensive review of creative intervention for health and wellbeing [ 45 ]. This report contained three key messages: that the arts can keep us well, aid recovery and support longer better lived lives; they can help meet major challenges facing health and social care; and that the arts can save money for the health service and social care.

Despite robust scientific evidence and governmental support, no systematic literature review has collated the evidence with respect to the value, cost or cost-effectiveness of such interventions. Our objective was to assess the economic impact of ACIs aimed at improving the health and wellbeing of older adults; to determine the range and quality of available studies; identify gaps in the evidence-base; and guide future research, practice and policy.

A protocol for this review was registered at PROSPERO, an international prospective register of systematic reviews (Registration ID CRD42021267944). We used pre-determined criteria for considering studies to include in the review, in terms of types of studies, participant and intervention characteristics.

The review followed the five-step approach on how to prepare a Systematic Review of Economic Evaluations (SR-EE) for informing evidence-based healthcare decisions [ 46 , 47 , 48 ]. Subsequent to developing and registering the protocol, the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) published a good practice task force report for the critical appraisal of systematic reviews with costs and cost-effectiveness outcomes (SR-CCEOs) [ 49 ]. This was also used to inform the conduct of this review.

Eligibility criteria

Full economic evaluations are regarded as the optimal type of evidence for inclusion in a SR-EE [ 46 ], hence cost-minimisation analyses (CMA), cost-effectiveness analyses (CEA), cost-utility analyses (CUA) and cost–benefit analyses (CBA) were included. Social value analyses were also included as they are frequently used to inform decision-making and commissioning of services within local government. Additionally, they represent an important intermediate stage in our understanding of the costs and consequences of public health interventions, where significant challenges exist with regard to performing full evaluations [ 50 , 51 , 52 , 53 ].

Development of search strategies

The population (P), intervention (I), comparator (C) and outcomes (O) (PICO) tool provided a framework for development of the search strategy. Studies were included if participants were aged 50 years or older (or if the average age of the study population was 50 years or over). Interventions could relate to performance art (dance, singing, theatre, drama etc.), creative and visual arts (painting, sculpture, art making and design), or creative writing (writing narratives, poetry, storytelling). The intervention had to be active (for example, creating art as opposed to viewing art; playing an instrument as opposed to listening to music). The objective of the intervention had to be to improve health and wellbeing; it had to be delivered under the guidance of a professional; delivered in a group setting and delivered on more than one occasion. No restrictions were placed on the type of comparator(s) or the type of outcomes captured in the study. We deliberately limited the study to professionally led activities to provide a sharper distinction between social events where arts and creativity may occur and arts and creativity interventions per se. We set no language restriction nor a restriction on the date from which studies were reported.
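The inclusion rules described above can be written down as a simple screening predicate. The following is a minimal sketch only: the `Study` record and its field names are hypothetical and are not taken from the review's data extraction proforma.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    min_age: Optional[float]     # youngest participant age, if reported
    mean_age: Optional[float]    # average participant age, if reported
    active_participation: bool   # e.g. creating art rather than viewing it
    aims_health_wellbeing: bool  # objective is to improve health and wellbeing
    professionally_led: bool     # delivered under the guidance of a professional
    group_setting: bool          # delivered in a group setting
    sessions: int                # number of occasions the intervention ran


def meets_age_criterion(study: Study, threshold: float = 50) -> bool:
    # Eligible if all participants are aged 50+ or, failing that,
    # the average age of the study population is 50+.
    if study.min_age is not None and study.min_age >= threshold:
        return True
    return study.mean_age is not None and study.mean_age >= threshold


def is_eligible(study: Study) -> bool:
    # Every criterion must hold; comparator and outcome type are unrestricted.
    return (
        meets_age_criterion(study)
        and study.active_participation
        and study.aims_health_wellbeing
        and study.professionally_led
        and study.group_setting
        and study.sessions > 1
    )
```

In practice such judgements were made by human reviewers reading each record; a predicate like this is only useful for documenting the rules unambiguously or for checking extracted data after the fact.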

Search methods

PRESS (Peer Review of Electronic Search Strategies) guidelines informed the design of our search strategy [ 54 , 55 ] and an information specialist adapted the search terms (outlined in Table S1) for the following electronic bibliographic databases: MEDLINE, PubMed, EMBASE, EconLit, Web of Science and NHSEED. We also inspected references of all relevant studies and searched trials registers (ClinicalTrials.gov). Search terms used included cost, return on investment, economic, arts, music, storytelling, dancing, writing and older adult as well as social return on investment (SROI). The last search was performed on 09/11/2022. As many economic evaluations of ACIs (especially SROIs) are commissioned by government bodies or charitable organisations, a search of the grey literature was undertaken.

Handling searches

A PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart was used to document study selection, illustrating the numbers of records retrieved and selection flow through the screening rounds [ 56 , 57 , 58 ]; all excluded records (with rationale for exclusion) were documented.

Selection of studies

Two screening rounds were conducted independently by two health economists experienced in undertaking reviews (GC, CO’N). The first round screened the title and abstract of articles based on the eligibility criteria; those selected at this stage entered a second round of full text screening with eligibility based on the inclusion and exclusion criteria. Any disagreements were discussed between the two reviewers, with access to a third reviewer available to resolve disagreements, though this proved unnecessary.

Data extraction and management

Two reviewers extracted relevant information independently using a proforma developed specifically for the purposes of this study, which included all 35 items suggested by Wijnen et al. (2016) [ 48 ]. Information was extracted in relation to the following factors: (1) general information including study title, author, year, funding source, country, setting and study design; (2) recruitment details, sample size, demographic characteristics (age, gender) and baseline health data (diagnosis, comorbidities); (3) interventions, effectiveness and cost data; (4) type of economic evaluation, perspective, payer, beneficiary, time horizon, measure of benefit and scale of intervention; (5) quality assessment, strength of evidence, any other important information; (6) results; (7) analysis of uncertainty and (8) conclusions. The quality assessment/risk of bias checklists were included in the data extraction proforma, and picklists were used to enhance uniformity of responses. The data extraction form was piloted by two reviewers (GC and CO’N) on one paper and discussion was used to ensure consistent application thereafter.

Assessment of study quality

Two reviewers (GC & CON) independently assessed study quality, with recourse to a third reviewer for resolution of differences though this proved unnecessary. Quality assessment was based on the type of economic evaluation undertaken. Full and partial trial-based economic evaluations were assessed using the CHEC-extended checklist [ 59 ]. SROI analyses were assessed using a SROI-specific quality framework developed for the purpose of systematic review [ 60 ].

Data analysis methods

Due to the small number of evaluations detected, possible sources of heterogeneity and a lack of consensus on appropriate methods for pooling cost-effectiveness estimates [ 61 ], a narrative synthesis was undertaken.

Database searches returned 11,619 records; from this, 402 duplicates were removed leaving 11,214 reports. From these, 113 reports were assessed against the inclusion and exclusion criteria, resulting in 4 studies for inclusion in the review. Over 40 websites were searched for relevant content, returning 2 further studies for inclusion. The PRISMA 2020 diagram is presented in Fig. 1. A high sensitivity search strategy was adopted to ensure all relevant studies were identified, resulting in a large number of studies being excluded at the first stage of screening.

Fig. 1. PRISMA 2020 flow diagram for new systematic reviews which include searches of databases, registers and other sources
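A flow diagram of this kind is a running subtraction: each stage's count is the previous count minus the records removed at that step. As a minimal sketch, the helper below tallies such a flow and rejects impossible steps. The function name, stage labels and counts are all hypothetical and are not this review's figures.

```python
def tally_flow(identified, exclusions):
    """Return the running count after each (label, removed) step, in order."""
    counts = [("identified", identified)]
    current = identified
    for label, removed in exclusions:
        # A step cannot remove a negative number of records,
        # nor more records than remain.
        if removed < 0 or removed > current:
            raise ValueError(f"invalid count at step '{label}'")
        current -= removed
        counts.append((label, current))
    return counts


# Made-up example: 1,000 records identified, then three exclusion steps.
flow = tally_flow(
    1000,
    [
        ("duplicates removed", 50),
        ("excluded at title/abstract", 900),
        ("excluded at full text", 45),
    ],
)
for label, remaining in flow:
    print(f"{label}: {remaining}")
```

Keeping the tally in one place makes it easier to confirm that the numbers reported in the text and in the diagram agree.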

A total of six studies were identified; key characteristics are presented in Table 1. Identified studies were published between 2011 and 2020. Two studies used a health technology assessment (HTA) framework alongside clinical trials [ 62 , 63 ] to assess the cost-effectiveness of community singing interventions. Both evaluations scored highly on the CHEC-extended checklist (Table 2), with findings reported in line with the CHEERS (Consolidated Health Economic Evaluation Reporting Standards) checklist 2022 [ 64 ].

Four further studies employed an SROI framework to assess art and/or craft interventions: two studies were published in the peer-reviewed literature [ 65 , 66 ] and a further two in the grey literature [ 67 , 68 ]. All four adhered closely to the suggested steps for performing an SROI and consequently secured high scores (Table 3 ). No quality differential was discerned between those studies published in the academic literature when compared with those from the grey literature.

Five of the studies were undertaken in the UK [ 62 , 65 , 66 , 67 , 68 ] and one in the US [ 63 ]. Four of the studies were designed for older adults with no cognitive impairment [ 62 , 63 , 67 , 68 ]; one was designed for participants with or without dementia [ 65 ], and another was specifically for older adults with dementia and their caregivers [ 66 ]. Three of the studies were delivered in a community setting [ 62 , 63 , 67 ], two in care homes [ 65 , 68 ] and one across a range of settings (hospital, community and residential) [ 66 ]. The length and duration of the ACIs varied; some lasted 1–2 h (with multiple classes available to participants) [ 65 ], whereas others were structured programmes with sessions lasting 90 min over a 14-week period [ 62 ]. The number of participants included in studies varied; the largest study contained data from 390 participants [ 63 ], whereas other studies measured engagement using numbers of care homes or housing associations included [ 67 , 68 ].

Costs were captured from a narrower perspective (i.e., the payer—health service) for those economic evaluations which followed a health technology assessment (HTA) framework [ 62 , 63 ]. Costs associated with providing the programme and health and social care utilisation costs were captured using cost diaries. Valuation of resource usage was in line with the reference case specified for each jurisdiction.

Social value analyses included in the review [ 65 , 66 , 67 , 68 ] captured a broader picture of cost. Programme provision costs included were similar in nature to those identified using an HTA framework; however, the benefits captured went beyond the individual to capture costs to a wide range of stakeholders such as family members, activity co-ordinators and care home personnel. Costs were apportioned using financial proxies from a range of sources including HACT Social Value Bank [ 69 ] and market-based valuation methods.

The range of outcomes captured and valued across HTAs and SROIs was extensive: including, but not limited to, wellbeing, quality of life, physical health, cognitive functioning, communication, control over daily life choices, engagement and empowerment, social isolation, mobility, community inclusion, depressive symptoms, sadness, anxiety, loneliness, positive affect and interest in daily life. In the programmes assessed using an HTA framework, outcomes were captured using standardised and validated instruments, for both control and intervention groups across multiple time points. Statistical methods were used to assess changes in outcomes over time. Programmes assessed using SROI relied primarily on qualitative methods (such as reflective diaries and in-depth interviews) combined with routinely collected administrative data.

The evidence from the singing interventions was encouraging but not conclusive. The ‘Silver Song Club’ programme [ 62 ] reported a 64% probability of being cost-effective at a willingness-to-pay threshold of £30,000. This study was also included in the Public Health England (PHE) decision tool to support local commissioners in designing and implementing services to support older people’s healthy ageing, reporting a positive societal return on investment [ 70 ]. Evidence from the ‘Community of Voices’ trial [ 63 ] suggested that although intervention group members experienced statistically significant improvements in loneliness and interest in life compared to control participants, no significant group differences were observed for cognitive or physical outcomes or for healthcare costs.
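A "probability of being cost-effective" at a given threshold is commonly obtained by computing the net monetary benefit for each of many simulated or bootstrapped pairs of incremental cost and incremental effect, then taking the share of pairs where it is positive. The sketch below illustrates that general technique only. It is not the cited study's analysis, the function name and the sample draws are hypothetical, and it assumes effects are expressed in the unit the threshold applies to (for example QALYs), which the text above does not state.

```python
def prob_cost_effective(delta_costs, delta_effects, threshold):
    """Share of paired draws with positive net monetary benefit.

    Net monetary benefit for one draw = threshold * delta_effect - delta_cost.
    """
    if len(delta_costs) != len(delta_effects) or not delta_costs:
        raise ValueError("need two equal-length, non-empty samples")
    positive = sum(
        1
        for d_cost, d_effect in zip(delta_costs, delta_effects)
        if threshold * d_effect - d_cost > 0
    )
    return positive / len(delta_costs)


# Four made-up draws: each costs 100 more than the comparator,
# with differing incremental effects.
example = prob_cost_effective(
    delta_costs=[100, 100, 100, 100],
    delta_effects=[0.01, 0.002, 0.005, -0.01],
    threshold=30_000,
)
print(example)  # two of the four draws have positive net benefit: 0.5
```

Repeating the calculation over a range of thresholds gives a cost-effectiveness acceptability curve, which shows how sensitive the conclusion is to the chosen threshold.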

A positive return on investment was reported by all social value analyses undertaken. The ‘Imagine Arts’ programme reported a positive SROI of £1.20 for every £1 of expenditure [ 65 ]. A higher yield of between £3.20 and £6.62 for each £1 invested was reported in the ‘Dementia and Imagination’ programme [ 66 ]. The ‘Craft Café’ programme reported an SROI of £8.27 per £1 invested [ 68 ], and the ‘Creative Caring’ programme predicted an SROI of between £3 and £4 for every £1 spent [ 67 ]. The time period over which return on investment was calculated differed for each evaluation, from less than one year to 4 years.
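An SROI figure of the form "£x for every £1" is a ratio of the monetised value of outcomes to the amount invested, with benefits arising in later years usually discounted to present value. The sketch below shows that arithmetic under stated assumptions: the function names and the benefit stream are hypothetical, the 3.5% default discount rate is an assumption rather than a figure reported by the included studies, and adjustments that SROI analyses typically apply before this step are omitted.

```python
def present_value(annual_values, discount_rate):
    """Discount a stream of yearly values (year 1, 2, ...) to today."""
    return sum(
        value / (1 + discount_rate) ** year
        for year, value in enumerate(annual_values, start=1)
    )


def sroi_ratio(annual_benefits, investment, discount_rate=0.035):
    """Value returned per unit of currency invested."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return present_value(annual_benefits, discount_rate) / investment


# Made-up programme: 1,000 invested, 500 of monetised benefit a year for 3 years.
ratio = sroi_ratio([500, 500, 500], investment=1000)
print(f"SROI: {ratio:.2f} per 1 invested")
```

Because the ratio depends on the time horizon and the discount rate as well as on which outcomes are monetised, ratios from evaluations with different horizons are not directly comparable.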

The primary finding from our review concerns the paucity of evidence relating to the value, cost and/or cost-effectiveness of ACIs aimed at improving health and wellbeing in this population. Despite few restrictions being applied to our search, only six studies were found which met our inclusion criteria. This is not indicative of the volume of research into ACIs in this population, as evidenced by the identification of ninety-three studies where arts and creativity interventions were found to support better health and wellbeing outcomes in another recent review [ 5 ]. An alternative explanation is that funders do not see the added value of undertaking such evaluations in this area. That is, for funders, the cost of evaluating an ACI is likely to be deemed unjustified given the relatively small welfare loss a misallocation of resources to ACIs might produce. While at first glance this may seem reasonable, it disadvantages ACIs in competing with other interventions for funding and arguably exposes an implicit prejudice in the treatment of interventions from which it may be difficult to extract profit in general. That is, the paucity of evidence may reflect inherent biases within our political economy that favour the generation of marketable solutions to health issues from which value can be appropriated as profit. Pharmaceuticals are an obvious example of such solutions, where the literature is replete with examples of evaluations sponsored by pharmaceutical companies or where public funds are used to test the claims made by pharmaceutical companies in respect of the value of their products. If the potential of ACIs to improve health and well-being is to be robustly established, ACIs must effectively compete for funding with other interventions including those from pharma. This requires a larger, more robust evidence base than is currently available and investment in the creation of such an evidence base. As there is currently no ‘for-profit’ industry to generate such an evidence base, public funding of evaluations will be central to its creation.

Our second finding concerns the values reported in the meagre evidence we did find. In five of the six studies we identified, evidence indicated that ACIs targeted at older people offered value for money [ 62 , 65 , 66 , 67 , 68 ]. One study provided mixed evidence [ 63 ]; however, this study adopted a ‘payer’ perspective when applying an HTA framework, which, by virtue of the perspective adopted, excluded a range of benefits attributable to ACIs and public health interventions more generally. Among the four studies that adopted an SROI approach, estimated returns per £1 invested ranged from £1.20 to £8.27. Given the evident heterogeneity among studies in terms of context and methods, care is warranted in comparing estimates with each other or with other SROIs. Care is also required in accepting at face value the estimates reported, given methodological issues that pertain to the current state of the art with respect to SROI. With these caveats noted, the values reported for ACIs using the SROI approach are comparable with those from SROI studies in other contexts, including those as diverse as a first aid intervention [ 71 ], investment in urban greenways [ 72 ] and the provision of refuge services to those experiencing domestic violence [ 73 ] (a return on investment of £3.50-£4, £2.88-£5.81 and £4.94 respectively). Similarly, with respect to the study that adopted a cost-effectiveness approach, Coulton and colleagues (2015) reported a 64% probability of the intervention being cost-effective at a threshold of £30,000 [ 62 ]. Again, it is difficult to compare studies directly, but this is similar to that reported for interventions as diverse as a falls prevention initiative [ 74 ] and the treatment of depression using a collaborative approach [ 75 ], both in the UK. Although the evidence base is meagre, there is, in other words, a prima facie case that ACIs are capable of offering value for money when targeted at older persons.
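
To make a statement such as "a 64% probability of being cost-effective at a threshold of £30,000" more concrete, the sketch below shows one common way such a probability can be derived: the incremental net monetary benefit (threshold × incremental effect − incremental cost) is computed for each resampled pair of incremental costs and effects, and the share of draws with a positive value is reported. This is an illustration only. The function name and the draws are hypothetical, the figures are not taken from Coulton and colleagues, and we do not claim this is the exact procedure used in that trial.

```python
def prob_cost_effective(delta_costs, delta_effects, threshold):
    """Share of paired draws whose incremental net monetary benefit is positive."""
    if len(delta_costs) != len(delta_effects) or not delta_costs:
        raise ValueError("need equal-length, non-empty lists of draws")
    positive = sum(
        1
        for d_cost, d_effect in zip(delta_costs, delta_effects)
        if threshold * d_effect - d_cost > 0
    )
    return positive / len(delta_costs)


# Hypothetical resampled draws (GBP and QALYs), invented for illustration.
costs = [20.0, 150.0, -10.0, 300.0, 90.0]
qalys = [0.010, 0.002, 0.004, 0.008, 0.001]
print(prob_cost_effective(costs, qalys, 30_000))
```

Repeating the calculation over a range of thresholds would trace out a cost-effectiveness acceptability curve, which is how such probabilities are often presented.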

Our third finding relates to the state of the art with respect to SROIs in this area. Over the past 40 years, considerable time, effort and resources have been expended in the development of cost-effectiveness techniques in health and social care. While considerable heterogeneity can exist around their conduct, national guidance exists in many jurisdictions on the conduct of cost-effectiveness analyses (CEA) – such as the NICE reference case in the UK [ 76 ] – as well as on the reporting of these, as set out in the CHEERS 2022 guidance [ 64 ]. This has helped raise the quality of published evaluations and the consistency with which they are reported. Despite the existence of a step-by-step guidance document on how to perform SROIs [ 77 ], which outlines how displacement effects, double counting, effect attribution and drop-off should be addressed, a significant body of work still remains to ensure that the methodology addresses a range of known biases in a robust manner. Where there is no comparator to the intervention being evaluated (as was the case in the SROIs reported here), it may be difficult to convince funders that the implicit incremental costs and benefits reported are indeed incremental and attributable to the intervention. Equally, where a comparator is present, greater consensus and standardisation are required regarding the identification, generation and application of, for example, financial proxies. Currently, SROI ratios combine value across a wide range of stakeholders, which is understandable if the objective is to capture all aspects of social benefit generated. This ratio, however, may not reflect the priorities and statutory responsibilities of healthcare funders. Whilst all of the aforementioned issues can be addressed, investment is required to develop the SROI methodology further to more closely meet the needs of commissioning bodies.
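
As a rough illustration of how adjustments such as displacement, attribution and drop-off feed into an SROI ratio, the sketch below discounts an adjusted stream of annual benefits and divides it by the investment. It is a simplified sketch under stated assumptions, not the method of any study reviewed here: the function name, the parameter values, the multiplicative form of the adjustments and the end-of-year discounting convention are all assumptions made for the example, and published SROI guidance includes further steps (for example, avoiding double counting) that are not modelled.

```python
def sroi_ratio(gross_annual_value, investment, years,
               displacement=0.0, attribution=0.0,
               drop_off=0.0, discount_rate=0.0):
    """Present value of adjusted benefits divided by the investment."""
    if investment <= 0 or years < 1:
        raise ValueError("investment must be positive and years at least 1")
    # Remove value displaced from elsewhere and value attributable to others.
    year_value = gross_annual_value * (1 - displacement) * (1 - attribution)
    present_value = 0.0
    for year in range(1, years + 1):
        # Assumption: benefits arrive at year end, so year 1 is discounted once.
        present_value += year_value / (1 + discount_rate) ** year
        year_value *= 1 - drop_off  # benefit fades in each later year
    return present_value / investment


# Hypothetical programme: GBP 10,000 invested, GBP 8,000 gross value per year.
ratio = sroi_ratio(8_000, 10_000, years=3, displacement=0.1,
                   attribution=0.25, drop_off=0.5, discount_rate=0.035)
print(f"SROI: GBP {ratio:.2f} per GBP 1 invested")
```

Because every adjustment is an analyst's estimate, small changes in these parameters can move the ratio substantially, which is one reason a credible comparator matters.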

Notwithstanding these challenges, social value analyses play a pivotal role within the procurement processes employed by government, local authorities and other non-departmental public bodies and should not be dismissed simply because the ‘burden of proof’ falls short of that required to secure remuneration within the health sector. Because most SROIs are published in the grey literature, they often avoid peer scrutiny prior to publication and the potential quality assurance this can offer. It is noteworthy, however, that two of the SROIs included in this review [ 65 , 66 ] were published in the academic literature, suggesting that the academic community is engaging with this method, which is to be applauded.

Moving forward, it is unlikely we will be able to meet all of the health and wellbeing needs of our ageing population solely in a primary or secondary care setting. New models of care are required, as are new models of funding to support interventions which can be delivered in non-healthcare settings. New hybrid models of evaluation will be required to provide robust economic evidence to assist in the allocation of scarce resources across health and non-healthcare settings; such evaluative frameworks must have robust theoretical underpinnings and be capable of delivering evidence from a non-clinical setting in a timely and cost-effective manner.

In the absence of a definitive evaluation framework for ACIs, we have a number of recommendations. First, and most importantly, all impact assessments should have a control group or credible counterfactual. This is currently not required when performing an SROI, making it difficult to determine whether all of the benefits ascribed to an intervention are in fact attributable to it. This recommendation is in line with the conclusion of a report by the London School of Economics [ 78 ] for the National Audit Office (NAO), which concluded that ‘any impact evaluation (and subsequent value for money calculation) requires construction of a counterfactual’. Second, a detailed technical appendix should accompany all impact assessments to allow independent review by a subject specialist. While this would assist peer review, it would also provide greater transparency where peer review was not undertaken prior to publication. Furthermore, it would enable recalculation of SROI ratios to exclude ‘value’ attributable to stakeholders which are not relevant to a particular funder. Third, equity considerations should be addressed explicitly in all evaluations (this is currently not required in HTAs). Fourth, both costs and outcomes should be captured from a ‘broad’ perspective (adopting a ‘narrow’ healthcare perspective may underestimate the full economic impact), with non-healthcare sector costs being detailed as part of the analysis. Finally, data should be collected post-implementation to ensure that resources continue to be allocated efficiently.
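
The second recommendation can be illustrated with a short sketch: if a technical appendix itemised the present value of benefits by stakeholder group, a funder could recompute the ratio using only the groups relevant to its remit. The stakeholder names, values and function name below are hypothetical and are not drawn from any of the included studies.

```python
def funder_ratio(value_by_stakeholder, investment, relevant):
    """SROI ratio counting only value accruing to the named stakeholder groups."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    unknown = set(relevant) - set(value_by_stakeholder)
    if unknown:
        raise KeyError(f"no value reported for: {sorted(unknown)}")
    return sum(value_by_stakeholder[name] for name in relevant) / investment


# Hypothetical present values (GBP) that a technical appendix might itemise.
values = {
    "participants": 30_000,
    "family carers": 12_000,
    "care home staff": 6_000,
    "health service": 9_000,
}
investment = 15_000

all_groups = funder_ratio(values, investment, list(values))
payer_only = funder_ratio(values, investment, ["health service"])
print(f"all stakeholders: {all_groups:.2f}, health service only: {payer_only:.2f}")
```

In this invented example the headline ratio and the payer-only ratio differ markedly, which is the kind of gap between total social value and a healthcare funder's perspective that the recommendation is intended to expose.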

As with any review, there are limitations which should be noted. A search of the grey literature was included as evaluations of applied public health interventions are not always reported in the academic literature. Systematically identifying grey literature and grey data can be problematic [ 79 , 80 , 81 , 82 , 83 ] as it is not collected, organised or stored in a consistent manner. Hence it is possible that we have not identified all relevant studies. Furthermore, as applied public health interventions can be performed in a non-healthcare setting we included SROIs in our review of economic evaluations. Current guidance on the systematic review of economic evaluations has been developed primarily for review of HTA as opposed to public health interventions and hence SROIs would be excluded, or if included would score poorly due to the inherent biases arising from no comparator or counterfactual being included.

This systematic review found that participation in group-based arts and creativity programmes was generally cost-effective and/or produced a positive return on investment whilst having a positive impact on older people’s physical, psychological, and social health and wellbeing outcomes. Unfortunately, the small number of studies identified, coupled with differences in the methods used to assess economic impact, hinders our ability to conclusively determine which types of art and creativity-based activities are more cost-effective or represent best value for money.

As well as the need for a greater focus on prevention of poor health as we age, new hybrid models of healthcare delivery are necessary to meet the needs of our ageing population. These models will integrate traditional medical care with other services such as home health aides (some of which may include artificial intelligence), telemedicine and social support networks. Alongside these, ACIs have the potential to provide a low cost, scalable, easily implementable and cost-effective solution to reduce the burden of illness in this age group and support healthy ageing.

Evidence on the cost-effectiveness of a range of ACIs is of utmost importance for policy and decision makers, as it can both inform the development of policies that support the provision of ACIs in the context of ageing and identify the most cost-effective approaches for delivering such interventions. The development of hybrid models of evaluation, capable of capturing cost-effectiveness and social value, is becoming increasingly necessary as healthcare delivery for this age group moves beyond the realms of primary and secondary care and into the community. The development and refinement of such models will ensure a more comprehensive and nuanced assessment of the impact of a diverse range of interventions. This will help inform decision making and ensure interventions are implemented in a cost-effective and socially beneficial manner.

Availability of data and materials

All data generated or analysed during this study are included in the published article and its supplementary information files.

References

United Nations, Department of Economic and Social Affairs, Population Division. World Population Prospects: The 2015 Revision, Key Findings and Advance Tables. Working Paper No. ESA/P/WP.241. 2015.

Office for National Statistics. Living longer: how our population is changing and why it matters. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/ageing/articles/livinglongerhowourpopulationischangingandwhyitmatters/2018-08-13#how-do-incomes-of-older-people-compare-with-younger-ages . 2018. Accessed 07/12/2022

Dyakova M, Hamelmann C, Bellis MA, Besnier E, Grey CNB, Ashton K, Schwappach A, Clar C. Investment for health and well-being: a review of the social return on investment from public health policies to support implementing the Sustainable Development Goals by building on Health 2020 [Internet]. Copenhagen: WHO Regional Office for Europe; 2017.

Fancourt D, Finn S. What is the evidence on the role of the arts in improving health and well-being? A scoping review. Copenhagen: WHO Regional Office for Europe; 2019.

McQuade L, O’Sullivan R. Examining arts and creativity in later life and its impact on older people’s health and wellbeing: a systematic review of the evidence. Perspect Publ Health. 2023;0(0). https://doi.org/10.1177/17579139231157533

Skingley A, De’Ath S, Napleton L. Evaluation of Edna: arts and dance for older people. Work Older People. 2016;20(1):46–56.

Brustio PR, Liubicich ME, Chiabrero M, et al. Dancing in the golden age: a study on physical function, quality of life, and social engagement. Geriatr Nurs. 2018;39(6):635–9.

Beauchet O, Bastien T, Mittelman M, et al. Participatory art-based activity, community-dwelling older adults and changes in health condition: results from a pre-post intervention, single-arm, prospective and longitudinal study. Maturitas. 2020;134:8–14.

Roswiyani R, Hiew CH, Witteman CLM, et al. Art activities and qigong exercise for the well-being of older adults in nursing homes in Indonesia: a randomized controlled trial. Aging Ment Health. 2020;24(10):1569–78.

Shanahan J, Bhriain ON, Morris ME, et al. Irish set dancing classes for people with Parkinson’s disease: the needs of participants and dance teachers. Complement Ther Med. 2016;27:12–7.

Garcia Gouvêa JA, Antunes MD, Bortolozzi F, et al. Impact of senior dance on emotional and motor parameters and quality of life of the elderly. Rev Rene. 2017;18(1):51–8.

Sun J, Zhang N, Buys N, et al. The role of Tai Chi, cultural dancing, playing a musical instrument and singing in the prevention of chronic disease in Chinese older adults: a mind–body meditative approach. Int J Ment Health Pr. 2013;15:227–39.

Fu MC, Belza B, Nguyen H, et al. Impact of group-singing on older adult health in senior living communities: a pilot study. Arch Gerontol Geriatr. 2018;76:138–46.

Feng L, Romero-Garcia R, Suckling J, et al. Effects of choral singing versus health education on cognitive decline and aging: a randomized controlled trial. Aging-us. 2020;12(24):24798–816.

Seinfeld S, Figueroa H, Ortiz-Gil J, et al. Effects of music learning and piano practice on cognitive function, mood and quality of life in older adults. Front Psychol. 2013;4:810.

MacRitchie J, Breaden M, Milne AJ, et al. Cognitive, motor and social factors of music instrument training programs for older adults’ improved wellbeing. Front Psychol. 2020;10:2868.

Freeman WJI. A neurobiological role of music in social bonding. In: Wallin N, Merker B, Brown S, editors. The origins of music. Cambridge: MIT Press; 2000. http://escholarship.org/uc/item/9025x8rt .

Huron D. Is music an evolutionary adaptation? Ann N Y Acad Sci. 2001;930(1):43–61. https://doi.org/10.1111/j.1749-6632.2001.tb05724.x .

Tarr B, Launay J, Dunbar RIM. Music and social bonding: “self–other” merging and neurohormonal mechanisms. Front Psychol. 2014;5:1096. https://doi.org/10.3389/fpsyg.2014.01096 .

Cain M, Lakhani A, Istvandity L. Short and long term outcomes for culturally and linguistically diverse (cald) and at-risk communities in participatory music programs: a systematic review. Arts Health. 2016;8(2):105–24. https://doi.org/10.1080/17533015.2015.1027934 .

Martin L, Oepen R, Bauer K, Nottensteiner A, Mergheim K, Gruber H, et al. Creative arts interventions for stress management and prevention – a systematic review. Behav Sci (Basel). 2018;8(2):pii:E28. https://doi.org/10.3390/bs8020028 .

Linnemann A, Wenzel M, Grammes J, Kubiak T, Nater UM. Music listening and stress in daily life: a matter of timing. Int J Behav Med. 2018;25(2):223–30. https://doi.org/10.1007/s12529-017-9697-5 .

Linnemann A, Strahler J, Nater UM. The stress-reducing effect of music listening varies depending on the social context. Psychoneuroendocrinology. 2016;72:97–105. https://doi.org/10.1016/j.psyneuen.2016.06.003 .

Panteleeva Y, Ceschi G, Glowinski D, Courvoisier DS, Grandjean DM. Music for anxiety? meta-analysis of anxiety reduction in non-clinical samples. Psychol Music. 2017;46(4):473–87. https://doi.org/10.1177/0305735617712424 .

Fancourt D, Tymoszuk U. Cultural engagement and incident depression in older adults: evidence from the English longitudinal study of ageing. Br J Psychiatry. 2018;214(4):225–9. https://doi.org/10.1192/bjp.2018.267 .

Balbag MA, Pedersen NL, Gatz M. Playing a musical instrument as a protective factor against dementia and cognitive impairment: a population-based twin study. Int J Alzheimer’s Dis. 2014;2014:836748. https://doi.org/10.1155/2014/836748 .

Porat S, Goukasian N, Hwang KS, Zanto T, Do T, Pierce J, et al. Dance experience and associations with cortical gray matter thickness in the aging population. Dement Geriatr Cogn Dis Extra. 2016;6(3):508–17. https://doi.org/10.1159/000449130 .

Federici A, Bellagamba S, Rocchi MBL. Does dance-based training improve balance in adult and young old subjects? a pilot randomized controlled trial. Aging Clin Exp Res. 2005;17(5):385–9 PMID: 16392413.

Alpert PT, Miller SK, Wallmann H, Havey R, Cross C, Chevalia T, et al. The effect of modified jazz dance on balance, cognition, and mood in older adults. J Am Acad Nurse Pract. 2009;21(2):108–15. https://doi.org/10.1111/j.1745-7599.2008.00392.x .

Jeon MY, Bark ES, Lee EG, Im JS, Jeong BS, Choe ES. The effects of a Korean traditional dance movement program in elderly women. Taehan Kanho Hakhoe Chi. 2005;35(7):1268–76 (in Korean). PMID: 16418553.

Eyigor S, Karapolat H, Durmaz B, Ibisoglu U, Cakir S. A randomized controlled trial of Turkish folklore dance on the physical performance, balance, depression and quality of life in older women. Arch Gerontol Geriatr. 2009;48(1):84–8. https://doi.org/10.1016/j.archger.2007.10.008 .

Noopud P, Suputtitada A, Khongprasert S, Kanungsukkasem V. Effects of Thai traditional dance on balance performance in daily life among older women. Aging Clin Exp Res. 2018;31(7):961–7. https://doi.org/10.1007/s40520-018-1040-8 .

Trombetti A, Hars M, Herrmann FR, Kressig RW, Ferrari S, Rizzoli R. Effect of music-based multitask training on gait, balance, and fall risk in elderly people: a randomized controlled trial. Arch Intern Med. 2011;171(6):525–33. https://doi.org/10.1001/archinternmed.2010.446 .

Hyyppä MT, Mäki J, Impivaara O, Aromaa A. Individual-level measures of social capital as predictors of all-cause and cardiovascular mortality: a population-based prospective study of men and women in Finland. Eur J Epidemiol. 2007;22(9):589–97. https://doi.org/10.1007/s10654-007-9153-y .

Hyyppä MT, Mäki J, Impivaara O, Aromaa A. Leisure participation predicts survival: a population-based study in Finland. Health Promot Int. 2006;21(1):5–12. https://doi.org/10.1093/heapro/dai027 .

Lennartsson C, Silverstein M. Does engagement with life enhance survival of elderly people in Sweden? the role of social and leisure activities. J Gerontol B Psychol Sci Soc Sci. 2001;56(6):S335–42. https://doi.org/10.1093/geronb/56.6.s335 .

Sundquist K, Lindström M, Malmström M, Johansson SE, Sundquist J. Social participation and coronary heart disease: a follow-up study of 6900 women and men in Sweden. Soc Sci Med. 2004;58(3):615–22. https://doi.org/10.1016/s0277-9536(03)00229-6 .

Väänänen A, Murray M, Koskinen A, Vahtera J, Kouvonen A, Kivimäki M. Engagement in cultural activities and cause-specific mortality: prospective cohort study. Prev Med. 2009;49(2–3):142–7. https://doi.org/10.1016/j.ypmed.2009.06.026 .

Särkämö T, Soto D. Music listening after stroke: beneficial effects and potential neural mechanisms. Ann N Y Acad Sci. 2012;1252(1):266–81. https://doi.org/10.1111/j.1749-6632.2011.06405.x .

Särkämö T, Pihko E, Laitinen S, Forsblom A, Soinila S, Mikkonen M, et al. Music and speech listening enhance the recovery of early sensory processing after stroke. J Cogn Neurosci. 2010;22(12):2716–27. https://doi.org/10.1162/jocn.2009.21376 .

Särkämö T, Ripollés P, Vepsäläinen H, Autti T, Silvenno HM, Salli E, et al. Structural changes induced by daily music listening in the recovering brain after middle cerebral artery stroke: a voxel-based morphometry study. Front Hum Neurosci. 2014;8:245. https://doi.org/10.3389/fnhum.2014.00245 .

Särkämö T, Tervaniemi M, Laitinen S, Forsblom A, Soinila S, Mikkonen M, et al. Music listening enhances cognitive recovery and mood after middle cerebral artery stroke. Brain. 2008;131(3):866–76. https://doi.org/10.1093/brain/awn013 .

Fancourt D, Steptoe A, Cadar D. Cultural engagement and cognitive reserve: museum attendance and dementia incidence over a 10-year period. Br J Psychiatry. 2018;213(5):661–3. https://doi.org/10.1192/bjp.2018.129 .

Fancourt D, Steptoe A, Cadar D. Cultural engagement predicts changes in cognitive function in older adults over a 10 year period: findings from the English longitudinal study of ageing. Sci Rep. 2018;8(1):10226.

All Party Parliamentary group on arts, health and wellbeing. Creative health: the arts for health and wellbeing. 2017.

van Mastrigt GA, Hiligsmann M, Arts JJ, Broos PH, Kleijnen J, Evers SM, Majoie MH. How to prepare a systematic review of economic evaluations for informing evidence-based healthcare decisions: a five-step approach (part 1/3). Expert Rev Pharmacoecon Outcomes Res. 2016;16(6):689–704. https://doi.org/10.1080/14737167.2016.1246960 . Epub 2016 Nov 2 PMID: 27805469.

Thielen FW, Van Mastrigt G, Burgers LT, Bramer WM, Majoie H, Evers S, Kleijnen J. How to prepare a systematic review of economic evaluations for clinical practice guidelines: database selection and search strategy development (part 2/3). Expert Rev Pharmacoecon Outcomes Res. 2016;16(6):705–21. https://doi.org/10.1080/14737167.2016.1246962 . Epub 2016 Nov 2 PMID: 27805466.

Wijnen B, Van Mastrigt G, Redekop WK, Majoie H, De Kinderen R, Evers S. How to prepare a systematic review of economic evaluations for informing evidence-based healthcare decisions: data extraction, risk of bias, and transferability (part 3/3). Expert Rev Pharmacoecon Outcomes Res. 2016;16(6):723–32. https://doi.org/10.1080/14737167.2016.1246961 . Epub 2016 Oct 21 PMID: 27762640.

Mandrik OL, Severens JLH, Bardach A, Ghabri S, Hamel C, Mathes T, Vale L, Wisløff T, Goldhaber-Fiebert JD. Critical appraisal of systematic reviews with costs and cost-effectiveness outcomes: an ISPOR good practices task force report. Value Health. 2021;24(4):463–72. https://doi.org/10.1016/j.jval.2021.01.002 . PMID: 33840423.

Kelly MP, McDaid D, Ludbrook A, Powell J: Economic appraisal of public health interventions. http://www.cawt.com/Site/11/Documents/Publications/Population%20Health/Economics%20of%20Health%20Improvement/Economic_appraisal_of_public_health_interventions.pdf

Weatherly H, Drummond M, Claxton K, Cookson R, Ferguson B, Godfrey C, Rice N, Sculpher M, Sowden A. Methods for assessing the cost-effectiveness of public health interventions: key challenges and recommendations. Health Policy. 2009;93(2–3):85–92. https://doi.org/10.1016/j.healthpol.2009.07.012 . Epub 2009 Aug 25 PMID: 19709773.

Payne K, McAllister M, Davies LM. Valuing the economic benefits of complex interventions: when maximising health is not sufficient. Health Econ. 2012. https://doi.org/10.1002/hec.2795 .

Edwards RT, Charles JM, Lloyd-Williams H. Public health economics: a systematic review of guidance for the economic evaluation of public health interventions and discussion of key methodological issues. BMC Public Health. 2013;24(13):1001. https://doi.org/10.1186/1471-2458-13-1001 . PMID: 24153037; PMCID: PMC4015185.

Rethlefsen ML, Farrell AM, Osterhaus Trzasko LC, Brigham TJ. Librarian co-authors correlated with higher quality reported search strategies in general internal medicine systematic reviews. J Clin Epidemiol. 2015;68(6):617–26. https://doi.org/10.1016/j.jclinepi.2014.11.025 . Epub 2015 Feb 7 PMID: 25766056.

McGowan J, Sampson M, Salzwedel DM, Cogo E, Foerster V, Lefebvre C. PRESS peer review of electronic search strategies: 2015 guideline statement. J Clin Epidemiol. 2016;75:40–6. https://doi.org/10.1016/j.jclinepi.2016.01.021 . Epub 2016 Mar 19 PMID: 27005575.

Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA Statement. PLoS Med. 2009;6(7):e1000097. https://doi.org/10.1371/journal.pmed.1000097 .

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;21(339):b2700. https://doi.org/10.1136/bmj.b2700 . PMID: 19622552; PMCID: PMC2714672.

Moher D, Liberati A, Tetzlaff J, Altman DG. PRISMA Group. PLoS Med. 2009;6(7):e1000097 Evers S, Goossens M, De Vet H, et al. Criteria list for assessment of methodological quality of economic evaluations: consensus on health economic criteria. Int J Technol Assess Health Care. 2005;21(02):240–245.

Evers S, Goossens M, de Vet H, van Tulder M, Ament A. Criteria list for assessment of methodological quality of economic evaluations: Consensus on Health Economic Criteria. Int J Technol Assess Health Care. 2005;21(2):240–5 PMID: 15921065.

Hutchinson CL, Berndt A, Gilbert-Hunt S, George S, Ratcliffe J. Valuing the impact of health and social care programmes using social return on investment analysis: how have academics advanced the methodology? A protocol for a systematic review of peer-reviewed literature. BMJ Open. 2018;8(12):e022534. https://doi.org/10.1136/bmjopen-2018-022534 . PMID:30530579;PMCID:PMC6303612.

Higgins J, Green S. Cochrane handbook for systematic reviews of interventions version 5.1.0. Chichester: The Cochrane Collaboration; 2013.

Coulton S, Clift S, Skingley A, Rodriguez J. Effectiveness and cost-effectiveness of community singing on mental health-related quality of life of older people: randomised controlled trial. Br J Psychiatry. 2015;207(3):250–5. https://doi.org/10.1192/bjp.bp.113.129908 . Epub 2015 Jun 18 PMID: 26089304.

Johnson JK, Stewart AL, Acree M, Nápoles AM, Flatt JD, Max WB, Gregorich SE. A community choir intervention to promote well-being among diverse older adults: results from the community of voices trial. J Gerontol B Psychol Sci Soc Sci. 2020;75(3):549–59. https://doi.org/10.1093/geronb/gby132 . PMID:30412233;PMCID:PMC7328053.

Husereau D, Drummond M, Augustovski F, de Bekker-Grob E, Briggs AH, Carswell C, Caulley L, Chaiyakunapruk N, Greenberg D, Loder E, Mauskopf J, Mullins CD, Petrou S, Pwu RF, Staniszewska S, CHEERS 2022 ISPOR Good Research Practices Task Force. Consolidated Health Economic Evaluation Reporting Standards 2022 (CHEERS 2022) statement: updated reporting guidance for health economic evaluations. Value Health. 2022;25(1):3–9. https://doi.org/10.1016/j.jval.2021.11.1351 . PMID: 35031096.

Bosco A, Schneider J, Broome E. The social value of the arts for care home residents in England: a social return on investment (SROI) analysis of the imagine arts programme. Maturitas. 2019;124:15–24. https://doi.org/10.1016/j.maturitas.2019.02.005 . Epub 2019 Mar 13 PMID: 31097173.

Jones C, Windle G, Edwards RT. Dementia and imagination: a social return on investment analysis framework for art activities for people living with dementia. Gerontologist. 2020;60(1):112–23. https://doi.org/10.1093/geront/gny147 . PMID: 30476114.

Social Value Lab and Impact Arts Craft Café: creative solutions to isolation and loneliness; Social return on investment. 2011. http://www.socialvaluelab.org.uk/wp-content/uploads/2013/05/CraftCafeSROI.pdf

MB associates. Make my day: the impact of Creative Caring in older people’s care homes. 2013. https://www.suffolkartlink.org.uk/wp-content/uploads/2014/10/CreativeCarersSROIReport_Nov2013.pdf

HACT. n.d. UK Social Value Bank. Retrieved December 11, 2023. from https://hact.org.uk/tools-and-services/uk-social-value-bank/ .

The Older Adults’ NHS and social care return on investment tool. Project report. Public health England. December 2019. Last accessed 27/03/2023.

British Red Cross – Valuing First Aid Education. 2018. https://socialvalueuk.org/wp-content/uploads/2018/12/Valuing-First-Aid-Education-Social-Return-on-Investment-Report-on-the-value-of-First-Aid-Education-Assured-Report.pdf . Accessed 17/02/2023

Hunter R, Dallat M, Tully M, O’Neill C, Heron L, Kee F. Social return on investment analysis of an urban greenway. Cities and Health. 2020. https://doi.org/10.1080/23748834.2020.1766783 .

NEF Consulting. Refuge: A social return on investment evaluation. 2016. https://socialvalueuk.org/wp-content/uploads/2017/04/Refuge-SROI-2016.pdf Accessed 17/02/2022

Corbacho B, Cockayne S, Fairhurst C, Hewitt CE, Hicks K, Kenan AM, Lamb SE, MacIntosh C, Menz HB, Redmond AC, Rodgers S, Scantlebury A, Watson J, Torgerson DJ, on behalf of the REFORM study. Cost-Effectiveness of a Multifaceted Podiatry Intervention for the Prevention of Falls in Older People: The REducing Falls with Orthoses and a Multifaceted Podiatry Intervention Trial Findings. Gerontology. 2018;64(5):503–12. https://doi.org/10.1159/000489171 . Epub 2018 Jun 26 PMID: 29945150.

Green C, Richards DA, Hill JJ, Gask L, Lovell K, Chew-Graham C, Bower P, Cape J, Pilling S, Araya R, Kessler D, Bland JM, Gilbody S, Lewis G, Manning C, Hughes-Morley A, Barkham M. Cost-effectiveness of collaborative care for depression in UK primary care: economic evaluation of a randomised controlled trial (CADET). PLoS ONE. 2014;9(8):e104225. https://doi.org/10.1371/journal.pone.0104225 . PMID: 25121991; PMCID: PMC4133193.

National Institute for Health and Care Excellence (NICE). NICE health technology evaluations: the manual. 2022. Retrieved 27 March, 2023 from  https://www.nice.org.uk/process/pmg36/chapter/introduction-to-health-technology-evaluation

NEF Consulting. SSE – Beatrice SROI framework – guidance document. https://www.sse.com/media/svnn5jpk/sroi-methodology-guidance-nef-consulting.pdf . Accessed 17/02/2022

Gibbons S, McNally S, Overman H. Review of Government Evaluations: A report for the NAO. London: National Audit Office; 2013.

Turner AM, Liddy ED, Bradley J, Wheatley JA. Modeling public health interventions for improved access to the gray literature. J Med Libr Assoc. 2005;93(4):487–94 PMID: 16239945; PMCID: PMC1250325.

Benzies KM, Premji S, Hayden KA, Serrett K. State-of-the-evidence reviews: advantages and challenges of including grey literature. Worldviews Evid Based Nurs. 2006;3(2):55–61. https://doi.org/10.1111/j.1741-6787.2006.00051.x . PMID: 17040510.

Franks H, Hardiker NR, McGrath M, McQuarrie C. Public health interventions and behaviour change: reviewing the grey literature. Public Health. 2012;126(1):12–7. https://doi.org/10.1016/j.puhe.2011.09.023 . Epub 2011 Nov 29 PMID: 22130477.

Mahood Q, Van Eerd D, Irvin E. Searching for grey literature for systematic reviews: challenges and benefits. Res Synth Methods. 2014;5(3):221–34. https://doi.org/10.1002/jrsm.1106 . Epub 2013 Dec 6 PMID: 26052848.

Godin K, Stapleton J, Kirkpatrick SI, Hanning RM, Leatherdale ST. Applying systematic review search methods to the grey literature: a case study examining guidelines for school-based breakfast programs in Canada. Syst Rev. 2015;22(4):138. https://doi.org/10.1186/s13643-015-0125-0 . PMID:26494010;PMCID:PMC4619264.

Acknowledgements

We would like to thank Ms. Louise Bradley (Information Resource Officer, Institute of Public Health) for her assistance in refining search strategies and literature search.

This study was supported by the Institute of Public Health (IPH), 200 South Circular Road, Dublin 8, Ireland, D08 NH90. This study was a collaboration between two health economists (GC, CO’N) and two members of staff from the funding organisation (LM, RO’S). Input from IPH staff was fundamental in defining the scope of work and research question, refining search terms and review and editing of the manuscript. Staff from IPH were not involved in quality assurance or review of papers included in the manuscript.

Author information

Authors and affiliations.

Clinical Costing Solutions, Belfast, BT15 4EB, UK

Grainne Crealey

Institute of Public Health, 200 South Circular Road, Dublin 8, D08 NH90, Ireland

Laura McQuade & Roger O’Sullivan

Centre for Public Health, Institute of Clinical Sciences, Royal Victoria Hospital, Belfast, BT12 6BA, UK

Ciaran O’Neill

Contributions

LMcQ and ROS were involved in defining the scope of work, refining the research question, provision of subject specific (public health) context, review of search strategy, review & editing of manuscript. CON and GC were involved in refining the research question and search strategy, provision of health economics and systematic reviewing expertise, review of returned reports, original draft preparation, review, editing and submission of manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ciaran O’Neill.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. Search strategy for electronic databases and grey literature.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Crealey, G., McQuade, L., O’Sullivan, R. et al. Arts and creativity interventions for improving health and wellbeing in older adults: a systematic literature review of economic evaluation studies. BMC Public Health 23, 2496 (2023). https://doi.org/10.1186/s12889-023-17369-x

Received: 23 April 2023

Accepted: 28 November 2023

Published: 13 December 2023

DOI: https://doi.org/10.1186/s12889-023-17369-x

  • Economic evaluation
  • Older adults

BMC Public Health

ISSN: 1471-2458

is a systematic literature review empirical research

COMMENTS

  1. Guidance on Conducting a Systematic Literature Review

    This article is organized as follows: The next section presents the methodology adopted by this research, followed by a section that discusses the typology of literature reviews and provides empirical examples; the subsequent section summarizes the process of literature review; and the last section concludes the paper with suggestions on how to improve the quality and rigor of literature ...

  2. Introduction to systematic review and meta-analysis

    A systematic review attempts to gather all available empirical research by using clearly defined, systematic methods to obtain answers to a specific question. ... When performing a systematic literature review or meta-analysis, if the quality of studies is not properly evaluated or if proper methodology is not strictly applied, the results can ...

  3. Systematic Review

    A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer. Example: Systematic review. In 2008, Dr. Robert Boyle and his colleagues published a systematic review in ...

  4. How-to conduct a systematic literature review: A quick guide for

    Method details Overview. A Systematic Literature Review (SLR) is a research methodology to collect, identify, and critically analyze the available research studies (e.g., articles, conference proceedings, books, dissertations) through a systematic procedure [12]. An SLR updates the reader with current literature about a subject [6]. The goal is to review critical points of current knowledge on a ...

  5. PDF Systematic Literature Reviews: an Introduction

    Systematic literature reviews (SRs) are a way of synthesising scientific evidence to answer a particular ... SRs treat the literature review process like a scientific process, and apply concepts of empirical research in order to make the review process more transparent and replicable and to reduce the possibility of bias. SRs have become a key ...

  6. Systematic reviews: Structure, form and content

    Topic selection and planning. In recent years, there has been an explosion in the number of systematic reviews conducted and published (Chalmers & Fox 2016, Fontelo & Liu 2018, Page et al 2015) - although a systematic review may be an inappropriate or unnecessary research methodology for answering many research questions. Systematic reviews can be inadvisable for a variety of reasons.

  7. Literature review as a research methodology: An ...

    A systematic review can be explained as a research method and process for identifying and critically appraising relevant research, as well as for collecting and analyzing data from said research (Liberati et al., 2009). The aim of a systematic review is to identify all empirical evidence that fits the pre-specified inclusion criteria to answer ...

  8. Distinguishing Between Integrative and Systematic Literature Reviews

    Systematic literature reviews are evidence-synthesizing, reproducible, and transparent literature, often referred to as the "gold standard" among literature reviews. 2 A systematic literature review aims to identify all empirical evidence focused on a research question in a specific context, with an explicit method to identify, appraise, select, and synthesize high-quality research ...

  9. Systematic Reviews & Evidence Synthesis Methods

    A systematic review gathers, assesses, and synthesizes all available empirical research on a specific question using a comprehensive search method with an aim to minimize bias. ... Any literature review is a type of evidence synthesis. For the various types of evidence syntheses/literature reviews, see the page on this guide Types of Reviews.

  10. (PDF) Systematic Literature Reviews: An Introduction

    Systematic literature reviews (SRs) are a way of synthesising scientific evidence to answer a particular research question in a way that is transparent and reproducible, while seeking to include ...

  11. Systematic Reviews and Meta-analysis: Understanding the Best Evidence

    A systematic review is a summary of the medical literature that uses explicit and reproducible methods to systematically search, critically appraise, and synthesize on a specific issue. ... The research question for the systematic reviews may be related to a major public health problem or a controversial clinical situation which requires ...

  12. 1.2.2 What is a systematic review?

    A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made (Antman ...

  13. What is a Systematic Review (SR)?

    Systematic Approaches to a Successful Literature Review by Andrew Booth; Anthea Sutton; Diana Papaioannou Showing you how to take a structured and organized approach to a wide range of literature review types, this book helps you to choose which approach is right for your research. Packed with constructive tools, examples, case studies and ...

  14. Writing the literature review for empirical papers

    Empirical papers usually are structured in at least five sections: (1) introduction, (2) literature review, (3) empirical methods, (4) data analysis, discussion and findings, and (5 ...

  15. A systematic literature review of empirical research on quality

    1. Start set I. We defined Start set I for our systematic literature review by using a systematic mapping study on empirical evidence for requirements engineering in general [] from 2018 and a systematic literature review from 2010 [] with similar research questions as in our paper. The systematic literature review from 2010 by Berntsson Svensson et al. includes 18 primary studies [].

  16. Are meta-analysis and systematic reviews theoretical or empirical research?

    Dear Francesco, I assume that you wish to publish a systematic review (performed by you) on a certain topic and possibly a meta-analysis of the evidence collected through this systematic review ...

  17. A Systematic Literature Review of Empirical Research on the Impacts of

    This systematic literature review examines 60 empirical studies on the impacts of e-Government published in the leading public administration and information systems journals. The impacts are classified using public value theory, first, by the role for whom value is generated and, second, by the nature of the impact.

  18. PDF A systematic literature review of empirical research on quality

    Berntsson Svensson et al. performed a systematic literature review on empirical studies on managing quality requirements in 2010 [15]. They identified 18 primary studies. They classified 12 out of the 18 primary studies as case studies, three as experiments, two as surveys, and one as a mix of survey and experiment.

  19. A Systematic Literature Review of Empirical Research on Epistemic

    This article offers a comprehensive systematic review of ENA educational applications in empirical studies (n = 76) published between 2010 and 2021. We review the ENA methods that research has relied on, the use of educational theories, their method of application, comparisons across groups and the main findings.

  20. Writing Motivation in School: a Systematic Review of Empirical Research

    Motivation is a catalyst of writing performance in school. In this article, we report a systematic review of empirical studies on writing motivation conducted in school settings, published between 2000 and 2018 in peer-reviewed journals. We aimed to (1) examine how motivational constructs have been defined in writing research; (2) analyze group differences in writing motivation; (3) unveil ...

  21. An overview of methodological approaches in systematic reviews

    1. INTRODUCTION. Evidence synthesis is a prerequisite for knowledge translation. 1 A well conducted systematic review (SR), often in conjunction with meta‐analyses (MA) when appropriate, is considered the "gold standard" of methods for synthesizing evidence related to a topic of interest. 2 The central strength of an SR is the transparency of the methods used to systematically search ...

  22. Full article: Early childhood pre-service teachers' preparation for

    This systematic literature review examined how teacher education prepared early childhood pre-service teachers to utilise digital technology with children. ... "A Systematic Literature Review of Empirical Research on Technology Education in Early Childhood Education." International Journal of Technology & Design Education 33 : 793-818 ...

  23. Validity and reliability of outcome measures to assess dysfunctional

    Methods Studies on developing and evaluating measurement properties to assess DB were included. The study investigated the empirical research published between 1990 and February 2022, with an updated search in May 2023 in the Cochrane Library database of systematic reviews and the Cochrane Central Register of Controlled Trials, the Ovid Medline (full), the Ovid Excerpta Medica Database, the ...

  24. Across the Great Divide: A Systematic Literature Review to Address the

    The use of a systematic literature review method complemented by a narrative analysis provided the tools to identify information scattered across different fields of study and analyze their content. Systematic reviews can be an effective tool for guiding transdisciplinary research which is required to achieve the objective of this study ...

  25. Business Simulation Games in Higher Education: A Systematic Review of

    There is a need to understand the current state of research and future research opportunities; however, there is a lack of recent systematic literature reviews in BSG literature. This study addresses this gap by systematically compiling online empirical research from January 2015 to April 2022.

  26. Mapping the Landscape: A Systematic Literature Review And ...

    Keywords: e-Government, e-Reporting, Public Reporting, Systematic Literature Review, bibliometric analysis, Scopus Database. Suggested Citation: Ahmed, Mahfooz, Mapping the Landscape: A Systematic Literature Review And Bibliometric Analysis of Public to Government E-Reporting Systems Research.

  27. Open government data: A systematic literature review of empirical research

    Open government data (OGD) holds great potential for firms and the digital economy as a whole and has attracted increasing interest in research and practice in recent years. Governments and ...

  28. Correlates of teachers' classroom management self-efficacy: A

    This meta-analysis examined literature from the last two decades to identify factors that correlate with teachers' classroom management self-efficacy (CMSE) and to estimate the effect size of these relationships. Online and reference list searches from international and Chinese databases yielded 1085 unique results. However, with a focus on empirical research the final sample consisted of 87 ...

  29. Arts and creativity interventions for improving health and wellbeing in

    Background As the population ages, older people account for a larger proportion of the health and social care budget. A significant body of evidence suggests that arts and creativity interventions can improve the physical, mental and social wellbeing of older adults, however the value and/or cost-effectiveness of such interventions remains unclear. Methods We systematically reviewed the ...