
Evidence-Based Research: Levels of Evidence Pyramid

Introduction

One way to organize the different types of evidence involved in evidence-based practice research is the levels of evidence pyramid. The pyramid includes a variety of evidence types and levels.

  • systematic reviews
  • critically-appraised topics
  • critically-appraised individual articles
  • randomized controlled trials
  • cohort studies
  • case-control studies, case series, and case reports
  • background information, expert opinion

Levels of evidence pyramid

The levels of evidence pyramid provides a way to visualize both the quality of evidence and the amount of evidence available. For example, systematic reviews are at the top of the pyramid, meaning they are both the highest level of evidence and the least common. As you go down the pyramid, the amount of evidence will increase as the quality of the evidence decreases.

Levels of Evidence Pyramid

Text alternative for Levels of Evidence Pyramid diagram

EBM Pyramid and EBM Page Generator, copyright 2006 Trustees of Dartmouth College and Yale University. All Rights Reserved. Produced by Jan Glover, David Izzo, Karen Odato and Lei Wang.
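For readers who think in code, here is a minimal sketch (not part of the original guide) that encodes the pyramid as an ordered ranking so two study designs can be compared; the numeric levels are arbitrary labels chosen for this illustration.

```python
# Sketch only: the levels of evidence pyramid as an ordered ranking (1 = top).
# The numeric levels are illustrative labels, not part of the original guide.
EVIDENCE_LEVELS = {
    "systematic review": 1,
    "critically-appraised topic": 2,
    "critically-appraised individual article": 3,
    "randomized controlled trial": 4,
    "cohort study": 5,
    "case-control study": 6,
    "case series": 6,
    "case report": 6,
    "background information": 7,
    "expert opinion": 7,
}

def stronger_evidence(study_type_a: str, study_type_b: str) -> str:
    """Return whichever of two study types sits higher on the pyramid."""
    if EVIDENCE_LEVELS[study_type_a.lower()] <= EVIDENCE_LEVELS[study_type_b.lower()]:
        return study_type_a
    return study_type_b

# Example: a systematic review outranks a cohort study.
print(stronger_evidence("cohort study", "systematic review"))
```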

Filtered Resources

Filtered resources appraise the quality of studies and often make recommendations for practice. The main types of filtered resources in evidence-based practice are:

  • systematic reviews
  • critically-appraised topics
  • critically-appraised individual articles

Scroll down the page to the Systematic reviews, Critically-appraised topics, and Critically-appraised individual articles sections for links to resources where you can find each of these types of filtered information.

Systematic reviews

Authors of a systematic review ask a specific clinical question, perform a comprehensive literature review, eliminate the poorly done studies, and attempt to make practice recommendations based on the well-done studies. Systematic reviews include only experimental, or quantitative, studies, and often include only randomized controlled trials.

You can find systematic reviews in these filtered databases:

  • Cochrane Database of Systematic Reviews Cochrane systematic reviews are considered the gold standard for systematic reviews. This database contains both systematic reviews and review protocols. To find only systematic reviews, select Cochrane Reviews in the Document Type box.
  • JBI EBP Database (formerly Joanna Briggs Institute EBP Database) This database includes systematic reviews, evidence summaries, and best practice information sheets. To find only systematic reviews, click on Limits and then select Systematic Reviews in the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page .


To learn more about finding systematic reviews, please see our guide:

  • Filtered Resources: Systematic Reviews

Critically-appraised topics

Authors of critically-appraised topics evaluate and synthesize multiple research studies. Critically-appraised topics are like short systematic reviews focused on a particular topic.

You can find critically-appraised topics in these resources:

  • Annual Reviews This collection offers comprehensive, timely collections of critical reviews written by leading scientists. To find reviews on your topic, use the search box in the upper-right corner.
  • Guideline Central This free database offers quick-reference guideline summaries organized by a non-profit initiative that aims to fill the gap left by the sudden closure of AHRQ’s National Guideline Clearinghouse (NGC).
  • JBI EBP Database (formerly Joanna Briggs Institute EBP Database) To find critically-appraised topics in JBI, click on Limits and then select Evidence Summaries from the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page .
  • National Institute for Health and Care Excellence (NICE) Evidence-based recommendations for health and care in England.
  • Filtered Resources: Critically-Appraised Topics

Critically-appraised individual articles

Authors of critically-appraised individual articles evaluate and synopsize individual research studies.

You can find critically-appraised individual articles in these resources:

  • EvidenceAlerts Quality articles from over 120 clinical journals are selected by research staff and then rated for clinical relevance and interest by an international group of physicians. Note: You must create a free account to search EvidenceAlerts.
  • ACP Journal Club This journal publishes reviews of research on the care of adults and adolescents. You can either browse this journal or use the Search within this publication feature.
  • Evidence-Based Nursing This journal reviews research studies that are relevant to best nursing practice. You can either browse individual issues or use the search box in the upper-right corner.

To learn more about finding critically-appraised individual articles, please see our guide:

  • Filtered Resources: Critically-Appraised Individual Articles

Unfiltered resources

You may not always be able to find information on your topic in the filtered literature. When this happens, you'll need to search the primary or unfiltered literature. Keep in mind that with unfiltered resources, you take on the role of reviewing what you find to make sure it is valid and reliable.

Note: You can also find systematic reviews and other filtered resources in these unfiltered databases.

The Levels of Evidence Pyramid includes unfiltered study types in this order of evidence from higher to lower:

  • randomized controlled trials
  • cohort studies
  • case-control studies, case series, and case reports

You can search for each of these types of evidence in the following databases:

  • TRIP database

Background information & expert opinion.

Background information and expert opinions are not necessarily backed by research studies. They include point-of-care resources, textbooks, conference proceedings, etc.

  • Family Physicians Inquiries Network: Clinical Inquiries Provides answers to clinical questions using a structured search, critical appraisal, authoritative recommendations, clinical perspective, and rigorous peer review. Clinical Inquiries deliver best evidence for point-of-care use.
  • Harrison, T. R., & Fauci, A. S. (2009). Harrison's Manual of Medicine . New York: McGraw-Hill Professional. Contains the clinical portions of Harrison's Principles of Internal Medicine .
  • Lippincott manual of nursing practice (8th ed.). (2006). Philadelphia, PA: Lippincott Williams & Wilkins. Provides background information on clinical nursing practice.
  • Medscape: Drugs & Diseases An open-access, point-of-care medical reference that includes clinical information from top physicians and pharmacists in the United States and worldwide.
  • Virginia Henderson Global Nursing e-Repository An open-access repository that contains works by nurses and is sponsored by Sigma Theta Tau International, the Honor Society of Nursing. Note: This resource contains both expert opinion and evidence-based practice articles.

Systematic Reviews


The evidence pyramid is often used to illustrate the development of evidence. At the base of the pyramid are animal research and laboratory studies – this is where ideas are first developed. As you progress up the pyramid, the amount of information available decreases in volume but increases in relevance to the clinical setting.

Meta-Analysis – systematic review that uses quantitative methods to synthesize and summarize the results.

Systematic Review – summary of the medical literature that uses explicit methods to perform a comprehensive literature search and critical appraisal of individual studies and that uses appropriate statistical techniques to combine these valid studies.

Randomized Controlled Trial – participants are randomly allocated into an experimental group or a control group and followed over time for the variables/outcomes of interest.

Cohort Study – involves identification of two groups (cohorts) of patients, one which received the exposure of interest, and one which did not, and following these cohorts forward for the outcome of interest.

Case Control Study – study which involves identifying patients who have the outcome of interest (cases) and patients without the same outcome (controls), and looking back to see if they had the exposure of interest.

Case Series – report on a series of patients with an outcome of interest. No control group is involved.

  • Levels of Evidence from The Centre for Evidence-Based Medicine
  • The JBI Model of Evidence Based Healthcare
  • How to Use the Evidence: Assessment and Application of Scientific Evidence From the National Health and Medical Research Council (NHMRC) of Australia. Book must be downloaded; not available to read online.

When searching for evidence to answer clinical questions, aim to identify the highest level of available evidence. Evidence hierarchies can help you strategically identify which resources to use for finding evidence, as well as which search results are most likely to be "best".                                             

Hierarchy of Evidence diagram. A text-based description appears below the image credit.

Image source: Evidence-Based Practice: Study Design from Duke University Medical Center Library & Archives. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The hierarchy of evidence (also known as the evidence-based pyramid) is depicted as a triangular representation of the levels of evidence with the strongest evidence at the top which progresses down through evidence with decreasing strength. At the top of the pyramid are research syntheses, such as Meta-Analyses and Systematic Reviews, the strongest forms of evidence. Below research syntheses are primary research studies progressing from experimental studies, such as Randomized Controlled Trials, to observational studies, such as Cohort Studies, Case-Control Studies, Cross-Sectional Studies, Case Series, and Case Reports. Non-Human Animal Studies and Laboratory Studies occupy the lowest level of evidence at the base of the pyramid.

  • Finding Evidence-Based Answers to Clinical Questions – Quickly & Effectively A tip sheet from the health sciences librarians at UC Davis Libraries to help you get started with selecting resources for finding evidence, based on type of question.

  • Perspective
  • Published: 12 October 2020

Eight problems with literature reviews and how to fix them

  • Neal R. Haddaway (ORCID: orcid.org/0000-0003-3902-2234) 1,2,3,
  • Alison Bethel 4,
  • Lynn V. Dicks 5,6,
  • Julia Koricheva (ORCID: orcid.org/0000-0002-9033-0171) 7,
  • Biljana Macura (ORCID: orcid.org/0000-0002-4253-1390) 2,
  • Gillian Petrokofsky 8,
  • Andrew S. Pullin 9,
  • Sini Savilaakso (ORCID: orcid.org/0000-0002-8514-8105) 10,11 &
  • Gavin B. Stewart (ORCID: orcid.org/0000-0001-5684-1544) 12

Nature Ecology & Evolution volume  4 ,  pages 1582–1589 ( 2020 ) Cite this article


Subjects: Conservation biology, Environmental impact

An Author Correction to this article was published on 19 October 2020


Traditional approaches to reviewing literature may be susceptible to bias and result in incorrect decisions. This is of particular concern when reviews address policy- and practice-relevant questions. Systematic reviews have been introduced as a more rigorous approach to synthesizing evidence across studies; they rely on a suite of evidence-based methods aimed at maximizing rigour and minimizing susceptibility to bias. Despite the increasing popularity of systematic reviews in the environmental field, evidence synthesis methods continue to be poorly applied in practice, resulting in the publication of syntheses that are highly susceptible to bias. Recognizing the constraints that researchers can sometimes feel when attempting to plan, conduct and publish rigorous and comprehensive evidence syntheses, we aim here to identify major pitfalls in the conduct and reporting of systematic reviews, making use of recent examples from across the field. Adopting a ‘critical friend’ role in supporting would-be systematic reviews and avoiding individual responses to police use of the ‘systematic review’ label, we go on to identify methodological solutions to mitigate these pitfalls. We then highlight existing support available to avoid these issues and call on the entire community, including systematic review specialists, to work towards better evidence syntheses for better evidence and better decisions.


Change history

19 October 2020: An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Grant, M. J. & Booth, A. A typology of reviews: an analysis of 14 review types and associated methodologies. Health Info Libr. J. 26 , 91–108 (2009).

PubMed   Google Scholar  

Haddaway, N. R. & Macura, B. The role of reporting standards in producing robust literature reviews. Nat. Clim. Change 8 , 444–447 (2018).

Google Scholar  

Pullin, A. S. & Knight, T. M. Science informing policy–a health warning for the environment. Environ. Evid. 1 , 15 (2012).

Haddaway, N., Woodcock, P., Macura, B. & Collins, A. Making literature reviews more reliable through application of lessons from systematic reviews. Conserv. Biol. 29 , 1596–1605 (2015).

CAS   PubMed   Google Scholar  

Pullin, A., Frampton, G., Livoreil, B. & Petrokofsky, G. Guidelines and Standards for Evidence Synthesis in Environmental Management (Collaboration for Environmental Evidence, 2018).

White, H. The twenty-first century experimenting society: the four waves of the evidence revolution. Palgrave Commun. 5 , 47 (2019).

O’Leary, B. C. et al. The reliability of evidence review methodology in environmental science and conservation. Environ. Sci. Policy 64 , 75–82 (2016).

Woodcock, P., Pullin, A. S. & Kaiser, M. J. Evaluating and improving the reliability of evidence syntheses in conservation and environmental science: a methodology. Biol. Conserv. 176 , 54–62 (2014).

Campbell Systematic Reviews: Policies and Guidelines (Campbell Collaboration, 2014).

Higgins, J. P. et al. Cochrane Handbook for Systematic Reviews of Interventions (John Wiley & Sons, 2019).

Shea, B. J. et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 358 , j4008 (2017).

PubMed   PubMed Central   Google Scholar  

Haddaway, N. R., Land, M. & Macura, B. “A little learning is a dangerous thing”: a call for better understanding of the term ‘systematic review’. Environ. Int. 99 , 356–360 (2017).

Freeman, R. E. Strategic Management: A Stakeholder Approach (Cambridge Univ. Press, 2010).

Haddaway, N. R. et al. A framework for stakeholder engagement during systematic reviews and maps in environmental management. Environ. Evid. 6 , 11 (2017).

Land, M., Macura, B., Bernes, C. & Johansson, S. A five-step approach for stakeholder engagement in prioritisation and planning of environmental evidence syntheses. Environ. Evid. 6 , 25 (2017).

Oliver, S. & Dickson, K. Policy-relevant systematic reviews to strengthen health systems: models and mechanisms to support their production. Evid. Policy 12 , 235–259 (2016).

Savilaakso, S. et al. Systematic review of effects on biodiversity from oil palm production. Environ. Evid. 3 , 4 (2014).

Savilaakso, S., Laumonier, Y., Guariguata, M. R. & Nasi, R. Does production of oil palm, soybean, or jatropha change biodiversity and ecosystem functions in tropical forests. Environ. Evid. 2 , 17 (2013).

Haddaway, N. R. & Crowe, S. Experiences and lessons in stakeholder engagement in environmental evidence synthesis: a truly special series. Environ. Evid. 7 , 11 (2018).

Sánchez-Bayo, F. & Wyckhuys, K. A. Worldwide decline of the entomofauna: a review of its drivers. Biol. Conserv. 232 , 8–27 (2019).

Agarwala, M. & Ginsberg, J. R. Untangling outcomes of de jure and de facto community-based management of natural resources. Conserv. Biol. 31 , 1232–1246 (2017).

Gurevitch, J., Curtis, P. S. & Jones, M. H. Meta-analysis in ecology. Adv. Ecol. Res. 32 , 199–247 (2001).

CAS   Google Scholar  

Haddaway, N. R., Macura, B., Whaley, P. & Pullin, A. S. ROSES RepOrting standards for Systematic Evidence Syntheses: pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps. Environ. Evid. 7 , 7 (2018).

Lwasa, S. et al. A meta-analysis of urban and peri-urban agriculture and forestry in mediating climate change. Curr. Opin. Environ. Sustain. 13 , 68–73 (2015).

Pacifici, M. et al. Species’ traits influenced their response to recent climate change. Nat. Clim. Change 7 , 205–208 (2017).

Owen-Smith, N. Ramifying effects of the risk of predation on African multi-predator, multi-prey large-mammal assemblages and the conservation implications. Biol. Conserv. 232 , 51–58 (2019).

Prugh, L. R. et al. Designing studies of predation risk for improved inference in carnivore-ungulate systems. Biol. Conserv. 232 , 194–207 (2019).

Li, Y. et al. Effects of biochar application in forest ecosystems on soil properties and greenhouse gas emissions: a review. J. Soil Sediment. 18 , 546–563 (2018).

Moher, D., Liberati, A., Tetzlaff, J. & Altman, D. G., The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6 , e1000097 (2009).

Bernes, C. et al. What is the influence of a reduction of planktivorous and benthivorous fish on water quality in temperate eutrophic lakes? A systematic review. Environ. Evid. 4 , 7 (2015).

McDonagh, M., Peterson, K., Raina, P., Chang, S. & Shekelle, P. Avoiding bias in selecting studies. Methods Guide for Effectiveness and Comparative Effectiveness Reviews [Internet] (Agency for Healthcare Research and Quality, 2013).

Burivalova, Z., Hua, F., Koh, L. P., Garcia, C. & Putz, F. A critical comparison of conventional, certified, and community management of tropical forests for timber in terms of environmental, economic, and social variables. Conserv. Lett. 10 , 4–14 (2017).

Min-Venditti, A. A., Moore, G. W. & Fleischman, F. What policies improve forest cover? A systematic review of research from Mesoamerica. Glob. Environ. Change 47 , 21–27 (2017).

Bramer, W. M., Giustini, D. & Kramer, B. M. R. Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: a prospective study. Syst. Rev. 5 , 39 (2016).

Bramer, W. M., Giustini, D., Kramer, B. M. R. & Anderson, P. F. The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic reviews: a review of searches used in systematic reviews. Syst. Rev. 2 , 115 (2013).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Livoreil, B. et al. Systematic searching for environmental evidence using multiple tools and sources. Environ. Evid. 6 , 23 (2017).

Mlinarić, A., Horvat, M. & Šupak Smolčić, V. Dealing with the positive publication bias: why you should really publish your negative results. Biochem. Med. 27 , 447–452 (2017).

Lin, L. & Chu, H. Quantifying publication bias in meta‐analysis. Biometrics 74 , 785–794 (2018).

Haddaway, N. R. & Bayliss, H. R. Shades of grey: two forms of grey literature important for reviews in conservation. Biol. Conserv. 191 , 827–829 (2015).

Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36 , 1–48 (2010).

Bilotta, G. S., Milner, A. M. & Boyd, I. On the use of systematic reviews to inform environmental policies. Environ. Sci. Policy 42 , 67–77 (2014).

Englund, G., Sarnelle, O. & Cooper, S. D. The importance of data‐selection criteria: meta‐analyses of stream predation experiments. Ecology 80 , 1132–1141 (1999).

Burivalova, Z., Şekercioğlu, Ç. H. & Koh, L. P. Thresholds of logging intensity to maintain tropical forest biodiversity. Curr. Biol. 24 , 1893–1898 (2014).

Bicknell, J. E., Struebig, M. J., Edwards, D. P. & Davies, Z. G. Improved timber harvest techniques maintain biodiversity in tropical forests. Curr. Biol. 24 , R1119–R1120 (2014).

Damette, O. & Delacote, P. Unsustainable timber harvesting, deforestation and the role of certification. Ecol. Econ. 70 , 1211–1219 (2011).

Blomley, T. et al. Seeing the wood for the trees: an assessment of the impact of participatory forest management on forest condition in Tanzania. Oryx 42 , 380–391 (2008).

Haddaway, N. R. et al. How does tillage intensity affect soil organic carbon? A systematic review. Environ. Evid. 6 , 30 (2017).

Higgins, J. P. et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ 343 , d5928 (2011).

Stewart, G. Meta-analysis in applied ecology. Biol. Lett. 6 , 78–81 (2010).

Koricheva, J. & Gurevitch, J. Uses and misuses of meta‐analysis in plant ecology. J. Ecol. 102 , 828–844 (2014).

Vetter, D., Ruecker, G. & Storch, I. Meta‐analysis: a need for well‐defined usage in ecology and conservation biology. Ecosphere 4 , 1–24 (2013).

Stewart, G. B. & Schmid, C. H. Lessons from meta-analysis in ecology and evolution: the need for trans-disciplinary evidence synthesis methodologies. Res. Synth. Methods 6 , 109–110 (2015).

Macura, B. et al. Systematic reviews of qualitative evidence for environmental policy and management: an overview of different methodological options. Environ. Evid. 8 , 24 (2019).

Koricheva, J. & Gurevitch, J. in Handbook of Meta-analysis in Ecology and Evolution (eds Koricheva, J. et al.) Ch. 1 (Princeton Scholarship Online, 2013).

Britt, M., Haworth, S. E., Johnson, J. B., Martchenko, D. & Shafer, A. B. The importance of non-academic coauthors in bridging the conservation genetics gap. Biol. Conserv. 218 , 118–123 (2018).

Graham, L., Gaulton, R., Gerard, F. & Staley, J. T. The influence of hedgerow structural condition on wildlife habitat provision in farmed landscapes. Biol. Conserv. 220 , 122–131 (2018).

Delaquis, E., de Haan, S. & Wyckhuys, K. A. On-farm diversity offsets environmental pressures in tropical agro-ecosystems: a synthetic review for cassava-based systems. Agric. Ecosyst. Environ. 251 , 226–235 (2018).

Popay, J. et al. Guidance on the Conduct of Narrative Synthesis in Systematic Reviews: A Product from the ESRC Methods Programme Version 1 (Lancaster Univ., 2006).

Pullin, A. S. et al. Human well-being impacts of terrestrial protected areas. Environ. Evid. 2 , 19 (2013).

Waffenschmidt, S., Knelangen, M., Sieben, W., Bühn, S. & Pieper, D. Single screening versus conventional double screening for study selection in systematic reviews: a methodological systematic review. BMC Med. Res. Methodol. 19 , 132 (2019).

Rallo, A. & García-Arberas, L. Differences in abiotic water conditions between fluvial reaches and crayfish fauna in some northern rivers of the Iberian Peninsula. Aquat. Living Resour. 15 , 119–128 (2002).

Glasziou, P. & Chalmers, I. Research waste is still a scandal—an essay by Paul Glasziou and Iain Chalmers. BMJ 363 , k4645 (2018).

Haddaway, N. R. Open Synthesis: on the need for evidence synthesis to embrace Open Science. Environ. Evid. 7 , 26 (2018).

Download references

Acknowledgements

We thank C. Shortall from Rothamsted Research for useful discussions on the topic.

Author information

Authors and affiliations

Mercator Research Institute on Climate Change and Global Commons, Berlin, Germany

Neal R. Haddaway

Stockholm Environment Institute, Stockholm, Sweden

Neal R. Haddaway & Biljana Macura

Africa Centre for Evidence, University of Johannesburg, Johannesburg, South Africa

College of Medicine and Health, Exeter University, Exeter, UK

Alison Bethel

Department of Zoology, University of Cambridge, Cambridge, UK

Lynn V. Dicks

School of Biological Sciences, University of East Anglia, Norwich, UK

Department of Biological Sciences, Royal Holloway University of London, Egham, UK

Julia Koricheva

Department of Zoology, University of Oxford, Oxford, UK

Gillian Petrokofsky

Collaboration for Environmental Evidence, UK Centre, School of Natural Sciences, Bangor University, Bangor, UK

Andrew S. Pullin

Liljus ltd, London, UK

Sini Savilaakso

Department of Forest Sciences, University of Helsinki, Helsinki, Finland

Evidence Synthesis Lab, School of Natural and Environmental Sciences, University of Newcastle, Newcastle-upon-Tyne, UK

Gavin B. Stewart


Contributions

N.R.H. developed the manuscript idea and a first draft. All authors contributed to examples and edited the text. All authors have read and approve of the final submission.

Corresponding author

Correspondence to Neal R. Haddaway.

Ethics declarations

Competing interests

S.S. is a co-founder of Liljus ltd, a firm that provides research services in sustainable finance as well as forest conservation and management. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary table

Examples of literature reviews and common problems identified.


About this article

Cite this article

Haddaway, N.R., Bethel, A., Dicks, L.V. et al. Eight problems with literature reviews and how to fix them. Nat Ecol Evol 4, 1582–1589 (2020). https://doi.org/10.1038/s41559-020-01295-x

Received: 24 March 2020

Accepted: 31 July 2020

Published: 12 October 2020

Issue date: December 2020

DOI: https://doi.org/10.1038/s41559-020-01295-x




  • Open access
  • Published: 17 August 2023

Data visualisation in scoping reviews and evidence maps on health topics: a cross-sectional analysis

  • Emily South (ORCID: orcid.org/0000-0003-2187-4762) 1 &
  • Mark Rodgers 1

Systematic Reviews volume 12, Article number: 142 (2023)


Scoping reviews and evidence maps are forms of evidence synthesis that aim to map the available literature on a topic and are well-suited to visual presentation of results. A range of data visualisation methods and interactive data visualisation tools exist that may make scoping reviews more useful to knowledge users. The aim of this study was to explore the use of data visualisation in a sample of recent scoping reviews and evidence maps on health topics, with a particular focus on interactive data visualisation.

Ovid MEDLINE ALL was searched for recent scoping reviews and evidence maps (June 2020-May 2021), and a sample of 300 papers that met basic selection criteria was taken. Data were extracted on the aim of each review and the use of data visualisation, including types of data visualisation used, variables presented and the use of interactivity. Descriptive data analysis was undertaken of the 238 reviews that aimed to map evidence.

Of the 238 scoping reviews or evidence maps in our analysis, around one-third (37.8%) included some form of data visualisation. Thirty-five different types of data visualisation were used across this sample, although most data visualisations identified were simple bar charts (standard, stacked or multi-set), pie charts or cross-tabulations (60.8%). Most data visualisations presented a single variable (64.4%) or two variables (26.1%). Almost a third of the reviews that used data visualisation did not use any colour (28.9%). Only two reviews presented interactive data visualisation, and few reported the software used to create visualisations.

Conclusions

Data visualisation is currently underused by scoping review authors. In particular, there is potential for much greater use of more innovative forms of data visualisation and interactive data visualisation. Where more innovative data visualisation is used, scoping reviews have made use of a wide range of different methods. Increased use of these more engaging visualisations may make scoping reviews more useful for a range of stakeholders.

Background

Scoping reviews are “a type of evidence synthesis that aims to systematically identify and map the breadth of evidence available on a particular topic, field, concept, or issue” ([ 1 ], p. 950). While they include some of the same steps as a systematic review, such as systematic searches and the use of predetermined eligibility criteria, scoping reviews often address broader research questions and do not typically involve the quality appraisal of studies or synthesis of data [ 2 ]. Reasons for conducting a scoping review include the following: to map types of evidence available, to explore research design and conduct, to clarify concepts or definitions and to map characteristics or factors related to a concept [ 3 ]. Scoping reviews can also be undertaken to inform a future systematic review (e.g. to assure authors there will be adequate studies) or to identify knowledge gaps [ 3 ]. Other evidence synthesis approaches with similar aims have been described as evidence maps, mapping reviews or systematic maps [ 4 ]. While this terminology is used inconsistently, evidence maps can be used to identify evidence gaps and present them in a user-friendly (and often visual) way [ 5 ].

Scoping reviews are often targeted to an audience of healthcare professionals or policy-makers [ 6 ], suggesting that it is important to present results in a user-friendly and informative way. Until recently, there was little guidance on how to present the findings of scoping reviews. In recent literature, there has been some discussion of the importance of clearly presenting data for the intended audience of a scoping review, with creative and innovative use of visual methods if appropriate [ 7 , 8 , 9 ]. Lockwood et al. suggest that innovative visual presentation should be considered over dense sections of text or long tables in many cases [ 8 ]. Khalil et al. suggest that inspiration could be drawn from the field of data visualisation [ 7 ]. JBI guidance on scoping reviews recommends that reviewers carefully consider the best format for presenting data at the protocol development stage and provides a number of examples of possible methods [ 10 ].

Interactive resources are another option for presentation in scoping reviews [ 9 ]. Researchers without the relevant programming skills can now use several online platforms (such as Tableau [ 11 ] and Flourish [ 12 ]) to create interactive data visualisations. The benefits of using interactive visualisation in research include the ability to easily present more than two variables [ 13 ] and increased engagement of users [ 14 ]. Unlike static graphs, interactive visualisations can allow users to view hierarchical data at different levels, exploring both the “big picture” and looking in more detail ([ 15 ], p. 291). Interactive visualizations are often targeted at practitioners and decision-makers [ 13 ], and there is some evidence from qualitative research that they are valued by policy-makers [ 16 , 17 , 18 ].

Given their focus on mapping evidence, we believe that scoping reviews are particularly well-suited to visually presenting data and the use of interactive data visualisation tools. However, it is unknown how many recent scoping reviews visually map data or which types of data visualisation are used. The aim of this study was to explore the use of data visualisation methods in a large sample of recent scoping reviews and evidence maps on health topics. In particular, we were interested in the extent to which these forms of synthesis use any form of interactive data visualisation.

Methods

This study was a cross-sectional analysis of studies labelled as scoping reviews or evidence maps (or synonyms of these terms) in the title or abstract.

The search strategy was developed with help from an information specialist. Ovid MEDLINE® ALL was searched in June 2021 for studies added to the database in the previous 12 months. The search was limited to English language studies only.

The search strategy was as follows:

Ovid MEDLINE(R) ALL

1. (scoping review or evidence map or systematic map or mapping review or scoping study or scoping project or scoping exercise or literature mapping or evidence mapping or systematic mapping or literature scoping or evidence gap map).ab,ti.

2. limit 1 to english language

3. (202006* or 202007* or 202008* or 202009* or 202010* or 202011* or 202012* or 202101* or 202102* or 202103* or 202104* or 202105*).dt.

The search returned 3686 records. Records were de-duplicated in EndNote 20 software, leaving 3627 unique records.
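A rough programmatic analogue of this search is sketched below. It is an assumption-heavy illustration: the study searched Ovid MEDLINE ALL with Ovid syntax (.ab,ti. and .dt. field tags) and deduplicated in EndNote, whereas this sketch queries PubMed through Biopython's Entrez utilities with [tiab] tags and an Entrez date range, so the record count will not match the 3686 reported above.

```python
# Approximate PubMed analogue of the Ovid MEDLINE search (a sketch, not the
# strategy actually used in the study; record counts will differ).
from Bio import Entrez  # pip install biopython

Entrez.email = "you@example.org"  # NCBI requires a contact address (placeholder)

terms = [
    "scoping review", "evidence map", "systematic map", "mapping review",
    "scoping study", "scoping project", "scoping exercise", "literature mapping",
    "evidence mapping", "systematic mapping", "literature scoping",
    "evidence gap map",
]
# Title/abstract search joined with OR, restricted to English-language records.
query = "(" + " OR ".join(f'"{t}"[tiab]' for t in terms) + ") AND english[lang]"

# Limit to records added within the 12-month window used by the review.
handle = Entrez.esearch(
    db="pubmed", term=query,
    datetype="edat", mindate="2020/06/01", maxdate="2021/05/31",
    retmax=0,  # only the count is needed here
)
result = Entrez.read(handle)
handle.close()
print("Records found:", result["Count"])
```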

A sample of these reviews was taken by screening the search results against basic selection criteria (Table 1 ). These criteria were piloted and refined after discussion between the two researchers. A single researcher (E.S.) screened the records in EPPI-Reviewer Web software using the machine-learning priority screening function. Where a second opinion was needed, decisions were checked by a second researcher (M.R.).
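To illustrate what machine-learning priority screening means in general terms, here is a generic sketch: a classifier is trained on the include/exclude decisions made so far, and the remaining records are ranked by predicted relevance so the most promising ones are screened first. This is not EPPI-Reviewer's implementation, and the records and labels below are invented.

```python
# Generic illustration of priority screening: rank unscreened records by a
# classifier trained on decisions made so far. Not EPPI-Reviewer's actual code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical data: titles already screened, with include(1)/exclude(0) labels.
screened_texts = [
    "Scoping review of data visualisation in public health",
    "Randomised trial of a dietary intervention",
]
screened_labels = [1, 0]
unscreened_texts = [
    "Evidence map of digital health interventions",
    "Case report of a rare adverse event",
]

vectoriser = TfidfVectorizer(stop_words="english")
X_screened = vectoriser.fit_transform(screened_texts)
X_unscreened = vectoriser.transform(unscreened_texts)

model = LogisticRegression()
model.fit(X_screened, screened_labels)

# Probability of inclusion drives the screening order: most likely first.
scores = model.predict_proba(X_unscreened)[:, 1]
for score, title in sorted(zip(scores, unscreened_texts), reverse=True):
    print(f"{score:.2f}  {title}")
```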

Our initial plan for sampling, informed by pilot searching, was to screen and data extract records in batches of 50 included reviews at a time. We planned to stop screening when a batch of 50 reviews had been extracted that included no new types of data visualisation or after screening time had reached 2 days. However, once data extraction was underway, we found the sample to be richer in terms of data visualisation than anticipated. After the inclusion of 300 reviews, we took the decision to end screening in order to ensure the study was manageable.

Data extraction

A data extraction form was developed in EPPI-Reviewer Web, piloted on 50 reviews and refined. Data were extracted by one researcher (E. S. or M. R.), with a second researcher (M. R. or E. S.) providing a second opinion when needed. The data items extracted were as follows: type of review (term used by authors), aim of review (mapping evidence vs. answering specific question vs. borderline), number of visualisations (if any), types of data visualisation used, variables/domains presented by each visualisation type, interactivity, use of colour and any software requirements.

When categorising review aims, we considered “mapping evidence” to incorporate all of the six purposes for conducting a scoping review proposed by Munn et al. [ 3 ]. Reviews were categorised as “answering a specific question” if they aimed to synthesise study findings to answer a particular question, for example on effectiveness of an intervention. We were inclusive with our definition of “mapping evidence” and included reviews with mixed aims in this category. However, some reviews were difficult to categorise (for example where aims were unclear or the stated aims did not match the actual focus of the paper) and were considered to be “borderline”. It became clear that a proportion of identified records that described themselves as “scoping” or “mapping” reviews were in fact pseudo-systematic reviews that failed to undertake key systematic review processes. Such reviews attempted to integrate the findings of included studies rather than map the evidence, and so reviews categorised as “answering a specific question” were excluded from the main analysis. Data visualisation methods for meta-analyses have been explored previously [ 19 ]. Figure  1 shows the flow of records from search results to final analysis sample.

Fig. 1 Flow diagram of the sampling process

Data visualisation was defined as any graph or diagram that presented results data, including tables with a visual mapping element, such as cross-tabulations and heat maps. However, tables which displayed data at a study level (e.g. tables summarising key characteristics of each included study) were not included, even if they used symbols, shading or colour. Flow diagrams showing the study selection process were also excluded. Data visualisations in appendices or supplementary information were included, as well as any in publicly available dissemination products (e.g. visualisations hosted online) if mentioned in papers.

The typology used to categorise data visualisation methods was based on an existing online catalogue [ 20 ]. Specific types of data visualisation were categorised in five broad categories: graphs, diagrams, tables, maps/geographical and other. If a data visualisation appeared in our sample that did not feature in the original catalogue, we checked a second online catalogue [ 21 ] for an appropriate term, followed by wider Internet searches. These additional visualisation methods were added to the appropriate section of the typology. The final typology can be found in Additional file 1 .

We conducted descriptive data analysis in Microsoft Excel 2019 and present frequencies and percentages. Where appropriate, data are presented using graphs or other data visualisations created using Flourish. We also link to interactive versions of some of these visualisations.
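As a rough illustration of this descriptive analysis (performed in Excel in the study), the sketch below computes frequencies and percentages of visualisation types from a hypothetical extraction table; the data frame contents and column names are invented for the example.

```python
# Sketch of the descriptive analysis: frequencies and percentages of
# visualisation types. The rows and column names are hypothetical.
import pandas as pd

# One row per individual data visualisation extracted from the reviews.
extraction = pd.DataFrame({
    "review_id": [1, 1, 2, 3, 3, 3],
    "vis_category": ["graph", "graph", "table", "graph", "diagram", "other"],
    "vis_type": ["bar chart", "pie chart", "cross-tabulation",
                 "bar chart", "flow chart", "word cloud"],
})

counts = extraction["vis_type"].value_counts()
percentages = (counts / len(extraction) * 100).round(1)

summary = pd.DataFrame({"n": counts, "%": percentages})
print(summary)
```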

Results

Almost all of the 300 reviews in the total sample were labelled by review authors as “scoping reviews” (n = 293, 97.7%). There were also four “mapping reviews”, one “scoping study”, one “evidence mapping” and one that was described as a “scoping review and evidence map”. Included reviews were all published in 2020 or 2021, with the exception of one review published in 2018. Just over one-third of these reviews (n = 105, 35.0%) included some form of data visualisation. However, we excluded 62 reviews that did not focus on mapping evidence from the following analysis (see “Methods” section). Of the 238 remaining reviews (that either clearly aimed to map evidence or were judged to be “borderline”), 90 reviews (37.8%) included at least one data visualisation. The references for these reviews can be found in Additional file 2.

Number of visualisations

Thirty-six (40.0%) of these 90 reviews included just one example of data visualisation (Fig.  2 ). Less than a third ( n  = 28, 31.1%) included three or more visualisations. The greatest number of data visualisations in one review was 17 (all bar or pie charts). In total, 222 individual data visualisations were identified across the sample of 238 reviews.

Fig. 2 Number of data visualisations per review

Categories of data visualisation

Graphs were the most frequently used category of data visualisation in the sample. Over half of the reviews with data visualisation included at least one graph ( n  = 59, 65.6%). The least frequently used category was maps, with 15.6% ( n  = 14) of these reviews including a map.

Of the total number of 222 individual data visualisations, 102 were graphs (45.9%), 34 were tables (15.3%), 23 were diagrams (10.4%), 15 were maps (6.8%) and 48 were classified as “other” in the typology (21.6%).

Types of data visualisation

All of the types of data visualisation identified in our sample are reported in Table 2 . In total, 35 different types were used across the sample of reviews.

The most frequently used data visualisation type was a bar chart. Of 222 total data visualisations, 78 (35.1%) were a variation on a bar chart (either standard bar chart, stacked bar chart or multi-set bar chart). There were also 33 pie charts (14.9% of data visualisations) and 24 cross-tabulations (10.8% of data visualisations). In total, these five types of data visualisation accounted for 60.8% ( n  = 135) of all data visualisations. Figure  3 shows the frequency of each data visualisation category and type; an interactive online version of this treemap is also available ( https://public.flourish.studio/visualisation/9396133/ ). Figure  4 shows how users can further explore the data using the interactive treemap.

Fig. 3 Data visualisation categories and types. An interactive version of this treemap is available online: https://public.flourish.studio/visualisation/9396133/ . Through the interactive version, users can further explore the data (see Fig. 4). The unit of this treemap is the individual data visualisation, so multiple data visualisations within the same scoping review are represented in this map. Created with flourish.studio ( https://flourish.studio )

Fig. 4 Screenshots showing how users of the interactive treemap can explore the data further. Users can explore each level of the hierarchical treemap (A Visualisation category > B Visualisation subcategory > C Variables presented in visualisation > D Individual references reporting this category/subcategory/variable permutation). Created with flourish.studio ( https://flourish.studio )
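As a loose illustration of how a hierarchical category-then-type treemap like Fig. 3 can be built programmatically (the authors used flourish.studio, not the code below), here is a Plotly sketch; the counts are illustrative placeholders rather than the full figures from Table 2.

```python
# Sketch of a category > type treemap similar in spirit to Fig. 3.
# The study used flourish.studio; this sketch uses Plotly, and the counts are
# illustrative only.
import pandas as pd
import plotly.express as px

data = pd.DataFrame({
    "category": ["Graphs", "Graphs", "Graphs", "Tables", "Diagrams", "Other"],
    "vis_type": ["Bar chart", "Pie chart", "Line graph",
                 "Cross-tabulation", "Flow chart", "Word cloud"],
    "count": [78, 33, 4, 24, 6, 5],
})

fig = px.treemap(
    data,
    path=["category", "vis_type"],  # hierarchy: category, then specific type
    values="count",
)
fig.show()  # or fig.write_html("treemap.html") for a shareable interactive file
```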

Data presented

Around two-thirds of data visualisations in the sample presented a single variable ( n  = 143, 64.4%). The most frequently presented single variables were themes ( n  = 22, 9.9% of data visualisations), population ( n  = 21, 9.5%), country or region ( n  = 21, 9.5%) and year ( n  = 20, 9.0%). There were 58 visualisations (26.1%) that presented two different variables. The remaining 21 data visualisations (9.5%) presented three or more variables. Figure  5 shows the variables presented by each different type of data visualisation (an interactive version of this figure is available online).

Fig. 5 Variables presented by each data visualisation type. Darker cells indicate a larger number of reviews. An interactive version of this heat map is available online: https://public.flourish.studio/visualisation/10632665/ . Users can hover over each cell to see the number of data visualisations for that combination of data visualisation type and variable. The unit of this heat map is the individual data visualisation, so multiple data visualisations within a single scoping review are represented in this map. Created with flourish.studio ( https://flourish.studio )

Most reviews presented at least one data visualisation in colour ( n  = 64, 71.1%). However, almost a third ( n  = 26, 28.9%) used only black and white or greyscale.

Interactivity

Only two of the reviews included data visualisations with any level of interactivity. One scoping review on music and serious mental illness [ 22 ] linked to an interactive bubble chart hosted online on Tableau. Functionality included the ability to filter the studies displayed by various attributes.

The other review was an example of evidence mapping from the environmental health field [ 23 ]. All four of the data visualisations included in the paper were available in an interactive format hosted either by the review management software or on Tableau. The interactive versions linked to the relevant references so users could directly explore the evidence base. This was the only review that provided this feature.

Software requirements

Nine reviews clearly reported the software used to create data visualisations. Three reviews used Tableau (one of them also used review management software as discussed above) [ 22 , 23 , 24 ]. Two reviews generated maps using ArcGIS [ 25 ] or ArcMap [ 26 ]. One review used Leximancer for a lexical analysis [ 27 ]. One review undertook a bibliometric analysis using VOSviewer [ 28 ], and another explored citation patterns using CitNetExplorer [ 29 ]. Other reviews used Excel [ 30 ] or R [ 26 ].

Discussion

To our knowledge, this is the first systematic and in-depth exploration of the use of data visualisation techniques in scoping reviews. Our findings suggest that the majority of scoping reviews do not use any data visualisation at all, and, in particular, more innovative examples of data visualisation are rare. Around 60% of data visualisations in our sample were simple bar charts, pie charts or cross-tabulations. There appears to be very limited use of interactive online visualisation, despite the potential this has for communicating results to a range of stakeholders. While it is not always appropriate to use data visualisation (or a simple bar chart may be the most user-friendly way of presenting the data), these findings suggest that data visualisation is being underused in scoping reviews. In a large minority of reviews, visualisations were not published in colour, potentially limiting how user-friendly and attractive papers are to decision-makers and other stakeholders. Also, very few reviews clearly reported the software used to create data visualisations. However, 35 different types of data visualisation were used across the sample, highlighting the wide range of methods that are potentially available to scoping review authors.

Our results build on the limited research that has previously been undertaken in this area. Two previous publications also found limited use of graphs in scoping reviews. Results were “mapped graphically” in 29% of scoping reviews in any field in one 2014 publication [ 31 ] and 17% of healthcare scoping reviews in a 2016 article [ 6 ]. Our results suggest that the use of data visualisation has increased somewhat since these reviews were conducted. Scoping review methods have also evolved in the last 10 years; formal guidance on scoping review conduct was published in 2014 [ 32 ], and an extension of the PRISMA checklist for scoping reviews was published in 2018 [ 33 ]. It is possible that an overall increase in use of data visualisation reflects increased quality of published scoping reviews. There is also some literature supporting our findings on the wide range of data visualisation methods that are used in evidence synthesis. An investigation of methods to identify, prioritise or display health research gaps (25/139 included studies were scoping reviews; 6/139 were evidence maps) identified 14 different methods used to display gaps or priorities, with half being “more advanced” (e.g. treemaps, radial bar plots) ([ 34 ], p. 107). A review of data visualisation methods used in papers reporting meta-analyses found over 200 different ways of displaying data [ 19 ].

Only two reviews in our sample used interactive data visualisation, and one of these was an example of systematic evidence mapping from the environmental health field rather than a scoping review (in environmental health, systematic evidence mapping explicitly involves producing a searchable database [ 35 ]). A scoping review of papers on the use of interactive data visualisation in population health or health services research found a range of examples but still limited use overall [ 13 ]. For example, the authors noted the currently underdeveloped potential for using interactive visualisation in research on health inequalities. It is possible that the use of interactive data visualisation in academic papers is restricted by academic publishing requirements; for example, it is currently difficult to incorporate an interactive figure into a journal article without linking to an external host or platform. However, we believe that there is a lot of potential to add value to future scoping reviews by using interactive data visualisation software. Few reviews in our sample presented three or more variables in a single visualisation, something which can easily be achieved using interactive data visualisation tools. We have previously used EPPI-Mapper [ 36 ] to present results of a scoping review of systematic reviews on behaviour change in disadvantaged groups, with links to the maps provided in the paper [ 37 ]. These interactive maps allowed policy-makers to explore the evidence on different behaviours and disadvantaged groups and access full publications of the included studies directly from the map.

We acknowledge there are barriers to use for some of the data visualisation software available. EPPI-Mapper and some of the software used by reviews in our sample incur a cost. Some software requires a certain level of knowledge and skill in its use. However, numerous online free data visualisation tools and resources exist. We have used Flourish to present data for this review, a basic version of which is currently freely available and easy to use. Previous health research has been found to have used a range of different interactive data visualisation software, much of which does not require advanced knowledge or skills to use [ 13 ].

There are likely to be other barriers to the use of data visualisation in scoping reviews. Journal guidelines and policies may present barriers for using innovative data visualisation. For example, some journals charge a fee for publication of figures in colour. As previously mentioned, there are limited options for incorporating interactive data visualisation into journal articles. Authors may also be unaware of the data visualisation methods and tools that are available. Producing data visualisations can be time-consuming, particularly if authors lack experience and skills in this. It is possible that many authors prioritise speed of publication over spending time producing innovative data visualisations, particularly in a context where there is pressure to achieve publications.

Limitations

A limitation of this study was that we did not assess how appropriate the use of data visualisation was in our sample as this would have been highly subjective. Simple descriptive or tabular presentation of results may be the most appropriate approach for some scoping review objectives [ 7 , 8 , 10 ], and the scoping review literature cautions against “over-using” different visual presentation methods [ 7 , 8 ]. It cannot be assumed that all of the reviews that did not include data visualisation should have done so. Likewise, we do not know how many reviews used methods of data visualisation that were not well suited to their data.

We initially relied on authors’ own use of the term “scoping review” (or equivalent) to sample reviews but identified a relatively large number of papers labelled as scoping reviews that did not meet the basic definition, despite the availability of guidance and reporting guidelines [ 10 , 33 ]. It has previously been noted that scoping reviews may be undertaken inappropriately because they are seen as “easier” to conduct than a systematic review ([ 3 ], p.6), and that reviews are often labelled as “scoping reviews” while not appearing to follow any established framework or guidance [ 2 ]. We therefore took the decision to remove these reviews from our main analysis. However, decisions on how to classify review aims were subjective, and we did include some reviews that were of borderline relevance.

A further limitation is that this was a sample of published reviews, rather than a comprehensive systematic scoping review as have previously been undertaken [ 6 , 31 ]. The number of scoping reviews that are published has increased rapidly, and this would now be difficult to undertake. As this was a sample, not all relevant scoping reviews or evidence maps that would have met our criteria were included. We used machine learning to screen our search results for pragmatic reasons (to reduce screening time), but we do not see any reason that our sample would not be broadly reflective of the wider literature.

Conclusions

Data visualisation, and in particular more innovative examples of it, is currently underused in published scoping reviews on health topics. The examples that we have found highlight the wide range of methods that scoping review authors could draw upon to present their data in an engaging way. In particular, we believe that interactive data visualisation has significant potential for mapping the available literature on a topic. Appropriate use of data visualisation may increase the usefulness, and thus uptake, of scoping reviews as a way of identifying existing evidence or research gaps by decision-makers, researchers and commissioners of research. We recommend that scoping review authors explore the extensive free resources and online tools available for data visualisation. However, we also think that it would be useful for publishers to explore allowing easier integration of interactive tools into academic publishing, given the fact that papers are now predominantly accessed online. Future research may be helpful to explore which methods are particularly useful to scoping review users.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

JBI: Organisation formerly known as Joanna Briggs Institute

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

References

1. Munn Z, Pollock D, Khalil H, Alexander L, McInerney P, Godfrey CM, Peters M, Tricco AC. What are scoping reviews? Providing a formal definition of scoping reviews as a type of evidence synthesis. JBI Evid Synth. 2022;20:950–952.

2. Peters MDJ, Marnie C, Colquhoun H, Garritty CM, Hempel S, Horsley T, Langlois EV, Lillie E, O’Brien KK, Tunçalp Ӧ, et al. Scoping reviews: reinforcing and advancing the methodology and application. Syst Rev. 2021;10:263.

3. Munn Z, Peters MDJ, Stern C, Tufanaru C, McArthur A, Aromataris E. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol. 2018;18:143.

4. Sutton A, Clowes M, Preston L, Booth A. Meeting the review family: exploring review types and associated information retrieval requirements. Health Info Libr J. 2019;36:202–22.

5. Miake-Lye IM, Hempel S, Shanman R, Shekelle PG. What is an evidence map? A systematic review of published evidence maps and their definitions, methods, and products. Syst Rev. 2016;5:28.

6. Tricco AC, Lillie E, Zarin W, O’Brien K, Colquhoun H, Kastner M, Levac D, Ng C, Sharpe JP, Wilson K, et al. A scoping review on the conduct and reporting of scoping reviews. BMC Med Res Methodol. 2016;16:15.

7. Khalil H, Peters MDJ, Tricco AC, Pollock D, Alexander L, McInerney P, Godfrey CM, Munn Z. Conducting high quality scoping reviews-challenges and solutions. J Clin Epidemiol. 2021;130:156–60.

8. Lockwood C, dos Santos KB, Pap R. Practical guidance for knowledge synthesis: scoping review methods. Asian Nurs Res. 2019;13:287–94.

9. Pollock D, Peters MDJ, Khalil H, McInerney P, Alexander L, Tricco AC, Evans C, de Moraes ÉB, Godfrey CM, Pieper D, et al. Recommendations for the extraction, analysis, and presentation of results in scoping reviews. JBI Evid Synth. 2022;10:11124.

10. Peters MDJ, Godfrey C, McInerney P, Munn Z, Tricco AC, Khalil H. Chapter 11: Scoping reviews (2020 version). In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020. Available from https://synthesismanual.jbi.global. Accessed 1 Feb 2023.

11. Tableau Public. https://www.tableau.com/en-gb/products/public. Accessed 24 January 2023.

12. flourish.studio. https://flourish.studio/. Accessed 24 January 2023.

13. Chishtie J, Bielska IA, Barrera A, Marchand J-S, Imran M, Tirmizi SFA, Turcotte LA, Munce S, Shepherd J, Senthinathan A, et al. Interactive visualization applications in population health and health services research: systematic scoping review. J Med Internet Res. 2022;24:e27534.

14. Isett KR, Hicks DM. Providing public servants what they need: revealing the “unseen” through data visualization. Public Adm Rev. 2018;78:479–85.

15. Carroll LN, Au AP, Detwiler LT, Fu T-c, Painter IS, Abernethy NF. Visualization and analytics tools for infectious disease epidemiology: a systematic review. J Biomed Inform. 2014;51:287–298.

16. Lundkvist A, El-Khatib Z, Kalra N, Pantoja T, Leach-Kemon K, Gapp C, Kuchenmüller T. Policy-makers’ views on translating burden of disease estimates in health policies: bridging the gap through data visualization. Arch Public Health. 2021;79:17.

17. Zakkar M, Sedig K. Interactive visualization of public health indicators to support policymaking: an exploratory study. Online J Public Health Inform. 2017;9:e190.

18. Park S, Bekemeier B, Flaxman AD. Understanding data use and preference of data visualization for public health professionals: a qualitative study. Public Health Nurs. 2021;38:531–41.

19. Kossmeier M, Tran US, Voracek M. Charting the landscape of graphical displays for meta-analysis and systematic reviews: a comprehensive review, taxonomy, and feature analysis. BMC Med Res Methodol. 2020;20:26.

20. Ribecca S. The Data Visualisation Catalogue. https://datavizcatalogue.com/index.html. Accessed 23 November 2021.

21. Ferdio. Data Viz Project. https://datavizproject.com/. Accessed 23 November 2021.

22. Golden TL, Springs S, Kimmel HJ, Gupta S, Tiedemann A, Sandu CC, Magsamen S. The use of music in the treatment and management of serious mental illness: a global scoping review of the literature. Front Psychol. 2021;12:649840.

23. Keshava C, Davis JA, Stanek J, Thayer KA, Galizia A, Keshava N, Gift J, Vulimiri SV, Woodall G, Gigot C, et al. Application of systematic evidence mapping to assess the impact of new research when updating health reference values: a case example using acrolein. Environ Int. 2020;143:105956.

24. Jayakumar P, Lin E, Galea V, Mathew AJ, Panda N, Vetter I, Haynes AB. Digital phenotyping and patient-generated health data for outcome measurement in surgical care: a scoping review. J Pers Med. 2020;10:282.

25. Qu LG, Perera M, Lawrentschuk N, Umbas R, Klotz L. Scoping review: hotspots for COVID-19 urological research: what is being published and from where? World J Urol. 2021;39:3151–60.

26. Rossa-Roccor V, Acheson ES, Andrade-Rivas F, Coombe M, Ogura S, Super L, Hong A. Scoping review and bibliometric analysis of the term “planetary health” in the peer-reviewed literature. Front Public Health. 2020;8:343.

27. Hewitt L, Dahlen HG, Hartz DL, Dadich A. Leadership and management in midwifery-led continuity of care models: a thematic and lexical analysis of a scoping review. Midwifery. 2021;98:102986.

28. Xia H, Tan S, Huang S, Gan P, Zhong C, Lu M, Peng Y, Zhou X, Tang X. Scoping review and bibliometric analysis of the most influential publications in achalasia research from 1995 to 2020. Biomed Res Int. 2021;2021:8836395.

29. Vigliotti V, Taggart T, Walker M, Kusmastuti S, Ransome Y. Religion, faith, and spirituality influences on HIV prevention activities: a scoping review. PLoS ONE. 2020;15:e0234720.

30. van Heemskerken P, Broekhuizen H, Gajewski J, Brugha R, Bijlmakers L. Barriers to surgery performed by non-physician clinicians in sub-Saharan Africa-a scoping review. Hum Resour Health. 2020;18:51.

31. Pham MT, Rajić A, Greig JD, Sargeant JM, Papadopoulos A, McEwen SA. A scoping review of scoping reviews: advancing the approach and enhancing the consistency. Res Synth Methods. 2014;5:371–85.

32. Peters MDJ, Marnie C, Tricco AC, Pollock D, Munn Z, Alexander L, McInerney P, Godfrey CM, Khalil H. Updated methodological guidance for the conduct of scoping reviews. JBI Evid Synth. 2020;18:2119–26.

33. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, Moher D, Peters MDJ, Horsley T, Weeks L, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169:467–73.

34. Nyanchoka L, Tudur-Smith C, Thu VN, Iversen V, Tricco AC, Porcher R. A scoping review describes methods used to identify, prioritize and display gaps in health research. J Clin Epidemiol. 2019;109:99–110.

35. Wolffe TAM, Whaley P, Halsall C, Rooney AA, Walker VR. Systematic evidence maps as a novel tool to support evidence-based decision-making in chemicals policy and risk management. Environ Int. 2019;130:104871.

36. Digital Solution Foundry and EPPI-Centre. EPPI-Mapper, Version 2.0.1. EPPI-Centre, UCL Social Research Institute, University College London. 2020. https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3790.

37. South E, Rodgers M, Wright K, Whitehead M, Sowden A. Reducing lifestyle risk behaviours in disadvantaged groups in high-income countries: a scoping review of systematic reviews. Prev Med. 2022;154:106916.


Acknowledgements

We would like to thank Melissa Harden, Senior Information Specialist, Centre for Reviews and Dissemination, for advice on developing the search strategy.

This work received no external funding.

Author information

Authors and affiliations.

Centre for Reviews and Dissemination, University of York, York, YO10 5DD, UK

Emily South & Mark Rodgers


Contributions

Both authors conceptualised and designed the study and contributed to screening, data extraction and the interpretation of results. ES undertook the literature searches, analysed data, produced the data visualisations and drafted the manuscript. MR contributed to revising the manuscript, and both authors read and approved the final version.

Corresponding author

Correspondence to Emily South .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Not applicable.

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Typology of data visualisation methods.

Additional file 2.

References of scoping reviews included in main dataset.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

South, E., Rodgers, M. Data visualisation in scoping reviews and evidence maps on health topics: a cross-sectional analysis. Syst Rev 12 , 142 (2023). https://doi.org/10.1186/s13643-023-02309-y


Received: 21 February 2023

Accepted: 07 August 2023

Published: 17 August 2023

DOI: https://doi.org/10.1186/s13643-023-02309-y


Keywords

  • Scoping review
  • Evidence map
  • Data visualisation


Different open access routes, varying societal impacts: evidence from the Royal Society biological journals

  • Published: 10 May 2024


  • Liwei Zhang 1 &
  • Liang Ma   ORCID: orcid.org/0000-0002-8779-5891 2  


Compared to the academic impacts (e.g., the citation advantage) brought by Open Access (OA), the societal impacts of scientific studies have not been well elaborated in prior studies. In this article, we explore different OA routes (i.e., gold OA, hybrid OA, and bronze OA) and their varying effects on multiple types of societal impact (i.e., social media and web) using the case of four biological journals founded by the Royal Society. The results show that (1) gold OA is significantly and positively related to social media indicators (Twitter counts and Facebook counts), but significantly and negatively associated with web indicators (blog counts and news counts); (2) hybrid OA has a significant and positive effect on both social media and web indicators; and (3) bronze OA is significantly and positively associated with social media indicators, while its association with web indicators is negative albeit nonsignificant. The findings suggest that OA policies could increase societal impact on the public to varying degrees. Specifically, OA policies could amplify the societal impacts of research articles on social media, but the effects are inconsistent for societal impacts on the web.




Acknowledgements

We thank altmetric.com for sharing the data used in this study.

Financial support is from Beijing Social Science Fund (No. 21DTR058), National Social Science Fund of China (20&ZD071; 23&ZD080), and National Natural Science Foundation of China (No. 72274203, No. 72241434).

Author information

Authors and affiliations.

School of Innovation and Entrepreneurship, Shandong University, Shandong Province, Qingdao, China

Liwei Zhang

School of Public Administration and Policy, Renmin University of China, 59 Zhongguancun Avenue, Haidian District, Beijing, 100872, China

Liang Ma


Corresponding author

Correspondence to Liang Ma .

Ethics declarations

Conflict of interest.

There is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Zhang, L., Ma, L. Different open access routes, varying societal impacts: evidence from the Royal Society biological journals. Scientometrics (2024). https://doi.org/10.1007/s11192-024-05032-0


Received: 05 October 2023

Accepted: 16 April 2024

Published: 10 May 2024

DOI: https://doi.org/10.1007/s11192-024-05032-0


Keywords

  • Societal impacts
  • Web effects
  • Social media effects


Published on 14.5.2024 in Vol 12 (2024)

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

Authors of this article:


  • Yong Nam Gwon 1, * , MD ; 
  • Jae Heon Kim 1, * , MD, PhD ; 
  • Hyun Soo Chung 2 , MD ; 
  • Eun Jee Jung 2 , MD ; 
  • Joey Chun 1, 3 , MD ; 
  • Serin Lee 1, 4 , MD ; 
  • Sung Ryul Shim 5, 6 , MPH, PhD

1 Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea

2 College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea

3 Cranbrook Kingswood Upper School, Bloomfield Hills, MI, United States

4 Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States

5 Department of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea

6 Konyang Medical Data Research Group-KYMERA, Konyang University Hospital, Daejeon, Republic of Korea

*these authors contributed equally

Corresponding Author:

Sung Ryul Shim, MPH, PhD

Background: A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use. One of the best-known large language models is ChatGPT (OpenAI). It is believed to be of great help to medical research, as it facilitates more efficient data set analysis, code generation, and literature review, allowing researchers to focus on experimental design as well as drug discovery and development.

Objective: This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, to enhance their efficiency and accuracy in health care settings.

Methods: The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and the literature search formula of the study was applied to ChatGPT and Microsoft Bing AI as a comparison to human researchers. Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications. We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer.

Results: From ChatGPT, 7 (0.5%) out of 1287 identified studies were directly relevant, whereas Bing AI resulted in 19 (40%) relevant studies out of 48, compared to the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies.

Conclusions: This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate and feasible. Therefore, researchers should be cautious about using such AI. The limitations of this study using the generative pre-trained transformer model are that the search for research topics was not diverse and that it did not prevent the hallucination of generative AI. However, this study will serve as a standard for future studies by providing an index to verify the reliability and consistency of generative AI from a user’s point of view. If the reliability and consistency of AI literature search services are verified, then the use of these technologies will help medical research greatly.

Introduction

The global artificial intelligence (AI) health care market was estimated at US $15.1 billion in 2022 and is expected to surpass approximately US $187.95 billion by 2030, growing at an annualized rate of 37% over the forecast period from 2022 to 2030 [ 1 ]. In particular, innovative applications of medical AI are expected to increase in response to medical demand, which is projected to surge by 2030 [ 2 , 3 ].

A large language model (LLM) is a type of AI model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use [ 4 , 5 ]. One of the best-known LLMs is ChatGPT (OpenAI). It was launched in November 2022. Similar to other LLMs, ChatGPT is trained on huge text data sets in numerous languages, allowing it to respond to text input with humanlike responses [ 4 ]. Developed by the San Francisco–based AI research laboratory OpenAI, ChatGPT is based on a generative pre-trained transformer (GPT) architecture. It is considered an advanced form of a chatbot, an umbrella term for a program that uses a text-based interface to understand and generate responses. The key difference between a chatbot and ChatGPT is that a chatbot is usually programmed with a limited number of responses, whereas ChatGPT can produce personalized responses according to the conversation [ 4 , 6 ].

Sallam’s [ 5 ] systematic review (SR) sought to identify the benefits and current concerns regarding ChatGPT. That review advises that health care research could benefit from ChatGPT, since it could be used to facilitate more efficient data set analysis, code generation, and literature reviews, thus allowing researchers to concentrate on experimental design as well as drug discovery and development. The author also suggests that ChatGPT could be used to improve research equity and versatility in addition to its ability to improve scientific writing. Health care practice could also benefit from ChatGPT in multiple ways, including enabling improved health literacy and delivery of more personalized medical care, improved documentation, workflow streamlining, and cost savings. Health care education could also use ChatGPT to provide more personalized learning with a particular focus on problem-solving and critical thinking skills [ 5 ]. However, the same review also lays out the current concerns, including copyright issues, incorrect citations, and increased risk of plagiarism, as well as inaccurate content, risk of excessive information leading to an infodemic on a particular topic, and cybersecurity issues [ 5 ].

A key question regarding the use of ChatGPT is if it can use evidence to identify premedical content. Evidence-based medicine (EBM) provides the highest level of evidence in medical treatment by integrating clinician experience, patient value, and best-available scientific information to guide decision-making on clinical management [ 7 ]. The principle of EBM means that the most appropriate treatment plan for patients should be devised based on the latest empirical research evidence. However, the scientific information identified by ChatGPT is not yet validated in terms of safety or accuracy according to Sallam [ 5 ], who further suggests that neither doctors nor patients should rely on it at this stage. In contrast, another study by Zhou et al [ 8 ] found that answers provided by ChatGPT were generally based on the latest verified scientific evidence, that is, the advice given followed high-quality treatment protocols and adhered to guidelines from experts.

In medicine, a clinical decision support system (CDSS) uses real-time evidence to support clinical decision-making. This is a fundamental tool in EBM, which uses SRs based on a systematic, scientific search of a particular subject. If ChatGPT becomes a CDSS, it is fundamental to determine whether it is capable of performing a systematic search based on real-time generation of evidence in the medical field. Therefore, this study will be the first to determine whether ChatGPT can search papers for an SR. In particular, this study aims to present a standard for medical research using generative AI search technology in the future by providing indicators for the reliability and consistency of generative AI searches from a user’s perspective.

Ethical Considerations

As per 45 CFR §46.102(f), the activities performed herein were considered exempt from institutional review board approval due to the data being publicly available. Informed consent was not obtained, since this study used previously published deidentified information that was available to the general public. This study used publicly available data from PubMed, Embase, and Cochrane Library and did not include human participant research.

Setting the Benchmark

To determine whether ChatGPT, currently the most representative LLM, is capable of systematic searches, we set an SR that was performed by human experts as a benchmark and checked how many of the studies finally included in the benchmark were presented by ChatGPT. We chose Lee et al [ 9 ] as the benchmark for the following reasons. First, Lee et al [ 9 ] performed an SR and meta-analysis of medical treatment for Peyronie disease (PD) with human experts. PD typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction [ 10 ]. Second, it was easy to compare the results of ChatGPT and the benchmark, because we had full information about the interim process and results of the study. Third, a sufficient number of studies have been published about the medical treatment of PD, but there is still no consensus answer. We therefore expected to be able to assess the sole ability of ChatGPT as a systematic search tool with sufficient data while avoiding any possible pretrained bias. Lastly, with the topic of Lee et al [ 9 ], we could build questions that start broad and become more specific, and add conditions that could test ChatGPT’s comprehension of scientific research. For example, questions could be built broadly by asking about “medical treatment for Peyronie’s disease” or specifically by asking about “oral therapy for Peyronie’s disease” or “colchicine for Peyronie’s disease.” Because Lee et al [ 9 ] only contained randomized controlled trials (RCTs), we could add a condition to the questions restricting the study type to RCTs, which could be useful for assessing the comprehension of ChatGPT.

Systematic Search Formula of Benchmark

Lee et al [ 9 ] used the following search query in PubMed and Cochrane Library: (“penile induration”[MeSH Terms] OR “Peyronie’s disease”[Title/Abstract]) AND “male”[MeSH Terms] AND “randomized controlled trial”[Publication Type] , and the following query in Embase: (‘Peyronie disease’/exp OR ’Peyronie’s disease’:ab,ti) AND ’male’/exp AND ’randomized controlled trial’/de . After the systematic search, a total of 217 records were identified. Studies were excluded for the following reasons: not RCTs, not a perfect fit to the topic, insufficient sample size or outcome data, and not written in English. Finally, 24 RCTs were included in the SR, with only 1 RCT published in 2022 ( Figure 1 ) [ 9 ]. The characteristics of all studies included in Lee et al [ 9 ] are summarized in Section S1 in Multimedia Appendix 1 .
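For readers who wish to reproduce a benchmark search of this kind programmatically, the following sketch submits the paper's PubMed query through NCBI's E-utilities via Biopython. This is not part of the study's method; the email address is a placeholder, and the number of records returned will depend on when the search is run.

```python
# Sketch: run the benchmark PubMed query via NCBI E-utilities (Biopython).
from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder; NCBI requires a contact email

query = (
    '("penile induration"[MeSH Terms] OR "Peyronie\'s disease"[Title/Abstract]) '
    'AND "male"[MeSH Terms] AND "randomized controlled trial"[Publication Type]'
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=300)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} records found")
print(record["IdList"][:10])  # first ten PMIDs
```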


Methodology of Systematic Search for ChatGPT

Based on the search formula used in Lee et al [ 9 ], a simple mandatory prompt in the form of a question was created, starting with comprehensive questions and gradually asking more specific questions ( Textbox 1 ). For example, questions could be built as “Could you show RCTs of colchicine for Peyronie’s disease in PubMed?” with the treatment and database changed under the same format. In addition to mandatory questions, we added questions about treatment additionally provided by ChatGPT during the conversation. Considering the possibility that ChatGPT might respond differently depending on the interaction, we arranged questions into 2 logical flows, focusing on database and treatment, respectively ( Figure 2 and Figure S1 in Multimedia Appendix 1 ). We asked about search results from 4 databases: PubMed [ 11 ], Google (Google Scholar) [ 12 ], Cochrane Library [ 13 ], and ClinicalTrials.gov [ 14 ]. PubMed is a leading biomedical database offering access to peer-reviewed articles. Google Scholar provides a wide-ranging index of scholarly literature, including medical studies. Cochrane Library specializes in high-quality evidence through SRs and clinical trials. ClinicalTrials.gov, managed by the National Library of Medicine, serves as a comprehensive repository for clinical study information globally. These databases collectively serve researchers by providing access to diverse and credible sources, facilitating literature reviews and evidence synthesis, and informing EBM in the medical field. They play crucial roles in advancing medical knowledge, supporting informed decision-making, and ultimately improving patient care outcomes [ 11 - 14 ]. These 4 databases were easy to access and contained most of the accessible studies. Each question was repeated at least twice. We extracted the answers and evaluated the quality of information based on the title, author, journal, and publication year (Sections S2-S5 Multimedia Appendix 1 ).

Basic format of questions

  • “Could you show RCTs of (A) for Peyronie’s disease in (B)?”

(A) Treatment category and specific treatment

  • Vitamin E, colchicine, L-carnitine, potassium aminobenzoate, tamoxifen, pentoxifylline, tadalafil, L-arginine, and sildenafil
  • Verapamil, interferon-a2B, collagenase Clostridium histolyticum , transdermal electromotive administration, hyaluronidase, triamcinolone, mitomycin C, super-oxide dismutase, and 5-fluorouracil
  • Extracorporeal shockwave therapy, iontophoresis, traction therapy, vacuum, penile massage, and exercise shockwave therapy
  • 5-Alpha-reductase inhibitors, superficial heat, diclofenac gel, collagenase Clostridium histolyticum gel, verapamil gel, potassium aminobenzoate gel, and propionyl-L-carnitine gel

(B) Database

  • PubMed
  • Google (Google Scholar)
  • Cochrane Library
  • ClinicalTrials.gov
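The question grid implied by Textbox 1 can be enumerated mechanically. The sketch below is illustrative only; the treatment list is abridged from the textbox, and the wording follows the mandatory prompt format.

```python
# Sketch: enumerate the mandatory prompts from Textbox 1 (treatment x database).
from itertools import product

treatments = [
    "vitamin E", "colchicine", "tamoxifen", "pentoxifylline",               # oral (abridged)
    "verapamil", "interferon-a2B", "collagenase Clostridium histolyticum",  # intralesional (abridged)
    "extracorporeal shockwave therapy", "traction therapy",                 # mechanical (abridged)
]
databases = ["PubMed", "Google Scholar", "Cochrane Library", "ClinicalTrials.gov"]

prompts = [
    f"Could you show RCTs of {treatment} for Peyronie's disease in {database}?"
    for treatment, database in product(treatments, databases)
]

print(len(prompts), "prompts generated")
print(prompts[0])
```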


We used the GPT-3.5 version of ChatGPT, which was pretrained with data before 2021, for the systematic search and evaluated how many RCTs that were included in Lee et al [ 9 ] were present in the search results from ChatGPT. To assess the reliability of ChatGPT’s answers, we also evaluated whether the studies presented actually existed. ChatGPT’s response style and the amount of information might vary from answer to answer. Thus, we evaluated the accuracy of the responses by prioritizing a match by (1) title; (2) author, journal, and publication year; and (3) other items.

To obtain higher-quality responses, it is important to structure the prompts using refined language that is well understood by the LLM [ 15 - 17 ]. In this study, we performed the following fine-tuning to clearly convey the most important content or information. We first defined roles and provided context and input data before asking complete questions to get responses, and we used specific and varied examples to help the model narrow its focus and produce more accurate results [ 18 , 19 ]. During the prompt engineering, the treatment category, specific treatment, and target databases were structured in order, and the order was changed in the detailed elements to induce consistent answers. Details of this are presented in Multimedia Appendix 1 .
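The study interacted with ChatGPT through its chat interface; purely as an illustration of the same role-definition-plus-specific-question structure described above, a scripted version using the OpenAI Python client might look like the following (the model name, wording, and client usage are assumptions, not the study's actual setup).

```python
# Illustrative sketch only: role definition plus a specific question, expressed
# via the OpenAI Python client (openai>=1.0). Not the study's actual workflow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {
        "role": "system",
        "content": (
            "You are a researcher conducting a systematic search "
            "for a meta-analysis on Peyronie's disease."
        ),
    },
    {
        "role": "user",
        "content": "Could you show RCTs of colchicine for Peyronie's disease in PubMed?",
    },
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```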

Quality Assessment of Answers

To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications ( Table 1 ). We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer.

Notes to Table 1: a PICOS: population, intervention, comparison, outcome, and study design (research questions); c any mismatch in essential information or accessory information; d mismatch; e N/A: not assessed.

A grade of “A” was given to an answer that was appropriate for the question and perfectly consistent with the actual study. For example, for the question “Could you show all RCTs of medical treatment for Peyronie’s disease in Google?” ChatGPT answered “Gelbard MK, James K, Riach P, Dorey F. Collagenase versus placebo in the treatment of Peyronie’s disease: a double-blind study. J Urol. 1993; 149(1):56-58.” This answer was perfectly consistent with actual study because the authors, title, journal, publication year, issue number, and even page number matched, and it was appropriate because it was an RCT, was about medical treatment for PD, and could be found in Google.

A grade of “B” was given to an answer that was appropriate for the question and the title, author, journal, and publication year matched with the actual study. For example, for the question “Could you show RCTs of Intralesional therapy for Peyronie’s disease in Cochrane Library?” ChatGPT answered “Gelbard MK, James K, Riach P, Dorey F. Collagenase versus placebo in the treatment of Peyronie’s disease: a double-blind study. J Urol. 1993 Feb; 149(2):56-8. doi: 10.1016/s0022-5347(17)36148-1. PMID: 8426418.” This answer was appropriate because it was an RCT, was about intralesional therapy for PD, and could be found in Cochrane Library. However, this answer was consistent with the title, authors, journal, and publication year of actual study but inconsistent with the accessory information for publication month, issue number, page number, DOI, and PMID. So, it was able to specify the presented study with essential information, but the accessory information was not consistent with the actual study.

A grade of “C” was given to an answer that was inappropriate for the question but the title, author, journal, and publication year matched with the actual study. For example, for the question “Could you show RCTs of Pentoxifylline for Peyronie’s disease in PubMed?” ChatGPT answered “Gelbard MK, James K, Riach P, Dorey F. Collagenase versus placebo in the treatment of Peyronie’s disease: a double-blind study. J Urol. 1993; 149(1):56-8. doi: 10.1016/s0022-5347(17)36091-7. PMID: 8417228.” This answer was consistent with the title, authors, journal, and publication year of the actual study, but it was inappropriate because it was not about the use of pentoxifylline for PD.

A grade of “F” was given to an answer whose title matched an actual study but whose author, journal, or publication year did not, making it impossible to specify the study; such answers were considered fabricated. For example, for the question “Could you show RCTs of collagenase Clostridium histolyticum for Peyronie’s disease in PubMed?” ChatGPT answered “Gelbard MK, James K, Riach P, Dorey FJ, & Collagenase Study Group. (2012). Collagenase versus placebo in the treatment of Peyronie’s disease: a double-blind study. The Journal of urology, 187(3), 948-953.” This answer was consistent with the title of the actual study but inconsistent with the authors, publication year, and so on.
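The grading rules above amount to a small decision procedure. The sketch below encodes them directly; the field names are illustrative, and the `appropriate` flag (whether the cited study actually answers the question and is findable in the named database) was judged manually in the study.

```python
# Sketch of the A/B/C/F grading logic described above. "Essential" bibliographic
# fields beyond the title are authors, journal, and publication year;
# appropriateness to the question was assessed by hand in the study.
def grade_answer(answer: dict, actual: dict | None, appropriate: bool) -> str | None:
    """Return 'A', 'B', 'C', 'F', or None (fake title / no answer).

    `actual` is the real study whose title matches the answer (None if no such
    study exists); `appropriate` records whether the study answers the question.
    """
    if actual is None:
        return None                                   # fake title or no answer: no grade
    essential = ("authors", "journal", "year")        # title has already been matched
    if any(answer.get(f) != actual.get(f) for f in essential):
        return "F"                                    # study cannot be specified: treated as fabricated
    if not appropriate:
        return "C"                                    # real study, but not an answer to the question
    accessory = ("pages", "issue", "doi", "pmid")
    all_match = all(answer.get(f) == actual.get(f) for f in accessory if f in answer)
    return "A" if all_match else "B"
```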

Searching Strategy for Bing AI

To compare with ChatGPT, we performed the same process with Bing AI [ 20 ], also known as “New Bing,” an AI chatbot developed by Microsoft and released in 2023. Since Bing AI functions based on the huge AI model “Prometheus” that includes OpenAI’s GPT-4 with web searching capabilities, it is expected to give more accurate answers than the GPT-3.5 version of ChatGPT. We performed the conversation with the “Precise” tone. Because Bing AI limited the number of questions per session to 20, we did not arrange questions into 2 logical flows (Section S6 in Multimedia Appendix 1 ). We compared the number of studies included in the benchmark [ 9 ] and provided by Bing AI. We also evaluated the reliability of answers with the same method described above or using links of websites presented by Bing AI (Figure S2 and Section S7 in Multimedia Appendix 1 ).

Systematic Search Results via ChatGPT

A total of 639 questions were entered into ChatGPT, and 1287 studies were obtained ( Table 2 ). The systematic search via ChatGPT was performed from April 17 to May 6, 2023. At the beginning of the conversation, we gave ChatGPT the role of a researcher conducting a systematic search who intended to perform a meta-analysis, in order to obtain more appropriate answers. At first, we tried to build the question format using the word “find,” such as “Could you find RCTs of medical treatment for Peyronie’s disease?” However, ChatGPT did not present studies and only suggested how to find RCTs in a database such as PubMed. We therefore changed the word “find” to “show,” and ChatGPT presented lists of RCTs. For comprehensive questions, ChatGPT did not give an answer, saying that as an AI language model it did not have the capability to show a list of RCTs. However, when the questions were made gradually more specific, it generated answers (Sections S2 and S4 in Multimedia Appendix 1 ).

Notes to Table 2: a AI: artificial intelligence; b from Lee et al [ 9 ].

Of the 1287 studies provided by ChatGPT, only 7 (0.5%) studies were perfectly eligible and 18 (1.4%) studies could be considered suitable under the assumption that they were real studies if only the title, author, journal, and publication year matched ( Table 2 ). Among these, only 1 study was perfectly consistent with studies finally included in Lee et al [ 9 ], and 4 studies were matched under the assumption (Sections S1, S3, and S5 in Multimedia Appendix 1 ).

Specifically, the systematic search via ChatGPT was performed with 2 logical flow schemes, database setting and treatment setting ( Figure 2 and Figure S1 in Multimedia Appendix 1 ). With the logical flow by database setting, among the 725 obtained studies, 6 (0.8%) and 8 (1.1%) studies were classified as grade A and grade B, respectively ( Table 1 ). Of these, 1 grade A study and 1 grade B study were included in Lee et al [ 9 ]. With the logical flow by treatment setting, among the 562 obtained studies, 1 (0.2%) study was classified as grade A and 10 (1.8%) studies were classified as grade B. Of these, 3 grade B studies were included in the benchmark [ 9 ] ( Table 2 ).

It was common for answers to change, and there were many cases where answers contradicted themselves. In addition, there were cases where the answer was “no capability” or “no RCT found” at first, but when another question was asked and the previous question was then repeated, an answer was given. ChatGPT showed a tendency to create articles by recombining certain formats and words. The titles presented were so plausible that it was almost impossible to identify fake articles until an actual search was conducted. The authors presented were also real people. Titles often contained highly specific numbers, devices, or brand names that were real. In some cases, it was possible to infer which articles ChatGPT had mimicked in the fake answers (Sections S3 and S5 in Multimedia Appendix 1 ). Considering these characteristics, when generating sentences, ChatGPT seemed to list words with a high probability of appearing among its pretrained data rather than presenting accurate facts or understanding the questions.

In conclusion, of the 1287 studies presented by ChatGPT, only 1 (0.08%) RCT matched the 24 RCTs of the benchmark [ 9 ].

Systematic Search Results via Bing AI

For Bing AI, a total of 223 questions were asked and 48 studies were presented. Among the 48 obtained studies, 19 (40%) studies were classified as grade A. There were no grade B studies ( Table 2 ). Because Bing AI always gave references with links to the websites, all studies presented by Bing AI existed. However, it also provided wrong answers about the study type, especially as it listed reviews as RCTs. Of the 28 studies with grade C, 27 (96%) were not RCTs and 1 (4%) was about a different treatment. Only 1 study had no grade because of a fake title; it presented a study registered in PubMed while pretending that it was the result of a search in ClinicalTrials.gov. However, the study was not in ClinicalTrials.gov (Section S7 in Multimedia Appendix 1 ).

Bing AI gave more accurate answers than ChatGPT, since it provides actual website references. However, it also showed a tendency to give more answers to more specific questions, similar to ChatGPT. For example, with a comprehensive question, Bing AI said “I am not able to access or search specific databases.” However, with more specific questions, it either found studies or answered “I couldn’t find any RCTs” without mentioning accessibility. In most cases, Bing AI either failed to find studies or listed too few studies to be used as a systematic searching tool.

In conclusion, of the 48 studies presented by Bing AI, 2 (4%) RCTs matched the 24 RCTs of the benchmark [ 9 ].
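A quick arithmetic check of the headline figures reported above for both tools (counts taken from the text):

```python
# Verifying the reported proportions from the results above.
chatgpt_total, chatgpt_grade_a, chatgpt_benchmark_hits = 1287, 7, 1
bing_total, bing_grade_a, bing_benchmark_hits = 48, 19, 2

print(f"ChatGPT grade A rate: {chatgpt_grade_a / chatgpt_total:.1%}")                # ~0.5%
print(f"ChatGPT benchmark hit rate: {chatgpt_benchmark_hits / chatgpt_total:.2%}")   # ~0.08%
print(f"Bing AI grade A rate: {bing_grade_a / bing_total:.0%}")                      # ~40%
print(f"Bing AI benchmark hit rate: {bing_benchmark_hits / bing_total:.0%}")         # ~4%
```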

Principal Findings

This paper’s researchers sought to determine whether ChatGPT could conduct a real-time systematic search for EBM. For the first time, researchers compared the performance of ChatGPT with classic systematic searching as well as the Microsoft Bing AI search engine. Although Zhou et al [ 8 ] suggested that ChatGPT answered qualitative questions based on recent evidence, this study found that ChatGPT’s results were not based on a systematic search (which is the basis for an SR), meaning that they could not be used for real-time CDSS in their current state.

With recent controversy regarding the risks and benefits of advanced AI technologies [ 21 - 24 ], ChatGPT has received mixed responses from the scientific community and academia. Although many scholars agree that ChatGPT can increase the efficiency and accuracy of the output in writing and conversational tasks [ 25 ], others suggest that the data sets used in ChatGPT’s training might lead to possible bias, which not only limits its capabilities but also leads to the phenomenon of hallucination—apparently scientifically plausible yet factually inaccurate information [ 24 ]. Caution around the use of LLMs should also bear in mind security concerns, including the potential of cyberattacks that deliberately spread misinformation [ 25 ].

When the plug-in method was applied in this study, particularly with PubMed Research [ 26 ], the process worked smoothly and there was not a single case of hallucinated research, because each answer was provided with a link, regardless of whether a specific database engine was designated. Among the responses, 21 of the 24 RCTs included in the final SR were provided; only 3 were missed. This is a very encouraging result. However, no plug-in yet allows access to other databases, and when the conversation is long, the response speed is very slow. Furthermore, although it is a paid service, it returns a maximum of 100 papers, so if more than 100 RCTs are retrieved, the user must still search all papers manually. Ultimately, it is not yet suited to an efficient systematic search, as additional time and effort are required; if a more efficient plug-in is developed, it could play a promising part in systematic searches.

Although Sallam’s [ 5 ] SR suggests that academic and scientific writing as well as health care practice, research, and education could benefit from the use of ChatGPT, this study found that ChatGPT could not search scientific articles properly: the probability of the desired paper being presented was 0.08% (1/1287). Bing AI, which uses GPT-4, searched scientific articles with much higher accuracy than ChatGPT, but the probability was still only 4% (2/48), insufficient for systematic searching. Moreover, the fake answers generated by ChatGPT, known as hallucinations, forced researchers to spend extra time and effort checking the accuracy of each answer. Hallucination is a typical problem of generative AI and is difficult to remove completely because of how generative models work. Therefore, if it cannot be prevented during pretraining of the model, reliability and consistency in the medical use of generative AI must be improved by checking accuracy from the user’s point of view, as was done in this study. Unlike ChatGPT, Bing AI did not generate fake studies, but the total number of studies it presented was too small. Very few studies have focused on the scientific searching accuracy of ChatGPT: although this paper found many articles about the use of ChatGPT in the medical field, the majority concerned the role of ChatGPT as an author. Although that role might accelerate writing, it also raises the previously mentioned issues of transparency and plagiarism.
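
For readers who want to reproduce this kind of accuracy check, the following is a minimal sketch of how the match rate between AI-presented studies and the benchmark RCTs could be computed. It is an illustration only: the authors verified studies manually, and the title-normalization rule and example figures below are assumptions, not the study’s actual procedure.

```python
# Minimal sketch (not the authors' code): computing the match rate between
# studies presented by an AI tool and the benchmark RCTs from the reference SR.
# The normalized exact-title matching rule is an illustrative assumption.

def normalize(title: str) -> str:
    """Lowercase and strip punctuation so small formatting differences do not block a match."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace()).strip()

def match_rate(presented_titles: list[str], benchmark_titles: list[str]) -> float:
    """Fraction of presented studies that match a benchmark RCT."""
    benchmark = {normalize(t) for t in benchmark_titles}
    hits = sum(1 for t in presented_titles if normalize(t) in benchmark)
    return hits / len(presented_titles) if presented_titles else 0.0

# The reported figures correspond to the following proportions:
print(f"{1 / 1287:.2%}")  # ChatGPT: 1 match out of 1287 answers -> 0.08%
print(f"{2 / 48:.0%}")    # Bing AI: 2 matches out of 48 answers -> 4%
```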

Wang et al [ 27 ] have recently investigated whether ChatGPT could be used to generate effective Boolean queries for an SR literature search. The authors suggest that ChatGPT should be considered a “valuable tool” for researchers conducting SRs, especially for time-constrained rapid reviews where trading higher precision for lower recall is generally acceptable, citing its ability to follow complex instructions and generate high-precision queries. Nonetheless, building a Boolean query is not itself a complex process; selecting the most appropriate articles for an SR is the critical step, and that would be a more useful subject to examine in relation to ChatGPT. Moreover, although Aydın and Karaarslan [ 28 ] have indicated that ChatGPT shows promise in generating a literature review, the iThenticate plagiarism tool found significant matches in paraphrased elements.
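
To make concrete what such a Boolean search involves in practice, the sketch below submits a hand-written Boolean query to PubMed through the public NCBI E-utilities esearch endpoint. This is not the search strategy used in this study: the query terms are illustrative, keyed to the Peyronie disease topic of the benchmark review [ 9 ], and the field tags [tiab] and [pt] are standard PubMed filters.

```python
# Minimal sketch: running an illustrative Boolean query against PubMed via the
# NCBI E-utilities esearch endpoint. The query string is an example only, not
# the authors' actual strategy.
import json
import urllib.parse
import urllib.request

query = '(peyronie*[tiab] OR "penile induration"[tiab]) AND randomized controlled trial[pt]'
params = urllib.parse.urlencode({
    "db": "pubmed",
    "term": query,
    "retmax": 100,       # maximum number of PMIDs to return
    "retmode": "json",
})
url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"

with urllib.request.urlopen(url) as resp:
    result = json.load(resp)["esearchresult"]

print(result["count"], "records; first PMIDs:", result["idlist"][:5])
```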

In scientific research, one of the most time-consuming and challenging tasks is filtering out unnecessary papers while identifying those that are needed. This difficult yet critical task can be daunting enough to discourage researchers from undertaking scientific research at all. If AI could take over this process, collecting and analyzing data from the selected papers would become easier. Commercial literature search services using generative AI models have recently emerged; representative examples include Covidence [ 29 ], Consensus [ 30 ], and Elicit [ 31 ]. The technical details of these services are not public, but they are based on GPT-type LLMs. As such, they neither provide a way to verify hallucinations nor disclose full information about the databases they search. Even when mistakes are possible, the researcher should aim for completeness, and unverified methods should be avoided. Accordingly, this study did not use a commercial literature search service but instead searched the target databases manually, one by one. If the reliability and consistency of AI literature search services are verified, these technologies will greatly benefit medical research.

This study suggests that ChatGPT still has limitations in academic search, despite the recent assertion from Zhou et al [ 8 ] about its potential in searching for academic evidence. Moreover, although ChatGPT can search and identify guidance in open-access guidelines, its results are brief and fragmentary, often with just 1 or 2 sentences that lack relevant details about the guidelines.

Arguably, more concern should be placed on the potential use of ChatGPT in a CDSS than on its role in education or drafting papers. On the one hand, if AI such as ChatGPT is used within a patient-physician relationship, this is unlikely to affect liability, since the advice is filtered through professional judgment and inaccurate advice generated by AI is no different from erroneous or harmful information disseminated by a professional; however, ChatGPT lacks sufficient accuracy and speed to be used in this manner. On the other hand, ChatGPT could also be used to give direct-to-consumer advice, which is largely unregulated, since asking AI directly for medical advice or emotional support falls outside the established patient-physician relationship [ 32 ]. Because patients risk being exposed to inaccurate information in this setting, the medical establishment should educate patients and guardians about that risk.

Academic interest in ChatGPT to date has mainly focused on potential benefits including research efficiency and education, drawbacks related to ethical issues such as plagiarism and the risk of bias, as well as security issues including data privacy. However, in terms of providing medical information and acting as a CDSS, the use of ChatGPT is currently less certain because its academic search capability is potentially inaccurate, which is a fundamental issue that must be addressed.

A limitation of this study is that it did not cover a variety of research topics: only 1 topic was searched when collecting the target literature. In addition, because of the time elapsed between the start of the study and the review and evaluation period, the latest technology could not be fully applied; in a field that advances as rapidly as generative AI, any given technology can quickly become outdated. For example, new AI models such as ChatGPT Turbo (4.0) were released between the start of this study and the current revision, bringing significant technological advances.

This paper thus suggests that the use of AI as a tool for generating real-time evidence for a CDSS is a dream that has not yet become a reality. The starting point of evidence generation is a systematic search, and ChatGPT is unsuccessful even at this initial step. Furthermore, its potential use in providing advice directly to patients in a direct-to-consumer form is concerning, since ChatGPT could provide inaccurate medical information that is not evidence based and can result in harm. For the proper use of generative AI in medical care in the future, a feedback model is needed that evaluates accuracy from an expert perspective, as was done in this study, and feeds that evaluation back into the LLM.

This is the first study to compare AI and conventional human SR methods as a real-time literature collection tool for EBM. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate or feasible, and researchers should therefore be cautious about using such AI. The limitations of this study are that the research topics searched were not diverse and that the hallucinations of generative AI were not prevented. Nevertheless, this study will serve as a standard for future work by providing an index to verify the reliability and consistency of generative AI from a user’s point of view. If the reliability and consistency of AI literature search services are verified, the use of these technologies will help medical research greatly.

Acknowledgments

This work was supported by the Soonchunhyang University Research Fund. This body had no involvement in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

Authors' Contributions

SRS had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. YNG, HSC, EJJ, JC, SL, and SRS contributed to the analysis and interpretation of data. YNG, HSC, SRS, and JHK contributed to the drafting of the manuscript. SRS and JHK contributed to critical revision of the manuscript for important intellectual content. YNG and SRS contributed to statistical analysis.

Conflicts of Interest

None declared.

Multimedia Appendix 1: Additional logical flow diagrams, characteristics of studies included in Lee et al [ 9 ], ChatGPT and Microsoft Bing transcripts, and grade classification for answers.

  • Artificial intelligence (AI) in healthcare market (by component: software, hardware, services; by application: virtual assistants, diagnosis, robot assisted surgery, clinical trials, wearable, others; by technology: machine learning, natural language processing, context-aware computing, computer vision; by end user) - global industry analysis, size, share, growth, trends, regional outlook, and forecast 2022-2030. Precedence Research. Feb 2023. URL: https://www.precedenceresearch.com/artificial-intelligence-in-healthcare-market [Accessed 2024-03-31]
  • Bajwa J, Munir U, Nori A, Williams B. Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthc J. Jul 2021;8(2):e188-e194. [ CrossRef ] [ Medline ]
  • Zahlan A, Ranjan RP, Hayes D. Artificial intelligence innovation in healthcare: literature review, exploratory analysis, and future research. Technol Soc. Aug 2023;74:102321. [ CrossRef ]
  • Models. OpenAI. URL: https://platform.openai.com/docs/models/gpt-3-5 [Accessed 2023-06-14]
  • Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). Mar 19, 2023;11(6):887. [ CrossRef ] [ Medline ]
  • Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. arXiv. Preprint posted online on Jul 22, 2020. [ CrossRef ]
  • Evidence-Based Medicine Working Group. Evidence-based medicine. a new approach to teaching the practice of medicine. JAMA. Nov 4, 1992;268(17):2420-2425. [ CrossRef ] [ Medline ]
  • Zhou Z, Wang X, Li X, Liao L. Is ChatGPT an evidence-based doctor? Eur Urol. Sep 2023;84(3):355-356. [ CrossRef ] [ Medline ]
  • Lee HY, Pyun JH, Shim SR, Kim JH. Medical treatment for Peyronie’s disease: systematic review and network Bayesian meta-analysis. World J Mens Health. Jan 2024;42(1):133. [ CrossRef ]
  • Chung E, Ralph D, Kagioglu A, et al. Evidence-based management guidelines on Peyronie's disease. J Sex Med. Jun 2016;13(6):905-923. [ CrossRef ] [ Medline ]
  • PubMed. URL: https://pubmed.ncbi.nlm.nih.gov/about/ [Accessed 2023-06-14]
  • Google Scholar. URL: https://scholar.google.com/ [Accessed 2023-06-14]
  • Cochrane Library. URL: https://www.cochranelibrary.com/ [Accessed 2023-06-14]
  • ClinicalTrials.gov. URL: https://classic.clinicaltrials.gov/ [Accessed 2023-06-14]
  • Nori H, Lee YT, Zhang S, et al. Can generalist foundation models outcompete special-purpose tuning? case study in medicine. arXiv. Preprint posted online on Nov 28, 2023. [ CrossRef ]
  • Ziegler A, Berryman J. A developer’s guide to prompt engineering and LLMs. GitHub Blog. Jul 17, 2023. URL: https://github.blog/2023-07-17-prompt-engineering-guide-generative-ai-llms/ [Accessed 2023-07-17]
  • Introducing ChatGPT. OpenAI. Nov 30, 2022. URL: https://openai.com/blog/chatgpt [Accessed 2023-10-16]
  • Reid R. How to write an effective GPT-3 or GPT-4 prompt. Zapier. Aug 3, 2023. URL: https://zapier.com/blog/gpt-prompt/ [Accessed 2023-10-14]
  • Prompt engineering for generative AI. Google. Aug 8, 2023. URL: https://developers.google.com/machine-learning/resources/prompt-eng?hl=en [Accessed 2024-04-23]
  • Bing. URL: https://www.bing.com/ [Accessed 2024-04-30]
  • de Angelis L, Baglivo F, Arzilli G, et al. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Front Public Health. Apr 25, 2023;11:1166120. [ CrossRef ] [ Medline ]
  • Howard J. Artificial intelligence: implications for the future of work. Am J Ind Med. Nov 2019;62(11):917-926. [ CrossRef ] [ Medline ]
  • Tai MCT. The impact of artificial intelligence on human society and bioethics. Tzu Chi Med J. Aug 14, 2020;32(4):339-343. [ CrossRef ] [ Medline ]
  • Wogu IAP, Olu-Owolabi FE, Assibong PA, et al. Artificial intelligence, alienation and ontological problems of other minds: a critical investigation into the future of man and machines. Presented at: 2017 International Conference on Computing Networking and Informatics (ICCNI); Oct 29 to 31, 2017; 1-10; Lagos, Nigeria. [ CrossRef ]
  • Deng J, Lin Y. The benefits and challenges of ChatGPT: an overview. Frontiers in Computing and Intelligent Systems. Jan 5, 2023;2(2):81-83. [ CrossRef ]
  • PubMed Research. whatplugin.ai. URL: https://www.whatplugin.ai/plugins/pubmed-research [Accessed 2024-04-30]
  • Wang S, Scells H, Koopman B, Zuccon G. Can ChatGPT write a good Boolean query for systematic review literature search? arXiv. Preprint posted online on Feb 9, 2023. [ CrossRef ]
  • Aydın Ö, Karaarslan E. OpenAI ChatGPT generated literature review: digital twin in healthcare. In: Aydın Ö, editor. Emerging Computer Technologies 2. İzmir Akademi Dernegi; 2022;22-31. [ CrossRef ]
  • Covidence. URL: https://www.covidence.org/ [Accessed 2024-04-24]
  • Consensus. URL: https://consensus.app/ [Accessed 2024-04-24]
  • Elicit. URL: https://elicit.com/ [Accessed 2024-04-24]
  • Haupt CE, Marks M. AI-generated medical advice-GPT and beyond. JAMA. Apr 25, 2023;329(16):1349-1350. [ CrossRef ] [ Medline ]

Abbreviations

AI: artificial intelligence
CDSS: clinical decision support system
EBM: evidence-based medicine
LLM: large language model
RCT: randomized controlled trial
SR: systematic review

Edited by Alexandre Castonguay; submitted 24.07.23; peer-reviewed by In Gab Jeong Jeong, Jinwon Noh, Lingxuan Zhu, Sachin Pandey, Taeho Greg Rhee; final revised version received 31.03.24; accepted 04.04.24; published 14.05.24.

© Yong Nam Gwon, Jae Heon Kim, Hyun Soo Chung, Eun Jee Jung, Joey Chun, Serin Lee, Sung Ryul Shim. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 14.5.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/ , as well as this copyright and license information must be included.


The Levels of Evidence and their role in Evidence-Based Medicine

Patricia B. Burns

1 Research Associate, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System

Rod J. Rohrich

2 Professor of Surgery, Department of Plastic Surgery, University of Texas Southwestern Medical Center

Kevin C. Chung

3 Professor of Surgery, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System

As the name suggests, evidence-based medicine (EBM) is about finding evidence and using that evidence to make clinical decisions. A cornerstone of EBM is the hierarchical system of classifying evidence, known as the levels of evidence. Physicians are encouraged to find the highest level of evidence to answer clinical questions. Several papers published in plastic surgery journals concerning EBM topics have touched on this subject. 1 – 6 Specifically, previous papers have discussed the lack of higher level evidence in PRS and the need to improve the evidence published in the journal. Before that can be accomplished, it is important to understand the history behind the levels and how they should be interpreted. This paper will focus on the origin of the levels of evidence, their relevance to the EBM movement, and the implications for the field of plastic surgery as well as its everyday practice.

History of Levels of Evidence

The levels of evidence were originally described in a report by the Canadian Task Force on the Periodic Health Examination in 1979. 7 The report’s purpose was to develop recommendations on the periodic health exam and base those recommendations on evidence in the medical literature. The authors developed a system of rating evidence ( Table 1 ) when determining the effectiveness of a particular intervention. The evidence was taken into account when grading recommendations. For example, a Grade A recommendation was given if there was good evidence to support a recommendation that a condition be included in the periodic health exam. The levels of evidence were further described and expanded by Sackett 8 in an article on levels of evidence for antithrombotic agents in 1989 ( Table 2 ). Both systems place randomized controlled trials (RCT) at the highest level and case series or expert opinions at the lowest level. The hierarchies rank studies according to the probability of bias. RCTs are given the highest level because they are designed to be unbiased and have less risk of systematic errors. For example, by randomly allocating subjects to two or more treatment groups, these types of studies also randomize confounding factors that may bias results. A case series or expert opinion is often biased by the author’s experience or opinions and there is no control of confounding factors.

Canadian Task Force on the Periodic Health Examination’s Levels of Evidence *

Levels of Evidence from Sackett *

Modification of levels

Since the introduction of levels of evidence, several other organizations and journals have adopted variations of the classification system. Diverse specialties ask different questions, and it was recognized that the type and level of evidence needed to be modified accordingly. Research questions are divided into the categories of treatment, prognosis, diagnosis, and economic/decision analysis. For example, Table 3 shows the levels of evidence developed by the American Society of Plastic Surgeons (ASPS) for prognosis 9 and Table 4 shows the levels developed by the Centre for Evidence Based Medicine (CEBM) for treatment. 10 The two tables highlight the types of studies that are appropriate for the question (prognosis versus treatment) and how the quality of the data is taken into account when assigning a level. For example, RCTs are not appropriate when looking at the prognosis of a disease; the question in this instance is, “What will happen if we do nothing at all?” Because a prognosis question does not involve comparing treatments, the highest evidence would come from a cohort study or a systematic review of cohort studies. The levels of evidence also take the quality of the data into account; for example, in the chart from CEBM, poorly designed RCTs have the same level of evidence as a cohort study.

Levels of Evidence for Prognostic Studies *

Levels of Evidence for Therapeutic Studies *

A grading system that provides strength of recommendations based on evidence has also changed over time. Table 5 shows the Grade Practice Recommendations developed by ASPS. The grading system provides an important component in evidence-based medicine and assists in clinical decision making. For example, a strong recommendation is given when there is level I evidence and consistent evidence from Level II, III and IV studies available. The grading system does not degrade lower level evidence when deciding recommendations if the results are consistent.

Grade Practice Recommendations *

Interpretation of levels

Many journals assign a level to the papers they publish, and authors often assign a level when submitting an abstract to conference proceedings. This allows the reader to know the level of evidence of the research, but the designated level of evidence does not always guarantee the quality of the research. It is important that readers not assume that level 1 evidence is always the best choice or appropriate for the research question. This concept will be very important for all of us to understand as we evolve into the field of EBM in plastic surgery. By design, our surgical specialty will always have important articles with a lower level of evidence, because innovation and technique articles are needed to move the specialty forward.

Although RCTs are often assigned the highest level of evidence, not all RCTs are conducted properly, and their results should be carefully scrutinized. Sackett 8 stressed the importance of estimating types of errors and the power of studies when interpreting results from RCTs. For example, a poorly conducted RCT may report a negative result because of low power when in fact a real difference exists between treatment groups. Scales such as the Jadad scale have been developed to judge the quality of RCTs. 11 Although physicians may not have the time or inclination to use a scale to assess quality, some basic items should be taken into account when assessing an RCT: randomization, blinding, a description of the randomization and blinding process, a description of the number of subjects who withdrew or dropped out of the study, the confidence intervals around study estimates, and a description of the power analysis. For example, Bhandari et al. 12 assessed the quality of surgical RCTs reported in the Journal of Bone and Joint Surgery (JBJS) from 1988–2000. They identified 72 RCTs during this period; the mean quality score was 68%, papers scoring > 75% were deemed high quality, and 60% of the papers scored < 75%. The main reasons for low quality scores were lack of appropriate randomization, lack of blinding, and missing descriptions of patient exclusion criteria. Another paper found that JBJS papers with a level 1 rating had the same quality scores as those rated level 2. 13 Therefore, one should not assume that level 1 studies have higher quality than level 2 studies.
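
As an illustration of how such checklist items translate into a numeric quality score, the following minimal sketch implements a simplified Jadad-style score. It follows the scale’s three domains and 0–5 range but omits the deduction items for inappropriate methods; the boolean inputs and the example trial are hypothetical, not drawn from the studies cited above.

```python
# Minimal sketch of a simplified Jadad-style quality score for an RCT report.
# The three domains and 0-5 range follow the published scale; the simplified
# boolean inputs and the example trial below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TrialReport:
    randomized: bool                  # trial described as randomized
    randomization_appropriate: bool   # randomization method described and adequate
    double_blind: bool                # trial described as double blind
    blinding_appropriate: bool        # blinding method described and adequate
    withdrawals_described: bool       # withdrawals/dropouts reported per group

def jadad_score(t: TrialReport) -> int:
    score = 0
    score += 1 if t.randomized else 0
    score += 1 if t.randomized and t.randomization_appropriate else 0
    score += 1 if t.double_blind else 0
    score += 1 if t.double_blind and t.blinding_appropriate else 0
    score += 1 if t.withdrawals_described else 0
    return score  # 0 (lowest quality) to 5 (highest)

# Hypothetical trial: randomized and blinded, but methods not described and no dropout reporting.
print(jadad_score(TrialReport(True, False, True, False, False)))  # -> 2
```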

A resource for surgeons when appraising levels of evidence are the users’ guides published in the Canadian Journal of Surgery 14 , 15 and the Journal of Bone and Joint Surgery. 16 Similar papers that are not specific to surgery have been published in the Journal of the American Medical Association (JAMA). 17 , 18

Plastic surgery and EBM

The field of plastic surgery has been slow to adopt evidence-based medicine. This was demonstrated in a paper examining the level of evidence of papers published in PRS over a 20-year period. 19 The majority of studies (93% in 1983) were level 4 or 5, which denotes case series and case reports. Although the results are disappointing, there was some improvement over time: by 2003 there were more level 1 studies (1.5%) and fewer level 4 and 5 studies (87%). A recent analysis looked at the number of level 1 studies, defined as RCTs and meta-analyses, in 5 different plastic surgery journals from 1978–2009. The number of level 1 studies increased from 1 in 1978 to 32 by 2009. 20 These results show that the field of plastic surgery is improving its level of evidence but still has a way to go, especially in improving the quality of published studies. For example, approximately a third of these studies involved double blinding, but the majority did not randomize subjects, describe the randomization process, or perform a power analysis. Power analysis is another area of concern in plastic surgery: a review of the plastic surgery literature found that the majority of published studies have inadequate power to detect moderate to large differences between treatment groups. 21 No matter the level of evidence, if a study is underpowered, the interpretation of its results is questionable.
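
To illustrate what an a priori power calculation looks like, the following minimal sketch (assuming the Python statsmodels package) computes the per-group sample size needed to detect a moderate standardized difference between two arms. The effect size, alpha, and power targets are conventional illustrative choices, not figures from the cited reviews.

```python
# Minimal sketch of a prospective power calculation for a two-arm comparison,
# assuming statsmodels is installed; the parameter values are illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # "moderate" standardized difference (Cohen's d)
    alpha=0.05,        # two-sided significance level
    power=0.80,        # desired probability of detecting the effect
)
print(round(n_per_group))  # roughly 64 participants per group
```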

Although the goal is to improve the overall level of evidence in plastic surgery, this does not mean that all lower level evidence should be discarded. Case series and case reports are important for hypothesis generation and can lead to more controlled studies. Additionally, in the face of overwhelming evidence to support a treatment, such as the use of antibiotics for wound infections, there is no need for an RCT.

Clinical examples using levels of evidence

In order to understand how the levels of evidence work and aid the reader in interpreting levels, we provide some examples from the plastic surgery literature. The examples also show the peril of medical decisions based on results from case reports.

An association was hypothesized between lymphoma and silicone breast implants based on case reports. 22 – 27 The level of evidence for case reports, depending on the scale used, is 4 or 5. These case reports were used to generate the hypothesis that a possible association existed. Because of these results, several large retrospective cohort studies from the United States, Canada, Denmark, Sweden and Finland were conducted. 28 – 32 The level of evidence for a retrospective cohort is 2. All of these studies had many years of follow-up for a large number of patients. Some of the studies found an elevated risk for lymphoma and others found no risk, but none of the results reached statistical significance. Therefore, the higher level evidence from cohort studies did not demonstrate any risk of lymphoma. Finally, a systematic review was performed that combined the evidence from the retrospective cohorts. 27 The overall standardized incidence ratio was 0.89 (95% CI 0.67–1.18). Because the confidence interval includes 1, the results indicate no increased incidence. The level of evidence for the systematic review is 1. Based on the best available evidence, there is no association between lymphoma and silicone implants. This example shows how low level evidence was used to generate a hypothesis, which then led to higher level evidence that disproved the hypothesis. It also demonstrates that RCTs are not feasible for rare events such as cancer and underscores the importance of observational studies for this type of question. A case-control study is a better option and provides higher-level evidence for assessing the prognosis of the long-term effects of silicone breast implants.
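
The reasoning in this example hinges on the standardized incidence ratio (SIR) and whether its confidence interval includes 1. The sketch below computes an SIR and an approximate 95% CI from observed and expected case counts using a common log-scale approximation; the counts are hypothetical, chosen only to mirror the magnitude of the pooled estimate quoted above, and are not data from the cited cohort studies.

```python
# Minimal sketch of the SIR logic behind the lymphoma example:
# SIR = observed / expected cases, with an approximate 95% CI on the log scale.
# The counts below are hypothetical.
import math

def sir_with_ci(observed: int, expected: float, z: float = 1.96) -> tuple[float, float, float]:
    sir = observed / expected
    se_log = 1.0 / math.sqrt(observed)          # SE of log(SIR) under a Poisson model
    lo = math.exp(math.log(sir) - z * se_log)
    hi = math.exp(math.log(sir) + z * se_log)
    return sir, lo, hi

sir, lo, hi = sir_with_ci(observed=40, expected=45.0)   # hypothetical counts
print(f"SIR {sir:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# If the interval spans 1.0, the data are compatible with no increase in incidence.
```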

Another example is the injection of epinephrine into fingers. Based on case reports published before 1950, physicians were advised that epinephrine injection could result in finger ischemia. 33 This is an example in which level 4 or 5 evidence was accepted as fact and incorporated into medical textbooks and teaching. However, not all physicians accepted this evidence, and some continued to inject epinephrine into the fingers with no adverse effects on the hand. Obviously, it was time for higher level evidence to resolve this issue. An in-depth review of the literature from 1880 to 2000 by Denkler 33 identified 48 cases of digital infarction, of which 21 involved epinephrine injection. Further analysis found that the addition of procaine to the epinephrine injection was the cause of the ischemia. 34 The procaine used in these injections included toxic acidic batches that were recalled in 1948. In addition, several cohort studies found no complications from the use of epinephrine in the fingers and hand. 35 , 36 , 37 The results from these cohort studies increased the level of evidence. Based on the best available evidence from these studies, the hypothesis that epinephrine injection will harm fingers was rejected. This example highlights the biases inherent in case reports. It also shows the risk when spurious evidence is handed down and integrated into medical teaching.

Obtaining the best evidence

We have established the need for RCTs to improve evidence in plastic surgery but have also acknowledged the difficulties, particularly with randomization and blinding. Although RCTs may not be appropriate for many surgical questions, well designed and conducted cohort or case-control studies could boost the level of evidence. Many of the current studies tend to be descriptive and lack a control group. The way forward seems clear: plastic surgery researchers need to consider using a cohort or case-control design whenever an RCT is not possible. If designed properly, the level of evidence for observational studies can approach or surpass that of an RCT, and in some instances observational studies and RCTs have found similar results. 38 If enough cohort or case-control studies become available, this increases the prospect of systematic reviews of these studies, which would raise overall evidence levels in plastic surgery.

The levels of evidence are an important component of EBM. Understanding the levels and why they are assigned to publications and abstracts helps the reader to prioritize information. This is not to say that all level 4 evidence should be ignored and all level 1 evidence accepted as fact. The levels of evidence provide a guide and the reader needs to be cautious when interpreting these results.

Acknowledgments

Supported in part by a Midcareer Investigator Award in Patient-Oriented Research (K24 AR053120) from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (to Dr. Kevin C. Chung).

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REVIEW article

This article is part of the research topic Pathogen-Induced Immunosenescence: Where do Vaccines Stand?

TB and HIV Induced Immunosenescence: Where do vaccines play a role?

  • 1 Western University of Health Sciences, United States
  • 2 Chamberlain University, United States


This paper tackles the complex interplay between human immunodeficiency virus (HIV-1) and Mycobacterium tuberculosis (M. tuberculosis) infections, particularly their contribution to immunosenescence, the age-related decline in immune function. Using the current literature, we discuss the immunological mechanisms behind TB- and HIV-induced immunosenescence and critically evaluate the role of the BCG (Bacillus Calmette-Guérin) vaccine. Both HIV-1 and M. tuberculosis demonstrably accelerate immunosenescence: M. tuberculosis through DNA modification and heightened inflammation, and HIV-1 through chronic immune activation and compromised T cell production. HIV-1 and M. tuberculosis co-infection further hastens immunosenescence by affecting T cell differentiation, underscoring the need for prevention and treatment. Furthermore, the BCG tuberculosis vaccine is contraindicated in patients who are HIV positive, and there is a lack of investigation into its use in patients who develop HIV co-infection with possible immunosenescence. Because HIV does not currently have a vaccine, we focus our review on the BCG vaccine response in the context of immunosenescence. We found overall limitations with the BCG vaccine, one of which is that, because of immunosenescence, it cannot reliably prevent recurrence of infection or protect the elderly. Overall, the evidence supporting the vaccine's use is conflicting, owing to factors involving its production and administration. Further research into developing a vaccine for HIV and improving the BCG vaccine is warranted to expand scientific understanding for public health and beyond.

Keywords: immunosenescence, M. tuberculosis, vaccine, BCG, HIV

Received: 14 Feb 2024; Accepted: 13 May 2024.

Copyright: © 2024 Singh, Patel, Seo, Ahn, Shen, Nakka, Kishore and Venketaraman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mx. Vishwanath Venketaraman, Western University of Health Sciences, Pomona, 91766-1854, California, United States
