Systematic Literature Review of Guidelines on Budget Impact Analysis for Health Technology Assessment

  • Systematic Review
  • Published: 06 May 2021
  • Volume 19, pages 825–838 (2021)


  • Yashika Chugh 1,
  • Maria De Francesco 2 &
  • Shankar Prinja, ORCID: orcid.org/0000-0001-7719-6986 1


The objective of this systematic review was to examine the recommendations for the conduct of a budget impact analysis in national and organisational guidelines globally.

We searched several databases, including MEDLINE, EMBASE, The Cochrane Library, the National Guideline Clearinghouse, the HTA Database (International Network of Agencies for Health Technology Assessment), Econlit and the IDEAS Database (RePEc, Research Papers in Economics). The OVID platform was used to run the search in all databases simultaneously. In addition, a search of the grey literature was conducted. The timeframe was set from 2000 to 2020, with the language of publication restricted to English.

A total of 13 publications were selected. All the countries where health financing is predominantly tax funded with public provisioning recommend a healthcare payer (government) perspective. However, countries where the healthcare payer comprises a mix of federal government, communities, hospital authorities and patient communities recommend a complementary analysis from a wider societal perspective. While four guidelines prefer a simple cost calculator for costing, the rest rely on a decision-modelling approach. None of the guidelines recommends discounting except the Polish guidelines, which recommend discounting at 5%. Only two countries, Belgium and Poland, mention that indirect costs, if significant, should be included in addition to direct costs.

Conclusions

The comparative cross-country analysis shows that a single standard set of recommendations cannot serve all settings, as there are contextual differences. Budget impact analysis guidelines must therefore be carefully contextualised within a country's policy environment so as to reflect the dynamics of its health system.



Author information

Authors and Affiliations

Department of Community Medicine and School of Public Health, Post Graduate Institute of Medical Education and Research, Sector-12, Chandigarh, 160012, India

Yashika Chugh & Shankar Prinja

International Decision Support Initiative (iDSI), Imperial College, London, United Kingdom

Maria De Francesco


Corresponding author

Correspondence to Shankar Prinja.

Ethics declarations

No funding was received for the preparation of this article.

Conflict of Interest

Yashika Chugh, Maria De Francesco, and Shankar Prinja have no conflicts of interest that are directly relevant to the content of this article. Maria De Francesco received a consulting fee from the International Decision Support Initiative (iDSI), Imperial College London.

Ethics Approval

Ethics approval for the study was obtained from the Institutional Ethics Committee of the Post Graduate Institute of Medical Education and Research, Chandigarh.

Consent to Participate

Not applicable.

Consent for Publication

Availability of Data and Material

All the data required to replicate the analysis are either mentioned in the text or given as ESM.

Code Availability

Author Contributions

Conception or design of the work: SP, YC; data analysis: MDF, YC; interpretation of data: YC, MF, SP; writing the first draft: YC, MDF, SP; revising critically for important intellectual content: SP; approved the version to be published: all authors; and agree to be accountable for all aspects of the work: all authors.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 20 kb)

Supplementary file 2 (XLSX 317 kb)

Rights and Permissions

About this article

Chugh, Y., De Francesco, M. & Prinja, S. Systematic Literature Review of Guidelines on Budget Impact Analysis for Health Technology Assessment. Appl Health Econ Health Policy 19, 825–838 (2021). https://doi.org/10.1007/s40258-021-00652-6


Accepted: 17 April 2021

Published: 06 May 2021

Issue Date: November 2021

DOI: https://doi.org/10.1007/s40258-021-00652-6


SYSTEMATIC REVIEW article

Budget Impact Analysis of Diabetes Drugs: A Systematic Literature Review

Zejun Luo†

  • 1 State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Taipa, Macao SAR, China
  • 2 Department of Public Health and Medicinal Administration, Faculty of Health Sciences, University of Macau, Taipa, Macao SAR, China
  • 3 School of Public Health and Management, Guangzhou University of Chinese Medicine, Guangzhou, China

Background: Budget impact analysis (BIA) is an economic assessment that estimates the financial consequences of adopting a new intervention. BIA is used to make informed reimbursement decisions, as a supplement to cost-effectiveness analyses (CEAs).

Objectives: We systematically reviewed BIA studies associated with anti-diabetic drugs and assessed the extent to which international BIA guidelines were followed in these studies.

Methods: We conducted a literature search in PubMed, Web of Science, Econlit, Medline, the China National Knowledge Infrastructure (CNKI), and the Wanfang Data Knowledge Service Platform from database inception to June 30, 2021. The ISPOR good practice guidelines were used as the methodological standard for assessing BIAs. We extracted and compared the study characteristics outlined by the ISPOR BIA Task Force to evaluate the guideline compliance of the included BIAs.

Results: A total of eighteen studies on the BIA of anti-diabetic drugs were identified. More than half of the studies were from developed countries. Seventeen studies were model-based and one was based on real-world data. Overall, the analyses adopted a payer perspective and reported potential budget impacts over 1–5 years. Assumptions were mainly made about target population size, market-share uptake of new interventions, and the scope of costs. The data used for analysis varied among studies and were rarely justified. Model validation and sensitivity analysis were lacking in current BIA studies. Rebate analysis was conducted in a few studies to explore the price discount required for new interventions to demonstrate cost equivalence to comparators.

Conclusion: Existing studies evaluating the budget impact of anti-diabetic drugs vary greatly in methodology, and some showed low compliance with good practice guidelines. For BIA to be useful in assisting health plan decision making, future studies should optimize their compliance with national or ISPOR good practice guidelines on BIA. Model validation and sensitivity analysis should also be improved in future BIA studies, and continued improvement of BIA using real-world data is necessary to ensure high-quality analyses and reliable results.

Introduction

Diabetes is one of the fastest-growing global health emergencies of the twenty-first century and has reached alarming levels; it is associated with significant clinical and economic burdens on society and on people with diabetes (1). According to the latest report in the International Diabetes Federation Diabetes Atlas, an estimated 463 million people had diabetes in 2019, a number projected to reach 578 million by 2030. Total diabetes-related health expenditure was estimated at USD 760 billion in 2019 and was projected to increase to USD 825 billion by 2030 (2). The total number of patients with diabetes in mainland China was estimated at 129.8 million (70.4 million men and 59.4 million women), the highest in the world and accounting for more than a quarter of all adults with diabetes globally (3). The high prevalence of diabetes and the risk of its complications impose a substantial economic burden on patients and their families, and on health systems and society (4).

Diet and exercise are first-line treatments, along with metformin, to achieve the goals of improving glycemic control and preventing both microvascular and macrovascular complications (5). To improve glycemic control in adults with diabetes and to reduce the economic burden of diabetes and its complications, new hypoglycemic drugs have been continuously developed and applied, including insulins (such as insulin degludec), glucagon-like peptide-1 receptor agonists (GLP-1 RAs), and new oral hypoglycemic agents such as sodium-glucose co-transporter 2 inhibitors (SGLT-2is) and dipeptidyl peptidase-4 inhibitors (DPP-4is).

Budget impact analysis (BIA) addresses the expected changes in the expenditure of a healthcare system after the adoption of a new intervention. It estimates the financial consequences of the adoption and diffusion of a new healthcare intervention within a specific healthcare setting, given budget constraints. The structure of a BIA can be adjusted to the needs of different countries as well as to different time horizons, perspectives and underlying diseases (6). Budget impact analyses are an essential part of a comprehensive economic assessment of a healthcare intervention and are increasingly required by reimbursement authorities as part of a listing or reimbursement submission (7). The ISPOR Task Force developed good practice guidelines to promote high-quality BIAs (7), and many countries and regions have presented specific guidelines of their own (7–9).
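To make the mechanics concrete, the sketch below shows the simplest form such a calculation can take: a deterministic cost calculator comparing expenditure in a world with and without the new intervention. It is an illustrative sketch only; the function name and all input values are hypothetical and not taken from any guideline or included study.

```python
# Minimal budget impact cost-calculator sketch (illustrative only).
# All inputs are hypothetical placeholders, not values from any study.

def budget_impact(pop_size, cost_current, cost_new, uptake_by_year):
    """Annual budget impact of a new drug that substitutes the current
    treatment in a share uptake_by_year[t] of eligible patients."""
    impacts = []
    for uptake in uptake_by_year:
        budget_without = pop_size * cost_current                    # world without the new drug
        budget_with = pop_size * (uptake * cost_new
                                  + (1 - uptake) * cost_current)    # world with the new drug
        impacts.append(budget_with - budget_without)
    return impacts

# Example: 100,000 eligible patients, annual per-patient costs of 400
# (current) vs. 550 (new), uptake of 10% -> 30% -> 50% over 3 years.
print(budget_impact(100_000, 400, 550, [0.10, 0.30, 0.50]))
# -> roughly [1.5e6, 4.5e6, 7.5e6] per year
```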

As far as we know, no review has examined BIA studies in the field of diabetes. Since the high prevalence of diabetes and high treatment costs have a significant impact on drug availability and the sustainability of reimbursement funds, studying the financial budget for diabetes drugs is important. Therefore, focusing on the BIA of antidiabetic drugs, this study aimed to review the findings of current BIA studies and assess the extent to which international BIA guidelines were followed in them.

Research Design

This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.

Search Strategy

Based on published guidelines for BIA and other published methodological studies, we conducted a literature search in four English-language databases (PubMed, Econlit, Medline, and Web of Science) to identify studies on the BIA of antidiabetic drugs published in English or Chinese from 1980 to June 30, 2021. The key concepts used for the search were "budget impact analysis" AND "diabetes mellitus" (see Appendix Tables 1–5). The following search strategy was used: {(budget impact* OR budgetary impact* OR budget impact analy* OR budgetary impact analy* OR budget impact stud* OR budgetary impact stud*) OR [(financial impact* OR economic impact* OR economic analy*) AND budget*]} AND (diabetes OR diabetes mellitus OR DM OR diabetic). A targeted keyword search was also conducted in two Chinese databases, the China National Knowledge Infrastructure (CNKI) and the Wanfang Data Knowledge Service Platform, to identify studies published between 1994 and June 30, 2021 that reported estimates of the budget impact of introducing a new drug into treatment.
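As a convenience, a search string of this kind can also be executed programmatically. The sketch below uses Biopython's Entrez interface against PubMed; it assumes Biopython is installed and a contact e-mail is set, and the simplified query string illustrates the strategy above rather than reproducing the authors' exact search.

```python
# Sketch: running a PubMed query similar to the stated search strategy
# via Biopython's Entrez E-utilities wrapper (assumed installed).
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # placeholder contact address

query = (
    '("budget impact*" OR "budgetary impact*" OR '
    '(("financial impact*" OR "economic impact*") AND budget*)) '
    'AND (diabetes OR "diabetes mellitus" OR diabetic)'
)

# Restrict by publication date, mirroring the 1980 to mid-2021 window.
handle = Entrez.esearch(db="pubmed", term=query, retmax=500,
                        datetype="pdat", mindate="1980", maxdate="2021/06/30")
record = Entrez.read(handle)
print(record["Count"], "hits;", len(record["IdList"]), "PMIDs retrieved")
```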

Eligibility Criteria

We included studies reporting results of an original BIA pertaining to antidiabetic drugs. Comments, letters, editorials, and meeting abstracts were excluded. We also excluded studies that were not related to diabetes or anti-diabetic drugs, or studies that conducted only cost-effectiveness analyses, reviews, and BIAs of other non-drug interventions for diabetic patients.

Literature Selection

We first removed duplicates and filtered the retrieved records. Two reviewers then independently screened titles and abstracts, followed by a second round examining abstracts and full texts. A third author resolved any disagreements. The selection process is documented in a PRISMA flowchart (see Figure 1).


Figure 1. PRISMA flowchart of literature search and selection of publications.

Data Extraction

Based on the ISPOR Task Force guidelines, we developed evidence tables which presented a summary of how each study addressed the key items of the study, including population size and characteristics, budget holder's perspective, budget time horizon, intervention and comparators, market share, model structure, clinical and cost data, cost calculation, and sensitivity analysis. We then systematically extracted data and summarized the findings from all included studies in the evidence tables.

Guideline Compliance

Guideline compliance of the included studies was assessed to check the extent to which the ISPOR Task Force guideline for BIA was followed (7). The assessment was conducted independently by two authors; any divergence was resolved through discussion and confirmed by another author.

Characteristics of Literature Included

Figure 1 summarizes the search strategy and results. Four hundred and eighty-three articles were initially retrieved using the keywords for the BIA of antidiabetic drugs. After removal of duplicates (n = 139) and exclusion of records during title and abstract screening (n = 298) and full-text review (n = 28), 18 BIA studies (10–27) were finally included.

Table 1 summarizes the general information of the included BIA studies. More than half of the studies (n = 11) were from Europe and the U.S.: five were conducted in the US (20, 24–27), two in Italy (14, 22), one in the Netherlands (19), one in Spain (18), one in England (13) and one in Bosnia and Herzegovina (15). Apart from these, three studies were conducted in China (10–12), two in Brazil (21, 23), one in Egypt (17) and one in Thailand (16). Among the 18 studies, most (n = 11) were conducted from a payer's perspective and a few (n = 5) from the perspective of the healthcare system, while one study (17) was conducted from both the payer and societal perspectives; the perspective was not reported in one article (13). All the BIA studies were model-based, with one exception (13) that was based on retrospective real-world data. The majority of studies (n = 15) focused on BIA only, while the other studies (n = 3) combined BIA with cost-effectiveness analysis.


Table 1. General information of the included BIAs.

About half of the BIA studies (n = 8) evaluated the budget impact of insulin, including basal insulin (n = 5) (13, 20–22, 26), pre-mix insulin (n = 1) (16), and bolus insulin (n = 1) (24). One study focused on the administration route of insulin (continuous subcutaneous insulin infusion vs. multiple daily insulin injections) rather than a specific type of insulin (18). Among the five basal insulin BIA studies, one concerned an insulin glargine biosimilar (13). The remaining BIA studies targeted GLP-1 RAs (n = 4, including once-weekly semaglutide, oral semaglutide, and benaglutide) (12, 14, 25, 27), DPP-4is (n = 3, including vildagliptin, saxagliptin, and linagliptin) (10, 15, 23), SGLT-2is (n = 2, both dapagliflozin) (11, 17) and metformin (n = 1) (19).

The eligible populations were mainly chosen according to the indication of the intervention and the coverage of the payer's plan. The majority of studies (n = 13) (10–12, 14–17, 19, 22–25, 27) restricted the target population to type 2 diabetes patients, while some targeted type 1 diabetes patients (n = 2) (18, 21) or both type 1 and type 2 diabetes patients (n = 2) (20, 26), and one study did not specify the target population (13). Six studies estimated the eligible population size based on a hypothetical health plan (20, 23–27), with assumed sizes varying widely from 1 million to 35 million; all but one (23) of these studies reported the specific size of the target population. Ten studies calculated the target population from the total population of the country (10–12, 14–17, 19, 21, 22), and the remaining two studies did not report the size of the target population (13, 18). Among the 15 studies that reported a specific target population size, most (n = 13) calculated it from local epidemiological data, while the other two used real-world evidence (10, 22). It is worth noting that one study used patient-years rather than patient numbers to measure the size of the target population (10). Two other characteristics were checked: nine studies (16, 18, 20–24, 26, 27) reported conflicts of interest (50%), 12 studies (14, 16–20, 22–27) reported pharmaceutical company funding (66.7%), and three studies (10–12) contained no details of conflicts of interest or funding sources (16.7%).

Methodology and Budget Results of BIAs

Table 2 summarizes the methodology and budget results of the included BIA studies. We extracted the key study characteristics, such as model structure, budget time horizon, discount rate, treatment strategy, market share of the new intervention, costs, and sensitivity analysis methods (13, 17). Among the 17 model-based studies, 15 used a cost-calculation model, while the remaining two used a Markov model (24) and the International T2DM Budget Impact Model (16), respectively.


Table 2. Methodology and budget results of BIAs.

The budget time horizon was concentrated in the 1–5 year range, broadly in accord with the guidelines and the requirements of budget holders. The most commonly used time horizons were 3 years (n = 6) (14–17, 19, 23) and 5 years (n = 6) (10–12, 21, 25, 27), followed by 1 year (n = 4) (20, 22, 24, 26) and 4 years (n = 2) (13, 18). Notably, two studies also conducted analyses over horizons of 24 weeks and 32 weeks, matching the duration of the underlying clinical trials (22, 24). The discount rate was not always clearly reported. Four studies reported discount rates ranging from 3 to 5% (16, 23–25); eight studies (11, 15, 17–21, 27) applied no discounting, in compliance with the ISPOR Task Force guidelines (7); and six studies (10, 12–14, 22, 26) did not mention discounting at all.
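Where discounting is applied, the calculation follows the standard present-value formula. The notation below is ours, not drawn from any included study: ΔC_t is the undiscounted budget impact in year t, T the horizon, and r the discount rate, with year 1 left undiscounted.

\[
\mathrm{BI}_{\mathrm{disc}} \;=\; \sum_{t=1}^{T} \frac{\Delta C_t}{(1+r)^{\,t-1}}
\]

As an illustration, a constant impact of 1,000,000 per year over T = 3 years at r = 5% gives 1,000,000 × (1 + 1/1.05 + 1/1.05²) ≈ 2,859,410 rather than 3,000,000. Conventions differ on whether the first year is discounted, and, as noted above, many guidelines recommend no discounting for BIA at all.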

The treatment strategies were clearly described in all the included studies. Most studies (n = 10) compared the study drug across two treatment strategies under different scenarios (13, 16, 18–22, 24, 26, 27): one compared the insulin glargine biosimilar Abasaglar® with the reference listed drug Lantus® (13), one compared different doses of the same drug (18), and one compared not only different drugs but also dosage regimens (24). In addition, seven studies examined the impact of adding a new drug to the current treatment regimen (10–12, 15, 17, 23, 25), and the remaining study compared the current and an increased use trend of the same drug (14).

In all included studies, the new intervention was assumed to affect the market by substituting for current treatments; that is, the new intervention was assumed to replace one or more of the interventions currently recommended in the clinical practice of diabetes treatment. Most studies (n = 17) reported hypothetical market shares for the new intervention within the study time horizon. Of these, 11 assumed that the market share of the new intervention increased gradually (10–12, 14–17, 21–23, 27), five assumed that the new intervention captured 100% of the market share of the current intervention (18–20, 24, 26), and one study (22) explored market shares for the new intervention both higher and lower than its current share.
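The two uptake assumptions most often seen in the included studies, gradual uptake and immediate full substitution, can be contrasted with a few lines of arithmetic. The sketch below is illustrative only; the population, cost difference and shares are hypothetical, not values from any study.

```python
# Sketch contrasting gradual uptake with immediate 100% substitution.
# All inputs are hypothetical.
POP, DELTA_COST, YEARS = 100_000, 150, 3   # patients, extra cost per patient-year

gradual = [0.10 * (t + 1) for t in range(YEARS)]   # 10% -> 20% -> 30% share
full = [1.0] * YEARS                                # comparator fully replaced

for label, shares in (("gradual uptake", gradual), ("full substitution", full)):
    yearly = [POP * share * DELTA_COST for share in shares]
    print(f"{label}: {yearly}")
```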

The scope of the costs calculated in the included studies can be summarized as treatment-related costs and condition-related costs, following the ISPOR Task Force guidelines. Treatment-related costs mainly comprised drug acquisition costs and associated costs such as administration, diagnostic testing and monitoring; condition-related costs comprised adverse event costs and complication costs. Seven studies calculated treatment-related costs only (10, 13–15, 21–23). Eleven studies considered both treatment-related and condition-related costs (11, 12, 16–20, 24–27): nine of these included the cost of hypoglycemia, further classified into minor and severe events (11, 12, 16, 18, 20, 24–27); six included the cost of diabetes-related complications (11, 16, 17, 19, 25, 27), mainly myocardial infarction (MI), stroke, heart failure and heart disease; and only one accounted for the cost of adverse events, including dizziness, vomiting, fatigue and loss of appetite (12).

Of the 18 included studies, 11 performed a sensitivity analysis (10, 11, 16–19, 21, 23, 24, 26, 27). Ten of them conducted one-way sensitivity analysis (10, 11, 16–19, 23, 24, 26, 27), while one conducted both one-way and multivariate sensitivity analysis (21). Among the studies performing one-way sensitivity analysis, parameters such as the cost of severe hypoglycemia, drug price, treatment adherence, prevalence and market share were commonly varied.
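A one-way sensitivity analysis of this kind is mechanically simple: each parameter is varied around its base case, one at a time, and the budget impact is recomputed. The sketch below, with hypothetical base-case values and an assumed ±20% range, illustrates the idea; it is not taken from any included study.

```python
# Sketch of a one-way sensitivity analysis: vary each parameter +/-20%
# while holding the others at base case. All values are hypothetical.
base = {"pop_size": 100_000, "uptake": 0.30, "delta_cost": 150}

def impact(p):
    # Budget impact = eligible patients x new-drug share x extra cost/patient.
    return p["pop_size"] * p["uptake"] * p["delta_cost"]

for name in base:
    low, high = (impact({**base, name: base[name] * f}) for f in (0.8, 1.2))
    print(f"{name}: {low:,.0f} to {high:,.0f} (base {impact(base):,.0f})")
```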

Regarding budget impact results, all included studies presented budget amounts such as the total annual cost (n = 16) (10–22, 24, 26, 27) or the cumulative cost (n = 6) (13, 17, 19, 21, 23, 25). Some studies also presented the cost per patient per year (n = 8) (10, 14, 16, 18, 20, 24–26) or the cost per member per month in a hypothetical health plan (n = 4) (20, 24, 25, 27). Twelve studies (10, 12, 13, 15, 17–19, 22–26) reported that increasing use of a new drug, or introducing it into the reimbursement list, would reduce the budget, while six studies (11, 14, 16, 20, 21, 27) concluded that adoption or increasing use of the new intervention would increase it. Deerochanawong et al. (16) reported that adopting Insulin Aspart 30 instead of Biphasic Human Insulin 30 for people with T2DM in Thailand resulted in additional acquisition costs that were partially offset by reduced hypoglycemia costs. Two studies (20, 27) conducted rebate scenario analyses to estimate the rebate or discount rate required for a new intervention to generate the same budget impact as the old intervention. Wehler et al. (27) reported that a 71.6% cost discount would be required for oral semaglutide 14 mg to generate 5-year per-patient costs equal to sitagliptin 100 mg in the US, and Lane et al. (20) reported that rebates of 7.3 and 10.6% off the full list price were required for insulin degludec to break even with insulin glargine for patients with T1DM and T2DM, respectively.
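The break-even rebate reported in such analyses follows from equating per-patient costs; the symbols and figures below are illustrative, not those underlying the cited studies. If the new drug costs C_new per patient and the comparator C_comp, the rebate r* at which the two are budget-neutral satisfies:

\[
C_{\mathrm{new}}\,(1 - r^{*}) = C_{\mathrm{comp}}
\quad\Longrightarrow\quad
r^{*} = 1 - \frac{C_{\mathrm{comp}}}{C_{\mathrm{new}}}
\]

For instance, a new drug at 5,000 per patient-year against a comparator at 4,500 would require a rebate of r* = 1 − 4,500/5,000 = 10% for cost equivalence.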

Guideline Compliance of the Included Studies

Table 3 provides a summary of the compliance of the included studies with the ISPOR Task Force guidelines (7). The assessment indicates that the included studies were generally appropriate in terms of perspective, hypothetical scenarios, comparators and data sources. Nine studies complied with at least 8 of the 9 items in the guidelines (≥88.9%) (10, 11, 16, 17, 19, 20, 24, 26, 27), six studies (12, 15, 21–23, 25) complied with seven items (77.8%), and only one study (13) complied with fewer than five items (44.4%). Overall, most studies did not report model validation; only 22.2% of the studies conducted it.


Table 3. Guideline compliance of the included studies.

Discussion

In this study, we systematically reviewed 18 BIA studies of anti-diabetic drugs, conducted in various countries and regions across Europe, the U.S., Asia and South America. The methodological characteristics, assessed according to the ISPOR guidelines for BIA (7), and the research results were retrieved, summarized and evaluated. The primary finding of this review is that, despite published guidelines for budget impact analysis, there were still significant differences among the included studies. In addition to the ISPOR guidelines, many countries and regions have issued budget impact analysis guidelines, such as France (9), Canada (8), Australia (28), and Ireland (29). Although the key elements of budget impact model design are consistent across these guidelines, the BIA method has not been specified in a unified and standardized form. In our review, most of the healthcare systems for which the BIAs were carried out did not have their own guidelines, the exceptions being Brazil (30) and the UK (31), but all of the included BIAs were conducted following the ISPOR guidelines.

Major deviations of study designs from the recommendations in the ISPOR Task Force guidelines (7) were the static treated-population size, the selected time horizons, the mix of comparators, and the limited or absent reporting of model validation and sensitivity analysis. These deviations appear to be independent of the interventions in question. Variability in the inclusion of key design elements was also found in previous reviews of BIAs. van de Vooren et al. (32) considered that BIA was not yet a well-established technique in the literature in 2013 and that many published studies had not reached acceptable quality. Mauskopf (33) found that recommended practice was not followed in many BIAs. Another previous review, by Faleiros et al. (34), likewise concluded that most BIAs were still far from an agreed standard of excellence. Although we agree on the importance of a mature framework for BIA, implementing it consistently matters even more. For example, for liraglutide, one study in Italy (14) estimated a budget increase, while a BIA study in the U.S. (25) estimated a budget decrease; similarly, one study in Egypt (17) estimated that the budget for dapagliflozin would decrease, while a BIA study in China (11) estimated that it would increase. These discrepancies may be related not only to the lack of uniform BIA guidelines but also to differences in drug reimbursement policy across countries.

An important recommendation of the ISPOR Task Force guideline (7) is to adopt a model structure that is as simple as possible. In general, the most commonly used model structure is the cost-calculation model, which can indirectly account for changes in treatment over time through the evolution of treatment shares and the related clinical impacts (33). Of the 18 studies included in this review, 14 used a cost-calculation model. Whatever model is used, it should reflect, as far as possible, the changes in resource use and costs associated with the new intervention. We found that some included studies combined the cost-calculation model with other models, such as the IQVIA CORE diabetes model, in order to assess differences in chronic diabetic complications and the related costs between the new drug and comparator drugs. Some elements, such as the time horizon or discount rate, can easily be determined from published guidelines or the requirements of decision makers. The ISPOR guidelines suggest that a time horizon of 1–5 years is generally of interest to budget holders for budget planning (7). A time horizon of 3 years is required for National Reimbursement Drug List (NRDL) negotiation submissions in China. Mauskopf (33) recommended projections beyond 1 year even when the budget holder is only interested in a 1-year horizon, because cost and population parameters may change over time. Simoens et al. (35) considered that BIA might incorporate future market interactions, competition, and pricing effects, and that stakeholders increasingly consider long time horizons when contemplating the budget impact of chronic disease therapies. The majority of the studies included in this review used a time horizon of 3 years or more, while only four studies (20, 22, 24, 26) used a time horizon of 1 year. Discount rates were not necessarily applied, because the time frame of the research was relatively short and the focus was mainly on real costs in the budgetary year; this may explain why discount rates were not always clearly reported in the included BIA studies. Only four studies (16, 23–25) reported a discount rate, which ranged from 3 to 5%.

Some critical study characteristics were difficult to determine, including the estimation of the population eligible for the new intervention, the choice of comparators, the market shares, and the selection of the model structure. These items are critical because they determine the size of the target population for the new intervention, which is an important factor influencing the results of a BIA. Target populations are usually estimated in one of two ways: from epidemiological data or from real-world evidence. In our review, most studies (72.22%) were based on local epidemiological data. In some countries, such as China, epidemiological data on diseases and drug treatments are also required for NRDL submissions. In addition, BIA guidelines indicate that the target population is not a static group but a dynamic one that varies with incidence, cure, prognosis, and death. However, most studies in this review did not appropriately account for a dynamically changing population; only six studies (12, 14, 15, 19, 21, 23) used a dynamic population assumption. In a previous review, Mauskopf (33) argued that if growth in the size of the treated population is not taken into account, the resulting budget impact estimates are likely to be biased.
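A dynamic eligible population can be sketched as a simple yearly recurrence in which new cases flow in and a share of patients leaves through cure or death. The function and all rates below are hypothetical, for illustration only.

```python
# Sketch of a dynamic target population: yearly inflow of new cases and
# outflow through cure/death. All rates and counts are hypothetical.
def project_population(initial, new_cases_per_year, attrition_rate, years):
    pop, trajectory = initial, []
    for _ in range(years):
        pop = pop * (1 - attrition_rate) + new_cases_per_year
        trajectory.append(round(pop))
    return trajectory

# 100,000 prevalent patients, 4,000 new cases/year, 2% annual attrition.
print(project_population(100_000, 4_000, 0.02, 5))
# -> approximately [102000, 103960, 105881, 107763, 109608]
```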

Another difficulty in conducting a BIA is making assumptions about intervention and comparator market uptake. When a new drug is introduced, many factors influence the change in the market shares of the new drug and its comparators, and it is difficult for budget holders to evaluate the accuracy of these assumptions on the available evidence and data. Many published BIA studies have not modelled market uptake at all, instead assuming the extreme case in which the comparator drug is 100% replaced by the new drug. According to the ISPOR guidelines, there are three types of market change: substitution, combination, and expansion (36). Our review shows that substitution was assumed in all the included studies, which is determined by the characteristics of the disease and its treatment.

Our review corroborates previous findings (32, 33) of limited model validation and sensitivity analysis in current BIAs. Model validation and sensitivity analysis should be carried out to ensure the robustness of BIA research. A sensitivity analysis is essential to investigate the influence of assumptions about structural aspects or variable inputs of the BIA (37), and it allows a more comprehensive prediction of budget impact. Yet sensitivity analysis was performed in only 11 of the 18 studies included in our review. In addition, we found that most of the included studies did not perform model validation. Only two studies (17, 19) stated that the validity of the BIA model was discussed with clinical experts and relevant researchers, and two studies adopted a previously verified model (20, 27). Many guidelines have already set requirements for validating BIA models. Clearly, the compliance of BIAs with the ISPOR guidelines should be improved, especially regarding sensitivity analysis and model validation.

The features of the healthcare system should be taken into account when conducting a BIA. For example, in China, the reimbursement rates for outpatient and inpatient care differ: the inpatient reimbursement rate (65%) is significantly higher than the outpatient rate (50%), which should be considered when submitting a BIA for NRDL negotiation. Furthermore, the characteristics of the medical system also bear on the selection of the cost scope. We found that the studies included in this review varied in their treatment-related costs. Some studies included not only drug acquisition costs but also associated costs, such as needle costs and self-measured blood glucose (SMBG) costs. This might not be applicable in all healthcare settings. For instance, in China, needle and SMBG expenses are not reimbursed in most provinces and cities, so such costs should not be included when a BIA is conducted from the payer's perspective.

Conflicts of interest and funding sources cannot be ignored when considering the quality of BIAs. It should be noted that the vast majority of the included studies were sponsored by pharmaceutical companies and, as expected, the conclusions of all sponsored studies favoured the sponsors' drugs. In this way, BIAs have deviated from their intended goal of providing short-term economic consequences from a health-system perspective and appeared to be tailored to show short-term savings. In 2016, Faleiros et al. (34) reported that the weakness of many BIA studies might be directly linked to pharmaceutical company funding and conflicts of interest. The van de Vooren et al. (32) review likewise expressed concern that most published BIAs for European Union countries were sponsored by the drug manufacturer, which might bias the estimates. In our review of 18 BIAs, 12 were sponsored by industry or had industry authors.

Another contributor to poor quality might be the continuing debate over the usefulness of BIA, given the technique's close proximity to CEA (38). However, BIA is not a substitute for cost-effectiveness analysis; the two are complementary in supporting decision making. BIA addresses the stream of financial consequences related to the uptake and diffusion of technologies in order to assess their affordability, whereas CEA evaluates the costs and outcomes of alternative technologies over a specified time horizon to estimate their economic efficiency. Both should be considered important yet separate components of a comprehensive pharmacoeconomic evaluation of an intervention (36). In our review, the majority of included studies focused on BIA only, while only three studies (23–25) combined BIA with cost-effectiveness analysis. For these three studies, we found that the information provided on budget impact model design, assumptions, inputs and results was insufficient to fully characterize the model: detailed information was provided for the cost-effectiveness analysis, some of which was relevant to the budget impact analysis, but no detail was given on the estimated population size, its characteristics, or the change in the treatment mix. Mauskopf (33) considered it critical for the structure, assumptions, and input values of both models to be described in detail in the published study.

Recommendations for Future BIA for Anti-diabetic Drugs

Unlike rare diseases, diabetes is a chronic progressive disease with a high prevalence and a large patient population, especially type 2 diabetes. Moreover, the incidence and mortality of type 2 diabetes are roughly equivalent, so the net change in the patient pool over a short horizon is negligible. It is therefore acceptable to treat the total population as a static group, rather than modelling short-term population changes, when calculating the target population in a BIA.

Chronic complications are the main cause of the heavy economic burden of diabetes; research has shown that 81% of the total medical expenses for T2DM are used for the treatment of diabetes-related complications (39). In addition, hypoglycemia is a common acute complication of diabetes treatment that also imposes a heavy economic burden. Thus, in analysing the budget impact of anti-diabetic drugs, costs should not be restricted to drug costs and the cost of hypoglycemic events; the cost of chronic diabetic complications, such as cardiovascular disease, should also be considered.

The key analytical processes and input parameters should be validated when conducting a BIA, in accordance with the ISPOR guidelines and economic evaluation guidelines. Validation can be done by consulting budget holders and corroborating model parameters, and all inputs and formulas should be checked by a second budget impact expert. After the new intervention is introduced, it is recommended to continue data collection and compare observed expenditure with the estimates obtained from the BIA; this provides an important reference for future decision making and studies. Furthermore, it is suggested to conduct the analysis from multiple perspectives, including the payer and the healthcare system, for a complete consideration of related costs.

Strengths and Limitations

This review is the first to assess the methodology and guideline compliance of BIAs specifically for anti-diabetic drugs. We have comprehensively summarized the key elements that determine the quality of BIA research. In addition, we summarized the budget results of the included studies to provide a comprehensive reference for BIA studies of antidiabetic drugs. A potential limitation of this review is that we only included studies published in English and Chinese, owing to the language capacity of the research team. References were retrieved from four international databases and two Chinese databases (PubMed, Econlit, Medline, Web of Science, CNKI, and the Wanfang Data Knowledge Service Platform). Moreover, it should be noted that BIAs submitted directly to reimbursement agencies were not studied.

Conclusion

BIA is an important tool for assessing the affordability of adopting a new antidiabetic drug in a given health setting amid the rise of many new diabetes drugs. Our systematic review finds great variability in study design, and some studies showed low compliance with the ISPOR guidelines. To provide useful, high-quality evidence for decision making, researchers should ensure that their BIA studies are conducted in compliance with the recommended guidelines or the requirements of the decision makers. Continued improvement of model validation and sensitivity analysis is also necessary, and the accuracy of parameters in the BIA needs to be demonstrated more rigorously to establish the quality of the findings. Finally, more BIA studies of antidiabetic drugs based on real-world data should be conducted in future research.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author Contributions

ZL, ZR, and HH conceptualized this study. ZL, ZR, and DY collected the materials. ZL, ZR, and HH conducted the analysis. ZL, ZR, CU, YL, and HH drafted the manuscript. All authors reviewed the manuscript, contributed to the article, and approved the final submitted version.

Funding

This research was partially supported by grants from the University of Macau (MYRG2020-00230-ICMS).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors would like to thank ICMS colleagues for their comments on early versions of this work.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2021.765999/full#supplementary-material

1. Jaacks LM, Siegel KR, Gujral UP, Narayan KM. Type 2 diabetes: a 21st century epidemic. Best Pract Res Clin Endocrinol Metab. (2016) 30:331–43. doi: 10.1016/j.beem.2016.05.003

2. International Diabetes Federation. IDF Diabetes Atlas, 9th edn. (2019). Available online at: https://www.diabetesatlas.org/en/ (accessed July 28, 2021).

3. Li Y, Teng D, Shi X, Qin G, Qin Y, Quan H, et al. Prevalence of diabetes recorded in mainland China using 2018 diagnostic criteria from the American Diabetes Association: national cross sectional study. BMJ. (2020) 369:m997. doi: 10.1136/bmj.m997

4. World Health Organization. Global Report on Diabetes. Geneva: World Health Organization (2016).

5. Lorenzati B, Zucco C, Miglietta S, Lamberti F, Bruno G. Oral Hypoglycemic drugs: pathophysiological basis of their mechanism of action. Pharmaceuticals (Basel). (2010) 3:3005–20. doi: 10.3390/ph3093005

6. Sul J, Blumenthal GM, Jiang X, He K, Keegan P, Pazdur R. FDA approval summary: Pembrolizumab for the treatment of patients with metastatic non-small cell lung cancer whose tumors express programmed death-ligand 1. Oncologist. (2016) 21:643–50. doi: 10.1634/theoncologist.2015-0498

7. Sullivan SD, Mauskopf JA, Augustovski F, Jaime Caro J, Lee KM, Minchin M, et al. Budget impact analysis-principles of good practice: report of the ISPOR 2012 Budget Impact Analysis Good Practice II Task Force. Value Health. (2014) 17:5–14. doi: 10.1016/j.jval.2013.08.2291

8. Marshall DA, Douglas PR, Drummond MF, Torrance GW, Macleod S, Manti O, et al. Guidelines for conducting pharmaceutical budget impact analyses for submission to public drug plans in Canada. Pharmacoeconomics. (2008) 26:477–95. doi: 10.2165/00019053-200826060-00003

9. Ghabri S, Autin E, Poullié AI, Josselin J. The French National Authority for Health (HAS) guidelines for conducting budget impact analyses (BIA). PharmacoEconomics . (2017) 36:407–417. doi: 10.1007/s40273-017-0602-5

10. Guan H, Fan C, Wang Y. Budget impact analysis on vildagliptin in treating type 2 diabetes in China. China Health Insurance. (2016) 56–59+62. doi: 10.3969/j.issn.1674-3830.2016.5.015

11. Liu C, Xie S, Wu J. Budget impact analysis of dapagliflozin in treating type 2 diabetes mellitus in China. China J Pharmaceutical Econom . (2018) 13:13–18. doi: 10.12010/j.issn.1673-5846.2018.03.002

12. Xuan J, Yang F. Budget impact analysis of benaglutide injection in the treatment of type 2 diabetes mellitus in Chinese patients. China J Pharmaceutical Economics. (2019) 14:5–12. doi: 10.12010/j.issn.1673-5846.2019.04.001

13. Agirrezabal I, Sánchez-Iriso E, Mandar K, Cabasés JM. Real-world budget impact of the adoption of insulin glargine biosimilars in primary care in England (2015-2018). Diabetes Care. (2020) 43:1767–73. doi: 10.2337/dc19-2395

14. Capri S, Barbieri M. Estimating the budget impact of innovative pharmacological treatments for patients with type 2 diabetes mellitus in Italy: the case of liraglutide (GLP-1). Epidemiol Biostat Public Health. (2015) 12:1–8. doi: 10.2427/11082

15. Catic T, Lekic L, Zah V, Tabakovic V. Budget impact of introducing linagliptin into the Bosnia and Herzegovina health insurance drug reimbursement list in 2016-2018. Mater Sociomed. (2017) 29:176–81. doi: 10.5455/msm.2017.29.176-181

16. Deerochanawong C, Kosachunhanun N, Chotikanokrat P, Permsuwan U. Biphasic insulin aspart 30 treatment for people with type 2 diabetes: a budget impact analysis based in Thailand. Curr Med Res Opin. (2018) 34:369–75. doi: 10.1080/03007995.2017.1410122

17. Elsisi GH, Anwar MM, Khattab M, Elebrashy I, Wafa A, Elhadad H, et al. Budget impact analysis for dapagliflozin in type 2 diabetes in Egypt. J Med Econ. (2020) 23:908–14. doi: 10.1080/13696998.2020.1764571

18. Giménez M, Elías I, Álvarez M, Quirós C, Conget I. Budget impact of continuous subcutaneous insulin infusion therapy in patients with type 1 diabetes who experience severe recurrent hypoglycemic episodes in Spain. Endocrinol Diabetes Nutr. (2017) 64:377–83. doi: 10.1016/j.endien.2017.04.010

19. Gout-Zwart JJ, de Jong LA, Saptenno L, Postma MJ. Budget impact analysis of metformin sustained release for the treatment of type 2 diabetes in the Netherlands. Pharmacoecon Open. (2020) 4:321–30. doi: 10.1007/s41669-019-00179-6

20. Lane WS, Weatherall J, Gundgaard J, Pollock RF. Insulin degludec versus insulin glargine U100 for patients with type 1 or type 2 diabetes in the US: a budget impact analysis with rebate tables. J Med Econ. (2018) 21:144–51. doi: 10.1080/13696998.2017.1384383

21. Laranjeira FO, Silva EN, Pereira MG. Budget impact of long-acting insulin analogues: the case in Brazil. PLoS ONE. (2016) 11:e0167039. doi: 10.1371/journal.pone.0167039

22. Napoli R, Fanelli F, Gazzi L, Larosa M, Bitonti R, Furneri G. Using 2nd generation basal insulins in type 2 diabetes: costs and savings in a comparative economic analysis in Italy, based on the BRIGHT study. Nutr Metab Cardiovasc Dis. (2020) 30:1937–44. doi: 10.1016/j.numecd.2020.07.005

23. Nita ME, Eliaschewitz FG, Ribeiro E, Asano E, Barbosa E, Takemoto M, et al. Cost-effectiveness and budget impact of saxagliptine as additional therapy to metformin for the treatment of diabetes mellitus type 2 in the Brazilian private health system. Rev Assoc Med Bras (1992). (2012) 58:294–301. doi: 10.1016/S0104-4230(12)70198-7

24. Saunders R, Lian J, Karolicki B, Valentine W. The cost-effectiveness and budget impact of stepwise addition of bolus insulin in the treatment of type 2 diabetes: evaluation of the FullSTEP trial. J Med Econ. (2014) 17:827–36. doi: 10.3111/13696998.2014.959590

25. Shah D, Risebrough NA, Perdrizet J, Iyer NN, Gamble C, Dang-Tan T. Cost-effectiveness and budget impact of liraglutide in type 2 diabetes patients with elevated cardiovascular risk: a US-managed care perspective. Clinicoecon Outcomes Res. (2018) 10:791–803. doi: 10.2147/CEOR.S180067

26. Weatherall J, Bloudek L, Buchs S. Budget impact of treating commercially insured type 1 and type 2 diabetes patients in the United States with insulin degludec compared to insulin glargine. Curr Med Res Opin. (2017) 33:231–8. doi: 10.1080/03007995.2016.1251893

27. Wehler E, Lautsch D, Kowal S, Davies G, Briggs A, Li Q, et al. Budget impact of oral semaglutide intensification versus sitagliptin among US patients with type 2 diabetes mellitus uncontrolled with metformin. Pharmacoeconomics. (2021) 39:317–30. doi: 10.1007/s40273-020-00967-7

28. Department of Health, Australian Government. Guidelines for Preparing a Submission to the Pharmaceutical Benefits Advisory Committee. (2016). Available online at: https://pbac.pbs.gov.au/ (accessed October 24, 2021).

29. Health Information and Quality Authority. Guidelines for the Budget Impact Analysis of Health Technologies in Ireland. (2008). Available online at: https://www.hiqa.ie/reports-and-publications/health-technology-assessment/guidelines-budget-impact-analysis-health (accessed October 24, 2021).

30. Ferreira-Da-Silva AL, Ribeiro RA, Santos VC, Elias FT, d'Oliveira AL, Polanczyk CA. Guidelines for budget impact analysis of health technologies in Brazil. Cad Saude Publica . (2012) 28:1223–38. doi: 10.1590/s0102-311x2012000700002

31. National Institute for Health and Care Excellence. Assessing resource impact process . (2017). Available online at: https://www.nice.org.uk/about/what-we-do/into-practice/resource-impact-assessment . (accessed October 24, 2021).

32. van de Vooren K, Duranti S, Curto A, Garattini L. A critical systematic review of budget impact analyses on drugs in the EU countries. Appl Health Econ Health Policy. (2014) 12:33–40. doi: 10.1007/s40258-013-0064-7

33. Mauskopf J, Earnshaw S. A methodological review of US budget-impact models for new drugs. Pharmacoeconomics . (2016) 34:1111–31. doi: 10.1007/s40273-016-0426-8

34. Faleiros DR, Álvares J, Almeida AM, de Araújo VE, Andrade EI, Godman BB, et al. Budget impact analysis of medicines: updated systematic review and implications. Expert Rev Pharmacoecon Outcomes Res. (2016) 16:257–66. doi: 10.1586/14737167.2016.1159958

35. Simoens S, Jacobs I, Popovian R, Isakov L, Shane LG. Assessing the value of biosimilars: a review of the role of budget impact analysis. Pharmacoeconomics. (2017) 35:1047–62. doi: 10.1007/s40273-017-0529-x

36. Mauskopf JA, Sullivan SD, Annemans L, Caro J, Mullins CD, Nuijten M, et al. Principles of good practice for budget impact analysis: report of the ISPOR Task Force on good research practices–budget impact analysis. Value Health. (2007) 10:336–47. doi: 10.1111/j.1524-4733.2007.00187.x

37. Abdallah K, Huys I, Claes K, Simoens S. Methodological quality assessment of budget impact analyses for orphan drugs: a systematic review. Front Pharmacol. (2021) 12:630949. doi: 10.3389/fphar.2021.630949

38. Niezen MG, de Bont A, Busschbach JJ, Cohen JP, Stolk EA. Finding legitimacy for the role of budget impact in drug reimbursement decisions. Int J Technol Assess Health Care. (2009) 25:49–55. doi: 10.1017/S0266462309090072

39. Chen X, Tang L, Chen H. Assessing the impact of complications on the costs of type 2 diabetes in urban China. Chin J Diabetes . (2003) 11–14. doi: 10.3321/j.issn:1006-6187.2003.04.003

Keywords: budget impact analysis, diabetes, cost-effectiveness, BIA, CEA

Citation: Luo Z, Ruan Z, Yao D, Ung COL, Lai Y and Hu H (2021) Budget Impact Analysis of Diabetes Drugs: A Systematic Literature Review. Front. Public Health 9:765999. doi: 10.3389/fpubh.2021.765999

Received: 28 August 2021; Accepted: 25 October 2021; Published: 19 November 2021.


Copyright © 2021 Luo, Ruan, Yao, Ung, Lai and Hu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Hao Hu, haohu@um.edu.mo

† These authors share first authorship



ISPOR Reporting Guidelines for Comparative Effectiveness Research

  • 1 Houston Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey VA Medical Center, Houston, Texas
  • 2 Division of Surgical Oncology, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, Texas
  • 3 Department of Emergency Medicine, Denver Health Medical Center, University of Colorado School of Medicine, Denver
  • 4 Department of Epidemiology, Colorado School of Public Health, Aurora
  • 5 Statistical Editor, JAMA Surgery
  • 6 Department of Surgery, University of Michigan, Ann Arbor
  • 7 Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor

Randomized clinical trials (RCTs) are considered the main source of data driving evidence-based practice and the primary method for establishing the efficacy of an intervention. However, for a variety of reasons, the universe of research questions that can be definitively addressed by traditional RCTs is limited.1 At the heart of comparative effectiveness research (CER) is a desire to generate real-world evidence demonstrating the effectiveness (rather than the efficacy) of an intervention using real-world data (obtained outside the often-ideal conditions of a traditional RCT). As value-based reimbursement models are better integrated into the US health care system and patient-centered care is increasingly emphasized, there will be a greater need for high-quality CER studies to inform the most clinically effective and cost-effective treatments and to help identify the right type of treatment for patients being treated in specific clinical contexts.


Massarweh NN , Haukoos JS , Ghaferi AA. ISPOR Reporting Guidelines for Comparative Effectiveness Research. JAMA Surg. 2021;156(7):673–674. doi:10.1001/jamasurg.2021.0534

  • Open access
  • Published: 25 November 2022

Quality appraisal for systematic literature reviews of health state utility values: a descriptive analysis

  • Muchandifunga Trust Muchadeyi 1 , 2 ,
  • Karla Hernandez-Villafuerte 1 , 3 &
  • Michael Schlander 1 , 2 , 4  

BMC Medical Research Methodology volume 22, Article number: 303 (2022) Cite this article

2993 Accesses

2 Citations

Metrics details

Background

Health state utility values (HSUVs) are an essential input parameter to cost-utility analysis (CUA). Systematic literature reviews (SLRs) provide summarised information for selecting utility values from an increasing number of primary studies eliciting HSUVs. Quality appraisal (QA) of such SLRs is an important step towards the credibility of HSUV estimates; yet authors often overlook this crucial process. A scientifically developed and widely accepted QA tool for this purpose is lacking and warranted.

Objectives

To comprehensively describe the nature of QA in published SLRs of studies eliciting HSUVs and to generate a list of commonly used items.

Methods

A comprehensive literature search was conducted in PubMed and Embase from 01.01.2015 to 15.05.2021. SLRs of empirical studies eliciting HSUVs that were published in English were included. We extracted descriptive data, including the QA tools, checklists or good practice recommendations (GPRs) used or cited, the items used, and the methods of incorporating QA results into study findings. Descriptive statistics (frequencies of use and occurrence of items, acceptance and counterfactual acceptance rates) were computed, and a comprehensive list of QA items was generated.

Results

A total of 73 SLRs were included, comprising 93 items and 35 QA tools and GPRs. The prevalence of QA was 55% (40/73). Recommendations from the NICE and ISPOR guidelines appeared in 42% (16/40) of the SLRs that appraised quality. The most commonly used QA items in the SLRs were response rates (27/40), statistical analysis (22/40), sample size (21/40) and loss to follow-up (21/40). Yet the most commonly featured items in the QA tools and GPRs were statistical analysis (23/35), confounding or baseline equivalency (20/35), and blinding (14/35). Only 5% of the SLRs used QA to inform the data analysis, with acceptance rates of 100% (in two studies), 67%, 53% and 33%. The mean counterfactual acceptance rate was 55% (median 53%, IQR 56%).

Conclusions

The prevalence of QA in SLRs of studies eliciting HSUVs is low, and there is wide variation in the QA dimensions and items included in both the SLRs and the extracted tools. This underscores the need for a scientifically developed QA tool for multi-variable primary studies of HSUVs.


Introduction

The concept of evidence-based medicine (EBM) originated in the mid-nineteenth century in response to the need for a conscientious, explicit, and judicious use of current, best evidence in making healthcare decisions [1]. Emerging from the notion of evidence-based medicine is the systematic and transparent process of Health Technology Assessment (HTA). HTA can be defined as a state-of-the-art method to gather, synthesize and report on the best available evidence on health technologies at different points in their lifecycle [2]. This evidence informs policymakers, insurance companies and national health systems during approval, pricing, and reimbursement decisions. As the world continues to grapple with increased healthcare costs (mainly due to an ageing population and the rapid influx of innovative and expensive treatments), health economic evaluations are increasingly becoming an integral part of the HTA process.

Comparative health economic assessments, mainly in the form of cost-effectiveness analysis and cost-utility analysis (CUA), are currently the mainstay tools for the applied health economic evaluation of new technologies and interventions [3]. Within the framework of CUA, the quality-adjusted life year (QALY) is a generic outcome measure widely used by economic researchers and HTA bodies across the globe [3]. Quality-adjusted life years are calculated by adjusting (multiplying) the length of life gained (e.g. the number of years lived in each health state) by a single weight representing a cardinal preference for that particular state or outcome. In the context of health economics, these cardinal preferences are often called health state utility values (HSUVs), utilities or preferences.
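
In formula form, with $t_i$ denoting the time spent in health state $i$ and $u_i$ the HSUV attached to that state:

$$\text{QALYs} = \sum_{i=1}^{n} t_i \times u_i$$

As a worked illustration with hypothetical values, 2 years spent in a health state valued at $u = 0.7$ followed by 3 years in full health ($u = 1.0$) yield $2 \times 0.7 + 3 \times 1.0 = 4.4$ QALYs.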

Notably, HSUVs are regarded as one of the most critical and uncertain input parameters in CUA studies [4]. A considerable body of evidence on cost-effectiveness analyses suggests that CUA results are sensitive to the utility values used [3, 5, 6]. A small margin of error in the HSUVs used in a CUA can be enough to alter the resulting quality-adjusted life years and incremental cost-effectiveness ratios, sway reimbursement and pricing decisions, and ultimately affect an intervention's accessibility [3, 5, 6]. Besides, HSUVs are inherently heterogeneous. Different population groups (patients, the general population, caregivers or spouses, and in some instances experts or physicians), contexts, assumptions (theoretical grounding), and elicitation methods may generate different utility values for the same health state [7, 8, 9]. Thus, selecting appropriate, relevant and valid HSUVs is germane to comparative health economic assessments [3, 4, 10].
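
A stylised example, using entirely hypothetical figures, illustrates this sensitivity. The incremental cost-effectiveness ratio (ICER) compared against a willingness-to-pay threshold is

$$\text{ICER} = \frac{\Delta C}{\Delta \text{QALYs}}$$

If a new treatment costs an additional €20,000 and raises utility by 0.10 for 5 years ($\Delta \text{QALYs} = 0.50$), the ICER is €40,000 per QALY. Had the utility gain instead been measured as 0.08 ($\Delta \text{QALYs} = 0.40$), the ICER would rise to €50,000 per QALY, enough to cross, say, a €45,000 threshold and reverse the reimbursement recommendation.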

The preferences reflected in the HSUVs can be directly elicited using direct methods such as the time trade-off (TTO), the standard gamble (SG) or the visual analogue scale (VAS) [11]. Alternatively, indirect methods using multi-attribute health status classification systems with preference scores, such as the EuroQol-5 Dimension (EQ-5D), Short-Form Six-Dimension (SF-6D) or Health Utilities Index (HUI), or mapping from non-preference-based measures onto generic preference-based health measures, can also be employed [12]. However, methodological infeasibility, costs, and time constraints make empirical elicitation of HSUVs a problematic and sometimes an unachievable task. Consequently, researchers often resort to synthesising evidence on HSUVs through rapid or systematic literature reviews (SLRs) [12]. Correspondingly, the number of SLRs of studies eliciting HSUVs has been growing exponentially over the years, particularly in the last five years [13].

The cornerstone of all SLRs is the process of quality appraisal (QA) [14, 15]. Regardless of the source of utility values, HSUVs should be "free from known sources of bias, and measured using a validated method appropriate to the condition and population of interest and the perspective of the decision-maker for whom the economic model is being developed" [4]. The term "garbage in, garbage out" (GIGO) originates from the information technology world and is often invoked in discussions of quality. The use of biased, low-quality HSUV estimates will undoubtedly produce wrong and misleading outcomes, regardless of how robust the other elements of the model are. To avoid using biased estimates, it is imperative that empirical work on HSUVs, the reporting of such work, and subsequent reviews of studies eliciting HSUVs are of the highest quality. A robust, scientifically developed and commonly accepted QA tool is one step towards meeting this requirement.

Over the years, some research groups and HTA agencies have developed checklists, ad-hoc tools, and good practice recommendations (GPRs) describing or listing the essential elements to consider when assessing the quality of primary studies eliciting HSUVs. Prominent among these are the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Task Force report [16], the National Institute for Health and Care Excellence (NICE) Technical Support Document 9 [17], and related peer-reviewed publications [4, 10, 12, 18], hereafter referred to as "NICE/ISPOR tools". Despite this effort, and the importance placed on HSUVs and their QA process, there is still no gold-standard, scientifically developed, and widely accepted QA tool for studies eliciting HSUVs.

Several challenges impede the critical appraisal of studies eliciting HSUVs. Common to all QA processes is the significant heterogeneity in the use of the term QA. This heterogeneity leads to misunderstandings of, and disagreements on, what should and should not constitute QA [19, 20]. The term quality represents an amorphous and multidimensional concept that should encompass reporting quality, methodological quality (e.g. risk of bias [RoB]) and external validity (applicability) [15, 21, 22]. However, it is often applied incompletely or inappropriately by restricting quality to a subset of its components (mostly one dimension). For example, many SLR authors use the term QA to refer to the RoB assessment [15, 23, 24, 25], while others refer to the assessment of reporting quality [19, 26, 27]. Similarly, several terms for QA have been used interchangeably in the literature, including quality assessment, methodological quality, methodological review, critical appraisal, critical assessment, grading of evidence, data appropriateness, and credibility check [22]. As a result, the domains, components and items considered when evaluating studies' quality also vary considerably [22].

Another challenge in appraising the quality of studies contributing to SLRs is the lack of guidance on carrying the QA results into the subsequent stages of a review, particularly the summarising and synthesis of data, the interpretation of findings, and the drawing of conclusions [14, 28]. The trend over the years has been shifting away from scale-based QA towards domain-based RoB assessments [29, 30]. Moreover, there is no consensus regarding the quality threshold for the scale-based approach, nor the summary risk judgement for the domain-based approaches [28].

Specific to SLRs of studies eliciting HSUVs is the unique nature and characteristics of these studies, mainly their study designs. While randomised controlled trials (RCTs) are the gold standard for intervention studies of effect size [31], multiple study designs, both experimental (e.g. RCTs) and observational (e.g. cohort, case-control, cross-sectional), can be used in primary studies on HSUVs [14]. On the one hand, RCTs may fail to represent the real-world setting, mainly owing to strict inclusion and exclusion criteria (a form of selection bias). On the other hand, observational studies are, by design, inherently prone to several problems that may bias their results, for example confounding or baseline population heterogeneity. While confounding is mainly controlled at the design stage through randomisation in RCTs, statistical and analytical methods are vital for controlling confounding in observational studies. Moreover, some QA items, such as the randomisation process, blinding of investigators/assessors, description of the treatment protocol for both intervention and control groups, and use of intention-to-treat analysis [22], tend to be specific to RCTs of interventions and of less value to observational and/or primary studies of HSUVs.

By design, all intervention studies measuring effect size should be comparative and define at least one intervention. The gold standard is to include a control or comparator group that is "equivalent" to the intervention group, with only the intervention under investigation varying. By contrast, not all studies eliciting HSUVs are interventional or comparative. Often, HSUVs are elicited from the population of interest (or the whole population) without regard to an intervention. This distinction from intervention studies is another unique feature of primary studies of HSUVs: their QA (except when an intervention is in question) may not find items such as intervention measurement, adherence to the prescribed intervention, randomisation, concealment of allocation, and blinding of subjects and outcomes relevant or feasible.

Furthermore, the various methodologies used to elicit utility values make it challenging to identify a QA tool that allows an adequate comparison between studies. Direct methods are frequently used alongside indirect methods [12]. Consequently, using a single QA tool is insufficient; however, it remains unclear if using multiple tools would remedy the above-mentioned challenges.

Few studies in the literature that used QA tools reflected the multi-factorial nature of the QA of studies eliciting HSUVs described above. More recently, Yepes-Nuñez et al. [13] summarised the methodological quality (examining RoB) of SLRs of HSUVs published in top-ranking journals. The review culminated in a list of 23 items (grouped into 7 domains) pertinent to the RoB assessment. Nevertheless, RoB is only one necessary quality dimension and is, by itself, insufficient [15].

Ara et al. noted that a researcher needs a well-reported study to perform any meaningful assessment of the other quality dimensions [10, 18]. Correspondingly, the completeness and transparency of reporting (i.e., the reporting quality dimension) are also needed. Conversely, a focus on reporting quality without attention to RoB is likewise necessary but, alone, insufficient. Notably, an article can be of good reporting quality (reporting all aspects of the methods and presenting the findings in a clear and easy-to-understand manner) and still be subject to considerable methodological flaws that can bias the reported estimates [3, 32].

Since HSUVs as an outcome can be highly subjective and context-driven compared with the clinical outcomes commonly assessed in clinical effectiveness studies, limiting the QA of studies eliciting HSUVs to the reporting and methodological quality dimensions is not enough (the necessary-but-insufficient rule). The relevance and applicability (i.e., external validity) of the included studies also matter. Relevance and applicability questions, including whose utility values were elicited and when and where the assessment was done, are equally crucial to the decision-maker.

Gathering evidence on the current practices of SLR authors in appraising the quality of primary studies eliciting HSUVs is key to solving the above-mentioned challenges. It is the precursor to developing, through a systematic process, a QA tool that assures a consistent and comparable evaluation of the available evidence. Therefore, the main objective of this study is to review, consolidate, and comprehensively describe the current (within the last five years) nature of QA (methodological, reporting and relevance) in SLRs of HSUVs. Given the challenges hampering QA of studies eliciting HSUVs, we hypothesise that many SLR authors are reluctant to perform QA; hence we expected a low prevalence. We also hypothesise that there is significant heterogeneity in how QAs are currently done. Specifically, we aim at:

  • Evaluating the prevalence of QA in published systematic reviews of HSUVs.

  • Determining the nature of QA in SLRs of HSUVs.

  • Exploring the impact of QA on the SLR analysis, its results, and recommendations.

  • Identifying and listing all items commonly used for appraising quality in SLRs of HSUVs and comparing these to the items of existing checklists, tools and GPRs.

  • Identifying and listing all checklists, tools and GPRs commonly used for QA of studies eliciting HSUVs.

Methodology

A rapid review (RR) of evidence was conducted to identify peer-reviewed, published SLRs of studies eliciting HSUVs from 01.01.2015 to 11.05.2021. Cochrane RR guidelines were followed, with minor adjustments, throughout the RR process [33].

Definition of terms

Table 1 defines some key terms applicable to quality and quality appraisal. Notably, since not all published QA tools have been validated, in this study, we define a standardised tool as a tool that has been scientifically developed and published with or without validation.

Data sources and study eligibility

A search strategy adopted from Petrou et al. 2018 [12], combining terms related to HSUVs, preference-based instruments and systematic literature reviews (SLRs), was run in the PubMed electronic database on 11.05.2021. The search strategy did not impose restrictions on the disease entity or health states, population, intervention, comparators or setting. All retrieved articles were exported to EndNote version X9 software (Clarivate Analytics, Boston, MA, USA), and duplicates were deleted. The remaining articles were exported to Microsoft Excel for a step-wise screening process. To ensure that no relevant articles were missed, the PubMed search strategy was translated into Embase search terms and run on 05.09.2022; for example, MeSH and other search terms were converted to Emtree terms, and PubMed-specific field codes were replaced with Embase-specific codes. All articles retrieved were likewise exported to Microsoft Excel for step-wise screening. The search strings and hits for both databases are summarised in Additional file 1, Supplementary Material 1 and 2, Tables A.1 and A.2.

One author (MTM) developed the inclusion and exclusion criteria based on the study objectives and previous reviews. All identified SLRs that performed a descriptive synthesis and/or meta-analysis of primary HSUV studies (direct or indirect elicitation) and were published in English from January 1, 2015 to April 29, 2021 were included. A pilot exercise was done on 50 randomly chosen titles and abstracts, after which the inclusion and exclusion criteria were refined. Two experienced senior health economists (KHV and MS) reviewed the criteria and made minor adjustments. The final inclusion and exclusion criteria are summarised in Additional file 1, Supplementary Material 3, Table A.3.

Data screening

A step-wise screening process, starting with titles, followed by abstracts and then the full text, was done by one reviewer (MTM) using the pre-developed inclusion and exclusion criteria. Full-text SLRs that matched the stage-wise inclusion and exclusion criteria (see Additional file 1, Supplementary Material 3, Fig A.1) were retained for further analysis. The reference lists of the selected SLRs were further examined to identify any relevant additional reviews, tools, and GPRs. MTM repeated the same steps as described above (i.e., title, abstract and full-text scan), based on the same inclusion and exclusion criteria, to identify additional articles from the reference lists of the initially selected SLRs.

MTM and KHV discussed any uncertainty about including certain studies and mutually decided on the final list of included articles.

Data extraction

A two-stage data extraction process was done using two predefined Microsoft Excel data extraction matrices. MTM designed the first drafts of the data extraction matrices based on a similar review [29, 30] and the research objectives. KHV and MS reviewed both matrices and made minor adjustments. First, all the relevant bibliographic and descriptive information on the QA process done by the SLR authors was extracted (see Additional file 1, Supplementary Material 4, Table A.4a). Because one of our aims was to determine the prevalence of QA in the included SLRs, we did not appraise the quality of the included SLRs ourselves: since high-quality SLRs must incorporate all the recommended review stages, including QA, restricting the analysis to high-quality SLRs would potentially have biased our prevalence point estimates.

Second, all QA tools, checklists and GPRs identified or cited in the included SLRs were extracted. Backward tracking was undertaken to identify the original publications of these QA tools, checklists and GPRs. The authors' names and affiliations, the year of first use or publication, and the domains, items or signalling questions contained in each extracted QA tool, checklist and GPR were then harvested using the second data extraction sheet (see Additional file 1, Supplementary Material 4, Table A.4b).

Data synthesis

Narrative and descriptive statistics (i.e., frequencies, percentages, counterfactual acceptance rate [CAR], listing and ranking of items used) were performed on the selected SLRs and the identified QA tools, checklists and GPRs. All graphical visualizations were plotted using the ggplot2 package in R.
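
To give a concrete sense of how such frequency summaries translate into the figures reported below, here is a minimal ggplot2 sketch in R (the language named above). The data frame is constructed by hand from three item counts reported in the Results; it is illustrative only and not the authors' actual plotting code:

```r
library(ggplot2)

# Illustrative item frequencies: counts of SLRs using each item (out of 40),
# taken from the Results section; not the authors' actual data pipeline
freq <- data.frame(
  item   = c("response rates", "statistical and/or data analysis", "sample size"),
  n_slrs = c(27, 22, 21)
)

# Horizontal bar chart of item usage, in the spirit of Figs. 3-5
ggplot(freq, aes(x = reorder(item, n_slrs), y = n_slrs)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Number of SLRs using the item (out of 40)")
```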

Descriptive analysis of included SLRs, checklists, tools and GPRs practices extracted

The SLRs were first categorised according to whether or not they performed a QA of the contributing studies. For those SLRs that appraised the quality of studies, descriptive statistics were calculated based on six stratifications: 1) QA tool type (i.e., an ad-hoc or custom-made, standardised or adapted tool); 2) critical assessment tool format (i.e., scale, domain-based or checklist); 3) QA dimensions used (i.e., reporting quality, RoB and/or relevancy); 4) how the QA results were summarised (i.e., summary scores, threshold summary score or risk judgments); 5) type of data synthesis used (quantitative, including meta-analysis, or qualitative); and 6) how QA results were used to inform subsequent stages of the analysis (i.e., synthesis/results and/or the drawing of conclusions). The distribution of the number of QA items, and of the existing checklists, tools and GPRs used to generate these items, was also tabulated (see Additional file 1, Supplementary Material 4, Table A.4a).

Similarly, the QA tools, checklists, and GPRs extracted in the second step of the review were categorised according to: 1) document type (i.e., technical document [recommendations], technical document [recommendations] with a QA tool added, a previous SLR, reviews, SLRs or standardised tool); 2) critical assessment tool format (i.e., domain-based, checklist or scale-based tools); and 3) QA dimensions included in the tool (i.e., any of the RoB [methodological], reporting or relevancy dimensions) and the items as originally listed (see Additional file 1, Supplementary Material 4, Table A.4b).

Quality appraisal – Impact of QA on the synthesis of results

To explore the impact of the QA on the eligibility of studies for data synthesis, we first analysed the acceptance rate for each SLR that used the QA results to exclude articles. We defined the acceptance rate of an SLR as the proportion of primary studies eliciting HSUVs that meet a quality threshold predetermined by the SLR's authors. The threshold can be expressed as a particular score for scale-based QA or as an overall quality rating (e.g., high quality) for domain-based QA.

Second, a counterfactual analysis was done on the subset of SLRs that appraised the quality of contributing studies but did not incorporate the QA results in the data synthesis. The counterfactual acceptance rate (CAR) was defined as the proportion of studies that would have been included if the QA results had informed such a decision. Based on a predetermined QA threshold, we defined the counterfactual acceptance rate as follows:

$$\mathrm{CAR} = \frac{\text{number of eligible studies meeting the predetermined quality threshold}}{\text{total number of eligible studies appraised for quality}} \times 100\% \qquad (1)$$

In the SLR by Marušić et al. [14], the majority (52%, N = 90) of included SLRs used a quality score as a threshold to decide which primary studies qualified for data synthesis. A quality threshold of 3 out of 5 (60%) was used for the Jadad [36] and Oxford [14] scales and 6 out of 9 (67%) for the Newcastle–Ottawa scale. Consequently, we used a quality threshold of 60% in the CAR calculations (see Eq. 1). Reporting checklists with Yes, No, and Unclear responses were converted into a scale-based format (Yes = 1, No = 0 and Unclear = 0), and the resulting scores were summed to calculate the overall score percentage.
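
As a minimal sketch of this scoring and thresholding step, in R (the language in which the review's analyses were run): the study names and checklist responses below are hypothetical, and only the Yes = 1 / No = Unclear = 0 conversion and the 60% threshold come from the text above:

```r
# Hypothetical checklist responses (one vector per primary study)
responses <- list(
  study_A = c("Yes", "Yes", "Unclear", "Yes", "No"),
  study_B = c("No",  "Unclear", "No",   "Yes", "No"),
  study_C = c("Yes", "Yes", "Yes", "Yes", "Unclear")
)

# Yes = 1; No and Unclear = 0; overall score = proportion of items met
scores <- sapply(responses, function(r) mean(r == "Yes"))

# 60% quality threshold, as adopted from Marusic et al.
threshold <- 0.60
car <- 100 * mean(scores >= threshold)
cat(sprintf("CAR = %.0f%% (%d of %d studies meet the threshold)\n",
            car, sum(scores >= threshold), length(scores)))
#> CAR = 67% (2 of 3 studies meet the threshold)
```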

Regarding domain-based tools, the ROBINS-I tool [37] gives guidelines for making summary judgments of the overall RoB as follows: 1) a study is judged at "low" risk of bias if it scores "low" in all RoB domains; 2) a study is judged "moderate" if it scores no worse than "moderate" in all RoB domains, with at least one "moderate" rating; 3) a study is judged at "serious" risk of bias if it scores "serious" or "critical" in any domain. In doing so, the tool assumes that any RoB domain could contribute equally to the overall RoB assessment. By contrast, the Cochrane RoB tool [28] requires review authors to pre-specify (depending on the outcomes of interest) which domains are most important in the review context. Applying the Cochrane RoB tool therefore requires first ranking the domains by their level of importance, which depends on both the research question and the context. A context-based ranking approach would be highly recommendable; however, given that the relevant SLR articles refer to different contexts, it was not feasible to establish an informed and justified ranking of the domains for each article. Therefore, while considering the context-based approach highly desirable, we chose the method applied in the ROBINS-I tool [37] to evaluate the CAR of SLRs that used domain-based ratings and did not provide a summary judgment.
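
Under the same caveat, the ROBINS-I-style summary judgment described above reduces to a small helper function; the domain names in the example call are illustrative only:

```r
# Collapse domain-level ratings into an overall RoB judgment, following the
# ROBINS-I rules described above: "serious" if any domain is serious/critical,
# "moderate" if the worst rating is moderate, and "low" only if all are low
overall_rob <- function(domains) {
  if (any(domains %in% c("serious", "critical"))) return("serious")
  if (any(domains == "moderate")) return("moderate")
  if (all(domains == "low")) return("low")
  "no summary judgment possible"  # e.g. when a domain is rated "unclear"
}

overall_rob(c(confounding = "low", selection = "moderate", measurement = "low"))
#> [1] "moderate"
```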

Quality appraisal – items used and their relative importance

We separately extracted and listed all original QA items: 1) those used in the SLRs and 2) those found in the original publications of the QA tools, checklists and GPRs cited, adapted or customised by the authors of the included reviews. Based on a similar approach used by Yepes-Nuñez et al. [13], we iteratively and visually inspected the two lists for items that used similar wording and/or reflected the same construct. Where plausible and feasible, we retained the original names of the items as spelt out in the QA tools, checklists and GPRs or by the SLR authors. A new name or description was assigned to those items that used similar wording and/or reflected the same construct. For example, we assigned the name 'missing (incomplete) data' to all original items phrased as 'incomplete information', 'missing data', and 'the extent of incomplete data'. Similarly, items reflecting preference elicitation groups, preference valuation methods, scaling methods, and/or choice- versus feeling-based methods were named 'technique used to value the health states' (see Additional file 1, Supplementary Material 5a, Table A.5 and Table A.6 for the assignment process). In this way, apparent discrepancies in the wording, spelling and expression of items were matched. All duplicate items and redundancies were concurrently removed, producing a single comprehensive list of items used in the SLRs or the extracted QA tools, checklists and GPRs (see Additional file 1, Supplementary Material 5a, Table A.7).

Using the comprehensive list of items with assigned names, we counted the frequency of occurrence of each item in 1) the SLRs of studies eliciting HSUVs and 2) the identified QA tools, checklists and GPRs. We regard the frequency of each item in the SLRs as a reasonable proxy for the relative importance that SLR authors place on that item. Similarly, the frequency of occurrence in QA tools, checklists and GPRs can be regarded as a reasonable proxy for which items are valued more highly in the existing tools commonly used for the QA of studies eliciting HSUVs.
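
A compact sketch of these harmonisation and counting steps in R; the synonym map and the extracted item strings are invented for illustration, with only the 'missing (incomplete) data' example taken from the text above:

```r
# Map variant wordings onto a single assigned item name
synonyms <- c(
  "incomplete information"        = "missing (incomplete) data",
  "missing data"                  = "missing (incomplete) data",
  "the extent of incomplete data" = "missing (incomplete) data",
  "response rate"                 = "response rates"
)

# Items as extracted verbatim from (hypothetical) SLRs
extracted <- c("missing data", "response rate", "sample size",
               "incomplete information", "response rate")

# Replace a wording by its assigned name where one exists; keep it otherwise
harmonised <- ifelse(extracted %in% names(synonyms),
                     synonyms[extracted], extracted)

# Frequency of occurrence per harmonised item
sort(table(harmonised), decreasing = TRUE)
#> missing (incomplete) data            response rates               sample size
#>                         2                         2                         1
```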

Additionally, we narrowed the above analysis to two selected groups of items: 1) the 14 items corresponding to the recommendations of the ISPOR Task Force report [16], the NICE Technical Support Document 9 [17] and related peer-reviewed publications [4, 10, 12, 18] (hereafter referred to as 'ISPOR items'), and 2) an additional list of 14 items (hereafter 'Additional items'; see Additional file 1, Supplementary Material 5b and 5c). The Additional items were informed mainly by the literature [38], theoretical considerations [39, 40, 41, 42, 43, 44, 45] and the study team's conceptual understanding of the HSUV elicitation process. Specifically, the Additional items are those that we considered 'relevant' (based on the literature and theoretical considerations) but that were not included among the ISPOR items. For example, statistical considerations and the handling of confounders do not appear in the ISPOR items, yet they are relevant to the QA of studies eliciting HSUVs. We considered the combination of both lists (28 combined items) to be a comprehensive, but not exclusive, list of items that can be deemed 'relevant' to the QA of studies contributing to SLRs of studies eliciting HSUVs. Correspondingly, the frequency of the ISPOR items in the SLRs can be considered a reasonable proxy measure of the extent to which SLR authors follow the currently existing GPRs, while the frequency of the Additional items is a proxy for the importance of other 'relevant' items in the QA process. The frequency of the ISPOR items and the Additional items in the existing QA tools, checklists and GPRs can be considered a proxy measure of how well the currently used tools cover the 'relevant' items for the QA of studies eliciting HSUVs (i.e., their suitability for purpose).

All analyses of the SLRs that appraised quality were further stratified by considering separately: 1) the 16 SLRs [9, 26, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59] that either adapted or used one or more of the 6 QA tools, checklists and GPRs considered to be NICE, ISPOR and related publications [4, 10, 12, 16, 17, 18] (hereafter 'QA based on NICE/ISPOR tools') and 2) the 24 SLRs that adapted, customised or used other QA tools, checklists and GPRs (hereafter 'QA based on other tools'). Similarly, all analyses of the QA tools, checklists and GPRs were further stratified by considering separately: 1) the 6 QA tools and checklists considered to be NICE, ISPOR and related publications [4, 10, 12, 16, 17, 18] (hereafter 'NICE/ISPOR tools') and 2) the remaining 29 QA tools, checklists and GPRs (hereafter 'other tools').

Results

The initial electronic search retrieved 3,253 records (1,997 from PubMed and 1,701 from Embase). After the initial step-wise screening process, 70 articles were selected. Three additional articles were retrieved through the snowball method from the reference lists of the chosen SLRs. Thus, in total, 73 SLRs were analysed (see Fig. 1).

Fig. 1 PRISMA flow diagram summarising the study selection process. HRQOL, Health-Related Quality of Life; PRO, Patient-Reported Outcomes; CEA, cost-effectiveness analysis; CUA, cost-utility analysis; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; SLR, Systematic Literature Review

Characteristics of included SLRs, checklists, tools and GPRs

The SLRs included in the analysis cover utility values for health states across a wide range of disease areas: cardiovascular diseases (10%); neurological diseases, including Alzheimer's disease, mild cognitive impairment and dementia (10%); cancers of all types (21%); infectious diseases, including human immunodeficiency virus and tuberculosis (10%); musculoskeletal disorders, including rheumatoid arthritis, osteoporosis, chronic pain, osteoarthritis, ankylosing spondylitis, psoriatic arthritis, total hip replacement, and scleroderma (6%); metabolic disorders, including diabetes (3%); gastrointestinal disorders (4%); non-infectious respiratory disorders, including asthma (4%); and non-specific conditions, including injuries and surgeries (20%). Special attention was also given to mental health and childhood utilities, which accounted for 1% and 10% of the eligible SLRs, respectively (see Additional file 1, Supplementary Material 6, Table A.8).

Table 2 shows the characteristics of the QA tools, checklists and GPRs used to evaluate the quality of studies eliciting HSUVs in the SLRs analysed. A total of 35 tools, checklists and GPRs were extracted directly from the SLRs. Most of these (37%) were standardised tools scientifically developed for the QA of either RCTs or observational studies. Technical documents, which merely give guidance on appraising quality, accounted for another 37%. Notably, a few SLRs (8%) of studies eliciting HSUVs [60, 61, 62, 63] based their QA methods on those used in previous SLRs [21, 64, 65] or reviews [66, 67], whose authors had in turn drawn on guidance from an earlier SLR [68].

Regarding the critical assessment format (see Table 1 for the definition of terms), domain-based tools contributed 26% of the total number of tools, checklists and GPRs extracted, while checklists and scale-based tools accounted for 20% and 17%, respectively (37% combined) (see Additional file 1, Supplementary Material 6, Table A.9, for more details on the 35 QA tools and GPRs).

Prevalence and characteristics of QA in included SLRs

Table 3 shows the prevalence and the current nature of QA in the included SLRs. The number of QA tools and GPRs used or cited per SLR ranged from 1 to 9 (mean and median of 2, IQR of 1). Notably, the observed prevalence of QA was 55%. Around a third of the SLR authors (33%) used all three QA dimensions (reporting, RoB [methodological] and relevancy) to appraise the quality of studies eliciting HSUVs. Of the 40 SLRs that appraised quality, 16 (42%) based their QA on NICE/ISPOR tools [9, 26, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59].

Impact of the QA on study outcomes

The 40 studies that appraised quality included 1,653 primary studies eliciting HSUVs, with the number of included studies ranging from 4 to 272 (median = 28, mean = 41 and IQR = 33). Surprisingly, most (35/40) SLRs that appraised the quality of their included studies did not use the QA findings to synthesise final results and overall review conclusions. Of the remaining five articles, three [47, 60, 62] used the QA results to inform the inclusion of studies for meta-analysis (the acceptance rate was 100% for Afshari et al. [60] and Jiang et al. [62], and 53% for Blom et al. [47]). These represent only 15% (3/20) of the studies that performed a quantitative synthesis (i.e., meta-analysis or meta-regression). In the fourth [50] and fifth [69] studies, the QA results were used as a basis of inclusion for the qualitative synthesis, with 33% and 67% of the eligible studies being included in the final analysis.

We estimated the counterfactual acceptance rate (CAR) for those SLRs that appraised the quality of contributing studies but did not incorporate the QA results in the data synthesis. Six of the 40 SLRs [48, 53, 55, 56, 70, 71] did not provide sufficient information to calculate the threshold or summarise the risk-of-bias judgement. For another 6 studies [47, 50, 60, 62, 69, 72], the actual acceptance rate was as reported by the SLR authors. The CAR in the remaining 28 SLRs ranged from 0 to 100% (mean = 53%, median = 48%, IQR = 56%).

If all the 28 SLRs for which a CAR was estimated had considered the QA results, on average 57% of the 1,053 individual studies eliciting HSUVs would have been deemed ineligible for data synthesis. Had the 28 SLRs used the QA results to decide on the inclusion of studies in the analysis stage, 52% (15/28) would have rejected at least 50% of the eligible studies. Figure 2 shows the estimated CAR and acceptance rates across the 32 analysed studies.

Fig. 2 Counterfactual acceptance rates (CAR) across the SLRs evaluated. Note: for Blom et al. [47], Cooper et al. [50], Afshari et al. [60], Jiang et al. [62], Etxeandia-Ikobaltzeta et al. [72] and Eiring et al. [69], the actual acceptance rates reported by the authors are presented. n = xx represents the total number of articles considered eligible and evaluated for quality after screening. SLRs, Systematic Literature Reviews

Items used for the QA of primary studies in the included SLRs

The majority of the included SLRs (39/40) comprehensively described how the QA process was conducted; one study [70] mentioned that QA was done but did not describe how it was implemented. Furthermore, the terminology used to describe the QA process varied considerably among the SLRs. Terms such as quality appraisal or assessment [9, 23, 24, 48, 49, 51, 53, 55, 57, 58, 59, 60, 73, 74, 75, 76, 77, 78], critical appraisal [47], risk of bias assessment [25, 62, 63, 72, 79, 80, 81, 82], relevancy and quality assessment [52, 56], assessment of quality and data appropriateness [50], methodological quality assessment [26, 27, 46, 54, 61, 69], reporting quality [71, 83], and credibility checks and methodological review [70] were used loosely and interchangeably. One study [84] mentioned three terms (RoB, methodological quality and reporting quality) in its description of the QA process. Notably, most SLRs that used the term quality assessment incorporated all three QA dimensions (RoB [methodology], reporting and relevance) in the QA.

A comprehensive list of 93 items remained after reviewing the original list of items, assigning new names where necessary, and removing duplicates (see Additional file 1, Supplementary Material 5a, Table A.7). Only 70 of the 93 items appeared in the 40 SLRs that appraised the quality of studies eliciting HSUVs. The number of items used per SLR ranged from 1 to 29 (mean = 10, median = 8, IQR = 8).

Of the 70 items used in the SLRs, only five appeared in at least 50% of the 40 SLRs: 'response rates' (68%), 'statistical and/or data analysis' (55%), 'loss to follow-up [attrition or withdrawals]' (53%), 'sample size' (53%) and 'missing (incomplete) data'. Some of the least frequently used items, each appearing in only one SLR (3%), were 'sources of funding', 'administration procedures', 'ethical approval', 'reporting of p-values', 'appropriateness of endpoints', 'generalisability of findings' and 'non-normal distribution of utility values'. Twenty-three of the 93 items were not used in any SLR but appeared in QA tools, checklists and/or GPRs; these include 'allocation sequence concealment', 'questionnaire response time', 'description and use of anchor states', 'misclassification (bias) of interventions', 'reporting of adverse events', 'integrity of intervention' and 'duration in health states' (see Additional file 1, Supplementary Material 7, Table A.10).

The results for the ISPOR and Additional items are depicted in Fig. 3. The ISPOR item (Panel A) that occurred most frequently in the SLRs was 'response rates' (27/40). Notably, most SLRs that evaluated 'response rates' based their QA on NICE/ISPOR tools (14/27). Similarly, QA based on NICE/ISPOR tools tended to include items such as 'sample size' (12 vs 9), 'loss to follow-up' (13 vs 8), 'inclusion and exclusion criteria' (8 vs 3) and 'missing data' (12 vs 7) more often than QA based on other checklists, tools and GPRs. Moreover, among the ISPOR items, the measure used to describe the health states appeared the least frequently (3/40) in the SLRs. None of the 40 SLRs evaluated all 14 ISPOR items, and 10 of these items were considered by fewer than 50% of the SLRs. This trend indicates that adherence to the currently published guidelines is limited.

Fig. 3 Frequency of use of ISPOR and Additional items in SLRs. GPBM, generic preference-based measure; HS, health states; HSUVs, health state utility values

Similar to the ISPOR items, most of the Additional items (Panel B) were used in just a few SLRs, with 12 appearing in fewer than 25% of the SLRs. The Additional item that appeared most frequently was 'statistical and/or data analysis' (22/40); five of these 22 SLRs based their QA on NICE/ISPOR tools. Items related to 'administration procedures', 'indifference search procedures' and 'time of assessment' were the least used, each appearing only one to three times across the 40 SLRs analysed. Of note, no SLR that based its QA on NICE/ISPOR tools included items related to 'confounding and baseline equivalence', 'study design features', 'reporting biases' or 'administration procedure', which were used in 17, 9, 5 and 3 of the 40 SLRs, respectively. The figure also suggests that QA based on other existing QA tools, checklists and GPRs focused more on statistical and data analysis issues (17 vs 5) and blinding (8 vs 1).

Items occurring in the checklists, tools, and GPRs extracted from the SLRs

Of the 93 items identified, 81 appeared in the identified checklists, tools, and GPRs (see Additional file 1, Supplementary Material 7, Table A.11). The most frequently featured items were 'statistical/data analysis' (23/35) and 'confounding or baseline equivalency of groups' (20/35). The least frequently occurring items included the instrument properties (feasibility, reliability, and responsiveness), 'generalisability of findings', 'administration procedure' and 'ethical approval', each of which featured once. Twelve of the 93 items were not found in any of the checklists, tools, and GPRs, for instance 'bibliographic details (including the year of publication)', 'credible extrapolation of health state valuations', and 'source of tariff (value set)'.

Figure 4 shows the frequency of occurrence of the ISPOR (Panel A) and Additional items (Panel B) in the 35 QA tools, checklists and GPRs. Notably, each ISPOR item featured in fewer than 50% (18) of the 35 QA tools, checklists, and GPRs analysed. The most frequently appearing ISPOR item was 'respondent and recruitment selection' (17/35), followed by 'response rates' (13/35), 'missing or incomplete data' (13/35), and 'sample size' (11/35). The most frequently occurring Additional item was 'statistical/data analysis' (23/35), which appeared in 3 of the 6 NICE/ISPOR tools and 20 of the 29 other checklists, tools and GPRs. This was followed by confounding (20/35), which appeared in only 1 of the 6 NICE/ISPOR tools. Remarkably, items such as 'blinding' (14/35), 'study design features' (11/35) and 'randomisation' (6/35) appeared only in the other checklists, tools and GPRs, not in the NICE/ISPOR tools.

Fig. 4 Frequency of occurrence of items in QA tools, checklists, and good practice recommendations. GPBM, generic preference-based measure; HS, health states; HSUVs, health state utility values

Of the 93 items on the comprehensive list, Fig. 5 displays the ten most used in the SLRs (Panel A) and the ten most frequently occurring in the QA tools, checklists, and GPRs analysed (Panel B). On the one hand, although 'blinding' and 'study/experimental design features' were not among the ten most frequent items in the SLRs, they ranked highly among the QA tools, checklists, and GPRs (fourth [40% occurrence rate] and eighth [31% occurrence rate], respectively). On the other hand, items related to 'response rates' and 'loss to follow-up' ranked higher among the SLRs (first [68%] and third [53%], respectively) than among the checklists, tools and GPRs (seventh [33%] and tenth [26%], respectively).

Fig. 5 Top ten most frequently occurring items in (A) SLRs and (B) QA tools, checklists and GPRs. GPBM, generic preference-based measure; HS, health states; HSUVs, health state utility values

Discussion

We reviewed 73 SLRs of studies eliciting HSUVs and comprehensively described the nature of the QA undertaken. We identified 35 QA tools, checklists, and GPRs considered or mentioned in the selected SLRs and extracted their main characteristics. We then used the two sets of information to generate a comprehensive list of 93 items used in 1) the SLRs (70 items) and 2) the QA tools, checklists, and GPRs (81 items) (see Additional file 1, Supplementary Material 5).

With only 55% of SLRs appraising the quality of included studies, the results supported our hypothesis of a low prevalence of QA in SLRs of studies eliciting HSUVs. This is evident when compared with other fields: the prevalence of QA in SLRs was 99% in sports and exercise medicine [30], 90% in general medicine, general practice, public health and paediatrics [15], 97% in surgery, alternative medicine, rheumatology, dentistry and hepatogastroenterology, and 76% in anesthesiology [14]. In these fields, the high prevalence is partly linked to the availability of standardised QA tools and the presence of generally accepted standards [15, 30]. For instance, a study in sports and exercise medicine [30] estimated that standardised QA tools were used in 65% of the SLRs analysed, compared with 16% in the current study. The majority of the SLRs in the Büttner et al. [30] study were either healthcare intervention (32/66) or observational epidemiology (26/66) reviews, for which standardised QA tools are widely available and accepted. Examples include the Jadad tool [36], Downs and Black [85], the Newcastle–Ottawa Scale (NOS), the Cochrane RoB assessment tools [28, 86], RoB 1 [37] and RoB 2 [87].

Our results showed that SLR authors incorporate heterogeneous QA dimensions in their QAs. These variations can be attributed to a strong and long-standing lack of consensus on the definition of quality and on the overall aim of doing a QA [31]. Overall, the present review identified three QA dimensions (RoB, reporting and relevancy/applicability), which were evaluated to varying extents (see the breakdown in Table 2). This heterogeneity in dimensions often leads to considerable variation in the QA items considered and the overall conclusions drawn [22, 38]. For instance, Büttner et al. [29, 30] compared QA results based on the Downs and Black checklist with those based on the Cochrane Risk of Bias 2 tool (RoB 2). Interestingly, QA using RoB 2 resulted in 11/11 of the RCTs being rated at high overall RoB, while the Downs and Black checklist resulted in 8/11 of the same studies being judged high-quality trials.

The result from Büttner et al. [29, 30] described above favours focusing only on RoB when appraising the quality of studies included in an SLR. Nevertheless, additional challenges arise when studies are not well reported. Concluding that a study is prone to RoB because it has several methodological flaws is different from concluding that it is prone to RoB because the reporting was unclear; in effect, we know nothing about the RoB in a study that does not provide sufficient detail for such an assessment.

Pivotal to any QA in the SLR process is the reporting quality of the included studies. A well-reported study allows reviewers to judge whether the results of primary studies can be trusted and whether they should contribute to meta-analyses [14]. Reviewers should first assess a study's methodological characteristics (based on the reported information); only then, based on the methodological rigour (or flaws) identified, should risk judgements, i.e. the perceived risk that the results of a research study deviate from the truth [29, 30], be inferred. Inevitably, all three quality dimensions are necessary components of a robust QA, and only together are they sufficient [88].

A challenge to the QA of studies eliciting HSUVs is the apparent lack of standardised and widely accepted QA tools to evaluate them. First, this is evident in some of the SLRs [ 89 , 90 , 91 , 92 ] that did not appraise the quality of contributing studies and cited the lack of a gold standard as the main barrier to conducting such an assessment. Second, most of the SLRs that appraised quality did so by customising elements from different checklists [ 24 , 27 , 75 , 79 , 80 ], by using standardised tools designed to evaluate quality in other types of studies rather than in studies eliciting HSUVs [ 23 , 27 , 52 , 62 ], or by using GPRs [ 9 , 26 , 46 , 47 , 50 , 54 , 55 , 56 , 61 , 63 , 74 , 84 ]. In this regard, we estimated that SLR authors used, on average, two QA tools, checklists or GPRs (maximum = 9) to construct their customised QA tools, with only 14/40 (35%) SLRs using a single tool [ 24 , 25 , 49 , 51 , 53 , 57 , 58 , 59 , 60 , 73 , 75 , 79 , 80 , 82 ]. This finding is not consistent with other fields of research. For instance, Katikireddi et al. [ 15 ] conducted a comprehensive review of QA in general practice, public health and paediatrics. Their study estimated that, of the 678 selected SLRs, 513 (76%) used a single quality/RoB assessment tool. The tools used included the non-modified versions of the Cochrane tool for RoB assessment (36%), the Jadad tool (14%), and the Newcastle–Ottawa Scale (6%) [ 15 ].

The observed use of multiple tools leads to a critical question regarding the appropriateness of combining or developing custom-made tools to address the challenges present in the QA in SLRs of HSUV studies. Petrou et al.'s guide to conducting systematic reviews and meta-analyses of studies eliciting HSUVs states that “in the absence of generic tools that encompass all potentially relevant features, it is incumbent on those involved in the review process to describe the quality of contributing studies in holistic terms, drawing where necessary upon the relevant features of multiple checklists” [ 12 ]. While this may sound plausible and pragmatic, it requires comprehension of, and agreement on, what should be considered “relevant features”. It is here that the evidence delineated in this comprehensive review calls the notion of Petrou et al. [ 12 ] into question.

The analysis of the comprehensive list of 93 items (see Fig. 5 and Additional file 1, Supplementary Material 7, Tables A.10 and A.11) showed 1) a high heterogeneity among the QA items included in the SLRs, and 2) a considerable mismatch between what is included in the existing QA tools, checklists and GPRs (which may be relevant to those who created the tools and to the specific fields they were created for) and what SLR authors actually use in the QA of studies eliciting HSUVs.

The plethora of QA tools that authors of SLRs can choose from is designed with a strong focus on healthcare intervention studies measuring effect sizes. Yet primary studies of HSUVs are not restricted to intervention studies. Accordingly, features that could be considered more relevant to intervention studies than to studies eliciting HSUVs, such as the blinding of participants and outcomes, appeared in 40% of the checklists and GPRs but did not appear in any of the QAs of studies eliciting HSUVs. Their exclusion could indicate that the SLR authors omitted less “relevant” features.

However, authors of SLRs overlooked an essential set of core elements of the empirical elicitation of HSUVs. For instance, Stalmeier et al. [ 39 ] provided a shortlist of 10 items necessary to report in the methods sections of studies eliciting HSUVs. The list includes items on how utility questions were administered, how health states were described, which utility assessment method or methods were used, the response and completion rates, the specified duration of the health states, which software program (if any) was used, the description of the worst health state (the lower anchor of the scale), whether a matching or choice indifference search procedure was used, when the assessment was conducted relative to treatment, and which (if any) visual aids were used. Similarly, the Checklist for Reporting Valuation Studies of Multi-Attribute Utility-Based Instruments (CREATE) [ 43 ], which can be considered very close to HSUV elicitation, includes the attribute levels and scoring algorithms used for the valuation process. Regrettably, core elements such as instrument administration procedures, respondent burden, construction of tasks, indifference search procedures, and scoring algorithms were used in less than 22% of the SLRs (see Fig. 3 ). The lack of these core elements strongly suggests that existing tools may not be suitable for the QA of empirical HSUV studies.
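
To make this concrete, the Stalmeier et al. [ 39 ] shortlist can be encoded as a simple reporting checklist and used to screen an HSUV study for completeness. The following Python sketch is illustrative only (it is not a tool from this review or from Stalmeier et al.); the item labels are paraphrased from the list above and the example flags are invented.

```python
# Illustrative sketch: screening a study report against the ten reporting
# items paraphrased from Stalmeier et al. The flags below are hypothetical.

STALMEIER_ITEMS = [
    "administration of utility questions",
    "description of health states",
    "utility assessment method(s) used",
    "response and completion rates",
    "duration of the health states",
    "software program used (if any)",
    "description of the worst health state (lower anchor)",
    "matching or choice indifference search procedure",
    "timing of assessment relative to treatment",
    "visual aids used (if any)",
]

def reporting_completeness(study_report: dict) -> float:
    """Share of the ten reporting items that a study addresses."""
    addressed = sum(bool(study_report.get(item)) for item in STALMEIER_ITEMS)
    return addressed / len(STALMEIER_ITEMS)

# Example: a (hypothetical) study reporting only the first four items.
example = {item: (i < 4) for i, item in enumerate(STALMEIER_ITEMS)}
print(f"Reporting completeness: {reporting_completeness(example):.0%}")  # 40%
```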

Additionally, the most highly ranked items in the existing tools are statistical analysis and confounding/baseline equivalence, which appeared in 66% and 57%, respectively, of the QA tools, checklists and GPRs evaluated, yet were used in only 55% and 43% of the SLRs that appraised quality. Studies eliciting HSUVs are not limited to experimental and randomised protocols, in which the investigator has the flexibility to choose which variables to account for and control during the design stage. It is therefore extremely relevant for HSUV primary studies (both observational and experimental) to control for confounding variables and to employ robust statistical methods to control for any remaining confounders.
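
As an illustration of the kind of statistical control referred to here, the sketch below fits an ordinary least squares model of utility on group membership plus confounders, using synthetic data; the variable names, effect sizes and noise level are invented for the example.

```python
# Illustrative covariate adjustment for an observational HSUV comparison:
# OLS of utility on group membership, age and disease severity (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(30, 80, n)
severity = rng.integers(0, 2, n)  # binary confounder, also drives group membership
group = ((severity + rng.integers(0, 2, n)) > 1).astype(float)
utility = (0.9 - 0.002 * age - 0.15 * severity - 0.05 * group
           + rng.normal(0.0, 0.03, n))

X = np.column_stack([np.ones(n), group, age, severity])  # design matrix
coef, *_ = np.linalg.lstsq(X, utility, rcond=None)
print(f"Adjusted group effect on utility: {coef[1]:.3f}")  # close to the true -0.05
```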

Furthermore, several items found in the existing checklists, tools and GPRs reviewed, and used by SLR authors, may be considered redundant. Examples include items on sources of funding, study objectives and research questions, bibliographic details (including the year of publication), and the reporting of ethical approval.

Another argument against Petrou et al.'s recommendation to resort to multiple QA tools when customising existing tools for the QA in SLRs of studies eliciting HSUVs is the need for consistency, reproducibility and comparability of research, which are key to all scientific research regardless of domain. Undeniably, using QA tools and methods informed by many published critical appraisal tools and GPRs (35 in our study) does not ensure the consistency, reproducibility and comparability of either the QA results or the overall conclusions [ 21 , 22 ].

The 14 ISPOR items, drawn from the few available GPRs specific to studies eliciting HSUVs [ 4 , 5 , 10 , 16 , 17 , 18 , 93 ], and the 14 Additional items, informed mainly by the literature [ 38 ], theoretical considerations [ 39 , 40 , 41 , 42 , 43 , 44 , 45 ] and the study team's conceptual understanding of the HSUV elicitation process, can be considered a plausible list of items to include when conducting the QA of studies eliciting HSUVs. Nevertheless, besides being extensive and broad, such a list would still leave high heterogeneity in how these items contribute to QA. There is therefore a strong need for a scientific, evidence-based process to streamline the list into a standardised, and ideally widely accepted, set.

Although the SLRs and the checklists, tools, and GPRs shared the same top five ISPOR items (i.e., response rates, loss to follow-up, sample size, respondent selection and recruitment, and missing data), the ISPOR items are considered more often in the SLRs than they appear in the checklists, tools, and GPRs reviewed. Moreover, our results showed that the Additional items, which are also valuable in QA, have a considerably lower prevalence than the ISPOR items in the QA presented in the SLRs. This is of concern, since relying only on the NICE/ISPOR tools may overlook items relevant to the QA of studies eliciting HSUVs, such as ‘statistical or data analysis’, ‘confounding’, ‘blinding’, ‘reporting of results’, and ‘study design features’. Equally, the current set of QA tools, checklists and GPRs pays noticeably little attention (as implied by the low frequency of occurrence) to items that capture the core elements of studies eliciting HSUVs, such as the techniques used to value health states, the population used to collect the HSUVs, the appropriate use of the valuation method, and the proper use of generic preference-based methods; relying on these tools will therefore not address the present challenges.

Another critical area where SLR authors are undecided is which QA system to use. While the guidelines seem to favour domain-based systems over checklist- and scale-based ones, SLR authors still seem to favour checklist- and scale-based QA, presumably because of its simplicity: our results suggest that scale-based checklists were used in more than 66% of the SLRs that appraised quality. The pros and cons of either system are well documented in the literature [ 15 , 21 , 22 , 38 ]; notably, the two systems will produce different QA judgements [ 15 , 21 , 22 , 38 ]. The combined effect of such heterogeneity and inconsistency in QA is a correspondingly wide variation and uncertainty in QA results, conclusions and recommendations for policy.
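
The practical difference between the two systems can be sketched in a few lines. In the illustration below, which is not taken from any specific tool, a scale-based checklist sums binary item scores into a single proportion, while a domain-based judgement, in the spirit of RoB 2, follows the worst-rated domain; the item names, ratings and the 0.7 cut-off are invented.

```python
# Illustrative contrast between scale-based and domain-based QA systems.
from typing import Dict

def scale_based_score(item_scores: Dict[str, int]) -> float:
    """Summary score: fraction of binary quality items met."""
    return sum(item_scores.values()) / len(item_scores)

def domain_based_judgement(domain_ratings: Dict[str, str]) -> str:
    """Overall judgement driven by the worst-rated domain (RoB 2-style)."""
    order = {"low": 0, "some concerns": 1, "high": 2}
    return max(domain_ratings.values(), key=order.__getitem__)

items = {"randomisation": 1, "blinding": 0, "missing data": 1, "reporting": 1}
domains = {"randomisation": "low", "missing data": "high", "outcome measurement": "low"}

print(scale_based_score(items))         # 0.75 -> 'high quality' above a 0.7 cut-off
print(domain_based_judgement(domains))  # 'high' -> high overall RoB
```

The same body of evidence can thus pass a scale-based threshold while being judged at high overall RoB under a domain-based system, mirroring the Büttner et al. comparison discussed above.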

Our analysis also revealed an alarmingly low rate of SLRs in which the QA conducted informed the analysis. Congruent with previous studies in other disciplines, namely general medicine, public health, and trials of therapeutic or preventive interventions [ 15 , 94 , 95 ], only five of the 40 SLRs that conducted a QA (12.5%) explicitly informed the synthesis stage based on the QA results [ 47 , 50 , 60 , 62 , 69 ]. The reasons for this low prevalence of incorporating QA findings into the synthesis stage of SLRs remain unclear, but it can plausibly be attributed to a lack of specific guidance on, and disagreements about, how QA results should be incorporated into the analysis process [ 95 ].

Commonly used methods for incorporating QA results into the analysis process include sensitivity analysis, narrative discussion and the exclusion of studies at high RoB [ 15 ]. The five SLRs in our review [ 47 , 50 , 60 , 62 , 69 ] excluded studies with a high or unclear RoB (or of moderate or low quality) from the synthesis. The low uptake elsewhere is a cause for concern, since empirical evidence suggests that combining evidence from low-quality (high-RoB) articles with high-quality ones biases the overall review conclusions, which can be detrimental to policy making [ 15 ]. Incorporating QA findings into the synthesis and conclusions of any SLR [ 28 , 29 , 30 ] is therefore highly recommendable, particularly for HSUVs, which are heterogeneous and considered a highly sensitive input parameter in many CUAs [ 3 , 5 , 6 ]. Nevertheless, the lack of clear guidance and agreement on how to do so remains a significant barrier.
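
For illustration, one of the methods mentioned above, excluding high-RoB studies as a sensitivity analysis, can be sketched as follows. The utilities and RoB ratings are invented, and a real synthesis would typically use weighted meta-analytic pooling rather than the simple means shown here.

```python
# Illustrative quality-based sensitivity analysis: pool HSUVs with and
# without the studies judged at high RoB (invented values).
import statistics

studies = [
    {"utility": 0.72, "rob": "low"},
    {"utility": 0.68, "rob": "low"},
    {"utility": 0.55, "rob": "high"},
    {"utility": 0.74, "rob": "some concerns"},
]

all_pooled = statistics.mean(s["utility"] for s in studies)
restricted = statistics.mean(s["utility"] for s in studies if s["rob"] != "high")

print(f"All studies:        {all_pooled:.2f}")  # ~0.67
print(f"Excluding high RoB: {restricted:.2f}")  # ~0.71
```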

To explore the potential impact of QA, we calculated counterfactual acceptance rates for individual studies and the corresponding summary statistics (mean, median and IQR). While the number of empirical studies eliciting HSUVs has increased over the years, our results suggest that a staggering 46% of individual studies would be excluded from the SLR analyses because of their lower quality. This estimate needs to be interpreted with caution. First, there is a mixed bag of QA tools (reporting quality vs methodological flaws and RoB; domain-based vs scale-based). Second, individual primary studies could overlap across the 40 SLRs that appraised quality. Third, although informed by previous studies, the QA threshold we used is arbitrary: there is currently no agreed standard or recommended threshold cut-off point for QA, which has resulted in considerable heterogeneity in the thresholds used to exclude studies from synthesis in the previous literature [ 14 ]. Fourth, the tools vary in the approaches they recommend for summarising individual domain ratings into an overall score [ 14 , 15 ].
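
A minimal sketch of this calculation is shown below, under the assumption (one plausible reading of the text above) that an SLR's counterfactual acceptance rate is the share of its included primary studies whose QA score meets the chosen threshold; the scores and the 0.7 cut-off are illustrative, not data from this review.

```python
# Illustrative counterfactual acceptance rates (CAR) and summary statistics.
import statistics

def car(qa_scores, threshold=0.7):
    """Share of an SLR's primary studies accepted at the QA threshold."""
    return sum(score >= threshold for score in qa_scores) / len(qa_scores)

slr_scores = {  # hypothetical per-study QA scores for three SLRs
    "SLR A": [0.9, 0.6, 0.8, 0.5],
    "SLR B": [0.75, 0.72, 0.4],
    "SLR C": [0.65, 0.85],
}

rates = [car(scores) for scores in slr_scores.values()]
q1, q2, q3 = statistics.quantiles(rates, n=4)
print(f"mean={statistics.mean(rates):.2f}, median={q2:.2f}, IQR=({q1:.2f}, {q3:.2f})")
```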

Two main strengths of our review can be highlighted. First, compared with Yepes-Nuñez et al. [ 13 ], who focused on RoB and included 43 SLRs (to our knowledge, the only previous review of the RoB items considered in the QA of SLRs of studies eliciting HSUVs), our findings are based on a larger sample (73 SLRs) with a broader focus (three dimensions: RoB, reporting, and relevancy/applicability) [ 13 ]. Second, in addition to examining QA in SLRs, we systematically evaluated the original articles for each of the 35 identified checklists, tools, and GPRs [ 13 ]. Consequently, our comprehensive list of items reflects both the QA methods applied in the SLRs and the current practices embodied in the checklists, tools, and GPRs. More importantly, based on both types of articles (i.e., the SLRs and the checklists, tools and GPRs), we propose a subsample of 28 main items that can serve as the basis for developing a standardised QA tool for the evaluation of HSUV studies.

A limitation of our study is that our understanding of how QA was done was based solely on the information reported in the SLRs. Since this was a rapid review, we did not contact the corresponding SLR authors for clarification regarding the extracted items and QA methodology. A second limitation is that the SLRs were selected from articles published between 2015 and 2021. We adopted this approach to capture only the recent trends in the QA of studies on HSUVs, including the current challenges. Furthermore, the review by Yepes-Nuñez et al. [ 13 ], which covered all SLRs of HSUVs from inception to 2015, was used as part of the evidence that informed the development of the “Additional items”. As a result, our list captures all 23 items identified by Yepes-Nuñez et al. and considered relevant before 2015.

Our comprehensive review reveals a low prevalence of QA in the identified SLRs of studies eliciting HSUVs. Most importantly, the review depicts wide inconsistencies in the QA process, spanning the tools used, the QA dimensions and corresponding QA items, the use of scale- versus domain-based tools, and how the overall QA outcomes are summarised (summary scores vs risk judgements). The origins of these variations can be attributed to the absence of a consensus on the definition of quality and the consequent lack of a standardised, widely accepted QA tool for evaluating studies eliciting HSUVs.

Overall, the practice of QA of individual studies in SLRs of studies eliciting HSUVs is still in its infancy, and there is a strong need to promote QA in such assessments. The use of a rigorously and scientifically developed QA tool specifically designed for studies on HSUVs would go a long way towards ensuring the much-needed consistency, reproducibility and comparability of research. A key question remains: is it feasible to have a gold-standard, comprehensive and widely accepted tool for the QA of studies eliciting HSUVs? Downs and Black [ 85 ] concluded that it is indeed feasible to create a “checklist” for assessing the methodological quality of both randomised and non-randomised studies of healthcare interventions.

Therefore, the next step towards developing the much-needed QA tool in the field of HSUVs is for researchers to reach a consensus on a working definition of quality, particularly for HSUVs, where contextual considerations matter. Once that is established, agreement should follow on the core dimensions, domains and items that can be used to measure quality, based on the agreed concept. This work provides a valuable pool of items that should be considered in any future QA tool development.

Availability of data and materials

All data is provided in the paper or supplementary material.

Footnote 1: The Downs and Black checklist comprises 27 items across four subscales: completeness of reporting (9 items), internal validity (13 items), precision (1 item), and external validity (3 items).

Footnote 2: RoB 2 is the revised, second edition of the Cochrane Risk of Bias tool for RCTs, with five RoB domains: 1) bias arising from the randomisation process; 2) bias due to deviations from intended interventions; 3) bias due to missing outcome data; 4) bias in measurement of the outcome; and 5) bias in selection of the reported results.

Abbreviations

CAR: Counterfactual Acceptance Rate
CUA: Cost-Utility Analysis
EQ-5D: EuroQol-5 Dimension
GPBM: Generic Preference-Based Methods
GPR(s): Good Practice Recommendation(s)
HSUV(s): Health State Utility Value(s)
HTA: Health Technology Assessment
HUI: Health Utilities Index
IQR: Interquartile Range
ISPOR: The Professional Society for Health Economics and Outcomes Research
NICE: The National Institute for Health and Care Excellence
QA: Quality Appraisal or Quality Assessment
RCT(s): Randomised Controlled Trial(s)
RoB: Risk of Bias
ROBINS-I: Risk Of Bias In Non-Randomised Studies of Interventions
RR(s): Rapid Review(s)
SF-6D: Short-Form Six-Dimension
SG: Standard Gamble
SLR(s): Systematic Literature Review(s)
TTO: Time Trade-Off
VAS: Visual Analogue Scale

Masic I, Miokovic M, Muhamedagic B. Evidence based medicine - new approaches and challenges. Acta Inform Med. 2008;16(4):219–25.

Health Technology Assessment [ https://htaglossary.net/health-technology-assessment ]

Xie F, Zoratti M, Chan K, Husereau D, Krahn M, Levine O, Clifford T, Schunemann H, Guyatt G. Toward a Centralized, Systematic Approach to the Identification, Appraisal, and Use of Health State Utility Values for Reimbursement Decision Making: Introducing the Health Utility Book (HUB). Med Decis Making. 2019;39(4):370–8.

Wolowacz SE, Briggs A, Belozeroff V, Clarke P, Doward L, Goeree R, Lloyd A, Norman R. Estimating Health-State Utility for Economic Models in Clinical Studies: An ISPOR Good Research Practices Task Force Report. Value Health. 2016;19(6):704–19.

Ara R, Peasgood T, Mukuria C, Chevrou-Severac H, Rowen D, Azzabi-Zouraq I, Paisley S, Young T, van Hout B, Brazier J. Sourcing and Using Appropriate Health State Utility Values in Economic Models in Health Care. Pharmacoeconomics. 2017;35(Suppl 1):7–9.

Ara R, Hill H, Lloyd A, Woods HB, Brazier J. Are Current Reporting Standards Used to Describe Health State Utilities in Cost-Effectiveness Models Satisfactory? Value Health. 2020;23(3):397–405.

Torvinen S, Bergius S, Roine R, Lodenius L, Sintonen H, Taari K. Use of patient assessed health-related quality of life instruments in prostate cancer research: a systematic review of the literature 2002–15. Int J Technol Assess Health Care. 2016;32(3):97–106.

Robinson A, Dolan P, Williams A. Valuing health status using VAS and TTO: what lies behind the numbers? Soc Sci Med (1982). 1997;45(8):1289–97.

Li L, Severens JLH, Mandrik O. Disutility associated with cancer screening programs: A systematic review. PLoS ONE. 2019;14(7): e0220148.

Ara R, Brazier J, Peasgood T, Paisley S. The Identification, Review and Synthesis of Health State Utility Values from the Literature. Pharmacoeconomics. 2017;35(Suppl 1):43–55.

Arnold D, Girling A, Stevens A, Lilford R. Comparison of direct and indirect methods of estimating health state utilities for resource allocation: review and empirical analysis. BMJ (Clin Res Ed). 2009;339:b2688.

Petrou S, Kwon J, Madan J. A Practical Guide to Conducting a Systematic Review and Meta-analysis of Health State Utility Values. Pharmacoeconomics. 2018;36(9):1043–61.

Yepes-Nuñez JJ, Zhang Y, Xie F, Alonso-Coello P, Selva A, Schünemann H, Guyatt G. Forty-two systematic reviews generated 23 items for assessing the risk of bias in values and preferences’ studies. J Clin Epidemiol. 2017;85:21–31.

Marušić MF, Fidahić M, Cepeha CM, Farcaș LG, Tseke A, Puljak L. Methodological tools and sensitivity analysis for assessing quality or risk of bias used in systematic reviews published in the high-impact anesthesiology journals. BMC Med Res Methodol. 2020;20(1):121.

Katikireddi SV, Egan M, Petticrew M. How do systematic reviews incorporate risk of bias assessments into the synthesis of evidence? A methodological study. J Epidemiol Community Health. 2015;69(2):189–95.

Brazier J, Ara R, Azzabi I, Busschbach J, Chevrou-Séverac H, Crawford B, Cruz L, Karnon J, Lloyd A, Paisley S, et al. Identification, Review, and Use of Health State Utilities in Cost-Effectiveness Models: An ISPOR Good Practices for Outcomes Research Task Force Report. Value Health. 2019;22(3):267–75.

Papaioannou D, Brazier J, Paisley S. NICE Decision Support Unit Technical Support Documents. In: NICE DSU Technical Support Document 9: The Identification, Review and Synthesis of Health State Utility Values from the Literature. edn. London: National Institute for Health and Care Excellence (NICE); 2010.

Papaioannou D, Brazier J, Paisley S. Systematic searching and selection of health state utility values from the literature. Value Health. 2013;16(4):686–95.

Viswanathan M, Patnode CD, Berkman ND, Bass EB, Chang S, Hartling L, Murad MH, Treadwell JR, Kane RL. Recommendations for assessing the risk of bias in systematic reviews of health-care interventions. J Clin Epidemiol. 2018;97:26–34.

Ma L-L, Wang Y-Y, Yang Z-H, Huang D, Weng H, Zeng X-T. Methodological quality (risk of bias) assessment tools for primary and secondary medical studies: what are they and which is better? Mil Med Res. 2020;7(1):7.

O’Connor SR, Tully MA, Ryan B, Bradley JM, Baxter GD, McDonough SM. Failure of a numerical quality assessment scale to identify potential risk of bias in a systematic review: a comparison study. BMC Res Notes. 2015;8:224.

Armijo-Olivo S, Fuentes J, Ospina M, Saltaji H, Hartling L. Inconsistency in the items included in tools used in general health research and physical therapy to evaluate the methodological quality of randomized controlled trials: a descriptive analysis. BMC Med Res Methodol. 2013;13:116.

Park HY, Cheon HB, Choi SH, Kwon JW. Health-Related Quality of Life Based on EQ-5D Utility Score in Patients With Tuberculosis: A Systematic Review. Front Pharmacol. 2021;12:659675.

Carrello J, Hayes A, Killedar A, Von Huben A, Baur LA, Petrou S, Lung T. Utility Decrements Associated with Adult Overweight and Obesity in Australia: A Systematic Review and Meta-Analysis. Pharmacoeconomics. 2021;39(5):503–19.

Landeiro F, Mughal S, Walsh K, Nye E, Morton J, Williams H, Ghinai I, Castro Y, Leal J, Roberts N, et al. Health-related quality of life in people with predementia Alzheimer’s disease, mild cognitive impairment or dementia measured with preference-based instruments: a systematic literature review. Alzheimers Res Ther. 2020;12(1):154.

Meregaglia M, Cairns J. A systematic literature review of health state utility values in head and neck cancer. Health Qual Life Outcomes. 2017;15(1):174.

Li YK, Alolabi N, Kaur MN, Thoma A. A systematic review of utilities in hand surgery literature. J Hand Surg Am. 2015;40(5):997–1005.

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA, editors. Cochrane Handbook for Systematic Reviews of Interventions. 2nd ed. Chichester: Wiley; 2019.

Büttner F, Winters M, Delahunt E, Elbers R, Lura CB, Khan KM, Weir A, Ardern CL. Identifying the ’incredible’! Part 1: assessing the risk of bias in outcomes included in systematic reviews. Br J Sports Med. 2020;54(13):798–800.

Büttner F, Winters M, Delahunt E, Elbers R, Lura CB, Khan KM, Weir A, Ardern CL. Identifying the ’incredible’! Part 2: Spot the difference - a rigorous risk of bias assessment can alter the main findings of a systematic review. Br J Sports Med. 2020;54(13):801–8.

Dechartres A, Charles P, Hopewell S, Ravaud P, Altman DG. Reviews assessing the quality or the reporting of randomized controlled trials are increasing over time but raised questions about how quality is assessed. J Clin Epidemiol. 2011;64(2):136–44.

Downes MJ, Brennan ML, Williams HC, Dean RS. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open. 2016;6(12):e011458.

Garritty C, Gartlehner G, Nussbaumer-Streit B, King VJ, Hamel C, Kamel C, Affengruber L, Stevens A. Cochrane Rapid Reviews Methods Group offers evidence-informed guidance to conduct rapid reviews. J Clin Epidemiol. 2021;130:13–22.

Burls A. What is Critical Appraisal? [Online]. Hayward Medical Communications; 2009. Available: http://www.bandolier.org.uk/painres/download/whatis/What_is_critical_appraisal.pdf . Accessed 5 Nov 2021.

Verhagen AP, De Vet HC, De Bie RA, Kessels AG, Boers M, Bouter LM, Knipschild PG. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol. 1998;51(12):1235–41.

Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17(1):1–12.

Sterne JAC, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, Henry D, Altman DG, Ansari MT, Boutron I, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ (Clin Res Ed). 2016;355:i4919.

Katrak P, Bialocerkowski AE, Massy-Westropp N, Kumar VSS, Grimmer KA. A systematic review of the content of critical appraisal tools. BMC Med Res Methodol. 2004;4(1):22.

Stalmeier PF, Goldstein MK, Holmes AM, Lenert L, Miyamoto J, Stiggelbout AM, Torrance GW, Tsevat J. What should be reported in a methods section on utility assessment? Med Decis Making. 2001;21(3):200–7.

Bridges JF, Hauber AB, Marshall D, Lloyd A, Prosser LA, Regier DA, Johnson FR, Mauskopf J. Conjoint analysis applications in health–a checklist: a report of the ISPOR Good Research Practices for Conjoint Analysis Task Force. Value Health. 2011;14(4):403–13.

Petrou S, Rivero-Arias O, Dakin H, Longworth L, Oppe M, Froud R, Gray A. The MAPS Reporting Statement for Studies Mapping onto Generic Preference-Based Outcome Measures: Explanation and Elaboration. Pharmacoeconomics. 2015;33(10):993–1011.

Petrou S, Rivero-Arias O, Dakin H, Longworth L, Oppe M, Froud R, Gray A. Preferred Reporting Items for Studies Mapping onto Preference-Based Outcome Measures: The MAPS Statement. Pharmacoeconomics. 2015;33(10):985–91.

Xie F, Pickard AS, Krabbe PF, Revicki D, Viney R, Devlin N, Feeny D. A Checklist for Reporting Valuation Studies of Multi-Attribute Utility-Based Instruments (CREATE). Pharmacoeconomics. 2015;33(8):867–77.

Zhang Y, Alonso-Coello P, Guyatt GH, Yepes-Nuñez JJ, Akl EA, Hazlewood G, Pardo-Hernandez H, Etxeandia-Ikobaltzeta I, Qaseem A, Williams JW Jr, et al. GRADE Guidelines: 19. Assessing the certainty of evidence in the importance of outcomes or values and preferences-Risk of bias and indirectness. J Clin Epidemiol. 2019;111:94–104.

Zhang Y, Coello PA, Guyatt GH, Yepes-Nuñez JJ, Akl EA, Hazlewood G, Pardo-Hernandez H, Etxeandia-Ikobaltzeta I, Qaseem A, Williams JW Jr, et al. GRADE guidelines: 20. Assessing the certainty of evidence in the importance of outcomes or values and preferences-inconsistency, imprecision, and other domains. J Clin Epidemiol. 2019;111:83–93.

Aceituno D, Pennington M, Iruretagoyena B, Prina AM, McCrone P. Health State Utility Values in Schizophrenia: A Systematic Review and Meta-Analysis. Value Health. 2020;23(9):1256–67.

Blom EF, Haaf KT, de Koning HJ. Systematic Review and Meta-Analysis of Community- and Choice-Based Health State Utility Values for Lung Cancer. Pharmacoeconomics. 2020;38(11):1187–200.

Buchanan-Hughes AM, Buti M, Hanman K, Langford B, Wright M, Eddowes LA. Health state utility values measured using the EuroQol 5-dimensions questionnaire in adults with chronic hepatitis C: a systematic literature review and meta-analysis. Qual Life Res. 2019;28(2):297–319.

Carter GC, King DT, Hess LM, Mitchell SA, Taipale KL, Kiiskinen U, Rajan N, Novick D, Liepa AM. Health state utility values associated with advanced gastric, oesophageal, or gastro-oesophageal junction adenocarcinoma: a systematic review. J Med Econ. 2015;18(11):954–66.

Cooper JT, Lloyd A, Sanchez JJG, Sörstadius E, Briggs A, McFarlane P. Health related quality of life utility weights for economic evaluation through different stages of chronic kidney disease: a systematic literature review. Health Qual Life Outcomes. 2020;18(1):310.

Di Tanna GL, Urbich M, Wirtz HS, Potrata B, Heisen M, Bennison C, Brazier J, Globe G. Health State Utilities of Patients with Heart Failure: A Systematic Literature Review. Pharmacoeconomics. 2021;39(2):211–29.

Golicki D, Jaśkowiak K, Wójcik A, Młyńczak K, Dobrowolska I, Gawrońska A, Basak G, Snarski E, Hołownia-Voloskova M, Jakubczyk M, et al. EQ-5D-Derived Health State Utility Values in Hematologic Malignancies: A Catalog of 796 Utilities Based on a Systematic Review. Value Health. 2020;23(7):953–68.

Kua WS, Davis S. PRS49 - Systematic Review of Health State Utilities in Children with Asthma. Value Health. 2016;19(7):A557.

Magnus A, Isaranuwatchai W, Mihalopoulos C, Brown V, Carter R. A Systematic Review and Meta-Analysis of Prostate Cancer Utility Values of Patients and Partners Between 2007 and 2016. MDM Policy Practice. 2019;4(1):2381468319852332.

Paracha N, Abdulla A, MacGilchrist KS. Systematic review of health state utility values in metastatic non-small cell lung cancer with a focus on previously treated patients. Health Qual Life Outcomes. 2018;16(1):179.

Paracha N, Thuresson PO, Moreno SG, MacGilchrist KS. Health state utility values in locally advanced and metastatic breast cancer by treatment line: a systematic review. Expert Rev Pharmacoecon Outcomes Res. 2016;16(5):549–59.

Petrou S, Krabuanrat N, Khan K. Preference-Based Health-Related Quality of Life Outcomes Associated with Preterm Birth: A Systematic Review and Meta-analysis. Pharmacoeconomics. 2020;38(4):357–73.

Saeed YA, Phoon A, Bielecki JM, Mitsakakis N, Bremner KE, Abrahamyan L, Pechlivanoglou P, Feld JJ, Krahn M, Wong WWL. A Systematic Review and Meta-Analysis of Health Utilities in Patients With Chronic Hepatitis C. Value Health. 2020;23(1):127–37.

Szabo SM, Audhya IF, Malone DC, Feeny D, Gooch KL. Characterizing health state utilities associated with Duchenne muscular dystrophy: a systematic review. Quality Life Res. 2020;29(3):593–605.

Afshari S, Ameri H, Daroudi RA, Shiravani M, Karami H, Akbari Sari A. Health related quality of life in adults with asthma: a systematic review to identify the values of EQ-5D-5L instrument. J Asthma. 2021;59(6):1203–12.

Ó Céilleachair A, O’Mahony JF, O’Connor M, O’Leary J, Normand C, Martin C, Sharp L. Health-related quality of life as measured by the EQ-5D in the prevention, screening and management of cervical disease: A systematic review. Qual Life Res. 2017;26(11):2885–97.

Jiang M, Ma Y, Li M, Meng R, Ma A, Chen P. A comparison of self-reported and proxy-reported health utilities in children: a systematic review and meta-analysis. Health Qual Life Outcomes. 2021;19(1):45.

Rebchuk AD, O’Neill ZR, Szefer EK, Hill MD, Field TS. Health Utility Weighting of the Modified Rankin Scale: A Systematic Review and Meta-analysis. JAMA Netw Open. 2020;3(4):e203767.

Herzog R, Álvarez-Pasquin MJ, Díaz C, Del Barrio JL, Estrada JM, Gil Á. Are healthcare workers’ intentions to vaccinate related to their knowledge, beliefs and attitudes? a systematic review. BMC Public Health. 2013;13(1):154.

Gupta A, Giambrone AE, Gialdini G, Finn C, Delgado D, Gutierrez J, Wright C, Beiser AS, Seshadri S, Pandya A, et al. Silent Brain Infarction and Risk of Future Stroke: A Systematic Review and Meta-Analysis. Stroke. 2016;47(3):719–25.

Vistad I, Fosså SD, Dahl AA. A critical review of patient-rated quality of life studies of long-term survivors of cervical cancer. Gynecol Oncol. 2006;102(3):563–72.

Mitton C, Adair CE, McKenzie E, Patten SB, Waye Perry B. Knowledge transfer and exchange: review and synthesis of the literature. Milbank Q. 2007;85(4):729–68.

Gupta A, Kesavabhotla K, Baradaran H, Kamel H, Pandya A, Giambrone AE, Wright D, Pain KJ, Mtui EE, Suri JS, et al. Plaque echolucency and stroke risk in asymptomatic carotid stenosis: a systematic review and meta-analysis. Stroke. 2015;46(1):91–7.

Eiring Ø, Landmark BF, Aas E, Salkeld G, Nylenna M, Nytrøen K. What matters to patients? A systematic review of preferences for medication-associated outcomes in mental disorders. BMJ Open. 2015;5(4):e007848.

Hatswell AJ, Burns D, Baio G, Wadelin F. Frequentist and Bayesian meta-regression of health state utilities for multiple myeloma incorporating systematic review and analysis of individual patient data. Health Econ. 2019;28(5):653–65.

Kwon J, Kim SW, Ungar WJ, Tsiplova K, Madan J, Petrou S. A Systematic Review and Meta-analysis of Childhood Health Utilities. Med Decis Making. 2018;38(3):277–305.

Etxeandia-Ikobaltzeta I, Zhang Y, Brundisini F, Florez ID, Wiercioch W, Nieuwlaat R, Begum H, Cuello CA, Roldan Y, Chen R, et al. Patient values and preferences regarding VTE disease: a systematic review to inform American Society of Hematology guidelines. Blood Adv. 2020;4(5):953–68.

Yuan Y, Xiao Y, Chen X, Li J, Shen M. A Systematic Review and Meta-Analysis of Health Utility Estimates in Chronic Spontaneous Urticaria. Front Med (Lausanne). 2020;7:543290.

Ward Fuller G, Hernandez M, Pallot D, Lecky F, Stevenson M, Gabbe B. Health State Preference Weights for the Glasgow Outcome Scale Following Traumatic Brain Injury: A Systematic Review and Mapping Study. Value Health. 2017;20(1):141–51.

Van Wilder L, Rammant E, Clays E, Devleesschauwer B, Pauwels N, De Smedt D. A comprehensive catalogue of EQ-5D scores in chronic disease: results of a systematic review. Qual Life Res. 2019;28(12):3153–61.

Han R, François C, Toumi M. Systematic Review of Health State Utility Values Used in European Pharmacoeconomic Evaluations for Chronic Hepatitis C: Impact on Cost-Effectiveness Results. Appl Health Econ Health Policy. 2021;19(1):29–44.

Brennan VK, Mauskopf J, Colosia AD, Copley-Merriman C, Hass B, Palencia R. Utility estimates for patients with Type 2 diabetes mellitus after experiencing a myocardial infarction or stroke: a systematic review. Expert Rev Pharmacoecon Outcomes Res. 2015;15(1):111–23.

Gheorghe A, Moran G, Duffy H, Roberts T, Pinkney T, Calvert M. Health Utility Values Associated with Surgical Site Infection: A Systematic Review. Value Health. 2015;18(8):1126–37.

Yang Z, Li S, Wang X, Chen G. Health state utility values derived from EQ-5D in psoriatic patients: a systematic review and meta-analysis. J Dermatol Treat. 2020;33(2):1029–36.

Tran AD, Fogarty G, Nowak AK, Espinoza D, Rowbotham N, Stockler MR, Morton RL. A systematic review and meta-analysis of utility estimates in melanoma. Br J Dermatol. 2018;178(2):384–93.

Haridoss M, Bagepally BS, Natarajan M. Health-related quality of life in rheumatoid arthritis: Systematic review and meta-analysis of EuroQoL (EQ-5D) utility scores from Asia. Int J Rheum Dis. 2021;24(3):314–26.

Foster E, Chen Z, Ofori-Asenso R, Norman R, Carney P, O’Brien TJ, Kwan P, Liew D, Ademi Z. Comparisons of direct and indirect utilities in adult epilepsy populations: A systematic review. Epilepsia. 2019;60(12):2466–76.

Khadka J, Kwon J, Petrou S, Lancsar E, Ratcliffe J. Mind the (inter-rater) gap. An investigation of self-reported versus proxy-reported assessments in the derivation of childhood utility values for economic evaluation: A systematic review. Soc Sci Med (1982). 2019;240:112543.

Zrubka Z, Rencz F, Závada J, Golicki D, Rupel VP, Simon J, Brodszky V, Baji P, Petrova G, Rotar A, et al. EQ-5D studies in musculoskeletal and connective tissue diseases in eight Central and Eastern European countries: a systematic literature review and meta-analysis. Rheumatol Int. 2017;37(12):1957–77.

Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998;52(6):377–84.

Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savović J, Schulz KF, Weeks L, Sterne JAC. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ (Clin Res Ed). 2011;343:d5928.

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, Cates CJ, Cheng H-Y, Corbett MS, Eldridge SM, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ (Clin Res Ed). 2019;366:l4898.

Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.

Blanchard P, Volk RJ, Ringash J, Peterson SK, Hutcheson KA, Frank SJ. Assessing head and neck cancer patient preferences and expectations: A systematic review. Oral Oncol. 2016;62:44–53.

Brown V, Tan EJ, Hayes AJ, Petrou S, Moodie ML. Utility values for childhood obesity interventions: a systematic review and meta-analysis of the evidence for use in economic evaluation. Obes Rev. 2018;19(7):905–16.

Mohindru B, Turner D, Sach T, Bilton D, Carr S, Archangelidi O, Bhadhuri A, Whitty JA. Health State Utility Data in Cystic Fibrosis: A Systematic Review. Pharmacoecon Open. 2020;4(1):13–25.

Xia Q, Campbell JA, Ahmad H, Si L, de Graaff B, Otahal P, Palmer AJ. Health state utilities for economic evaluation of bariatric surgery: A comprehensive systematic review and meta-analysis. Obes Rev. 2020;21(8):e13028.

Brazier J, Rowen D. NICE DSU Technical Support Document 11: Alternatives to EQ-5D for Generating Health State Utility Values. National Institute for Health and Care Excellence (NICE). NICE Decision Support Unit Technical Support Documents. School of Health and Related Research, University of Sheffield, UK; 2011. https://www.ncbi.nlm.nih.gov/books/NBK425861/pdf/Bookshelf_NBK425861.pdf .

de Craen AJ, van Vliet HA, Helmerhorst FM. An analysis of systematic reviews indicated low incorporation of results from clinical trial quality assessment. J Clin Epidemiol. 2005;58(3):311–3.

Hopewell S, Boutron I, Altman DG, Ravaud P. Incorporation of assessments of risk of bias of primary studies in systematic reviews of randomised trials: a cross-sectional study. BMJ Open. 2013;3(8):e003342.

Acknowledgements

The authors thank Rachel Eckford and Tafirenyika Brian Gwenzi for proofreading and editing this manuscript. We also immensely appreciate the members of the DKFZ Division of Health Economics ( https://www.dkfz.de/en/gesundheitsoekonomie/index.php ) for their insightful comments and suggestions during internal presentations (i.e., team meetings) of the current review process.

Funding

Open Access funding enabled and organized by Projekt DEAL. This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Author information

Authors and Affiliations

Division of Health Economics, German Cancer Research Center (DKFZ), Foundation Under Public Law, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany

Muchandifunga Trust Muchadeyi, Karla Hernandez-Villafuerte & Michael Schlander

Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany

Muchandifunga Trust Muchadeyi & Michael Schlander

Health Economics, WifOR institute, Rheinstraße 22, Darmstadt, 64283, Germany

Karla Hernandez-Villafuerte

Alfred Weber Institute for Economics (AWI), University of Heidelberg, Heidelberg, Germany

Michael Schlander

Contributions

MTM contributed to the conception, development of the search strategy, retrieval of articles for review, step-wise screening of articles, data extraction, data analysis, interpretation and discussion of findings and writing the final manuscript. KHV contributed to the conception, development of the search strategy, step-wise screening of articles (quality checks), data extraction (quality checks), data analysis, interpretation and discussion of findings and writing the final manuscript. MS contributed to the conception, design and analysis of the study, interpretation of findings and writing of the final manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Muchandifunga Trust Muchadeyi .

Ethics declarations

Ethics approval and consent to participate

This rapid review involved no study participants and was exempt from institutional review. All data analysed in the current review came from previously published SLRs.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Muchadeyi, M.T., Hernandez-Villafuerte, K. & Schlander, M. Quality appraisal for systematic literature reviews of health state utility values: a descriptive analysis. BMC Med Res Methodol 22 , 303 (2022). https://doi.org/10.1186/s12874-022-01784-6

Received : 24 April 2022

Accepted : 04 November 2022

Published : 25 November 2022

DOI : https://doi.org/10.1186/s12874-022-01784-6

Keywords

  • Quality appraisal
  • Health state utility values
  • Preferences
  • Critical appraisal
  • Risk of bias

Identification, Review, and Use of Health State Utilities in Cost-Effectiveness Models: An ISPOR Good Practices for Outcomes Research Task Force Report

Affiliations

  • 1 University of Sheffield, Sheffield, South Yorkshire, UK. Electronic address: [email protected].
  • 2 University of Sheffield, Sheffield, South Yorkshire, UK.
  • 3 Takeda Pharmaceutical International AG, Zurich, Switzerland.
  • 4 Erasmus University Medical Center, Rotterdam, The Netherlands; Viersprong Institute for Studies of Personality Disorders, Halsteren, The Netherlands.
  • 5 Celgene International Sàrl, Boudry, Switzerland.
  • 6 APAC, Syneos Health, Tokyo, Japan.
  • 7 Health Technology Assessment Institute, Federal University of Rio Grande do Sul, Porto Alegre, Brazil.
  • 8 The University of Adelaide, Adelaide, Australia.
  • 9 Acaster Lloyd Consulting Ltd, Oxford, UK.
  • 10 Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, University of Illinois at Chicago, Chicago, IL, USA.
  • PMID: 30832964
  • DOI: 10.1016/j.jval.2019.01.004

Cost-effectiveness models that present results in terms of cost per quality-adjusted life-year for health technologies are used to inform policy decisions in many parts of the world. Health state utilities (HSUs) are required to calculate the quality-adjusted life-years. Even when clinical studies assessing the effectiveness of health technologies collect data on HSUs to populate a cost-effectiveness model, which rarely happens, analysts typically need to identify at least some additional HSUs from alternative sources. When possible, HSUs are identified by a systematic review of the literature, but, again, this rarely happens. In 2014, ISPOR established a Good Practices for Outcome Research Task Force to address the use of HSUs in cost-effectiveness models. This task force report provides recommendations for researchers who identify, review, and synthesize HSUs for use in cost-effectiveness models; analysts who use the results in models; and reviewers who critically appraise the suitability and validity of the HSUs selected for use in models. The associated Minimum Reporting Standards of Systematic Review of Utilities for Cost-Effectiveness checklist created by the task force provides criteria to judge the appropriateness of the HSUs selected for use in cost-effectiveness models and is suitable for use in different international settings.

Keywords: cost effectiveness; economic evaluation; health state utility; preference-based; quality of life; systematic reviews; utilities.

Copyright © 2019 ISPOR–The Professional Society for Health Economics and Outcomes Research. Published by Elsevier Inc. All rights reserved.

MeSH terms

  • Advisory Committees* / trends
  • Cost-Benefit Analysis / methods*
  • Cost-Benefit Analysis / trends
  • Health Status Indicators
  • Outcome Assessment, Health Care / methods*
  • Outcome Assessment, Health Care / trends
  • Patient Acceptance of Health Care
  • Quality-Adjusted Life Years*
  • Research Report* / trends
  • Technology Assessment, Biomedical / methods*
  • Technology Assessment, Biomedical / trends
  • Open access
  • Published: 04 November 2020

Good practices for the translation, cultural adaptation, and linguistic validation of clinician-reported outcome, observer-reported outcome, and performance outcome measures

  • Shawn McKown 1 ,
  • Catherine Acquadro 2 ,
  • Caroline Anfray 3 ,
  • Benjamin Arnold 4 ,
  • Sonya Eremenco 5 ,
  • Christelle Giroudet 6 ,
  • Mona Martin 7 &
  • Dana Weiss 8  

Journal of Patient-Reported Outcomes, volume 4, Article number: 89 (2020)

7665 Accesses

32 Citations

4 Altmetric

Within current literature and practice, the category of patient-reported outcome (PRO) measures has been expanded into the broader category of clinical outcome assessments (COAs), which includes the subcategory of PRO, as well as clinician-reported outcome (ClinRO), observer-reported outcome (ObsRO), and performance outcome (PerfO) measure subcategories. However, despite this conceptual expansion, recommendations associated with translation, cultural adaptation, and linguistic validation of COAs remain focused on PRO measures, which has created a gap in specific process recommendations for the remaining types. This lack of recommendations has led to inconsistent approaches being implemented, creating uncertainty in the scientific community regarding suitable methods. To address this gap, the ISOQOL Translation and Cultural Adaptation Special Interest Group (TCA-SIG) has developed recommendations specific to each of the three COA types currently lacking such documentation to support a standardized approach to their translation, cultural adaptation, and linguistic validation. The recommended process utilized to translate ObsRO, ClinRO and PerfO measures from one language to another aligns closely with the industry standard process for PRO measures. The substantial differences between respondent categories across COA types require targeted approaches to the cognitive interviewing procedures utilized within the linguistic validation process, including the use of patients for patient-facing text in ClinRO measures, and the need to interview the targeted observers for ObsRO measures.

Clinical outcome assessments (COAs), defined by the United States Food and Drug Administration (FDA) as tools that “measure a patient’s symptoms, overall mental state, or the effects of a disease or condition on how the patient functions,” are widely utilized within global clinical trials as a means of assessing concepts of interest and determining whether clinical benefit has been demonstrated [ 1 ]. COAs are categorized into four types: patient-reported outcome (PRO), clinician-reported outcome (ClinRO), observer-reported outcome (ObsRO), and performance outcome (PerfO) measures [ 1 ]. While use of these classifications has become widespread, it is also relatively new. Previously, regulatory bodies and industry users referred primarily to PRO measures rather than the broader category of COAs. This approach was favored throughout the literature, most notably in FDA’s Guidance for Industry on Patient-Reported Outcome Measures from December 2009, which specifically addressed PRO translation methodology guidance [ 2 ].

Utilization of the broader COA concept, encouraging readers to consider PRO measures as one of several COA types rather than as the primary focus, likely entered the dialogue in 2013 with the FDA’s release of the COA Roadmap to Patient-Focused Outcome Measurement [ 3 ]. This roadmap encouraged clinical trial personnel to select from the four COA types noted above to measure clinical benefit in treatment trials. In 2014, FDA released the Qualification Process for Drug Development Tools, which further developed this shift and featured guidance for COA qualification, encouraging users to select specific COA types as part of the trial planning process [ 4 ]. In 2018, FDA expanded this approach by releasing a Patient-Focused Drug Development (PFDD) draft guidance which highlighted recommended processes for selecting, developing or modifying fit-for-purpose COAs [ 5 ].

As use of the preferred concept has widened from PRO to COA within industry and literature, a gap in recommendations associated with translation, cultural adaptation, and linguistic validation processes has developed. The robust and effective guidance developed by FDA in 2009 [ 2 ], as well as the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Principles of Good Practice for the Translation and Cultural Adaptation Process for Patient-Reported Outcomes (PRO) Measures published in 2005 [ 6 ], apply specifically to PRO measures, and do not explicitly address the procedural requirements for the development, cultural adaptation, and/or linguistic validation of ObsRO, ClinRO, or PerfO measures. Translation service providers and academic groups performing cultural adaptations or linguistic validation currently do not have access to consensus recommendations specific to these COA types, leading to inconsistent approaches across these stakeholders.

While it is becoming more common for all COAs used in clinical trials to be translated, there are still cases where pharmaceutical sponsors elect not to translate these COA types, particularly ClinRO measures. This may be due in part to the lack of current guidance, and in part to an assumption that clinicians and site staff will speak English well enough to complete the English-language measures adequately. Both issues are of concern because they open the door to inconsistency in the interpretation and presentation of the collected data, as all users, with varying language abilities, are expected to produce the same conceptual equivalence.

Lack of existing guidance for ClinRO, ObsRO, and PerfO measures is of particular concern from a methodological perspective because the existing PRO process recommendations include “cognitive debriefing of the new translation, usually with patients drawn from the target population,” a recommendation that cannot directly apply to all of the COA types due to its requirement of individual cognitive debriefing interviews with patients, as opposed to observers or clinicians [ 6 ]. This is true and can seem obvious for the cognitive debriefing step, but one can also ask whether other aspects specific to non-PRO COAs should be addressed from a cultural and conceptual perspective. To address this gap, the ISOQOL Translation and Cultural Adaptation Special Interest Group (TCA-SIG) has developed, through a consensus approach, recommendations specific to each of the three COA types that currently lack such documentation. These recommendations are designed to align process expectations across stakeholders and address the existing gap in process good practices.

Recommendations for ObsRO, ClinRO and PerfO measures

To understand this broader COA concept which has replaced the PRO concept in recent years, it is important to identify distinctions between the COA types. Table  1 presents definitions for each non-PRO COA type.

Proxy measures are excluded from the ObsRO measures category because these measures require that an informant report as if he or she was the patient. The FDA notes that “for patients who cannot respond for themselves (e.g., infants or cognitively impaired), we encourage observer reports that include only those events or behaviors that can be observed. As an example, observers cannot validly report an infant’s pain intensity (a symptom) but can report infant behavior thought to be caused by pain (e.g., crying)” [ 1 ]. COAs intended for completion by caregivers which collect information about the caregiver’s personal feelings and experiences are similarly excluded from the ObsRO measure category.

While specific recommendations for their translation are currently lacking, uses of ObsRO, ClinRO and PerfO measures in clinical trials have been presented in literature, workshops, and studies by task forces within ISPOR. An ISPOR Task Force reviewed use of PRO and ObsRO measures in rare disease trials and produced an emerging good practices report, noting that “further incorporation of the patient-perspective requires the inclusion of PROs for patients who can speak for themselves … [and] ObsROs by parents and caregivers for those who cannot” [ 8 ]. A 2017 article by Powers and colleagues focused on issues related to development and evaluation of ClinRO measures in evaluating treatment benefit [ 9 ]. Increasing focus on these COAs within the multinational clinical trial space, particularly for pediatric, rare disease, and cognitively impaired populations, indicates a need to develop distinct and rigorous methodology recommendations for their translation, cultural adaptation, and linguistic validation.

Authorship was determined on a volunteer basis from the pool of 130 ISOQOL TCA-SIG members. Volunteers were solicited for lead and contributing author roles based on COA type, with a different lead ultimately volunteering for each of the three non-PRO COA types (ObsRO, ClinRO, PerfO). A literature review group was also convened. These working groups consisted of representatives from non-profit (Critical Path Institute, Mapi Research Trust), academia (University of Washington), pharmaceutical industry (Janssen), and companies specializing in translation (Amplexor, FACITtrans, HRA/Evidera, ICON/Mapi, RWS Life Sciences), all with significant experience in reviewing and translating ObsRO, ClinRO, and PerfO measures.

Literature review

A sub-group was convened to identify publications which had previously explored the use of ObsRO, ClinRO, and PerfO measures in clinical trials, with particular attention paid to cross-cultural use and translation methodology of these measures. Results were compiled, consolidated, and provided to the methods working group for further discussion.

Creation and distribution of methodology questionnaires

Three questionnaires were designed to collect information regarding ObsRO, ClinRO, and PerfO measure translation methodology among experts in the field (Additional file 1: Appendices A, B, and C in the online supplement). These questionnaires were administered online in English and contained between 15 and 19 items that were developed and refined by the working groups. Items largely focused on process specifics, as well as any elements that could distinguish the ObsRO, ClinRO or PerfO measure translation processes from the better-documented processes utilized for PRO translation and linguistic validation. Items asked about frequency of projects, methodology differences compared to standard PRO project methodology, process steps required for translation and cognitive interviews/pilot testing, and process considerations specific to ObsRO, ClinRO and PerfO measures. Questionnaires were designed to include questions specific to their COA type, such as a question about observer categories in the ObsRO questionnaire, a question about clinician input in the ClinRO questionnaire, and questions about engaging with cognitively impaired patients in the PerfO questionnaire. The intent of the questionnaires was to gather insight into current practices and to identify potential best practices for consideration by the writing team. The team looked to see where there seemed to be consensus among the respondents and where there were areas of disagreement. Areas of consensus were discussed as a group to ensure agreement with the recommended best practice. For areas of disagreement, the team discussed and worked to achieve consensus, taking the survey results into consideration.

The ObsRO and ClinRO questionnaires were distributed to a total of 27 individuals representing 27 organizations, while the PerfO questionnaire was distributed to 35 individuals representing 34 organizations. Although the content of the questionnaires targeted the translation process specifically, a variety of organizations were invited to participate, including representatives from translation companies, COA developers, pharmaceutical sponsors, academia, non-profits, government, electronic COA (eCOA) vendors, and contract research organizations (CROs). The questionnaires were completed online between August and October 2017.

Overview of questionnaire results

Questionnaire responses were received from representatives of 10 organizations (Amplexor, Critical Path Institute, Signant Health, FACITtrans, HRA/Evidera, Lionbridge, ICON/Mapi, Oxford University Innovation, RWS Life Sciences, and TransPerfect). These organizations represent a good cross-section of experts in the field, with decades of global, cross-cultural COA and linguistic validation expertise. Each respondent was asked to complete all three questionnaires (48 items total) but could skip questions according to their preferences and areas of expertise, which led to varying denominators per item during analysis. Respondents included representatives from translation companies, instrument developers, eCOA companies, and non-profit organizations. Two additional respondents completed some but not all of the questionnaires, and as a result their organizational data were not captured. Because the surveys were completed anonymously and demographic information was not collected, the ethnicities and countries of residence of individual respondents are unknown. The responding organizations are headquartered in France, Ireland, the United Kingdom, and the United States.

The results indicated broad agreement among respondents regarding general experiences with, and approaches to, the linguistic validation of COAs. Most (27/33; 82%) responses indicated that requests for ObsRO/ClinRO/PerfO measure translation projects were either less common (18/33; 55%) or much less common (9/33; 27%) than requests for PRO measure translation projects. Most (25/33; 76%) responses indicated that ObsRO/ClinRO/PerfO measure translation projects usually take the same amount of time to set up as PRO measure translation projects.

Respondents also reported broad agreement regarding translation and linguistic validation methodology for COA projects. The following translation process steps were recommended by over 70% of responses (see the sketch after this list):

Creation of concept definition document (28/30; 93%)

Developer review of concept definition document (28/30; 93%)

Dual forward translations (28/30; 93%)

Reconciliation of forward translations (27/30; 90%)

Single back-translation (24/30; 80%)

Project Manager review and evaluation of back-translation (29/30; 97%)

Developer review of back-translation evaluation (22/30; 73%)

Proofreading (27/30; 90%)
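As a minimal illustration of how this recommended sequence could be used in practice, the sketch below encodes the steps above (names paraphrased) with their survey agreement counts and checks a hypothetical project plan against them; the `missing_steps` helper is an assumption for illustration, not part of any published tooling.

```python
# Translation steps recommended by >70% of responses, in the order listed
# above, with the survey agreement counts (recommending / total responses).
RECOMMENDED_STEPS = [
    ("concept definition document", 28, 30),
    ("developer review of concept definitions", 28, 30),
    ("dual forward translations", 28, 30),
    ("reconciliation of forward translations", 27, 30),
    ("single back-translation", 24, 30),
    ("PM review of back-translation", 29, 30),
    ("developer review of back-translation", 22, 30),
    ("proofreading", 27, 30),
]

def missing_steps(project_plan):
    """Return recommended steps absent from a project plan (a list of step names)."""
    planned = {step.lower() for step in project_plan}
    return [name for name, _, _ in RECOMMENDED_STEPS if name.lower() not in planned]

plan = ["concept definition document", "dual forward translations", "proofreading"]
print(missing_steps(plan))  # the five recommended steps this plan omits
```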

The most substantial difference between the COA type responses concerned in-country clinician review of the translation. While the vast majority of respondents in the ClinRO (9/11; 82%) and PerfO (9/10; 90%) groups indicated that clinician review was necessary, respondents in the ObsRO group did not deem it necessary (2/9; 22%).

Responses to the ClinRO questionnaire diverged from those to the ObsRO and PerfO questionnaires in less substantial ways regarding translator guidance and overall project length. While most (14/20; 70%) responses to the ObsRO and PerfO questionnaires indicated no differences in the guidance provided to translators relative to PRO projects, few ClinRO respondents agreed (3/11; 27%). Similarly, while most (15/21; 71%) responses to the ObsRO and PerfO questionnaires indicated no difference in the length of translation projects relative to PRO projects, most ClinRO respondents (8/12; 67%) indicated that ClinRO translation projects were shorter than PRO projects.

Cognitive interviewing (pilot testing)

In contrast to the relative agreement on translation methodology observed across respondent groups, review of the preferred cognitive interviewing process elicited unique and distinct methodology recommendations from each group.

Cognitive interviewing (pilot testing): ObsRO measures

The following cognitive interview/pilot testing process steps were recommended by over 70% of respondents to the ObsRO questionnaire:

Cognitive interviews with the patients’ caregivers (as applicable) (8/9; 89%)

Cognitive interviews with other observers of the patient (as applicable) (7/9; 78%)

For adult patients, interviews should be completed in person with the observer, with the patient not in the room (9/9; 100%)

For pediatric patients, interviews should be completed in person with the observer, with the child not in the room (8/9; 89%)

Questionnaire results uncovered some areas of disagreement related to specific challenges presented by the cognitive interviewing of translated ObsRO measures. When queried about whether a restriction should be placed on the maximum amount of time since the observer last observed the patient's behavior, the responses were split (56% [5/9] favored no restriction, 44% [4/9] favored including a restriction). Among respondents who favored a restriction, there was no consensus on its length, with answers ranging from 1 week to 6 months. There was similarly no consensus on how to approach cognitive interviews for ObsRO measures that indicate more than one observer type (i.e., parent, caregiver, teacher). Issues that did not show clear consensus within the questionnaire results were referred to the working group for further discussion and resolution.

Cognitive interviewing (pilot testing): ClinRO measures

No specific cognitive interview/pilot testing process steps were recommended by over 70% of respondents to the ClinRO questionnaire. Six of nine respondents indicated that cognitive interviews with patients should be undertaken in cases where the ClinRO measure contains patient-facing text. Five of nine respondents expressed a preference for including cognitive interviews with clinicians, while other respondents described interviewing clinicians as being less effective than a clinician review of the text.

Cognitive interviewing (pilot testing): PerfO measures

The following cognitive interview/pilot testing process steps were recommended by over 70% of respondents to the PerfO questionnaire:

Pilot testing with patients should be performed, administering the PerfO measure so that respondents perform the tasks (7/10; 70%)

Cognitive interviews with patients should be performed, in which patient-facing parts (e.g., instructions, stimuli) are reviewed (8/10; 80%)

The PerfO tasks should be administered by the interviewer (5/7; 71%)

Cognitive interviews with clinicians/healthcare professionals are not required (6/8; 75%)

Areas of weaker consensus regarding the PerfO cognitive interview process included:

Whether cognitive interviews should be completed with the individual who administered the PerfO measure during pilot testing (60% [6/10] indicated “No”)

Whether cognitively impaired patients should participate in cognitive interviews/pilot testing for measures intended for use with a cognitively impaired population (63% [5/8] indicated “Yes”).

While the regulatory and industry view of clinical outcomes measurement has shifted from focusing on PRO measures to focusing on the broader category of COAs, specific methodological guidelines for the translation, cultural adaptation, and linguistic validation of non-PRO COAs (i.e., ObsRO, ClinRO, and PerfO measures) do not currently exist. Our working group sought to develop clear, actionable, and achievable process recommendations to fill this gap and to align process expectations across stakeholders.

Our research found that the process used to translate ObsRO, ClinRO, and PerfO measures from one language to another aligns closely with the process outlined in the ISPOR recommendations for translation and cultural adaptation of PRO measures. A summary of these recommended good practices for all COAs can be found in Table 2.

While the translation process for these measures did not require substantial modification of the generally accepted PRO translation methodology, we found it particularly important to highlight the necessity of translating all assessment material, including the components given to the clinician or rater. This is especially true for cognitive assessments, which measure a person's cognitive functioning and are generally composed of three elements: (1) the stimuli; (2) the instructions to the patient (i.e., read aloud by the rater); and (3) the instructions to the rater on how to administer and score the test.

The stimuli, which are presented to the patients, may include images, numbers, letters, words, short stories, objects from daily life, and so on, and may need to be adapted to the country of interest following the standards of cultural adaptation for patient-facing content. Particular attention must also be given to the rater-facing material: the response form on which the rater records the patient's score (sometimes also containing the stimuli) and the instruction manual containing the instructions to the rater and to the patient. Under timeline or budget constraints, this rater material is often neglected, poorly translated, or not translated at all, on the assumption that clinicians and site personnel are sufficiently fluent in English. Expecting clinicians and site personnel to use English forms and instruction manuals can introduce bias or misinterpretation of the measure content, threatening the validity of the data. Rigorous translation of the rater material mitigates this risk by standardizing the measure across clinicians within a given country and between countries, improving the inter-rater reliability of the translated measure.
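To make the inter-rater reliability point concrete, the sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic, on hypothetical severity scores from two raters; the data and the choice of kappa here are illustrative assumptions, not analyses from this study.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement between two raters, corrected for chance."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both raters pick the same category independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 0-2 severity scores assigned by two raters to ten patients.
rater_1 = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
rater_2 = [0, 1, 2, 0, 0, 2, 1, 2, 0, 2]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # 0.71 for this example
```

If poorly translated rater instructions lead raters to apply the scale differently, observed agreement (and hence kappa) drops even when the underlying patient behavior is identical.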

Our research also found that the recommended cognitive interviewing/pilot testing process differs substantially between COA types. A summary of the recommended COA interview processes in comparison with the generally accepted PRO process can be found in Table 3.

Further discussion: ObsRO measure cognitive interviewing

While both the questionnaire results and initial working group discussions revealed broad alignment on the procedural recommendations for ObsRO measures as noted above, there were areas of disagreement that were referred to the working group for further discussion.

The first issue concerned the definition of “observer,” specifically whether a restriction should be placed on the maximum amount of time since the observer last observed the patient's behavior. After discussion and review, there was consensus that some restriction was needed, as observers' memories become less reliable over time. Having agreed that a restriction was warranted, the group debated its length. A short window, such as days or weeks, was thought to exclude too many potential observers who could otherwise provide useful data, while too long a window would present the same challenge as no restriction at all. Ultimately, the group agreed to recommend a restriction of 1 month since the respondent last observed the patient's behavior. Future research may be needed to confirm the feasibility and necessity of this recommendation.

The second problematic issue was the question of how to approach cognitive interviews for ObsRO measures which indicate more than one observer type (e.g., parent, caregiver, and/or teacher). Questionnaire results and initial discussions among the working group showed little consensus. After discussion and review, the group noted that while some ObsRO measures may be applicable to multiple observer categories, specific clinical trials would more likely target a particular category of observer based on the needs of the trial. In the interest of making the translation deliverable fit-for-purpose, it was agreed to recommend that groups performing translations should take into consideration the observer type that will be utilized in the clinical trial associated with the project when determining which observer type to interview. In cases where this information is unavailable or inapplicable, vendors should attempt to perform interviews with multiple types of observers when an ObsRO measure has multiple observer types indicated.

Further discussion: ClinRO measure cognitive interviewing

The questionnaire results for cognitive interviews with ClinRO measures were less clear than those for the other COA types. There was consensus that, when a ClinRO measure includes patient-facing text, interviews with patients should be undertaken to test that text. There was no consensus, however, on whether clinicians themselves should be interviewed as part of the process. Ultimately, the working group decided to present clinician interviews as an acceptable but not mandatory approach, which in most cases could be supplemented or replaced by a clinician review of the translation.

Further discussion: PerfO measure cognitive interviewing

Relatively minor differences of opinion regarding the cognitive interviewing process for PerfO measures were reviewed by the working group. It was determined that, while additional cognitive interviewing with the individual who administered the PerfO measure during pilot testing could be informative, it was not a mandatory component of the process. The questionnaire results indicated a slight preference (5/8; 63%) for interviewing patients with mild cognitive impairment when testing PerfO measures intended for use with a cognitively impaired population. The working group agreed with this approach, noting that recruiting cognitively intact subjects who meet other specific criteria (e.g., a given age range) is a reasonable alternative when interviewing patients with cognitive impairment is ineffective or otherwise not feasible.

Limitations

This paper focused on good practice recommendations for translation, cultural adaptation, and linguistic validation of ObsRO, ClinRO, and PerfO measures. Translatability assessment was not addressed as it is a separate process conducted during instrument development that precedes the translation process outlined here [ 11 ]. The ISOQOL TCA-SIG has published emerging good practice recommendations for translatability assessment of PRO measures but did not have sufficient evidence to expand the recommendations to non-PRO COAs [ 11 ]. Although one would expect the process to align closely with that of PRO measures, the need for good practice recommendations for translatability assessment of non-PRO COAs remains to be addressed.

In order to develop reasonable and actionable good practice recommendations for the translation, cultural adaptation, and linguistic validation of non-PRO COAs, the ISOQOL TCA-SIG examined the characteristics and requirements of each COA type by means of a literature review, completion of targeted questionnaires by industry experts, and group discussion and analysis. Our findings indicate that while recommended translation process steps generally align across all COA types (including PRO measures), the substantial differences between respondent categories across COA types require targeted approaches to the cognitive interviewing procedures utilized within the linguistic validation process. As a result, specific good practices and process recommendations have been developed for each non-PRO COA type, which will assist in further aligning procedures across service providers, COA instrument developers, and industry sponsors.

Availability of data and materials

Contained in Additional file 1: Appendix A.

References

1. Food and Drug Administration. Clinical outcome assessment qualification program: defining a clinical outcome assessment. https://www.fda.gov/drugs/developmentapprovalprocess/drugdevelopmenttoolsqualificationprogram/ucm284077.htm. Accessed 4 June 2018.

2. Food and Drug Administration (2009). Guidance for industry. Patient-reported outcome measures: use in medical product development to support labeling claims. Federal Register, 74(35), 65132–65133. http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf. Accessed 4 June 2018.

3. Food and Drug Administration (2013). Roadmap to patient-focused outcome measurement in clinical trials. https://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/ucm370177.htm. Accessed 3 July 2018.

4. Food and Drug Administration (2014). Guidance for industry and FDA staff: qualification process for drug development tools. https://www.fda.gov/drugs/guidancecomplianceregulatoryinformation/guidances/ucm335850.htm. Accessed 11 June 2018.

5. Food and Drug Administration (2018). Patient-focused drug development guidance: methods to identify what is important to patients and select, develop or modify fit-for-purpose clinical outcome assessments. https://www.fda.gov/drugs/news-events-human-drugs/patient-focused-drug-development-guidance-methods-identify-what-important-patients-and-select. Accessed 5 January 2020.

6. Wild, D., Grove, A., Martin, M., Eremenco, S., McElroy, S., Verjee-Lorenz, A., & Erickson, P. (2005). Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: report of the ISPOR task force for translation and cultural adaptation. Value in Health, 8, 94–104.

7. National Center for Biotechnology Information (2018). Glossary: terms and definitions. https://www.ncbi.nlm.nih.gov/books/NBK338448/#IX-C

8. Benjamin, K., Vernon, M., Patrick, D., Perfetto, E., Nestler-Parr, S., & Burke, L. (2017). Patient-reported outcome and observer-reported outcome assessment in rare disease clinical trials: an ISPOR COA emerging good practices task force report. Value in Health, 20, 838–855.

9. Powers III, J. H., Patrick, D. L., Walton, M. K., et al. (2017). Clinician-reported outcome (ClinRO) assessments of treatment benefit: report of the ISPOR clinical outcome assessment emerging good practices task force. Value in Health, 20(1), 2–14.

10. Koller, M., Kantzer, V., Mear, I., Zarzar, K., Martin, M., Greimel, E., … on behalf of the ISOQOL TCA-SIG (2012). The process of reconciliation in translating quality of life questionnaires: evaluation of existing translation guidelines and a set of recommendations. Expert Review of Pharmacoeconomics & Outcomes Research, 12(2), 189–197.

11. Acquadro, C., Patrick, D. L., Eremenco, S., Martin, M. L., Kulis, D., Correia, H., … on behalf of the International Society of Quality of Life Research (ISOQOL) Translation and Cultural Adaptation Special Interest Group (TCA-SIG) (2018). Emerging good practices for translatability assessment (TA) of patient-reported outcome (PRO) measures. Journal of Patient-Reported Outcomes, 2, 8. https://doi.org/10.1186/s41687-018-0035-8.


Acknowledgements

Thanks to Elizabeth Yohe Moore and Tim Poepsel of RWS Life Sciences for contributing to the development and distribution of the methodology questionnaires, and to the members of the ISOQOL TCA-SIG who provided valuable input and review.

All authors are members of the ISOQOL Translation and Cultural Adaptation Special Interest Group (TCA-SIG).

A waiver of the article processing charge has been requested by the TCA-SIG.

All data generated or analyzed during this study are included in this published article.

Author information

Authors and affiliations

RWS Life Sciences, East Hartford, CT, USA

Shawn McKown

ICON plc, Patient Centred Sciences, Lyon, France

Catherine Acquadro

Mapi Research Trust, Lyon, France

Caroline Anfray

FACITtrans, Ponte Vedra, FL, USA

Benjamin Arnold

Critical Path Institute, Tucson, AZ, USA

Sonya Eremenco

ICON plc, Language Services, Dublin, Ireland

Christelle Giroudet

Evidera, Bethesda, MD, USA

Mona Martin

Chicago, Illinois, USA


Contributions

SM served as lead author. SE was the lead reviewer and a major contributor to the writing of the manuscript. All listed authors served as reviewers and revisers of the text. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Shawn McKown.

Ethics declarations

Ethics approval and consent to participate, and consent for publication

No individual person’s data is included in this material. All authors consent to publication of this material.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

McKown, S., Acquadro, C., Anfray, C. et al. Good practices for the translation, cultural adaptation, and linguistic validation of clinician-reported outcome, observer-reported outcome, and performance outcome measures. J Patient Rep Outcomes 4 , 89 (2020). https://doi.org/10.1186/s41687-020-00248-z


Received: 19 June 2020

Accepted: 24 September 2020

Published: 04 November 2020

DOI: https://doi.org/10.1186/s41687-020-00248-z


Keywords

  • Translation
  • Linguistic validation
  • Clinical outcome assessments
  • Clinician-reported outcome measures
  • Observer-reported outcome measures
  • Performance outcome measures

