A novel approach to evidence-based medicine
Manager, HEOR
Health economics and outcomes research (HEOR) is increasingly being recognized by regulatory agencies in the assessment of medical products. Nevertheless, challenges to HEOR methodology present important opportunities to expand the breadth of HEOR in order to provide more reliable and valid real-world evidence for regulatory agencies, clinicians, and patients.
One such challenge is the lack of well-defined, validated patient outcomes. These patient outcomes can appear as measures of disease activity, treatment response, or even diagnostic criteria. While the patient outcomes may be widely recognized by clinicians, they lack formal definitions, thus hindering their use in observational research relying on data captured from real-world practice (eg, EMRs, charts, claims).
The Delphi method could add to the armamentarium of HEOR tools and help overcome some of the challenges of observational research. Specifically, the Delphi method allows for the development of clinical guidelines, resulting in well-defined clinical measures suitable for use in HEOR. For example, the 28-joint disease activity score (DAS28), used to assess disease activity in patients with rheumatoid arthritis (RA), is a validated, widely accepted metric for assessing patient outcomes.1 However, how does an investigator study the impact of treatment on the DAS28 in an ill-defined population with loosely codified criteria using real-world data? This is a setting in which the Delphi method could be leveraged to develop clinical guidelines.
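To illustrate what a well-defined measure looks like in practice, the DAS28 can be computed directly from four components. The sketch below uses the DAS28-CRP and DAS28-ESR formulas as they are commonly published; the coefficients should be checked against the validation literature (eg, reference 1) before any real use, and the function names are hypothetical. The relevance for HEOR is that every input must be captured in the real-world data source before the score can be computed at all.

```python
import math


def _check_counts(tender_28: int, swollen_28: int) -> None:
    # Both joint counts are out of the 28 joints assessed by the DAS28.
    if not (0 <= tender_28 <= 28 and 0 <= swollen_28 <= 28):
        raise ValueError("joint counts must be between 0 and 28")


def das28_crp(tender_28: int, swollen_28: int, crp_mg_l: float,
              patient_global: float) -> float:
    """DAS28 using C-reactive protein in mg/L.

    patient_global is the patient global assessment on a 0-100 mm scale.
    """
    _check_counts(tender_28, swollen_28)
    return (0.56 * math.sqrt(tender_28)
            + 0.28 * math.sqrt(swollen_28)
            + 0.36 * math.log(crp_mg_l + 1)
            + 0.014 * patient_global
            + 0.96)


def das28_esr(tender_28: int, swollen_28: int, esr_mm_h: float,
              patient_global: float) -> float:
    """DAS28 using erythrocyte sedimentation rate in mm/h (must be > 0)."""
    _check_counts(tender_28, swollen_28)
    return (0.56 * math.sqrt(tender_28)
            + 0.28 * math.sqrt(swollen_28)
            + 0.70 * math.log(esr_mm_h)
            + 0.014 * patient_global)


# Hypothetical record: 4 tender joints, 3 swollen joints, CRP 12 mg/L, global 40 mm.
print(f"DAS28-CRP: {das28_crp(4, 3, 12.0, 40):.2f}")
```

If any of the four inputs is missing from an EMR extract or claims file, the score cannot be derived, which is one reason loosely captured outcomes are hard to study with real-world data.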
The Delphi Method
Application in HEOR
Figure 1. Delphi Method Schema

The first step in the Delphi method involves defining the research question. In this example, the question is how patients with ERPRA should be identified and classified. What patient demographics (eg, age at diagnosis) should be considered, and what clinical characteristics (eg, C-reactive protein, sedimentation rate, duration of RA, and number of prior treatment failures) should be weighed when classifying a patient as having ERPRA? These are the questions that would form the basis of the survey sent to experts in the field.
During survey development, the research team must decide whether to pose broad, open-ended questions that encourage brainstorming and a wide range of responses, or narrower questions that call for specific, focused responses. The scope of the questions should be informed by the existing literature and current clinical practice. For example, if the clinical characteristics used to assess ERPRA are already known, the Delphi questions could focus on establishing cutoffs for each characteristic. The first round of questions should be broad in order to elicit a wide range of responses. It is then the task of the researchers to identify common themes, discard irrelevant information, and tailor the next round of questions to narrow the responses.
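The distinction between broad and narrow questions can be made concrete by tagging each survey item with its response format. The following is a minimal sketch; the `SurveyItem` and `ItemType` names, the item IDs, and the question wording are hypothetical and only mirror the ERPRA example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ItemType(Enum):
    OPEN_ENDED = "open-ended"    # broad prompt that invites brainstorming
    DICHOTOMOUS = "dichotomous"  # e.g., present/absent
    POLYTOMOUS = "polytomous"    # one choice among several categories
    NUMERIC = "numeric"          # a value, e.g., a proposed cutoff


@dataclass
class SurveyItem:
    item_id: str
    prompt: str
    item_type: ItemType
    options: list[str] = field(default_factory=list)  # used by closed items only

    def is_closed(self) -> bool:
        """True for items with a fixed set of answer options."""
        return self.item_type in (ItemType.DICHOTOMOUS, ItemType.POLYTOMOUS)


# Broad first-round questions (hypothetical wording based on the ERPRA example).
round_one = [
    SurveyItem("R1-Q1",
               "Which patient demographics should be considered when classifying ERPRA?",
               ItemType.OPEN_ENDED),
    SurveyItem("R1-Q2",
               "Which clinical characteristics should be weighed when classifying ERPRA?",
               ItemType.OPEN_ENDED),
]

# If the relevant characteristics are already known from the literature,
# a narrower item can ask directly for a cutoff instead.
cutoff_item = SurveyItem("R1-Q3",
                         "What C-reactive protein value should serve as the cutoff for ERPRA?",
                         ItemType.NUMERIC)
```

Keeping the format explicit makes it easy to track, from round to round, which items are still open-ended and which have been narrowed to closed or numeric responses.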
The survey would be sent to experts in the field, who, in this example, are practicing rheumatologists. Participants would remain anonymous throughout the rounds of surveys to allow for honest responses that are not influenced by any one participant who might carry significant weight in the field. If anonymity is maintained, a participating expert is also more likely to revise their responses or opinions in subsequent rounds after digesting the summary of prior rounds. It is in this way that responses in subsequent rounds converge to form a consensus.
Experts can be identified in a number of ways. Professional relationships in a given field may be sufficient to identify a dozen experts. If a larger panel is needed, professional societies may provide access to a greater number of potential expert panelists through their membership. The hundreds of clinicians who attend Cardinal Health Specialty Solutions Summits represent yet another potential pool of experts in the fields of urology, rheumatology, and oncology.
The number of experts needed in a Delphi exercise depends on several factors:
- Heterogeneity of the field(s) where results will be implemented (eg, a field composed of several subspecialties, resulting in guidelines that would apply to multiple specialties).
- Internal and external validity.
- The size of the field itself.
If the diagnostic criteria developed through the Delphi method will be used across various specialties, a larger expert panel may be needed to account for the anticipated variability in responses. Internal and external validity are related considerations: if the expert panel includes half of all experts in the field, its results are likely to be more generalizable and, thus, more readily accepted by experts who were not involved in the Delphi exercise. Delphi panels typically range in size from a dozen to more than a hundred experts.
Once the survey has been fielded and responses have been obtained, common themes are identified and responses are grouped thematically. Irrelevant responses are also discarded: a response from one expert that does not appear related to a response from any other expert should be considered for exclusion from subsequent rounds in order to guide the panel toward consensus. In addition to the research team, which should include at least one clinician, review by a key opinion leader may facilitate this phase of the Delphi process. Quantitative analyses of the responses summarize the proportion of experts responding in a particular way.
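The counting step at the end of this phase can be sketched in a few lines of Python. This sketch assumes the research team has already coded each free-text answer into themes by hand (the code performs no text analysis); the expert IDs, themes, and responses below are hypothetical. It reports the proportion of experts raising each theme and flags themes raised by only one expert as candidates for exclusion.

```python
from collections import Counter


def _theme_counts(coded_responses: dict[str, set[str]]) -> Counter:
    # Each expert contributes at most one count per theme because themes are sets.
    return Counter(theme for themes in coded_responses.values() for theme in themes)


def summarize_themes(coded_responses: dict[str, set[str]]) -> dict[str, float]:
    """Proportion of experts whose coded response mentions each theme."""
    n_experts = len(coded_responses)
    return {theme: count / n_experts
            for theme, count in _theme_counts(coded_responses).items()}


def singleton_themes(coded_responses: dict[str, set[str]]) -> set[str]:
    """Themes raised by exactly one expert: candidates for exclusion."""
    return {theme for theme, count in _theme_counts(coded_responses).items()
            if count == 1}


# Hypothetical coded answers to "Which clinical characteristics should be
# weighed when classifying ERPRA?" from five anonymized experts.
coded = {
    "E01": {"CRP", "ESR", "disease duration"},
    "E02": {"CRP", "prior treatment failures"},
    "E03": {"CRP", "ESR", "prior treatment failures"},
    "E04": {"ESR", "disease duration", "smoking history"},
    "E05": {"CRP", "prior treatment failures"},
}

for theme, share in sorted(summarize_themes(coded).items(), key=lambda kv: -kv[1]):
    print(f"{theme}: {share:.0%}")
print("Consider excluding:", singleton_themes(coded))
```

The flag is only a prompt for the research team's judgment; whether a singleton response is truly irrelevant remains a clinical decision.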
The next round of the Delphi exercise should be developed with the goal of narrowing the focus of the questions based on results from the previous round. For example, if the first round identified three clinical characteristics to assess in the classification of ERPRA, two of which reached consensus, the next round should focus on soliciting feedback on the item that did not reach consensus (eg, asking whether it should be considered for assessment of ERPRA) as well as feedback on relevant cutoffs for the items that reached consensus (eg, positive/negative, present/absent, and values of 3.2 or greater). Where possible, items posed initially as open-ended questions should be transformed into dichotomous or polytomous variables in subsequent rounds of surveys.
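Drafting the next round from the previous round's results can be sketched as follows, using the three-characteristic example above: characteristics that reached consensus receive a cutoff question, and those that did not are re-posed as a yes/no item. The function name, question wording, and `format` labels are hypothetical.

```python
def next_round_items(consensus_reached: dict[str, bool]) -> list[dict[str, str]]:
    """Draft next-round questions from the previous round's consensus status.

    consensus_reached maps each candidate characteristic to True if the panel
    reached consensus on including it in the previous round.
    """
    items = []
    for characteristic, reached in consensus_reached.items():
        if reached:
            # Inclusion is settled, so ask the panel to agree on a cutoff.
            items.append({
                "characteristic": characteristic,
                "question": f"What cutoff for {characteristic} should be used to classify ERPRA?",
                "format": "numeric",
            })
        else:
            # Inclusion is unsettled, so re-pose it as a dichotomous (yes/no) item.
            items.append({
                "characteristic": characteristic,
                "question": f"Should {characteristic} be assessed when classifying ERPRA?",
                "format": "dichotomous",
            })
    return items


# Three characteristics, two of which reached consensus (hypothetical status).
round_one_status = {"C-reactive protein": True, "sedimentation rate": True,
                    "duration of RA": False}
for item in next_round_items(round_one_status):
    print(f"[{item['format']}] {item['question']}")
```

In practice a cutoff item could just as well be dichotomous (eg, positive/negative); the numeric format here is one choice among those the text lists.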
The rounds of surveys continue until consensus is reached on all questions being posed. Several cutoffs have been used to determine when a question has reached consensus. A response reported by > 80% of the experts can be considered to have reached consensus.3 Alternatively, a cutoff of 66.7% of the panel may be used to define consensus.5 Under the former threshold, similar responses reported by > 10% but < 80% of respondents should be included in subsequent rounds of Delphi surveys. When numeric values are solicited, the spread of the values reported by the experts should also be considered; for example, an interquartile range of ≤ 2 units may be defined as having reached consensus. Once all responses associated with the research question attain consensus, the Delphi process can end, and a summary of the quantitative analyses can form the basis of expert recommendations on the topic.
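The consensus rules described above can be expressed as a small decision function, sketched here under a few stated assumptions: responses at or below 10% are dropped (the text only says responses above 10% are carried forward), a share of exactly 80% is treated as not meeting "> 80%", and the interquartile range uses Python's `statistics.quantiles` with the "inclusive" method, one of several quartile conventions. Function names and example shares are hypothetical.

```python
import statistics


def classify_response(share: float, consensus: float = 0.80, floor: float = 0.10) -> str:
    """Classify a response by the share (0 to 1) of experts who gave it.

    The default follows the > 80% rule; pass consensus=2/3 for the 66.7% rule.
    Shares at or below `floor` are dropped (an assumption, see lead-in).
    """
    if share > consensus:
        return "consensus"
    if share > floor:
        return "carry forward"
    return "drop"


def numeric_consensus(values: list[float], max_iqr: float = 2.0) -> bool:
    """True if the interquartile range of expert-supplied values is <= max_iqr.

    Assumes at least two values; quartile conventions differ between tools.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) <= max_iqr


def delphi_complete(statuses: list[str]) -> bool:
    """Rounds stop once every item has reached consensus."""
    return all(status == "consensus" for status in statuses)


# Hypothetical shares of the panel endorsing each characteristic.
shares = {"C-reactive protein": 0.90, "sedimentation rate": 0.85, "duration of RA": 0.60}
statuses = {name: classify_response(s) for name, s in shares.items()}
print(statuses)
print("Stop rounds:", delphi_complete(list(statuses.values())))
```

Keeping the thresholds as parameters makes it straightforward to report results under both the 80% and 66.7% conventions.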
Although the recommendations for a particular clinical practice may have achieved expert consensus in the Delphi process, validating the guidelines further strengthens the evidence supporting the recommendations and aids in their adoption. Validity has many aspects, including construct validity, discriminative validity, sensitivity to change, and feasibility. Patient data containing the items of the consensus guidelines (eg, age at diagnosis, CRP level, and swollen joint count) can be used to assess the validity of the expert guidelines. Analyses may include principal component analysis, estimates of association between the guidelines and other clinical measures, and an assessment of the feasibility of implementing the consensus guidelines. Once validated, the patient outcome (or patient characteristic, in the case of diagnostic criteria) can be used in observational research.
Conclusion
References
1 Wells G, et al. Validation of the 28-joint disease activity score (DAS28) and European League Against Rheumatism response criteria based on C-reactive protein against disease progression in patients with rheumatoid arthritis, and comparison with the DAS28 based on erythrocyte sedimentation rate. Ann Rheum Dis 2009; 68:954-960.
2 Custer RL, Scarcella JA, Stewart BR. The modified Delphi technique—a rotational modification. J Vocat Tech Ed 1999; 15:50-58.
3 Weiss PF, et al. Development and retrospective validation of the juvenile spondyloarthritis disease activity index. Arthritis Care Res (Hoboken) 2014; 66:1775-1782.
4 Burmeister EA, et al. Using a Delphi process to determine optimal care for patients with pancreatic cancer. Asia Pac J Clin Oncol 2016; 12:105-114.
5 Mohile SG, et al. Geriatric assessment-guided care processes for older adults: a Delphi consensus of geriatric oncology experts. J Natl Compr Canc Netw 2015; 13:1120-1130.
6 Nuesslein HG, et al. Real-world effectiveness of abatacept for rheumatoid arthritis treatment in European and Canadian populations: a 6-month interim analysis of the 2-year observational, prospective ACTION study. BMC Musculoskelet Disord 2014; 15:14.
FOCUS Magazine