Objective: To develop a provisional core set of response measures for clinical trials of systemic sclerosis (SSc).
Methods: The Scleroderma Clinical Trials Consortium (SCTC) conducted a structured, 3-round Delphi exercise to reach consensus on a core set of measures for clinical trials of SSc. Round 1 asked the SCTC investigators to list items in 11 pre-defined domains (skin, musculoskeletal, cardiac, pulmonary, cardio-pulmonary, gastrointestinal, renal, Raynaud phenomenon and digital ulcers, health-related quality of life and function, global health, and biomarkers) for SSc clinical trials. Round 2 asked respondents to rate the importance of the chosen items and was followed by a meeting, during which the Steering Committee discussed the feasibility, reliability, redundancy and validity of the items. Round 3 sought to obtain broader consensus on the core set measures. Members also voted on items that had data on feasibility but lacked data on reliability and validity, but may still be useful research outcome measures for future trials.
Results: A total of 50 SCTC investigators participated in round 1, providing 212 unique items for the 11 domains. In all, 46 (92%) participants responded in round 2 and rated 177 items. The ratings of 177 items were reviewed by the Steering Committee and 31 items from the 11 domains were judged to be appropriate for inclusion in a 1-year multi-centre clinical trial. In total, 40 SCTC investigators completed round 3 and ranked 30 of 31 items as acceptable for inclusion in the core set. The Steering Committee also proposed 14 items for a research agenda.
Conclusion: Using a Delphi exercise, we have developed a provisional core set of measures for assessment of disease activity and severity in clinical trials of SSc.
Systemic sclerosis (scleroderma, SSc) has the highest case-specific mortality of any rheumatic disease, is associated with substantial morbidity,1 and has many detrimental effects on health-related quality of life.2 There has been significant progress over the past 10 years in the development and validation of outcome measures3 and refinement of trial methodology in SSc.4 This is paralleled by an increased understanding of the pathogenesis of SSc5 6 and development of targeted therapies.7 In order to make progress in therapy for SSc, researchers need a standardised core set of response measures that encompasses the complexity and clinical heterogeneity of SSc. Once such a core set is developed, future research will be needed to determine the most appropriate algorithm that incorporates these core set measures into a feasible, reliable, and valid composite response index for defining clinically important improvement and worsening. Identification of core set measures is a prerequisite to the development of a response index.
We conducted a structured Delphi exercise, with participation of the Scleroderma Clinical Trials Consortium (SCTC) membership, to reach consensus on a provisional core set of measures for SSc clinical trials.
A multistep process was used to develop the core set of measures; it is described below and summarised in fig 1.
This project is an extension of work previously performed under the auspices of the Outcome Measures in Rheumatology Clinical Trials (OMERACT).8–10 The project was conducted using well recognised consensus formation methodology, namely the Delphi technique and the Nominal Group technique.11 The Delphi technique is a method for systematic solicitation and collation of judgments on a particular topic through a set of carefully designed sequential questionnaires interspersed with summarised information and feedback of opinions derived from earlier responses. It is a group process that utilises written responses as opposed to bringing individuals together. By employing a process of soliciting information in questionnaires, aggregating the information, and repeating the questionnaires to gain agreement among the participants, the Delphi technique reaches consensus in an elegant and efficient fashion. The Nominal Group technique11 is a structured discussion of each idea that is led by a facilitator, after which a voting process takes place to decide upon the ideas. The Delphi and Nominal Group techniques have previously been used successfully to develop outcome measures in other rheumatic diseases.12–14
Steering Committee members meeting
The Steering Committee (listed in the Appendix) for the development of provisional core set measures consisted of nine international scleroderma experts (seven from the USA, two from Europe). Before the first Delphi exercise was conducted, Steering Committee members listed the following 11 domains of illness relevant to people with SSc: skin, musculoskeletal, cardiac, pulmonary, cardio-pulmonary, gastrointestinal, renal, Raynaud phenomenon and digital ulcers, health-related quality of life and function, global health, biomarkers (serological markers, immune function, vascular injury, disease activity, and fibrosis), and others (items not captured by the aforementioned domains).
Structured Delphi exercise
We invited the SCTC investigators to participate in the development of a scleroderma response index for a 1-year multi-centre randomised clinical trial in SSc. The SCTC is a non-profit organisation dedicated to advancing clinical trial methodology and conduct to develop better treatments for SSc (http://www.sctc-online.org). The SCTC has an international membership of over 50 centres in the USA, Canada, South America, Asia, and Europe. At the time of the first Delphi exercise, there were 65 member institutions in the SCTC.
We initiated a Delphi exercise via e-mail in December 2005 and asked the 65 investigators of the SCTC to list the items in the 11 domains that could be used in development of a scleroderma combined response index for a 1-year multi-centre clinical trial. We did not differentiate between the limited and diffuse cutaneous forms of SSc. Two reminder e-mails were sent to the SCTC investigators, 3 weeks apart. Responses were either e-mailed or faxed.
Steering Committee members were asked to consider the proposed items for their usefulness in a hypothetical 1-year multi-centre clinical trial and assess their feasibility, reliability, and validity (face, content, construct, and criterion validity) based on the OMERACT filters.15 Items considered either not to be feasible in a multi-centre clinical trial or to lack reliability or validity data (based on recent observational studies and clinical trials) were dropped.
The investigators of the SCTC who responded to the first request were e-mailed a second Delphi exercise and were asked to rate the importance of the chosen items to be included in a definition of a scleroderma response index for a hypothetical 1-year multi-centre clinical trial. They were asked to rate each one on a scale of 1 (extremely inappropriate for a combined measure) to 9 (extremely appropriate for a combined measure). Two reminder e-mails were sent to the SCTC investigators.
Steering Committee members conducted a 1-day nominal group meeting in early August 2006 and discussed the issues of feasibility, reliability, redundancy, and validity, with special emphasis on responsiveness to change, based on the OMERACT filters.15 Although all proposed items were considered, the Steering Committee concentrated on items from round 2 that had a median score of 6 or greater on a scale of 1 to 9. The Steering Committee was provided with key articles summarising the current available data on outcome measures.2 9 10 16–19 After comprehensive discussions, the Steering Committee voted on each item in the core set. A priori, at least 66% consensus was required (six out of nine members present) to accept or reject an item for the core set. We chose 66% instead of the traditional 80% as there were only nine members in the Steering Committee and an 80% consensus would have required agreement among eight of nine experts. The latter was considered an unrealistic and overly stringent requirement. For purposes of practicality and ease of use, the experts were also instructed to choose two to four core set items for each domain. The group also agreed to include items that appeared highly promising and feasible for use in future clinical trials but lacked published data regarding reliability and validity.
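The arithmetic behind the voting threshold can be checked with a short sketch. The function name votes_needed and its integer-percentage interface are illustrative assumptions and are not part of the study's procedure.

```python
def votes_needed(members: int, percent: int) -> int:
    """Smallest number of agreeing members whose share is at least `percent`.

    Integer arithmetic is used (ceiling division) to avoid floating-point
    rounding surprises.
    """
    return (members * percent + 99) // 100


# Nine Steering Committee members, as described in the text.
six_of_nine = votes_needed(9, 66)    # 66% threshold
eight_of_nine = votes_needed(9, 80)  # traditional 80% threshold
print(six_of_nine, eight_of_nine)
```

With nine voters, seven agreeing members would amount to about 78%, which is why an 80% rule effectively demands eight of nine.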
We sought to obtain broader consensus on the core set measures from the investigators of the SCTC. A third round of the Delphi exercise was conducted to ensure that scleroderma experts around the world agreed with the proposed core set items. This Delphi exercise was sent in October 2006 and investigators of the SCTC were asked to provide their agreement on a scale of 1 (complete disagreement on the core set measure for a combined measure) to 9 (complete agreement on the core set measure for a combined measure). All relevant articles were attached to the Delphi questionnaire.
Statistical analyses were descriptive in nature. The results are reported as median (25th–75th percentile). Based on the RAND Corp./University of California, Los Angeles (RAND/UCLA) Appropriateness Method,20 median scores in the 1–3 range during the third round of the Delphi were considered as inappropriate (complete lack of consensus), those in the 4–6 range were considered as uncertain (some consensus), and those in the 7–9 range were considered as appropriate (good to excellent consensus).
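As an illustration of the descriptive analysis, the following sketch summarises one item's ratings as median (25th–75th percentile) and maps the median to the three bands described above. Several things here are assumptions: the function names, the made-up ratings, the choice of the "inclusive" quartile method (the text does not say how percentiles were computed), and the treatment of half-point medians such as 6.5, which the text does not address and which this sketch places in the uncertain band.

```python
from statistics import median, quantiles


def classify_median(score: float) -> str:
    """Map a median rating (1-9 scale) to the bands described in the text."""
    if score <= 3:
        return "inappropriate"
    if score >= 7:
        return "appropriate"
    # Assumption: anything strictly between 3 and 7 counts as uncertain.
    return "uncertain"


def summarise(ratings: list[int]) -> tuple[float, float, float]:
    """Return (median, 25th percentile, 75th percentile) for one item."""
    # Assumption: inclusive quartiles; the source does not specify a method.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q1, q3


# Hypothetical ratings for a single item, not data from the study.
example_ratings = [7, 8, 6, 9, 7, 8, 7]
med, q1, q3 = summarise(example_ratings)
print(f"{med} ({q1}-{q3}): {classify_median(med)}")
```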
Of the 65 SCTC investigators, 50 (77% response rate; 30 from the USA, 13 from Europe, 5 from Canada, 1 from Asia, and 1 from South America) responded to the round 1 Delphi exercise. The participants provided 212 unique items for the 11 domains. The Steering Committee members were asked to consider the 212 items for their usefulness in a 1-year multi-centre clinical trial and assess their feasibility, reliability, redundancy, and validity. Of the 212 items, the Steering Committee decided to retain 177 items; 20 were dropped due to lack of feasibility (eg, ultrasound for skin thickness) and 15 were dropped either because reliability or validity data were lacking in recent observational studies and clinical trials (eg, oral aperture measurement) or because no instrument was available to capture the underlying domain (eg, elasticity of the skin; data summarised in Furst et al (2005, 2007), Merkel et al and Khanna et al).8–10 21 The combined cardio-pulmonary domain was dropped as the items overlapped completely (were redundant) with the separate cardiac and pulmonary domains. The previously combined Raynaud phenomenon/digital ulcer domain was separated into two domains: Raynaud phenomenon and digital ulcer.
Of the 50 participants, 46 (92%) responded in round 2 with their ratings of the 177 items within each domain. The ratings of the 177 items were provided to the Steering Committee members, and the 60 items with a median score of 6 or greater were discussed further.
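The shortlisting step, keeping items whose median rating is 6 or greater, can be sketched as follows. The function name shortlist, the item labels, and the ratings are invented placeholders for illustration only.

```python
from statistics import median


def shortlist(ratings_by_item: dict[str, list[int]], cutoff: float = 6) -> list[str]:
    """Return the items whose median rating meets or exceeds the cut-off."""
    return [
        item
        for item, ratings in ratings_by_item.items()
        if median(ratings) >= cutoff
    ]


# Placeholder items and ratings, not data from the study.
example = {
    "item A": [7, 8, 6],
    "item B": [3, 5, 4],
    "item C": [6, 6, 9],
}
print(shortlist(example))
```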
Among the 60 items, several were captured by the same outcome measure; these were combined, leaving 45 items. For example, echocardiogram with Doppler had four separate items: measurement of left ventricular parameters, diastolic function parameters, right ventricular parameters, and pulmonary artery pressure. These were combined into one item under the cardiac domain (table 1).
Similarly, we combined items that were descriptive of right heart catheterisation (pulmonary artery pressure, pulmonary vascular resistance, wedge pressure, and cardiac output/cardiac index), pulmonary function tests (forced vital capacity, diffusion capacity, and total lung capacity), and dyspnoea indices (University of California San Diego Dyspnoea Questionnaire, St. George Respiratory Questionnaire, and Mahler Dyspnoea Index). Certain items, such as 24-h pH monitoring, were not considered to be feasible in a multi-centre study. Other items, such as palmar crease to digital tip measurement, were not responsive to change in recent clinical trials.22
The Steering Committee decided on 31 (out of 45) items in 11 domains (table 1). All chosen items were judged to be feasible for a 1-year multi-centre clinical trial. Of the 31 core set items, 4 were selected as particularly suited to organ-specific trials: right heart catheterisation, 6-min walk test, and Borg Dyspnoea Index were specifically encouraged for clinical trials where a drug is being assessed specifically for pulmonary hypertension; high-resolution computed tomography (HRCT) was encouraged for trials focusing on lung parenchymal disease. Some of the other core items, such as tendon friction rubs, visual analogue scale to assess skin activity from patient’s and doctor’s perspectives, and scleroderma transition items, were considered extremely important by the Steering Committee during the nominal group meeting (although they lack complete published data on reliability and validity) and were also rated highly by the SCTC members during round 3 of the Delphi assessment.
A total of 14 of the 45 items were feasible2 3 18 27–29 but lacked sufficient data on reliability and validity and were included in the research agenda (table 2). In all, 40 SCTC investigators returned the Delphi 3 exercise (87%) via e-mail. The median score for 30 of 31 items was 7 (out of maximum 9) or greater (appropriate based on the RAND/UCLA Appropriateness Method).20 The exception was tender joint count, which received a median score of 6. There was higher variability in the median score for the items chosen for the research agenda. The median score ranged from 5, for a measure of telangiectasia, to 8, for a scleroderma gastrointestinal tract instrument (table 2).
Using Delphi and Nominal Group techniques, we have developed a provisional core set of measures for assessment of disease activity and severity in clinical trials of SSc. The proposed core measures should be applied to future observational studies and multi-centre clinical trials in SSc. These measures have been validated to approximately the same degree as were the items in the core set for the American College of Rheumatology Response Criteria for rheumatoid arthritis when those criteria were first published.30 31 The SSc core set of measures will facilitate the standardisation, conduct, reporting, and interpretation of clinical trials, and facilitate future meta-analyses and comparisons of therapies. In addition, core set measures are the first step in development of a fully validated composite response index.13 14 30 31
Nearly all the core set of measures are considered “standard of care” and are frequently used to assess patients with early SSc in clinical practice.10 32 33 For example, the modified Rodnan skin score, echocardiogram with Doppler, pulmonary function test, Scleroderma-Health Assessment Questionnaire Disability Index, and acute phase reactants are considered standard of care in the management of early SSc by many scleroderma experts.10 32–34 Other measures, such as the durometer, 6-min walk test, Raynaud Condition Score, HRCT of the lungs, tendon friction rubs, and visual analogue scale (VAS)/Likert scores to assess disease activity, have been found to be feasible, reliable, valid, and responsive to change in recent SSc multi-centre clinical trials.16 23 35–40 Additionally, the Steering Committee chose 11 domains because SSc is a multisystem disease involving different organ systems, including a significant detrimental effect on health-related quality of life (HRQOL) and functional disability.
One of the primary goals of the exercise was to avoid domains/items that cannot be measured in a multi-centre study. For example, measures such as MRI to assess pulmonary hypertension or ultrasound to assess skin thickness were not included due to the cost associated with these procedures and inter-reader variability associated with the techniques, respectively. Indeed, for items where there is a high inter-reader variability (such as echocardiogram and HRCT), central reading was strongly recommended. All items chosen in the provisional core set were considered to be feasible in a 1-year multi-centre study. For example, although HRCT is associated with high cost, it was shown to be feasible in two recent SSc multi-centre studies, one in the US40 and one in Europe.41
The patient-reported and doctor-reported outcomes were included to be measured on either a VAS or a Likert scale. Different analyses have shown that VAS and Likert responses are highly correlated and yield similar results in patients with chronic diseases.42–44 Since Likert responses are easier to administer and interpret, they may be preferable to a VAS. In addition, the Likert scales can serve as external anchors to assess minimally important differences.45
During round 3, there was good to excellent consensus on 30 of 31 core set measures in the 11 domains. The exception was tender joint count, which received a median score of 6 out of 9. We decided to keep the tender joint count in the core set measures as joint pain affects a majority of patients with SSc.
The Steering Committee also discussed a research agenda: measures that were not included in the core set but appear very promising, are not yet ready for clinical trials, and hence may be useful outcome measures for future clinical trials. Most of the items in the research agenda have been determined to be feasible, but their reliability and validity, including responsiveness to change, have not been determined for multi-centre clinical trials. Although cardiac MRI to assess cardiac function and the pulmonary artery is expensive, the data so far are encouraging46 and so it was included in our research agenda. If MRI is shown to be a reliable and valid tool in future studies, it may be a useful outcome measure despite its cost (similar to HRCT). This list of items can be incorporated in future observational studies and clinical trials and, if found responsive, can be considered in future versions of the scleroderma composite response index.
Our study has both strengths and limitations. First, we were able to conduct a Delphi exercise using member institutions of the SCTC who conduct clinical treatment trials of new medications in SSc. Second, our response rate was good: 77% for round 1, 92% for round 2 and 87% for round 3. The principal limitations were, firstly, that the Delphi and Nominal Group techniques are consensus building approaches rather than data-driven approaches. Secondly, we chose a cut-off of ⩾6 during rounds 1 and 2 as the majority of items received a median score of 6 during these rounds and we wanted to incorporate large numbers of items for the Steering Committee and SCTC members to vote on; a cut-off of ⩾7 was chosen during round 3 to present items with good to excellent agreement. Thirdly, we chose a consensus of 66% instead of the traditional 80% during our Nominal Group technique assessment as only members of the Steering Committee (n = 9) participated in this. Acknowledging this limitation, we obtained broader consensus on the core set measures from the investigators of the SCTC using a Delphi exercise where 30 of 31 items were rated as having good to excellent agreement regarding appropriateness for a 1-year multi-centre clinical trial.
We are currently performing a prospective longitudinal observational clinical study to gather information on the measurement characteristics of the items in the preliminary core set. These data will be used to further develop a reliable, valid and responsive set of SSc measures and also to develop a validated response index. This is being accomplished through prospective, data-driven, consensus building techniques to develop and quantitatively evaluate candidate definitions for a combined response index for SSc, as has been performed in other rheumatic diseases.12–14
Scleroderma Clinical Trials Consortium steering committee
Co-Chairs: Dinesh Khanna; Daniel E Furst.
Committee members (in alphabetical order): Philip J Clements, Christopher P Denton, Edward Giannini, Daniel E Lovell, Maureen D Mayes, Marco Matucci-Cerinic, Peter A Merkel, James R Seibold and Virginia D Steen.
Co-authors (in alphabetical order): Baron, Murray, Jewish General Hospital, Montreal, Quebec, Canada. Csuka, Mary Ellen, Medical College of Wisconsin, Milwaukee, Wisconsin, USA. Berezne, Alice, Hôpital Cochin, Paris, France. Briet, Samuel N, University of New South Wales, Sydney, Australia. Brühlmann, Pius, University Hospital, Zurich, Switzerland. Buch, Maya H, University of Michigan, Ann Arbor, Michigan, USA. Catoggio, Luis, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. Collier, David, University of Colorado Health Sciences Center, Denver, Colorado, USA. Crofford, Leslie, University of Kentucky, Lexington, Kentucky, USA. Czirják, László, University of Pécs, Pécs, Hungary. Derk, Chris T, Jefferson Medical College, Philadelphia, Pennsylvania, USA. Distler, Oliver, University Hospital Zurich, Zurich, Switzerland. Doyle, Mittie Kelleher, Centocor, Malvern, Pennsylvania, USA. Farge-Bancel, Dominique, Hôpital Saint-Louis, Paris, France. Fessler, Barri, University of Alabama at Birmingham, Alabama, USA. Foeldvari, Ivan, Klinikum Eilbek, Hamburg, Germany. Goldberg, Avram, Albert Einstein College of Medicine, Bronx, New York, USA. Gran, Jan Tore, Rikshospitalet, Oslo, Norway. Grau, Raffael, Indiana University, Indianapolis, Indiana, USA. Griffing, W Leroy, Mayo Clinic College of Medicine, Scottsdale, Arizona, USA. Hayat, Samina, Louisiana State University, Shreveport, Louisiana, USA. Herrick, Ariane L, University of Manchester, Manchester, UK. Hsu, Vivien, RWJ Medical School, New Brunswick, New Jersey, USA. Hummers, Laura K, Johns Hopkins University, Baltimore, Maryland, USA. İnanç, Murat, University of Istanbul, Istanbul, Turkey. Johnson, Sindhu, University of Toronto, Toronto, Ontario, Canada. Kahaleh, M Bashar, Medical University of Ohio, Toledo, Ohio, USA. Lafyatis, Robert A, Boston University Medical Campus, Boston, Massachusetts, USA. Lee, Peter, Mount Sinai Hospital, Toronto, Ontario, Canada.
Mahmud, Tafazzul H, Shaikh Zayed Postgrad Med Institute, Lahore, Pakistan. Malcarne, Vanessa, San Diego State University, San Diego, California, USA. McHugh, Neil J, Royal National Hospital, Bath, UK. Martin, Richard W, College of Human Medicine, Michigan State University, Grand Rapids, Michigan, USA. McKown, Kevin, University of Wisconsin Medical School, Madison, Wisconsin, USA. Medsger, Thomas A, Jr, University of Pittsburgh, Pittsburgh, Pennsylvania, USA. Moreland, Larry, University of Alabama at Birmingham, Alabama, USA. Pope, Janet E, St Joseph Health Care London, London, Ontario, Canada. Rich, Eric, Centre Hospitalier de l’Université de Montréal, Montreal, Quebec, Canada. Rothfield, Naomi F, University of Connecticut, Farmington, Connecticut, USA. Schiopu, Elena, University of Michigan, Ann Arbor, Michigan, USA. Scorza, Raffaella, University of Milan, Milan, Italy. Senécal, Jean-Luc, Centre Hospitalier de l’Université de Montréal, Montreal, Quebec, Canada. Shanahan, Joseph, Duke University Medical Center, Durham, North Carolina, USA. Simms, Robert W, Boston University School of Medicine, Boston, Massachusetts, USA. Strand, Vibeke, Stanford University, Portola Valley, California, USA. Silver, Richard M, Medical University of South Carolina, Charleston, South Carolina, USA. Sweiss, Nadera, University of Chicago, Chicago, Illinois, USA. Valentini, Gabriele, Second University of Napoli, Napoli, Italy. van den Hoogen, Frank H J, St Maartenskliniek and Radboud University Medical Centre, Nijmegen, The Netherlands. Veale, Douglas, St Vincent's University Hospital, Dublin, Ireland. Voskuyl, Alexander E, VU University Medical Center, Amsterdam, The Netherlands. Wigley, Fred, Johns Hopkins University, Baltimore, Maryland, USA. Wollheim, Frank A, Lund University Hospital, Lund, Sweden.
Funding: This project was funded by a National Institutes of Health Award (NIAMS U01 AR055057-01). DK was supported by the Scleroderma Foundation (New Investigator Award), a National Institutes of Health Award (NIAMS K23 AR053858-01A1), and a grant from the Scleroderma Clinical Trial Consortium. PAM is supported by a Mid-Career Clinical Investigator Award (NIAMS K24 AR2224-01A1).
Competing interests: None declared.