Abstract
Objectives To develop an Outcome Measures in Rheumatology (OMERACT) ultrasonography score for monitoring disease activity in giant cell arteritis (GCA) and evaluate its metric properties.
Methods The OMERACT Instrument Selection Algorithm was followed. Forty-nine members of the OMERACT ultrasonography large vessel vasculitis working group were invited to seven Delphi rounds. An online reliability exercise was conducted using images of bilateral common temporal arteries, parietal and frontal branches as well as axillary arteries from 16 patients with GCA and 7 controls. Sensitivity to change and convergent construct validity were tested using data from a prospective cohort of patients with new GCA in which ultrasound-based intima–media thickness (IMT) measurements were conducted at weeks 1, 3, 6, 12 and 24.
Results Agreement was obtained (92.7%) for the OMERACT GCA Ultrasonography Score (OGUS), calculated as follows: the IMT measured in each segment is divided by the rounded IMT cut-off value for that segment, the resulting ratios are summed, and the sum is divided by the number of segments available. Thirty-five members conducted the reliability exercise; the interrater intraclass correlation coefficient (ICC) for the OGUS was 0.72–0.84 and the median intrareader ICC was 0.91. The prospective cohort consisted of 52 patients. Sensitivity to change between baseline and each follow-up visit up to week 24 yielded standardised mean differences from −1.19 to −2.16, corresponding to large and very large magnitudes of change, respectively. OGUS correlated moderately with erythrocyte sedimentation rate, C reactive protein and Birmingham Vasculitis Activity Score (correlation coefficients 0.37–0.48).
Conclusion We developed a provisional OGUS for potential use in clinical trials.
- giant cell arteritis
- ultrasonography
- outcome assessment, health care
- systemic vasculitis
Data availability statement
Data are available upon reasonable request.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Ultrasonography is increasingly used for diagnostic and monitoring purposes in giant cell arteritis (GCA).
The ‘halo’ sign reflecting inflammatory wall thickening of medium and large arteries is sensitive to change during immunosuppressive treatment.
An internationally accepted ultrasound score of temporal and large arteries to be used in clinical trials is still needed.
WHAT THIS STUDY ADDS
Here we present the provisional OMERACT GCA Ultrasonography Score (OGUS) along with its metric properties.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
OGUS may be used as a monitoring tool and outcome measure in research, particularly in trials to test the efficacy of pharmacological interventions.
Additional studies are needed to further validate the score in a patient-based reliability exercise, randomised controlled drug trials and independent GCA cohorts.
Introduction
Giant cell arteritis (GCA) is the most common form of primary vasculitis.1 Imaging, particularly ultrasonography, is increasingly used to establish the diagnosis of this disease; however, its role as a tool to monitor disease activity as well as the value of ultrasonography results as an outcome parameter is still uncertain.2
In patients with untreated GCA, ultrasonography of cranial and extracranial large arteries reveals a homogeneous and hypoechoic wall thickening, termed the ‘halo sign’.3–5 Once therapy is started, intima–media thickness (IMT) in temporal arteries is rapidly reduced, whereas in extracranial vessels, such as the axillary arteries, IMT takes longer to decrease.6 In addition, the initial hypoechoic appearance of extracranial large arteries changes to an isoechoic/hyperechoic thickening, including hyperechoic lines, the so-called ‘multilinear pattern’.7–9
The possible application of ultrasonography as a monitoring tool in GCA has only been investigated recently. In a retrospective study of 42 patients with GCA with follow-up ultrasonography examinations at 6, 12 and 24 months, a reduction of the IMT in extracranial arteries was observed in 45% of patients, as compared with 85% of cases in the temporal arteries.7 In the ‘GCA treatment with ultra-short glucocorticoid and tocilizumab (GUSTO)’ trial, a sharp decrement of the IMT in temporal and axillary arteries was observed by sonography after glucocorticoid pulse therapy, followed by re-increment to approximately baseline levels after 4 weeks despite tocilizumab monotherapy, succeeded by a gradual decrement of the IMT.10 In the ‘Prognosis of Temporal Arteritis (PROTEA)’ study, it was demonstrated that the halo sign, particularly the number of segments with halo and the sum and maximum halo IMTs, was sensitive to change to standard glucocorticoid treatment, with rapid reduction of the IMT at temporal arteries, and a delayed response at axillary arteries.11
The Outcome Measures in Rheumatology (OMERACT) ultrasonography large vessel vasculitis working group has recently developed and tested definitions of key elementary lesions for GCA, particularly for the halo sign and chronic changes of vasculitis of the axillary artery.4 9 Given the absence of a consensus-based ultrasonography score for GCA, the present work was the next step with the objective of developing an ultrasonography composite score to be used in clinical trials and other research studies. Herein, we present the development of the novel, provisional OMERACT GCA Ultrasonography Score (OGUS) along with its metric results (reliability, sensitivity to change and construct validity).
Methods
Study design
This project was performed by the OMERACT ultrasonography large vessel vasculitis working group. The development of the score followed the methodology stipulated by the OMERACT Instrument Selection Algorithm (OFISA).12 Accordingly, a systematic literature review (step 1) was conducted first, followed by agreement on key elementary ultrasonography lesions for acute and chronic GCA (step 2) and subsequent testing for interobserver and intraobserver reliability (step 3). Results of these steps have been published earlier.4 13 Since these papers were not related to the agreement and validation of a scoring system,4 13 steps 2 (Delphi), 3 (reliability exercise) and 4 (final agreement and evaluation of psychometric properties) of the OFISA had to be repeated for the purpose of this project.
Delphi exercise
We performed a Delphi exercise in seven rounds using Survey Monkey as a platform.
Delphi rounds 1–3
The first questionnaire was designed by the steering group (CDe, CPo, WAS) together with the OMERACT chairs (AI, GAWB, LT) and was approved by the OMERACT mentor (MADA). The questionnaire contained six questions on the experience of respondents with ultrasonography in large vessel vasculitis and their work setting, as well as 38 statements/questions about which arterial segments to include, where and how to measure the IMT and how to transform measurements into a final score (see online supplemental file 1 for the full questionnaire). In addition, the group members received a factor analysis intended to identify the arterial segments contributing most to a composite score, as well as preliminary analyses on the sensitivity to change of different candidate scores tested in the PROTEA cohort (see also below). These data were intended to inform the decision of group members; the final ultrasonography score was not selected by statistical means.
The majority of questions were of a rating or rank-ordering type; a number of questions were also in an open-answer or single-choice format (online supplemental file 1). For rating, group members were asked to express their level of agreement or disagreement with a certain statement according to a 1–5 Likert scale with 1=strongly disagree, 2=disagree, 3=neither agree nor disagree, 4=agree and 5=strongly agree. According to the OMERACT methodology, a consensus was accepted when at least 75% of respondents voted with a score of 4 or 5 for a given statement. Statements rated 4 or 5 by <50% of respondents were excluded, while all other statements were revoted in the subsequent round after their modification based on the comments of the working group members. Rank ordering of statements helped the respondents to understand which statements had the highest priority within the group.
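The consensus rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the working group's tooling: the function name `classify_statement` and the labels it returns are hypothetical, and the thresholds (at least 75% rating 4 or 5 for consensus, fewer than 50% for exclusion) are taken from the text.

```python
from typing import Sequence


def classify_statement(ratings: Sequence[int]) -> str:
    """Classify one Delphi statement from its 1-5 Likert ratings.

    Hypothetical helper: returns "consensus", "excluded" or "revote".
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")

    # Share of respondents who voted 4 (agree) or 5 (strongly agree).
    share_agree = sum(1 for r in ratings if r >= 4) / len(ratings)

    if share_agree >= 0.75:
        return "consensus"  # accepted as a consensus statement
    if share_agree < 0.50:
        return "excluded"   # dropped from further rounds
    return "revote"         # modified and revoted in the next round
```

Note that the two exceptions made after the first Delphi round (statements Q18–Q22 and statement 32) were decided by the group and fall outside such a mechanical rule.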
Several reminders were sent to the group members when the questionnaire was not returned. For all except the first Delphi round, group members received a summary of the results from the preceding questionnaire (percentage of agreement for each statement, ranking of statements where applicable) including the anonymised comments from respondents as well as the new survey. Two exceptions from the consensus rule were made after the first Delphi round: (1) None of the statements Q18–Q22 about the candidate ultrasonography scores reached the 50% level of qualification required for revoting in the second round. However, as this aspect was considered fundamental to this project, the statements were rephrased according to the comments of respondents and included in the subsequent round. (2) Statement 32 was also included in the second round although it had already reached consensus in the first round. This was because a modification of the text was made based on the comments of the group members.
After the third Delphi round, there was still no consensus about the candidate ultrasonography score. Several email communications among the members of the working group took place; however, some members argued that no decision could be made until data on reliability were obtained and data from the PROTEA study (which had not been published at that time) became publicly available. After consultation with the OMERACT mentor and chairs, it was decided to postpone the decision to select the ultrasonography score until subsequent phases of the project had been attained, namely testing of all candidate ultrasonography scores for reliability on static images as well as evaluating sensitivity to change and convergent construct validity using data from the PROTEA study (see below).
Delphi rounds 4–7
The group members received the results of the online reliability exercise on static images, as well as data on the sensitivity to change and convergent construct validity of all candidate ultrasonography scores along with the new questionnaire. In the fourth round, group members were asked whether they preferred the semiquantitative score14 or the quantitative scores (IMT-normal, IMT-rounded normal, IMT-cut-off, IMT-rounded cut-off, where ‘normal’ and ‘cut-off’ refer to the mean normal IMTs and IMT cut-off values in each segment, respectively according to Refs 15 and 16; see also table 1 for detailed explanation). In the fifth round, they selected the top two candidates among the quantitative scores, and in the sixth round, the final selection was made. An additional question about whether the final score should or should not be multiplied by eight (corresponding to the maximum number of segments in the score) was also included in this round. Since a consensus for the final score was not reached in round 6, an additional Delphi round was needed where respondents expressed their agreement with the final score on a 1–5 Likert scale (see also above).
Web-based interreader and intrareader reliability exercise
All members of the working group were invited to prospectively collect ultrasonography images of bilateral temporal (common trunk of superficial temporal arteries, frontal and parietal branches) and axillary arteries from patients with GCA before and 4 weeks after glucocorticoid therapy, as well as from controls without vasculitis. Image collection was approved by the local ethics committee of eight out of nine contributing centres (Berlin Medical Association, Eth-04-17; Lisbon Academic Medical Centre Ethics Committee, reference 08/17; University of Pavia Ethics Committee, E 2016 0031606; Ethics Committee of the University of Twente Mec-U, R19-072; Ethics Committee of the University of Regensburg, 11-101-0273; Capital Region of Denmark, Ethics committee nr.: H-20032069; Pomeranian Medical University ethical committee KB-0012/12/14; Slovenian medical ethics committee 99/04/15) while approval was waived by the local ethics committee of one centre (Norfolk and Norwich University Hospital). Written informed consent was obtained from each patient. Anonymised images were stored in a DICOM format and loaded onto a Web-based platform which also provided a free online DICOM viewer (NextCloud). The steering group selected the images with the highest quality and best visibility of the intima–media complex. These images were then used for the interreader and intrareader reliability exercise. Images were acquired with either a GE Logic E10 ultrasonography machine equipped with a 6–24 MHz hockey stick probe for temporal artery branches or with a 6–15 MHz/2–9 MHz linear probe for axillary arteries, with a GE Logic E9 with an 8–18 MHz hockey stick probe or a 6–15 MHz linear probe, respectively, or with a Canon Aplio i800 with an 8–22 MHz hockey stick probe or a 4–18 MHz linear probe, respectively.
All members of the working group who participated in the Delphi process were also invited to participate in the reliability exercise; however, attendance of a 1.5-hour online training session (organised by the steering group, offered three times) was mandatory before taking part in this phase of the study. Members were asked to measure the IMT of each arterial segment and to enter the data in an electronic form in Survey Monkey. They were blinded to each other’s measurements. Two weeks after the first evaluation, participants received the same images in a different order to retest interobserver and, primarily, intraobserver agreement.
Evaluation of the sensitivity to change and convergent construct validity
For this step, data from the PROTEA study were used. Details concerning the study design are reported elsewhere.11 In brief, PROTEA is a prospective study of patients with new-onset GCA who underwent serial ultrasonography assessments of the temporal and axillary arteries with the objective to test the sensitivity to change of the halo sign, particularly the number of segments with halo and IMT measurements. It was conducted in academic centres in Lisbon, Portugal (local principal investigator (PI) CPo), and Pavia, Italy (local PI SM), respectively, from March 2017 to March 2021 (longer study period and assessment of three additional patients as compared with the original cohort). All patients fulfilled the original or the modified 1990 American College of Rheumatology classification criteria for GCA,17 18 had to have an ultrasonography-verified halo sign in temporal and/or axillary arteries at baseline and had not been treated with high doses of glucocorticoids (≥30 mg/day of prednisolone or equivalent) for more than 15 days. Patients underwent clinical and ultrasonography assessments bilaterally of the common trunk of superficial temporal arteries, frontal and parietal branches as well as axillary arteries at baseline, weeks 1, 3, 6 and 12, and then every 3 months. For the present analysis, data until week 24 were used. The presence or absence of the halo sign and the maximum value of the IMT (regardless of the presence or absence of the halo sign) was recorded for each arterial segment. The IMT was measured in longitudinal view, in the single wall distal to the probe and at the area with the greatest wall thickness. Among clinical and laboratory parameters, the Birmingham Vasculitis Activity Score (BVAS), erythrocyte sedimentation rate (ESR) and C reactive protein (CRP) were recorded.19 No specific treatment protocol was applied.
Statistical analysis
In the Delphi process, descriptive statistics were used. To test the intraobserver and interobserver reliability of all candidate scores in the online exercise on static images, the intraclass correlation coefficient (ICC) was calculated. Variance analysis was conducted to detect possible outliers. In PROTEA, sensitivity to change was calculated for each candidate score as standardised mean difference (SMD) for each time point separately. No formal comparison between the scores was done at this stage. The statistical significance was calculated by one-sample t-test assuming a mean difference of zero.
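As a rough illustration of the sensitivity-to-change calculation, the sketch below computes an SMD for paired scores and the matching one-sample t statistic against a mean difference of zero. It rests on an assumption: the text does not state which standard deviation was used as the denominator, so the sketch uses the SD of the individual changes (a standardised response mean). The function name `change_statistics` is hypothetical, and the study itself used Stata 17.

```python
import math
import statistics
from typing import Sequence, Tuple


def change_statistics(baseline: Sequence[float],
                      follow_up: Sequence[float]) -> Tuple[float, float]:
    """Return (SMD, t statistic) for paired baseline/follow-up scores.

    Assumption: SMD = mean change / SD of the changes. The paper does not
    specify the denominator, so this is one plausible reading.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need two paired series with at least 2 patients")

    # Negative differences mean the score decreased after baseline.
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all changes are identical; SMD is undefined")

    smd = mean_diff / sd_diff
    # One-sample t statistic for H0: mean difference = 0, df = n - 1.
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return smd, t_stat
```

Under this definition the t statistic equals the SMD multiplied by the square root of the number of pairs. Patients with a missing value at either visit would be dropped before the call, in line with the complete-data analysis described in the text.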
Association between each candidate score and disease activity variables (ESR, CRP, BVAS) was assessed using Spearman’s correlation coefficient. Logistic regression analysis was used to determine the probability of being in remission for each unit increase (standardised) of the candidate score of interest (determined at week 24). Remission was defined as the absence of relapse with a prednisolone dose <30 mg/day, while a relapse was the recurrence of GCA-related symptoms or increased levels of acute-phase reactants (CRP ≥1 mg/dL and/or ESR ≥30 mm/hour) not otherwise explained and requiring increment of the glucocorticoid dose.20 All statistical tests were performed on complete data, without imputation and using Stata 17.
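For readers who want to reproduce the correlation step without a statistics package, Spearman's coefficient can be sketched as the Pearson correlation of average ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical. Dropping pairs with a missing value mirrors the complete-data analysis mentioned above, though representing missing values as `None` is an assumption of this sketch.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[Optional[float]],
                 y: Sequence[Optional[float]]) -> float:
    """Spearman correlation on complete pairs (no imputation)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")

    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In practice, `x` would hold a candidate ultrasonography score and `y` the ESR, CRP or BVAS values recorded at the same visit.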
Results
Delphi exercise
Forty-nine members of the OMERACT ultrasound GCA working group from 18 countries in Europe, America and Asia (Austria, Bulgaria, Czech Republic, Denmark, France, Germany, Italy, Japan, The Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Switzerland, Turkey, UK and USA) were invited to participate. Between 40 (81.6%) and 47 (95.9%) completed the questionnaire in the seven rounds. Details on demographics and experience of respondents with ultrasonography are depicted in online supplemental table 1.
Six candidate ultrasonography scores were proposed to the group members as outlined in table 1: IMT-normal, IMT-cut-off, IMT-rounded normal, IMT-rounded cut-off, semiquantitative and halo count.
Provisional OMERACT GCA Ultrasonography Score
The final score selected (agreement 92.7%) was the IMT-rounded cut-off, renamed as ‘OMERACT GCA Ultrasonography Score (OGUS)’ (see box 1 for a detailed description). Consensus statements on how to determine the score are detailed in box 2. In addition to the OGUS, the group members recommended (agreement 79.6%) to consider the halo count as an alternative if the OGUS cannot be performed (eg, IMT cannot be measured or no calculator is available).
Box 1 Provisional OMERACT GCA Ultrasonography Score (OGUS)
OMERACT GCA Ultrasonography Score (OGUS)=(CR/0.4 mm+CL/0.4 mm+PR/0.3 mm+PL/0.3 mm+FR/0.3 mm+FL/0.3 mm+AR/1.0 mm+AL/1.0 mm)/number of segments available.*
If OGUS cannot be determined, the halo count=Sum of all segments with a positive halo sign (range 0–8) may be used as an alternative.
OMERACT GCA Ultrasonography Score is calculated as the [Sum of intima–media thickness (IMT) measured in every segment divided by the rounded cut-off values of IMTs in each segment (ie, common trunk of superficial temporal arteries: 0.4 mm; parietal and frontal branches: 0.3 mm; axillary arteries: 1.0 mm)] divided by the number of segments available.
*In case one or more artery segments are not examined (eg, because of biopsy), the sum of the remaining segments (each divided by the rounded cut-offs) is divided by the number of segments actually available. This is to normalise the final score according to the number of segments investigated. In case the IMT has been measured on a compressed artery, the value has to be divided by 2 (see also box 2, statement 5).
Notes: AL, axillary artery left; AR, axillary artery right; CL, common trunk of superficial temporal artery left; CR, common trunk of superficial temporal artery right; FL, frontal branch left; FR, frontal branch right; GCA, giant cell arteritis; PL, parietal branch left; PR, parietal branch right.
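The calculation in box 1 can be expressed as a short Python sketch. This is not the code behind the online calculator; the names `ROUNDED_CUTOFF_MM`, `ogus` and `halo_count` are hypothetical, as is the convention of passing `None` for a segment that was not examined. The cut-off values, the division by the number of available segments and the halving of measurements taken on a compressed artery follow the box.

```python
from typing import Dict, Iterable, Optional

# Rounded IMT cut-off values in mm, keyed by the segment codes of box 1.
ROUNDED_CUTOFF_MM: Dict[str, float] = {
    "CR": 0.4, "CL": 0.4,  # common trunk of superficial temporal artery
    "PR": 0.3, "PL": 0.3,  # parietal branches
    "FR": 0.3, "FL": 0.3,  # frontal branches
    "AR": 1.0, "AL": 1.0,  # axillary arteries
}


def ogus(imt_mm: Dict[str, Optional[float]],
         compressed: Iterable[str] = ()) -> float:
    """Compute the OGUS from IMT measurements in mm.

    imt_mm maps segment codes to the measured IMT; None (or an absent key)
    marks a segment that was not examined, for example after biopsy.
    compressed lists segments measured on a compressed artery (both walls),
    whose value is halved before use.
    """
    compressed_set = set(compressed)
    ratios = []
    for segment, value in imt_mm.items():
        if segment not in ROUNDED_CUTOFF_MM:
            raise ValueError(f"unknown segment code: {segment}")
        if value is None:
            continue  # segment not available; excluded from the denominator
        if value < 0:
            raise ValueError("IMT cannot be negative")
        if segment in compressed_set:
            value = value / 2
        ratios.append(value / ROUNDED_CUTOFF_MM[segment])
    if not ratios:
        raise ValueError("no segment available; OGUS cannot be determined")
    return sum(ratios) / len(ratios)


def halo_count(halo_positive: Dict[str, bool]) -> int:
    """Alternative measure: number of segments with a positive halo sign."""
    return sum(1 for positive in halo_positive.values() if positive)
```

For example, a patient whose eight segments all measure exactly the rounded cut-off obtains an OGUS of 1, and a patient with only the right common trunk (0.8 mm) and right axillary artery (1.0 mm) available obtains (2.0 + 1.0) / 2 = 1.5.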
Box 2 Statements for the determination of the provisional OMERACT GCA Ultrasonography Score (OGUS)
1. The score should include the right and left common trunk of superficial temporal arteries with their frontal and parietal branches (six segments) and the axillary arteries (two segments) (agreement 91.3%, first Delphi round).
2. In addition to a score based on IMT/halo size, we recommend a simple count of segments with positive halo sign (halo count). This count includes the right and left common trunks of superficial temporal arteries with their frontal and parietal branches (six segments) and the axillary arteries (two segments) (agreement 79.6%, first Delphi round).
3. In case of missing segments (eg, due to anatomical variants), all available segments will be considered. Segments where a biopsy has been performed should be excluded. The score will then be divided by the number of evaluated segments (agreement 75.0%, first Delphi round).
4. The IMT should be measured in the area of greatest thickness (agreement 88.6%, first Delphi round).
5. The IMT/halo thickness of temporal and axillary arteries should be measured at the thickest wall (superficial or deep wall). At temporal arteries, it may be an alternative to compress the vessel until no lumen or blood flow is visible with measurement of both walls dividing the result by 2 (agreement 75.0%, first Delphi round).
6. Measurement should preferably be performed in the grey scale image. Only in unclear situations measurement may be done using colour Doppler ultrasonography or an alternative ultrasonography mode for showing the artery lumen. Overfilling or underfilling of the lumen with colour must be avoided in this situation (agreement 93.0%, first Delphi round).
7. Measurements should include at least one, but if possible two decimal places (agreement 85.1%, second Delphi round).
8. At baseline and follow-up, the same method (single wall measurement or measurement with compression) should be applied if possible (agreement 87.2%, second Delphi round).
9. IMT should preferably be measured in longitudinal planes, if possible (agreement 89.1%, third Delphi round).
GCA, giant cell arteritis; IMT, intima–media thickness.
An online calculator for the OGUS is available at http://scoring.multimedium.at/OMERACT. See figure 1 for the QR code linking to this calculator and figure 2 for examples of how to measure the IMT.
Metric testing of the provisional OMERACT GCA Ultrasonography Score
The following sections present the metric properties of all candidate scores, namely reliability on static images, sensitivity to change and convergent construct validity. Face validity and feasibility have been confirmed by the group members in the Delphi process and by conducting the measurements during the reliability exercise, respectively.
Reliability exercise
Nine working group members (AH, CBM, CPo, DB, MM, UMD, LT, WAS, WH) contributed images from 57 cases. Out of these, 23 (16 GCA, of whom 12 were baseline and 4 were follow-up examinations, and 7 controls) were selected for the reliability exercise. Most images were in grey scale and in longitudinal view, 12/23 series contained pictures with colour Doppler, 7/23 included images in transverse view and 3/23 depicted the compression sign. Thirty-three of the 35 working group members who conducted the online training session (94.3%) participated in both rounds of the online reliability exercise.
The interrater ICC of IMT measurements in any segment was 0.84 (95% CI 0.80 to 0.87) in the first and 0.78 (95% CI 0.74 to 0.82) in the second round. Subanalysis of the reliability in each arterial segment resulted in ICCs ranging from 0.67 (common trunk of superficial temporal arteries, 95% CI 0.56 to 0.78) to 0.82 (axillary arteries, 95% CI 0.74 to 0.88) in the first and 0.56 (frontal branch, 95% CI 0.45 to 0.68) to 0.78 (axillary artery, 95% CI 0.70 to 0.85) in the second round. Median intrarater reliability of all participants in any segment was 0.90 (IQR 0.83–0.94), with subanalysis of individual arterial segments revealing median ICCs from 0.80 (common trunk of superficial temporal arteries, IQR 0.69–0.87) to 0.92 (axillary artery, IQR 0.82–0.94) (see online supplemental table 2 for details).
The reliability of the OGUS was good, with an interrater ICC of 0.84 (95% CI 0.68 to 0.95) in the first and 0.72 (95% CI 0.53 to 0.91) in the second round. Median intrarater ICC was 0.91 (IQR 0.75–0.95). The halo count performed worse, with interrater reliabilities of 0.56 (95% CI 0.42 to 0.73) in the first and 0.57 (95% CI 0.43 to 0.73) in the second round, as well as a median intrarater reliability of 0.89 (IQR 0.77–0.92). The semiquantitative score had reliability data similar to those of the halo count, whereas the other candidate scores performed similarly to the OGUS (see table 2 for details).
In variance analysis, measurements of a single expert qualified as an outlier. Sensitivity analysis was conducted excluding the data obtained from this expert. Interrater and intrarater reliability changed only minimally as detailed in online supplemental table 3.
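To make the interrater ICC concrete, the sketch below computes one common variant, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), from a complete cases-by-raters table. Which ICC form was used in this study is not stated in the text, so the choice of ICC(2,1) is an assumption, and the function name `icc_2_1` is hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for a complete table: one row per case, one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)                    # number of cases
    k = len(ratings[0]) if n else 0     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 cases and raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-case mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square

    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when there is no variance")
    return (msr - mse) / denominator
```

Here each row would hold the OGUS values that the raters derived for one case in a given round. A sensitivity analysis such as the one described above amounts to removing one rater's column and recomputing.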
Sensitivity to change
We used the data from the PROTEA study (n=52) to test the sensitivity to change of the candidate ultrasonography scores. See online supplemental table 4 for baseline characteristics of patients, online supplemental table 5 for baseline ultrasonography results and table 3 for data on sensitivity to change. For the OGUS, SMDs between baseline and each follow-up visit up to week 24 ranged from −1.19 to −2.16, corresponding to large and very large magnitudes of change, respectively. The halo count yielded SMDs from −0.51 to −1.73, corresponding to medium and very large magnitudes of change, respectively. The semiquantitative score performed worse than the OGUS but better than the halo count, whereas the SMDs for the other candidate scores were similar to those of the OGUS.
Convergent construct validity
Data from the PROTEA cohort were also used to test the convergent construct validity. The candidate ultrasonography scores were correlated with the BVAS as a marker of clinical disease activity and with ESR and CRP as laboratory parameters of inflammation. In addition, logistic regression analysis was performed using remission at week 24 as the dependent variable and the candidate ultrasonography scores at the same time point as independent variables (see table 4 for details). Overall, the OGUS correlated moderately with ESR, CRP and BVAS (correlation coefficients 0.37–0.48). Correlation coefficients were equal to those of the IMT-normal, IMT-cut-off and IMT-rounded normal, but higher than those of the semiquantitative score and the halo count. Logistic regression indicated a negative association between all scores and remission; however, regression coefficients were lower for OGUS, IMT-normal, IMT-cut-off and IMT-rounded normal (indicating a large effect) than for the semiquantitative score and the halo count.
Discussion
We developed a consensus-based, provisional OMERACT GCA Ultrasonography Score and tested its metric properties on static images and on prospectively recorded data. The score, as applied, revealed reliability, sensitivity to change and convergent construct validity. This score may be used as a monitoring tool and outcome measure in research, particularly in trials to test the efficacy of pharmacological interventions.
The development of an internationally accepted ultrasonography score has been a major unmet need in GCA research given the increasing effort to develop new treatment options for this disease.18 20–22 Apart from the fact that it is desirable to study GCA in a multidimensional way, there is a need to objectively assess disease activity, particularly when drugs directly influence ESR and CRP, mitigating their value as outcome parameters in such trials. Besides, imaging can help to document the influence of drugs on structural changes of the arterial wall.6 10
Initially, the members of the OMERACT ultrasonography large vessel vasculitis working group did not reach a consensus on the candidate ultrasonography score and the Delphi process had to be interrupted to first retrieve additional data on metric properties. While this approach was unusual and not planned upfront, it was nevertheless approved by the OMERACT mentor and chairs and in the end strengthened the decision of the group, who otherwise would have had to decide on a candidate score without knowing how the scores performed in metric testing.12 Future projects might want to consider an early acquisition of data on metric properties of candidate scores unless such information is available in the literature. The consensus of experts would then be based on more solid evidence rather than on experience only.
The semiquantitative score, also called Southend halo score, is the candidate score that has been used most in research so far, but was ultimately not chosen by the group.14 23–25 We can only speculate on the factors influencing this decision; however, reliability, sensitivity to change and convergent construct validity were slightly inferior in the Southend score as compared with the one finally chosen. Besides, most data on the Southend halo score are available for diagnostic purposes rather than for monitoring.14 23 25 Another peculiarity of this score is the fact that increased values may be achieved even though IMT measurements fall within the normal range.14–16 Southend halo scores up to 18 might either indicate sonographic remission (ie, all measurements are within the range of normal) or active disease (ie, at least one measurement is outside the range of normal). Another factor limiting the value of this score for clinical trials is the unclear handling of missing arterial branches after biopsy or in case of anatomical variants. We nevertheless believe that the OGUS and the Southend score can be used in a complementary manner: the OGUS as a monitoring tool and outcome parameter in clinical trials and the Southend score for diagnostic purposes and disease stratification in clinical practice and research.23
Another candidate score that has already been used in studies is the IMT-normal, applied in the GUSTO trial (in that study, the compressed artery was measured and divided by 2, whereas for OGUS either the IMT of the compressed artery or the single vessel wall can be used).10 The metric properties were almost identical to those of the OGUS, but the latter is a bit easier to calculate, and probably more intuitive, given that a score ≤1 indicates that on average, all segments are within the normal range whereas for the IMT-normal, a score ≤1 suggests that all segments are equal or below the mean normal value of arterial walls.15 16
When referring to measurements of the arterial wall, we used the term ‘IMT’ rather than ‘halo size’, ‘halo thickness’ or similar descriptors, given that IMT is more commonly used in cardiovascular research and clinical practice. We acknowledge, however, that inflammation in GCA also involves the adventitia even though the bulk of inflammation as well as proliferation of myofibroblasts are most prominent in the media and intima and, consequently, thickening of these two layers contributes most to the values obtained when measuring the arterial wall.26
The major strengths of our study are the involvement of a large number of GCA experts from all over the globe, robust data from an online reliability exercise based on DICOM images of prospectively collected patients and the calculation of the sensitivity to change and the convergent construct validity using data from a well-designed, prospective study with short-term and long-term ultrasonography follow-up data. The latter aspect is particularly important given that most previous studies followed up patients with GCA only months after baseline, and it is well known that the kinetics of response is different for temporal and extracranial large arteries.7 11 27 28
The limitations concerning the new score are the absence of data from acquisition reliability and the lack of data from a randomised controlled trial enabling the comparison between responses in intervention and control groups. Reliability testing was conducted on stored images and, therefore, we were unable to test the variability related to the examination of patients by different investigators (=acquisition reliability). In some patients, the IMT might be difficult to measure, possibly contributing to variable score results, while in our online exercise only images where the intima–media complex was clearly visible were selected. A patient-based exercise is already scheduled to further validate the score. The PROTEA study did not include a standardised treatment protocol nor was it designed to test the efficacy of an intervention.11 Therefore, we do not know how well the OGUS distinguishes between two treatment groups; however, it is planned to apply the OGUS in future randomised controlled trials. An additional limitation of PROTEA is the fact that patients could have been on glucocorticoids for up to 2 weeks when included in the study. Given that glucocorticoids improve clinical and laboratory signs of disease activity as well as arterial wall swelling (with a presumed faster response of the former), a convergence of data toward zero and, consequently, an underestimation of the (so far low to moderate) correlation of ESR, CRP and BVAS with the ultrasonography scores cannot be excluded. Besides, the BVAS has not been developed to measure disease activity in GCA, and correlations between the BVAS and any other score should be interpreted with caution.29 A disease-specific clinical composite for GCA enabling correlation analyses with OGUS or any other metric score is unfortunately not available yet.
Other open issues are the performance of the OGUS at the patient level (as compared with the group level investigated in this study), the definition of cut-offs discriminating between various disease states (such as active disease and remission) and the identification of a score delta reflecting a clinically meaningful response. These points have to be addressed by future studies.
The newly proposed score is intended for use in all subsets of GCA30; however, the majority of patients in the PROTEA study had exclusively cranial involvement and, therefore, psychometric data on patients with other disease subsets are limited. A related issue is that six of the eight arterial segments included in OGUS are cranial. The semiquantitative score aimed to compensate for this imbalance by assigning higher weights to the axillary arteries. However, this resulted in a lower sensitivity to change compared with OGUS, given that with current treatments, temporal arteries usually have a faster and more complete response than axillary arteries, where intima–media thickening tends to persist for years.6 7 14
In patients in whom none of the vascular segments included in the OGUS are affected, the score cannot be applied (eg, in case of isolated vasculitis of the aorta and/or the subclavian arteries). Also, the score has not been developed for Takayasu arteritis or for other forms of large vessel vasculitis outside the GCA–PMR (polymyalgia rheumatica) spectrum and, consequently, the metric properties of the OGUS for these diseases are not yet known.30
In conclusion, we developed a consensus-based, provisional OGUS for potential use in clinical trials and other research studies. Further validation in a patient-based reliability exercise, in randomised controlled drug trials and in independent GCA cohorts is necessary.
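To make the weighting issue concrete, the calculation stated in the abstract (each segment's IMT divided by its rounded cut-off, the ratios summed, and the sum divided by the number of available segments) can be sketched in Python. This is a minimal illustration only: the cut-off values, the segment labels and the function name `ogus` are assumptions introduced here and are not given in this section.

```python
from typing import Mapping, Optional, Tuple

# ASSUMPTION: illustrative rounded IMT cut-offs in mm per segment type.
# They are placeholders for this sketch, not values stated in this section.
ASSUMED_CUTOFFS_MM = {
    "common_temporal": 0.4,
    "frontal": 0.3,
    "parietal": 0.3,
    "axillary": 1.0,
}

# Hypothetical key type: (side, segment type), eg ("left", "axillary").
SegmentKey = Tuple[str, str]


def ogus(imt_mm: Mapping[SegmentKey, Optional[float]]) -> float:
    """Mean of IMT / cut-off over the segments with a measurement.

    Segments whose IMT is None are treated as unavailable and are
    excluded from both the sum and the divisor.
    """
    ratios = [
        imt / ASSUMED_CUTOFFS_MM[segment]
        for (_side, segment), imt in imt_mm.items()
        if imt is not None
    ]
    if not ratios:
        raise ValueError("no segment available; the score cannot be computed")
    return sum(ratios) / len(ratios)


# Example: every segment measured exactly at its assumed cut-off.
at_cutoff = {
    (side, segment): cutoff
    for side in ("left", "right")
    for segment, cutoff in ASSUMED_CUTOFFS_MM.items()
}
score_at_cutoff = ogus(at_cutoff)
```

Because every available segment enters the mean with equal weight, the six cranial segments account for six of the eight terms when all segments are measured, which is the imbalance discussed above.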
Ethics approval
This study involves human participants and image collection was approved by Berlin Medical Association, Eth-04-17; Lisbon Academic Medical Centre Ethics Committee, reference 08/17; University of Pavia Ethics Committee, E 2016 0031606; Ethics Committee of the University of Twente Mec-U, R19-072; University of Regensburg Ethics Committee, Fakultät Medizin No: 11-101-0273; Capital Region of Denmark, Ethics committee nr.: H-20032069; Pomeranian Medical University ethical committee KB-0012/12/14; Slovenian medical ethics committee 99/04/15. Participants gave informed consent to participate in the study before taking part.
Acknowledgments
The authors would like to thank Hannes Platzgummer for his help with setting up the online reliability exercise.
References
Footnotes
Handling editor Josef S Smolen
Twitter @cristinadbponte, @rthritis, @cmukhtyar, @adiamanteas, @Sarah_L_Mackie, @tomelleri_a
CD and CP contributed equally.
Contributors All authors were involved in data acquisition. CDe wrote the first version of the manuscript. All authors reviewed it and made extensive comments and appropriate changes to it. All authors approved the final version of the manuscript. CDe accepts full responsibility for the work, had access to the data, and controlled the decision to publish.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests CDe has received grant support from AbbVie and consulting/speaker’s fees from AbbVie, Eli Lilly, Janssen, Galapagos, Novartis, Pfizer, Sparrow, Roche and Sanofi, all unrelated to this manuscript. CPo is or has been the principal investigator of studies by AbbVie, Sanofi and Novartis and has received consulting/speaker’s fees from Vifor, AstraZeneca, GlaxoSmithKline and Roche, all unrelated to this manuscript. LT received speaker’s fees from Roche, Novartis, Janssen, Pfizer, UCB and GE. PB received grant support from Pfizer and speaker’s fees from Janssen. CDu has received consultancy or speaker’s fees and travel expenses from AbbVie, AOP Orphan, AstraZeneca, Bristol-Myers Squibb, Eli Lilly, Janssen, Galapagos, Merck Sharp & Dohme, Novartis, Pfizer, Roche, Sandoz, UCB, Vifor and research support from Eli Lilly, Pfizer, UCB, all unrelated to this manuscript. E-MH has received fees for speaking and/or consulting from Novartis, AbbVie, Sanofi, Sobi; research funding to Aarhus University Hospital from Novo Nordic Foundation, Danish Rheumatism Association, Danish Regions Medicine Grants, Roche, Novartis, AbbVie; travel expenses from Pfizer, Sobi, AbbVie. E-MH has been the principal investigator of studies by SynACT Pharma and involved as site principal investigator in trials by AbbVie, Novartis, Novo and Sanofi, all unrelated to this manuscript. AI received honoraria, advisory boards, speakers’ bureau, educational grants and research support from AbbVie, Alfasigma, Amgen, Biogen, BMS, Celgene, Celltrion, Eli Lilly, Galapagos, Gilead, MSD, Novartis, Pfizer, Sanofi Genzyme, Sobi and UCB. KDT is or was a research investigator of studies for Novartis, AstraZeneca, GlaxoSmithKline, Amgen; has received consulting fees from Aurinia, Novartis and AstraZeneca; and is a contracted researcher of Bioclinica. KSMvdG received a speaker’s fee from Roche. WAS is or has been the principal investigator of studies by AbbVie, Amgen, GlaxoSmithKline, Novartis, Roche, Sanofi and has received consulting/speaker’s fees from AbbVie, Amgen, Bristol-Myers Squibb, Chugai, GlaxoSmithKline, Johnson & Johnson, Medac, Novartis, Roche and Sanofi, all unrelated to this manuscript. The other authors declare no conflicts of interest.
Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.