Objectives To study muscle biopsy tissue from patients with juvenile dermatomyositis (JDM) in order to test the reliability of a score tool designed to quantify the severity of histological abnormalities when applied to biceps humeri in addition to quadriceps femoris. Additionally, to evaluate whether elements of the tool correlate with clinical measures of disease severity.
Methods 55 patients with JDM with muscle biopsy tissue and clinical data available were included. Biopsy samples (33 quadriceps, 22 biceps) were prepared and stained using standardised protocols. A Latin square design was used by the International Juvenile Dermatomyositis Biopsy Consensus Group to score cases using our previously published score tool. Reliability was assessed by intraclass correlation coefficient (ICC) and scorer agreement (α) by assessing variation in scorers’ ratings. Scores from the most reliable tool items were correlated with clinical measures of disease activity at the time of biopsy.
Results Inter- and intraobserver agreement was good or high for many tool items, including overall assessment of severity using a Visual Analogue Scale. The tool functioned equally well on biceps and quadriceps samples. A modified tool using the most reliable score items showed good correlation with measures of disease activity.
Conclusions The JDM biopsy score tool has high inter- and intraobserver agreement and can be used on both biceps and quadriceps muscle tissue. Importantly, the modified tool correlates well with clinical measures of disease activity. We propose that standardised assessment of muscle biopsy tissue should be considered in diagnostic investigation and clinical trials in JDM.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 3.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/3.0/
The idiopathic inflammatory myopathies are rare complex chronic inflammatory disorders affecting muscle, skin and other organs. The most common childhood idiopathic inflammatory myopathy (onset before 16th birthday), juvenile dermatomyositis (JDM), has an incidence of 2–3 cases/million/year.1,2 The rarity of JDM makes recognition and assessment challenging for clinicians and histopathologists. Until now there has been no standardised histological approach to the assessment of the severity of abnormalities in muscle biopsy specimens from patients with suspected JDM. The JDM Cohort and Biomarker study collects clinical data and samples, including biopsy material, from children with myositis from across the UK and Ireland.3
The International Juvenile Dermatomyositis Biopsy Consensus Group has previously designed and tested a scoring tool for assessment of the severity of pathological change in biopsy specimens from patients with suspected or proven JDM.4 This tool assesses features agreed to be characteristic of JDM, organised into four domains (inflammatory, vascular, muscle fibre and connective tissue). The tool also includes an overall score of severity, scored by marking a Visual Analogue Scale (histopathologists’ VAS) (1.0–10.0 cm). If particular items assessed within a score tool correlate well with clinical features, disease course or response to treatment, the tool would be a valuable addition to the evaluation of this complex disease. A similar approach to quantify features of renal allograft rejection was refined, validated and tested to produce the Banff scoring system.5 Retrospective studies of JDM biopsies have suggested that morphological features may correlate with clinical course but standardised scoring systems have not been previously used.6 To our knowledge there are no similar standardised tools available for assessment of pathological features in muscle biopsies.
The aims of this study were to reassess the reliability of the JDM score tool in quadriceps, including a comparison with our previous results,4 and to assess reliability of the tool when applied to biceps humeri, since this muscle is regularly sampled in some centres. To support the utility of the tool for multicentre studies we wished to determine the intraobserver agreement of the tool elements. Finally, we evaluated whether items of the score tool are associated with clinical measures of disease severity.
Patients and methods
Patients and biopsy material
Patients were recruited in the UK (through the UK JDM Cohort and Biomarker study) and Brazil. Both studies had full approval from ethical review boards and were carried out according to the declaration of Helsinki. Criteria for inclusion were that children had definite or probable JDM according to the Bohan and Peter criteria,7 and biopsy material was available for research. All children had disease duration of <12 months before biopsy and had their biopsy sample taken before use of steroids or disease-modifying agents such as methotrexate or other immunosuppressive agents. A total of 55 cases were available: 33 from the UK, 22 from Brazil (table 1). UK muscle samples were all from the quadriceps femoris (vastus lateralis): 11 of these were reported in a previous study.4 In this study those 11 were analysed only for the correlation with clinical data. Brazil tissues were all from biceps humeri. We have shown that biceps and quadriceps have subtle differences in fibre size, fast:slow fibre ratio and capillary:fibre ratio; therefore the muscle source of biopsy tissue is an important consideration in assessment.9
In both cohorts, clinical data at disease onset, serum muscle enzyme levels, erythrocyte sedimentation rate and muscle strength measured by manual muscle testing on the Medical Research Council (MRC) scale 0–5 were recorded.8 Complications, including calcinosis, skin ulceration, lung, cardiac and gastrointestinal (GI) involvement, were assessed before biopsy. In the UK cohort, data on the Childhood Myositis Assessment Score (CMAS), an assessment of overall strength and stamina,10 as well as physician's global assessment (PGA, range 0.0–10.0) were also available.11
Histology and immunohistochemistry
Muscle biopsy sampling, histological staining and immunohistochemistry were carried out as described previously.4 Histological staining included haematoxylin and eosin, Gomori's trichrome, ATPase pH 4.6 and pH 9.4, nicotinamide adenine dinucleotide dehydrogenase-tetrazolium reductase and acid phosphatase. For immunohistochemistry, primary antibodies used were: anti-human CD3 (UCHT1), anti-human CD68 (KP1), anti-human major histocompatibility complex (MHC) class I heavy chain (W6/32), anti-human neonatal myosin (WB-MHCn) (all from Novocastra, Newcastle-Upon-Tyne, UK) and anti-human CD31 (JC70A, 1/20) (Dako, Cambridgeshire, UK).
The International Working Group on JDM Biopsy previously proposed a score tool for assessment of JDM biopsy, designed using samples from quadriceps femoris.4 In this study the same group of experts reconvened to assess inter- and intraobserver reliability of the tool, its reliability when used to score biceps tissue samples and to test correlation between elements of the score tool and clinical features. All validation and reliability data were generated from 44 new cases not used in our previous study. For the main scoring exercise to assess inter-observer reliability, 11 quadriceps samples and 11 biceps samples were selected to include cases in each group demonstrating a range of features and severity (judged by HV and JLH). The quadriceps and biceps samples were allocated by an 11×11 Latin square design for each group, as described previously.4
A further 22 additional biopsy samples (11 quadriceps, 11 biceps) were each assessed by five scorers, randomly assigned using a separate partial Latin square design (11×5) for quadriceps and biceps. Scorers did not know to which set of results their scores would contribute. Data from 11 quadriceps cases were available from our previous study.4 These data allowed inclusion of all 55 cases in the final analysis for association with clinical features. To assess intraobserver agreement, eight quadriceps cases were scored again by eight scorers in an 8×8 Latin square, 3 days apart from the initial scoring exercise. For each scoring exercise the full panel of stained sections was available as above; scorers were aware of age at time of biopsy and the muscle source of each biopsy.
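As an illustration only, the structure of the Latin square allocations described above can be sketched in Python. The cyclic construction, the function names and the mapping of rows, columns and symbols to cases, scoring slots and scorers are all assumptions made for this sketch; the text does not state how the squares were constructed or randomised.

```python
from collections import Counter

def cyclic_latin_square(n):
    """Build an n x n Latin square in which cell (r, c) holds (r + c) mod n.

    Assumed (hypothetical) reading: rows are cases, columns are scoring
    slots and symbols are scorer indices.
    """
    return [[(r + c) % n for c in range(n)] for r in range(n)]

def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok

def partial_design(square, k):
    """Keep the first k columns, giving k distinct scorers per case."""
    return [row[:k] for row in square]

# Full 11 x 11 square, as in the main inter-observer exercise.
full = cyclic_latin_square(11)

# Partial 11 x 5 design: each of 11 cases is seen by 5 scorers.
partial = partial_design(full, 5)

# Number of cases allocated to each scorer in the partial design.
scorer_load = Counter(scorer for row in partial for scorer in row)
```

In this sketch the partial design stays balanced: every scorer receives the same number of cases. A real allocation would additionally shuffle the case and scorer labels at random before use.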
Data analysis, statistics and decision on most informative items
Data from the scoring exercises were analysed to provide two summary measures, as used previously.4 We used an intraclass correlation coefficient (ICC) as a measure of reliability, and as a measure of scorer agreement we used the ratio of the estimates of the SD attributable to the scorer:the SD attributable to the cases (α).11 The ICC and α value for each domain and each item were used to classify the data as good, good* or poor.4,11 Items reaching an ICC>0.6 were considered to have high reliability while items with an α score<0.4 had high agreement. Where both reliability and agreement were high (ICC>0.6, α<0.4), the item was classified as good; where either reliability or agreement was high, but not both, performance of the item was classified as good*. Where agreement and reliability were low (ICC<0.6, α>0.4), the item was classified as poor. In the intrarater exercise we calculated proportional agreement (pA; the number of exact agreements of score divided by the number of biopsies (n=8)) achieved by each scorer and for each item of the tool. The median (and range) pA across all scorers for each element of the tool is reported.12
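The classification rule and the proportional agreement measure described above can be written out as a short Python sketch. The function names and the example scores are hypothetical, and the handling of values exactly equal to a threshold (ICC=0.6 or α=0.4), which the text does not specify, is an assumption: such values are treated here as not meeting the "high" criterion.

```python
from statistics import median

ICC_THRESHOLD = 0.6    # ICC above this value: high reliability
ALPHA_THRESHOLD = 0.4  # alpha below this value: high agreement

def classify_item(icc, alpha):
    """Classify a tool item as 'good', 'good*' or 'poor'.

    Assumption: values exactly on a threshold do not count as high.
    """
    reliable = icc > ICC_THRESHOLD
    agreed = alpha < ALPHA_THRESHOLD
    if reliable and agreed:
        return "good"
    if reliable or agreed:
        return "good*"
    return "poor"

def proportional_agreement(first_scores, second_scores):
    """Exact agreements between two scoring sessions divided by the number of biopsies."""
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    exact = sum(a == b for a, b in zip(first_scores, second_scores))
    return exact / len(first_scores)

# Hypothetical scores from one scorer for one item on eight biopsies,
# recorded on two occasions.
session_1 = [0, 1, 2, 1, 0, 2, 1, 1]
session_2 = [0, 1, 2, 0, 0, 2, 1, 2]
pa_one_scorer = proportional_agreement(session_1, session_2)

# Hypothetical pA values for the same item across several scorers.
median_pa = median([0.75, 0.875, 1.0, 0.625, 0.75])
```

The median is taken across scorers for each item, matching the summary reported for the intrarater exercise.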
To explore associations between clinical measures of disease severity and tool items we used the modal score for each item in the tool, with the exception of domain totals for which we used the median values. Examination of these associations was restricted to tool items which consistently exhibited good* or good rating. Specifically, if they achieved good or good* in our original scoring exercise4 and in both the 11×11 scoring exercises conducted for this study, they were considered ‘informative items’ and suitable for further analysis. Ordered categorical and binary variables (eg, MRC score, presence/absence of skin ulceration, biopsy score tool items) were compared between biopsy groups using Pearson's χ2 test or Fisher's exact test, as appropriate. Age at onset, age at biopsy, time to biopsy, histopathologists’ VAS and modified domain total scores were compared using the Mann–Whitney U test.
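The summarising and selection steps described above can be sketched as follows. The item names and ratings in the example are invented for illustration, and the tie-break used when two scores are equally common (take the lowest) is an assumption, since the text does not say how ties in the modal score were resolved.

```python
from statistics import median, multimode

ACCEPTABLE = {"good", "good*"}

def modal_score(scores):
    """Most frequent score given to an item across scorers.

    Assumption: if several scores tie for most frequent, the lowest is used.
    """
    return min(multimode(scores))

def domain_total(totals):
    """Median of the domain totals given by the scorers."""
    return median(totals)

def informative_items(ratings_by_item):
    """Items rated good or good* in every scoring exercise supplied."""
    return [
        item
        for item, ratings in ratings_by_item.items()
        if all(rating in ACCEPTABLE for rating in ratings)
    ]

# Hypothetical ratings: (original exercise, quadriceps 11x11, biceps 11x11).
ratings = {
    "item_a": ("good", "good*", "good"),
    "item_b": ("good", "poor", "good"),
    "item_c": ("good*", "good*", "good*"),
}
selected = informative_items(ratings)
```

Only items that pass in all three exercises are carried forward, which mirrors the requirement that an item perform consistently before it is tested against clinical measures.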
The scores for the informative items, modified domain total scores and histopathologists’ VAS were assessed for associations with measures of muscle strength by calculating the Spearman's rank correlation coefficient and conducting a test of independence. Pearson's χ2 test or Fisher's exact test was used to assess whether scores for informative items were associated with the presence of periungual erythema, skin ulceration, lung or GI involvement. The Kruskal–Wallis test was used to assess whether the modified domain total scores were associated with the presence of periungual erythema, skin ulceration, lung or GI involvement. This test was also used to assess whether the scores for the six informative items were associated with PGA or CMAS in the UK cohort only. Spearman's rank correlation coefficient was used to assess correlation between modified domain total scores and PGA, modified domain total scores and CMAS, histopathologists’ VAS and PGA and CMAS. All p values reported are unadjusted for multiple testing.
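For readers unfamiliar with Spearman's rank correlation coefficient, the calculation can be sketched in plain Python as the Pearson correlation of average ranks, which handles the tied values that are common with ordinal scores such as the MRC scale. This is a generic textbook formulation and not a record of the software used in the study; the function names and example data are hypothetical, and no p value is computed.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: higher domain totals paired with lower strength scores.
domain_totals = [1, 2, 3, 4, 5]
strength_scores = [5, 4, 3, 2, 1]
rho = spearman_rho(domain_totals, strength_scores)
```

A negative coefficient, as in this invented example, corresponds to the expected direction in which more severe biopsy scores accompany weaker muscle strength.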
Fifty-five patients with JDM (38 female, 17 male) were included in this study. Table 1 shows the patient demographic and clinical data. Patients had a median age at onset of 6.42 years (IQR 4.04–9.13) and median disease duration of 3.0 months (IQR 2–6) at time of biopsy. There were no significant differences in age at biopsy, duration of disease before biopsy or clinical severity between the two groups of patients, with one exception: at the time of biopsy, the Brazil cohort had six (27%) cases with calcinosis, while the UK cohort had none (p=0.002). Proximal muscle strength as measured by manual muscle testing did not differ between the two groups. CMAS and PGA data were available only from the UK cohort.
Score tool reliability
The score tool and accompanying instructions are shown in online supplementary table S1.4 Data on score tool reliability were generated from 22 cases (11 quadriceps, 11 biceps), all new cases compared with our previous study.4 Overall scores for inflammatory and muscle fibre domains, as well as several items from each of these domains and severity assessment by histopathologists, reached high reliability for both quadriceps and biceps samples (table 2). These items were also reliable in our previous study.4 Intrarater agreement, assessed by pA, was substantial or better (>0.6) in all but one element. The median pA was ≥75% for all the informative items (see online supplementary data, table S2).
Items that can be reliably assessed by the same observer on different occasions and different observers will be useful in future studies. Therefore we limited further analysis to informative items—that is, those that were the most reliable, shown in bold in table 2. Two of these, overexpression of MHC protein on muscle fibres and infarction, had an α score of 0 indicating high agreement, but low variability since they were either always abnormal (MHC overexpression) or very rarely seen (infarction). These items were excluded from the modified score tool. Selection of an element for further analysis depended on the performance of that element rather than the importance of the pathological feature for diagnostic purposes. Representative examples of items selected for inclusion in the modified score tool, from both biceps and quadriceps biopsies, are shown in figure 1.
Association with disease severity measures
We reasoned that a modified score tool containing the most reliable items would be an appropriate instrument to investigate associations with clinical measures of disease. The most reliable items fell into two domains of the score tool: inflammatory and muscle fibre. Using these items, a modified total score range was calculated for each of these domains. Scores for these informative items, modified domain total scores and overall histopathologists’ VAS score data were analysed for all 55 cases (table 3). Comparison of the number of biopsies scoring high or low for each of these items suggested that the biceps samples showed more severe pathology than quadriceps, with differences between scores for the modified muscle fibre domain total, two individual items in the muscle fibre domain, as well as a significantly higher histopathologists’ VAS for severity in biceps compared with quadriceps (table 3).
There was evidence to suggest that measures of weakness were associated with biopsy scores for all of the informative items, the modified total domain scores and the histopathologists’ overall severity score (table 4). Specifically, a higher modified total for both domains was strongly associated with elbow flexor strength score as assessed by the MRC scale (0–5): r=−0.59, p<0.0001 and r=−0.60, p<0.0001 for the inflammatory and muscle fibre domains, respectively. Within the muscle fibre domain substantial correlations were seen between the neonatal myosin positivity and both measures of strength (r=−0.57, p≤0.001). The histopathologists’ VAS was also significantly associated with measures of weakness (table 4). No associations were found between the six informative items and periungual erythema, skin ulceration, lung or GI involvement (data not shown).
For quadriceps biopsies, where data on PGA and CMAS were also available, PGA was associated with the biopsy score for the informative items in the inflammatory domain (CD3+ endomysial, CD3+ perimysial and CD68+ endomysial) and two items in the muscle fibre domain (neonatal myosin and perifascicular regeneration/degeneration/necrosis). Both modified domain total scores were moderately correlated with PGA, with the inflammatory domain showing a stronger relationship. In all of the above the direction of the association was as expected; a higher biopsy feature score was associated with higher PGA. Both modified total muscle fibre and modified total inflammatory domains were weakly correlated with CMAS. Details of these correlations are shown in online supplementary table S3.
These data provide the first validation of a histological score tool estimating severity in JDM, much needed in this uncommon but potentially devastating autoimmune childhood disease. The tool is designed to measure histological severity using semiquantitative assessment of histological features, rather than to diagnose the condition. This study extends our earlier findings and demonstrates the reliability of the tool, with low inter- and intraobserver variability. Importantly, the most reliable items of the scoring system correlate well with measures of clinical disease activity.
Our study used cases from two different countries, where the muscle used for diagnostic biopsy differs. Although all biopsies were taken early in disease course, calcinosis was more common in cases from Brazil, perhaps reflecting disease severity in that cohort,13 and biceps biopsy samples were also scored as more severe in several items (table 3). As biceps samples were not available from UK cases, nor quadriceps tissue from Brazilian cases and no case had samples from both muscles, it was not possible to test how the site of the biopsy affects pathological change. It is also possible that there are other differences between the groups of patients, related not to biopsy site, but to differences in clinical care, ethnicity or environment. Despite these potential confounders we found that the score tool functioned equally well on biceps and quadriceps tissue, and the same score items were the most reliable for both sample sets. By incorporating biceps samples we have generated data suggesting that the score tool can be applied to a muscle other than quadriceps. This provides confidence for inclusion in future studies of centres whose biopsy site is routinely either biceps or quadriceps.
After identifying morphological features that proved reliable between different assessors and different muscles, we showed that these items were moderately or strongly correlated with muscle strength, and with the overall PGA and CMAS, where available. Thus the score tool appears to correlate well with muscle disease activity. A limitation to this analysis is that skin score data were not available on a sufficiently large number of cases to compare biopsy assessment with skin disease activity.
The adoption of agreed protocols for histological assessment of tissue has provided important progress in other diseases, especially in conditions where semiquantitative analysis of specific features has been found to correlate with clinical severity and hence influence management. For example, the Banff scoring of renal pathology is widely used to quantify allograft rejection, in trials of anti-rejection drugs and in clinical practice. This system has been refined, altered, validated and tested in several stages.5 Similarly, the BrainNet Europe consortium has tested, standardised and validated assessment of features such as α-synuclein immunoreactive structures and amyloid β, in neurodegenerative diseases.14,15
In JDM, some evidence suggests that histopathological features indicative of vasculopathy correlate with more aggressive disease,16 or that features of vasculopathy and necrosis may predict chronicity.6 However, those studies did not include biopsy analysis by a large group of observers and it is therefore difficult to assess how readily they would translate to multiple centres. One difficulty with assessment of rare diseases is ensuring adequate training for pathologists who may encounter only occasional cases. Online image databases might assist with this problem. To circumvent technical barriers and ensure that the tool is robust we have chosen to select features that were most reliable between a group of assessors and used standard, widely available, histopathological stains in preparation of sections.
A limitation of our study is the large number of hypothesis tests conducted to evaluate associations between biopsy features and measures of disease severity. This is likely to result in a high false discovery rate and therefore the reported p values must be interpreted with caution. However, the consistency of the data for associations with muscle strength warrants consideration and further investigation.17,18
Long-term prospective studies are needed to test whether the JDM muscle score tool (and which items of the tool), using tissue obtained at the time of diagnosis, correlates well with disease course, treatment response or disease complications. It will be interesting to test how specific aspects of the tool correlate with more recently reported biomarkers, including the type I interferon gene signature, serum chemokine score or plasmacytoid dendritic cells.19–22
In conclusion, we have shown for the first time that a modified JDM biopsy score tool has high inter- and intraobserver agreement and can be used on both biceps and quadriceps muscle tissue. We suggest that inclusion of this simple, semiquantitative measure into routine diagnostic investigation and clinical trials in children with JDM should facilitate a more standardised comparison of cases between studies and different centres.
Handling editor Tore K Kvien
Collaborators The members of the JDRG were as follows: Dr Liza McCann, Mr Ian Roberts, Dr Eileen Baildam, Ms Louise Hanna and Ms Olivia Lloyd (The Royal Liverpool Children's Hospital, Alder Hey, Liverpool), Dr Phil Riley and Ms Ann McGovern (Royal Manchester Children's Hospital, Manchester), Dr Clive Ryder and Mrs Janis Scott (Birmingham Children's Hospital, Birmingham), Dr Sue Wyatt, Mrs Gillian Jackson, Dr Tania Amin, Dr Mark Wood and Vanessa VanRooyen (Leeds General Infirmary, Leeds), Dr Joyce Davidson, Dr Janet Gardner-Medwin, Dr Neil Martin, Ms Sue Ferguson and Ms Liz Waxman (The Royal Hospital for Sick Children, Yorkhill, Glasgow), Dr Mark Friswell, Professor Helen Foster, Mrs Alison Swift, Dr Sharmila Jandial, Ms Vicky Stevenson, Ms Debbie Wade, Dr Ethan Sen, Dr Eve Smith and Ms Lisa Qiao (Great North Children's Hospital, Newcastle), Dr Helen Venning, Dr Rangaraj Satyapal, Mrs Elizabeth Stretton and Ms Mary Jordan (Queens Medical Centre, Nottingham), Dr Kate Armon, Mr Joe Ellis-Gage and Ms Holly Roper (Norfolk and Norwich University Hospitals), Professor Lucy Wedderburn, Dr Clarissa Pilkington, Dr N Hasson, Mrs Sue Maillard, Ms Elizabeth Halkon, Ms Virginia Brown, Ms Audrey Juggins, Dr Sally Smith, Mrs Sian Lunt, Ms Elli Enayat, Mrs Hemlata Varsani, Miss Laura Beard, Miss Laura Kassoumeri and Miss Katie Arnold (Great Ormond Street Hospital, London), Dr Kevin Murray (Princess Margaret Hospital, Perth, Western Australia), Dr John Ioannou and Ms Linda Suffield (University College London Hospital).
Contributors LRW, JLH and HV designed and conceived the study; LRW, JLH, SCC devised the analysis strategy; HV, CKL, JLH, SKNM and AMES prepared and reviewed biopsy material. All authors took part in the scoring exercises and/or consensus meeting. LRW, JLH, CKL and SCC analysed the data and wrote the manuscript; all authors reviewed the first draft and approved the final draft of the manuscript.
Funding The International Juvenile Dermatomyositis Biopsy Consensus Group, UK JDM Cohort Study and this work have been supported by generous grants from the Wellcome Trust UK (085860), Action Medical Research UK, (SP4252), The Myositis Support Group UK, Arthritis Research UK (14518) and The Henry Smith Charity. The JDM Cohort study is adopted onto the Comprehensive Research Network through the Medicines for Children Research Network (http://www.mcrn.org.uk). JLH is supported in part by The Myositis Support Group UK; LRW is supported in part by the Great Ormond Street Hospital Children's Charity.
Competing interests None.
Ethics approval Institute of Child Health UCL and Great Ormond Street Hospital.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Raw original scoring data from biopsy scoring exercises can be made available, upon request, after publication, through the corresponding author.