Use of “spydergrams” to present and interpret SF-36 health-related quality of life data across rheumatic diseases
V Strand1, B Crawford2, J Singh3,4,5, E Choy6, J S Smolen7, D Khanna8

  1. Division of Immunology/Rheumatology, Stanford University School of Medicine, Stanford, California, USA
  2. Mapi Values, Boston, Massachusetts, USA
  3. Rheumatology Section, Medicine Service, VA Medical Center, Minneapolis, Minnesota, USA
  4. Division of Rheumatology, Department of Medicine, University of Minnesota, Minneapolis, Minnesota, USA
  5. Departments of Health Sciences and Orthopaedic Surgery, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
  6. Sir Alfred Baring Garrod Clinical Trials Unit, Academic Department of Rheumatology, King’s College London, London, UK
  7. Division of Rheumatology, Department of Medicine 3, Medical University of Vienna, Vienna, Austria
  8. Division of Rheumatology, Department of Medicine, David Geffen School of Medicine, Los Angeles, California, USA

Correspondence to Dr V Strand, 306 Ramona Road, Portola Valley, CA 94028, USA; vstrand{at}


The Medical Outcomes Study Short Form-36 (SF-36) is a generic measure of health-related quality of life (HRQOL), validated and cross-culturally translated, which has been extensively utilised in rheumatology. In randomised controlled trials and observational studies, SF-36 provides rich data regarding HRQOL; but as typically portrayed, patterns of disease and treatment-associated effects can be difficult to discern. “Spydergrams” offer a simplified means to visualise complex results across all domains of SF-36 in a single figure: depicting disease and population-specific patterns of decrements in HRQOL compared with age and gender-matched normative data, as well as providing a tool for interpreting complex treatment-associated or longitudinal changes. Utilising spydergrams as a standard format to illustrate and report changes in SF-36 across different rheumatic diseases can greatly facilitate analyses and interpretations of clinical trial results, as well as providing patients an accessible means to compare baseline scores and treatment-associated improvements with normative data from individuals without arthritis. Furthermore, SF-6D utility scores based on mean changes across all eight domains of SF-36 are suggested as a quantitative means of summarising changes illustrated by spydergrams, offering a universal metric for cost-effectiveness analyses of therapeutic interventions.

The Medical Outcomes Study Short Form-36 (SF-36) was developed to measure self-reported health-related quality of life (HRQOL): 36 questions combined into eight domains reflecting different dimensions of health,1 2 grouped into composite physical and mental component summary (PCS and MCS) scores.3 SF-36 has been cross-culturally translated4 and is widely used for clinical research, health policy evaluations as well as general population surveys. A US Veterans Affairs version has also been derived and validated.5

Extensively validated in randomised controlled trials (RCT) and longitudinal observational studies, this generic instrument has demonstrated sensitivity to treatment effects and reflects the impact of various rheumatic diseases upon HRQOL, including rheumatoid arthritis (RA),6 7 systemic lupus erythematosus (SLE),8 psoriatic arthritis,9 ankylosing spondylitis,10 gout,11 systemic sclerosis (SSc),12 fibromyalgia13 and osteoarthritis.14 SF-36 scores correlate well with improvements in physical function measured by the Health Assessment Questionnaire Disability Index (HAQ-DI) in RA.15 Over the past decade, RCT with new disease-modifying antirheumatic drugs have documented significant treatment-associated changes, including “improvement in physical function and HRQOL”, which have become established labelling claims for approved therapies.15


Differences in the way individuals perceive and report HRQOL can be better interpreted by viewing baseline and change scores across domains, scored from 0 to 100, without z-transformation and normalisation as recommended in version 2 of SF-36, both of which reduce the magnitude of possible change. In contrast to the current practice of displaying SF-36 as eight-columned bar charts, “spydergrams” offer the ability to view changes more easily across all domains as a pattern recognition profile, depicting disease and population-specific “patterns” of decrements in baseline values compared with matched normative data, as well as treatment-associated or longitudinal changes. These “irregularly formed octagons or polygons” can be informative, reflecting different patterns of HRQOL and the impact of underlying disease on “multidimensional function”.
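To illustrate why norm-based transformation can compress apparent change, the following minimal sketch applies the usual norm-based form (a z-score rescaled to a mean of 50 and an SD of 10) to a raw 0 to 100 domain score. The function name and the normative mean and SD used here are hypothetical values chosen for illustration only; they are not published norms.

```python
def norm_based_score(raw, norm_mean, norm_sd):
    """Rescale a raw 0-100 domain score to a norm-based score (mean 50, SD 10)."""
    z = (raw - norm_mean) / norm_sd
    return 50 + 10 * z

# Hypothetical normative mean and SD for one domain (illustration only).
NORM_MEAN, NORM_SD = 80.0, 25.0

# Hypothetical raw scores before and after treatment.
raw_baseline, raw_follow_up = 50.0, 60.0

raw_change = raw_follow_up - raw_baseline
norm_change = (norm_based_score(raw_follow_up, NORM_MEAN, NORM_SD)
               - norm_based_score(raw_baseline, NORM_MEAN, NORM_SD))

print(f"raw change: {raw_change:.1f}, norm-based change: {norm_change:.1f}")
```

Under these assumptions, whenever the normative SD exceeds 10 points, a given raw change maps to a numerically smaller norm-based change.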

For heuristic or analytic purposes, SF-36 domain score bar graphs are presented as line graphs to aid the viewer in perceiving effects or trends. Similarly, in spydergrams categorical changes are connected (linked) to facilitate visual recognition of patterns, with the disclaimer that this is not intended to imply these are continuous scales. It is not unusual to see figures presented as line graphs that colour in the area below the line, not for significance as an “area under the curve” analysis, but to facilitate visual recognition of differences further. Spydergrams are an evolution from these standard practices, whereby the axis is simply rotated to connect with itself; bar graphs of baseline and changes in domain scores are connected with lines, and areas below the lines are shaded to facilitate pattern recognition.

To compare across disease states, the order of domains presented should be consistent, whether or not they reflect a certain sequence or priority of importance. Convention has dictated that the four physical domains are presented from left to right in a bar chart, then mental domains; thus in a “spydergram” physical function (PF) is at the top, 12 o’clock, followed clockwise by role physical (RP), bodily pain (BP) and general health perceptions (GH), and vitality (VT) at the 6 o’clock position, followed by social functioning (SF), role emotional (RE) and mental health index (MH) clockwise (fig 1A, B).7 15 Domain scores are plotted from 0 (worst) at the centre to 100 (best) at the outside; demarcations along axes of the domains present changes of 10 points, representing one to two times the minimal clinically important difference (MCID). Changes in shape and thickness of these irregular octagonal rings offer a single graphic representation to: (1) compare baseline decrements with age and gender-matched normative values; (2) assess treatment-associated or longitudinal improvements in HRQOL and (3) compare and contrast scores across protocols and disease states. As spydergrams allow visualisation of these values simultaneously, they may be presented on an individual basis with norms as a “treatment goal” and were recently utilised in a patient-assessed programme of therapy.16
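The layout described above can be sketched in a few lines of code. This is a minimal illustration, not the method used to produce the published figures: the function name is hypothetical, the scores are invented for demonstration, and only the domain order, the clockwise orientation from 12 o'clock and the 0 (centre) to 100 (rim) scale are taken from the text.

```python
import math

# Domain order from the text: PF at 12 o'clock, then clockwise, VT at 6 o'clock.
DOMAINS = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]

def spydergram_vertices(scores):
    """Return {domain: (x, y)} for one polygon; 0 is the centre, 100 the rim."""
    vertices = {}
    for i, name in enumerate(DOMAINS):
        score = scores[name]
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score {score} is outside 0-100")
        # Clockwise angle measured from 12 o'clock, 45 degrees per axis.
        theta = math.radians(i * 360 / len(DOMAINS))
        vertices[name] = (score * math.sin(theta), score * math.cos(theta))
    return vertices

# Hypothetical baseline scores, for illustration only (not trial data).
baseline = {"PF": 40, "RP": 30, "BP": 35, "GH": 50,
            "VT": 45, "SF": 55, "RE": 50, "MH": 65}

verts = spydergram_vertices(baseline)
for name in DOMAINS:
    x, y = verts[name]
    print(f"{name}: x={x:7.2f}, y={y:7.2f}")
```

Joining the returned points in domain order and closing the shape back to PF gives one polygon; repeating the call for matched norms and for follow-up scores gives the surrounding rings.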

Figure 1

Rheumatoid arthritis. Data from the PREMIER randomised controlled trial (RCT): adalimumab plus methotrexate (ADA+MTX) versus methotrexate (MTX) in methotrexate-naive subjects with disease duration of 7–9 months. (A) Baseline scores from PREMIER (inner polygon, dark purple) versus age and gender-matched norms specific to this protocol, derived from the US population (outer polygon, light purple), as a spydergram. Differences in health-related quality of life across all domains compared with matched norms in this early rheumatoid arthritis population are easily discernible, with largest decrements in physical function (PF), role physical (RP), bodily pain (BP), social functioning (SF) and role emotional (RE). (B) Treatment-associated improvements at 1 and 2 years with methotrexate (MTX) monotherapy (orange) or adalimumab plus methotrexate (ADA+MTX) (blue) as concentric rings, compared with baseline and age/gender-matched norms. As each demarcation on the domain axes represents 10 points, changes are large and meet or exceed minimal clinically important differences in all domains at both timepoints in both groups. Although similar at 1 year, incremental improvements at 2 years in successful completers are largest with combination (ADA+MTX) therapy and meet or approach US normative values in five of eight domains, compared with three of eight with methotrexate monotherapy. Short Form-6D (SF-6D) scores reflect large improvements at 1 and 2 years in both treatment groups, well exceeding the minimally important difference (MID) of 0.041. (C) Baseline (inner blue polygon) and matched norms (grey) versus treatment-associated changes in the RA Prevention of Structural Damage (RAPID) 1 trial at 1 year: placebo (PL) plus methotrexate (MTX) (orange) versus certolizumab (CZP) plus methotrexate (200 mg dark green; 400 mg light green polygons). Lowest scores were evident in RP and RE, followed by PF and BP, similar to other later disease populations. Improvements with active treatment are greatest in RP, PF and RE, with the largest decrements at baseline. SF-6D scores reflect large improvements in both certolizumab plus methotrexate groups versus placebo plus methotrexate, which does not meet the MID of 0.041. As this protocol included subjects recruited outside North America, available US normative data offer a “benchmark” comparison only.

Quantitative summary of changes in SF-36 domain scores

Although spydergrams are suggestive of an area under the curve analysis, such an interpretation would be misleading. The typical eight-column bar graphs have been linked into a single graphic for ease of interpretation (pattern recognition), but technically represent categorical data, not a continuum. Nevertheless, a summary metric that combines data from all eight domains into a single score is important for quantifying changes in these patterns.

Statistical analyses require a primary outcome measure and PCS and MCS scores are often chosen as a single metric for analysis of SF-36 within rheumatology. However, PCS and MCS scores do not fully reflect patterns of change within the domains as they are derived from z-transformed and norm-based domain scores. The model for their derivation assumes that physical and mental health constructs are independent,17 but in the Swedish SF-36 normative database, Taft et al18 illustrated significant correlations between PCS and MCS scores. Farivar et al19 showed there were fewer negative factor scoring coefficients using an oblique factor than standard orthogonal solutions. Hann and Reeves20 recently tested several models in two large databases, again observing correlations between PCS and MCS scores and that the relationship between domain and PCS and MCS scores varied significantly by medical condition, supporting the argument against the orthogonal derivation of scores. Furthermore, Ware and Kosinski21 22 have argued that: “one of the best defenses against inappropriate conclusions based on the summary measures is the thorough comparison with results based on the 8 SF-36 subscales (domains)”.

An alternative approach to summarise SF-36 domain scores quantitatively could be health state preferences, or utilities valuing health from “0” death, to “1” perfect health, an economic measure critical for evaluation of cost-effectiveness of therapeutic interventions. Ara and Brazier,23 24 Brazier et al25 and Marra et al26 developed a new calculation of SF-6D, which utilises mean scores across all eight SF-36 domains to yield a single utility measure, which has been validated in longitudinal databases and against EQ-5D within a rheumatic disease population. This single valuation may be used to represent baseline decrements and change scores portrayed by spydergrams.

The use of spydergrams to compare and contrast the impact of multiple rheumatic diseases upon HRQOL, measured by SF-36, has made their value apparent in clarifying decrements in HRQOL compared with matched normative data, as well as treatment-associated improvements.

Figure 3

Gout and osteoarthritis. (A) Data from the Vet-QOL survey from veterans with gout and comorbidities (red) versus US norms (light purple polygon), “treatment failure gout” patients enrolled in the longitudinal observational Natural History Study (NHS) (yellow polygon; age/gender-matched (A/G) norms: purple polygon) and treatment failure gout patients enrolled in two phase 3 protocols comparing pegloticase (PGL) versus placebo (dark purple). Short Form-6D (SF-6D) values confirm the impact of chronic gout on health-related quality of life (HRQOL), remarkably similar across these populations. (B) Spydergrams of baseline Short Form-36 (SF-36) scores in individuals with knee osteoarthritis in a randomised controlled trial (RCT) compared with US subjects with a mean age of 59.7 years with hypertension (HTN) and angina, demonstrating that knee osteoarthritis (OA) more profoundly affects physical HRQOL, particularly physical function (PF) and bodily pain (BP), whereas cardiovascular disease has more profound effects on general health perceptions (GH) and vitality (VT) domains. BL, baseline; VAH, Veterans Affairs Hospitals.

SF-36 spydergram in different rheumatic diseases

Figures 1–3 illustrate SF-36 data, available from published reports and abstract presentations, analysed from RCT in RA, SLE, gout, SSc and osteoarthritis. Age and gender-matched normative data specific to each population were generated based on US norms published in SF-36 manuals and updates.27 Spydergrams were configured for each study, and utility scores were generated following the approach of Ara and Brazier.24 These figures reveal different “polygonal” patterns for each rheumatic disease.

In early and later disease, RA appears to impact all domains of HRQOL, especially RP, PF and BP, but also RE (figs 1A, C).7 28 Treatment-associated changes are large in all, not just physical domains, and are greatest in those with the largest decrements at baseline. In SLE (fig 2A), baseline SF-36 scores were low across all domains compared with matched norms.29 In contrast to RA, large decrements in any one domain do not stand out, reflecting the broad impact of active disease on mental as well as physical domains. When baseline as well as treatment-associated changes are viewed as spydergrams, SF-36 data reflected clinical responses defined by the British Isles Lupus Assessment Group (BILAG) disease activity score, patient and physician global scores and decreases in prednisone dose, despite small sample sizes and loss of balanced randomisation.30 31 32

Figure 2

Systemic lupus erythematosus and systemic sclerosis. (A) In a combined analysis of two prematurely discontinued randomised controlled trials (RCT) in systemic lupus erythematosus, treatment-associated changes with active treatment are large, despite low baseline (BL) scores due to high disease activity in subjects with moderate and/or severe flares (British Isles Lupus Assessment Group (BILAG) “A” and “B” scores; inner aqua polygon). Age/gender-matched normative data (outer light purple polygon) are based on the combined sample size of all subjects with available data. Despite large decrements from matched norms, treatment-associated improvements were evident with the 360 mg/m² dose of epratuzumab (middle light blue polygon) compared with placebo (not shown), and are reflected by a large change in Short Form-6D (SF-6D) scores. (B) Baseline and treatment-associated changes at 6 months in a systemic sclerosis (SSc) trial, which failed to distinguish active (relaxin) from placebo treatment by clinical as well as Short Form-36 (SF-36) and SF-6D scores.

In SSc (fig 2B), SF-36 scores from a failed RCT with relaxin33 reveal patterns different from previous examples, with markedly lower PF and RP scores than those with SLE, including more decrements in VT, BP and GH domains, despite the heterogeneity and multi-organ involvement shared by both conditions.

In the Vet-QOL survey (fig 3A), veterans with gout reported statistically more medical and arthritic comorbidities, hospitalisations and utilisation of outpatient services than those without, and large decrements across all HRQOL domains compared with matched norms.11 In comparison with a “treatment failure gout” population enrolled in an observational Natural History Study,34 baseline scores in both groups were low and remarkably similar, as were SF-36 scores reported by treatment failure gout patients enrolled in two phase 3 protocols comparing pegloticase versus placebo.35 Values reflect decrements in HRQOL comparable to those reported by subjects with longstanding, debilitating RA, or active SLE.36 In contrast, osteoarthritis appears exclusively to impact PF, RP and BP domains, with preservation of other scores, including VT (fatigue)37 (fig 3B).


As we have attempted to demonstrate, spydergrams can provide an effective tool to perceive more quickly patterns of change in complex sets of data. They are designed to illustrate differences between baseline and normative data, and portray treatment-associated changes in the context of age and gender-matched norms specific to the population studied. Due to the inclusion of baseline values and comparisons with matched norms as lower and upper bounds, spydergrams allow visual comparisons of the thickness of “rings” exactly proportional to the degree of changes from baseline. Although the shape of the octagon or polygon would change according to the order of presentation of the domains, perceived effects would still remain proportional along each axis, reflecting the impact of disease, facilitating comparisons across conditions.38 Baseline values and treatment-associated changes, in terms of clinical meaningfulness relative to MCID, can easily be discerned by examination of changes along individual domain axes. SF-36 manuals provide data from which age and gender-matched US norms can be generated. Importantly, normative data are available for Great Britain, Denmark, Norway and Sweden, The Netherlands and Turkey, among others.4 39
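Because ring thickness along each axis equals the change from baseline in that domain, the comparison with MCID can be sketched as a simple per-domain calculation. The function name, the scores and the 5-point threshold below are assumptions for illustration; the threshold is only the lower end of the range implied by 10-point demarcations representing one to two times MCID.

```python
DOMAINS = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]

def ring_thickness(baseline, follow_up, mcid=5.0):
    """Per-domain change from baseline and whether it reaches the assumed MCID."""
    result = {}
    for name in DOMAINS:
        change = follow_up[name] - baseline[name]
        result[name] = (change, change >= mcid)
    return result

# Hypothetical scores, for illustration only (not trial data).
baseline = {"PF": 40, "RP": 30, "BP": 35, "GH": 50,
            "VT": 45, "SF": 55, "RE": 50, "MH": 65}
year_1 = {"PF": 60, "RP": 55, "BP": 55, "GH": 58,
          "VT": 57, "SF": 72, "RE": 68, "MH": 68}

rings = ring_thickness(baseline, year_1)
domains_meeting_mcid = sum(1 for _, meets in rings.values() if meets)
for name, (change, meets) in rings.items():
    print(f"{name}: change {change:+d}, meets assumed MCID: {meets}")
```

The same calculation against matched norms instead of follow-up scores would give the remaining gap to the outer polygon in each domain.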

Data presented here demonstrate that the pattern of baseline domain scores as well as longitudinal and treatment-associated changes appear to be unique to each rheumatic disease. They also show that improvements tend to occur in domains with the largest decrements at baseline compared with age and gender-matched norms. It is evident that comparing data across all eight domains offers a richness of information not available when solely evaluating PCS and MCS scores or utilising norm-transformed domain scores. Importantly, findings derived from RCT and longitudinal observational studies are similar, supporting the robustness of these observations.

Utilising the recently derived SF-6D utility score to summarise data across all eight domains in a single metric offers a numeric comparison across disease states, in addition to the shape and thickness comparisons offered by spydergrams. The use of SF-6D also facilitates economic evaluations as baseline and change scores can be transformed into utilities for the calculation of quality-adjusted life years, a universal metric in cost-effectiveness analyses. Combining spydergrams with a single metric that generates health utility measures, SF-6D, allows both quantitative and qualitative assessment of the impact of disease and its treatment upon multidimensional function.
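As a rough illustration of how utilities feed into quality-adjusted life years, the sketch below integrates utility over time with the trapezoidal rule and compares the change in utility with the SF-6D MID of 0.041 quoted in the figure 1 legend. The function name and all utility values are hypothetical, and discounting is ignored. The area here is taken over time, which is distinct from the area enclosed by a spydergram polygon.

```python
def qalys(times_years, utilities):
    """Area under the utility-versus-time curve (trapezoidal rule)."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching lists with at least two time points")
    total = 0.0
    for i in range(1, len(times_years)):
        width = times_years[i] - times_years[i - 1]
        total += width * (utilities[i] + utilities[i - 1]) / 2
    return total

SF6D_MID = 0.041  # MID quoted in the figure 1 legend

# Hypothetical SF-6D utilities at baseline, 1 year and 2 years (illustration only).
times = [0.0, 1.0, 2.0]
treated = [0.60, 0.70, 0.72]
untreated = [0.60, 0.60, 0.60]  # baseline utility carried forward

qaly_gain = qalys(times, treated) - qalys(times, untreated)
exceeds_mid = (treated[-1] - treated[0]) >= SF6D_MID

print(f"QALY gain over 2 years: {qaly_gain:.2f}, exceeds MID: {exceeds_mid}")
```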



  • Funding DK was supported by a National Institutes of Health Award (NIAMS K23 AR053858-01A1) and the Scleroderma Foundation (New Investigator Award).

  • Competing interests VS is a consultant to the following: Abbott Immunology, Alder, Allergan, Almirall, Amgen Corporation, AstraZeneca, Bexel, BiogenIdec, CanFite, Centocor, Chelsea, Crescendo, Cypress Biosciences, Eurodiagnostica, Fibrogen, Forest Laboratories, Genentech, Human Genome Sciences, Idera, Incyte, Jazz Pharmaceuticals, Lexicon Genetics, Logical Therapeutics, Lux Biosciences, Medimmune, Merck Serono, Novartis Pharmaceuticals, NovoNordisk, Nuon, Ono Pharmaceuticals, Pfizer, Procter and Gamble, Rigel, Roche, Sanofi-Aventis, Savient, Schering Plough, SKK, UCB, Wyeth and Xdx. The other authors declare no conflicts of interest.

  • Provenance and Peer review Not commissioned; externally peer reviewed.