Methods to Explain the Clinical Significance of Health Status Measures

doi:10.4065/77.4.371

Mayo Clinic Proceedings

Volume 77, Issue 4, April 2002, Pages 371-383

https://doi.org/10.4065/77.4.371

One can classify ways to establish the interpretability of quality-of-life measures as anchor based or distribution based. Anchor-based measures require an independent standard or anchor that is itself interpretable and at least moderately correlated with the instrument being explored. One can further classify anchor-based approaches into population-focused and individual-focused measures. Population-focused approaches are analogous to construct validation and rely on multiple anchors that frame an individual's response in terms of the entire population (eg, a group of patients with a score of 40 has a mortality of 20%). Anchors for population-based approaches include status on a single item, diagnosis, symptoms, disease severity, and response to treatment. Individual-focused approaches are analogous to criterion validation. These methods, which rely on a single anchor and establish a minimum important difference in change in score, require 2 steps. The first step establishes the smallest change in score that patients consider, on average, to be important (the minimum important difference). The second step estimates the proportion of patients who have achieved that minimum important difference. Anchors for the individual-focused approach include global ratings of change within patients and global ratings of differences between patients. Distribution-based methods rely on expressing an effect in terms of the underlying distribution of results. Investigators may express effects in terms of between-person standard deviation units, within-person standard deviation units, and the standard error of measurement. No single approach to interpretability is perfect. Use of multiple strategies is likely to enhance the interpretability of any particular instrument.

Section snippets

THE PROBLEM OF MEANINGFULNESS

Those responsible for making treatment recommendations, such as clinicians for individual patients or experts and health policymakers for groups of patients, must weigh the expected benefits of a treatment against its adverse effects, toxic effects, inconvenience, and cost. This process requires a reasonably accurate understanding of the benefits and risks of alternative treatments. Acquiring this understanding presents a significant problem even for dichotomous clinical outcomes, such as

THE TARGET AUDIENCES FOR CLINICAL SIGNIFICANCE

The intended audience for our discussion on clinical significance includes patients, clinicians, and policymakers. Increasing awareness that value judgments are implicit in every clinical management decision⁷ has focused more attention on the role of the patient in the decision-making process.⁸,⁹ For patients who desire major involvement in decision making, one approach involves presenting patients with the options and eliciting their choice. Using this approach requires that patients

THE PROBLEM OF MEANINGFULNESS IN QOL MEASURES

We have noted a problem in presenting results of studies using binary outcomes: the different meaning conveyed by relative and absolute risk reduction, NNT, and life-years gained. The complexity increases with the realization that no binary outcome is truly unambiguous. Deaths can be painful or painless, strokes can be mild or severe, and myocardial infarctions can be large and complicated or small and uncomplicated. In fact, severity of stroke and myocardial infarction are continuous in

INFERENCES CONCERNING INDIVIDUALS AND INFERENCES CONCERNING GROUPS

Observers frequently distinguish between the significance of a particular change in score in an individual and a change of the same magnitude in the mean score of a group of patients.¹² A change in mean blood pressure in a population of a magnitude that would be trivial in an individual (eg, 2 mm Hg) may translate into a large number of reduced strokes in a population. Indeed, a mean change of 2 mm Hg in a population would reduce the number of strokes substantially. There are 2 reasons for the

ANCHOR-BASED METHODS

Investigators have used 2 easily separable strategies to achieve an understanding of the meaning of scores on a given instrument.¹² The first relies on anchor-based methods and examines the relationship between scores on the instrument whose interpretation is under question (the target instrument) and some independent measure (an anchor). For instance, we might examine the relationship between scores on a QOL measure for heart failure and the New York Heart Association (NYHA) functional

APPROACHES FOR IDENTIFYING CLINICAL SIGNIFICANCE

We have not conducted a systematic search for approaches to clinical significance. Thus, our examples are neither comprehensive nor representative. Rather, we have attempted to provide a broad sample of approaches investigators have used, focusing on those we believe are both well done and instructive. However, we have surveyed the entire group of participants in this conference to ensure that we have not omitted any salient methods.

Similarly, we have not tried to be systematic in our critique.

ANCHOR-BASED METHODS OF ESTABLISHING INTERPRETABILITY: REQUIREMENTS

Whether relying on a single anchor or multiple anchors, anchor-based methods have 2 requirements. First, the anchor must be interpretable. It would be of little use to tell clinicians that a 2-point change per item in the fatigue scale (range, 1-7) in the Chronic Heart Failure Questionnaire (CHQ)¹⁶ is equivalent to a 30-point change in the Medical Outcome Study physical function scale if they had no idea how to interpret the Medical Outcome Study instrument. On the other hand, if they use the

CLINICIANS’ TRADITIONAL APPROACHES AND INTERPRETABILITY

Experienced clinicians show little hesitation in acting on the clinical measures, yielding continuous scores, by which they judge their patients’ status. Hemoglobin concentration, platelet count, creatinine level, and treadmill exercise capacity constitute a few examples. How does the process of establishing interpretability occur? How, for instance, do chest physicians decide that a change in forced expiratory volume in 1 second (FEV₁) of 15% approximates a minimum important change?

Chest

MULTIPLE ANCHORS

Ware and Keller,¹⁸ with the 36-Item Short-Form Health Survey (SF-36), have accomplished extensive and comprehensive work using multiple anchors, and we rely to a large extent on their studies to provide examples of this approach. In our discussion, we deal initially with anchors that involve concurrent measurement of the target and anchor and subsequently discuss anchors that involve monitoring patient outcome over time (health care utilization, job loss, and death).

SINGLE-ANCHOR METHODS

The Minimum Important Difference

Single-anchor methods generally aim to establish differences in score on the target instrument that constitute trivial, small but important, moderate, and large changes in QOL. However, they generally put great emphasis on a threshold that demarcates trivial from small but important differences: the minimum important difference (MID). One popular definition of the MID is “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in

ANALYTIC STRATEGIES FOR SINGLE-ANCHOR APPROACHES

Having chosen a single-anchor approach, investigators may use alternative analytic strategies that will lead to different estimates of the MID.⁵⁴ The simplest and so far most widely used approach is to specify a result or range of anchor instrument results that corresponds to the MID and calculate the target score corresponding to that value. For example, investigators have examined the mean change in QOL score corresponding to global ratings of change that included “hardly any better,” “a
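The simplest strategy described above, taking the mean change in the target score among patients whose anchor result falls in the range chosen to represent the MID, can be sketched in Python. This is a minimal illustration and not the authors' procedure: the function name `anchor_based_mid`, the numeric coding of the global rating of change (0 for no change, 1 for a small change, 3 for a large change), and every data value are hypothetical.

```python
from statistics import mean


def anchor_based_mid(score_changes, anchor_ratings, mid_ratings):
    """Mean target-score change among patients whose anchor rating is in mid_ratings."""
    if len(score_changes) != len(anchor_ratings):
        raise ValueError("each patient needs one score change and one anchor rating")
    selected = [
        change
        for change, rating in zip(score_changes, anchor_ratings)
        if rating in mid_ratings
    ]
    if not selected:
        raise ValueError("no patient falls in the anchor range chosen for the MID")
    return mean(selected)


# Hypothetical data: change in target score per patient and the
# patient's global rating of change (coding is an assumption, see lead-in).
score_changes = [0.1, 0.4, 0.6, 1.2, 0.5, 0.0]
anchor_ratings = [0, 1, 1, 3, 1, 0]

# Treat rating 1 (a small change) as the anchor range that defines the MID.
mid_estimate = anchor_based_mid(score_changes, anchor_ratings, mid_ratings={1})
print(f"Estimated MID: {mid_estimate:.2f}")
```

Choosing a different anchor range (for example, widening `mid_ratings`) changes the estimate, which is one way alternative analytic strategies can lead to different MIDs.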

SINGLE-ANCHOR APPROACHES AND CLINICAL TRIALS INTERPRETATION

Once one has established the MID for a patient, one must decide how to use this information in clinical trials. A naive approach would assume that if the mean difference between treatment and control was less than the MID, the treatment effect would be trivial, and if greater than the MID, the treatment effect would be important. This ignores the distribution of the results. For example, assume a MID of 0.5. A mean difference of 0.25 (trivial in a naive interpretation) could be achieved if 25%
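The point that a mean difference hides the distribution of individual results can be illustrated with a short Python sketch. It uses the MID of 0.5 assumed in the text, but the patient-level changes are invented for illustration and are not the pattern the authors go on to describe; the helper name `proportion_achieving_mid` is likewise hypothetical.

```python
from statistics import mean


def proportion_achieving_mid(changes, mid):
    """Share of patients whose change in score is at least the MID."""
    if not changes:
        raise ValueError("changes must not be empty")
    return sum(1 for change in changes if change >= mid) / len(changes)


MID = 0.5  # value assumed in the text

# Hypothetical per-patient changes in score (not data from the article).
treatment_changes = [0.5, 0.5, 0.0, 0.0]
control_changes = [0.0, 0.0, 0.0, 0.0]

mean_difference = mean(treatment_changes) - mean(control_changes)
treated_share = proportion_achieving_mid(treatment_changes, MID)
control_share = proportion_achieving_mid(control_changes, MID)

print(f"Mean difference: {mean_difference:.2f}")
print(f"Proportion reaching MID, treatment: {treated_share:.0%}")
print(f"Proportion reaching MID, control:   {control_share:.0%}")
```

In this invented example the mean difference is half the MID, yet half of the treated patients reach the MID, so the naive reading of the mean as trivial would miss an important effect in a subgroup.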

BETWEEN-PERSON STANDARD DEVIATION UNITS

The most widely used distribution-based method to date is the between-person standard deviation. The group from which this is drawn is typically the control group of a particular study at baseline or the pooled standard deviation of the treatment and control groups at baseline. As we have mentioned herein, an alternative is to choose the standard deviation for a sample of the general population or some particular population of special interest, rather than the population of the particular
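Expressing an effect in between-person standard deviation units amounts to dividing the difference in means by a baseline standard deviation. The Python sketch below uses the pooled baseline standard deviation of the treatment and control groups, one of the two choices mentioned above. The function names and all scores are hypothetical, and the pooling formula is the usual degrees-of-freedom-weighted one, which the text does not specify.

```python
import math
from statistics import variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two samples, weighted by degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    weighted = (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    return math.sqrt(weighted / (n_a + n_b - 2))


def effect_in_sd_units(mean_difference, baseline_sd):
    """Mean difference expressed in between-person standard deviation units."""
    if baseline_sd <= 0:
        raise ValueError("baseline_sd must be positive")
    return mean_difference / baseline_sd


# Hypothetical baseline scores for the two arms of a trial.
baseline_control = [40, 50, 60]
baseline_treatment = [45, 55, 65]

sd = pooled_sd(baseline_control, baseline_treatment)
# Hypothetical difference in mean score between arms at follow-up.
effect = effect_in_sd_units(mean_difference=5.0, baseline_sd=sd)
print(f"Pooled baseline SD: {sd:.1f}; effect: {effect:.2f} SD units")
```

Substituting the standard deviation of a general-population sample for `sd` gives the alternative scaling the text mentions.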

STANDARD ERROR OF MEASUREMENT

The standard error of measurement is defined as the variability between an individual's observed score and the true score and is computed as the baseline standard deviation multiplied by the square root of 1 minus the reliability of the QOL measure. Theoretically, a QOL measure's standard error of measurement is sample independent, whereas its component statistics, the standard deviation and the reliability estimate, are sample dependent and vary around the standard error of measurement.⁶⁴ For
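The computation stated above, the baseline standard deviation multiplied by the square root of 1 minus the reliability, can be written as a small Python function. The function name and the input values are hypothetical and serve only to show the arithmetic.

```python
import math


def standard_error_of_measurement(baseline_sd, reliability):
    """SEM = baseline SD * sqrt(1 - reliability)."""
    if baseline_sd < 0:
        raise ValueError("baseline_sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return baseline_sd * math.sqrt(1.0 - reliability)


# Hypothetical instrument: baseline SD of 10 points, reliability of 0.84.
sem = standard_error_of_measurement(baseline_sd=10.0, reliability=0.84)
print(f"SEM: {sem:.2f} points")
```

A more reliable instrument with the same baseline standard deviation yields a smaller SEM, and a perfectly reliable one yields an SEM of zero.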

RECONCILIATION OF ANCHOR-BASED AND DISTRIBUTION-BASED METHODS

Investigators are adducing increasing evidence concerning the relationship between statistical measures of patient variability and anchor-based estimates of small, moderate, and large differences in QOL. To the extent that standard deviations across QOL studies using the same instruments are consistent, one will see a consistent relationship between the standard deviation and the MID. If this relationship were also consistent across instruments, this area of investigation would become much

CONCLUSIONS

This review reflects both the considerable work that has been done to establish the interpretability of QOL measures in the last 15 years and the enormous amount left to do. The field remains controversial, and there are many alternative approaches, each with its advocates. The following conclusions, however, may be relatively safe. First, distribution-based methods will not suffice on their own but will be useful to the extent that they bear a consistent relationship with anchor-based methods.

REFERENCES (67)

CD Naylor et al. Can there be a more patient-centered approach to determining clinically important effect sizes for randomized treatment trials? J Clin Epidemiol (1994)
M Bobbio et al. Completeness of reporting trial results: effect on physicians' willingness to prescribe. Lancet (1994)
A Fletcher et al. Quality of life on angina therapy: a randomised controlled trial of transdermal glyceryl trinitrate against placebo. Lancet (1988)
R Jaeschke et al. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials (1989)
EF Juniper et al. Determining a minimal important change in a disease-specific Quality of Life Questionnaire. J Clin Epidemiol (1994)
EF Juniper et al. Interpretation of rhinoconjunctivitis quality of life questionnaire data. J Allergy Clin Immunol (1996)
GR Norman et al. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol (1997)
DA Redelmeier et al. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol (1996)
DA Redelmeier et al. On the debate over methods for estimating the clinically important difference. J Clin Epidemiol (1996)
RA Deyo et al. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis (1986)
MM Ward et al. Identification of clinically important changes in health status using receiver operating characteristic curves. J Clin Epidemiol (2000)
RS Goldstein et al. Economic analysis of respiratory rehabilitation. Chest (1997)
KW Wyrwich et al. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol (1999)
R Hebert et al. Setting the minimal metrically detectable change on disability rating scales. Arch Phys Med Rehabil (1997)
AR Feinstein. Indexes of contrast and quantitative significance for comparisons of two groups. Stat Med (1999)
CD Naylor et al. Measured enthusiasm: does the method of reporting trial results alter perceptions of therapeutic effectiveness? Ann Intern Med (1992)
JE Hux et al. Prescribing propensity: influence of life-expectancy gains and drug costs. J Gen Intern Med (1994)
DA Redelmeier et al. Discrepancy between medical decisions for individual patients and for groups. N Engl J Med (1990)
GH Guyatt et al. Users' guides to the medical literature, XVI: how to use a treatment recommendation. JAMA (1999)
AM O'Connor et al. Decision aids for patients facing health treatment or screening decisions: systematic review. BMJ (1999)
G Guyatt et al. Moving from evidence to action: incorporating patient values.
GH Guyatt et al. Measuring health-related quality of life. Ann Intern Med (1993)
MA Testa. Interpretation of quality-of-life outcomes: issues that affect magnitude and meaning. Med Care (2000)
E Lydick et al. Interpretation of quality of life changes. Qual Life Res (1993)
GH Guyatt et al. Interpreting treatment effects in randomised trials. BMJ (1998)
R De Haan et al. The clinical meaning of Rankin “handicap” grades after stroke. Stroke (1995)
JM Wardlaw et al. Thrombolysis for acute ischaemic stroke. Cochrane Database Syst Rev (2000)
GH Guyatt et al. Development and testing of a new measure of health status for clinical trials in heart failure. J Gen Intern Med (1989)
E Lydick. Approaches to the interpretation of quality-of-life scales. Med Care (2000)
JE Ware et al. Interpreting general health measures.
MS Thompson et al. The cost effectiveness of auranofin: results of a randomized clinical trial. J Rheumatol (1988)
WB Brooks et al. The impact of psychologic factors on measurement of functional status: assessment of the sickness impact profile. Med Care (1990)
RA Deyo et al. Measuring functional outcomes in chronic disease: a comparison of traditional scales and a self-administered health status questionnaire in patients with rheumatoid arthritis. Med Care (1983)


A complete list of other Clinical Significance Consensus Meeting Group contributors to this article appears at the end of the article.

This project was supported in part by Public Health Service grants CA25224, CA37404, CA15083, CA35269, CA35113, CA35272, CA52352, CA35103, CA37417, CA63849, CA35448, CA35101, CA35195, CA35415, and CA35103.

Individual reprints of this article are not available. The entire Symposium on the Clinical Significance of Quality-of-Life Measures in Cancer Patients will be available for purchase as a bound booklet from the Proceedings Editorial Office at a later date.


Symposium on Quality of Life in Cancer Patients
