Measurements, composite scores and the art of ‘cutting-off’
Pedro M Machado
Centre for Rheumatology Research & MRC Centre for Neuromuscular Diseases, University College London, London, UK
Correspondence to Dr Pedro M Machado, Centre for Rheumatology Research & MRC Centre for Neuromuscular Diseases, Box 102, 8-11 Queen Square, London WC1N 3BG, UK; p.machado{at}ucl.ac.uk, pedrommcmachado{at}gmail.com

Measuring is an essential part of medicine, in both research and clinical practice. Clinical reasoning itself is a mental exercise based on a succession of tests and assessments (signs and symptoms and their quantification, impact on the patient, findings from the physical examination, and results from laboratory and imaging investigations, among others) that lead to a diagnosis and to therapeutic decisions.1 Lord Kelvin’s (1824–1907) words from his 1883 lecture on ‘Electrical Units of Measurement’ also apply to modern medicine: ‘When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be’.

The use of general or disease-specific measuring tools has become routine in rheumatology. In addition to being critical for drug development, these tools are valuable for patient management. They range from simple visual analogue scales or numerical rating scales, for example, for pain or global assessment of disease activity, to patient-reported outcomes based on multiple questions, such as the Health Assessment Questionnaire (HAQ) or the Bath Ankylosing Spondylitis Functional Index, to composite indices such as the 28-joint count Disease Activity Score (DAS28) and the Ankylosing Spondylitis Disease Activity Score, which combine patient-reported and/or physician-reported variables with the laboratory measurement of acute phase reactants (C reactive protein (CRP) or erythrocyte sedimentation rate). Composite measures can be particularly helpful because they are more likely to give complete and reliable information about a given health outcome, such as disease activity.

Cut-offs increase the interpretability of a measurement or score, making it more meaningful and more likely to be applied in both clinical practice and research. Ultimately, if the same score is used across different settings, data can be compared and pooled, which is useful, for example, for clinical benchmarking and for meta-analyses. Furthermore, early diagnosis, early determination of therapeutic response and monitoring of therapeutic response have become increasingly important because highly effective therapies are now available for several inflammatory rheumatic diseases, and these therapies may be even more effective if used in early disease stages. Treat-to-target strategies have been proposed in diseases such as rheumatoid arthritis and spondyloarthritis, including ankylosing spondylitis and psoriatic arthritis (PsA),2–4 and composite scores can facilitate these strategies. The recently published Tight Control of Psoriatic Arthritis (TICOPA) trial confirmed that a treat-to-target approach can improve clinical outcomes for patients with early PsA.5

PsA is a heterogeneous disease, characterised by the involvement of peripheral joints, entheses, axial joints, skin and nails. The degree of involvement of these different domains varies and changes over time in individual patients. In PsA, several measures of disease activity states and therapeutic response have been proposed, including composite measures (table 1).6–10 Some of these measures have been adapted from other diseases (Disease Activity Score (DAS), DAS28, European League Against Rheumatism (EULAR) response criteria and American College of Rheumatology (ACR) response criteria, all adapted from rheumatoid arthritis (RA), and the Disease Activity in PsA (DAPSA) score, adapted from reactive arthritis), while others have been specifically developed for use in PsA (Psoriatic Arthritis Response Criteria, Psoriatic Arthritis Disease Activity Score, Psoriatic Arthritis Joint Activity Index and Composite Psoriatic Disease Activity Index (CPDAI), modified versions of the CPDAI, Arithmetic Mean of Desirability Function, GRAppa Composite Exercise Index and Minimal Disease Activity).

Table 1

Summary of composite scores proposed to assess disease activity in psoriatic arthritis and variables/domains included in the composite scores

Schoels and colleagues11 present the results of a study developing cut-offs for disease activity states and response criteria according to the DAPSA and the clinical DAPSA (cDAPSA). The investigators retrieved 30 patient profiles from an observational data set and conducted an email-based survey of 44 international rheumatology experts, asking them to classify the disease activity state of each patient (remission (REM), low disease activity (LDA), moderate disease activity (MDA) or high disease activity (HDA)) based on the 66/68 joint counts, patient global assessment (visual analogue scale; VAS), patient pain score (VAS) and CRP value. The distributions of DAPSA/cDAPSA scores within each state were analysed and cut-offs were determined from the respective 25th and 75th percentiles of DAPSA/cDAPSA; where necessary, numerical differences between the 75th percentile of the lower state and the 25th percentile of the adjacent higher state were reconciled by taking their mean. The following DAPSA cut-offs were proposed: for REM, DAPSA≤4; for LDA, 4<DAPSA≤14; for MDA, 14<DAPSA≤28; for HDA, DAPSA>28. The derived cDAPSA cut-offs were similar, and the investigators arbitrarily proposed lowering the cDAPSA cut-offs for MDA and HDA by one point relative to the DAPSA, to account for the omission of CRP, whose levels are putatively higher in patients with these levels of disease activity: for REM, cDAPSA≤4; for LDA, 4<cDAPSA≤13; for MDA, 13<cDAPSA≤27; for HDA, cDAPSA>27.
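The percentile procedure described above can be sketched in Python as follows. This is a minimal illustration, not the investigators' code: the per-state score lists are invented, and the use of linear interpolation between order statistics (`method="inclusive"` in `statistics.quantiles`) is an assumption, since the interpolation rule is not stated here. Because the mean of two equal percentiles is that same value, the reconciliation step can be applied to every boundary.

```python
import statistics

def quartiles(scores):
    """25th and 75th percentiles using linear interpolation between order
    statistics (method="inclusive"); the interpolation rule is an assumption."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q1, q3

def derive_cutoffs(scores_by_state, order):
    """For each pair of adjacent states (ordered from lowest to highest
    activity), reconcile the 75th percentile of the lower state and the
    25th percentile of the higher state by taking their mean. If the two
    percentiles already coincide, the mean simply returns that value."""
    cutoffs = {}
    for lower, higher in zip(order, order[1:]):
        _, p75_lower = quartiles(scores_by_state[lower])
        p25_higher, _ = quartiles(scores_by_state[higher])
        cutoffs[(lower, higher)] = (p75_lower + p25_higher) / 2
    return cutoffs

# Hypothetical DAPSA scores of profiles adjudicated to each state
example = {
    "REM": [1, 2, 3, 4, 5],
    "LDA": [3, 6, 9, 12, 15],
    "MDA": [10, 14, 18, 22, 26],
    "HDA": [20, 30, 40, 50, 60],
}
print(derive_cutoffs(example, ["REM", "LDA", "MDA", "HDA"]))
# {('REM', 'LDA'): 5.0, ('LDA', 'MDA'): 13.0, ('MDA', 'HDA'): 26.0}
```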

To define minor, moderate and major treatment response, the investigators identified the DAPSA percentage improvement at which agreement with ACR20/50/70 response, measured by Cohen's κ, peaked in three randomised controlled trials. After summarising all the analyses, the following DAPSA response criteria were proposed: minor response, 50% improvement in DAPSA; moderate response, 75% improvement; major response, 85% improvement. Discriminative validity was assessed using χ² statistics.
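A rough sketch of the response definitions and of a κ-based threshold search follows. It assumes that the percentage change is the reduction from baseline DAPSA and that each threshold is inclusive (≥50%, ≥75%, ≥85%). The candidate thresholds and the trial data in the usage lines are hypothetical; the actual analysis was run separately against ACR20, ACR50 and ACR70 in three trials.

```python
def percent_improvement(baseline, follow_up):
    """Relative reduction in DAPSA from baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline DAPSA must be positive")
    return 100.0 * (baseline - follow_up) / baseline

def dapsa_response(baseline, follow_up):
    """Proposed DAPSA response levels; thresholds read as 'at least'."""
    change = percent_improvement(baseline, follow_up)
    if change >= 85:
        return "major"
    if change >= 75:
        return "moderate"
    if change >= 50:
        return "minor"
    return "none"

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of booleans."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        # Both raters put everyone in the same single category:
        # kappa is undefined, so treat it as no agreement beyond chance.
        return 0.0
    return (observed - expected) / (1 - expected)

def best_threshold(improvements, acr_responders, candidates):
    """Candidate % improvement threshold whose dichotomisation of DAPSA
    change agrees best (highest kappa) with the ACR response status."""
    return max(
        candidates,
        key=lambda t: cohen_kappa([x >= t for x in improvements], acr_responders),
    )

improvements = [90, 80, 60, 40, 20, 10]          # hypothetical % DAPSA improvements
acr50 = [True, True, True, False, False, False]  # hypothetical ACR50 status
print(best_threshold(improvements, acr50, [30, 50, 70]))  # 50
print(dapsa_response(40, 10))                              # moderate
```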

This study represents an important contribution to the field of PsA and it may stimulate the use of the DAPSA/cDAPSA among practising rheumatologists and PsA researchers. Of note, the DAPSA/cDAPSA has a simple arithmetic formula and can be mentally calculated in clinic, which offers an advantage compared with other more complex composite measures. This study also defined new potential (articular) treatment targets in PsA, namely REM (a more ambitious target) and LDA (a less ambitious target) according to the DAPSA. The cut-off for MDA could potentially be used as a criterion for inclusion of PsA patients in clinical trials of biological therapies, but this use still requires further validation.
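The simple arithmetic mentioned above can be shown in a few lines. The formula is not spelled out in this article; the sketch uses the published DAPSA definition (68 tender joint count + 66 swollen joint count + patient global assessment on a 0–10 scale + patient pain on a 0–10 scale + CRP in mg/dL), with the cDAPSA obtained by omitting CRP, and classifies the result with the cut-offs proposed by Schoels and colleagues. CRP reported in mg/L would first need dividing by 10. The function names and the example patient are hypothetical.

```python
def dapsa(tjc68, sjc66, patient_global, patient_pain, crp_mg_dl=None):
    """DAPSA = TJC68 + SJC66 + patient global (0-10) + pain (0-10) + CRP (mg/dL).
    Leaving crp_mg_dl as None gives the clinical DAPSA (cDAPSA)."""
    score = tjc68 + sjc66 + patient_global + patient_pain
    if crp_mg_dl is not None:
        score += crp_mg_dl
    return score

# Inclusive upper bounds of REM, LDA and MDA proposed by Schoels and colleagues
DAPSA_CUTOFFS = (4, 14, 28)
CDAPSA_CUTOFFS = (4, 13, 27)

def activity_state(score, clinical=False):
    """Map a DAPSA (or, with clinical=True, a cDAPSA) score to a state."""
    rem, lda, mda = CDAPSA_CUTOFFS if clinical else DAPSA_CUTOFFS
    if score <= rem:
        return "REM"
    if score <= lda:
        return "LDA"
    if score <= mda:
        return "MDA"
    return "HDA"

score = dapsa(3, 2, 4.0, 3.5, crp_mg_dl=1.2)  # about 13.7
print(activity_state(score))                  # LDA
cscore = dapsa(3, 2, 4.0, 3.5)                # cDAPSA = 12.5
print(activity_state(cscore, clinical=True))  # LDA
# The one-point shift matters near the boundaries:
print(activity_state(13.5), activity_state(13.5, clinical=True))  # LDA MDA
```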

Some methodological aspects of this study merit debate. Conceptually, allowing the same patient to be included in more than one disease activity state according to the external construct (the expert's opinion) is questionable, because it artificially inflates the number of patients in each disease activity state. However, it is somewhat reassuring that a sensitivity analysis, in which a patient profile was assigned to a particular disease activity state only if a majority (>50%) of experts adjudicated that state, yielded similar results. It would also have been of interest to incorporate the patients’ perspective in the determination of the cut-offs, by using patient-reported disease activity states as the external construct, and to assess how these compared with the physicians’ perspective.
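The difference between the two assignment rules can be made concrete with a short sketch. For illustration only, it assumes that under the original approach a profile contributes its score once to every state chosen by at least one expert; the exact weighting used in the study is not described in this editorial. The majority rule mirrors the sensitivity analysis (>50% of experts). The profiles are invented.

```python
from collections import Counter, defaultdict

def state_distributions(profiles, majority_only=False):
    """profiles: list of (score, ratings), where ratings lists the state
    chosen by each expert. Returns {state: [scores assigned to it]}."""
    dist = defaultdict(list)
    for score, ratings in profiles:
        counts = Counter(ratings)
        if majority_only:
            # Sensitivity analysis: keep the profile only if >50% agree
            state, n = counts.most_common(1)[0]
            if n > len(ratings) / 2:
                dist[state].append(score)
        else:
            # Assumed original rule: count it in every state any expert chose
            for state in counts:
                dist[state].append(score)
    return dict(dist)

profiles = [
    (3, ["REM", "REM", "LDA"]),
    (10, ["LDA", "MDA"]),
    (30, ["HDA", "HDA", "HDA"]),
]
print(state_distributions(profiles))
# {'REM': [3], 'LDA': [3, 10], 'MDA': [10], 'HDA': [30]}
print(state_distributions(profiles, majority_only=True))
# {'REM': [3], 'HDA': [30]}
```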

Regarding the methodology used for cut-off determination of disease activity states (25th and 75th percentiles of the distribution of DAPSA/cDAPSA scores), this is a valid approach, but not the only one. Other methods commonly used for establishing ‘optimal’ cut-points are the point on the receiver operating characteristic (ROC) curve closest to (0, 1), ie, the point closest to perfect differentiation, and the Youden index, ie, the point farthest from no differentiation (the chance diagonal) (figure 1).12 These two ROC curve points do not necessarily agree, and the Youden index is generally favoured in the literature. Furthermore, clinical reasoning should also be part of the ‘art of cutting-off’: depending on the clinical scenario and the stringency required of the cut-off being established, alternative criteria such as ROC curve-based cut-offs at 90% specificity or sensitivity, or score distribution cut-offs at the 10th or 90th percentile, may be more appropriate.13,14
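The two classical ROC cut-points, and a fixed-specificity alternative, can be computed directly from paired scores and reference labels. In this minimal sketch a score at or above the cut-off is called positive and only observed scores are tried as cut-offs; both choices are assumptions, and ties are broken in favour of the lowest candidate. The invented data were chosen so that the Youden index and the point closest to (0, 1) select different cut-offs.

```python
def roc_points(scores, labels):
    """(cut-off, sensitivity, specificity) for each observed score,
    calling a score >= cut-off positive."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < c)
        points.append((c, tp / positives, tn / negatives))
    return points

def youden_cutoff(scores, labels):
    """Maximise sensitivity - (1 - specificity)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)[0]

def closest_to_corner_cutoff(scores, labels):
    """Minimise (1 - sensitivity)^2 + (1 - specificity)^2."""
    return min(roc_points(scores, labels),
               key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)[0]

def cutoff_at_specificity(scores, labels, target=0.90):
    """Lowest cut-off reaching the target specificity, which keeps
    sensitivity as high as possible among the eligible cut-offs."""
    eligible = [p for p in roc_points(scores, labels) if p[2] >= target]
    return min(eligible, key=lambda p: p[0])[0] if eligible else None

scores = [9, 8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical scores
labels = [True, True, False, True, False, True, False, False, False]
print(youden_cutoff(scores, labels))                # 4
print(closest_to_corner_cutoff(scores, labels))     # 6
print(cutoff_at_specificity(scores, labels, 0.90))  # 8
```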

Figure 1

Simulation of a receiver operating characteristic (ROC) curve and classical ‘optimal’ cut-points (adapted from ref 10). The vertical lines and reference arcs identify the Youden index (solid lines) and the point closest to (0, 1) (dashed lines) and their corresponding cut-points (red dot and green dot, respectively); the cut-point for the Youden index corresponds to the point on the ROC curve with the maximum value of (sensitivity − (1 − specificity)); the cut-point for the point closest to (0, 1) corresponds to the point on the ROC curve with the minimum value of ((1 − sensitivity)² + (1 − specificity)²).

Importantly, cut-off selection should always be an informed decision that takes into account the clinical situation (eg, potential treatment implications of the cut-off) and the relative consequences of false-negative and false-positive test results (which may differ between clinical decision-making situations). A further step in cut-off determination, rarely taken because of its complexity, is the optimisation of cut-offs with regard to the cost implications of false-positive and false-negative results.12,15
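Cost-based optimisation can be illustrated by weighting each false-positive and false-negative classification and choosing the cut-off with the lowest total cost. The relative costs below are hypothetical placeholders rather than estimates from any study, and the decision rule (score at or above the cut-off is positive) is an assumption of the sketch; the point is that penalising false positives more heavily pushes the cut-off upwards.

```python
def min_cost_cutoff(scores, labels, cost_fp, cost_fn):
    """Observed score minimising cost_fp * FP + cost_fn * FN, calling a
    score >= cut-off positive; the lowest cut-off wins ties."""
    best_cut, best_cost = None, float("inf")
    for c in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= c)
        fn = sum(1 for s, y in zip(scores, labels) if y and s < c)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_cut, best_cost = c, cost
    return best_cut

scores = [9, 8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical scores
labels = [True, True, False, True, False, True, False, False, False]
# Hypothetical relative costs: false positives five times as costly
print(min_cost_cutoff(scores, labels, cost_fp=5, cost_fn=1))  # 8
# False negatives five times as costly
print(min_cost_cutoff(scores, labels, cost_fp=1, cost_fn=5))  # 4
```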

On the topic of cut-off determination, it should also be borne in mind that cut-off values for diagnostic tests can be derived using several methods, of which the Gaussian (normal) distribution method is the most commonly used. With this method, the cut-off value is defined as the mean plus two SD of the negative reference sample. The rationale for the two SD approach is to establish a cut-off value with a specificity of approximately 97.5% (strictly, 1.96 SD corresponds to 97.5%), although this method may not be adequate if the test values follow a skewed or multimodal distribution.15
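The Gaussian rule and its sensitivity to skewed data can be checked with Python's standard `statistics` module. The sketch uses the sample SD and an invented, strongly skewed reference sample; with it, the mean plus two SD rule yields an empirical specificity of 90% rather than the intended value. The percentile function is included only as a distribution-free alternative that does not assume normality; with so few observations it cannot do better here.

```python
import statistics
from statistics import NormalDist

def gaussian_cutoff(negatives, k=2.0):
    """Cut-off = mean + k * sample SD of the negative reference sample."""
    return statistics.mean(negatives) + k * statistics.stdev(negatives)

def percentile_cutoff(negatives):
    """Distribution-free alternative: 97.5th percentile of the negatives
    (the last of the 39 cut points returned for n=40)."""
    return statistics.quantiles(negatives, n=40, method="inclusive")[-1]

def empirical_specificity(negatives, cutoff):
    """Proportion of reference negatives falling below the cut-off."""
    return sum(x < cutoff for x in negatives) / len(negatives)

# Under an exactly normal distribution, 2 SD gives slightly more than 97.5%
print(round(NormalDist().cdf(2.0), 4))        # 0.9772
print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96

# Hypothetical, strongly right-skewed negative reference sample
skewed = [1] * 9 + [10]
cut = gaussian_cutoff(skewed)
print(round(cut, 2), empirical_specificity(skewed, cut))  # 7.59 0.9
```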

Regarding the methodology used by Schoels and colleagues for cut-off determination of response criteria for the DAPSA, the use of the ACR20/50/70 as external criteria is supported by their extensive use to assess response in PsA clinical trials (albeit adapted from RA). However, the ACR20/50/70 criteria focus on the joint domain, which limits their value as a measure of change in a multidimensional disease such as PsA. Furthermore, the use of the κ statistic to define levels of response according to the DAPSA also has limitations. Because the external construct is dichotomous, ROC curve analysis, using the area under the curve, would be the preferred method, with sensitivity and specificity as the preferred parameters.16,17 It would also have been of interest to use a ‘global rating of change’ (GRC) as the external construct and to compare the results with those obtained with the ACR20/50/70. The GRC is a Likert-type scale on which the patient rates change (eg, ‘much better’, ‘better’, ‘unchanged’, ‘worse’ and ‘much worse’). The GRC is a useful external criterion for defining treatment response using ROC curve analysis.18–20
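With a dichotomous external construct, the area under the ROC curve equals the probability that a randomly chosen responder shows a larger DAPSA improvement than a randomly chosen non-responder (the Mann–Whitney formulation, with ties counted as one half). The sketch below assumes, purely for illustration, that GRC answers ‘much better’ and ‘better’ define responders; both the dichotomisation and the data are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(responder score > non-responder score), ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical dichotomisation of the GRC into responders/non-responders
GRC_RESPONDER = {"much better", "better"}

def grc_to_responder(answers):
    return [a in GRC_RESPONDER for a in answers]

improvements = [90, 70, 60, 40, 20]  # hypothetical % DAPSA improvements
grc = ["much better", "better", "unchanged", "better", "worse"]
print(round(auc(improvements, grc_to_responder(grc)), 3))  # 0.833
```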

Finally, the use of a measure that omits some PsA disease domains, focusing mainly on peripheral joint activity, is not universally accepted. The authors argue that “given the relatively low frequency of entheseal involvement and the availability of separate tools for both entheseal and spinal involvement, it seems reasonable to focus on joint activity and systemic levels of inflammation” and that the various domains respond differently to the various therapies, a point that is particularly relevant to skin versus articular manifestations. Conversely, one could argue that by not taking the various disease domains into account we are missing information, which may be particularly relevant when defining disease REM. Of note, criteria to define a REM-like state (‘minimal disease activity’, which allows minor disease activity and does not necessarily imply ‘complete REM’), taking multiple domains into account, have previously been proposed and validated in PsA. According to Coates and colleagues,21 patients can be classified as having ‘minimal disease activity’ if they fulfil five of seven outcome measures: tender joint count ≤1; swollen joint count ≤1; psoriasis activity and severity index (PASI) ≤1 or body surface area ≤3; patient pain VAS score ≤15 (0–100 scale); patient global disease activity VAS score ≤20 (0–100 scale); HAQ score ≤0.5 and tender entheseal points ≤1. Interestingly, the ‘minimal disease activity’ criteria were used as the therapeutic target in the TICOPA trial.5
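The ‘minimal disease activity’ rule listed above is easy to express as a checklist. This is a minimal sketch, assuming that ‘five of seven’ means at least five criteria met and that either PASI or body surface area may be missing (passed as None); the function name and example values are hypothetical.

```python
def minimal_disease_activity(tjc, sjc, pasi, bsa, pain_vas, global_vas,
                             haq, tender_entheses):
    """True if at least 5 of the 7 criteria are met (VAS on 0-100 scales)."""
    criteria = [
        tjc <= 1,
        sjc <= 1,
        (pasi is not None and pasi <= 1) or (bsa is not None and bsa <= 3),
        pain_vas <= 15,
        global_vas <= 20,
        haq <= 0.5,
        tender_entheses <= 1,
    ]
    return sum(criteria) >= 5

# Hypothetical patients
print(minimal_disease_activity(0, 0, 0.5, None, 10, 10, 0.25, 0))  # True
print(minimal_disease_activity(2, 2, 5, 10, 40, 15, 0.25, 0))      # False (3 of 7)
print(minimal_disease_activity(1, 1, 2, 5, 30, 20, 0.5, 1))        # True (5 of 7)
```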

In summary, disease activity states and response criteria according to the DAPSA/cDAPSA have been developed and are ready to be rolled out into clinical practice and clinical trials. Further validation and comparative studies, particularly against the other measures, disease activity states and response criteria that have been proposed in PsA, are required before a final consensus can be reached about the best measure(s) and treatment target(s) to use in PsA. This discussion is ongoing22–24 and is being fuelled by recent publications on the topic.5,11 In the future, these newly developed cut-offs should be further validated in larger cohorts of patients from observational studies, and the performance of the score should be assessed in different subgroups of patients with PsA. It would also be of interest to explore the concept of ‘flare’ according to the DAPSA/cDAPSA, as well as the prognostic validity of DAPSA/cDAPSA cut-off levels with regard to structural damage and disability.

Acknowledgments

PMM is supported by the National Institute for Health Research (NIHR) Rare Diseases Translational Research Collaboration (RD TRC) and by the National Institute for Health Research (NIHR) University College London Hospitals (UCLH) Biomedical Research Centre (BRC). The views expressed are those of the authors and not necessarily those of the National Health Service (NHS), the NIHR or the Department of Health.

References

Footnotes

  • Contributors PMM drafted the article and revised it critically for important intellectual content.

  • Funding National Institute for Health Research.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
