Reporting of patient-reported outcomes in recent trials in rheumatoid arthritis: a systematic literature review
  1. U Kalyoncu1,
  2. M Dougados1,
  3. J-P Daurès2,
  4. L Gossec1,2
  1. Paris Descartes University, Medicine Faculty; UPRES-EA 4058; APHP, Rheumatology B Department, Cochin Hospital, Paris, France
  2. Université Montpellier I, EA 2415 Epidémiologie, Biostatistique, Santé Publique, Montpellier, France
  1. Umut Kalyoncu, Hacettepe Üniversitesi Tıp Fakültesi, Sıhhiye, 06100, Ankara, Türkiye; umutkalyoncu{at}


Objectives: Patient-reported outcomes (PROs) have been increasingly recognised as important in rheumatoid arthritis (RA). The objective of this study was to assess the frequency of use of different PROs in recently published RA articles and to compare the tools used through a systematic literature review.

Methods: (1) Data source: the PUBMED MEDLINE database was searched for articles reporting any type of clinical study in adult patients with RA, published between February 2005 and February 2007, and reporting any type of PRO. Articles were excluded if they did not concern adult RA or if they did not report any PROs. (2) Data extraction: demographic characteristics of patients, study design, treatment assessed and all PROs. (3) Data analysis: descriptive.

Results: Of 109 reports, 50 (45%) were randomised controlled trials and 59 were other types of studies. A total of 63 questionnaires or tools for PROs were used, corresponding to 14 domains of health. Frequently reported domains (and most frequent tools) were: function, 83% (most frequent tool, health assessment questionnaire, HAQ); patient global assessment, 61% (most frequent tool, visual analogue scale, VAS); pain, 56% (VAS); and morning stiffness 27%. Domains such as fatigue, coping or sleep disturbance were infrequently reported.

Conclusions: PROs are reported with great heterogeneity in recently published trials in RA. Some domains that appear important from the patient’s perspective are infrequently reported. Further work is needed in this field.

Rheumatoid arthritis (RA) is currently recognised as a heterogeneous entity that is usually diagnosed with reference to American College of Rheumatology (ACR) classification criteria.1 The clinical course of RA is variable and its prognosis is difficult to predict.2 In many patients, the disease process is severe and may result in progressive joint destruction and severe disability. To prevent joint destruction, it is important to be able to detect inflammation, ie, synovitis and acute phase reactants, as these elements seem closely correlated with further bone erosions.3 However, it is not sufficient to monitor these objective elements reflecting inflammation. Indeed, RA also imposes a considerable burden on patients, ie, symptoms such as pain and functional disability. Monitoring patients’ symptoms is therefore necessary in RA.

The main outcome measures used for RA clinical trials over the last 10 years derive from a consensus of groups of experts from the ACR, the European League Against Rheumatism (EULAR) and methodologists from, eg, Outcome Measures in Rheumatology Clinical Trials (OMERACT). All of these organisations have endorsed a “core set” of data for use in clinical trials.4–6 This core set is composed of tender joint count, swollen joint count, patient’s assessment of pain, patient’s and physician’s global assessments of disease activity, patient’s assessment of physical function, and laboratory evaluation of one acute-phase reactant. Thus, patient-reported outcomes (PROs), defined here as outcomes that are completed by patients, are already well-established by physician experts as important in RA, in this case by assessment of pain, functional disability and patient global assessment. However, over the past few years it has been increasingly recognised that the patients’ perspective must also be taken into account in RA.7 8 Several publications arising from patient group discussions9 10 or patient focus groups11 indicate that some domains or areas of health that are important for patients are unrecognised and underestimated in RA. These domains include, among others, fatigue,9 11 12 well-being,9 11 12 sleep patterns,9 work incapacity11 or return to normal life11 12 and independence (ie, being able to manage daily activities, such as personal hygiene).11

Thus, pain and functional capacity are well-recognised important PROs, whereas fatigue for example was poorly recognised before 2003.9 However, it is unknown how these different notions translate into practical use; ie, are these important PROs reported in recent publications?

The objective of this work was to assess the frequency of use of PROs in recently published trials of RA, through a systematic literature review.


Methods

This systematic literature review was conducted according to the Cochrane Collaboration guidelines.13 However, data were not pooled (no meta-analysis was performed) because the results are descriptive.

Search and selection process

To obtain all recently published articles reporting any type of PRO in RA, an extensive literature search was performed in the PUBMED MEDLINE database on 12 February 2007. Publications were identified through a search that used the following exploded MeSH term: (“arthritis, rheumatoid” (MeSH)) with a limitation to “humans”, “all adults: 19+ years”, “English”, “published in the last 2 years” and “clinical trials”. Publications were limited to articles referenced in PUBMED in the last 2 years, to obtain an exact view of the status of recent research in the field of RA, as data regarding fatigue9 11 12 or other important PROs were published over the last 5 years. Articles were included if they reported any type of clinical design, included patients with RA and reported patient-reported results. Articles were excluded if they did not concern RA, or if they did not focus on patient-based outcome measures (eg, articles reporting as main results laboratory outcomes, radiographic scores or genetic examination). Reviews, editorials and letters were also excluded because we were interested in obtaining an overview of the use of PROs in original research articles. The selection process was performed based on the titles and abstracts of the articles, then on full texts.

General data extraction

Publications were evaluated based on the full-text articles. The reviewers were not blinded to the journal name and the authors, as evidence concerning the effect of masking on assessments of trials is inconsistent.14 Publications were assessed with the use of a checklist of items developed by the two reviewers, UK and LG.

Data were obtained on year of publication, funding sources (public or private either clearly reported or extrapolated from authors’ affiliations), study design (randomised controlled trial (RCT), open-label trial, prospective cohort, retrospective study) and number of patients. Demographic data such as sex, mean age, mean disease duration, treatments under evaluation and maximum duration of follow-up were recorded for each report. The quality of the publications was determined by use of the Jadad scale (score 0–5) where a high score reflects high quality. The Jadad scale evaluates quality of randomisation, blinding, and description of withdrawals and dropouts.15 It is often used to describe clinical trial quality but is more particularly adapted for use in RCTs; it is used here as a descriptive measure of trial quality.
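The article does not spell out how Jadad points were assigned, so the following is only a minimal sketch of the commonly described scoring rule (0 to 2 points for randomisation, 0 to 2 for blinding, 0 or 1 for withdrawals). The class, field and function names are hypothetical and are not taken from the authors' checklist; the ">2" threshold is the cut-off used later in the Results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JadadItems:
    """Reviewer answers for one report (hypothetical structure)."""
    randomised: bool                          # study described as randomised
    randomisation_method_ok: Optional[bool]   # None = method not described
    double_blind: bool                        # study described as double blind
    blinding_method_ok: Optional[bool]        # None = method not described
    withdrawals_described: bool               # withdrawals and dropouts described


def _item_points(mentioned: bool, method_ok: Optional[bool]) -> int:
    """1 point if mentioned, +1 if the method is appropriate, -1 if inappropriate."""
    if not mentioned:
        return 0
    if method_ok is True:
        return 2
    if method_ok is False:
        return 0
    return 1


def jadad_score(items: JadadItems) -> int:
    """Total Jadad score, from 0 (lowest quality) to 5 (highest quality)."""
    return (
        _item_points(items.randomised, items.randomisation_method_ok)
        + _item_points(items.double_blind, items.blinding_method_ok)
        + (1 if items.withdrawals_described else 0)
    )


def is_higher_quality(items: JadadItems) -> bool:
    """Assumed cut-off: a score above 2, as used in the Results."""
    return jadad_score(items) > 2


if __name__ == "__main__":
    # Illustrative report, not one of the 109 reviewed articles.
    example = JadadItems(True, True, True, None, True)
    print(jadad_score(example), is_higher_quality(example))
```

An open-label or non-randomised study can score at most 1 under this rule (for describing withdrawals), which is why the scale is better suited to RCTs.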

Patient-reported outcomes

All PRO measures were noted. Outcome measures that are not patient-reported, such as biological results (C-reactive protein, rheumatoid factor, anticyclic citrullinated peptide antibodies), or x-rays, were not assessed. If available, composite indices such as the disease activity score (DAS),16 ACR response criteria17 and EULAR response criteria18 were noted. These composite indices include PROs in the “core set” (pain, global assessment) and domains that are not patient-reported. However, if their results were only presented as global results (eg, ACR 20), the PROs included in the ACR criteria were not considered as reported.

Domains of health

The PROs were classified by the authors and according to published classifications19 into “domains”, ie, areas or dimensions of health, such as pain, functional ability or depression. Results are presented as frequency of reported domains and of each PRO in a given domain as this presentation better reflects the importance of a given domain (eg, fatigue) than listing the frequency of diverse methods used to report fatigue. To classify the tools into domains, an extensive literature search was performed.

Statistical analysis

Analysis was mainly descriptive, ie, frequency of use of a PRO. The frequency of PROs was compared between study designs using the χ2 test or Fisher’s exact test. Data analyses involved use of SPSS version 10.0.
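The comparisons were run in SPSS; as an illustration only, a two-sided Fisher exact test for a 2×2 table (PRO reported or not, by RCT or non-RCT) could be sketched in plain Python as below. The function name and the example counts are hypothetical and are not the authors' code or data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows could be RCTs and non-RCTs; columns could be "PRO reported"
    and "PRO not reported". The p-value sums the hypergeometric
    probabilities of every table with the same margins that is no more
    likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        # Small tolerance so that ties with the observed table are included.
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: 4 of 20 RCTs and 10 of 22 non-RCTs report a PRO.
    print(fisher_exact_two_sided(4, 16, 10, 12))
```

The exact test is usually preferred over χ2 when expected cell counts are small, which is the case for the infrequently reported domains.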


Results

Description of recent publications assessing patient-reported outcomes in rheumatoid arthritis

Of the 382 publications identified by the literature search, 109 were included in the analysis. The 273 publications excluded mainly focused on non-PROs (n = 194), or were not about the selected disease (n = 79) (fig 1). The characteristics of the publications are given in table 1 according to the study design.

Figure 1 Flow chart showing the selection of recent publications that reported patient-reported outcomes in RA. GI, gastrointestinal; RA, rheumatoid arthritis.
Table 1 Characteristic features of recent articles assessing patient-reported outcomes in rheumatoid arthritis

Of the 109 publications, 50 (45%) were RCTs, 24 (23%) were other therapeutic trials (eg, open-label trials), 28 (26%) were prospective cohorts, one (1%) had a retrospective design and six (5%) had another design. Trials with a Jadad score >2 represented 26.1% of all trials (n = 29). Patient characteristics were typical of RA populations (table 1). Follow-up was shorter in RCTs (mean, 40 weeks) than in other studies (mean, 91 weeks, p = 0.047). Financial support was frequently at least partly private but was not reported in 30.4% (n = 33). Most articles (n = 80, 73.9%) reported the results of the “core set”, which includes some PROs (pain, functional assessment, patient global assessment). The core set was reported through DAS-based scores in 40 articles (36.7%), through ACR-based scores in 19 (17.4%) and through both types of composite indices in 21 (19.3%). However, in three (2.7%) of these articles, the core set was only reported as a composite index response, with no detail of its components, and no other results of PRO. Thus these three articles did not give any individual results for any PRO and are only reported in table 1 (no results for these three articles in table 2).

Table 2 Domains and tools used to assess PROs in recent publications of RA

Patient-reported outcomes in recent publications

Sixty-three tools or measures of PROs were reported in the 109 articles. Mean PRO count per article was 4.9 (SD 2.1). Most of the tools were infrequently used, ie, in under 5% of articles. Tools are shown in table 2, and have been classified into 14 domains: function, patient global assessment, pain, inflammation, quality of life, utility, fatigue, self-reported painful joint count, psychological status, coping, productivity losses, well-being, sleep disturbance and leisure.

Table 2 gives extensive results; the most frequently reported domains are detailed here.


Function

The most frequent domain reported (n = 91, 83.4%) was functional assessment. The most frequent tool for this domain was the Health Assessment Questionnaire (HAQ)20 (n = 67, 73.6% of 91). Physical function was also evaluated by the Modified Health Assessment Questionnaire (M-HAQ) (n = 14, 15.3% of 91), which is a shorter, modified version of the HAQ.21 HAQ was more frequently used in RCTs (p = 0.001) and in studies with a Jadad score >2 (p = 0.001). M-HAQ was more frequently used in non-RCTs (p = 0.011).

HAQ and M-HAQ are easy to administer self-questionnaires that comprise eight categories of functioning: dressing, rising, eating, walking, hygiene, reach, grip and usual activities.20 21
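Two denominators are used throughout these results: domain frequencies are expressed over all 109 articles, whereas tool frequencies are expressed over the articles reporting that domain (eg, “of 91”). The short sketch below reproduces the function figures. The helper name is hypothetical, and the use of truncation to one decimal place (rather than rounding) is an assumption inferred from the published values, not something the article states.

```python
N_ARTICLES = 109  # articles included in the review


def pct_truncated(count: int, denominator: int) -> float:
    """Percentage to one decimal place, truncated rather than rounded (assumption)."""
    return (count * 1000 // denominator) / 10


if __name__ == "__main__":
    function_articles = 91  # articles reporting the function domain
    haq_articles = 67       # of those, articles using the HAQ
    mhaq_articles = 14      # of those, articles using the M-HAQ

    # Domain frequency: denominator is all included articles.
    print("function:", pct_truncated(function_articles, N_ARTICLES))
    # Tool frequency: denominator is the articles reporting the domain.
    print("HAQ:", pct_truncated(haq_articles, function_articles))
    print("M-HAQ:", pct_truncated(mhaq_articles, function_articles))
```

Keeping the two denominators apart matters when comparing tools across domains: a tool used in most articles of a rarely reported domain is still rare overall.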

Patient global assessment

Patient global assessment was a frequently reported domain (69 articles, 63.3%). There was no difference in its report between RCTs and non-RCTs. It was assessed by the visual analogue scale (VAS), but the wording was not homogeneous between studies. Most articles measured patients’ opinion by “patient global assessment”, without further specification of the wording (n = 53, 76.8% of 69). Other tools included general health VAS (n = 10, 14.5% of 69), patient’s assessment of disease severity (n = 1), overall status in RA28 (n = 2), categorical scale assessing patient global assessment (n = 1), overall VAS (n = 1) and arthritis impact VAS29 (n = 1). Patient global assessment was more frequently reported in RCTs.


Pain

Pain was frequently (n = 61, 55.9%) evaluated, most frequently by pain VAS (n = 52, 85.2% of 61). There was no difference in its report between RCTs and non-RCTs. Pain VAS was more frequently reported in RCTs.

Morning stiffness

Morning stiffness was frequently evaluated (n = 29, 26.6%). It was most frequently evaluated by morning stiffness duration (n = 28, 96.5% of 29).

Quality of life

One of the relatively frequently reported domains was quality of life (n = 21, 19.2%), which was mainly assessed by the Short-Form Health Survey, SF-36 (n = 16, 76.1% of 21).33 There was no difference in its report between RCTs and non-RCTs. SF-36 contains 36 questions measuring health across eight physical and psychological dimensions: physical functioning, role limitations due to physical health problems, bodily pain, vitality, social functioning, role limitations due to emotional problems, mental health and general health. It takes approximately 10 min for most patients to complete SF-36.34


Utility

One of the other relatively frequent domains was utility (n = 18, 16.5%), which was mainly assessed by Euro quality of life (EuroQOL) (n = 11, 61.1% of 18), both in RCTs and non-RCTs.

EuroQOL is a societal measure of utility and includes self-assessed problems across five items on mobility, self-care, usual activities, pain/discomfort and depression/anxiety.38 EuroQOL utility scores are anchored at 1 for full health and 0 for death. On the general health status scale of EuroQOL, 0 represents the worst and 100 the best imaginable health state. Time to administer EuroQOL is 2–5 min.34


Fatigue

Fatigue was infrequently assessed (n = 15, 13.7%) though it was more often reported in studies that were not RCTs (n = 12) than in RCTs (n = 3) (p>0.05). Fatigue VAS was the most frequent tool (n = 11, 73.3% of 15). Other tools included HAQ fatigue score (n = 1, 6.6% of 15),40 functional assessment of chronic illness therapy (FACIT) (n = 1, 6.6% of 15),41 fatigue intensity (n = 1, 6.6% of 15) and fatigue effect (n = 1, 6.6% of 15).

Joint counts

Self-reported painful joint counts were infrequently used (n = 10, 9.1%). Self-reported painful joints were evaluated either through painful joint count (n = 6, 60% of 10) or by the RA disease activity index (RADAI (n = 4, 40% of 10)).42

Other domains

Other domains such as coping, sleep disturbance, productivity losses, psychological status, well-being and leisure were infrequently reported.


Discussion

In this systematic literature review, the first overview of the use of PROs in recent articles concerning RA was obtained. Fourteen domains reflecting patient-relevant outcomes were found to be reported in articles of clinical research in RA, published in the last 2 years. The only domains that were reported in more than 25% of the articles were function, patient global assessment, pain and morning stiffness. On the other hand, domains such as fatigue, sleep disturbance, well-being, coping and psychological status were infrequently reported, though they appear important from the patient’s point of view.

The patient’s perspective in the assessment of RA is a broad area, which comprises PROs, defined here as outcomes that are completed by patients, but also patient-reported priorities, acceptable symptom state, or important improvement. These last items are not the object of the present study. A growing number of studies indicate discordance in opinion between patients and physicians. Outcomes important for patients were evaluated by several patient group discussions9 10 or patient focus groups.11 Physical and general outcomes that are part of the core set4–6 (ie, pain, disability, general assessment) were evaluated frequently as shown by the present study. On the other hand, although some PROs such as sleep patterns,9 “return to normal life”,11 12 “sexual dissatisfaction”,11 “fear of the future”11 and “independence”11 (ie, being able to manage daily activities such as personal hygiene) were reported as important for patients, none of the articles assessed here presented these outcome measures. One reason is probably that there are no well-defined and validated tools to evaluate these domains (eg, independence, even though this item can partly be understood as self-care, which is an item in the Arthritis Impact Measurement Scale, AIMS2). However, another issue at stake is that PROs may not be considered “important” by physicians.58 Aletaha et al found more patients would be considered in remission by patient-derived measures than by physician-derived measures, as physicians are less “stringent” when faced with patient-based outcomes.58 Finally, another aspect is that data regarding PROs are relatively recent (2003)9 and may not yet have been integrated into studies, even though we limited the present study to articles published between 2005 and 2007. Another possible explanation for the low rates of reporting of PROs is the discrepancy between the information collected during a study and the information published in a manuscript. Manuscript length is usually restricted by the publisher (and sometimes by referees). Therefore, authors may choose to present the “mandatory” information (ie, the composite indices) and not the individual PROs. In favour of this explanation, we can note that the core set composite indices were reported in more than 70% of publications. However, fatigue, for example, is not part of the core set and was reported separately in less than 15% of articles.

Fatigue was poorly recognised before 2003.9 Since that date, several qualitative studies11 12 have pointed out the importance of fatigue for patients with RA; fatigue is intrusive and overwhelming in RA according to patients, and has consequences on all aspects of quality of life.11 12 It is informative but disappointing to note that only 13% of clinical studies reported fatigue results, as evidenced in the present literature review.

Sixty-three different tools or questionnaires were used to evaluate these 14 domains; however, 42 tools were used only once and only 10 tools were used in over 5% of articles. HAQ, pain VAS and patient global assessment by VAS were the most frequently used tools. Two different tools were used frequently for functional status (HAQ and modified HAQ), and for patient global assessment (different wordings of VAS). On the other hand, no tool was used frequently for coping, sleep disturbance, productivity losses, psychological status, well-being and leisure.

The separation of tools into domains in the present study may be open to discussion. Self-reported painful joint count does not reflect patients’ pain level, so we decided to treat self-reported painful joints as a domain separate from pain. There were also some difficulties in categorising tools into domains. First, some domains overlap (eg, quality of life and utility) and some tools can be used to assess several domains (eg, SF-36 subscales). It should be noted that multidimensional instruments were presented under the domain “quality of life”, which is consistent with recent interpretation of the concept from the Food and Drug Administration.59 It may be difficult to relate this domain to single-item PROs. Second, in most articles, outcome measures were defined as tools, not as domains. If article methods included domain definitions (eg, “functional status was measured by HAQ”) we registered function as the domain and HAQ as the tool. However, if domains were not clear and tools were more difficult to categorise (eg, “Grip Ability Test”, “Advanced Activities of Daily Living (ADL) Scale”) we searched the original article describing the tool from the reference list to ascertain which domain was concerned. This categorisation has some limitations because of the terminology. For instance, patient global assessment was evaluated frequently by patient global assessment VAS but also by general health VAS. These two terms may cover the same concept, but this has not been demonstrated. Some tools (eg, “patient pain score”) were not defined clearly in the articles, which did not refer to any original articles. By “patient pain score”, authors may mean pain VAS, but as this was not clear, we categorised “patient pain score” as a different tool in the pain domain. One possibility to help classify outcomes into domains is to use the International Classification of Functioning, Disability and Health (ICF) frame,60 though this frame does not take into account quality of life, for instance.

Limitations of the present study include the possible non-exhaustiveness regarding reports of PROs. The only database consulted for the search was PUBMED (and not EMBASE or PsycINFO, for example). However, PUBMED is the principal database for widely accessed articles; the results reported here concern these widely accessed articles. Similarly, only articles referenced in PUBMED during the last 2 years were analysed. This 2-year limit was chosen empirically, to obtain an exact view of the status of recent research in the field of RA, as data regarding fatigue9 11 12 or other important PROs were published over the last 5 years and may therefore have led to modifications in the reporting of PROs more recently (eg, as the first articles on fatigue appeared in 2003, it is unreasonable to suppose that articles published before 2005, which probably report studies conducted in 2004 or even before, should report fatigue). Thus we believe the present article gives a correct overview regarding recent use of PROs, and may raise awareness of issues concerning PROs.

HAQ and modified HAQ were both frequently used to report function. To date there are no definitive data regarding the comparison of these two questionnaires.61 HAQ may be more sensitive to detecting change in the middle of the scale rather than at the ends of the scale, though data are conflicting.61

Several frequent domains found by our systematic review, ie, pain, patient global assessment and fatigue, were assessed by VAS. Because of its simplicity, VAS is a useful method of assessment for PROs, in particular for use in composite indices or in questionnaires evaluating several domains. However, there are some limits to the use of VAS. Elderly persons, low-literacy populations and some cultural groups have difficulties conceptualising a VAS.62 In these cases, numerical rating scales may be a useful alternative.62 Giving patients the opportunity to rate themselves in comparison with a previous rating may also be helpful. An important element is that a VAS or numerical rating scale is very quick and easy to apply and can also be used as part of daily clinical practice. Pain VAS is considered the gold standard to assess pain.62 Correlations between the VAS and a verbal descriptor scale are in the range of 0.70–0.75.62 Pain VAS has a high test–retest reliability coefficient (r = 0.93) in literate patients. VAS is sensitive to change in drug and non-drug clinical trials.62 Global assessment by VAS may be one of the most discriminant criteria in disease-modifying drug trials.62 Some studies did not report pain assessment in the present overview: these were either studies of non-pharmacological interventions or articles that reported only ACR/EULAR core sets.

Fatigue VAS is a single-item scale. It measures the severity of fatigue over the past week with a specific question. However, the exact wording of the question and the anchors have not been perfectly determined yet.63 Fatigue VAS is simple and reproducible. Wolfe64 found that the single-item VAS (with a wording they proposed) performed as well as or better than longer scales in respect to sensitivity to change, and was at least as well correlated with clinical variables as longer scales.

In conclusion, although PROs are essential in the monitoring of RA, there are limitations to their assessment in the recent literature. These limitations include the selection of domains, which do not adequately reflect the patient’s perspective, and the great heterogeneity of use of specific tools or questionnaires.

Further work is needed to obtain a better insight into what is relevant to the patient, what the relevant discrepancies are between patient-assessed and physician-assessed outcomes, and finally how best to measure outcomes that are important for patients. This may lead, in the future, to a revision of the RA “Core Set”.


