Adaptation and cross-cultural validation of the rheumatoid arthritis work instability scale (RA–WIS)
  1. G Gilworth1,
  2. P Emery1,
  3. L Gossec2,
  4. T P M Vliet Vlieland3,
  5. F C Breedveld3,
  6. A J Hueber4,
  7. G Schett4,
  8. A Tennant1
  1. Department of Musculoskeletal and Rehabilitation Medicine, (Faculty of Medicine and Health), University of Leeds, Leeds, UK
  2. Paris Descartes University, Medicine Faculty; UPRES-EA 4058; APHP, Rheumatology B Department, Cochin Hospital, Paris, France
  3. Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
  4. Department of Internal Medicine 3, University of Erlangen-Nuremberg, Nuremberg, Germany
  Correspondence to Mrs G Gilworth, Department of Musculoskeletal and Rehabilitation Medicine, University of Leeds, D Floor, Martin Wing, Leeds General Infirmary, Leeds LS1 3EX, UK; gilworths{at}


Background: Despite recent advances, work disability in rheumatoid arthritis (RA) remains common. Work disability is frequently preceded by a period of work instability characterised by a mismatch between an individual’s functional abilities and job demands. This could raise the risk of work disability if not resolved. A work instability scale for RA (RA–WIS) has previously been developed to screen for this risk. The objective of this study was the adaptation of this scale into French, Dutch and German.

Method: Different language versions of the RA–WIS were produced through a process of forward and back translations. The new scales were tested for face validity. English data from the original developmental study were pooled with data generated through postal surveys in each country. The internal construct and cross-cultural validity of the new scales were assessed using Rasch analysis, including differential item functioning (DIF) by culture.

Results: The pooled data showed good fit to the Rasch model and demonstrated strict unidimensionality. DIF was found to be present for six items, but these appeared both to cancel out at the test level and have only a marginal effect on the test score itself.

Conclusions: The RA–WIS was shown to be robust to adaptation into different languages. Data fitted Rasch model expectations and strict tests of unidimensionality. This project and the continuing work on further cross-cultural adaptations have the potential to help ensure clinicians across Europe are able to support RA patients to achieve their potential in work through early identification of those most at risk.

Despite recent advances in the medical management of rheumatoid arthritis (RA), work disability remains a common outcome, with many patients losing their jobs soon after diagnosis.1 Many studies have considered the factors relating to work disability2 3 but few have considered the impact of RA on those who are in work. Work disability in RA is frequently preceded by a period of work instability, a state in which the consequences of a mismatch between an individual’s functional abilities and the demands of their job could threaten continuing employment if not resolved.4 The challenge to clinicians is to identify early those people at risk of problems with work performance (for example, reduced productivity or poor attendance at work) so that appropriate interventions can be targeted to facilitate job retention.

The development of the original English language version of the RA work instability scale (RA–WIS) explored the concept of work instability in RA through qualitative interviews with patients following a diagnosis of RA and examined the factors relevant to them while still in work. Concentrating on the mismatch between functional ability and the demands of work, the objective was to generate items for a work instability scale, a screening tool for potential work loss. Through Rasch analysis5 and validation against a gold standard of expert vocational assessment, a self-administered RA–WIS was developed. The scale has 23 items (statements about the impact of the disease on working), derived from the patient interviews, each of which is affirmed as true or false. Examples of the items are given in fig 1. The scale is scored by summing the 23 items, giving a range of 0–23, and this total is grouped into three bands indicating low, medium and high risk. Given an internal consistency reliability of 0.93, the RA–WIS can be used as a screening tool at the individual patient level (which was the original intended use). It also has potential as an outcome measure in intervention studies.
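The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the cut points of 10 and 17 are those reported in the results of this paper, but the function names and the handling of scores that fall exactly on a cut point (10 counted as medium, 17 counted as medium) are assumptions made for the sketch.

```python
from typing import Sequence

N_ITEMS = 23      # number of true/false items in the RA-WIS
MEDIUM_CUT = 10   # cut point reported in the paper (low -> medium)
HIGH_CUT = 17     # cut point reported in the paper (medium -> high)


def ra_wis_score(responses: Sequence[bool]) -> int:
    """Count the items affirmed as true; valid totals run from 0 to 23."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(1 for answer in responses if answer)


def risk_band(score: int) -> str:
    """Group a total score into a band.

    ASSUMPTION: scores below 10 are low, 10 to 17 are medium and scores
    above 17 are high. The text gives the cut points but not the exact
    treatment of the boundary values.
    """
    if not 0 <= score <= N_ITEMS:
        raise ValueError("score must lie between 0 and 23")
    if score < MEDIUM_CUT:
        return "low"
    if score <= HIGH_CUT:
        return "medium"
    return "high"
```

For instance, a respondent affirming 12 of the 23 items would score 12 and, under the assumed boundary handling, fall in the medium band.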

Figure 1

A selection of items from the rheumatoid arthritis work instability scale (RA–WIS).

With the increasing number of multinational research projects across Europe there is recognition of the need to adapt measures for use in languages other than the source language.6 7 8 Reaching equivalence between the original version of a questionnaire and new target language(s) requires attention not only to accurate linguistics, but also to cultural considerations, to maintain the content validity of the measure at a conceptual level across different languages. The aim of this project was to develop and preliminarily validate French, Dutch and German language versions of the RA–WIS.

Classic approaches to scale development, including tests of reliability, validity and responsiveness, are well established.9 For internal construct validity, that is the internal scaling properties of an instrument, approaches have included factor analysis and, often inappropriately, reliance on internal consistency reliability in the form of Cronbach’s alpha.10 More recently, modern psychometric approaches have been adopted to provide a more robust interpretation of internal construct validity, and the most widely applied of these approaches is that of the Rasch measurement model.11

In this paper we report the translation process of the RA–WIS, and the evaluation of its internal construct and cross-cultural validity in four European languages using Rasch analysis.


The methodology used was based on published guidelines for the process of cross-cultural adaptation of self-report measures.12 The process of cross-cultural adaptation aims to produce equivalency of content between the source language (in this project English) and the target languages. The term “cross-cultural adaptation” encompasses a process that looks both at language and cultural adaptation issues in preparing the new version of the scale.

The same methodology was used for each new language version of the scale. The adaptation has seven stages, which are reported in sequential order.

1 Initial forward translations

In each case two translations of the RA–WIS (including instructions to users) were made from the English language version by bilingual translators, native to the target language. For each new language version of the scale, one translator was a rheumatology doctor who was aware of the concepts of work instability in RA and regularly in contact with RA patients, and was therefore an “informed” translator. The other translator did not have a medical background. The questionnaire items and instructions were initially translated independently, without any discussion between translators.

2 Synthesis of the translations

The two translations were compared and any discrepancies resolved in a discussion between the translators. In each case this discussion also included the project leader from the original developmental study of the English RA–WIS, who completed the majority of the qualitative interviews on which the scale is founded.

3 Back translations

Working from the new language version of the questionnaire produced in stage 2 and blind to the original English language version of the RA–WIS, two further translators translated the new scale back into English. For this stage of the study both the translators were “lay”, that is with no clinical background.

4 Expert committee to consolidate all versions of the questionnaire

A committee composed of the project leader and all four translators reviewed all the translations and following discussion reached consensus on the final wording to be used for each new language version of the RA–WIS.

The translation process outlined in stages 1–4 above involved men and women with a spread of ages in each country.

5 Field testing of the adapted version of the questionnaire

Sample: The target population for stages 5 and 6 was confirmed RA patients either in work or temporarily on sick leave (less than 6 months), aged between 18 and 60 years.

The purpose of this stage was to test the face and content validity of the new version of the scale. Participants at this stage were asked to complete the new language version of the RA–WIS and comment on the clarity and wording of both the instructions and the items on the scale.

6 Postal survey

A postal survey was used to generate data to test the new language versions of the RA–WIS using Rasch analysis (see below). The survey form included the new language version of the RA–WIS, questions on demographic data and disease duration, and a question asking whether respondents were having a good, average or bad day.

7 Rasch analysis

Detailed descriptions of the process of Rasch analysis are given elsewhere.11 13 As the model defines measurement, the task is to ascertain if the data meet model expectations. Statistics indicating fit to the model test how far the observed data match those expected by the model. A variety of fit statistics are considered, both at the scale and individual item level. The values consistent with fit to model expectations are given in detail elsewhere,11 13 and are summarised on the bottom row of the table reporting fit (table 1).

Table 1

Results of Rasch analysis

Within the framework of Rasch measurement, the scale should also work in the same way irrespective of which group (eg, gender or country) is being assessed.14 For example, in the case of measuring work instability, French, Dutch and German respondents should have the same probability of affirming an item, at the same level of work instability. The probability is thus conditioned on the trait. If for some reason respondents from one country did not display the same probability of affirming the item, then this item would be deemed to display differential item functioning (DIF), and would run the risk of biasing results should data be pooled and comparison made between countries. However, adjustments for such bias can be made in most circumstances.15 Essentially, biased items are rendered unique for the group concerned. So, for example, an item may be made into two separate items, one for French respondents and one for German respondents, allowing item difficulty to vary by culture.
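To make the idea of DIF concrete, the following Python sketch uses the standard dichotomous Rasch model, in which the probability of affirming an item depends only on the difference between the person's level of the trait and the item's location (both in logits). It is not the software or the procedure used in this study. The function names and the item locations are hypothetical and chosen purely for illustration; the second function shows one simple way an item could be split into group-specific items, with responses from other groups treated as missing.

```python
import math
from typing import Dict, List, Optional, Sequence


def rasch_probability(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: probability of affirming an item.

    theta is the person's level of work instability and difficulty is the
    item location, both in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def split_item_by_group(
    responses: Sequence[int], groups: Sequence[str]
) -> Dict[str, List[Optional[int]]]:
    """Turn one biased item into one column per group.

    Each respondent keeps a response only in the column for their own
    group; every other column holds None (treated as missing), so the
    item location can be estimated separately for each group.
    """
    if len(responses) != len(groups):
        raise ValueError("responses and groups must have the same length")
    columns: Dict[str, List[Optional[int]]] = {
        group: [None] * len(responses) for group in dict.fromkeys(groups)
    }
    for i, (response, group) in enumerate(zip(responses, groups)):
        columns[group][i] = response
    return columns


# Hypothetical item locations in logits; NOT estimates from this study.
difficulty_by_country = {"France": -0.10, "Germany": 0.40}
theta = 0.0  # two respondents at the same level of work instability
for country, location in difficulty_by_country.items():
    print(country, round(rasch_probability(theta, location), 3))
```

With these made-up locations, two respondents at the same level of work instability have different probabilities of affirming the item, which is what DIF by country means. Splitting the item lets each group have its own location while the remaining items stay common to all groups.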

Strict tests of unidimensionality are undertaken at every stage of analysis following the work of Smith,16 17 and the acceptable range for this test is again given at the foot of the fit table (table 1).

An estimate of the internal consistency reliability of the scale is also available, based on the person separation index in which the estimates on the logit scale for each person are used to calculate reliability. This is equivalent to Cronbach’s alpha.
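As a rough illustration of how such an index can be computed, the sketch below uses a commonly cited form of the person separation index: the proportion of the observed variance in person estimates that is not attributable to measurement error. The exact formula used by the analysis software in this study is not given in the text, so this form, the function name and the example values are assumptions.

```python
from statistics import mean, variance
from typing import Sequence


def person_separation_index(
    estimates: Sequence[float], standard_errors: Sequence[float]
) -> float:
    """Share of observed person variance not due to measurement error.

    estimates are person locations in logits and standard_errors are the
    standard errors of those locations. ASSUMPTION: the sample variance is
    used for the observed variance and the mean squared standard error is
    used for the error variance.
    """
    if len(estimates) != len(standard_errors):
        raise ValueError("estimates and standard_errors must align")
    observed_variance = variance(estimates)
    error_variance = mean([se ** 2 for se in standard_errors])
    return (observed_variance - error_variance) / observed_variance


# Made-up person estimates and standard errors, for illustration only.
example = person_separation_index(
    [-2.0, -1.0, 0.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5, 0.5]
)
```

In this made-up example the observed variance is 2.5 and the error variance is 0.25, so the index is 0.9.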


Stages 1–4 The translation process

The use of forward and back translations by different people gave confidence that the overall meaning of individual items on the scale was maintained. For the vast majority of the items the process went well, although in all three languages there were some words and phrases for which literal translation was not possible. For example, there is no direct equivalent for the word “flare” in French. An alternative expression was found in French that patients frequently use to convey the same meaning.

In Dutch many words have more than one meaning depending on the context. In addition, there were some words for which a number of possible translation options were discussed during the final talks to consolidate the versions of the scale. For example, in item 20, “Ik ben bang dat ik mijn werk zal moeten opgeven” (I feel I may have to give up work), the context is one of being afraid of having to give up work, so the choice of translation in the Dutch version reflects this.

In a few cases, although there was good correlation between the initial item and the translated item, there was consensus that it was not in typical “lay persons” language, that is not a phrase that sounded natural. One of the strengths of the RA–WIS is excellent face validity due to the fact the scale was developed from qualitative interviews with RA patients, but this does mean some of the phrases are colloquial in nature. For example the item “When I’m feeling tired all the time work’s a grind” was well understood by the translators because of their English language skills. However, an exact translation into French, for example, was not possible. The final version agreed for the French language version of this item “Comme je suis fatigué(e) tout le temps, le travail me semble pénible” would translate directly as “Because I am tired all the time work is difficult for me”. For the German version of the scale the same item generated discussion, with a number of options being considered before agreement on “Wenn ich ständig müde bin, ist die Arbeit eine unendliche Qual.”

Some cultural similarities were confirmed during the translation process. For example, discussion of the item “I have used my holiday so that I don’t have to go sick” confirmed that people in the other countries in this project worry about their work attendance record in the same way as they might in England.

The final panel discussion involving participants from both medical and lay backgrounds gave an opportunity to refine the final language versions of the scales, in some cases favouring versions that were felt to be closer to what patients with RA might say in their language over an exact word-for-word translation from the English to the target language.

Stage 5 Field testing for face validity

The questionnaires were largely deemed understandable and patients commented that the RA–WIS touched on notions that were important for them. As might be expected, the parts of the RA–WIS that raised the most difficulties were those corresponding to colloquial expressions. However, all in all the questionnaire was easily understood and there were no modifications of phrasing at this stage.

Stage 6 Postal survey

Data on RA patients were provided from the three countries following the postal survey. The French (75 cases; mean age 46 years; 79% female), Dutch (85 cases; mean age 45 years; 73% female) and German (73 cases; mean age 43 years; 72% female) data were added to a randomly selected matching set from the original developmental study4 to give a total of 306 cases; thus each country contributed approximately 25% of the cases.
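The composition of the pooled sample can be checked with simple arithmetic. In the sketch below the three survey counts and the total are taken from the text; the number of UK cases is not stated and is derived here as the remainder, which is an inference rather than a reported figure.

```python
total_cases = 306
survey_cases = {"France": 75, "Netherlands": 85, "Germany": 73}

# Derived, not stated in the text: UK cases drawn from the original study.
uk_cases = total_cases - sum(survey_cases.values())

cases = {"UK": uk_cases, **survey_cases}
# Percentage share of the pooled sample contributed by each country.
shares = {
    country: round(100 * n / total_cases, 1) for country, n in cases.items()
}
```

On these figures the remainder is 73 cases, and each country contributes between roughly 24% and 28% of the pooled sample, in line with the approximate 25% quoted above.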

Stage 7 Rasch analysis

The pooled data from the four countries showed good fit to Rasch model expectations and demonstrated strict unidimensionality (analysis 1, table 1). Likewise, each individual country showed good fit to the Rasch model with the exception of France (analysis 5, table 1), in which the χ2 interaction fit statistic indicated some lack of invariance. One item, “Au travail, j’ai des jours avec et des jours sans” (I get good days and bad days at work), showed significant (Bonferroni adjusted) misfit to model expectation. Subsequent to the removal of this item, fit to model expectation was good (analysis 6, table 1).

Despite good fit to model expectation in the pooled data (which included the misfitting French item), DIF by country was found to be present for six of the items. For example, the item “The stress of my job makes my condition flare” shows clear bias across countries (fig 2). The UK and Dutch responses differ from those of the French and German responses, the latter having a higher expected response on this item at any level of work instability. Another item “When I’m feeling tired all the time, work’s a grind” showed a difference between the Dutch responses and the other languages (fig 3).

Figure 2

Differential item functioning for the item “The stress of my job makes my condition flare”.

Figure 3

Differential item functioning for the item “When I’m feeling tired all the time, work’s a grind”.

To ensure that each item was showing true DIF, and not compensating for DIF in other items,18 the “pure set” was constructed and each individual biased item added to the pure set in turn to see if DIF remained. All six items continued to show DIF. To test if DIF then cancelled out, subtests were constructed from the pure set of items and the biased set. This analysis also showed good fit to model expectations (analysis 7, table 1) and DIF was now found to be absent, suggesting that cross-cultural DIF, although present, cancelled out at the test level in pooled data. To test the effect of this further, individual person estimates were made from the full set of items (including the biased items) and from the pure set of items, and the two estimates for each person were compared. Just 7.9% of estimates showed a magnitude of difference greater than 0.5 logits.
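The final comparison described above amounts to pairing each person's two estimates and counting how often they differ by more than 0.5 logits. A minimal sketch follows, assuming the two sets of estimates are available as lists aligned by person. The function name and the example values are hypothetical and are not data from this study.

```python
from typing import Sequence


def share_exceeding(
    full_set: Sequence[float], pure_set: Sequence[float], threshold: float = 0.5
) -> float:
    """Proportion of persons whose two estimates differ by more than threshold.

    full_set holds person estimates (logits) from all items, including the
    biased ones; pure_set holds estimates from the DIF-free items only.
    """
    if len(full_set) != len(pure_set):
        raise ValueError("both lists must contain one estimate per person")
    if not full_set:
        raise ValueError("at least one person is required")
    differences = [abs(f - p) for f, p in zip(full_set, pure_set)]
    return sum(d > threshold for d in differences) / len(differences)


# Made-up estimates for four persons, for illustration only.
example_share = share_exceeding([0.0, 1.0, 2.0, -1.0], [0.1, 1.7, 2.0, -1.2])
```

In this made-up example one of the four persons differs by more than 0.5 logits, giving a share of 0.25; the study reports a corresponding figure of 7.9%.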

Throughout the analyses the level of reliability, as expressed by the person separation index, remained high and consistent with individual use. Although a floor effect was evident (fig 4), this is not necessarily a problem for a screening tool designed to identify increasing levels of risk. The RA–WIS has two cut points (at 10 and 17) to identify the transition from low to medium and high risk, and these are located well within the central part of the scale, at −0.27 and 1.23 logits, respectively.

Figure 4

Targeting of scale, showing distribution of persons and items on the same metric.


Many people with RA face a potentially devastating change in lifestyle if the disease is not controlled during the few months following onset. With the emergence of new treatments, particularly the effective but expensive tumour necrosis factor blocker therapies, the risk of job loss in RA may diminish. However, in order to evaluate the overall cost-effectiveness of these therapies a reliable measure sensitive to changes in work instability is essential. The English language version of the RA–WIS is the first questionnaire of its kind that is disease specific and developed to measure work instability. There are currently no equivalent questionnaires available in other languages that are able to assess the impact of this relatively common condition on the working lives of patients, and identify those most at risk of job retention problems.

The RA–WIS has been shown to be robust to adaptation into different languages. Generally, data fitted Rasch model expectations and strict tests of unidimensionality. DIF was found to be present for six items, but these appeared both to cancel out at the test level in the pooled data, and have only a marginal effect on the test score itself. Of some concern is the item “I get good days and bad days at work”, which showed a distinct lack of fit to model expectations in the French data. It is possible that this item was being considered in a context other than the effects of RA in the French setting, and this item will need to be reviewed.

There are a number of limitations to this study. The adaptation into different languages followed a standard protocol used for many scales,12 yet it could be argued that validity of the adapted versions could have been further enhanced by undertaking additional qualitative interviews in each country. It is possible that there are country-specific nuances to work instability that could have been identified under such an approach.

The identification of item bias (DIF) in the current study was kept to a simple level, and six items were identified as such, although they cancelled out at the test level. This means that whereas one item may have been more likely to be endorsed by the Dutch respondents, another would have been more likely to be endorsed by the French, and so on, so that the overall effects were neutral and had little effect upon person estimates. More detailed analysis of the potential causes of such bias was not included in the current study and, in some respects, culture was being used as a crude surrogate for a potentially wide variety of influences, for example socioeconomic status and educational level. These are a matter for further investigation.

Likewise, although the validity of the scale for screening purposes was originally determined by criterion validity,4 and its reliability in the original and adapted versions has been shown to be consistent with individual use, further work will need to be undertaken to assess issues such as predictive validity. While job loss is a concrete outcome, this usually takes several years and is beyond the scope of most studies. Studies to examine the predictive validity of the scale for sickness absence would provide a meaningful interim test.

The cross-cultural adaptation of the RA–WIS into the three European languages, as outlined in this report and other similar work that is continuing on other language versions, has the potential to help ensure that clinicians across Europe are able to support RA patients to achieve their potential in work through the early identification of those most at risk. Using the RA–WIS as a screening tool will facilitate appropriate referral for targeted therapy, and/or support such as ergonomic changes in the workplace to help those individuals most at risk cope better at work. As a research tool it may provide studies with a way of measuring this very important aspect of RA-related disability. This will be particularly important in studies that seek to introduce the major indirect costs associated with loss of work into the cost-effectiveness calculation of the tumour necrosis factor therapies. As further language versions of the RA–WIS are developed it will be possible to measure the impact of RA on the working lives of patients in a comparable manner in multinational trials or screening programmes across Europe.


The authors would like to thank the Standing Committee for Allied Health at EULAR for their funding of this project, all the translators and the following colleagues for their help and support of the work reported: Pr M Dougados, Cochin Hospital, Paris, France; Mrs H Breedveld, Department of Rheumatology, Leiden University Medical Center; and Jochen Wacker, University of Erlangen.



  • Funding The project was funded by the Standing Committee for Allied Health at EULAR.

  • Competing interests None.

