Objective: Correct application of the Kellgren and Lawrence (K&L) classification system is difficult due to inexact wording of the descriptors. We summarised different descriptions and searched for evidence on the impact of such variations on classification of knee osteoarthritis (OA) in epidemiological studies.
Methods: We searched Medline/Pubmed (1966 to August 2006) for studies of epidemiological cohorts that professed use of the original K&L scale (grades 0–4, with 0 being normal and 4 severe OA), and recorded their descriptions of the five grades. The descriptions were compared with each other and with the original description.
Results: We identified five different descriptions. In grade 2, often used as a cut-off to classify OA, one description replaced “definite osteophytes and possible narrowing of joint space” (K&L) by “definite osteophyte, unimpaired joint space”. Another description for grade 2 was “minimal osteophytes, possible narrowing, cysts, and sclerosis”. In some cohort studies, descriptions changed during follow-up. None of the included articles studied the impact of the use of different descriptions.
Conclusion: Major OA cohort studies disagree with each other, and some even disagree internally, on the definition and grading of disease according to the original K&L system. The impact of this disagreement warrants further study, but consensus on a single valid and feasible classification system is urgently needed.
The classification for osteoarthritis (OA) described by Kellgren and Lawrence (K&L) is the most widely used radiological classification to identify and grade OA. Kellgren and Lawrence defined OA in five grades (0, normal to 4, severe). The radiological signs found to be evidence for OA were combined to define a grading scale for severity. For the knee, important changes are: (a) formation of osteophytes on the joint margins or in ligamentous attachments, as on the tibial spines, (b) narrowing of joint space associated with sclerosis of subchondral bone, (c) cystic areas with sclerotic walls situated in the subchondral bone, and (d) altered shape of the bone ends.
The World Health Organization (WHO) adopted these criteria for the radiological classification of OA as the standard for epidemiological studies of this pathology.1 In the K&L article of 1957, no clarification was given as to how to interpret the grades;2 this clarification was given with the radiographs in the WHO atlas,1 in which the grades of eight joints, including the knee, were described.
Several published studies professed use of the original K&L criteria, yet their descriptions often differed from those published in the WHO atlas. Some studies have already pointed out differences in the written descriptions of the K&L criteria,3 noting that even Lawrence4 used other descriptions in a later article. Hart et al (1995) concluded that these different interpretations of the K&L criteria have led to inconsistent classification of OA across epidemiological studies.3 There is no complete overview of the different descriptions of the K&L criteria for knee OA, and the real impact of the alternative descriptions is not yet known.
This systematic appraisal provides information about how many different descriptions of the K&L criteria of knee OA are used, and what impact this might have on the classification of knee OA in epidemiological studies.
We were interested in epidemiological cohorts using the original K&L criteria. We performed a Medline/Pubmed (1966 to August 2006) search using the words “Kellgren”, “Lawrence”, “knee” and “osteoarthritis” to find articles in English, German, Dutch, Danish, Swedish and Norwegian. We excluded articles that (1) did not use an epidemiological cohort, (2) explicitly mentioned the use of modified K&L criteria, (3) used the K&L criteria without referring to Kellgren2 or the WHO atlas (Kellgren),1 or (4) described fewer than five grades.
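The eligibility step above amounts to a language filter followed by four exclusion rules. As a minimal sketch, assuming each retrieved article has already been coded by hand into a few yes/no fields (the `Article` record, its field names and the helper functions are hypothetical and not part of the review's protocol), the screening can be expressed as:

```python
from dataclasses import dataclass

# Languages accepted in the search, as listed in the methods.
ALLOWED_LANGUAGES = {"English", "German", "Dutch", "Danish", "Swedish", "Norwegian"}


@dataclass
class Article:
    """Hypothetical hand-coded summary of one retrieved article."""
    title: str
    language: str
    epidemiological_cohort: bool  # study is based on an epidemiological cohort
    explicitly_modified_kl: bool  # authors state they used a modified K&L scale
    cites_original_kl: bool       # cites Kellgren and Lawrence (1957) or the WHO atlas
    n_grades_described: int       # number of K&L grades whose wording is reported


def is_included(a: Article) -> bool:
    """Return True if the article passes the language filter and exclusions (1)-(4)."""
    if a.language not in ALLOWED_LANGUAGES:
        return False
    if not a.epidemiological_cohort:   # exclusion (1)
        return False
    if a.explicitly_modified_kl:       # exclusion (2)
        return False
    if not a.cites_original_kl:        # exclusion (3)
        return False
    if a.n_grades_described < 5:       # exclusion (4): fewer than five grades described
        return False
    return True


def screen(articles: list[Article]) -> list[Article]:
    """Keep only the articles that pass every criterion."""
    return [a for a in articles if is_included(a)]


if __name__ == "__main__":
    # Made-up records for illustration only; they are not articles from the review.
    demo = [
        Article("cohort, all five grades", "English", True, False, True, 5),
        Article("grade 2 only", "English", True, False, True, 1),
        Article("clinic sample", "Dutch", False, False, True, 5),
    ]
    for a in screen(demo):
        print(a.title)
```

Only the first demo record survives; the second is excluded under criterion (4), the kind of exclusion that also removed the Felson et al study discussed later.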
The extracted descriptions and the original description of the K&L criteria1 are listed in table 1. When a cohort's (alternative) description remained the same across publications, we refer to the most recent publication of that cohort. Finally, we summarise the results of studies on the impact of using different descriptions, where available.
The search resulted in 190 articles. A total of 44 articles gave a detailed description of the K&L criteria; of these, 18 studies used an epidemiological cohort.
We found five different descriptions (table 1). The Baltimore Longitudinal Study of Aging (BLSA)5 and the Southeast Michigan Cohort (SMC)6 used the original criteria.1 The Johnston County Osteoarthritis Project (JCOP, two articles),7 one other epidemiological study,8 and two of the three published studies of the Chingford Study (CS, two articles)9 used description A, which was the same as the description Lawrence gave in 1977.4 A third Chingford article10 used a description that corresponded most closely with description B, which was used in the Beijing Study (BS, one article)11 and the Framingham Osteoarthritis study.11 Description C was used in the Clearwater Osteoarthritis Study (COS, three articles),12 the Mechanical factors of Arthritis of the Knee study (MAK, three articles)13 and also in another article of the Framingham Osteoarthritis Study (FOS, one article).14 The last article15 used yet another alternative description (description D) of the K&L criteria.
The descriptions differ from one another; some agree on one grade but differ on others. The most important grade in the K&L criteria is grade 2, the cut-off point for definite OA according to K&L.2 The original description and descriptions B and C are identical for grade 2. Description A, used by the CS and the JCOP, specifies “unimpaired joint space” for grade 2, whereas the original allows possible joint space narrowing. Description D does not require definite osteophytes for grade 2.
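To make the grade 2 discrepancy concrete, the sketch below encodes only the grade 2 wording of the original description (shared by descriptions B and C) and of description A. The ordinal levels for osteophytes and joint space narrowing are a hypothetical simplification, and reading “possible narrowing” as “absent or uncertain, but not definite” is our assumption. The sketch does not reproduce the full scales in table 1; it only reports whether a knee's findings fit each grade 2 wording.

```python
from enum import IntEnum


class Osteophyte(IntEnum):
    """Simplified, hypothetical ordinal scale for osteophytes."""
    NONE = 0
    DOUBTFUL = 1
    DEFINITE = 2


class Narrowing(IntEnum):
    """Simplified, hypothetical ordinal scale for joint space narrowing."""
    NONE = 0      # unimpaired joint space
    POSSIBLE = 1
    DEFINITE = 2


def fits_grade2_original(ost: Osteophyte, jsn: Narrowing) -> bool:
    """Original (WHO atlas), B and C: definite osteophytes and possible narrowing."""
    # Assumed reading: "possible" narrowing may be absent or uncertain, but not definite.
    return ost == Osteophyte.DEFINITE and jsn <= Narrowing.POSSIBLE


def fits_grade2_description_a(ost: Osteophyte, jsn: Narrowing) -> bool:
    """Description A (CS, JCOP): definite osteophyte, unimpaired joint space."""
    return ost == Osteophyte.DEFINITE and jsn == Narrowing.NONE


if __name__ == "__main__":
    # Compare the two wordings for a knee with a definite osteophyte
    # at each level of joint space narrowing.
    for jsn in Narrowing:
        print(
            f"definite osteophyte, narrowing={jsn.name}: "
            f"original={fits_grade2_original(Osteophyte.DEFINITE, jsn)}, "
            f"A={fits_grade2_description_a(Osteophyte.DEFINITE, jsn)}"
        )
```

Under this reading, a knee with a definite osteophyte and possible narrowing fits the original grade 2 wording but not that of description A, so a reader applying description A must place it in some other grade; which one depends on the rest of description A.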
None of the included articles reported the impact of alternative descriptions of K&L criteria.
Despite 50 years of use of the K&L scale, OA investigators still disagree on the optimum format (in terms of validity and feasibility) for classifying and grading OA. Even with the advent of newer imaging technologies such as MRI, radiological classification will probably remain the diagnostic gold standard for knee OA in large epidemiological studies for many years to come. Agreement on the descriptions is therefore urgently needed.
We limited ourselves to the five descriptions used in epidemiological cohorts; several other descriptions were found in studies that did not use epidemiological cohorts. All included articles referred to the original descriptions,1 2 so we assume that their authors did not intend to modify the K&L descriptions, yet the descriptions they gave differed from the original. We also did not consider the many explicitly declared revisions of the K&L criteria, as these merit a separate study.
The use of different descriptions within a single cohort (FOS and CS) is remarkable. However, we cannot exclude that only the description reported in the article changed, and not the actual reading of the radiographs.
Another important issue is the knee position in which the radiographs are obtained. Only the most recent studies mentioned semi-flexed knees, which can yield a different K&L grade than fully extended knees. The K&L classification system does not specify the position of the knee.
The first article by Kellgren and Lawrence (1957)2 described osteophytes on the tibial spines; of the included studies, only Scott (1993)5 mentioned these, although not for K&L grading. It is unknown whether the other studies took them into account. There is little evidence that these osteophytes are important in OA, but the point merits further investigation.
One article, excluded by our criteria because it described only the K&L grade 2 used in its study, was nevertheless the only article reporting on the impact of using different descriptions.16 Felson et al (1995) compared a modified scale, which allows knees with isolated joint space narrowing to be classified as having possible OA (modified K&L grade 2), with the original description: definite osteophyte with possible joint space narrowing (K&L grade 2).16 One radiologist scored 50 radiographs with both scales. Agreement between the scales was good (κ = 0.76, p<0.001), but no knees were graded 2 on the basis of joint space narrowing alone. Felson et al (1995) started from the perspective that the lack of focus on joint space narrowing in the K&L criteria is a flaw;3 however, their own study did not support this.16 The lack of focus on joint space narrowing in the K&L criteria might also explain why so many different descriptions have been introduced.
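Agreement statistics of this kind correct the observed agreement for agreement expected by chance. Assuming the κ reported is the unweighted Cohen's kappa (the variant is not stated here), it is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the proportion of radiographs placed in the same category by both scales and p_e is the agreement expected from the marginal totals. A minimal sketch; the function name and the example counts are hypothetical and are not the data of Felson et al:

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] counts radiographs placed in category i under one scale
    and category j under the other.
    """
    k = len(table)
    if k == 0 or any(len(row) != k for row in table):
        raise ValueError("table must be a non-empty square matrix")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no observations")

    # Observed agreement: proportion of radiographs on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n

    # Chance agreement: sum over categories of (row share * column share).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Invented 2x2 table (OA yes/no under each scale) for 50 radiographs.
    demo = [[20, 5],
            [3, 22]]
    print(round(cohens_kappa(demo), 2))
```

For the invented table, p_o = 42/50 = 0.84 and p_e = 0.5, giving κ = 0.68. Because κ summarises agreement over all radiographs, a high value is compatible with the specific discordant pattern (grade 2 from narrowing alone) being rare or absent in the sample, as it was in this study.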
Based on this study,16 there is no evidence that alternative K&L descriptions change the diagnosis of knee OA. This is certainly not to be read as evidence that they do not change the diagnosis. On the contrary, such alternatives might be of major importance for our interpretation of study results. For instance, Schouten et al (1995) showed that the American College of Rheumatology (ACR) clinical criteria for knee OA in the traditional format (fulfilling three out of six criteria) yielded very different associations with age, obesity and meniscectomy than the same criteria in the decision tree format (the same criteria ordered by importance in an algorithm) (odds ratios 1.4 vs 4.6, 1.6 vs 3.5, and 3.9 vs 6.6, respectively).17
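The ACR example shows that the same exposure can have a different odds ratio when only the case definition changes, because the definition moves people between the cells of the 2×2 table. A minimal sketch with invented counts (not Schouten et al's data), using the cross-product odds ratio and, as an illustrative choice, a Woolf (log-based) 95% confidence interval; the function name is hypothetical:

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Odds ratio with a Woolf 95% confidence interval.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    All cells must be positive (no zero-cell correction is applied).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper


if __name__ == "__main__":
    # The same hypothetical 100 exposed and 100 unexposed people, two case definitions.
    # Definition 1 labels 30 exposed and 20 unexposed people as cases.
    print(odds_ratio(30, 70, 20, 80))
    # Definition 2 is stricter: 20 exposed and 6 unexposed people are cases.
    print(odds_ratio(20, 80, 6, 94))
```

Definition 1 gives an odds ratio of about 1.7 and definition 2 about 3.9 for the same people, purely because the case definition changed.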
To establish whether the alternative descriptions of the K&L criteria change prevalence estimates and cause major differences in associations with known risk factors, as in the example above, all large cohort studies should score the lesions of OA (eg, narrowing, osteophytes, sclerosis, cysts and deformity) as separate entities, as the guidelines recommend. The reliability and feasibility of the separate feature scores and of the descriptions should be documented. Investigators should then correlate the separate lesions with the different descriptions of the K&L criteria and compare the influence of the different descriptions on prevalence and on associations with known risk factors. A consensus process is probably necessary to create a single optimum classification score. This process should also address the atlas to be used for scoring separate lesions, the position of the knee, and whether osteophytes on the tibial spines are taken into account. Although mostly associated with outcome measurement, the Outcome Measures In Rheumatology (OMERACT) initiative may be an appropriate forum for such a process.
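Once each knee has been graded under two descriptions (for example, derived from the separately scored lesions), the comparison proposed above reduces to computing OA prevalence at the grade 2 cut-off under each description and counting the knees whose OA status differs. A minimal sketch; the function names and the example grades are hypothetical:

```python
from collections import Counter

OA_CUTOFF = 2  # K&L grade 2 or higher is classified as definite OA


def prevalence(grades: list[int], cutoff: int = OA_CUTOFF) -> float:
    """Proportion of knees classified as OA (grade >= cutoff)."""
    if not grades:
        raise ValueError("no grades supplied")
    return sum(g >= cutoff for g in grades) / len(grades)


def oa_status_crosstab(grades_x: list[int], grades_y: list[int],
                       cutoff: int = OA_CUTOFF) -> Counter:
    """Count knees by (OA under description X, OA under description Y)."""
    if len(grades_x) != len(grades_y):
        raise ValueError("both descriptions must grade the same knees")
    return Counter((gx >= cutoff, gy >= cutoff) for gx, gy in zip(grades_x, grades_y))


if __name__ == "__main__":
    # Invented grades for eight knees read under two descriptions.
    original = [0, 1, 2, 2, 3, 1, 2, 4]
    alternative = [0, 1, 2, 3, 3, 2, 1, 4]
    print(prevalence(original), prevalence(alternative))
    tab = oa_status_crosstab(original, alternative)
    print("discordant knees:", tab[(True, False)] + tab[(False, True)])
```

In this invented example both descriptions give the same prevalence (5 of 8 knees) yet disagree on two knees. Knee-level cross-tabulation, not prevalence alone, is therefore needed before comparing associations with risk factors across descriptions.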
Competing interests: None declared.