OBJECTIVES To determine the validity of the histological-histochemical grading system (HHGS) for osteoarthritic (OA) articular cartilage.
METHODS Human articular cartilage was obtained from macroscopically normal (n = 13) and OA (n = 21) knee joints. Sections of central and peripheral regions of normal samples were produced. Sections of regions containing severe, moderate, and mild OA changes were produced from each OA sample. A total of 89 sections were graded by means of the HHGS (0–14) twice by three observers.
RESULTS Average scores for regions designated severe (8.64) and moderate (5.83) OA were lower than expected according to the HHGS (10–14 and 6–9, respectively), whereas average scores for the region designated mild OA (5.29) and for the central and peripheral regions of normal cartilage (2.19) were higher than expected (2–5 and 0–1, respectively). The HHGS was capable of differentiating between articular cartilage from macroscopically normal and OA joints and between the region designated severe OA and the other regions. However, the HHGS did not adequately differentiate between regions designated mild and moderate OA. Values for sensitivity, specificity, and efficiency varied considerably between regions.
CONCLUSION The HHGS is valid for normal and severe OA cartilage, but does not permit distinction between mild and moderate OA changes in articular cartilage.
The histopathology of osteoarthritis (OA) is generally graded using the histological-histochemical grading system (HHGS) proposed by Mankin et al 1 or a number of histopathological grading systems that are modifications of the original system.2-7 The HHGS was initially developed for the grading of OA in human articular cartilage1 and has been used extensively in human studies.3 6-24 More recently, the HHGS and modifications thereof have also become extensively used for the grading of articular cartilage from animal models of OA.2 4 5 25-29
We have previously investigated the intraobserver and interobserver reproducibilities of the HHGS based on human articular cartilage30 and concluded that they were inadequate. Similar values regarding the intraobserver and interobserver reproducibilities of the HHGS have been reported in a study on materials from an experimental animal model of OA.31 The results of our previous study also indicated that the validity of the HHGS was questionable. It is of paramount importance to determine if the HHGS is valid. A valid grading system with inadequate intraobserver and interobserver reproducibilities could and should be improved upon. However, a grading system that is not valid should be disregarded altogether and replaced by a completely new grading system.
To evaluate the validity of a histopathological grading system, the results obtained with it should be compared with the results of a system already known to be valid. Unfortunately, no valid grading system capable of serving as “the gold standard” exists for OA articular cartilage at the histological level.
Therefore, the purpose of this study was to determine the validity of the original HHGS using a material that has been sampled in such a way that the macroscopic description of the articular cartilage may function as “the gold standard”.
Samples of macroscopically normal articular cartilage (Collins and McElligott grade 0)32 from the knee joint were obtained at the time of necropsy after sudden deaths. Three cases were women (median age 57 years, range 54–70) and 10 were men (median age 29 years, range 16–76). None of the subjects had a clinical history of inflammatory or non-inflammatory joint disease or chronic systemic inflammatory disease. Full thickness samples consisting of both articular cartilage and adjacent subchondral bone were taken across the medial or lateral femoral condyle or the medial or lateral tibial plateau in a central to peripheral fashion. Hence, each sample encompassed both central and peripheral regions of articular cartilage.
Samples of cartilage from the medial or lateral femoral condyle or the medial or lateral tibial plateau were obtained from 21 patients undergoing replacement surgery for OA of the knee. Eleven cases were women (median age 72 years, range 46–89) and 10 were men (median age 67 years, range 55–87). According to Collins and McElligott32 the overall grades of OA of the knee joints were III or IV. Full thickness samples consisting of both residual articular cartilage and adjacent subchondral bone included a region of denuded bone and a shoulder of residual cartilage ending in a region of macroscopically intact articular cartilage. The region of the sample containing the shoulder of residual cartilage thus represented the most advanced and severe OA cartilage changes next to the area of denuded bone; mild OA changes next to the region of macroscopically intact cartilage; and moderate OA changes in between the regions designated severe and mild OA changes (fig 1). Some femoral condyles and tibial plateaus contained peripherally sited osteophytes whose cartilage covering could be differentiated from the roughened, creamy-yellow residual articular cartilage. Care was taken not to include samples from these osteophytic areas.
Samples were fixed in 10% buffered formalin and decalcified in a 14% solution of EDTA (Sigma, St Louis, MO, United States) at 40°C. Decalcification was controlled by radiography to minimise time in EDTA. After decalcification, samples were processed into paraffin wax. Samples were cut in 5 μm thick sections and mounted on SuperFrost/Plus glass slides (Eire Scientific, Portsmouth, NH, United States). Two sections were cut from each of the samples of articular cartilage from macroscopically normal joints producing a total of 26 sections. Three sections were cut from each of the samples of residual articular cartilage from OA joints producing a total of 63 sections.
Sections were deparaffinised and stepwise incubated in 96–62% alcohol. The first staining was accomplished with Weigert’s acid iron chloride haematoxylin for six minutes. After washing with water, 1:5000 aqueous fast green was applied for three minutes followed by washing in 1% acetic acid and staining with 0.1% safranin-O in water for six minutes.33 The sections were dehydrated with alcohol and mounted in Eukitt (O Kindler, Freiburg, Germany). All sections were deparaffinised and stained in one batch.
The sections of macroscopically normal cartilage were covered by non-translucent tape so that only the central or the peripheral region of the articular cartilage of the femoral condyle or the tibial plateau was available for microscopic examination. Care was taken not to include the outermost periphery of the articular cartilage as this region is known to become more fibrocartilaginous in nature with little or no potential for safranin-O staining. In this way 13 matching pairs of sections with peripheral and central articular cartilage were produced. According to the HHGS (table 1) the expected score for normal articular cartilage is 0 points. However, to allow for minor variations the expected range of scores for normal articular cartilage is generally set at 0–1 points.34
The sections of OA cartilage were also covered by non-translucent tape so that only the region designated severe, moderate, or mild OA changes, respectively, was available in each section for microscopic examination. Using the regions of denuded bone and macroscopically intact residual cartilage as fixed points, the shoulder of non-intact residual cartilage was divided into thirds representing the regions designated severe, moderate, and mild OA changes (fig 1) producing 21 matching triplets of sections. According to the HHGS (table 1) the expected ranges of scores for mild, moderate, and severe OA changes in articular cartilage are 2–5, 6–9, and 10–14 points, respectively.8 34
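The expected score ranges above amount to a simple lookup from a whole-number HHGS total to the category it is expected to represent. The sketch below encodes only the ranges stated in the text; the names `EXPECTED_RANGES`, `expected_category`, and `matches_designation` are hypothetical and chosen for illustration.

```python
# Expected HHGS total-score ranges (inclusive, whole-number totals) as stated in the text.
EXPECTED_RANGES = {
    "normal": (0, 1),
    "mild": (2, 5),
    "moderate": (6, 9),
    "severe": (10, 14),
}


def expected_category(total_score: int) -> str:
    """Return the category whose expected range contains a whole-number HHGS total."""
    if not isinstance(total_score, int) or not 0 <= total_score <= 14:
        raise ValueError("HHGS total score must be a whole number from 0 to 14")
    for category, (low, high) in EXPECTED_RANGES.items():
        if low <= total_score <= high:
            return category
    raise AssertionError("unreachable: the ranges cover 0-14 without gaps")


def matches_designation(designated: str, total_score: int) -> bool:
    """True if a section's total falls in the expected range for its macroscopic designation."""
    return expected_category(total_score) == designated


# A section designated severe OA that receives 8 points falls in the moderate range.
print(expected_category(8), matches_designation("severe", 8))  # moderate False
```

Because each examination yields an integer total, the sketch deliberately rejects non-integer input rather than guessing how averaged scores should be binned.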
All sections were microscopically examined twice with an interval of at least one week by a specialist in rheumatology (A), a specialist in pathology without any special experience with articular cartilage (general pathologist) (B), and a specialist in pathology with a special interest in articular cartilage (osteoarticular pathologist) (C). All observers were experienced with the use of the HHGS and were the same observers participating in a previous study.30 However, to promote standardisation, each observer received both oral and written instruction from the same instructor regarding the HHGS before the examinations. The observers were blinded with respect to the macroscopic description, and the sections were presented in random order. The observers were asked to view and score each section with respect to the categories and subcategories shown in the original work by Mankin et al 1 (table 1). The observers noted the score for each of the four categories “structure”, “cells”, “safranin-O staining”, and “tidemark integrity”, and the total score for each section. All sections were examined with standard light microscopes (Orthoplan) at magnifications from × 100 to × 400.
The sensitivity, specificity, and efficiency of the HHGS were estimated as described by Collan35 and Paik et al.36 In brief, the macroscopic description and expected histological-histochemical score ranges served as the validating test (“the gold standard”) against which the results obtained with the HHGS were compared. The sensitivity was calculated as the number of true positive scores divided by the sum of true positive and false negative scores. The specificity was calculated as the number of true negative scores divided by the sum of true negative and false positive scores. The efficiency was calculated as the sum of true positive and true negative scores divided by the sum of true positive, true negative, false positive, and false negative scores. The Wilcoxon signed rank test and the Mann-Whitney U test were used to test the null hypothesis that there was no systematic difference between scores given to the central and peripheral regions of normal articular cartilage and regions of severe, moderate, and mild OA changes. Statistics used to analyse the reproducibility of a single measurement method and to compare measurements by more than one observer were based on graphical techniques and calculations as described by Bland and Altman37 and used in previous studies regarding the HHGS.30 31 Reproducibility refers to the percentage of agreement or consistency between multiple observations of the same units of observation.36 The F test was used to test the null hypothesis that there was no systematic difference between the intraobserver variabilities. The limit of significance was set at 0.05.
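The three definitions above translate directly into arithmetic on four counts. How each section is counted is not spelled out in the text, so this sketch assumes, for illustration, that for a given region a section is "positive" by the gold standard when it carries that macroscopic designation and "positive" by the HHGS when its whole-number total falls in that region's expected range. All function names and the example data are hypothetical, not taken from this study.

```python
from dataclasses import dataclass

# Expected whole-number HHGS ranges from the methods (inclusive).
EXPECTED_RANGES = {"normal": (0, 1), "mild": (2, 5), "moderate": (6, 9), "severe": (10, 14)}


@dataclass
class Counts:
    tp: int = 0  # designated region R, score inside R's expected range
    fn: int = 0  # designated region R, score outside R's range
    tn: int = 0  # designated elsewhere, score outside R's range
    fp: int = 0  # designated elsewhere, score inside R's range


def count_for_region(region, designations, scores):
    """Tally agreement between macroscopic designation (gold standard) and HHGS score."""
    low, high = EXPECTED_RANGES[region]
    c = Counts()
    for designated, score in zip(designations, scores):
        truth = designated == region
        test = low <= score <= high
        if truth and test:
            c.tp += 1
        elif truth:
            c.fn += 1
        elif test:
            c.fp += 1
        else:
            c.tn += 1
    return c


def sensitivity(c):  # TP / (TP + FN)
    return c.tp / (c.tp + c.fn)


def specificity(c):  # TN / (TN + FP)
    return c.tn / (c.tn + c.fp)


def efficiency(c):  # (TP + TN) / (TP + TN + FP + FN)
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


# Hypothetical example: five sections, not data from this study.
designations = ["normal", "normal", "severe", "severe", "mild"]
scores = [1, 3, 8, 11, 4]
severe = count_for_region("severe", designations, scores)
print(sensitivity(severe), specificity(severe), efficiency(severe))  # 0.5 1.0 0.8
```

The functions raise `ZeroDivisionError` if no section carries (or lacks) the designation in question, which is left unhandled here to keep the definitions visible.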
This study was approved by regional ethical committees.
The mean histological-histochemical score, the standard deviation, and the range of all three observers (including the total scores for both the first and second examination) for articular cartilage from macroscopically normal (central and peripheral regions) and OA (regions designated severe, moderate, and mild OA changes) joints are shown in table 2.
The scores for the region designated severe OA changes are significantly higher than the scores for the regions designated moderate and mild OA changes (Wilcoxon signed rank test, p < 0.05) and the central and peripheral regions of normal articular cartilage (Mann-Whitney U test, p < 0.05). The scores for the regions designated moderate and mild OA changes are significantly higher than the scores for the central and peripheral regions of normal articular cartilage (Mann-Whitney U test, p < 0.05). The scores for the region designated moderate OA changes are not significantly different from the scores for the region designated mild OA changes (Wilcoxon signed rank test, p > 0.05). The scores for the central region are not significantly different from the scores for the peripheral region of normal articular cartilage (Wilcoxon signed rank test, p > 0.05).
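The choice of test follows the sampling design: the severe, moderate, and mild regions come from the same OA sample (and the central and peripheral regions from the same normal sample), so those comparisons are paired, whereas normal and OA regions come from different joints and are compared as independent groups. The sketch below computes only the two test statistics with the standard library, to show the paired versus unpaired logic; in practice `scipy.stats.wilcoxon` and `scipy.stats.mannwhitneyu` return a statistic together with a p-value. Function names and scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_w_plus(first, second):
    """Paired design: sum of ranks of positive differences (zero differences dropped)."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


def mann_whitney_u(x, y):
    """Unpaired design: count pairs where x exceeds y, with ties counting one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Hypothetical scores: paired regions from the same OA samples, and two independent groups.
severe = [8, 7, 9, 6]
moderate = [5, 7, 6, 7]
print(wilcoxon_w_plus(severe, moderate))     # 5.0
print(mann_whitney_u([3, 5, 8], [1, 2, 5]))  # 7.5
```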
Table 3 shows the median histological-histochemical score, the interquartile range, and the range for each of the three observers (the total score for the first examination) for articular cartilage from macroscopically normal (central and peripheral regions) and OA (regions designated severe, moderate, and mild OA changes) joints. The range of scores for all three observers for the region designated severe OA changes is from 2 to 14. The range of scores for all three observers for the regions designated normal is from 0 to 6.
Using the macroscopic description and corresponding expected scores as the validating result (“the gold standard”), the sensitivity, specificity, and efficiency of the HHGS are calculated based on the average score of all examinations (table 4). The specificities of the HHGS for the regions designated severe OA changes and normal are both 98%. The sensitivities for the same regions are 19% and 42%, respectively. The specificities and sensitivities of the HHGS for the regions designated moderate and mild OA changes range from 57% to 71%.
INTRAOBSERVER AND INTEROBSERVER REPRODUCIBILITIES
Table 5 shows the exact intraobserver and interobserver reproducibilities (no difference between the first and the second total score for each observer and between the first total score of one observer and the first total score of another observer), the intraobserver and interobserver reproducibilities within two score points, the average difference between the first and the second total score for each observer and between the first total score of one observer and the first total score of another observer, the standard deviation, the range, and the score points equivalent to 95% of differences for all observers.
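The quantities listed for table 5 can all be derived from the paired differences between two sets of total scores (first versus second examination for one observer, or one observer versus another). In this sketch, "within two score points" is read as an absolute difference of at most 2, and the interval covering 95% of differences is assumed to be the Bland-Altman limits of agreement, mean difference ± 1.96 SD; the function name and the scores are hypothetical.

```python
import statistics


def agreement_summary(first, second):
    """Summarise agreement between two sets of HHGS total scores for the same sections."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long score lists with at least two sections")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    return {
        "exact": sum(d == 0 for d in diffs) / n,            # identical totals
        "within_two": sum(abs(d) <= 2 for d in diffs) / n,  # totals differ by 0-2 points
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "range": (min(diffs), max(diffs)),
        # Assumed Bland-Altman limits of agreement: about 95% of differences lie inside.
        "limits_95": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
    }


# Hypothetical first and second readings of four sections by one observer.
summary = agreement_summary([2, 5, 8, 10], [2, 4, 9, 13])
print(summary["exact"], summary["within_two"], summary["mean_diff"])  # 0.25 0.75 -0.75
```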
Exact intraobserver agreement on the total score occurred even though, in 19% of these cases, the scores for the categories “structure”, “cells”, “safranin-O staining”, and “tidemark integrity” differed between the first and second examination. Similarly, exact interobserver agreement on the total score occurred even though, in 36% of these cases, the category scores differed between observers.
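Because the HHGS total is the sum of four category scores, disagreements in opposite directions can cancel, so identical totals do not imply identical category scores. The sketch below illustrates this and computes the share of exact total-score agreements that conceal category-level differences. The category maxima (structure 6, cells 3, safranin-O staining 4, tidemark integrity 1) are those of Mankin's original 0–14 system and are assumed to match table 1; the dictionary keys, function names, and example scores are hypothetical.

```python
# Maximum points per category in Mankin's HHGS (sums to 14); assumed to match table 1.
CATEGORY_MAX = {"structure": 6, "cells": 3, "safranin_o": 4, "tidemark": 1}


def total(scores):
    """Total HHGS score after checking each category score is within its allowed range."""
    for category, maximum in CATEGORY_MAX.items():
        if not 0 <= scores[category] <= maximum:
            raise ValueError(f"{category} score out of range")
    return sum(scores[c] for c in CATEGORY_MAX)


def masked_disagreement_rate(pairs):
    """Among pairs with identical totals, the fraction whose category scores differ."""
    same_total = [(a, b) for a, b in pairs if total(a) == total(b)]
    if not same_total:
        return 0.0
    differing = sum(any(a[c] != b[c] for c in CATEGORY_MAX) for a, b in same_total)
    return differing / len(same_total)


# Hypothetical readings of one section: both total 6, but structure and cells trade a point.
first = {"structure": 3, "cells": 1, "safranin_o": 2, "tidemark": 0}
second = {"structure": 2, "cells": 2, "safranin_o": 2, "tidemark": 0}
print(total(first), total(second))                                  # 6 6
print(masked_disagreement_rate([(first, second), (first, first)]))  # 0.5
```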
The intraobserver variability of observer A was significantly larger than that of observer B (F test, p < 0.01), but not significantly different from that of observer C (F test, p > 0.05). The intraobserver variability of observer C was significantly larger than that of observer B (F test, p < 0.05).
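The F test compares the variances of two sets of differences. A minimal sketch, assuming that each observer's intraobserver variability is summarised by the sample variance of their first-minus-second score differences and that the larger variance is placed in the numerator; the function name and data are hypothetical. The statistic would then be referred to an F distribution with the returned degrees of freedom, for example via `scipy.stats.f.sf`, doubling the upper-tail probability for a two-sided test.

```python
import statistics


def variance_ratio(diffs_a, diffs_b):
    """F statistic for two sets of score differences, larger sample variance on top.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    var_a = statistics.variance(diffs_a)
    var_b = statistics.variance(diffs_b)
    if var_a >= var_b:
        return var_a / var_b, len(diffs_a) - 1, len(diffs_b) - 1
    return var_b / var_a, len(diffs_b) - 1, len(diffs_a) - 1


# Hypothetical first-minus-second differences for two observers on four sections.
f_stat, df_num, df_den = variance_ratio([0, 1, -1, -3], [0, 0, 1, -1])
print(round(f_stat, 3), df_num, df_den)  # 4.375 3 3
```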
The articular cartilage from macroscopically OA joints was obtained from joints graded III or IV according to the Collins and McElligott system32 and hence represented end-stage disease. We have assumed that the regions designated severe, moderate, and mild OA changes in this study are indeed representative of severe, moderate, and mild OA changes, respectively, and hence may be assigned an expected range of scores according to the HHGS. It is possible that the OA process in the regions designated severe, moderate, and mild OA changes of the residual cartilage of grade III and IV joints may be different from the OA process in the residual cartilage of grade I and II joints. However, we have favoured a material that could be sampled and further divided into regions in a reproducible fashion. In addition, we expect a histopathological grading system to be valid for a range of OA changes both in different joints and within a single joint.
The samples of articular cartilage were obtained from knee joints in this study whereas in the previous study30 the samples were all obtained from femoral heads. Furthermore, the median age was lower for the macroscopically normal group as compared with the macroscopically OA group as a necessary consequence of including strictly macroscopically normal joints. These differences in joint types and age groups should of course not influence the performance of a valid histopathological grading system.
Tissue fixation and decalcification could reduce proteoglycan content and thereby influence one aspect of the HHGS, namely the category “safranin-O staining”. The formalin fixation and EDTA decalcification used in the processing of the tissues in our study seem to preserve proteoglycan content.38 Other fixation and decalcification techniques may, however, affect proteoglycan content and staining and thereby influence the reproducibility of the HHGS score.
The results demonstrate that the HHGS is capable of differentiating between articular cartilage from macroscopically normal and OA joints. Furthermore, a total score of less than 2 points and a total score of 10 points and above carry a specificity of 98% for representing macroscopically normal articular cartilage and macroscopically severe OA articular cartilage, respectively. In other words, the false positive rate of a total score of less than 2 points and a total score of 10 points and above is only 2%.
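The step from a specificity of 98% to a false positive rate of 2% follows from the definitions in the statistical methods, since both quantities are computed from the sections that, by the macroscopic gold standard, do not belong to the region in question:

$$
\text{false positive rate} = \frac{FP}{FP + TN} = 1 - \frac{TN}{TN + FP} = 1 - \text{specificity} = 1 - 0.98 = 0.02
$$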
However, the grading index ranging from 0 to 14 score points is not used as intended; the average score of articular cartilage from macroscopically normal joints is more than 2 and not 0–1 as expected according to the HHGS (resulting in a sensitivity for the central and peripheral regions of normal articular cartilage of only 42%). Similarly, the average score of sections of articular cartilage from macroscopically OA joints designated severe OA changes is less than 9 and not 10–14 as expected (resulting in a sensitivity for the region of severe OA changes of only 19%).
It is even more disturbing that the HHGS does not adequately differentiate between regions designated moderate and mild OA changes and that the values for sensitivity, specificity, and efficiency for the regions designated moderate and mild OA changes are unacceptably low. The real virtue of a histopathological grading system for OA articular cartilage is in its capacity to differentiate between normal articular cartilage and the range from mild to moderate OA changes. This is particularly relevant in experimental models of OA.31
Previous results indicated that the total scores of cartilage from normal femoral heads varied according to topography.30 In the present study, there were no statistically significant differences between scores given to central versus peripheral areas of articular cartilage from macroscopically normal knee joints. However, only central and peripheral sampling sites were used in this study whereas 12 sampling sites were selected from the femoral head in the previous study and, furthermore, in this study care was taken not to sample from the outermost periphery of articular cartilage. In contrast with the findings of the previous study,30 the category “tidemark integrity” was used by all observers in this study.
The intraobserver and interobserver reproducibilities of the HHGS are low, and the intraobserver and interobserver variations in total scores are wide. Interestingly, the values for both intraobserver and interobserver reproducibility are similar to those of the previous study30 even though the observers should now be more familiar with the HHGS. Therefore, increased training does not seem to improve the reproducibility of the system.
Reasons for the weak intraobserver and interobserver reproducibilities and the low sensitivity, specificity, and efficiency of the HHGS have previously been discussed.30 Requirements for a reliable histopathological grading system were also discussed. It was emphasised that a histopathological grading system must acknowledge that OA encompasses the opposing processes of degeneration/destruction and regeneration/repair. Assessment variables representative of each process would need to be included and weighted appropriately in a numerical grading system. To accomplish a valid weighting of histopathological assessment variables, detailed knowledge regarding both the temporal and the spatial interrelation between the assessment variables is required. This level of knowledge is not yet available, and as OA is a multifactorial condition with a complex pathogenesis, a simple linear progression is unlikely. Therefore, the development of a valid numerical score attached to any histopathological grading system for OA articular cartilage seems to be an unrealistic task for the time being.
However, recognising its weaknesses does not necessarily make the HHGS redundant. It remains useful as a system for systematic assessment of articular cartilage; the categories included in the system encompass highly relevant histological and histochemical variables. In view of the lack of validity of the HHGS in the scoring of mild to moderate OA along with the inadequate reproducibility of the system, we suggest that, while the HHGS remains the standard in histopathological assessment of OA articular cartilage, the assignment of scores be omitted so that the system becomes purely descriptive.
Furthermore, to generate comparable and useful research data on articular cartilage in different laboratories, we suggest that careful standardisation of materials and methods is a necessity. In particular, sampling procedures need to be standardised. A macroscopic description of the status of the articular cartilage, bone, and synovial membrane of the source joint should be included; the Collins and McElligott system32 could be further developed and serve as a useful approach. In addition, the sampling sites for different source joints should be standardised according to topography to secure reproducibility.
In conclusion, numerical values generated by the HHGS are not adequately valid in grading of OA. A new approach is needed for the sampling, description, and grading of joint materials in general and articular cartilage in particular.
We gratefully acknowledge Professor H J Mankin, Orthopaedic Service, Massachusetts General Hospital, Massachusetts, USA, Professor D L Gardner, Department of Pathology, University of Edinburgh, Edinburgh, Scotland, Chief Physician V Andersen and Professor G Bendixen, Department of Medicine, National University Hospital/Rigshospitalet, Copenhagen, Denmark for their critical review of the manuscript and Associate Professor L Theil Skovgaard, Department of Biostatistics, Faculty of Health Sciences, the Panum Institute, University of Copenhagen, Copenhagen, Denmark for her helpful advice regarding statistical methods. The skilful technical assistance of V Weibull is highly appreciated. The Michaelsen Foundation, the Danish Rheumatism Association, the Danish Medical Research Council, the Danish Biotechnology Program, and The Arthritis and Rheumatism Council (United Kingdom) are acknowledged for financial support.