Machine-learning, MRI bone shape and important clinical outcomes in osteoarthritis: data from the Osteoarthritis Initiative

Objectives Osteoarthritis (OA) structural status is imperfectly classified using radiographic assessment. Statistical shape modelling (SSM), a form of machine-learning, provides precise quantification of a characteristic 3D OA bone shape. We aimed to determine the benefits of this novel measure of OA status for assessing risks of clinically important outcomes.
Methods The study used 4796 individuals from the Osteoarthritis Initiative cohort. SSM-derived femur bone shape (B-score) was measured from all 9433 baseline knee MRIs. We examined the relationship between B-score, radiographic Kellgren-Lawrence grade (KLG) and current and future pain and function as well as total knee replacement (TKR) up to 8 years.
Results B-score repeatability supported 40 discrete grades. KLG and B-score were both associated with risk of current and future pain, functional limitation and TKR; logistic regression curves were similar. However, each KLG included a wide range of B-scores. For example, for KLG3, risk of pain was 34.4 (95% CI 31.7 to 37.0)%, but B-scores within KLG3 knees ranged from 0 to 6; for B-score 0, risk was 17.0 (16.1 to 17.9)% while for B-score 6, it was 52.1 (48.8 to 55.4)%. For TKR, KLG3 risk was 15.3 (13.3 to 17.3)%; while B-score 0 had negligible risk, B-score 6 risk was 35.6 (31.8 to 39.6)%. Age, sex and body mass index had negligible effects on association between B-score and symptoms.
Conclusions B-score provides reader-independent quantification using a single time-point, providing unambiguous OA status with defined clinical risks across the whole range of disease including pre-radiographic OA. B-score heralds a step-change in OA stratification for interventions and improved personalised assessment, analogous to the T-score in osteoporosis.

Proportions of knees recorded as KL grades 0, 1, 2, 3 and 4 for 20 bins of B-score. Note that measurement repeatability supports the use of 40 categories; we have used 20 here to ensure that outer bins contain sufficient numbers. Data are graphically represented in Supplementary Figure S3.

Error bars show 95% confidence intervals for each measure. Moderate or greater pain was defined as WOMAC pain ≥4 on the 10-unit scale (black points); severe pain as WOMAC pain ≥8 (grey points). Limits of Non-OA group B-scores are shown using a dotted line and greyed area.

Definition of variables and assessment of confounders
All data from the Osteoarthritis Initiative (OAI) that were utilised in this study are publicly available at https://data-archive.nimh.nih.gov/oai.
For each outcome assessed, we evaluated the influence of covariates (both confounders and competing exposures) chosen a priori from previously established clinical relationships. Given the large sample size, both the statistical significance and the size of the estimates were considered. The covariates considered and adjusted for in the regression models were age, sex, BMI, ethnicity, previous knee surgery, alignment, NSAID use and smoking status, as described in more detail below.
Covariates were coded as recorded by the OAI. Age was modelled as a continuous variable in years, sex as a binary variable (male or female) and BMI as a continuous variable in kg/m². Ethnicity was categorised as White or Caucasian, Black or African-American, Asian, or Other Non-white.
Previous knee surgery was modelled as a binary variable, coded as zero if the participant had no history of previous surgery and one if they reported any previous knee surgery. In the OAI, previous knee surgery was defined as "history of knee surgery (including arthroscopy, ligament repair, and meniscectomy)". Alignment was measured using a goniometer and modelled as a continuous variable in degrees. NSAID use was modelled as a binary variable (yes or no), defined as any use of prescription or non-prescription NSAIDs (e.g., Ibuprofen, Diclofenac, Aspirin…) for joint pain or arthritis on more than half the days of the past 30 days. Smoking status was modelled as a categorical variable with 3 levels (never, current and former).
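The coding scheme described above can be illustrated with a minimal Python sketch. The field names, the short level codes and the choice of reference level for the categorical variables are assumptions made for illustration; they are not OAI variable names and the source does not state which reference levels were used.

```python
# Hypothetical level codes (not OAI names). The first entry of each list is
# treated as the reference level, which is an assumption of this sketch.
ETHNICITY_LEVELS = ["white", "black", "asian", "other_non_white"]
SMOKING_LEVELS = ["never", "current", "former"]


def dummy_code(value, levels, prefix):
    """Dummy-code a categorical value, dropping the first (reference) level."""
    if value not in levels:
        raise ValueError(f"unexpected {prefix} value: {value!r}")
    return {f"{prefix}_{level}": int(value == level) for level in levels[1:]}


def encode_covariates(record):
    """Turn one participant record into a row of model covariates."""
    row = {
        "age_years": float(record["age_years"]),          # continuous
        "sex_male": int(record["sex"] == "male"),         # binary
        "bmi_kg_m2": float(record["bmi_kg_m2"]),          # continuous
        "previous_knee_surgery": int(bool(record["previous_knee_surgery"])),
        "alignment_deg": float(record["alignment_deg"]),  # continuous
        "nsaid_use": int(bool(record["nsaid_use"])),      # binary
    }
    row.update(dummy_code(record["ethnicity"], ETHNICITY_LEVELS, "ethnicity"))
    row.update(dummy_code(record["smoking"], SMOKING_LEVELS, "smoking"))
    return row


# Made-up participant, for illustration only.
example = encode_covariates({
    "age_years": 61, "sex": "female", "bmi_kg_m2": 28.4,
    "previous_knee_surgery": True, "alignment_deg": -1.5,
    "nsaid_use": False, "ethnicity": "white", "smoking": "former",
})
```

Dropping one level per categorical variable avoids collinearity with the intercept in a regression model; any other reference level would serve equally well.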
The variables considered for the regression models were based on a priori relationships with the outcomes. For TKR, for example, we considered clinically important risk factors such as age, gender, weight, and pain, which may influence the surgeon's decision to operate. We also considered whether health insurance could affect the outcome, with participants potentially not offered a TKR for financial reasons; however, on exploration of the data we found that 98% of participants who had a TKR had some form of health insurance, while 96% of those not having a TKR had insurance.
Supplemental material. BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).

Tests for interactions
Interactions, including those involving age, were considered during an initial analysis. However, comparison of the univariable and adjusted models showed that the odds ratios represented only small effects after adjustment, so a parsimonious final model was chosen that excluded interactions.

Statistical Shape Modelling
Femur bones were automatically segmented from DESS-WE images using active appearance models (AAMs), a type of SSM trained to search images, provided by Imorphics. The construction of an AAM parameterises femur bone shape using principal component analysis. Each time a femur bone shape is identified within an image using an AAM, the shape is returned as a set of principal components.

OA Vector
Using the principal components from the AAM, we calculated the mean shape from two populations:
1. The "Non-OA group", being the group of all knees with a KLG0 radiograph reading at 0, 1, 2 and 4 years in the OAI (n = 885), regardless of sex.
2. The "OA group", being the group of all knees with KLG ≥2 at 0, 1, 2 and 4 years (n = 1,713), regardless of sex.
There is no risk of over-training any subsequent models using 2,597 knees, as the only information taken from these populations of knees was the mean shape of the two groups.
An "OA vector" was defined as the line passing through the mean shape of the Non-OA group and the mean shape of the OA group (Supplementary Figure S4).
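As an illustration of this construction, the sketch below computes the two group mean shapes and the unit direction of the line joining them. It assumes each shape is a plain list of principal component scores; the function names and the two-component toy data are hypothetical and are not taken from the study software.

```python
import math


def mean_shape(shapes):
    """Component-wise mean of a list of equal-length shape vectors."""
    n = len(shapes)
    return [sum(column) / n for column in zip(*shapes)]


def oa_vector(non_oa_shapes, oa_shapes):
    """Return (point on the line, unit direction) for the OA vector.

    The line passes through the Non-OA mean shape and the OA mean shape;
    the direction points from the Non-OA mean toward the OA mean.
    """
    start = mean_shape(non_oa_shapes)
    end = mean_shape(oa_shapes)
    diff = [e - s for s, e in zip(start, end)]
    length = math.sqrt(sum(d * d for d in diff))
    if length == 0:
        raise ValueError("group means coincide; direction is undefined")
    return start, [d / length for d in diff]


# Toy data with two principal components per knee (made up).
non_oa = [[0.0, 0.0], [2.0, 0.0]]   # mean is (1, 0)
oa = [[3.0, 4.0], [5.0, 4.0]]       # mean is (4, 4)
start, direction = oa_vector(non_oa, oa)
```

Only the two group means enter this calculation, which is why no individual knee can be over-fitted by it.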

B-Score and sex
Each parameterized femur bone shape was projected orthogonally onto the OA vector to provide a distance along the OA vector. This distance was then normalised as follows: the origin (B-score of 0) was defined as the mean shape of the Non-OA Group for each sex.
Means were determined separately for males and females (although the OA vector is constructed using both sexes). Preparing entirely separate models for each sex did not improve classification of OA vs Non-OA or sensitivity to change, and the logistic regression models for pain, function and TKR were indistinguishable from those using a vector containing all males and females (data not shown). As a result, a single vector combining the sexes was used for this study, with the origin corrected separately for males and females. Scale is defined as 1 standard deviation of the distribution of the Non-OA Group along the OA vector (with positive direction being toward the OA Group).

Figure S5. A normal distribution with mean 0 and standard deviation 1 is shown in each histogram using a dotted line. Both males and females from the Non-OA group (confirmed KLG0 over a 4-year period) are normally distributed along the OA vector, centred on 0 after correction for sex.
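The projection and normalisation steps can be sketched in Python as follows. This is a minimal illustration, not the study implementation: the function names and toy data are hypothetical, and it assumes that the scale is a single population standard deviation computed from the sex-centred Non-OA distances, whereas the text does not say whether the standard deviation was pooled or computed per sex.

```python
import statistics


def project(shape, point, direction):
    """Signed distance of a shape along a line (orthogonal projection).

    `direction` must be a unit vector; `point` is any point on the line.
    """
    return sum((s - p) * d for s, p, d in zip(shape, point, direction))


def fit_b_score(non_oa_shapes, non_oa_sexes, point, direction):
    """Build a B-score function from the Non-OA reference knees."""
    raw = [project(shape, point, direction) for shape in non_oa_shapes]
    # Sex-specific origin: mean Non-OA distance for each sex.
    sex_mean = {}
    for sex in set(non_oa_sexes):
        values = [r for r, x in zip(raw, non_oa_sexes) if x == sex]
        sex_mean[sex] = statistics.fmean(values)
    # Assumption: one scale, from the sex-centred Non-OA distances.
    centred = [r - sex_mean[x] for r, x in zip(raw, non_oa_sexes)]
    scale = statistics.pstdev(centred)

    def b_score(shape, sex):
        return (project(shape, point, direction) - sex_mean[sex]) / scale

    return b_score


# Toy example: OA vector along the first principal component (made up).
point, direction = [0.0, 0.0], [1.0, 0.0]
ref_shapes = [[-1.0, 0.3], [1.0, -0.2], [1.0, 0.5], [3.0, 0.1]]
ref_sexes = ["female", "female", "male", "male"]
b_score = fit_b_score(ref_shapes, ref_sexes, point, direction)
score_f = b_score([3.0, 0.0], "female")
score_m = b_score([5.0, 0.0], "male")
```

In the toy data the female and male Non-OA means sit at 0 and 2 along the vector, so a female knee at 3 and a male knee at 5 lie the same distance from their respective origins and receive the same score.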