We conducted a systematic review of genetic association studies of osteoarthritis of the peripheral joints (OA) and spinal degenerative disease (SDD). Electronic searches were carried out for any English language article reporting a gene association study of either OA or SDD published up to the end of 2006. A team of seven reviewers used a standardised template to extract data in duplicate. In all, 90 studies fulfilled our inclusion criteria, reporting a total of 94 significant associations from 83 different genes. We found relatively few instances in which a specific gene–disease association had been analysed by more than one study. There were 14 cases in which a significant association at a particular joint site was replicated in independent studies, involving the AGC1, ASPN, COL9A2, COL9A3, COL11A2, ESR1, FRZB, HFE, IL1A, IL1RN, PTGS2 and VDR genes. Methodological and reporting problems were widespread. These included failure to report full results, missing population details, multiple testing without correction, and over-reliance on subgroup analysis. In summary, the complex phenotypes of OA and SDD may have made it difficult for researchers to focus their efforts. The field is dominated by isolated analyses of disparate potential associations, a problem that is amplified by the frequent analysis of different polymorphisms within individual genes. Flaws in study methodology and interpretation undoubtedly increase the risk of publication bias. Closer adherence to published recommendations (in particular those produced by HuGENet) will help to ensure that future studies are well designed and build on current understanding, rather than simply adding to the growing bank of potential associations.
Twin and family studies demonstrate that between a half and two-thirds of the occurrences of osteoarthritis (OA) can be attributed to genetic factors.1 This finding has justifiably stimulated the search for specific genes,2–5 but the task is formidable, as has proven to be the case for other types of complex disease. Genetic factors may vary with the pattern and severity of disease and according to patient characteristics, such as gender and age. OA can also have a range of clinical and radiographic definitions, which further increases the difficulty. For example, while peripheral joint OA and spinal degenerative disease (SDD) share pathological and clinical features, the extent to which the aetiological determinants of these disease processes are similar is not known.
Genetic association studies provide a means of quantifying the effects of specific gene variants on disease occurrence, but interpreting such data can be problematic. Sample size, sample selection, population stratification and differences in the pattern of confounding may all affect the results.6 Over the last two decades a large number of gene–OA association studies have been conducted. It is increasingly recognised that such studies can only be appraised objectively through systematic review. To date, the consistency of the results of published studies of OA and SDD has not been evaluated in this way.
In this review, therefore, we aimed to retrieve and review all gene–OA and gene–SDD association studies published to date in peer-reviewed, English language journals. Our first objective was to carry out a broad-ranging systematic review of genetic association studies. We aimed to document the number of genes that have been screened and to assess how often analyses of individual genes have been replicated by more than one study. Our second objective was to describe common methodological and reporting problems in gene–OA/SDD association studies and to assess how frequently these problems occurred. In doing so, we hope to help improve the way in which future studies are conducted in this rapidly expanding field.
LITERATURE REVIEW METHODOLOGY
Inclusion and exclusion criteria
A deliberately broad search strategy was used in which gene–OA and gene–SDD association studies of any design, dealing with any gene, were judged suitable for inclusion. We aimed to review associations on a per gene basis: included studies could therefore test for association with regard to one or more genetic polymorphism(s) within a particular gene. We excluded conference abstracts, review papers and non-English language papers.
Electronic searches were performed on Medline and Embase (using the Ovid platform, beginning with 1966 for Medline and 1980 for Embase) and on the Science Citation Index (SCI; Thomson Web of Knowledge). The searches covered all studies matching our search criteria published up to the end of 2006. The general form of the search for Medline is given below. The terms “or” and “and” indicate the standard Boolean operators, and $ is an Ovid truncation symbol. Lower case terms indicate free-text searches applied to all text fields (title, abstract, keywords, etc). Capitalised words or phrases (eg, Osteoarthritis) indicate MeSH (Medical Subject Heading) terms. These were expanded to include index subheadings (including all available subcategories of disease). The Medline search was adapted for Embase (in which some MeSH terms differ from Medline) and for the SCI. The search was designed to be as broad as possible. We chose to identify a larger number of studies with the initial search and then exclude irrelevant studies manually, rather than risk using an excessively specific (and exclusive) strategy. In addition, gene–disease association studies are sometimes performed alongside other types of analysis (particularly linkage analysis). Searching for these related types of study was therefore a useful means of identifying smaller association studies.
Search terms and structure
1. osteoarthriti$ or osteoarthrosis or osteoarthrotic or spondylosis or osteophytosis
2. (degeneration or degenerative) adj6 (spine or spinal or disc or discs or prolapse)
3. Osteoarthritis or Osteoarthritis (Hip) or Osteoarthritis (Knee) or Spinal Osteophytosis
4. 1 or 2 or 3
5. gene or genes or genetic$ or genic or geno$ or chromosom$ or allel$ or homozygote or heterozygote or polymorph$ or linkage$ or heritab$ or inherit$ or mutat$ or hered$
6. Heredity or Gene Frequency or Genotype or Gene Dosage or Genetic Predisposition to Disease or Haplotypes or Heterozygote or Homozygote or Inheritance Patterns or Genes (Dominant) or Genes (Recessive) or Multifactorial Inheritance or Quantitative Trait (Heritable) or Linkage (Genetics) or Linkage Disequilibrium or Lod score or Phenotype or Genetic Markers or Penetrance or Variation (Genetics) or Genetic Heterogeneity or Mutation or Polymorphism (Genetic) or Polymorphism (Restriction Fragment Length) or Polymorphism (Single Nucleotide) or Chromosomes or Genome or Genes or Alleles or Major Histocompatibility Complex or Quantitative Trait Loci
7. 5 or 6
8. 4 and 7
9. Limit the above to human studies
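To show how the $ truncation and the Boolean combination of search lines behave, here is a minimal Python sketch of the free-text part of the strategy. It is a simplified illustration under stated assumptions. It uses only the free-text disease terms (osteoarthriti$ … osteophytosis) and the free-text genetic terms (gene … hered$), joined with “and” as a stand-in for the final combination step. Each term is treated as a whole word. The adj6 proximity operator and all MeSH expansion are ignored. The function names are hypothetical. This is not how Ovid itself evaluates queries.

```python
import re

# Free-text disease terms; "$" marks Ovid-style truncation.
DISEASE_TERMS = ["osteoarthriti$", "osteoarthrosis", "osteoarthrotic",
                 "spondylosis", "osteophytosis"]

# Free-text genetic terms.
GENETIC_TERMS = ["gene", "genes", "genetic$", "genic", "geno$", "chromosom$",
                 "allel$", "homozygote", "heterozygote", "polymorph$",
                 "linkage$", "heritab$", "inherit$", "mutat$", "hered$"]


def term_matches(term: str, word: str) -> bool:
    """A truncated term matches any word beginning with its stem;
    an untruncated term must match the whole word exactly."""
    if term.endswith("$"):
        return word.startswith(term[:-1])
    return word == term


def any_term(terms: list[str], text: str) -> bool:
    """Boolean OR across one search line, applied to the lower-cased words."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(term_matches(t, w) for t in terms for w in words)


def retrieved(text: str) -> bool:
    """Boolean AND of the disease line and the genetic line
    (hypothetical simplification of the final combination step)."""
    return any_term(DISEASE_TERMS, text) and any_term(GENETIC_TERMS, text)


if __name__ == "__main__":
    print(retrieved("Vitamin D receptor polymorphisms and knee osteoarthritis"))  # True
    print(retrieved("Total hip replacement outcomes in osteoarthrosis"))         # False
```

The second title matches a disease term but no genetic term, so the “and” step excludes it. Conversely, broad truncated stems such as geno$ pull in words like “genotype” and “genome”. This illustrates why the strategy retrieves many records that must then be screened manually.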
The records for each article were checked individually using the title and abstract, and those not dealing specifically with a gene–OA or gene–SDD association were excluded. This initial screen was carried out twice, once each by two members of the review team, to serve as a double check. If it was not possible to exclude articles at the screening stage (for example, if insufficient details were given in the abstract to enable us to judge content), paper or electronic copies were retrieved and re-checked to determine whether or not any further articles should be excluded (also with double-checking within the review team).
We used a simple form to extract the following data from the included studies:
Type of disease and sites affected. We assigned joint categories on the basis of the primary patient diagnosis of OA (ie, hip, knee or hand) or SDD.
Method of diagnosis (eg, radiographic).
Total sample size and number of cases and controls. A “small” sample size was judged to be <150 patients in total (eg, fewer than 75 cases and 75 controls).7
Proportion of cases who were male/female.
Origin of case/control populations and recruitment setting (eg, hospital, with geographic region).
Gene(s) analysed, with type(s) and number(s) of polymorphism(s) (eg, vitamin D receptor, RFLP).
Number of alleles per polymorphism (per gene).
Details of significant and non-significant associations reported.
Whether or not we judged that a correction for multiple testing may be appropriate, and if so whether or not one had been applied.
Whether or not any subgroup analyses were used and in what context.
Missing details: ethnicity, gender, age and details of the recruitment process (eg, origin of the control population not given).
Data extraction was carried out in duplicate by two reviewers. Studies were categorised on the basis of sample size following a recent meta-analysis of gene–disease association studies,7 which found that discrepancies between the results of first and subsequent studies of a particular association were more common when the total sample size of the first study was <150 (“small”) than when it was ⩾150. We judged that correction for multiple testing may be appropriate whenever more than one genetic locus was tested for association, except where the authors put forward an explicit argument as to why correction was considered unnecessary (we made no judgement concerning the validity of any such arguments). This criterion was applied whether the loci under consideration were polymorphic sites within separate genes (ie, a multiple gene study) or discrete polymorphisms within the same gene.
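The two extraction rules just described can be restated as simple predicates. The following Python sketch does only that. It encodes the 150-patient threshold and the multiple-testing judgement. The function names are hypothetical, and the code does not reproduce the review team's actual extraction template.

```python
SMALL_SAMPLE_THRESHOLD = 150  # total patients (cases + controls)


def is_small(n_cases: int, n_controls: int) -> bool:
    """A 'small' study has fewer than 150 patients in total."""
    return n_cases + n_controls < SMALL_SAMPLE_THRESHOLD


def correction_may_be_appropriate(n_loci_tested: int,
                                  explicit_argument_given: bool) -> bool:
    """Flag any study testing more than one locus (in separate genes or
    within the same gene), unless the authors argued explicitly that no
    correction was needed; the validity of that argument is not judged."""
    return n_loci_tested > 1 and not explicit_argument_given


def correction_missing(n_loci_tested: int,
                       explicit_argument_given: bool,
                       correction_applied: bool) -> bool:
    """True when a correction was judged appropriate but none was applied."""
    return (correction_may_be_appropriate(n_loci_tested, explicit_argument_given)
            and not correction_applied)


if __name__ == "__main__":
    print(is_small(74, 75))                          # True (149 patients)
    print(is_small(75, 75))                          # False (150 patients)
    print(correction_missing(4, False, False))       # True
```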
To simplify the results, we grouped together studies that focused on different polymorphic regions within the same gene. This approach necessitates that we qualify our use of the term “replication” in the discussion below. In particular, we stress that “replication” does not necessarily refer to a specific genetic locus, only to the gene within which that locus occurs. Therefore, where a significant association has been reported for a particular gene–disease association in more than one study, we have referred to this as replicated reporting of a significant association, regardless of the specific gene variant.
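The difference between gene-level and locus-level “replication” can be made concrete with a small Python sketch. The study records below are entirely hypothetical toy data, not extracted results. Distinct study identifiers are used as a proxy for independent studies, which is an assumption, since the code cannot check independence itself.

```python
from collections import defaultdict

# Hypothetical records: (study_id, gene, joint_category, locus, significant)
records = [
    ("S1", "GENE_A", "knee", "locus_1", True),
    ("S2", "GENE_A", "knee", "locus_2", True),   # same gene, different locus
    ("S3", "GENE_A", "hip",  "locus_1", True),   # same gene, different joint
    ("S4", "GENE_B", "knee", "locus_1", True),
    ("S5", "GENE_B", "knee", "locus_1", False),  # tested, not significant
]


def _replicated(records, key_fn):
    """Keys with a significant association reported by >= 2 distinct studies."""
    studies = defaultdict(set)
    for rec in records:
        if rec[4]:  # significant
            studies[key_fn(rec)].add(rec[0])
    return sorted(k for k, ids in studies.items() if len(ids) >= 2)


def replicated_by_gene(records):
    """Gene-level definition: pool all loci within a gene, per joint category."""
    return _replicated(records, lambda r: (r[1], r[2]))


def replicated_by_locus(records):
    """Stricter alternative: require the same locus, not just the same gene."""
    return _replicated(records, lambda r: (r[1], r[2], r[3]))


if __name__ == "__main__":
    print(replicated_by_gene(records))   # [('GENE_A', 'knee')]
    print(replicated_by_locus(records))  # []
```

In this toy example, GENE_A counts as replicated for the knee under the gene-level definition, even though no single locus was significant in two studies. That is exactly the qualification described above.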
Our search identified 5748 articles in total, across all three databases. The number of articles was reduced to 122 by the initial screening process, which excluded irrelevant or unsuitable studies on the basis of the title or abstract. Paper or electronic copies of all 122 articles were retrieved and re-checked. A further 32 articles were excluded after it became apparent that they were not association studies, leaving a total of 90 for data extraction. Table 1 provides an overview of these studies, grouped by joint category.
Gene–OA/SDD associations by joint category
Of the 90 included studies, 68 reported one or more significant gene–disease associations. These studies reported 94 associations from a total of 83 different genes (table 1). Hip OA (HOA) and knee OA (KOA) studies were the most common. However, we found relatively few instances, for these or any other joint sites, in which the same gene–disease association had been the focus of more than one study. There were also very few examples of significant associations that had been replicated in separate studies. For example, 31 studies examined gene associations for HOA, analysing a total of 38 genes. Only 15 of these genes were represented by two or more separate studies, and in only two cases was a significant association replicated by another study.
Table 2 provides a more detailed summary of the number of significant and non-significant associations for each gene (grouped by joint category).
The relatively small number of gene–disease associations examined by more than one study (apparent from table 1) is clearly, in part, a consequence of study effort being diluted across different disease phenotypes. For example, we found four studies that tested for an association with the COL1A1 gene. However, these were split between cases of HOA, KOA, HOA and KOA combined, and SDD, and only one of these separate joint categories was represented by more than one study. A similarly diffuse pattern of research effort is apparent for many of the other genes shown in table 2. There are a small number of exceptions, such as the VDR gene, for which four of the six joint categories studied are represented by multiple studies.
Study designs, sample sizes and data analysis
The majority of studies had a sample size of ⩾150 (table 3).7 There was no suggestion that the simple categorisation of studies into groups with either < or ⩾150 patients was correlated with a tendency to report a significant association, but caution needs to be exercised here as the number in the “small” category was low.
The majority of studies used some variation of a case–control design to quantify associations. Some nested this within a cohort study, but more often controls were sampled from a proxy control population. Three studies used a family-based design.14 47 93 Most studies analysed the association between one or more genetic variants and disease prevalence (presence or absence of disease). A smaller proportion focused on quantitative or qualitative measures of disease severity (table 3). Some studies reported a significant association after testing across multiple loci, and fewer than a third of these applied any form of correction for the raised type I error rate. Equally striking is the finding that, regardless of sample size, over half of the studies failed to report basic study details. Omissions were evident for one or more of the following: ethnicity, gender, age, and details of the recruitment process. In many cases these omissions make it difficult to judge whether reported associations could be confounded by population stratification, or whether controls have been appropriately selected. Full results of analyses were often not provided, and many studies relied on subgroup analyses. Taken together, these problems may create a substantial risk of publication bias.
Our results show that there has been much research into gene–OA and gene–SDD associations over the last decade. The 90 studies included in this review reported a total of 94 associations from 83 different genes. Reasonably good evidence is emerging in support of a role for certain genes. We found a number of significant associations that have been replicated within a joint category by two or more independent studies: AGC1, hand;14 15 17 ASPN, hip and knee;19–22 COL9A2, spine;41 44 COL9A3, spine;39 45 COL11A2, spine;16 39 ESR1, knee;48 49 FRZB, hip;9 54 55 HFE, hand;57 58 IL1A, knee;67 68 IL1RN, knee;67 68 PTGS2, knee;10 11 and VDR, knee and spine.50 52 89–94 We also found several studies in which significant associations (FRZB and the IL1 gene cluster) were replicated in separate populations.9 53 68 However, in most cases where a significant association had been reported and the same gene–disease association had been tested in another study, the association was not found again. It is therefore unclear to what extent these isolated significant associations are in fact “false positives”. This problem is compounded by the fact that separate studies of the same gene and the same disease phenotype sometimes focus on different polymorphisms within the gene. In such cases, the association may not have been replicated with respect to the same genetic unit (ie, the genetic locus).
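As a simple consistency check, the replicated gene–joint pairs listed in the preceding paragraph can be tallied. The following Python sketch copies the list from the text, using the standard gene symbol FRZB and omitting the references. It confirms that 12 genes give rise to the 14 replicated cases reported in the abstract.

```python
from collections import Counter

# Replicated gene-joint pairs as listed in the text (references omitted).
REPLICATED = {
    "AGC1": ["hand"],
    "ASPN": ["hip", "knee"],
    "COL9A2": ["spine"],
    "COL9A3": ["spine"],
    "COL11A2": ["spine"],
    "ESR1": ["knee"],
    "FRZB": ["hip"],
    "HFE": ["hand"],
    "IL1A": ["knee"],
    "IL1RN": ["knee"],
    "PTGS2": ["knee"],
    "VDR": ["knee", "spine"],
}

n_genes = len(REPLICATED)
n_pairs = sum(len(joints) for joints in REPLICATED.values())
by_joint = Counter(j for joints in REPLICATED.values() for j in joints)

print(n_genes, n_pairs)  # 12 14
print(by_joint["knee"], by_joint["spine"], by_joint["hip"], by_joint["hand"])  # 6 4 2 2
```

The count differs from the number of genes because ASPN and VDR each contribute a replicated association at two joint sites.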
Thus, despite the obvious progress that has been made, isolated gene–OA/SDD studies are very common (103 out of a total of 157 by joint category; table 1). Relatively few genes have been the subject of more than two or three studies within any particular joint category (table 2). The complex phenotype of the disease undoubtedly makes it difficult to prioritise research areas. It may also encourage a somewhat uncoordinated and diffuse approach when the field is viewed as a whole. Indeed, the degree to which different types of OA and SDD genuinely represent discrete forms of disease, with distinct aetiologies, is far from clear. The sheer number of potential gene–OA associations inevitably adds to this problem. However, we suggest that substantial progress could be achieved by focusing on those candidate genes for which evidence seems to be growing (eg, VDR). If we instead continue to cast the “net” ever wider in an attempt to detect new associations, unreplicated and untested results will simply continue to accumulate without adding much to our understanding. A more convincing approach may be to carry out a smaller number of large, high-quality genome-wide association studies (eg, Spector et al).77
A further problem is that relatively simple and avoidable methodological and reporting problems frequently undermine the degree of confidence that can be placed in individual gene–OA/SDD association studies. First, we found that many studies provided insufficient detail to enable the reader to assess whether or not bias could have confounded the analysis (eg, unreported age-biases).6 It should be standard practice to report full case and control population details. Second, many studies tested for associations across multiple loci, but did not apply any form of correction for type I statistical errors. The problem of spurious association may therefore be inflated by the use of inappropriate (permissive) significance levels, presenting a serious challenge to the development of clear, well-supported conclusions. Third, in many cases the main associations reported in a study were derived from some form of subgroup analysis. Whilst this approach may be perfectly valid in the right context, it may be misleading not to include strong caveats. For example, significant associations derived from post hoc subgroup analyses may be emphasised in the abstract of a study over and above the results of the primary analysis of the study, if the latter are non-significant. The omission of appropriate caveats from the abstract of a study can only encourage biased reporting and interpretation. Fourth, many studies failed to report results in full, adding to the familiar concern regarding the degree to which non-significant results are being under-reported.
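The second problem above, testing many loci at an uncorrected significance level, can be quantified with a short calculation. The following Python sketch assumes m independent tests, each at level alpha, with every null hypothesis true. Under these assumptions, the chance of at least one false positive is 1 − (1 − alpha)^m, which is about 0.40 for ten tests at 0.05. The sketch then applies the Bonferroni correction, which is one common option chosen here for simplicity; the review does not prescribe a specific method. Function names are hypothetical. Polymorphisms within the same gene are often correlated, so the independence formula and Bonferroni are both approximate (Bonferroni becomes conservative).

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level
    alpha, assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag each p value that survives Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    for m in (1, 5, 10):
        print(m, round(familywise_error_rate(0.05, m), 3),
              bonferroni_threshold(0.05, m))
    # Three loci tested: only p = 0.004 falls below 0.05 / 3
    print(bonferroni_significant([0.004, 0.02, 0.3]))
```

In the last line, a nominal p = 0.02 would be reported as significant without correction but does not survive it. This is exactly the kind of result that can inflate the number of reported associations.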
Huizinga et al have provided recommendations for the conduct of gene–OA association studies, and Little et al recently gave a detailed overview of a wide range of issues that arise in the appraisal of gene–disease association studies.98–100 These papers offer detailed recommendations to help authors prepare manuscripts and to help readers assess study design and quality. It is undoubtedly too soon to expect these recommendations to have been applied fully in all but the most recent cases, and many studies included here predate these overviews. Nevertheless, our results illustrate why they must be applied in future studies. Adequate replication and attention to study design, methodology and reporting are demonstrably lacking, as is a clear distinction between hypothesis testing and hypothesis generation. A clearer approach to disease phenotype in the OA/SDD field is also vital. Taken together, these problems present a major obstacle to the systematic review and/or meta-analysis of effect sizes for particular associations. A key challenge for future studies will be to actively minimise the potential for publication bias. For example, the value of published association analyses will be greatly undermined if unpublished studies and datasets are allowed to accumulate beyond the range of the systematic reviewer’s “radar”. The Human Genome Epidemiology Network (HuGENet) has been at the leading edge of calls for registries and investigator networks aimed at overcoming this problem.101
In summary, our primary aim in this review has been to provide an overview of general progress and to highlight areas in which improvements appear possible. Real progress has clearly been made in the genetic epidemiology of OA and SDD, but future association analyses must deal with a range of important problems. Greater focus and coordination between researchers are vital,101 particularly in light of advances in gene-scanning technology. These advances mean that the number of reported significant associations could now grow at an accelerating rate. As our understanding of the genetic architecture of OA grows, hypothesis testing should become a more realistic and achievable goal. Additionally, basic epidemiological principles must be applied with greater rigour, and a clearer standard of reporting must be maintained, in keeping with the recommendations of HuGENet. We also hope that funding bodies will begin to prioritise research that follows these well-tested principles.
Funding: This work was supported by an Action Arthritis grant awarded to AJM. JJR was supported by core funding awarded to the Institute of Health at the University of East Anglia, through the Department of Health’s Research Capacity Development Programme.
Competing interests: None.