Complete pipeline for Infinium® Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation

Epigenomics. 2012 Jun;4(3):325-41. doi: 10.2217/epi.12.21.

Abstract

Background: Huge progress has been made in the development of array- or sequencing-based technologies for DNA methylation analysis. The Illumina Infinium® Human Methylation 450K BeadChip (Illumina Inc., CA, USA) allows the simultaneous quantitative monitoring of more than 480,000 CpG positions, enabling large-scale epigenotyping studies. However, the assay combines two different assay chemistries (Infinium I and Infinium II), which may bias the analysis if all signals are merged and treated as a single source of methylation measurement.

Materials & methods: We confirm in three 450K data sets that Infinium I signals are more stable and cover a wider dynamic range of methylation values than Infinium II signals. We evaluated the methylation profile of Infinium I and II probes obtained with different normalization protocols and compared these results with the methylation values of a subset of CpGs analyzed by pyrosequencing.

Results: We developed a subset quantile normalization approach for the processing of 450K BeadChips. The Infinium I signals were used as 'anchors' to normalize Infinium II signals at the level of probe coverage categories. Our normalization approach outperformed alternative normalization or correction approaches in terms of bias correction and methylation signal estimation. We further implemented a complete preprocessing protocol that solves most of the issues currently raised by 450K array users.
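
The anchoring idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a single sample, one signal value per probe, and a simple rank-based mapping of Infinium II values onto the empirical distribution of Infinium I values within each probe category. The function names (quantile_map, subset_quantile_normalize) and the inputs (signal, is_type1, category) are hypothetical and chosen only for illustration.

```python
import numpy as np

def quantile_map(values, anchors):
    """Map `values` onto the empirical distribution of `anchors` by rank.

    Illustrative helper (hypothetical name): the i-th smallest value is
    replaced by the matching quantile of the anchor distribution.
    """
    values = np.asarray(values, dtype=float)
    anchors = np.sort(np.asarray(anchors, dtype=float))
    n = values.size
    # Rank of each value (0..n-1); ties are broken by order of appearance.
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    # Turn ranks into probabilities strictly inside (0, 1).
    probs = (ranks + 0.5) / n
    # Read the anchor quantile function at those probabilities.
    return np.quantile(anchors, probs)

def subset_quantile_normalize(signal, is_type1, category):
    """Normalize Infinium II signals against Infinium I anchors, per category.

    Assumed inputs (not taken from the paper):
      signal   : 1-D array of probe signals for one sample
      is_type1 : boolean array, True for Infinium I probes
      category : array of probe category labels
    """
    signal = np.asarray(signal, dtype=float)
    is_type1 = np.asarray(is_type1, dtype=bool)
    category = np.asarray(category)
    out = signal.copy()
    for cat in np.unique(category):
        in_cat = category == cat
        anchors = signal[in_cat & is_type1]
        targets = in_cat & ~is_type1
        # Leave the category untouched if it lacks anchors or targets.
        if anchors.size == 0 or not targets.any():
            continue
        out[targets] = quantile_map(signal[targets], anchors)
    return out
```

In this sketch the Infinium I values are left unchanged and only the Infinium II values are remapped, so the rank order of Infinium II probes within a category is preserved while their range follows that of the anchors. A multi-sample version would also need a shared reference across samples, which is omitted here.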

Conclusion: We developed a complete preprocessing pipeline for 450K BeadChip data using an original subset quantile normalization approach that performs both sample normalization and efficient Infinium I/II shift correction. The scripts, which are freely available from the authors, will allow researchers to concentrate on the biological analysis of data, such as the identification of DNA methylation signatures.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Automation, Laboratory
  • CpG Islands / genetics*
  • DNA Methylation / genetics*
  • Electronic Data Processing
  • Epigenomics
  • Genome, Human*
  • Humans
  • Oligonucleotide Array Sequence Analysis / instrumentation
  • Oligonucleotide Array Sequence Analysis / methods*
  • Software*