  • Brief Communication

An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets

Abstract

A substantial investment has been made in the generation of large public resources designed to enable the identification of tag SNP sets, but data establishing the adequacy of the sample sizes used are limited. Using large-scale empirical and simulated data sets, we found that the sample sizes used in the HapMap project are sufficient to capture common variation, but that performance declines substantially for variants with minor allele frequencies of <5%.
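
The two quantities the abstract relies on, minor allele frequency (MAF) and the pairwise r² between a tag SNP and another SNP, can be illustrated with a minimal sketch. It assumes phased haplotypes coded 0/1 with one list per SNP. The function names, the toy data, and the default r² threshold of 0.8 are illustrative assumptions, not the authors' pipeline or settings.

```python
from typing import Sequence


def minor_allele_frequency(alleles: Sequence[int]) -> float:
    """MAF of a biallelic SNP coded 0/1 across haplotypes."""
    p = sum(alleles) / len(alleles)
    return min(p, 1.0 - p)


def pairwise_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared correlation (r^2) between two biallelic SNPs on phased haplotypes."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("SNPs must be typed on the same non-empty set of haplotypes")
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                         # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return d * d / denom


def fraction_captured(
    snps: Sequence[Sequence[int]],
    tag_indices: Sequence[int],
    min_maf: float,
    r2_threshold: float = 0.8,  # assumed threshold, for illustration only
) -> float:
    """Share of SNPs with MAF >= min_maf that reach r^2 >= threshold with some tag."""
    eligible = [s for s in snps if minor_allele_frequency(s) >= min_maf]
    if not eligible:
        return 0.0
    captured = sum(
        1
        for s in eligible
        if any(pairwise_r2(s, snps[t]) >= r2_threshold for t in tag_indices)
    )
    return captured / len(eligible)


# Toy example: three SNPs typed on four haplotypes (hypothetical data).
snp_a = [0, 0, 1, 1]
snp_b = [0, 0, 1, 1]  # perfectly correlated with snp_a
snp_c = [0, 1, 0, 1]  # uncorrelated with snp_a
coverage = fraction_captured([snp_a, snp_b, snp_c], tag_indices=[0], min_maf=0.05)
```

In an evaluation of this kind, tags would be chosen in a training set and `fraction_captured` computed on an independent test set; here the same toy data serve both roles purely for brevity.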

Figure 1: Training set sample size and tSNP selection method on the capture of common variation (MAF >5%) in simulated data.
Figure 2: Training set sample size and tSNP selection method on the capture of rare variation (MAF 1–5%) in simulated data.

Acknowledgements

This work was funded by the National Institute of Diabetes and Digestive and Kidney Diseases. Collection of the UK case samples was funded by Diabetes UK. This work was carried out on behalf of the International Type 2 Diabetes 1q Consortium.

Author information

Corresponding author

Correspondence to Eleftheria Zeggini.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

MAF distribution of SNPs in the simulated and empirical data (moderate LD and full 13 Mb region). (PDF 8 kb)

Supplementary Fig. 2

Characteristics of the 3 regions of variable LD studied in the empirical data. (PDF 24 kb)

Supplementary Fig. 3

Training set sample size and tSNP selection method on the capture of common variation in the empirical data. (PDF 8 kb)

Supplementary Fig. 4

Training set sample size and tSNP selection method on the capture of unmeasured common variation in the simulated data. (PDF 9 kb)

Supplementary Fig. 5

Training set sample size and tSNP selection method on the capture of rare variation in the empirical data. (PDF 11 kb)

Supplementary Fig. 6

Tagging SNP performance at different r² thresholds. (PDF 10 kb)

Supplementary Fig. 7

Correlation between training and test set pairwise r² values. (PDF 11 kb)

About this article

Cite this article

Zeggini, E., Rayner, W., Morris, A. et al. An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets. Nat Genet 37, 1320–1322 (2005). https://doi.org/10.1038/ng1670

  • DOI: https://doi.org/10.1038/ng1670
