Statistical Analysis and Modeling of Mass Spectrometry-Based Metabolomics Data

Xi, Bowei; Gu, Haiwei; Baniasadi, Hamid; Raftery, Daniel

doi:10.1007/978-1-4939-1258-2_22

Protocol
First Online: 01 January 2014

Part of the book series: Methods in Molecular Biology (MIMB, volume 1198)

Abstract

Multivariate statistical techniques are used extensively in metabolomics studies, in tasks ranging from biomarker selection to model building and validation. This chapter discusses two model-independent variable selection techniques, principal component analysis and two-sample t-tests, as well as classification and regression models and model-related variable selection techniques, including partial least squares, logistic regression, support vector machines, and random forests. Model evaluation and validation methods, such as leave-one-out cross-validation, Monte Carlo cross-validation, and receiver operating characteristic analysis, are introduced with an emphasis on avoiding overfitting. The advantages and limitations of the statistical techniques are also discussed.
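As a rough illustration of two of the validation ideas named in the abstract, the sketch below combines Monte Carlo cross-validation with a receiver operating characteristic summary (the area under the curve, AUC). It is not taken from the chapter: the nearest-centroid classifier, the function names, the split settings, and the synthetic data are all assumptions chosen to keep the example short and dependency-free.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(train_x, train_y, test_x):
    """Score test samples with a nearest-centroid rule (a stand-in classifier)."""
    n_feat = len(train_x[0])
    cent = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(train_x, train_y) if y == cls]
        cent[cls] = [statistics.fmean(r[j] for r in rows) for j in range(n_feat)]

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Higher score means closer to the class-1 centroid than to class 0.
    return [sqdist(x, cent[0]) - sqdist(x, cent[1]) for x in test_x]


def monte_carlo_cv(x, y, n_splits=50, test_frac=0.3, seed=0):
    """Repeat random train/test splits and collect the held-out AUC of each."""
    rng = random.Random(seed)
    idx0 = [i for i, v in enumerate(y) if v == 0]
    idx1 = [i for i, v in enumerate(y) if v == 1]
    aucs = []
    for _ in range(n_splits):
        test = []
        # Stratified split so both classes appear in train and test.
        for idx in (idx0, idx1):
            k = max(1, round(test_frac * len(idx)))
            test += rng.sample(idx, k)
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        # The model is fit on the training part only, so the AUC below is
        # an out-of-sample estimate rather than a fit to the same data.
        scores = centroid_scores([x[i] for i in train], [y[i] for i in train],
                                 [x[i] for i in test])
        aucs.append(auc(scores, [y[i] for i in test]))
    return aucs


# Synthetic two-feature data, made up purely for illustration:
# feature 0 is shifted between classes, feature 1 is noise.
rng = random.Random(1)
labels = [0] * 20 + [1] * 20
data = [[rng.gauss(1.5 * lab, 1.0), rng.gauss(0.0, 1.0)] for lab in labels]
aucs = monte_carlo_cv(data, labels)
print(f"mean held-out AUC over {len(aucs)} splits: {statistics.fmean(aucs):.2f}")
```

Any variable selection step would also need to be repeated inside each training split; selecting variables on the full data set before splitting is one common route to over-optimistic estimates.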

Acknowledgments

This article was written while one of the authors, Bowei Xi, was on sabbatical leave at the Statistical and Applied Mathematical Sciences Institute (SAMSI, Research Triangle Park, NC). This work is partially funded by NSF DMS-1228348, ARO W911NF-12-1-0558, DoD MURI W911NF-08-1-0238 (BX) and NIH R01GM085291 (DR).

Author information

Authors and Affiliations

Department of Statistics, Purdue University, 250 North University Street, West Lafayette, IN, 47907, USA
Bowei Xi
Department of Anesthesiology & Pain Medicine, Northwest Metabolomics Research Center, University of Washington, Seattle, WA, 98109, USA
Haiwei Gu & Daniel Raftery
Jiangxi Key Laboratory for Mass Spectrometry and Instrumentation, East China Institute of Technology, Nanchang, Jiangxi, China
Haiwei Gu
Department of Chemistry, Purdue University, West Lafayette, IN, USA
Hamid Baniasadi
Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Daniel Raftery

Corresponding author

Correspondence to Bowei Xi.

Editor information

Editors and Affiliations

University of Washington, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
Daniel Raftery

About this protocol

Cite this protocol

Xi, B., Gu, H., Baniasadi, H., Raftery, D. (2014). Statistical Analysis and Modeling of Mass Spectrometry-Based Metabolomics Data. In: Raftery, D. (ed.) Mass Spectrometry in Metabolomics. Methods in Molecular Biology, vol 1198. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1258-2_22

DOI: https://doi.org/10.1007/978-1-4939-1258-2_22
Published: 14 August 2014
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-1257-5
Online ISBN: 978-1-4939-1258-2
eBook Packages: Springer Protocols
