Abstract
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Boguski, M.S. (1999) Biosequence exegesis. Science 286(5439), 453–5.
Brazma, A. (2001) On the importance of standardisation in life sciences. Bioinformatics 17(2), 113–4.
Stoeckert, C.J., Jr., Causton, H.C., and Ball, C.A. (2002) Microarray databases: standards and ontologies. Nat Genet 32, 469–73.
Brooksbank, C., and Quackenbush, J. (2006) Data standards: a call to action. OMICS 10(2), 94–9.
Rogers, S., and Cambrosio, A. (2007) Making a new technology work: the standardization and regulation of microarrays. Yale J Biol Med 80(4), 165–78.
Warrington, J.A. (2008) Standard controls and protocols for microarray based assays in clinical applications, in Book of Genes and Medicine. Medical Do Co: Osaka.
Piwowar, H.A., et al. (2008) Towards a data sharing culture: recommendations for leadership from academic health center. PLoS Med 5(9), e183.
Brazma, A., Krestyaninova, M., and Sarkans, U. (2006) Standards for systems biology. Nat Rev Genet 7(8), 593–605.
Brazma, A., et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29(4), 365–71.
Spellman, P.T., et al. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 3(9), RESEARCH0046.
Whetzel, P.L., et al. (2006) The MGED ontology: a resource for semantics-based description of microarray experiments. Bioinformatics 22(7), 866–73.
Parkinson, H., et al. (2009) ArrayExpress update – from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res 37(Database issue), D868–72.
Parkinson, H., et al. (2007) ArrayExpress – a public database of microarray experiments and gene expression profiles. Nucleic Acids Res 35(Database issue), D747–50.
Parkinson, H., et al. (2005) ArrayExpress – a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 33(Database issue), D553–5.
Barrett, T., and Edgar, R. (2006) Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411, 352–69.
Barrett, T., et al. (2005) NCBI GEO: mining millions of expression profiles – database and tools. Nucleic Acids Res 33(Database issue), D562–6.
Barrett, T., et al. (2007) NCBI GEO: mining tens of millions of expression profiles – database and tools update. Nucleic Acids Res 35(Database issue), D760–5.
Barrett, T., et al. (2009) NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 37(Database issue), D885–90.
Taylor, C.F., et al. (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25(8), 887–93.
Shi, L., et al. (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24(9), 1151–61.
Taylor, C.F., et al. (2008) Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 26(8), 889–96.
DeFrancesco, L. (2002) Journal trio embraces MIAME. Genome Biol 8(6), R112.
Jones, A.R., and Paton, N.W. (2005) An analysis of extensible modelling for functional genomics data. BMC Bioinformatics 6, 235.
Ashburner, M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1), 25–9.
Smith, B., et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25(11), 1251–5.
Salit, M. (2006) Standards in gene expression microarray experiments. Methods Enzymol 411, 63–78.
Li, H., et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25(16), 2078–9.
Brookes, A.J., et al. (2009) The phenotype and genotype experiment object model (PaGE-OM): a robust data structure for information related to DNA variation. Hum Mutat 30(6), 968–77.
Brazma, A., and Parkinson, H. (2006) ArrayExpress service for reviewers/editors of DNA microarray papers. Nat Biotechnol 24(11), 1321–2.
Rayner, T.F., et al. (2006) A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics 7, 489.
Rayner, T.F., et al. (2009) MAGETabulator, a suite of tools to support the microarray data format MAGE-TAB. Bioinformatics 25(2), 279–80.
Manduchi, E., et al. (2004) RAD and the RAD Study-Annotator: an approach to collection, organization and exchange of all relevant information for high-throughput gene expression studies. Bioinformatics 20(4), 452–9.
Ball, C.A., et al. (2005) The Stanford Microarray Database accommodates additional microarray platforms and data formats. Nucleic Acids Res 33(Database issue), D580–2.
Demeter, J., et al. (2007) The Stanford Microarray Database: implementation of new analysis tools and open source release of software. Nucleic Acids Res 35(Database issue), D766–70.
Gollub, J., et al. (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 31(1), 94–6.
Gollub, J., Ball, C.A., and Sherlock, G. (2006) The Stanford Microarray Database: a user’s guide. Methods Mol Biol 338, 191–208.
Hubble, J., et al. (2009) Implementation of GenePattern within the Stanford Microarray Database. Nucleic Acids Res 37(Database issue), D898–901.
Sherlock, G., et al. (2001) The Stanford Microarray Database. Nucleic Acids Res 29(1), 152–5.
Navarange, M., et al. (2005) MiMiR: a comprehensive solution for storage, annotation and exchange of microarray data. BMC Bioinformatics 6, 268.
Allison, M. (2008) Is personalized medicine finally arriving? Nat Biotechnol 26(5), 509–17.
Orchard, S., and Hermjakob, H. (2008) The HUPO proteomics standards initiative – easing communication and minimizing data loss in a changing world. Brief Bioinform 9(2), 166–73.
Pedrioli, P.G., et al. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol 22(11), 1459–66.
Keller, A., et al. (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 1, 0017.
Deutsch, E. (2008) mzML: a single, unifying data format for mass spectrometer output. Proteomics 8(14), 2776–7.
Deutsch, E.W., Lam, H., and Aebersold, R. (2008) Data analysis and bioinformatics tools for tandem mass spectrometry in proteomics. Physiol Genomics 33(1), 18–25.
Orchard, S., et al. (2007) The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat Biotechnol 25(8), 894–8.
Kerrien, S., et al. (2007) Broadening the horizon – level 2.5 of the HUPO-PSI format for molecular interactions. BMC Biol 5, 44.
Fiehn, O., et al. (2006) Establishing reporting standards for metabolomic and metabonomic studies: a call for participation. OMICS 10(2), 158–63.
Sansone, S.A., et al. (2007) The metabolomics standards initiative. Nat Biotechnol 25(8), 846–8.
Goodacre, R., et al. (2007) Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics 3(3), 231–41.
Hardy, N., and Taylor, C. (2007) A roadmap for the establishment of standard data exchange structures for metabolomics. Metabolomics 3(3), 243–8.
Jenkins, H., Johnson, H., Kular, B., Wang, T., and Hardy, N. (2005) Toward supportive data collection tools for plant metabolomics. Plant Physiol 138(1), 67–77.
Jenkins, H., et al. (2004) A proposed framework for the description of plant metabolomics experiments and their results. Nat Biotechnol 22(12), 1601–6.
Spasic, I., et al. (2006) MeMo: a hybrid SQL/XML approach to metabolomic data management for functional genomics. BMC Bioinformatics 7, 281.
Sansone, S.-A., Schober, D., Atherton, H., Fiehn, O., Jenkins, H., Rocca-Serra, P., et al. (2007) Metabolomics standards initiative: ontology working group work in progress. Metabolomics 3(3), 249–56.
Jenkins, H., Hardy, N., Beckmann, M., Draper, J., Smith, A.R., Taylor, J., et al. (2004) A proposed framework for the description of plant metabolomics experiments and their results. Nat Biotechnol 22(12), 1601–6.
Kumar, D. (2007) From evidence-based medicine to genomic medicine. Genomic Med 1(3–4), 95–104.
Fostel, J.M. (2008) Towards standards for data exchange and integration and their impact on a public database such as CEBS (Chemical Effects in Biological Systems). Toxicol Appl Pharmacol 233(1), 54–62.
Bland, P.H., Laderach, G.E., and Meyer, C.R. (2007) A web-based interface for communication of data between the clinical and research environments without revealing identifying information. Acad Radiol 14(6), 757–64.
Meslin, E.M. (2006) Shifting paradigms in health services research ethics. Consent, privacy, and the challenges for IRBs. J Gen Intern Med 21(3), 279–80.
Ferris, T.A., Garrison, G.M., and Lowe, H.J. (2002) A proposed key escrow system for secure patient information disclosure in biomedical research databases. Proc AMIA Symp, 245–9.
Quackenbush, J., et al. (2006) Top-down standards will not serve systems biology. Nature 440(7080), 24.
Jones, A.R., et al. (2007) The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol 25(10), 1127–33.
Sansone, S.A., et al. (2008) The first RSBI (ISA-TAB) workshop: “can a simple format work for complex studies?” OMICS 12(2), 143–9.
Sansone, S.A., et al. (2006) A strategy capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS 10(2), 164–71.
Whetzel, P.L., et al. (2006) Development of FuGO: an ontology for functional genomics investigations. OMICS 10(2), 199–204.
Smith, B., et al. (2005) Relations in biomedical ontologies. Genome Biol 6(5), R46.
Rubin, D.L., et al. (2006) National Center for Biomedical Ontology: advancing biomedicine through structured organization of scientific knowledge. OMICS 10(2), 185–98.
Piwowar, H.A., and Chapman, W.W. (2008) Identifying data sharing in biomedical literature. AMIA Annu Symp Proc, 596–600.
Galperin, M.Y., and Cochrane, G.R. (2009) Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. Nucleic Acids Res 37(Database issue), D1–4.
Ruttenberg, A., et al. (2007) Advancing translational research with the Semantic Web. BMC Bioinformatics (8 Suppl 3), S2.
Sagotsky, J.A., et al. (2008) Life Sciences and the web: a new era for collaboration. Mol Syst Biol 4, 201.
Stein, L.D. (2008) Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nat Rev Genet 9(9), 678–88.
Day, A., et al. (2007) Celsius: a community resource for Affymetrix microarray data. Genome Biol 8(6), R112.
Ochsner, S.A., et al. (2008) Much room for improvement in deposition rates of expression microarray datasets. Nat Methods 5(12), 991.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Chervitz, S.A. et al. (2011). Data Standards for Omics Data: The Basis of Data Sharing and Reuse. In: Mayer, B. (eds) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_2
Download citation
DOI: https://doi.org/10.1007/978-1-61779-027-0_2
Published:
Publisher Name: Humana Press
Print ISBN: 978-1-61779-026-3
Online ISBN: 978-1-61779-027-0
eBook Packages: Springer Protocols