Development, Preparation, and Curation of High-Throughput Phenotypic Data for Genome-Wide Association Studies: A Sample Pipeline in R

  • Protocol
Genome-Wide Association Studies

Part of the book series: Methods in Molecular Biology (MIMB, volume 2481)

Abstract

Genome-wide association studies (GWAS) have benefited from advances in sequencing methods that generate high-density genomic data. By bridging genotype and phenotype, GWAS have associated several genes with traits of agricultural interest. Nevertheless, a gap remains between genotyping and phenotyping because of the large difference in throughput between the two disciplines. Although cutting-edge phenomics technologies are available to the community, their costs remain prohibitive for small laboratories. Semiautomated methods offer a valid alternative for generating large-scale phenotypic data that characterize different plant organs in depth. Beyond automation, phenomic data management is another major constraint: while bioinformatics pipelines are well established for releasing high-quality genomic data, less effort has been devoted to phenotypic information. This chapter provides a guide to generating large-scale data on the size and shape of fruits, leaves, seeds, and roots, and to the downstream analysis needed to curate and prepare clean datasets by removing outliers and performing primary statistical analyses. The steps, carried out in the R environment, show how to gather appropriate input data for GWAS while avoiding possible sources of bias.
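The curation step summarized above (outlier removal followed by primary statistics before GWAS) can be illustrated with a short example. The chapter's own R code is not reproduced here. The block below is a minimal Python sketch that assumes Tukey's 1.5 × IQR fences as the outlier criterion. This is one common convention and not necessarily the rule the chapter uses. The sketch flags outlying measurements within each accession, computes basic descriptive statistics, and reduces each accession to a mean phenotype. All function names (`tukey_fences`, `remove_outliers`, `summarise`, `accession_means`) and the example data are hypothetical.

```python
# Minimal sketch (hypothetical names and data): Tukey-fence outlier removal
# and per-accession summary statistics. This is an illustrative Python analogue
# of a curation step, not the chapter's R pipeline.
from statistics import mean, median, quantiles, stdev


def tukey_fences(values, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # method="inclusive" interpolates like R's default quantile() (type 7)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's fences."""
    lower, upper = tukey_fences(values, k)
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if not lower <= v <= upper]
    return kept, outliers


def summarise(values):
    """Primary descriptive statistics for one set of measurements."""
    m = mean(values)
    sd = stdev(values) if len(values) > 1 else 0.0
    return {
        "n": len(values),
        "mean": m,
        "median": median(values),
        "sd": sd,
        "cv_percent": 100 * sd / m if m else float("nan"),
    }


def accession_means(records, k=1.5, min_n=4):
    """records maps accession -> list of measurements for one trait.

    Outliers are removed within each accession only when at least min_n
    values are available. The mean of the remaining values is returned
    as that accession's phenotype for GWAS.
    """
    phenotypes = {}
    for accession, values in records.items():
        if len(values) >= min_n:
            values, _ = remove_outliers(values, k)
        phenotypes[accession] = mean(values)
    return phenotypes


if __name__ == "__main__":
    # Hypothetical fruit-length measurements (cm); 9.8 is a deliberate outlier.
    records = {
        "acc_A": [5.1, 5.3, 4.9, 5.0, 5.2, 9.8],
        "acc_B": [3.0, 3.1, 2.9, 3.2, 3.0],
    }
    print(remove_outliers(records["acc_A"]))
    print(summarise(records["acc_B"]))
    print(accession_means(records))
```

In this sketch, outliers are screened within each accession, so that true genetic differences between accessions are not discarded as outliers. Screening across the whole panel, or using a stricter multiplier such as `k=3`, are alternative design choices. The `min_n` guard skips screening for accessions with too few replicates to estimate quartiles reliably.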

Author information

Corresponding author

Correspondence to Pasquale Tripodi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Tripodi, P. (2022). Development, Preparation, and Curation of High-Throughput Phenotypic Data for Genome-Wide Association Studies: A Sample Pipeline in R. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_7

  • DOI: https://doi.org/10.1007/978-1-0716-2237-7_7

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2236-0

  • Online ISBN: 978-1-0716-2237-7

  • eBook Packages: Springer Protocols
