Target Selection for Structural Genomics: An Overview

Marsden, Russell L.; Orengo, Christine A.

doi:10.1007/978-1-60327-058-8_1

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 426)

The success of the whole-genome sequencing projects lent considerable credence to the belief that high-throughput approaches, rather than traditional hypothesis-driven research, would be essential to annotate, both structurally and functionally, the rapidly growing body of sequence data within a reasonable time frame. Such observations supported the emerging field of structural genomics, which now faces the task of providing a library of protein structures that represents the biological diversity of the protein universe. To run efficiently, structural genomics projects aim to define a set of targets that maximizes the potential of each structure determined, whether it reveals a novel structure, a novel function, or a missing evolutionary link. However, not all protein sequences make suitable structural genomics targets: it takes considerably more effort to determine the structure of a protein than to sequence its gene, both because the methods involved are more complex and because the behavior of targeted proteins can vary greatly at the different stages of the structural genomics “pipeline.” Therefore, structural genomics target selection must identify and prioritize the most suitable candidate proteins for structure determination, avoiding “problematic” proteins while ensuring that the ultimate goals of the project are met.
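The two-stage idea can be illustrated with a minimal Python sketch: first avoid "problematic" proteins, then prioritize the rest. This is a hypothetical illustration, not the authors' protocol.

The sketch makes the following assumptions, all labeled in the code:

- Per-protein annotations have already been computed by external predictors. These include predicted transmembrane helices, the fraction of disordered or low-complexity residues, and coiled-coil regions.
- The numeric cut-offs are invented for illustration.
- The ranking rule is one plausible choice: proteins without a solved homologue first, as potential novel structures, then larger families first.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-offs for illustration only; real projects tune their own.
MAX_TM_HELICES = 0                 # exclude any predicted transmembrane helix
MAX_DISORDER_FRACTION = 0.3        # fraction of residues predicted disordered
MAX_LOW_COMPLEXITY_FRACTION = 0.2  # fraction of residues in low-complexity regions


@dataclass
class Candidate:
    """Per-protein annotations, assumed to be precomputed by external predictors."""
    name: str
    tm_helices: int
    disorder_fraction: float
    low_complexity_fraction: float
    has_coiled_coil: bool
    has_solved_homolog: bool  # True if a homologue of known structure exists
    family_size: int          # number of sequences in the protein's family


def is_tractable(c: Candidate) -> bool:
    """Return False for proteins flagged as likely 'problematic' by the assumed filters."""
    return (
        c.tm_helices <= MAX_TM_HELICES
        and c.disorder_fraction <= MAX_DISORDER_FRACTION
        and c.low_complexity_fraction <= MAX_LOW_COMPLEXITY_FRACTION
        and not c.has_coiled_coil
    )


def select_targets(candidates: List[Candidate]) -> List[Candidate]:
    """Filter out problematic proteins, then rank the remainder.

    Sort key: False sorts before True, so proteins with no solved homologue
    come first; within each group, larger families come first because one
    structure could inform models for more sequences.
    """
    tractable = [c for c in candidates if is_tractable(c)]
    return sorted(tractable, key=lambda c: (c.has_solved_homolog, -c.family_size))


# Usage with made-up annotations.
pool = [
    Candidate("A", 0, 0.05, 0.0, False, True, 500),
    Candidate("B", 0, 0.10, 0.0, False, False, 120),
    Candidate("C", 7, 0.02, 0.0, False, False, 900),  # membrane protein: filtered out
    Candidate("D", 0, 0.05, 0.0, False, False, 300),
]
print([c.name for c in select_targets(pool)])  # ['D', 'B', 'A']
```

Keeping filtering and ranking as separate functions means either stage can be swapped out independently. For example, a project could replace family size with a functional-novelty score without touching the filters.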

Author information

Authors and Affiliations

Biochemistry and Molecular Biology Department, University College London, London, UK
Russell L. Marsden & Christine A. Orengo

Editor information

Editors and Affiliations

School of Molecular and Microbial Sciences, The University of Queensland, Brisbane, Queensland, Australia
Bostjan Kobe
School of Molecular and Microbial Biosciences, University of Sydney, Sydney, Australia
Mitchell Guss
School of Molecular and Microbial Sciences and Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
Thomas Huber

About this protocol

Cite this protocol

Marsden, R.L., Orengo, C.A. (2008). Target Selection for Structural Genomics: An Overview. In: Kobe, B., Guss, M., Huber, T. (eds) Structural Proteomics. Methods in Molecular Biology™, vol 426. Humana Press. https://doi.org/10.1007/978-1-60327-058-8_1

DOI: https://doi.org/10.1007/978-1-60327-058-8_1
Publisher Name: Humana Press
Print ISBN: 978-1-58829-809-6
Online ISBN: 978-1-60327-058-8
eBook Packages: Springer Protocols