Abstract
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has given deeper insight into the biology of MTB, yet the functions of many MTB proteins remain unknown. We have used sensitive profile-based search procedures to assign functional and structural domains to the gene products encoded in MTB and thereby infer their functions. These domain assignments were made using a compendium of sequence and structural domain families. Functions are predicted for 78% of the encoded gene products. For 69% of these, functions are inferred from domain assignments; for the remainder, they are deduced from homology to proteins of known function. Superfamily relationships between families of unknown and known structure have increased structural information by ~11%. Remote similarity detection methods have enabled domain assignments for 1325 ‘hypothetical proteins’. The most populated families in MTB are involved in lipid metabolism and in entry and survival of the bacillus in the host. Interestingly, no homologues have been identified for 353 proteins, which we refer to as MTB-specific; these might include factors responsible for virulence. Numerous previously unannotated hypothetical proteins have been assigned domains, and some of these could be potential chemotherapeutic targets. These assignments should be valuable for guiding experimental work. The detailed results are publicly available at http://hodgkin.mbu.iisc.ernet.in/~dots.
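The abstract does not spell out how a profile scores a sequence. The sketch below is an illustration only and is not the authors' pipeline. It builds a log-odds position-specific scoring matrix (PSSM) from a small, hypothetical ungapped alignment, then slides that matrix along a query to find the best-scoring window. The following are all assumptions made for this toy example: the uniform background composition, the pseudocount of 1, the helper names `build_pssm` and `best_window_score`, and the sequences. Production profile tools such as PSI-BLAST also use sequence weighting, gapped alignment and E-value statistics, which are omitted here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Build a log-odds PSSM from equal-length, ungapped aligned sequences.

    Returns one dict per alignment column mapping each amino acid to
    log2(observed frequency / background frequency).
    """
    if background is None:
        # Assumption: uniform background composition, for illustration only.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in alignment)
        # Pseudocounts keep unseen residues from getting log(0) scores.
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm


def best_window_score(pssm, query):
    """Slide the PSSM along the query without gaps.

    Returns (score, start) for the highest-scoring window; residues not in
    AMINO_ACIDS (e.g. 'X') are scored as 0.0, i.e. neutral.
    """
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the profile")
    best = (float("-inf"), -1)
    for start in range(len(query) - width + 1):
        score = sum(pssm[i].get(query[start + i], 0.0) for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    family = ["GKSTL", "GKTTL", "GKSSL", "GRSTL"]  # hypothetical toy family
    profile = build_pssm(family)
    score, start = best_window_score(profile, "MAAGKSTLQQ")
    # prints: best window starts at 3 with score 9.33 bits
    print(f"best window starts at {start} with score {score:.2f} bits")
```

Scoring in bits against a background model is what lets a profile rank a distant family member above unrelated sequence. In a real search, that raw score would still need to be converted into a significance estimate before any domain is assigned.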
Abbreviations
- cAMP: adenosine 3′,5′-cyclic monophosphate
- cGMP: guanosine 3′,5′-cyclic monophosphate
- CMAS: cyclopropane mycolic acid synthase
- cNMP: cyclic nucleotide monophosphate
- DDG: 2-dehydro-3-deoxy-galactarate
- HMM: hidden Markov model
- LCRs: low-complexity regions
- mce: mycobacterial cell entry
- MS: mechanosensitive
- NRDB: non-redundant database
- PGAM: phosphoglycerate mutase
- PSSM: position-specific scoring matrix
- SDR: short-chain dehydrogenase/reductase
- Usp: universal stress protein
Cite this article
Namboori, S., Mhatre, N., Sujatha, S. et al. Enhanced functional and structural domain assignments using remote similarity detection procedures for proteins encoded in the genome of Mycobacterium tuberculosis H37Rv. J Biosci 29, 245–259 (2004). https://doi.org/10.1007/BF02702607
DOI: https://doi.org/10.1007/BF02702607