Introduction

Basic helix-loop-helix (bHLH) transcription factors have long been recognized as important regulators in various developmental processes including neurogenesis, myogenesis, hematopoiesis, sex determination, and gut development. bHLH transcription factors have a common bHLH structural motif containing a basic region and two helices separated by a loop (HLH) region of variable length (Massari and Murre 2000). The basic region acts as a DNA-binding domain, while the HLH region interacts with other bHLH sequences to form homodimers or heterodimers.

The bHLH motif has approximately 60 amino acids, among which 19 were found to be highly conserved in organisms ranging from yeast to mammals. Based on statistics of amino acid frequencies within the bHLH motif, a prediction motif for bHLH proteins was established (Atchley et al. 1999). Through examination of amino acids at the 19 highly conserved sites, more than 1000 bHLH sequences have been identified in organisms whose genome sequences are available. Among these organisms, rice and Arabidopsis were found to have 167 and 147 bHLH members, respectively (Li et al. 2006; Toledo-Ortiz et al. 2003). Apart from them, the human genome was found to encode 118 bHLH proteins (Simionato et al. 2007). Mouse had previously been reported to have 102 bHLH members (Ledent et al. 2002). However, our recent searches against the latest version of the mouse genome sequence assembly have revised that figure to 114 (data to be published elsewhere). In addition, the Florida lancelet (Branchiostoma floridae), cnidarian (Nematostella vectensis), mollusk (Lottia gigantea), fruit fly (Drosophila melanogaster), and nematode (Caenorhabditis elegans) were found to possess 78, 68, 63, 59, and 33 bHLH members, respectively (Simionato et al. 2007).

Animal bHLH proteins have been classified into 45 orthologous families and six higher-order groups based on their phylogenetic relationships and different properties (Atchley et al. 1999; Ledent et al. 2002; Simionato et al. 2007). The 45 families were named according to the names (or common abbreviations) used when they were first reported or the names of the best-known members of the family. The six higher-order groups were named A, B, C, D, E, and F, each of which has different DNA-binding and functional properties. Briefly, groups A and B bHLH proteins bind to core DNA sequences called E boxes (CANNTG). Specifically, group A proteins bind to CACCTG or CAGCTG and group B proteins bind to CACGTG or CATGTTG. Group C proteins possess a PAS (Drosophila Period, human Arnt, and Drosophila Single-minded) domain in addition to the bHLH motif. Their target core sequence is ACGTG or GCGTG. Group D proteins do not have the basic domain. They interact with group A proteins to form inactive heterodimers. Group E proteins bind preferentially to core sequences called N boxes (CACGCG or CACGAG). They also have two additional domains, named ‘Orange’ and ‘WRPW,’ in their carboxyl termini. Group F proteins contain the COE domain, which has an additional domain functioning in both dimerization and DNA binding (Ledent and Vervoort 2001).
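The E-box consensus described above (CANNTG, with N standing for any nucleotide) can be checked mechanically. The following is a minimal sketch, not part of the original study; the function names and the `QUOTED_SITES` table are illustrative, and the sites are copied exactly as printed in the text.

```python
import re

# CANNTG, where N is any nucleotide (the E-box consensus given in the text).
E_BOX = re.compile(r"CA[ACGT]{2}TG")

# Core sites copied as printed in the text, keyed by bHLH group.
QUOTED_SITES = {
    "A": ["CACCTG", "CAGCTG"],
    "B": ["CACGTG", "CATGTTG"],
    "E": ["CACGCG", "CACGAG"],  # N boxes, not E boxes
}


def is_e_box(site: str) -> bool:
    """Return True if the whole site matches the six-base CANNTG consensus."""
    return E_BOX.fullmatch(site.upper()) is not None


def e_box_report(sites_by_group):
    """Map each quoted site to whether it fits the E-box consensus."""
    return {
        group: {site: is_e_box(site) for site in sites}
        for group, sites in sites_by_group.items()
    }
```

Under this check the two group A sites and CACGTG fit the consensus, while the N boxes do not. CATGTTG, as printed, is seven bases long and therefore also fails the six-base check.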

Zebrafish (Danio rerio) is a good model organism for studies on vertebrate development. Its developmental processes are similar to the embryogenesis of higher vertebrates, including human. However, among the vast expanse of nonmammalian vertebrate species, which include fish, amphibian, reptile, and bird, only the Florida lancelet has been surveyed regarding its bHLH members (Simionato et al. 2007). Identification of bHLH members encoded in the zebrafish genome will greatly facilitate studies on vertebrate developmental biology and a variety of human congenital and genetic diseases. Although a great number of zebrafish bHLH protein sequences have been deposited in NCBI (www.ncbi.nlm.nih.gov) databases (Adolf et al. 2004; Chong et al. 2005; Germanguz et al. 2007; Hinits et al. 2007), questions such as how many bHLH members are encoded by the zebrafish genome and to which bHLH families they belong remain unanswered. Here we report the identification of zebrafish bHLH members and their phylogenetic relationships with human homologues. Moreover, features such as their distribution patterns on chromosomes and their molecular evolution are discussed.

Materials and Methods

tblastn Searches

Amino acid sequences of 45 representative bHLH motifs were prepared from the additional files of previous reports (Ledent and Vervoort 2001; Simionato et al. 2007). Each sequence was used to perform tblastn searches against genomic sequences of zebrafish (http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=7955). The expect value (E) was set as 10 in order to retrieve all bHLH-related sequences. The subject sequences obtained were manually examined to eliminate redundant ones, to add the missing amino acids on two ends of the bHLH motif, and to find introns within the bHLH motifs. Intron analysis was done using the NetGene2 application online (http://www.cbs.dtu.dk/services/NetGene2/).

Sequence Alignment

All sequences that had undergone the above examination were aligned using ClustalW online (http://www.ebi.ac.uk/clustalw/) with default settings. The aligned sequences were examined manually for their amino acid residues at the 19 conserved sites. Sequences with fewer than nine variations within the 19 sites were regarded as zebrafish bHLH members and subjected to further analyses.
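The acceptance rule above (fewer than nine variations across the 19 conserved sites) can be expressed as a short function. This is a sketch under stated assumptions: the authors examined the alignment manually, the 19-site profile of Atchley et al. (1999) is not reproduced in this section, and `TOY_SITES` is a hypothetical three-site profile used only to show the mechanics.

```python
def count_variations(aligned_motif: str, conserved_sites: dict) -> int:
    """Count conserved sites whose residue is not among the allowed ones.

    conserved_sites maps a 0-based alignment column to the set of residues
    accepted at that column. A gap ("-") or a column beyond the end of the
    sequence counts as a variation.
    """
    variations = 0
    for column, allowed in conserved_sites.items():
        residue = aligned_motif[column] if column < len(aligned_motif) else "-"
        if residue.upper() not in allowed:
            variations += 1
    return variations


def is_bhlh_member(aligned_motif: str, conserved_sites: dict,
                   max_variations: int = 8) -> bool:
    """Apply the rule from the text: fewer than nine variations passes."""
    return count_variations(aligned_motif, conserved_sites) <= max_variations


# Hypothetical three-site profile for illustration only; the real profile
# has 19 sites and is not listed in this section.
TOY_SITES = {0: {"R", "K"}, 2: {"E"}, 4: {"L", "I", "V"}}
```

With the real profile, `is_bhlh_member(motif, sites)` would keep the default `max_variations=8`, which corresponds to "fewer than nine".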

Phylogenetic Analyses

Phylogenetic analyses were conducted using PAUP 4.0 Beta 10 (Swofford 1998) based on a step matrix constructed from the Dayhoff PAM250 distance matrix by R. K. Kuzoff (http://paup.csit.fsu.edu/nfiles.html). Each amino acid sequence of obtained zebrafish bHLH motifs was used to construct neighbor-joining (NJ), maximum parsimony (MP), and maximum likelihood (ML) trees with those of human bHLH motifs, respectively. NJ trees were bootstrapped with 1000 replicates to provide information about their statistical reliability. MP analysis was performed using heuristic searches and bootstrapped with 100 replicates. ML trees were constructed using TreePuzzle 5.2 (Schmidt et al. 2002). The number of puzzling steps was set to 25,000. Model of substitution was set to Jones-Taylor-Thornton (JTT; Jones et al. 1992). Other parameters were set to default values.

Identification of Protein Sequences, Genomic Contigs, Expressed Sequence Tags, and Chromosomal Locations

Protein sequence accession numbers were obtained by using the amino acid sequence of each identified zebrafish bHLH motif to conduct blastp searches against all zebrafish protein databases (including ‘RefSeq protein,’ ‘Non-RefSeq protein,’ ‘Build protein,’ and ‘Ab initio protein’). Genomic contig numbers and number of ESTs (expressed sequence tags) were obtained using the amino acid sequences of each identified zebrafish bHLH motif to conduct tblastn search against zebrafish genome and EST sequences. All above searches used 0.01 as their E value and were without a filter. From all searches, only “hits” having 100% identity to the query sequence were accepted. This is because most bHLH family members are closely related. A 98% identity could very possibly refer to another bHLH member. The chromosomal location of each identified zebrafish bHLH sequence was obtained using the above-found protein sequence’s accession number to search in the genome map view of zebrafish (http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=7955).
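The hit-acceptance rule described above (E value of 0.01 and 100% identity) can be sketched as a filter over tabular BLAST results. The searches in this study were run through the NCBI web interface; the sketch assumes instead that results are available in the standard 12-column tab-delimited format of BLAST+ (`-outfmt 6`), and the function name is illustrative.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def accepted_hits(tabular_text: str, max_evalue: float = 0.01):
    """Yield hits with 100% identity and an E value within the cutoff."""
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=COLUMNS,
                            delimiter="\t")
    for row in reader:
        if float(row["pident"]) == 100.0 and float(row["evalue"]) <= max_evalue:
            yield row
```

The filter checks identity and E value only. It does not verify that the alignment spans the whole query motif, which would need an additional comparison of `length` against the query length.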

Results and Discussion

Zebrafish bHLH Members

The tblastn searches, sequence alignment, and examination of the 19 conserved amino acid sites revealed that 139 bHLH genes are encoded in the zebrafish genome, a number higher than that in human. The names of all 139 zebrafish bHLH members are listed in Table 1. Each zebrafish bHLH gene was named according to its phylogenetic relationship (explained below) with the corresponding human homologue. Where one human bHLH sequence has two or more zebrafish homologues, we used “a,” “b,” and “c” or “1,” “2,” and “3,” etc., to number them. For instance, two zebrafish homologues were found for each of the human Tcf3 and PTFb genes. Thus, the zebrafish genes were named Tcf3a and Tcf3b, and PTFb1 and PTFb2, respectively. Zebrafish was found to have 58, 29, 21, 5, 19, and 5 bHLH members in groups A, B, C, D, E, and F, respectively. An additional two members could not be assigned to any known families and were thus regarded as “orphan.” The existence of an EST sequence is a good indicator of an endogenous gene. Our EST searches revealed that 108 of the 139 zebrafish bHLH genes have corresponding EST sequences (data not shown), indicating a fairly high rate (77.7%) of genuine genes among those currently identified. ESTs of the other 31 bHLH members were not obtained, probably due to extremely low expression levels in the tissues assayed. The amino acid sequences of the 139 zebrafish bHLH motifs together with their protein accession numbers are available in Supplementary File 1.
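The counts reported above are internally consistent, as a short tally shows. This is only an arithmetic check on the figures quoted in the text; the variable names are illustrative.

```python
# Group sizes as reported in the text.
GROUP_COUNTS = {"A": 58, "B": 29, "C": 21, "D": 5, "E": 19, "F": 5}
ORPHANS = 2      # members not assigned to any known family
WITH_EST = 108   # genes with at least one matching EST

total = sum(GROUP_COUNTS.values()) + ORPHANS
est_rate = round(100 * WITH_EST / total, 1)
without_est = total - WITH_EST
```

The six groups plus the two orphans add up to the 139 genes reported, and 108 of 139 gives the stated EST rate.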

Table 1 A complete list of bHLH genes from zebrafish (Danio rerio)

It was found that zebrafish and human each possess unique bHLH genes. For instance, zebrafish homologues were not found for human Hash2, Hath4a, eHand, NSCL2, L-Myc, MondoA, NPAS1, Id1, Id3, Hes4, Orphan2, Orphan3, and Orphan4 genes. On the contrary, zebrafish either has extra members in certain bHLH families or has multiple homologues corresponding to one specific human bHLH sequence. The former includes Beta3c, Oligo4, V-Myc1, V-Myc2, V-Myc3, Hif2a, and Hes8. Human has only two Beta3 genes (Beta3a and Beta3b) and three Oligo genes (Oligo1, Oligo2 and Oligo3) and does not have V-Myc, Hif2, or Hes8 genes. The latter include Zath1a, Zath1b, and Zath1c (homologues of human Hath1), Sclerax1 and Sclerax2 (homologues of human Sclerax), PTFb1 and PTFb2 (homologues of human PTFb), C-Myc1 and C-Myc2 (homologues of human C-Myc), TFE3a and TFE3b (homologues of human TFE3), EPAS1a and EPAS1b (homologues of human EPAS1), Hes1a and Hes1b, Hes2a and Hes2b, Hes3a and Hes3b, Hes5a, Hes5b, Hes5c, Hes5d and Hes5e (homologues of human Hes1, Hes2, Hes3, and Hes5 respectively), Orphan1a and Orphan1b (homologues of human Orphan1), pMeso1b and pMeso1c (homologues of human pMeso1), and Id2b and Id2c (homologues of human Id2) (Table 1).

Homologue Identification of Zebrafish bHLH Genes

Classification of human bHLH family members has been extensively studied (Ledent et al. 2002; Simionato et al. 2007). Thus, human bHLH members can be used as a good reference for homologue identification of bHLH members in other organisms. Although orthologue identification has been accompanied by much uncertainty since there is no absolute criterion that can be used to decide whether two genes are orthologous (Ledent and Vervoort 2001), by constructing phylogenetic trees using various methods and setting an adequate standard for bootstrap values, phylogenetic analysis has remained an effective measure for homologue identification (Simionato et al. 2007). Furthermore, in our previous studies (Wang et al. 2007, 2008), in-group phylogenetic analysis was adopted to identify homologues for the unknown sequences that would form a monophyletic clade among themselves. An in-group phylogenetic analysis uses a single unknown bHLH sequence to construct different phylogenetic trees with other known bHLH members of the same group. If the unknown sequence forms a monophyletic clade with another known member and the bootstrap value is >50 in various phylogenetic trees, the known member will be regarded as a homologue of the unknown sequence.
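The acceptance criterion of the in-group analysis can be written down compactly. The following is a minimal sketch with illustrative function names; it assumes that each tree-building method (NJ, MP, ML) yields either a support value for the clade joining the unknown sequence to a known member, or `None` when no such monophyletic clade is formed.

```python
from typing import Optional

THRESHOLD = 50


def supported(value: Optional[float]) -> bool:
    """A support value counts only if it exceeds the threshold.

    None stands for a tree in which no monophyletic clade was formed.
    """
    return value is not None and value > THRESHOLD


def is_confident_homologue(nj, mp, ml) -> bool:
    """True when the NJ, MP, and ML trees all support the same pairing."""
    return all(supported(v) for v in (nj, mp, ml))


def support_count(nj, mp, ml) -> int:
    """Number of tree-building methods whose support value exceeds 50."""
    return sum(supported(v) for v in (nj, mp, ml))
```

For example, `is_confident_homologue(53, 100, 78)` holds, whereas a value of exactly 50 or a missing clade in any one tree does not meet the criterion.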

In this study, each identified zebrafish bHLH sequence was used to conduct in-group phylogenetic analyses with human bHLH members. The bootstrap values obtained that support the formation of a monophyletic clade with its human homologue are listed in Table 1. Table 1 indicates that the bootstrap support for identifying zebrafish bHLH sequences as homologues of specific human bHLH members varied greatly. First, among all the 139 zebrafish bHLH members, 80 have all NJ, MP, and ML bootstrap values >50 (ranging from 53 to 100), enabling us to confidently assign corresponding human homologues for them (Table 1; very pale beige background). Second, 12 zebrafish bHLH members have two of the three bootstrap values >50 and one n/m* (see explanation in Table 1, Note) in the constructed phylogenetic trees (Table 1; pale aqua background), while 14 other members have only one of the three bootstrap values >50 and have two n/m* (pale mauve background). Although these 26 zebrafish bHLH members did not have sufficient bootstrap support, we assigned the corresponding homologues for them by considering that fish and human are relatively distant species and the above-set criterion can be relaxed in certain cases as Simionato et al. did when they made phylogenetic analyses of Mesp, Myc, and H/E(spl) family members (Ledent and Vervoort 2001). Third, there are 16 zebrafish bHLH members that have one or two bootstrap values <50 and/or have one n/m* or, in a few cases, have bootstrap values as low as 10 and 16 (Table 1; aqua background). Yet we defined homologues for them because most of the values had supported the formation of a monophyletic clade with the same human counterpart. However, these assignments can be regarded as arbitrary and are subject to modification upon acquisition of new data.
Finally, there was no bootstrap support information available for identifying human homologues for the other 17 zebrafish bHLH members, because none of them formed any monophyletic clade with known human bHLH members (Table 1; light-purple background). This is very possibly because human and fish diverged from their common ancestor very early in their evolutionary history. Therefore, these zebrafish bHLH members have quite low sequence similarity to human bHLH sequences and thus could not form a monophyletic clade in our phylogenetic analyses. It is anticipated that an increased number of identified bHLH sequences in higher animals such as amphibian, reptile, and bird will facilitate final homologue identification of these sequences, because some “missing pieces” can probably be found in those organisms and can thus establish clear phylogenetic relationships of fish bHLH sequences with human ones. Among these 17 bHLH members, Beta3c, Oligo4, V-Myc1, V-Myc2, V-Myc3, Hif2a, and Hes8 are probably zebrafish specific sequences, because they are extra members of certain bHLH families that have not been found in other animals examined so far. For instance, only two Beta3 genes have been found in other animals and were named Beta3a and Beta3b. In zebrafish, a third Beta3 member was found apart from the identification of Beta3a and Beta3b homologues. Therefore, the extra member was named Beta3c. The rest of these 17 bHLH members were temporarily named according to relevant human bHLH names. They were not extra bHLH members found in zebrafish. Their homology with bHLH members in other species awaits further analysis when new data are available.

Protein Sequences and Genomic Contigs of Zebrafish bHLH Genes

Protein sequence accession numbers and their genomic contig numbers for all 139 zebrafish bHLH motifs are also listed in Table 1. It should be noted that when the amino acid sequence of an individual zebrafish bHLH motif was used to conduct blastp searches against the various zebrafish protein databases (‘RefSeq protein,’ ‘Non-RefSeq protein,’ ‘Build protein,’ and ‘Ab initio protein’), generally a considerable number of “hits” with 100% identity in the bHLH motif could be obtained. These protein sequences often varied slightly in length. Yet most of them were not different protein sequences encoded in the zebrafish genome, because most tblastn searches using the amino acid sequence of each zebrafish bHLH motif against the zebrafish genome yielded only one coding region in the genome. One exception is seen in Bmal1, pMeso1b, MITF, and Mesp2, each of which codes for two different proteins at separate genomic locations (Table 1; those with two protein accession numbers for one bHLH gene). The other exception is seen in Hes5b, Hes5d, Hes5e, and Sim1, each of which codes for only one protein sequence but has two or three separate genomic locations (Table 1; those with two or three genomic contig numbers for one bHLH gene). Considering this, the zebrafish genome does not seem very economical in coding bHLH proteins because, of the 139 bHLH genes, these 8 were found to be multiple-copy genes. This figure is much higher than that in insects and mammals. In the silkworm (Bombyx mori) and honeybee (Apis mellifera), one and three bHLH genes, respectively, were found as two-copy genes (Wang et al. 2007, 2008). In mouse, all 114 bHLH genes are single-copy, and in rat and human only one bHLH gene each was found to be two-copy (data to be published elsewhere).

Chromosomal Locations of Zebrafish bHLH Genes

Chromosomal locations of all zebrafish bHLH genes are shown in Fig. 1. It can be seen that zebrafish bHLH genes are distributed in a rather uneven pattern. Chromosome 23 has 12 bHLH protein-coding regions, while each of chromosomes 11, 20, 21, and 22 has 8 and each of chromosomes 4, 7, 8, 13, 14, 15, and 19 has 6 or 7 coding regions, respectively. All other chromosomes were found to have two to five bHLH gene coding regions. In addition, chromosomal locations for eight zebrafish bHLH genes were not found, probably because the genomic sequences containing these genes have not been assembled into chromosomes (Fig. 1; ‘not placed’).

Fig. 1

Chromosomal locations of zebrafish bHLH genes. Gene names shaded in blue boxes are bHLH genes that belong to the same family and are clustered on the chromosome. A bracket ([) preceding a gene name means that the gene has multiple coding regions on the chromosome, except for Bmal1 and MITF, which are not located on the same chromosome and are thus labeled Bmal1(1) and Bmal1(2), which are on chromosomes 7 and 25, and MITF(1) and MITF(2), which are on chromosomes 6 and 23, respectively. Gene names in red, light-purple, and green letters indicate closely related genes on separate chromosomes. Family information on each bHLH gene is listed in Table 1. MT mitochondrion (Color figure online)

It should be noted that groups of two, three, or four zebrafish bHLH genes belonging to the same family are found clustered on chromosomes. A total of 21 zebrafish bHLH genes fall into this category (Fig. 1; in blue boxes). For instance, Myf5 and Myf6 cluster on chromosome 4; Hes2a, Hes2b, and Hes3 cluster on chromosome 8; and Hes1a, Hes5c, Hes5d (two copies), and Hes5e (three copies) cluster on chromosome 23. Figure 1 also shows the existence of the above-mentioned multiple coding regions for eight zebrafish bHLH genes, i.e., Bmal1, pMeso1b, MITF, Mesp2, Hes5b, Hes5d, Hes5e, and Sim1, all of which are marked with a square bracket before them except Bmal1 and MITF, which have separate coding regions on different chromosomes and are thus labeled as Bmal1(1) and Bmal1(2) and as MITF(1) and MITF(2).

Molecular Evolution of Zebrafish bHLH Genes

The above analyses revealed that the whole genome of zebrafish has coding regions for 139 bHLH genes, 8 of which have multiple copies. Given that human has only 118 bHLH genes (Simionato et al. 2007), how did this higher number of bHLH genes arise? It has been thought that two rounds of whole-genome duplication (WGD), i.e., the 2R hypothesis, have played an important role in the establishment of gene repertoires in vertebrates (Skrabanek and Wolfe 1998). In addition, a third round of fish-specific WGD (3R) was suggested according to observations of differences in Hox gene clusters and other duplicated genes between fish (tetraodon, fugu, medaka, and zebrafish) and birds/rodents (Amores et al. 1998, 2004; Naruse et al. 2004; Panopoulou and Poustka 2005; Woods et al. 2000). Evidence for 3R came from the comparative approach conducted on the pufferfish genome using the human genome as the unduplicated reference (Jaillon et al. 2004). In this approach, anchor genes which exist as a single copy in unduplicated genomes and as multiple copies on separate chromosomes in duplicated genomes were important indicators of potential duplication events (Kellis et al. 2004; Panopoulou and Poustka 2005). Among the 139 zebrafish bHLH genes identified in our study, Bmal1 and MITF were found to exist on separate chromosomes (Fig. 1). Close examination enabled us to conclude that both are anchor genes from a duplication event.

First, Bmal1 is the anchor gene between the zebrafish genome and the mouse/chicken genomes (Fig. 2). Figure 2 shows that zebrafish chromosomes 7 and 25 are the product of a duplication event, since both have Bmal1 and Mesp genes on them. Their counterparts in an unduplicated reference genome are chromosome 7 in mouse and chromosomes 5 and 10 in chicken (Fig. 2a, b). Mouse chromosome 7 and zebrafish chromosome 7 have three anchor genes, i.e., Bmal1, ARNT2, and Mesp1. And mouse chromosome 7 and zebrafish chromosome 25 also have three anchor genes, i.e., Bmal1, Myf3, and Mesp2. In addition, zebrafish chromosome 15 was found to have Hif3a, USF1, and NPAS3b genes for which homologous genes were found on mouse chromosome 7, suggesting that zebrafish chromosome 15 was once connected with chromosome 7 or 25 and was separated later in the evolutionary process (Fig. 2a). The chicken Bmal1 gene is located on chromosome 5. Looking at other bHLH genes on this chromosome, only the Myf3 gene was found to exist on zebrafish chromosome 25. Other anchor genes such as Mesp1 and ARNT2 were found on chromosome 10. Therefore, both chicken chromosome 5 and chicken chromosome 10 were considered as the unduplicated reference. Four anchor genes, i.e., Bmal1, Mesp1, ARNT2, and TF12, were found on zebrafish chromosome 7. And three anchor genes, i.e., Bmal1, Myf3, and Mesp2, were found on zebrafish chromosome 25, while two pairs of homologous genes, i.e., NPAS3/NPAS3b and Hif1a/Hif3a, were found on chicken chromosome 5 and zebrafish chromosome 15, respectively (Fig. 2b).

Fig. 2

Bmal1 as anchor gene in explorations of a genome duplication event using mouse chromosomes (a) and chicken chromosomes (b) as the unduplicated reference. Zebrafish chromosomes are shown in yellow boxes. Those of mouse and chicken are in light-purple and light-aqua boxes, respectively. All chromosomes are drawn to scale. The very small gray box in the bottom-left corner represents 10 million base pairs. For clarity of labeling, genes on mouse and chicken chromosomes have been put on both sides of the chromosomes. Identical genes that appear on both zebrafish and mouse/chicken chromosomes are shown in boldface and are connected by dotted lines. Those that are not identical but are homologous genes are shown in regular typeface and are connected by dotted lines (Color figure online)

Second, MITF is also an anchor gene between the zebrafish genome and the mouse/chicken genomes (Fig. 3). Figure 3 shows that zebrafish chromosomes 6, 8, 11, and 23 are the product of an ancient duplication event. Their counterparts in unduplicated reference genomes are chromosome 6 in mouse and chromosomes 2 and 12 in chicken, because anchor genes such as MITF and Dec1 and a number of homologous genes exist among these chromosomes (Fig. 3a, b; gene names connected by dotted lines).

Fig. 3

MITF as anchor gene in explorations of a genome duplication event using mouse chromosomes (a) and chicken chromosomes (b) as the unduplicated reference. See the legend to Fig. 2 for details (Color figure online)

The distribution pattern of all anchor genes shown in Figs. 2 and 3 can be regarded as clear evidence for 3R occurring after zebrafish diverged from its common ancestor with chicken and mouse. This result is consistent with other observations in fish (Jaillon et al. 2004).
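The anchor-gene criterion used in this comparison (a single copy in the unduplicated reference, copies on separate chromosomes in the duplicated genome) can be sketched as follows. The function name and data layout are illustrative assumptions. The zebrafish locations are those given in the text and the Fig. 1 legend, and the mouse locations follow the description of Figs. 2 and 3.

```python
def anchor_candidates(duplicated: dict, reference: dict) -> list:
    """Return genes that sit on one chromosome in the reference genome
    and on two or more different chromosomes in the duplicated genome.

    Both arguments map a gene name to the list of chromosomes carrying a copy.
    """
    candidates = []
    for gene, chromosomes in duplicated.items():
        ref_copies = reference.get(gene, [])
        if len(ref_copies) == 1 and len(set(chromosomes)) >= 2:
            candidates.append(gene)
    return sorted(candidates)


# Chromosome assignments as described in this section.
ZEBRAFISH = {"Bmal1": [7, 25], "MITF": [6, 23], "Hes5d": [23, 23]}
MOUSE = {"Bmal1": [7], "MITF": [6]}
```

Here `anchor_candidates(ZEBRAFISH, MOUSE)` returns Bmal1 and MITF. Hes5d is excluded because both of its copies lie on chromosome 23 and no mouse location is listed for it.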

Apart from the existence of the above anchor genes, the distribution pattern of other closely related zebrafish bHLH genes also suggests an origination through chromosomal duplication. For example, the C-Myc2 and Zath2b genes are on chromosome 2, and their closely related genes C-Myc1 and Zath2a genes are on chromosome 24 (Fig. 1; gene names in red). More examples are seen on chromosomes 12 and 14 and on chromosomes 15 and 21 (Fig. 1; gene names in light purple and green, respectively). These distribution patterns of bHLH genes should not be regarded as random, and can be considered as additional evidence for the WGD hypothesis.

Molecular Evolution of Chicken and Mouse bHLH Genes

As discussed above, the chicken and mouse genomes can be used as an unduplicated reference for determining the duplication event in zebrafish. However, this nonduplication is only relative to a fish-specific duplication event. As the 2R hypothesis suggests, chicken and mouse bHLH genes should also be the products of ancient duplication events. This has been largely evident because, among the 45 bHLH families, only 11 and 10 families have a single member in chicken and mouse, respectively, while 33 families have a single member in lancelet, an ancestor of vertebrates (Table 2). Therefore, a WGD event should have occurred during the evolutionary stage of lancelet into jawless fish or cartilaginous fish. To prove this, data from two aspects are desirable. One is the distribution pattern of bHLH genes on lancelet chromosomes, because a specific lancelet chromosome may have suitable anchor genes and becomes a good unduplicated reference for chicken/mouse chromosomes. Other evidence may come from the identification of bHLH genes in genomes of hagfish or shark, both of which are expected to have a relatively high number of bHLH genes.

Table 2 A comparison of the number of bHLH family members in lancelet, zebrafish, chicken, and mouse

If this earlier duplication event did happen, it would mean that the present zebrafish genome has undergone at least two rounds of WGD after the lancelet emerged (Panopoulou and Poustka 2005). Is this possible? Because the lancelet has 78 bHLH genes, two rounds of WGD would yield at least 200 bHLH genes, even after considering intensive gene loss after the WGD. The present zebrafish genome encodes only 139 bHLH genes. Therefore, it seems unlikely that two rounds of WGD could have occurred in the zebrafish genome. However, the zebrafish genome has multiple coding regions for eight bHLH genes (Fig. 1). This is very rare among the vertebrates examined so far. The chicken and mouse genomes are found to have only one coding region for each bHLH gene, while those of rat and human each have two coding regions for only one bHLH gene (data not shown). Because bHLH genes encode transcriptional regulators, it is not reasonable for a bHLH gene to have multiple coding regions. Therefore, the multiple coding regions in the zebrafish genome could merely be redundant copies that have not yet been lost, probably due to their relatively ‘recent’ origination. (A preliminary list of bHLH genes encoded in the chicken Gallus gallus genome was obtained as a reference for this study. The amino acid sequences of 104 chicken bHLH motifs together with their protein or EST accession numbers are provided in Supplementary File 2.)
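The gene-count argument above rests on simple doubling arithmetic, which the following sketch makes explicit. The function and the `retained_fraction` parameter are illustrative assumptions and are not a model used in this study; the text does not state what rate of gene loss underlies the figure of 200.

```python
def genes_after_wgd(start: int, rounds: int,
                    retained_fraction: float = 1.0) -> float:
    """Expected gene count after successive whole-genome duplications.

    Each round doubles the count; retained_fraction is the share of genes
    kept after each round (1.0 means no loss).
    """
    count = float(start)
    for _ in range(rounds):
        count = count * 2 * retained_fraction
    return count
```

Starting from the 78 lancelet genes, two rounds without any loss give 312 genes. As one hypothetical illustration, retaining 80% of genes after each round would leave about 200.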

Conclusion

In this study, 139 bHLH genes were found in zebrafish. Among them, 12 were newly identified to be encoded in the genome. All zebrafish bHLH members have been defined by their names and families according to various phylogenetic analyses with human bHLH homologues. Phylogenetic analysis has been an effective measure for homologue identification (Atchley and Fitch 1997; Ledent et al. 2002). It is much more reliable than identification based on comparison of sequence similarity. Therefore, the names and family information in this report can be used to correct inadequate annotations made for previously identified zebrafish bHLH homologues, most of which were based on sequence similarity comparison. For instance, the zebrafish protein numbered AAI00123.1 was denoted Tcf3 in GenBank, but our phylogenetic analyses clearly indicate that it is the homologue of human E2A (Table 1).

Human and zebrafish have their own species-specific bHLH genes. We found that 13 human bHLH genes have no zebrafish homologues, and 24 zebrafish genes have no human homologues. Eight zebrafish bHLH genes were found to have multiple coding regions in the genome. Among them, Bmal1 and MITF are good anchor genes for identification of a fish-specific WGD event in comparison with the mouse and chicken genomes. The identification of zebrafish bHLH family members and investigation of their significance in gene evolutionary events provide useful information for studies on vertebrate development and for related studies in amphibians, reptiles, birds, and other fish species.