Keywords

In this chapter:

  • Common names in plants: Arabidopsis (Arabidopsis thaliana), soybean (Glycine max), common bean (Phaseolus vulgaris)

  • Chromosome-based locus (gene model) identifier (Phytozome)

11.1 Introduction

As sessile organisms, plants produce numerous secondary metabolites to overcome biotic and abiotic stressors, to attract pollinators and nitrogen-fixing microorganisms, and to communicate with other plants (Koes et al. 2005; Noel et al. 2005; Moura et al. 2010; Agati et al. 2012, 2013; Baxter and Stewart 2013). Many of these compounds are synthesized by the phenylpropanoid pathway, which is likely one of the most studied pathways in plants. It is relatively well understood and was extensively reviewed (Goujon et al. 2003; Raes et al. 2003; Wang 2011; Falcone Ferreyra et al. 2012; Petrussa et al. 2013). Individual branches of the pathway have been thoroughly characterized. Most of the enzymes that catalyze individual steps of the pathway have been identified, and the genes coding for them have been isolated in a number of plant species, including Arabidopsis and soybean (Graham et al. 2008; Fraser and Chapple 2011).

The core (general or central) pathway consists of three steps, including (1) the conversion of the aromatic amino acid phenylalanine into trans-cinnamic acid, which is catalyzed by phenylalanine ammonia-lyase (PAL); (2) the conversion of trans-cinnamic acid into p-coumaric acid, catalyzed by cinnamate 4-hydroxylase (C4H); and (3) the transformation of p-coumaric acid into p-coumaroyl-CoA, catalyzed by 4-coumarate:CoA ligase (4CL). The compound p-coumaroyl-CoA serves as a starting point for several branches of the phenylpropanoid pathway leading to biosynthesis of lignin, lignans, coumarins, stilbenes, flavonoids, anthocyanin, condensed tannins (proanthocyanidins), and isoflavonoids (Vogt 2010; Cheynier et al. 2013). These products have important functions not only for plant survival, growth, and development but they could also be powerful supplements to the human diet. For example, lignans, stilbenes, and isoflavonoids have been associated with the reduced onset/development of certain chronic disease in humans, including some forms of cancer and heart diseases (Cassidy et al. 2000; Chen et al. 2006; Adlercreutz 2007; Xiao 2008; Brunetti et al. 2013) (Fig. 11.1).

Fig. 11.1

Cytochrome P450s involved in the phenylpropanoid pathway. The positions of ten enzymes and locus (gene model) identifiers (https://phytozome.jgi.doe.gov/pz/portal.html) in the pathway in common bean (blue), soybean (red), and Arabidopsis (black) are indicated; 1. Core phenylpropanoid pathway: cinnamate 4-hydroxylase (C4H, CYP73A); 2. Lignin/lignans branch: coumarate 3-hydroxylase (C3H, CYP98A) and ferulic acid 5-hydroxylase (F5H, CYP84A); 3. Anthocyanins/condensed tannins branch: flavonoid 3′-hydroxylase (F3′H, CYP75B), flavonoid 3′,5′-hydroxylase (F3′,5′H, CYP75A) and flavone synthase (FNS, CYP93B); 4. Isoflavonoid branch: isoflavone synthase (IFS, CYP93C), isoflavone 2′-hydroxylase (I2′H, CYP81E), flavonoid 6-hydroxylase (F6H, CYP71D), and 3,9-dihydroxypterocarpan 6a-monooxygenase (D6aH, CYP93A)

Lignin biosynthesis is a two-step process. First, monolignols are synthesized through a series of hydroxylations, O-methylations, and conversions of the side-chain carboxyl into p-coumaryl, coniferyl, and sinapyl alcohols (Humphreys and Chapple 2002; Boerjan et al. 2003; Vanholme et al. 2010; Weng and Chapple 2010; Labeeuw et al. 2015). A second step involves monolignol polymerization by peroxidases (PER), laccases (LAC), and dirigent proteins (DP). In a reversible reaction, hydroxycinnamoyl-CoA:shikimate/quinate hydroxycinnamoyltransferase (HCT) converts p-coumaroyl-CoA and caffeoyl-CoA into their corresponding shikimate/quinate esters, which are then transformed by coumarate 3-hydroxylase (C3H) into their corresponding caffeoyl esters (Schoch et al. 2001). Caffeoyl-CoA O-methyltransferase (CCoAOMT) catalyzes methylation of caffeoyl-CoA to generate feruloyl-CoA. Cinnamoyl-CoA reductase (CCR) converts hydroxycinnamoyl-CoA esters into their corresponding aldehydes, and cinnamyl-alcohol dehydrogenase (CAD) catalyzes the conversion of cinnamyl aldehydes into their corresponding alcohols. Ferulic acid 5-hydroxylase (F5H) converts ferulic acid into 5-hydroxyferulic acid. F5H is also known as coniferaldehyde 5-hydroxylase (CAld5H), since the enzyme preferentially transforms coniferaldehyde and/or coniferyl alcohol into 5-hydroxyconiferaldehyde and/or 5-hydroxyconiferyl alcohol, respectively (Humphreys et al. 1999; Osakabe et al. 1999). Caffeic acid O-methyltransferase (COMT) converts 5-hydroxyconiferaldehyde and/or 5-hydroxyconiferyl alcohol into sinapaldehyde and/or sinapyl alcohol, respectively (Osakabe et al. 1999; Parvathi et al. 2001; Zubieta et al. 2002). COMT was previously thought to be a bifunctional enzyme, methylating caffeic and 5-hydroxyferulic acids.

Chalcone synthase (CHS) is the first enzyme in the flavonoid/anthocyanin branch of the phenylpropanoid pathway. It catalyzes the biosynthesis of chalcone from one molecule of p-coumaroyl-CoA and three molecules of malonyl-CoA. This basic flavonoid structure is then transformed by a set of various isomerases, reductases, hydroxylases, Fe2+/2-oxoglutarate-dependent dioxygenases, and transferases into different flavonoids, including flavanones, flavones, flavonols, anthocyanins, and condensed tannins (Winkel-Shirley 2001; Ralston et al. 2005; Ferrer et al. 2008; Saito et al. 2013). CHS and chalcone isomerase (CHI) catalyze the two-step condensation, producing a colorless flavanone (naringenin), which is then oxidized by flavanone 3-hydroxylase (F3H) into the colorless dihydroflavonol (dihydrokaempferol). Subsequent hydroxylation of this compound (at the 3′ or 5′ position of the B-ring), catalyzed by flavonoid 3′-hydroxylase (F3′H) and flavonoid 3′,5′-hydroxylase (F3′5′H), produces dihydroquercetin and dihydromyricetin. These two enzymes (F3′H and F3′5′H) can also hydroxylate flavanone (naringenin) to produce eriodictyol and pentahydroxy-flavanone, which are then hydroxylated by F3H into dihydroquercetin and dihydromyricetin, respectively. The next step in the pathway is the conversion of the three dihydroflavonols (dihydroquercetin, dihydrokaempferol, and dihydromyricetin). These compounds can be transformed into flavonols (kaempferol, quercetin, and myricetin) by flavonol synthases (FLS). Dihydroflavonol 4-reductase (DFR) converts dihydroflavonols into leucoanthocyanidins (colorless flavan-3,4-diols: leucocyanidin, leucopelargonidin, and leucodelphinidin), which are then oxidized by anthocyanidin synthase [ANS, also known as leucoanthocyanidin dioxygenase (LDOX)] into colored but unstable anthocyanidins [cyanidin (red-magenta), pelargonidin (orange), and delphinidin (purple-mauve)]. Stable anthocyanins (colored) are produced by glycosylation of these compounds by the UDP-glucose:flavonoid 3-O-glucosyl transferases (UFGT). Some anthocyanins (cyanidin-3-glucoside and delphinidin-3-glucoside) may be further methylated by methyltransferases (MTs) to produce peonidin-3-glucoside and petunidin- or malvidin-3-glucoside, respectively.

Condensed tannins are synthesized through two branches of the anthocyanin pathway. The reduction of leucocyanidin to catechin (2,3-trans flavan-3-ols) is catalyzed by leucoanthocyanidin reductase (LAR), and the conversion of cyanidin into epicatechin (2,3-cis flavan-3-ols) is driven by anthocyanidin reductase (ANR). The subsequent steps catalyzed by polyphenol oxidases and condensing enzymes possibly take place in vacuoles.

Legume-specific isoflavonoids are produced through two branches of the isoflavonoid pathway that have major reactions in common. The branch leading to the isoflavone genistein uses the same naringenin intermediate, which is synthesized in the flavonoid/anthocyanin branch of the phenylpropanoid pathway by a two-step condensation catalyzed by CHS and CHI (common to the majority of plants) (Lozovaya et al. 2007). On the other hand, the isoflavone daidzein is synthesized through the co-action of CHS and legume-specific chalcone reductase (CHR), yielding isoliquiritigenin (trihydroxychalcone), which is then transformed into liquiritigenin (dihydroxyflavanone), a core intermediate of this branch of the isoflavonoid pathway (Austin and Noel 2003). Isoflavone synthase [IFS, also known as 2-hydroxyisoflavanone synthase (2-HIS)] converts the flavanone (naringenin or liquiritigenin) into 2-hydroxyisoflavanones (through an aryl migration of the aromatic B-ring from the C-2 to the C-3 position and hydroxylation in position C-2) (Steele et al. 1999; Jung et al. 2000), which are then dehydrated (formation of a double bond between C-2 and C-3) to the corresponding isoflavones (genistein and daidzein) by 2-hydroxyisoflavanone dehydratase (HID) (Akashi et al. 2005; Shimamura et al. 2007). They are further modified by isoflavonoid-specific enzymes to produce major phytoalexins, including medicarpin, biochanin A, glyceollin, pisatin, and maackiain (Latunde-Dada et al. 2001; Lozovaya et al. 2007; Artigot et al. 2013).

Biosynthesis of lignin, flavonoids/anthocyanins/proanthocyanidins, and isoflavonoids is under complex regulation. The expression of the lignin biosynthetic genes is coordinately regulated by a number of transcription factors. The majority of these genes contain a common AC cis-element, which is required for their expression in cells undergoing lignification. NST1/2/3 (NAC secondary wall thickening promoting factor 1/2/3) and Myb26/Myb83 transcription factors act as master switches to regulate biosynthesis of major secondary wall components, including cellulose, xylan, and lignin in Arabidopsis (Zhong and Ye 2009; Zhao and Dixon 2011; Hao and Mohnen 2014; Yoon et al. 2015). In the Arabidopsis flavonoid pathway, genes for early biosynthetic enzymes (CHS, CHI, F3H, and F3′H) are regulated by the three functionally redundant R2R3-MYB transcription factors (MYB11, MYB12, and MYB111), while the activation of late biosynthetic genes is controlled by the R2R3-MYB/bHLH/WD40 (MBW) complex (Grotewold 2005; Hartman et al. 2005; Ramsey and Glover 2005; Gonzalez et al. 2008; Gou et al. 2011; Petrussa et al. 2013; Li et al. 2014; Xu et al. 2014, 2015). Genes of the legume-specific isoflavonoid branch of the phenylpropanoid pathway are regulated by a different set of transcription factors. For example, GmMYB176, an R1 MYB transcription factor, regulates CHS8 expression and isoflavonoid synthesis in soybean (Yi et al. 2010a, b; Dhaubhadel 2011). The constitutive over-expression of LjMYB14 was associated with the activation of dozens of genes coding for enzymes in the core phenylpropanoid pathway and isoflavonoid branch in Lotus japonicus (Shelton et al. 2012). At the same time, the expression of other transcription factors was altered, resulting in coordinated down-regulation of the competing biosynthetic pathways.

Genes encoding the major enzymes of the phenylpropanoid pathway have been identified in a number of plant species (Tsai et al. 2006; Tohge et al. 2007; Xu et al. 2009). In most species, enzymes involved in the phenylpropanoid pathway are encoded by gene families of various sizes. For example, plant CADs can reduce various aldehydes, including those expressed in response to pathogens (Barakat et al. 2010; Miedes et al. 2014). The nine putative CAD genes that were identified in Arabidopsis are split into three classes based on protein phylogenetic analysis (Raes et al. 2003). Using Southern hybridization of genomic DNA, Ryder et al. (1987) identified six to eight CHS genes in common bean, some of them tightly clustered, which represented different loci, not allelic variation. The soybean CHS gene family consists of nine members (CHS1 to CHS9), some of which are clustered (Akada and Dube 1995; Yi et al. 2010a). They share a high degree of sequence similarity and play different roles in plant development and interactions with the environment. Matsumura et al. (2005) mapped eight CHS genes on five linkage groups (A1, A2, B1, D1a, and K) in soybean. A duplicated CHS1 gene was associated with suppressed seed coat pigmentation in yellow soybean (Senda et al. 2002).

Gene families arise from interspecific hybridization, polyploidization, and local duplication. Genome duplication results in biased gene content (Freeling 2009) and non-random divergence in gene expression (Casneuf et al. 2006; Wang et al. 2012, 2013). After a duplication event, the new gene copy (or the original copy) can retain the same function (subfunctionalization), undergo neo-functionalization, or become non-functional (loss of function) (Lynch and Conery 2000; Hanada et al. 2011; Barker et al. 2012). Gene clusters formed by gene duplication have been frequently found in multigene families, including those involved in plant specialized metabolism (Nutzmann and Osbourn 2014, 2015). For example, clusters encoding enzymes of all steps in lignin biosynthesis have been identified in the Eucalyptus grandis EST libraries (Harakava 2005). The authors also predicted co-localization of several phenylpropanoid pathway enzymes, including PAL, C4H, 4CL, C3H, and F5H, on the endoplasmic reticulum (ER) membrane. This may suggest the existence of metabolons involving P450 multienzyme complexes and channeling of pathway intermediates without their release into the general metabolic pool (Hrazdina and Wagner 1985; Winkel-Shirley 1999; Ralston and Yu 2006; Bassard et al. 2012).

The availability of complete genome sequences enabled genome-wide analyses of the phenylpropanoid pathway genes in several species (Naoumkina et al. 2010). Shi et al. (2010) identified 95 genes (ten gene families) associated with the phenylpropanoid pathway in Populus trichocarpa and identified functional redundancy at the transcript level for six lignin biosynthetic genes [PAL, C4H, 4CL, HCT, CCoAOMT, CAld5H (F5H)]. Using an in silico approach, Costa et al. (2003) analyzed the organization and function of the phenylpropanoid pathway gene network in Arabidopsis, while Lucheta et al. (2007) focused on genes encoding key enzymes in the flavonoid pathway in Citrus sinensis. Hamberger et al. (2007) conducted a genome-wide analysis of phenylpropanoid pathway gene families in poplar and compared them to homologs in Arabidopsis and rice. The focus of these studies was on the genes of the core pathway and the lignin branch. To explore the evolution of phenylpropanoid pathway diversity, Tohge et al. (2013) compared 65 gene families involved in the pathway among 23 species, including Arabidopsis and soybean. Another evolutionary study focused on the isoflavonoid pathway (Chu et al. 2014). The research examined nine major isoflavonoid genes in seven plant species, including Arabidopsis, soybean, and common bean. Genes coding for PAL, C4H, 4CL, CHS, and CHI were identified in all analyzed species, while genes for CHR, IFS, IOMT (isoflavonoid O-methyltransferase), and IFR (isoflavonoid reductase) were confirmed to be legume-specific. Divergent evolutionary patterns were observed among different gene copies of centrally located branch-point enzymes (4CL, CHS, and CHI) regardless of the level of polymorphism or the evolutionary rate.

However, information about this important pathway in common bean is still fragmentary. In our previous study (Reinprecht et al. 2013), 35 phenylpropanoid pathway genes were cloned and mapped in silico in the common bean genome (annotation Phaseolus vulgaris v1.0). The work also identified syntenic regions containing phenylpropanoid pathway genes in common bean and soybean (annotation Glycine max v1.1) (Reinprecht et al. 2013). In another study, 22 phenylpropanoid pathway genes were mapped in the Bat93 × Jalo EEP558 (a core mapping resource for P. vulgaris) and OAC Rex × SVM Taylor recombinant inbred line (RIL) populations (Yadegari 2013). Currently, work on identifying an association between these genes and different seed phenolics in common bean using an association mapping approach is underway. The cytochrome P450 gene family encodes several key enzymes in the phenylpropanoid pathway. Alber and Ehlting (2012) reviewed P450s involved in lignin biosynthesis. The availability of the complete common bean genome sequence allowed Kumar et al. (2015) to identify members of this gene family. The focus of our work was to study gene families encoding enzymes of the phenylpropanoid pathway in common bean, using an in silico approach.

11.2 Gene Families Encoding Enzymes of Phenylpropanoid Pathway in Common Bean

Currently, complete genome sequences for 55 plant species, including common bean (Schmutz et al. 2014; current annotation P. vulgaris v1.0), are deposited in Phytozome 10.3 (a comparative genomic database, available at http://phytozome.jgi.doe.gov/pz/portal.html; accessed 16 Nov 2015; Goodstein et al. 2012). This allowed us to study the complete gene families encoding enzymes of phenylpropanoid pathway in common bean, thus extending our previous work (Reinprecht et al. 2013). In particular, we examined their conservation and diversification through comparative analyses with previously sequenced soybean (Schmutz et al. 2010; current annotation G. max Wm82.a2.v1) and Arabidopsis (The Arabidopsis Genome Initiative 2000; Lamesch et al. 2012; current annotation Arabidopsis thaliana TAIR10) genomes. The basic information for the sequenced Arabidopsis, soybean, and common bean genomes is presented in Table 11.1.

Table 11.1 Basic information for the sequenced genomes of A. thaliana, G. max, and P. vulgaris

Genome annotations for common bean (Schmutz et al. 2014), soybean (Schmutz et al. 2010), and Arabidopsis (The Arabidopsis Genome Initiative 2000) were obtained from Phytozome 10.2 (Goodstein et al. 2012). For each gene, Phytozome provides the identifiers and descriptions of all Pfam (Protein families), KEGG (Kyoto Encyclopedia of Genes and Genomes), GO (Gene Ontology), PANTHER (Protein ANalysis THrough Evolutionary Relationships), and KOG (EuKaryotic Orthologous Groups) classifications assigned to it.

Table 11.2 contains the list and the number of putative genes in each of the major gene families encoding enzymes of the phenylpropanoid pathway in common bean, soybean, and Arabidopsis in the current annotations of their complete genome sequences (P. vulgaris v1.0, G. max Wm82.a2.v1, and A. thaliana TAIR10) deposited in Phytozome. For each of the 21 enzyme classes, their functional annotations were based on the Pfam and KOG databases (commonly used), while the number of genes in each family was based on Phytozome and KEGG databases. For example, with the KOG0222 search, four phenylalanine ammonia-lyase (PAL, EC:4.3.1.24) genes were identified in Arabidopsis, eight PAL genes were identified in soybean, and six PAL genes were found in common bean (Table 11.2). Several large gene families are involved in phenylpropanoid pathway, including the cytochrome P450 family.
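The family-size counts described above amount to filtering an annotation table by a functional identifier and tallying the hits per species. The following is a minimal sketch of that idea, not the procedure used to build Table 11.2. The function names and the in-memory row format are hypothetical, the three PAL1 loci are the ones quoted later in this chapter and are assumed here to carry the KOG0222 annotation, and the fourth row is a made-up placeholder.

```python
from collections import Counter

# Species prefixes follow the Phytozome locus identifiers used in this chapter.
SPECIES_PREFIXES = {
    "AT": "A. thaliana",
    "GLYMA.": "G. max",
    "PHVUL.": "P. vulgaris",
}


def species_of(locus):
    """Guess the species from the locus identifier prefix (case-insensitive)."""
    upper = locus.upper()
    for prefix, species in SPECIES_PREFIXES.items():
        if upper.startswith(prefix):
            return species
    return "unknown"


def count_by_annotation(rows, annotation_id):
    """Count loci per species whose annotation set contains annotation_id.

    rows is an iterable of (locus, set_of_annotation_ids) pairs; this row
    format is an assumption made for illustration only.
    """
    counts = Counter()
    for locus, annotation_ids in rows:
        if annotation_id in annotation_ids:
            counts[species_of(locus)] += 1
    return dict(counts)


# Toy table: PAL1 loci from this chapter (KOG0222 assignment assumed),
# plus one invented placeholder row that must not be counted.
rows = [
    ("At2g37040", {"KOG0222"}),
    ("Glyma.03g181700", {"KOG0222"}),
    ("Phvul.001g177800", {"KOG0222"}),
    ("Phvul.000g000000", {"KOG0000"}),  # hypothetical placeholder
]

pal_counts = count_by_annotation(rows, "KOG0222")
```

Applied to a full annotation export rather than this toy table, the same tally would be expected to give the per-species PAL counts quoted above, provided the export uses the same KOG assignments.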

Table 11.2 Major gene families encoding enzymes of phenylpropanoid pathway in A. thaliana, G. max, and P. vulgaris

11.3 The Role of Cytochrome P450 Superfamily in Phenylpropanoid Pathway

11.3.1 Cytochrome P450

Cytochromes P450 (CYPs) are ubiquitous monooxygenase enzymes involved in the oxidation of various substrates using oxygen and NADPH. Plant P450s play vital roles in metabolism and detoxification (Mizutani and Ohta 2010; Hamberger and Back 2013). They catalyze reactions in both primary metabolism and secondary metabolism and are involved in the biosynthesis of various metabolites, including fatty acids, sterols, hormones, phenylpropanoids, terpenoids, and signaling molecules. Chemical diversity across plant species is well correlated with the heterogeneity of the P450s (Mizutani and Sato 2011; Mizutani 2012; Sezutsu et al. 2013). P450s contain a heme cofactor whose reduced, carbon monoxide-bound form absorbs light at 450 nm; they are named for this trait (Pigment absorbing at 450 nm), as well as for their cellular localization. Plant P450s are typically anchored to the cytoplasmic surface of the endoplasmic reticulum (ER) by a short N-terminal segment.

The P450s are one of the largest families of enzymes in plants and, in most plant species, exist as a superfamily. The number of P450 genes is highly variable among plants (Nelson 2006) and represents 0.57–1.07% of the protein coding genes in various plant species [1.07% in Arabidopsis (246/23,000) (Nelson et al. 2004), 0.71% in soybean (332/46,500) (Guttikonda et al. 2010), and 0.78% in common bean (247/31,638) (Kumar et al. 2015)]. The large number of P450s in higher plants is due to gene duplication and diversification (Werck-Reichhart and Feyereisen 2000).
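The percentages quoted above follow directly from the gene counts given in brackets. As a quick check, the short sketch below recomputes them; the dictionary and function names are illustrative only.

```python
# (P450 genes, protein-coding genes) as quoted in the text above.
P450_COUNTS = {
    "Arabidopsis": (246, 23_000),
    "soybean": (332, 46_500),
    "common bean": (247, 31_638),
}


def p450_share(p450_genes, coding_genes):
    """Return the P450 share of protein-coding genes, in percent."""
    return 100.0 * p450_genes / coding_genes


shares = {
    species: round(p450_share(p450, total), 2)
    for species, (p450, total) in P450_COUNTS.items()
}

for species, share in shares.items():
    print(f"{species}: {share:.2f}%")
```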

The P450 gene superfamily is characterized by enormous structural and functional diversity (Nelson et al. 2008; Nelson and Werck-Reichhart 2011; Nagano 2014). Homology and phylogeny were used to group P450s into families (>40% amino acid sequence identity) and subfamilies (>55% amino acid sequence identity) (Nelson et al. 1996). Plant P450 proteins are numbered as CYP51, CYP71 to CYP99, and CYP701 to CYP772. They belong to ten clans (groups of genes that originated from a single ancestor), which are named by their lowest numbered member [six single-family clans (CYP51, CYP74, CYP97, CYP710, CYP711, and CYP727) and four multiple-family clans (CYP71, CYP72, CYP85, and CYP86)] (Werck-Reichhart and Feyereisen 2000; Nelson et al. 2004; Schuler and Werck-Reichhart 2003; Schuler et al. 2006). Following the recommendations of a nomenclature committee (Nelson et al. 1996), the name of a P450 consists of the italicized root symbol CYP, followed by the number of the family, the letter of the subfamily, and the number of the gene (e.g., CYP71D9: family 71, subfamily D, gene 9), which is determined by the order of identification regardless of the origin.
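The naming scheme and the identity thresholds described above can be illustrated with a short sketch. It is not part of any official nomenclature tool: the function names are hypothetical, the regular expression covers only standard names of the form used in this chapter (with an optional trailing P for pseudogenes), and the thresholds are simply the >40% and >55% values quoted above.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    family: int
    subfamily: str
    gene: int
    pseudogene: bool


# CYP + family number + subfamily letter(s) + gene number, optional "P" suffix.
_CYP_PATTERN = re.compile(r"CYP(\d+)([A-Z]+)(\d+)\s*(P)?")


def parse_cyp_name(name: str) -> Optional[CypName]:
    """Split a standard CYP name into its parts; return None if it does not fit."""
    match = _CYP_PATTERN.fullmatch(name.strip())
    if match is None:
        return None
    family, subfamily, gene, pseudo = match.groups()
    return CypName(int(family), subfamily, int(gene), pseudo is not None)


def shared_rank(percent_identity: float) -> str:
    """Apply the >40% (family) and >55% (subfamily) identity thresholds."""
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family"
    return "different family"


print(parse_cyp_name("CYP71D9"))
print(parse_cyp_name("CYP73A88 P"))
print(shared_rank(47.5))
```

Names with non-standard suffixes, such as CYP81E220de1b, are deliberately rejected by this pattern and return None, so they can be handled separately.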

Initially, P450s were divided into a large A-type clade, which included members that are involved in secondary metabolism (clan CYP71) and several smaller, non-A-type clades, involved in primary metabolism (such as fatty acids and sterols) (Nelson 2006). The occurrence of large numbers of A-type P450s, compared to the non-A-type, suggests a rapid expansion of A-type P450 gene families in plants (Bak et al. 2011).

11.3.2 Clan CYP71—P450s Involved in the Phenylpropanoid Pathway

Based on the current genome annotations [Pfam:00067 (cytochrome P450) functional annotation at Phytozome 10.2; http://phytozome.jgi.doe.gov/pz/portal.html; accessed 26 June 2015], there are 249 P450 genes in A. thaliana TAIR10, 443 P450 genes in G. max Wm82.a2.v1, and 264 P450 genes in P. vulgaris v1.0. However, the number of published P450s in these species is slightly different: 272 genes (including 28 pseudogenes) in Arabidopsis (Bak et al. 2011) and 247 genes (including 15 pseudogenes) in common bean (Kumar et al. 2015). P450s in common bean were classified into ten clans that contain 47 families. The largest clan, CYP71 (A-type), consists of 19 families with 144 genes. The majority of the genes (>70%) contain a single intron, but more than 20% of the genes have two introns and only a small number of genes (4%) are intronless. In addition, over 80% of the introns are of the zero phase (intron sequence inserted between two successive codons).

It was reported that over 16 P450s are involved in the synthesis and metabolism of phenylpropanoids (Werck-Reichhart 1995). They are placed at several key positions in the phenylpropanoid pathway, and their roles in phenylpropanoid metabolism were extensively reviewed. For example, Ehlting et al. (2006) and Alber and Ehlting (2012) focused on P450s involved in the core phenylpropanoid pathway and lignin branch, Ayabe and Akashi (2006) in flavonoid metabolism, while Tanaka (2006) and Tanaka and Brugliera (2013) reviewed the role of P450s in flower color.

Seven gene families that encode P450 enzymes involved in phenylpropanoid pathway, as identified in the current genome annotations in common bean, soybean, and Arabidopsis, are listed in Table 11.3. It should be noted, however, that the number of genes in analyzed genomes may change as more work on annotations is done. For example, the CYP71D family in soybean had 81 genes (including 39 pseudogenes) in G. max v1.0 (Nelson 2009) and 52 genes (including 16 pseudogenes) in G. max Wm82.a2.v1. Eleven gene sequences did not correspond between the two genome annotations.

Table 11.3 Clan CYP71 cytochrome P450 gene families encoding enzymes of the phenylpropanoid pathway in A. thaliana, G. max, and P. vulgaris

We used the standard nomenclature of chromosome-based locus (gene model) identifiers in plant genome annotations and assemblies (Phytozome), which consists of four segments:

  • species [AT or At (A. thaliana), Glyma. (G. max), Phvul. (P. vulgaris)],

  • chromosome number [1 to 5 (A. thaliana), 01 to 20 (G. max), 001 to 011 (P. vulgaris)],

  • gene (G or g), and

  • five-digit code [A. thaliana (At2g37040, PAL1; phenylalanine ammonia-lyase 1)] or six-digit code [G. max (Glyma.03g181700, PAL1) and P. vulgaris (Phvul.001g177800, PAL1)], numbered from top to bottom of chromosome.
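The four-segment structure listed above lends itself to simple pattern matching. The sketch below is one possible way to split an identifier into species, chromosome, and gene code; the function name and return format are hypothetical, and the patterns cover only the nuclear chromosome identifiers of the three species discussed here.

```python
import re

# One pattern per species: prefix, chromosome number, "g"/"G", numeric code.
_LOCUS_PATTERNS = {
    "A. thaliana": re.compile(r"A[Tt]([1-5])[Gg](\d{5})"),
    "G. max": re.compile(r"Glyma\.(\d{2})[Gg](\d{6})"),
    "P. vulgaris": re.compile(r"Phvul\.(\d{3})[Gg](\d{6})"),
}


def parse_locus(identifier):
    """Return (species, chromosome, code) or None if no pattern matches."""
    for species, pattern in _LOCUS_PATTERNS.items():
        match = pattern.fullmatch(identifier)
        if match:
            chromosome, code = match.groups()
            return species, int(chromosome), code
    return None


for locus in ("At2g37040", "Glyma.03g181700", "Phvul.001g177800"):
    print(locus, "->", parse_locus(locus))
```

Because the numeric code increases from the top to the bottom of a chromosome, sorting parsed identifiers by chromosome and code gives an approximate gene order without consulting coordinates.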

These gene families encode enzymes that catalyze various reactions in different branches of the phenylpropanoid pathway (Fig. 11.1), including

  1. core phenylpropanoid pathway: cinnamate 4-hydroxylase (C4H, CYP73A),

  2. lignin/lignan branch: coumarate 3-hydroxylase (C3H, CYP98A) and ferulic acid 5-hydroxylase (F5H, CYP84A),

  3. anthocyanin/condensed tannin branch: flavonoid 3′-hydroxylase (F3′H, CYP75B), flavonoid 3′,5′-hydroxylase (F3′,5′H, CYP75A), and flavone synthase (FNS, CYP93B), and

  4. isoflavonoid branch: isoflavone synthase (IFS, CYP93C), isoflavone 2′-hydroxylase (I2′H, CYP81E), flavonoid 6-hydroxylase (F6H, CYP71D), and 3,9-dihydroxypterocarpan 6a-monooxygenase (D6aH, CYP93A).

11.3.3 Gene Structure, Conserved Domains, and Motifs of P450s Involved in the Phenylpropanoid Pathway

Seven P450 families (clan CYP71) that encode enzymes in the phenylpropanoid pathway in common bean, soybean, and Arabidopsis contain 135 members, with one to 36 genes per family (Table 11.3). Most of these genes contain introns. Only one gene is intronless (Phvul.009g244000, CYP81E51). In the remaining genes, the number of introns ranges from one to four. The majority of the genes contain one (63%) or two introns (32%). The proteins that they encode range in size from 408 amino acids (Phvul.001g139500, CYP93A57) to 543 amino acids (Phvul.002g014800, CYP81E44). The protein sequences were aligned using Clustal Omega at EMBL-EBI (http://www.ebi.ac.uk/Tools/msa/clustalo/), and conserved regions were displayed with a sequence logo generated from the alignment using the Web-based WebLogo 3.4 (Crooks et al. 2004; available at http://weblogo.threeplusone.com/). All of the P450 sequences included the following domains: a heme-binding region (FxxGxRxCxG), a PERF motif (PERF/W), a K-helix region (KETRL) involved in defining the heme pockets and stabilizing the protein structure, and an I-helix region (AGxDT) involved in oxygen binding (Fig. 11.2).
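The four signature patterns listed above can be written as regular expressions, with x standing for any residue. The sketch below scans a protein sequence for them; it is an illustration rather than the alignment and logo workflow used for Fig. 11.2, the function name is hypothetical, and the test sequence is a short synthetic string built only to contain each motif once.

```python
import re

# Motif patterns from the text; "x" (any residue) becomes "." in the regex.
P450_MOTIFS = {
    "I-helix": re.compile(r"AG.DT"),
    "K-helix": re.compile(r"KETRL"),
    "PERF": re.compile(r"PER[FW]"),
    "heme-binding": re.compile(r"F..G.R.C.G"),
}


def find_motifs(sequence):
    """Return {motif name: [(0-based start, matched residues), ...]}."""
    sequence = sequence.upper()
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(sequence)]
        for name, pattern in P450_MOTIFS.items()
    }


# Synthetic toy sequence (not a real protein) containing each motif once.
toy_sequence = "MAAAGSDTLLKETRLHPPERFAAFGAGRRICPG"
hits = find_motifs(toy_sequence)

for name, found in hits.items():
    print(name, found)
```

Real family members may deviate from these consensus strings at individual positions, so a strict regular expression can miss true motifs; a sequence logo built from the alignment remains the more informative view.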

Fig. 11.2

Conserved domains and motif patterns of P450s, CYP71 clan involved in biosynthesis of various phenylpropanoids. P450 domains including a heme-binding region [cysteine (C*) residue is indicated by an asterisk (*)], PERF motif, K-helix and I-helix regions are indicated in red rectangles; the other regions (such as N-terminal region, proline-rich region, membrane anchor, and C-terminal region) are shown in black

11.3.4 Phylogenetic Analysis of P450s Involved in the Phenylpropanoid Pathway

The alignment and tree construction of 135 protein sequences (Table 11.3) from seven P450 gene families (clan CYP71) involved in the phenylpropanoid pathway were performed in MEGA6 (Tamura et al. 2013). These analyses were based on the full-length genes from the three genomes, with one nearly intact soybean C4H pseudogene included (indicated by P at the end of the CYP name—CYP73A88 P). A member from the soybean CYP81E family (CYP81E220de1b, Glyma.16g149200) is truncated (101 amino acids) and was not included in the tree construction.

The phylogenetic tree (Fig. 11.3) separates P450 protein sequences (clan 71) from the two species into seven families:

Fig. 11.3

Protein sequences of the seven gene families from the clan CYP71 involved in the phenylpropanoid pathway in soybean and common bean. A neighbor-joining tree (Poisson model, complete deletion) was built using MEGA6. Soybean sequences are labeled in red, and common bean in blue; P at the end of CYP name indicates pseudogene (Glyma.10G275600-CYP73A88 P); shorter protein sequences are indicated by an asterisk (*); a truncated (101 amino acids) Glyma.16g149200-CYP81E220de1b was excluded from the tree construction

  • CYP71: CYP71D is a legume-specific cluster and contains 36 genes in soybean (and 16 pseudogenes, not included) and 21 genes in common bean (and four pseudogenes, not included). A single flavonoid 6-hydroxylase (F6H) in common bean was clustered with three F6H proteins in soybean.

  • CYP73: The CYP73A family contains four genes for cinnamic acid 4-hydroxylase (C4H) in soybean (including one pseudogene), three genes in common bean, and a single gene in Arabidopsis. The C4H cluster splits into class I and class II enzymes.

  • CYP75: The family is split into two subfamilies. CYP75A consists of two genes for flavonoid 3′,5′-hydroxylase (F3′5′H) (and one pseudogene, not included) in soybean and two genes in common bean. There are no genes for F3′5′H in Arabidopsis. Subfamily CYP75B contains five genes for flavonoid 3′-hydroxylase (F3′H) (and one pseudogene, not included) in soybean, two genes in common bean, and a single gene for F3′H in Arabidopsis.

  • CYP81: CYP81E is a legume-specific cluster and consists of 12 isoflavone 2′-hydroxylase-like (I2′H) genes (and four pseudogenes, not included) in soybean and 12 genes (and two pseudogenes, not included) in common bean.

  • CYP84: The CYP84A cluster contains three genes encoding ferulic acid 5-hydroxylase (F5H) (and one pseudogene, not included) in soybean, three genes in common bean, and two genes in Arabidopsis.

  • CYP93: The family is clustered into three subfamilies. CYP93A is a legume-specific subfamily. It consists of eight genes for 3,9-dihydroxypterocarpan 6a-monooxygenase (D6aH) (and two pseudogenes, not included) in soybean and seven genes (and one pseudogene, not included) in common bean. The CYP93B subfamily contains two genes encoding flavone synthase (FNS) in soybean and a single gene in common bean. There are no FNS genes in Arabidopsis. CYP93C is a legume-specific branch. It consists of two genes for isoflavone synthase (IFS) in soybean and three genes in common bean.

  • CYP98: The CYP98A cluster consists of two genes for coumarate 3-hydroxylase (C3H) in soybean and a single gene in each of the common bean and Arabidopsis genomes. There are two additional pollen-specific CYP98As in Arabidopsis (CYP98A8 and CYP98A9; Matsuno et al. 2009; not included in tree construction).

11.3.5 Genome Organization of the Clan CYP71 Gene Families Involved in Phenylpropanoid Pathway in Common Bean

A common bean in silico map that contained genes coding for enzymes of phenylpropanoid pathway, including nine P450s, was developed previously (Reinprecht et al. 2013). The map was created by BLASTing the genomic sequences of the phenylpropanoid pathway genes against the whole common bean genome (P. vulgaris v1.0, Phytozome) using the starting nucleotide positions of the resulting alignments with the chromosome as the map positions for each of the gene sequences.

A similar approach was used to develop a common bean P450-based in silico map, which contains 144 P450 genes of clan CYP71. The mapping was initiated with 134 genes that were identified at Phytozome by searching for KOG0156 functional annotations (cytochrome P450 CYP2 subfamily). Selected gene sequences were BLASTed against the complete common bean genome sequence (Phytozome) to identify their locations. Gene identity was confirmed with the published common bean P450s (Kumar et al. 2015), and ten new sequences (not annotated as KOG0156 in Phytozome) were added to the map. Gene families involved in the phenylpropanoid pathway (shown in larger font, color-coded) were found throughout the common bean genome, except for chromosome Pv05 (Fig. 11.4).
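The mapping logic described above (take the genomic alignment of each gene and use its starting nucleotide position on the chromosome as the map position) can be sketched as follows. This is an illustration under several assumptions, not the Phytozome workflow itself: the input is assumed to be standard 12-column BLAST tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), the best hit is assumed to be the one with the highest bit score, the map position is taken as the lower of the two subject coordinates, and the query names, chromosome names, and coordinates in the toy input are invented.

```python
def map_positions(blast_lines):
    """Return {query: (chromosome, map_position, strand)} from tabular BLAST hits.

    For each query, only the hit with the highest bit score is kept. A hit
    whose subject start exceeds its subject end lies on the reverse strand.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, chromosome = fields[0], fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][0]:
            strand = "+" if s_start <= s_end else "-"
            best[query] = (bitscore, chromosome, min(s_start, s_end), strand)
    return {q: (chrom, pos, strand) for q, (_, chrom, pos, strand) in best.items()}


# Invented example hits (hypothetical names and coordinates).
toy_hits = [
    "geneA\tChr01\t100.0\t1500\t0\t0\t1\t1500\t20000\t21499\t0.0\t2700",
    "geneA\tChr07\t85.0\t400\t60\t0\t1\t400\t5000\t5399\t1e-50\t300",
    "geneB\tChr03\t99.0\t1200\t12\t0\t1\t1200\t90000\t88801\t0.0\t2100",
]

positions = map_positions(toy_hits)
print(positions)
```

Keeping the strand alongside the position makes it straightforward to draw the forward or reverse orientation of each gene on a chromosome map.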

Fig. 11.4

Distribution of cytochrome P450—clan CYP71 genes [locus (gene model) identifiers—Phytozome] in the common bean genome (identified on the right on bars). Genes belonging to families involved in the phenylpropanoid pathway are color-coded; P at the end of CYP name indicates a pseudogene; the orientation along the chromosome is indicated by a forward or reverse arrow. The starting nucleotide position of the resulting alignment with the chromosome was used as the map position for each P450 gene sequence (indicated on the left on bars)

Within the same family, P450s are usually grouped into clusters, and the structure of the same P450 family is generally conserved (Nelson et al. 2004; Paquette et al. 2009). In the common bean genome, clustering of genes from the same family was observed on chromosomes Pv03 for family CYP93C (all three IFS genes) and Pv09 for family CYP81E (three I2H genes). Some of the CYP71 genes are tandemly arranged, with at least four genes from the same subfamily in a row. Many of these clustered genes are found in the same orientation on four chromosomes [Pv01 (four CYP712B, all forward), Pv02 (four CYP71D, all forward), Pv04 (ten CYP82A, all forward; five CYP71AU, all reverse; five CYP736A, all reverse) and Pv06 (eight CYP71D, all reverse; four CYP79D, all forward)] but in different orientations on three chromosomes [Pv03 (four CYP71D), Pv04 (four CYP81E), and Pv06 (four CYP71D)]. For example, members of the large CYP71D subfamily clustered in the same orientation on chromosomes Pv02 (four) and Pv06 (eight) but in different orientations on chromosomes Pv03 (four) and Pv06 (four). Therefore, the subfamily distribution may not follow a regular pattern. Because of this clustered organization, the 144 CYP71 P450 genes (Kumar et al. 2015) were not evenly distributed in the common bean genome. They ranged from two genes on chromosome Pv05 to 25 genes on chromosome Pv04 (Fig. 11.4).
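
The tandem-cluster criterion used above (at least four adjacent genes from the same subfamily, in the same or different orientations) can be expressed as a short routine. The sketch below is an illustration only: the input format, the function name, and the gene order in the example are assumptions and do not reproduce the actual order of genes on any chromosome.

```python
from itertools import groupby

def tandem_runs(genes, min_size=4):
    """Find runs of adjacent genes that belong to the same subfamily.

    genes: list of (subfamily, strand) tuples ordered along one chromosome,
           with strand given as "+" (forward) or "-" (reverse).
    Returns a list of (subfamily, run_size, orientation) tuples, where
    orientation is "forward", "reverse", or "mixed".
    """
    runs = []
    for subfamily, group in groupby(genes, key=lambda gene: gene[0]):
        strands = [strand for _, strand in group]
        if len(strands) < min_size:
            continue
        if set(strands) == {"+"}:
            orientation = "forward"
        elif set(strands) == {"-"}:
            orientation = "reverse"
        else:
            orientation = "mixed"
        runs.append((subfamily, len(strands), orientation))
    return runs

# Hypothetical gene order on one chromosome (not the real arrangement).
toy_chromosome = (
    [("CYP82A", "+")] * 10
    + [("CYP71AU", "-")] * 5
    + [("CYP81E", "+"), ("CYP81E", "-"), ("CYP81E", "+"), ("CYP81E", "+")]
    + [("CYP93C", "+")] * 3
)

for run in tandem_runs(toy_chromosome):
    print(run)
```

With the default threshold of four genes, the three-gene CYP93C run in the toy input is not reported; lowering `min_size` to 3 would include it.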

11.4 Cinnamate 4-Hydroxylase (C4H, EC:1.14.13.11, CYP73A)

11.4.1 C4H Catalytic Reaction and Position in the Phenylpropanoid Pathway

Cinnamate 4-hydroxylase (trans-cinnamate 4-monooxygenase, C4H, EC:1.14.13.11, CYP73A) is the first P450 enzyme in the phenylpropanoid pathway. It is an ER membrane-bound P450 and belongs to the family of oxidoreductases that act on paired donors with incorporation of molecular oxygen. The enzyme catalyzes an irreversible (and rate-limiting) regiospecific hydroxylation of the aromatic ring of trans-cinnamic acid (only at the 4-position or para position) to produce p-coumaric (hydroxycinnamic) acid (Fig. 11.5), a precursor for many phenylpropanoids including flavonoids, phytoalexins, and monolignols (Hahlbrock and Scheel 1989; Anterola and Lewis 2002; Lu et al. 2006). For activity, C4H requires molecular oxygen and a cytochrome P450 reductase (CPR).

Fig. 11.5

Core (general) phenylpropanoid pathway and the catalytic reaction of cinnamate 4-hydroxylase (C4H, red). The enzyme catalyzes the first oxygenation step of the core phenylpropanoid pathway leading to synthesis of lignin, pigments, and phytoalexins

Mizutani et al. (1997) isolated a cDNA and a genomic clone encoding cinnamate 4-hydroxylase from Arabidopsis (CYP73A5) and found that its expression was coordinated with that of the PAL and 4CL genes. Mutations in this gene affected phenylpropanoid metabolism, growth, and development (Schilmiller et al. 2009). The gene was mapped to the lower arm of chromosome 2 and was highly expressed in all Arabidopsis tissues, especially in roots and lignifying cells (Bell-Lelong et al. 1997). Genes targeted by the same transcription factors tend to show similar expression patterns, which usually suggests relationships among the genes. Down-regulation of genes coding for PAL and C4H was associated with reduced lignin content and altered lignin composition in transgenic tobacco (Sewalt et al. 1997). The position of C4H in the phenylpropanoid pathway protein network is shown in Fig. 11.6a. Highly connected proteins have a stable steady-state distribution of gene expression (Fig. 11.6b).

Fig. 11.6

Functional protein association network in Arabidopsis (action view) visualized on the STRING Web site (http://string-db.org/; accessed: 25 June 2015). a C4H is colored red, and modes of action are shown in different colors. Nodes directly linked to C4H are colored; b Co-expression of C4H with other phenylpropanoid pathway genes in Arabidopsis; locus AT1G15950 is a CCR1 gene

Separation of the three common bean, four soybean, and single Arabidopsis sequences into two groups (Fig. 11.7) confirmed earlier groupings of C4H into class I and class II proteins (Ehlting et al. 2006). This diversification occurred early in the evolution of vascular plants through gene duplication. Common bean and soybean have both classes of C4Hs, while Arabidopsis (Brassicaceae) contains only one gene, which encodes a class I C4H. The alignment of C4H protein sequences (ClustalW2 at EMBL-EBI, available at http://www.ebi.ac.uk/Tools/msa/clustalw2/) revealed high conservation (60–98% identity) among the proteins (85–98% within five C4H class I proteins and 90% between two class II C4H proteins). However, when both monocots and dicots were compared, class I C4Hs were highly conserved (over 80% identity at the protein level), while class II C4Hs were more divergent (less than 70% identity at the protein level). This suggests that class I C4Hs “maintained an essential function that does not allow these genes to be lost or even changed much, and it is appealing to assume that this essential function is developmental lignification” (Alber and Ehlting 2012). Class II C4Hs are only present in some plant species, and the class seems to have more specialized functions.

Fig. 11.7

Phylogenetic tree of class I and class II C4H proteins. Common bean sequences are labeled in blue, soybean in red, and Arabidopsis in black

The sequences of eight C4H proteins from common bean, soybean, and Arabidopsis were aligned using Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) and BoxShade (http://www.ch.embnet.org/software/BOX_form.html). The sequences were most divergent in their N-terminal membrane anchors. Conserved motifs found in plant P450s (Fig. 11.8, shown in bold) were present in all eight proteins, including the proline-rich (PPGP) region, C-helix (WrkmR), oxygen binding and activation I-helix (AAIETT), K-helix (EtlR), PERF motif (PeeFrPeRF), and heme-binding region (FgvGrRsCpG) at the C-terminus. The only exception is the soybean C4H (CYP73A88P) encoded by a pseudogene (Glyma.10g275600). It has a truncated N-terminal region, and the generally highly conserved PERF motif has an arginine (R, Arg) to lysine (K, Lys) substitution (Fig. 11.8, highlighted).
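
A simple way to check a protein sequence for the motifs listed above is to turn each consensus into a regular expression. The sketch below assumes that uppercase letters in a consensus mark strictly conserved residues and lowercase letters mark variable positions; that reading, the function names, and the toy sequence are assumptions made for illustration and are not taken from the original analysis.

```python
import re

# Consensus strings as written in the text.
MOTIFS = {
    "proline-rich region": "PPGP",
    "C-helix": "WrkmR",
    "I-helix": "AAIETT",
    "K-helix": "EtlR",
    "PERF motif": "PeeFrPeRF",
    "heme-binding region": "FgvGrRsCpG",
}

def consensus_to_regex(consensus):
    """Uppercase residues must match exactly; lowercase positions match any residue."""
    return "".join(c if c.isupper() else "." for c in consensus)

def find_motifs(protein):
    """Return {motif name: 1-based start of the first match, or None if absent}."""
    hits = {}
    for name, consensus in MOTIFS.items():
        match = re.search(consensus_to_regex(consensus), protein)
        hits[name] = match.start() + 1 if match else None
    return hits

# Toy sequence built only to contain each motif once (not a real C4H protein).
toy_protein = "MLLPPGPAAAWRKMRGGAAIETTLLETLRKKPEEFRPERFSSFGVGRRSCPGM"

for name, position in find_motifs(toy_protein).items():
    print(name, position)
```

Under this strict reading, a sequence in which a conserved arginine of the PERF motif is replaced by lysine no longer matches the pattern, which is one way such a substitution could be flagged automatically.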

Fig. 11.8

Comparison of C4H protein sequences from common bean, soybean, and Arabidopsis. Conserved motifs and sequences are shown in bold. Secondary structures predicted for Arabidopsis C4H gene (At2g30490) are color-coded [shown at the top of sequences alignment, where H (blue) indicates alpha helices, E (red) represents extended (beta) strands, and C (pink) indicates random coils]

Secondary structures of C4H proteins were predicted with the programs GOR IV (Garnier-Osguthorpe-Robson; Garnier et al. 1996; https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html) and Phyre2 (Protein Homology/analogY Recognition Engine V 2.0; Kelley et al. 2015; http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index). Transmembrane helices were predicted with the program TMHMM-2.0 (TransMembrane prediction using Hidden Markov Models; Krogh et al. 2001; http://www.cbs.dtu.dk/services/TMHMM-2.0/). All proteins have secondary structures similar to those of previously published P450s (Graham and Peterson 1999), including alpha helices (blue), beta sheets (red), and random coils (pink) (Fig. 11.9a). They consist of 36–45% alpha helices, 14–18% extended (or beta) strands, and 40–46% random coils. There is a slight difference between the classes of common bean and soybean C4H proteins: class I C4H proteins contain higher percentages of alpha helices, while class II C4H proteins were predicted to have higher percentages of extended (or beta) strands and random coils. Membrane anchors were predicted for all proteins except for the soybean C4H (CYP73A88P) encoded by the pseudogene Glyma.10g275600 (Fig. 11.9b). All C4H proteins are globular proteins as predicted by Phyre2 (Fig. 11.9c). Common bean and soybean C4Hs have tertiary structures similar to the previously identified CYP73A5 in Arabidopsis (At2g30490) and also contain an alpha-domain and a beta-domain (Rupasinghe et al. 2003).
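
The percentages of helices, strands, and coils quoted above follow directly from a per-residue state string such as the one a secondary structure predictor returns. The sketch below assumes the three-letter alphabet used in Fig. 11.8 (H, E, C); the function name and the toy prediction string are hypothetical.

```python
from collections import Counter

def ss_composition(states):
    """Percentage of helix (H), extended strand (E), and coil (C) residues.

    states: one character per residue, using the H/E/C alphabet.
    """
    if not states:
        raise ValueError("empty prediction string")
    counts = Counter(states)
    return {state: round(100 * counts[state] / len(states), 1) for state in "HEC"}

# Toy prediction for a 10-residue peptide (not real predictor output).
print(ss_composition("HHHHCCEECC"))
```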

Fig. 11.9

Predicted structure of C4H class I and class II proteins in Arabidopsis, soybean, and common bean. a Secondary structures of C4H proteins (predicted by GOR IV); b Transmembrane helices of C4H proteins (predicted by TMHMM); c Tertiary structure of C4H proteins (predicted by Phyre2)

Gene ontology (GO) annotations for C4H proteins (Table 11.4) were predicted using PFP (protein function prediction), a sequence similarity-based function prediction server at the Kihara Bioinformatics Laboratory (http://kiharalab.org/; Hawkins et al. 2009). PFP takes into account weakly similar sequences as well as GO term associations observed in known annotations.

Table 11.4 Protein function prediction (PFP) GO terms predicted for common bean, soybean, and Arabidopsis C4H proteins

11.4.2 CYP73A Gene Family—Structure and Genome Location of C4H Genes

C4Hs are encoded by the relatively small CYP73A gene family. It consists of three genes in common bean [Phvul.006g079700 (CYP73A118), Phvul.007g026000 (CYP73A15), and Phvul.008g247400 (CYP73A)], four genes (including one pseudogene, Glyma.10g275600, CYP73A88P) in soybean, and a single gene in Arabidopsis (At2g30490; CYP73A5; REF3). The P450 encoded by Phvul.008g247400 was incorrectly named CYP73A2 in common bean (Kumar et al. 2015); however, CYP73A2 was identified in mung bean (Mizutani et al. 1993), Vigna radiata (previously Phaseolus aureus; recently moved from the genus Phaseolus to Vigna).

The gene is well conserved in plants, including soybean and common bean. It contains the Pfam domain PF00067 and was found to be a “duplication-resistant” gene (Paterson et al. 2006). The first C4Hs were identified in Jerusalem artichoke (Helianthus tuberosus CYP73A1, GenBank accession Z17369; Teutsch et al. 1993) and mung bean (V. radiata CYP73A2, GenBank accession L07634; Mizutani et al. 1993). Soybean C4H (CYP73A11), a class I C4H enzyme, was identified as an elicitor-induced cytochrome P450 using differential display of mRNA (Schopfer and Ebel 1998). In contrast, common bean C4H (CYP73A15) was identified as a class II C4H enzyme whose expression was associated with differentiation (Nedelkina et al. 1999).

The genes coding for C4H in common bean, soybean, and Arabidopsis differ in their exon/intron structures. The exons are conserved, while the introns are more variable. Genes encoding class II proteins in common bean and soybean consist of two exons separated by an intron of moderate size (354 and 463 bp, respectively). In the two genes encoding class I C4Hs in soybean, both of these exons are split, resulting in four exons. These genes are characterized by a long intron 3 (1499 and 1272 bp, respectively). The class I C4H gene in Arabidopsis and the two class I genes in common bean all have three exons (Fig. 11.10).
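
Intron sizes such as those quoted above can be derived from exon coordinates in a gene model. The following sketch assumes 1-based inclusive exon coordinates on the coding strand; the function name and the coordinates in the example are hypothetical and were chosen only so that the single intron is 354 bp long, as reported for the common bean class II gene.

```python
def intron_lengths(exons):
    """Lengths of the introns between consecutive exons.

    exons: list of (start, end) tuples, 1-based and inclusive.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]

# Hypothetical two-exon gene model.
print(intron_lengths([(1, 785), (1140, 1800)]))
```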

Fig. 11.10

Exon/intron structures of C4H genes in common bean, soybean, and Arabidopsis. Exons are represented by rectangles (common bean—blue, soybean—red, and Arabidopsis—black), and introns are shown as full lines. Conserved exon sequences are connected by dashed lines

11.4.3 Tissue-Specific Expression of Genes Encoding C4Hs

Using publicly available microarray data, Ehlting et al. (2008) created a tool for co-expression analysis of P450s in Arabidopsis. RNA sequencing (RNA-seq) atlases were developed for both soybean (Severin et al. 2010) and common bean (O’Rourke et al. 2014). Based on RNA-seq data (Phytozome 10), genes encoding C4H are differentially expressed in six common bean and soybean tissues (Fig. 11.11). In general, the expression of the genes encoding class I C4H enzymes [Phvul.008g247400 (CYP73A), Glyma.02g236500 (CYP73A11), and Glyma.14g205200 (CYP73A90)] was higher than that of the genes encoding class II enzymes [Phvul.007g026000 (CYP73A15) and Glyma.20g114200 (CYP73A87)] in all tissues (flowers, pods, leaves, stems, roots, and nodules). Both common bean and soybean have two copies of genes encoding class I C4H enzymes. In both species, one of the genes (Phvul.008g247400 and Glyma.02g236500) is highly expressed in all tissues. The second copy (Glyma.14g205200 and Phvul.006g079700) is expressed at a lower level. In soybean, Glyma.14g205200 had approximately half of the expression of Glyma.02g236500 in stems, roots, and nodules but very low expression in leaves, pods, and flowers. In contrast, Phvul.006g079700 had very low expression in all common bean tissues compared to Phvul.008g247400 (Fig. 11.11).

Fig. 11.11

Expression of common bean and soybean genes encoding cinnamic acid 4-hydroxylase (C4H) in six different tissues. FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) data for expression levels of the genes were calculated from the RNA-seq data deposited at Phytozome 10.2 (available at http://phytozome.jgi.doe.gov/pz/portal.html)
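
The FPKM unit spelled out in the caption corresponds to a simple normalization of the fragment count by transcript length and sequencing depth. The sketch below shows that arithmetic; the function name and the numbers in the example are illustrative assumptions, not values from the Phytozome data.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million fragments mapped.

    fragments: fragments mapped to the transcript
    transcript_length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    kilobases = transcript_length_bp / 1e3
    millions = total_mapped_fragments / 1e6
    return fragments / (kilobases * millions)

# Hypothetical example: 500 fragments on a 2 kb transcript in a library
# with 25 million mapped fragments.
print(fpkm(500, 2000, 25_000_000))
```

Because both the transcript length and the library size are divided out, FPKM values can be compared between genes of different lengths and between tissues sequenced to different depths.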

Common bean C4H (CYP73A15) was characterized as a class II C4H enzyme whose expression was more related to differentiation than to stress responses (Nedelkina et al. 1999). Antisense and sense expression of cDNA coding for a truncated CYP73A15 gene from French bean led to a reduced and delayed production of lignin in tobacco (Blee et al. 2001). Three C4H genes were identified in the Populus trichocarpa genome. Two of them (PtrC4H1 and PtrC4H2) were abundant in differentiating xylem, suggesting their importance in monolignol biosynthesis. Transcripts of PtrC4H3 had little or no expression in all examined tissues (Lu et al. 2006).

11.4.4 Cis-Regulatory Regions in 5′UTRs of C4H Genes

To understand the functions of individual members of the C4H multigene families, promoters of the common bean and soybean genes were analyzed and compared to the promoter of the Arabidopsis gene (At2g30490), which has known functions. Promoter sequences [1 kb of 5′ regulatory sequence upstream of the coding region (1 kb 5′UTR flanking region)] of C4H genes were retrieved from Phytozome (10.2) and aligned in Clustal Omega at EMBL-EBI (http://www.ebi.ac.uk/Tools/msa/clustalo/) to search for possible sequence similarities among these sequences in the two C4H classes. The analysis of the 5′ regulatory regions of the Arabidopsis, soybean, and common bean C4H genes revealed a moderate degree of divergence in these regions (39–60% identity). The multiple sequence alignment was sent to ClustalW2_Phylogeny to produce a phylogenetic tree, which was visualized in TreeView. Based on the 5′UTR sequences, five of the eight C4H genes were split into two clusters: a three-gene class I cluster (Phvul.008g247400, Glyma.02g236500, and Glyma.14g205200) and a two-gene class II cluster (Phvul.007g026000 and Glyma.20g114200). However, the Arabidopsis gene (class I, At2g30490), one common bean gene (class I, Phvul.006g079700), and the soybean pseudogene (class II, Glyma.10g275600) were not clearly included in either cluster (Fig. 11.12).
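
When a fixed-length upstream region is extracted by hand rather than downloaded, the gene orientation has to be taken into account: for a reverse-strand gene, the 5′ flanking region lies at higher chromosome coordinates and must be reverse-complemented. The sketch below illustrates this under stated assumptions (1-based inclusive coordinates, a plain string as the chromosome sequence); the function names and the toy sequence are hypothetical and do not describe how Phytozome prepares its flanking sequences.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_window(cds_start, cds_end, strand, length=1000, chrom_length=None):
    """1-based inclusive coordinates of the region flanking the 5' end of a coding region."""
    if strand == "+":
        start, end = cds_start - length, cds_start - 1
    elif strand == "-":
        start, end = cds_end + 1, cds_end + length
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(start, 1)  # clip at the chromosome start
    if chrom_length is not None:
        end = min(end, chrom_length)  # clip at the chromosome end
    return start, end

def upstream_sequence(chrom_seq, cds_start, cds_end, strand, length=1000):
    """Upstream sequence, reported 5' to 3' relative to the gene."""
    start, end = upstream_window(cds_start, cds_end, strand, length, len(chrom_seq))
    region = chrom_seq[start - 1:end]
    return region if strand == "+" else reverse_complement(region)

# Toy chromosome with a coding region at positions 9 to 12.
toy_chrom = "AAAACCCCGGGGTTTT"
print(upstream_sequence(toy_chrom, 9, 12, "+", length=4))
print(upstream_sequence(toy_chrom, 9, 12, "-", length=4))
```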

Fig. 11.12

Phylogenetic tree of 5′ upstream region (5′UTR) sequences of the class I and class II C4H genes in Arabidopsis, common bean, and soybean. Arabidopsis sequences are labeled in black, soybean in red, and common bean in blue; P at the end of the CYP name indicates a pseudogene. Class II C4Hs are shown in boxes. * identifies the most highly expressed genes, and the number of asterisks indicates the relative levels

The 5′UTR sequences of C4H genes were analyzed for potential cis-acting regulatory elements using the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html; Lescot et al. 2002). In total, 69 potential regulatory elements were identified in the 5′UTR sequences of eight C4H genes (Fig. 11.13; Table 11.5). Twenty-six (38%) elements were present in four or more genes (Fig. 11.13, color-coded). In addition to the core TATA box and CAAT box (present in all genes), the list included a large number of light-responsive elements (27), as well as elements associated with tissue-specific expression (5), defense and stress responses (6), or hormonal responsiveness (9). A considerable number (14) of predicted regulatory elements were categorized as having unknown function (Table 11.5), and two of these (AC II and unnamed_4) were present in all eight C4H genes.

Fig. 11.13

Distribution of the putative cis-regulatory elements in the 5′ upstream regions (5′UTRs) in common bean, soybean, and Arabidopsis C4H genes, identified using PlantCARE database. The elements found in four or more genes are color-coded. Sequences and functions of elements are presented in Table 11.5

Table 11.5 Potential cis-acting regulatory elements identified in the 5′ regulatory sequences (5′UTR) of C4H genes in common bean, soybean, and Arabidopsis using PlantCARE database

A fraction of the identified regulatory elements was specific to either class I or class II C4H genes (Fig. 11.13; Table 11.5). Twenty-six elements (37.7%) were present only in class I C4H genes. Four of these elements were identified in all five class I C4H genes. The CGTCA-motif and the TGACG-motif are cis-acting elements involved in MeJA responsiveness, while the functions of unnamed_1 and unnamed_3 are unknown. In addition, MBS (a MYB binding site involved in drought inducibility) and the O2 site (a cis-acting regulatory element involved in regulation of zein metabolism) were identified in the 5′UTRs of all four legume class I C4H genes. Eleven elements (15.9%) were unique to the class II C4H genes. Three of these elements were identified in both the soybean (Glyma.20g114200) and common bean (Phvul.007g026000) C4H genes. The MNF1 and CG motifs are light-responsive elements, while the function of the TATCCAT/C-motif is unknown. Lu et al. (2006) reported that four divergent C4H isoforms play distinct roles in P. trichocarpa. The divergent upstream sequences between the two groups of PtreC4H genes suggested that the mechanisms of gene regulation might be different.
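
Class-specific elements of this kind can be found with set operations on a presence table of elements per gene. The sketch below is a minimal illustration: the gene labels and the presence/absence pattern are toy values arranged to be consistent with the statements above, not the contents of Table 11.5, and the function names are hypothetical. The `percent` helper reproduces the percentages quoted in the text from the reported counts.

```python
def classify_elements(elements_by_gene, class_one, class_two):
    """Split regulatory elements into class I only, class II only, and shared.

    elements_by_gene: {gene: set of element names found in its upstream region}
    class_one, class_two: lists of gene names in each class
    """
    in_one = set().union(*(elements_by_gene[g] for g in class_one))
    in_two = set().union(*(elements_by_gene[g] for g in class_two))
    return {
        "class I only": in_one - in_two,
        "class II only": in_two - in_one,
        "shared": in_one & in_two,
    }

def percent(count, total):
    """Share of elements as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Toy presence table with placeholder gene labels.
toy = {
    "classI_a": {"TATA-box", "CAAT-box", "CGTCA-motif", "TGACG-motif"},
    "classI_b": {"TATA-box", "CAAT-box", "CGTCA-motif", "TGACG-motif", "MBS"},
    "classII_a": {"TATA-box", "CAAT-box", "MNF1"},
}

groups = classify_elements(toy, ["classI_a", "classI_b"], ["classII_a"])
for label, elements in groups.items():
    print(label, sorted(elements))

print(percent(26, 69), percent(11, 69))
```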

The identification of the cis-acting sequences regulating differential expression of C4H genes and transcription factors that interact with these sequences in common bean, soybean, and Arabidopsis could lead to an understanding of the mechanism(s) of differential regulation of these highly similar genes in these plant species.

11.4.5 Syntenic Regions Containing Common Bean C4H Genes

The availability of complete genome sequences for numerous plant species, including soybean (Schmutz et al. 2010) and common bean (Schmutz et al. 2014), allows the organization of individual genomes to be studied and enables genomes to be compared at the nucleotide level. The size of the common bean genome (521 Mb) is approximately half of the size of the soybean genome (978 Mb). As a result of at least two rounds of polyploidization [~59 MYA (million years ago) and ~13 MYA], the soybean genome contains significant gene duplications and redundancy (Schmutz et al. 2010). In general, for any gene in common bean, two corresponding homologous genes could potentially be found in soybean. Moreover, because of the shared synteny between the two genomes, regions homologous to regions in two soybean chromosomes were found for all 11 common bean chromosomes, with minor rearrangements of markers and/or changes in sequence orientation (Galeano et al. 2009; McClean et al. 2010; Reinprecht et al. 2013).

Synteny analysis was performed in the Plant Genome Duplication Database (PGDD, available at http://chibba.agtec.uga.edu/duplication; Lee et al. 2013) against the complete genome sequences available for 47 flowering plant species. Numerous syntenic regions (26–44) with other plant species were found for common bean, soybean, and Arabidopsis class I C4Hs. The blocks were of various sizes, ranging from 14 to 884 gene anchors. For example, the common bean C4H on chromosome Pv06, CYP73A118 (Phvul.006g079700), was syntenic to 44 regions in 31 different plant species, including two regions each in soybean, poplar, pear, watermelon, rice, kale, sacred lotus, and chickpea, three regions in Chinese cabbage, and four regions in kiwifruit (data not shown). In contrast, only five syntenic blocks were identified for common bean and soybean class II C4Hs. They were syntenic to each other and to another three legumes (Medicago truncatula, Cicer arietinum, and Cajanus cajan).

Several syntenic blocks containing C4H loci were identified among the common bean, soybean, and Arabidopsis genomes (Table 11.6; Fig. 11.14). For example, Phvul.006g079700 (encoding a common bean class I C4H) was syntenic to the other four class I C4Hs: common bean Phvul.008g247400, soybean Glyma.02g236500 and Glyma.14g205200, and Arabidopsis At2g30490. Similarly, the common bean class II C4H Phvul.007g026000 was syntenic to two soybean class II C4Hs: Glyma.20g114200 and Glyma.10g275600. They were contained in large syntenic blocks anchored by 641 and 561 genes, respectively (Table 11.6; Fig. 11.14, left). Synteny was also analyzed with SyMap v4.0 (Synteny Mapping and Analysis Program; available at http://www.symapdb.org; Soderlund et al. 2011) to produce circular alignments of multiple common bean and soybean chromosomes (Fig. 11.14, right).

Table 11.6 Syntenic blocks containing C4H loci in genomes of common bean, soybean, and Arabidopsis
Fig. 11.14

Syntenic regions containing C4H loci in genomes of common bean and soybean. a. Class I C4H—Phvul.006g079700 (CYP73A118) and Phvul.008g247400 (CYP73A); b Class II C4H—Phvul.007g026000 (CYP73A15). Left—synteny identified in Plant Genome Duplication Database. Query locus is represented by a red arrow; blue arrows are other anchor genes in the region. Right—circular alignment of common bean and soybean chromosomes containing C4H loci

11.4.6 Sequence Polymorphisms in C4H Genes in Common Bean

Nucleotide polymorphisms for a number of phenylpropanoid pathway genes in various plant species have been described, including Arabidopsis (Savolainen et al. 2000; Aguade 2001; Wright et al. 2003) and maize (Brenner et al. 2010). In the current work, sequences of three C4H genes in the common bean landrace G19833 (Phytozome) were BLASTed against genome sequence of cultivar OAC Rex. The structure of C4H genes identified in OAC Rex was predicted with the HMM-based Fgenesh gene finder (Solovyev et al. 2006; available at http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind; accessed: 7 July 2015).

The C4H proteins in the two genotypes were very similar. The proteins encoded by the Phvul.006g079700 gene in G19833 and OAC Rex were identical. A single amino acid difference was identified at position 42 between OAC Rex (I) and G19833 (V) C4H proteins encoded by the Phvul.008g247400 gene (99.8% identity). OAC Rex and G19833 C4H proteins encoded by the Phvul.007g026000 gene were 99.1% identical. Differences were found in five amino acids at positions 4 (V in OAC Rex, F in G19833), 7 (N in OAC Rex, K in G19833), 18 (L in OAC Rex, S in G19833), 54 (K in OAC Rex, N in G19833), and 420 (I in OAC Rex, V in G19833) (data not shown).
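
Substitution lists and identity values like those above can be obtained by walking through two aligned protein sequences column by column. The sketch below assumes the sequences are already aligned and of equal length, and computes identity over the full alignment length; the function name and the short peptides in the example are hypothetical.

```python
def compare_aligned(seq_a, seq_b):
    """Return (differences, percent_identity) for two aligned sequences.

    differences: list of (1-based position, residue in seq_a, residue in seq_b)
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    differences = [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]
    identity = 100 * (len(seq_a) - len(differences)) / len(seq_a)
    return differences, identity

# Toy peptides differing at one position (not real C4H sequences).
diffs, identity = compare_aligned("MDLLILEKAL", "MDLLVLEKAL")
print(diffs, identity)
```

For a protein of roughly 500 residues (an assumed length used only for illustration), a single substitution corresponds to 499/500, or about 99.8% identity, which is in line with the value reported for the Phvul.008g247400 proteins.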

The C4H genomic sequences were also very similar between the two common bean genotypes (97.2% identity for Phvul.006g079700, 98.5% identity for Phvul.008g247400, and 98.9% identity for Phvul.007g026000). However, by aligning the C4H-encoding sequences in the two bean genomes (G19833 and OAC Rex), polymorphisms (SNPs, insertions, and deletions) were identified for all three C4H genes (Table 11.7; Fig. 11.15).

Table 11.7 C4H gene polymorphism between common beans cultivar OAC Rex and landrace G19833
Fig. 11.15

C4H gene sequence polymorphisms between common bean cultivar OAC Rex (UofG) and landrace G19833 (Phytozome v10.2). a Class I C4H CYP73A118 (Phvul.006G079700; OAC Rex accession KU308554); in the alignment, E indicates exons (shown in capital letters) and I represents introns (shown in small letters); the sequence polymorphism in intron 2 (I2) is highlighted (shown in gray); b Class I C4H CYP73A (Phvul.008G247400; OAC Rex accession KU308556); c Class II C4H CYP73A15 (Phvul.007G026000; OAC Rex accession KU308555)

Although polymorphisms were detected in both the coding (one to 11 SNPs, shown in bold) and non-coding regions, the majority of the sequence differences occurred in the introns and UTRs. For example, intron 2 of the Phvul.006g079700 gene (encoding class I C4H, CYP73A118) differs in size by 143 bp between OAC Rex (272 bp) and G19833 (415 bp), and this difference can be used to develop gene-based marker(s). However, the usefulness of these polymorphisms as C4H gene-specific markers needs to be evaluated in additional germplasm from the two common bean gene pools.

11.5 Conclusions

The availability of whole genome sequences allowed us to identify gene families encoding major enzymes of the phenylpropanoid pathway in common bean, soybean, and Arabidopsis. The work focused on C4H, a cytochrome P450 that occupies an entry position in the pathway. Three genes encoding C4H proteins were identified in the common bean genome, compared to four genes in soybean. The next step would be to functionally characterize these genes. The availability of the common bean genome sequence also makes it possible to identify and characterize the members of each gene family that are involved in specific branches of the phenylpropanoid pathway. Furthermore, the identification of transcription factors that activate phenylpropanoid biosynthetic gene families could provide tools to manipulate the amounts of different phenylpropanoids in common bean.