Introduction

From an anthropocentric point of view, human culture has for millennia been intricately involved with cellulose, the major component of the plant cell wall. The development of the wood, paper, and textile industries has served to weave cellulosic materials into the fabric of our society. Within the past century, however, cellulosic wastes, derived mainly from the same industries, have also become a major source of environmental pollution. This chapter will concentrate mainly on cellulose and the cellulolytic bacteria, in view of their importance to mankind and world ecology. Nevertheless, the true substrate of these bacteria (i.e., the complement of plant cell wall polysaccharides in general) is much more complex than cellulose alone. Likewise, a whole complement of enzymes, both cellulolytic and non-cellulolytic glycoside hydrolases (GHs), is produced concurrently by these bacteria for the efficient, synergistic degradation of the complete substrate composite as it appears in nature. Consequently, when we discuss the cellulose-decomposing bacteria and their enzyme systems, we cannot ignore the related non-cellulolytic enzymes, and these will also be treated, albeit secondarily, in this chapter.

The plant cell wall consists of an intricate mixture of polymers, mainly polysaccharides (Carpita and Gibeaut 1993); cellulose, hemicellulose, and lignin are its major constituents. These polymers are very robust: they both equip the plant with a stable structural framework and protect the plant cell from the perils of its environment. Despite their recalcitrant nature, the polysaccharides of the plant cell wall, in the guise of dead or dying plant matter, provide an exceptional source of carbon and energy, and a multitude of different microorganisms capable of degrading them has evolved.

In any given ecosystem, the polysaccharide-degrading microbes are not alone but rely on the complementary contribution of other bacterial and/or fungal species (Bayer and Lamed 1992; Bayer et al. 1994; Ljungdahl and Eriksson 1985). The polymer-degrading strains play a primary and crucial role in the ecosystem by converting the plant cell wall polysaccharides to the respective simple sugars and other degradation products (Fig. 6.1). They are assisted by satellite microbes, which clear the microenvironment of the breakdown products, ultimately producing methane and carbon dioxide.

Fig. 6.1
Simplified schematic description of a typical ecosystem comprising degrading plant matter. Cellulolytic, xylanolytic, and other hemicellulosic microbes combine to decompose the major polysaccharide components to soluble sugars. “Satellite” microorganisms assimilate the excess sugars and other cellular end products, which are ultimately converted to methane and carbon dioxide

In a given polysaccharide-degrading microorganism, the enzymes that catalyze the degradation may occur in several possible states, according to recognized paradigms (Himmel et al. 2010). These include (1) free enzymes, (2) multifunctional enzymes, and (3) discrete multienzyme complexes (cellulosomes); in addition, enzymes may be attached directly to the bacterial cell surface. All of these paradigms exhibit similar types of enzyme components, which usually comprise modular proteins that contain multiple functional modules. The “free” enzymes comprise a single polypeptide chain, which contains a catalytic module usually connected to a carbohydrate-binding module (CBM). The multifunctional enzyme paradigm includes more than one catalytic module per polypeptide chain. Cellulosomes are exocellular macromolecular machines designed for efficient degradation of cellulose and associated plant cell wall polysaccharides (Bayer et al. 1998a, 2004, 2008; Shoham et al. 1999; Doi and Kosugi 2004). In contrast to the free and multifunctional enzymes, the cellulosome complex is composed of a collection of subunits, each of which comprises a set of interacting functional modules. One type of cellulosomal module, the CBM, binds selectively to the substrate. Another family of modules, the catalytic modules, is specialized for the hydrolysis of the cellulose chains. Yet another complementary pair of modules, the cohesins and dockerins, serves to integrate the enzymatic subunits into the complex and the complex, in turn, into the cell surface. Multiple copies of the cohesin form a unique type of nonenzymatic integrating subunit, called scaffoldin, to which the dockerin-containing enzymes are attached. This “Lego™”-like arrangement of the modular subunits generates an intricate multicomponent complex whose enzymes are bound en bloc to the insoluble substrate and act synergistically toward its complete digestion.

Finally, single enzymes and cellulosomes can both be attached to the bacterial cell surface by one of several mechanisms. In the case of a single enzyme, the catalytic module is attached to the cell surface via a specialized binding module. In the cellulosome, a similar type of binding can occur, but via an anchoring scaffoldin, which in turn binds to the primary (enzyme-integrating) scaffoldin. Alternatively, in some cases, the enzyme or anchoring scaffoldin is bound covalently to the cell surface by enzymatic means. In still other cases, the mode of attachment of the cellulosome to the bacterial surface has not yet been elucidated. These paradigms are discussed in more detail later in this chapter.
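The cohesin–dockerin logic described above can be illustrated with a toy data model. This is a minimal sketch under stated assumptions, not a description of any real cellulosome: the class and enzyme names are hypothetical, and the model assumes that any dockerin-bearing enzyme can occupy any free cohesin, ignoring binding specificity between particular cohesins and dockerins.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Enzyme:
    """Hypothetical enzyme subunit; only the presence of a dockerin matters here."""
    name: str
    has_dockerin: bool


@dataclass
class Scaffoldin:
    """Hypothetical scaffoldin: a chain of cohesin 'sockets' for dockerin-bearing enzymes."""
    n_cohesins: int
    docked: List[Enzyme] = field(default_factory=list)

    def free_cohesins(self) -> int:
        return self.n_cohesins - len(self.docked)


def assemble(scaffoldin: Scaffoldin, enzymes: List[Enzyme]) -> List[Enzyme]:
    """Dock each dockerin-bearing enzyme onto a free cohesin, in order.

    Returns the enzymes left unbound: those lacking a dockerin, plus any
    dockerin-bearing surplus once every cohesin is occupied.
    """
    unbound = []
    for enzyme in enzymes:
        if enzyme.has_dockerin and scaffoldin.free_cohesins() > 0:
            scaffoldin.docked.append(enzyme)
        else:
            unbound.append(enzyme)
    return unbound


if __name__ == "__main__":
    scaf = Scaffoldin(n_cohesins=3)
    pool = [
        Enzyme("endoglucanase A", has_dockerin=True),
        Enzyme("exoglucanase B", has_dockerin=True),
        Enzyme("xylanase C", has_dockerin=True),
        Enzyme("endoglucanase D", has_dockerin=True),
        Enzyme("free cellulase E", has_dockerin=False),
    ]
    left = assemble(scaf, pool)
    print([e.name for e in scaf.docked])  # A, B and C fill the three cohesins
    print([e.name for e in left])         # D (no cohesin left) and E (no dockerin)
```

The model captures only the modular "plug-in" principle; it says nothing about the geometry of the complex or about how the scaffoldin itself is anchored to the cell.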

A list of cellulose-degrading bacteria is presented in Table 6.1, together with their dominant enzyme paradigm (some species produce several types of enzyme system, e.g., free and cellulosomal) and their distinctive types of modules (i.e., CBMs, cohesins, dockerins).

Table 6.1 A list of cellulose-degrading bacteria

Inherent to the study of cellulases and related enzymes is their potential industrial application, particularly the conversion of cellulosic biomass to biofuels. For the potential uses of these enzymes, the reader is referred to the many reviews on the subject (Bhat 2000; Himmel et al. 1999, 2007; Lynd et al. 1991, 2008; Perlack et al. 2005; Ragauskas et al. 2006; Schubert 2006; Galbe and Zacchi 2007; DOE 2008; Himmel 2008; Wall et al. 2008; Nordon et al. 2009; Sheehan 2009; Wilson 2009; Xu et al. 2009; Klein-Marcuschamer et al. 2011).

Plant Cell Wall Polysaccharides

Plant cells produce a composite matrix of hardy and durable polysaccharides, called the cell wall, on the outer surface of the plasma membrane (Carpita and Gibeaut 1993). The cell wall confers a protective coating on the plant cell, providing structure, turgidity, and durability, which render the cell resistant to the outer elements, including mechanical, chemical, and microbial assault. Different types of plant tissue exhibit different ratios of the three major types of cell wall component; on average, the cell wall contains roughly 40 % cellulose, 30 % hemicellulose, and 20 % lignin, but the exact composition varies greatly among plants. The first two polymers are polysaccharides. Lignin, on the other hand, is a heterogeneous, high-molecular-weight hydrophobic polymer consisting of non-repeating aromatic monomers connected via phenoxy linkages (Higuchi 1990; Lewis and Yamamoto 1990). Unlike cellulose and hemicellulose, which are degraded aerobically or anaerobically, lignin degradation requires oxygen and is limited to filamentous prokaryotes (e.g., the actinomycete Streptomyces viridans) and fungi (e.g., Phanerochaete chrysosporium, Bjerkandera adusta, and Pleurotus ostreatus), which produce a complicated set of oxidative enzymes that degrade the polymer. In fact, the recalcitrant lignin severely restricts the access of enzymes to the cellulose component and is rate-limiting for anaerobic degradation of cellulose. In any case, the lignin component must be degraded or removed before efficient degradation of cellulose can take place. Since lignin is not a polysaccharide, however, it will not be discussed further in this chapter.
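Because cellulose and xylan are built from anhydro-sugar residues, complete hydrolysis adds one water molecule per residue, so the theoretical monosaccharide yield exceeds the mass of the polysaccharide. The sketch below estimates theoretical glucose and xylose from a biomass composition, using the average 40 % cellulose and 30 % hemicellulose figures quoted above and standard molar masses. It assumes, for illustration only, that the entire hemicellulose fraction is unsubstituted xylan; real hemicelluloses contain other sugars and substituents, so the result is an upper bound, not a measured yield. The function name is hypothetical.

```python
# Molar masses (g/mol) of the free monosaccharides and of the anhydro
# residues they form inside the polymer chain.
GLUCOSE = 180.156
ANHYDROGLUCOSE = 162.141   # C6H10O5, one glucose residue in cellulose
XYLOSE = 150.130
ANHYDROXYLOSE = 132.115    # C5H8O4, one xylose residue in xylan


def theoretical_sugars(biomass_g: float, cellulose_frac: float = 0.40,
                       hemicellulose_frac: float = 0.30) -> dict:
    """Upper-bound monosaccharide mass (g) after complete hydrolysis.

    Assumption: all hemicellulose is treated as unsubstituted xylan.
    """
    cellulose = biomass_g * cellulose_frac
    xylan = biomass_g * hemicellulose_frac
    return {
        "glucose_g": cellulose * GLUCOSE / ANHYDROGLUCOSE,
        "xylose_g": xylan * XYLOSE / ANHYDROXYLOSE,
    }


if __name__ == "__main__":
    # About 444 g glucose and 341 g xylose per kg of "average" cell wall material.
    print(theoretical_sugars(1000.0))
```

The conversion factors (about 1.111 for glucose and 1.136 for xylose) are independent of the composition; only the fractions need to be changed for a particular plant material.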

Cellulose

Cellulose is the major constituent of plant matter and thus the most abundant organic polymer on Earth. It is a remarkably stable, linear (unbranched) homopolymer of β-1,4-linked glucose units. Chemically, the repeating unit is simply glucose, but, structurally, the repeating unit is the disaccharide cellobiose, i.e., 4-O-(β-d-glucopyranosyl)-d-glucopyranose, since each glucose residue is rotated 180° relative to its neighbor (Fig. 6.2). The individual cellulose chains contain from about 100 to more than 10,000 glucose units, packed tightly in parallel fashion into microfibrils by extensive inter- and intrachain hydrogen bonding interactions, which account for the rigid structural stability of cellulose. The microfibrils exhibit variable amounts of crystalline and amorphous components, again depending on the degree of polymerization, the extent of hydrogen bonding and, ultimately, the source of the cellulose. Cellulose of the plant cell wall occurs in two different forms: cellulose Iα and cellulose Iβ. Cellulose Iα is triclinic, with a single chain per unit cell, and is of higher energy than cellulose Iβ, which is monoclinic and much more stable (Atalla and VanderHart 1984; Sugiyama et al. 1991; Atalla 1999; Ding and Himmel 2006). The Iα form is hydrolyzed enzymatically more readily, but the cellulose of the plant cell wall comprises mainly the more stable Iβ form. The hydroxyl groups of the glucose residues occupy equatorial positions, whereas the axial positions are all occupied by nonpolar hydrogen atoms that do not participate in hydrogen bonding. Thus, owing to the packing of the glucose chains in the microfibrils, the “sides” are polar and engage in hydrogen bonding, whereas the “tops and bottoms” are hydrophobic in character (Matthews et al. 2006). The microfibrils themselves are further assembled into plant cell walls, the tunics of some marine animals, pellicles of bacterial origin, etc.

Highly crystalline forms of cellulose include cotton, bacterial cellulose (from Acetobacter xylinum), and the cellulose of the alga Valonia ventricosa, which exhibit crystallinity levels of about 45 %, 75 %, and 95 %, respectively. For more information on the structure of cellulose, see the following reviews: Atalla (1999); Atalla and VanderHart (1984); Chanzy (1990); O’Sullivan (1997); Ding and Himmel (2006); Moon et al. (2011).
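The chain-length range quoted above can be translated into approximate molar masses. Each glucose residue in the chain is an anhydroglucose unit (C6H10O5, about 162.14 g/mol), and one extra water molecule (about 18.02 g/mol) accounts for the two chain ends, so for a degree of polymerization DP (standard molar masses, not values from the chapter):

```latex
M_{\mathrm{cellulose}} \approx 162.14\,\mathrm{DP} + 18.02\ \mathrm{g\,mol^{-1}}
\qquad
\mathrm{DP}=100 \;\Rightarrow\; M \approx 1.62\times10^{4}\ \mathrm{g\,mol^{-1}},
\qquad
\mathrm{DP}=10\,000 \;\Rightarrow\; M \approx 1.62\times10^{6}\ \mathrm{g\,mol^{-1}}
```

At these chain lengths the end-group correction is negligible, so single cellulose chains span roughly 16 kDa to well over 1.6 MDa.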

Fig. 6.2
Structure of cellulose. Three parallel chains are shown, and a glucose moiety and repeating cellobiose unit are indicated. The model was built by Dr. José Tormo, based on early crystallographic data

Hemicellulose

Hemicelluloses are relatively low-molecular-weight, branched heteropolysaccharides that associate with both cellulose and lignin and, together with them, build the plant cell wall material (Puls and Schuseil 1993; Timell 1967). The backbone of a hemicellulose usually consists of one or two types of sugar, which determine its classification. For example, the backbone of xylan is composed of β-1,4-linked d-xylopyranose units. Similarly, the backbone of galactoglucomannan is made of linear β-1,4-linked d-glucopyranose and d-mannopyranose units, with α-1,6-linked galactose residues. Other common hemicelluloses include arabinogalactan, lichenins (mixed 1,3-1,4-linked β-d-glucans), and glucomannan. Most hemicelluloses are based on a β-1,4 linkage and are branched, and the individual sugars may be acetylated or methylated. For example, the linear xylan backbone is highly substituted with a variety of saccharide and nonsaccharide components (Fig. 6.3). In the plant cell wall, xylan is closely associated with other wall components. The 4-O-methyl-α-d-glucuronic acid residues can be ester-linked to the hydroxyl groups of lignin, providing cross-links between the cell wall polysaccharides and lignin (Das et al. 1984). Similarly, feruloyl substituents serve as cross-linking sites to either lignin or other xylan molecules. Thus, the chemical complexity of xylan is in direct contrast to the chemical simplicity of cellulose. Likewise, the structural diversity of the xylans contrasts with the structural integrity of the cellulose microfibril. Consequently, unlike the crystalline character of cellulose, the hemicellulose component adopts a gel-like consistency, providing an amorphous matrix in which the rigid crystalline cellulose microfibrils are embedded.

Fig. 6.3
Composition of a typical xylan component of hemicellulose. The xylobiose unit (β-Xylp–β-Xylp) is indicated by the blue-sided box, as are major substituents: Me-α-GlcA, methylglucuronic acid; αAraf, arabinofuranosyl group; OAc, acetyl group. A presumed lignin attachment site to a feruloyl substituent of xylan is also illustrated. Sites of cleavage by selected hemicellulases and carbohydrate esterases are also shown: Xyn, xylanase; Abn, arabinofuranosidase; Glr, glucuronidase; Axe, acetyl xylan esterase; Fae, ferulic acid esterase

Pectin

Pectin is a structural polysaccharide and another major component of the primary cell wall of terrestrial plants. Pectin derivatives also mediate plant defense responses and regulate plant development (Ridley et al. 2001). The pectins are heteropolysaccharides composed of α-(1→4)-linked galacturonic acid, substituted with numerous constituent groups, e.g., xylose, rhamnose, and galactose. In addition, a large percentage of the galacturonic acid carboxyl groups are methyl-esterified. During the normal physiological processes of the plant (including growth, maturation, fruit ripening, and aging), the distribution, quantity, chemical composition, and structure of pectin are altered.

Cellulose-Degrading Bacteria

The cellulolytic microbes occupy a broad range of habitats. Some are free-living and rid the environment of plant polysaccharides by converting them to simple sugars, which they assimilate. Others are closely associated with cellulose-consuming animals, residing in the digestive tracts of ruminants and other grazers or in the guts of wood-degrading termites and worms (Haigler and Weimer 1991). Cellulose-based ecosystems include soils, swamps, marshes, rivers, lakes, and seawater sediments; rotting grasses, leaves, and wood; cotton bales; sewage sludge; silage; compost heaps; muds; and decaying vegetable matter in hot and volcanic springs, acid springs, and alkaline springs (Ljungdahl and Eriksson 1985; Stutzenberger 1990).

The cellulolytic microorganisms include protozoa, fungi, and bacteria and are ubiquitous in nature. The cellulose-decomposing bacteria include aerobic, anaerobic, mesophilic, and thermophilic strains, inhabiting a great variety of environments, including the most extreme with respect to temperature, pressure, and pH. Cellulolytic bacteria have also been found in the gut of wood-eating worms, termites, and vertebrate herbivores, all of which exploit anaerobic symbionts for the digestion of wood and fodder.

In nature, many cellulolytic species exist in symbiotic relationships with secondary microorganisms (Ljungdahl and Eriksson 1985). The primary microorganisms degrade cellulose directly to cellobiose and glucose. Only part of the breakdown products are assimilated by the polymer-degrading strain(s); the rest are utilized by the satellite microorganisms. Removal of the excess sugars promotes further cellulose degradation by the primary species, since it relieves both cellobiose-induced inhibition of cellulase action and repression of cellulase synthesis.

Modern interest in cellulolytic microorganisms was spawned by the decay of cotton fabric in army tents and military clothing in the South Pacific jungles during World War II. The basic research program that resulted from this military problem led to the establishment of the US Army Natick Laboratories (Reese 1976), and the ensuing research revealed that the causative agent of this costly problem was a cellulolytic fungus, Trichoderma viride (subsequently renamed Trichoderma reesei). Further research, originally at the Natick Laboratories and later spreading to other research institutes and universities, led to the identification and classification of thousands of different strains of cellulolytic fungi and bacteria. Many of the major types of cellulolytic bacteria were listed in the chapter published in the second edition of The Prokaryotes (Coughlan and Mayer 1992). In the interval before publication of the chapter in the third edition (Bayer et al. 2006), the field concentrated not on the discovery or description of new cellulolytic strains but on characterizing the enzymes and enzyme systems from selected bacteria that degrade cellulose and plant cell wall polysaccharides in general. More recently, however, the increasing ease of genome sequencing and metagenomic prospecting (Li et al. 2009) has supplanted the more tedious biochemical approaches.

Enzymes That Degrade Plant Cell Wall Polysaccharides

The chemical and structural intricacy of plant cell wall polysaccharides is matched by the diversity and complexity of the enzymes that degrade them. The cellulases and hemicellulases belong to the broad superfamily of glycoside hydrolases (see Table 6.2), which catalyze the hydrolysis of oligosaccharides and polysaccharides (Gilbert and Hazlewood 1993; Kuhad et al. 1997; Ohmiya et al. 1997; Schülein 1997; Tomme et al. 1995a; Viikari and Teeri 1997; Warren 1996; Wilson and Irwin 1999). In the past decade, numerous bacterial genomes have been sequenced (see Table 6.1), and databases documenting the rapidly growing number of sequences and structures of cellulolytic and hemicellulolytic enzymes are readily available online (see discussion below).

Table 6.2 Major glycoside hydrolase families involved in the degradation of plant cell wall polysaccharides and their enzymatic activities. The glycoside hydrolase families (GHn) in which some members exhibit standard cellulase activities are shown in bold (see the CAZy website for more details: http://www.cazy.org/)

Historically, the type of substrate and manner in which a given enzyme interacts with its substrate were decisive in the classification of the glycosidases, as established first by the Enzyme Commission (EC) and later by the Nomenclature Committee of the International Union of Biochemistry. Enzymes were usually named and grouped according to the reactions they catalyzed. Thus, cellulases, xylanases, mannanases, and chitinases were grouped a priori in different categories. Moreover, enzymes which cleave polysaccharide substrates in the middle of the chain (“endo”-acting enzymes) versus those which clip at the chain ends (“exo”-acting enzymes) were also placed in different groups. For example, in the case of cellulases, the endoglucanases were grouped in EC 3.2.1.4, whereas the exoglucanases (i.e., cellobiohydrolases) were classified as EC 3.2.1.91.

It is interesting that the distinction between endo- and exo-acting enzymes is also reflected in the architecture of the respective active sites, even within the same family of enzymes (Fig. 6.4). Endoglucanases, for example, are commonly characterized by a groove or cleft into which any part of a linear cellulose chain can fit. Exoglucanases, on the other hand, bear tunnel-like active sites, which can accept a substrate chain only via its terminus. The exo-acting enzyme apparently threads the cellulose chain through the tunnel, wherein successive units (e.g., cellobiose) are cleaved in a sequential manner. The sequential hydrolysis of a cellulose chain is a relatively new notion of growing importance, which has earned the term “processivity” (Davies and Henrissat 1995), and processive enzymes are considered key contributors to the overall efficiency of a given cellulase system.

Fig. 6.4
Structures of a typical endoglucanase and exoglucanase. In each case, the structure is viewed from a perspective that demonstrates the comparative architecture of the respective active site. Despite the sequence similarity of both enzymes and their classification as Family-6 glycoside hydrolases, their active-site architectures differ. The Family-6 endoglucanase (endoglucanase Cel6A from the bacterium Thermomonospora fusca, PDB code 1TML) is characterized by a deep cleft that accommodates the cellulose chain at any point along its length, whereas the active site of the Family-6 exoglucanase (cellobiohydrolase CBHII from the cellulolytic fungus Trichoderma reesei, PDB code 3CBH) bears an extended loop that forms a tunnel, through which one of the termini of a cellulose chain can be threaded

Though instructive, the endo/exo terminology has met with growing dissatisfaction. As our understanding of catalysis by these enzymes progresses, it has become clear that some enzymes are capable of both endo- and exo-action (Johnson et al. 1996; Morag et al. 1991; Reverbel-Leroy et al. 1997; Sakon et al. 1997). Moreover, some glycoside hydrolase families include both endo- and exo-enzymes, again indicating that the mode of cleavage can be independent of sequence homology and structural fold. In this context, relatively minor changes in the lengths of relevant loops near the active site may dictate the endo- or exo-mode of action without significant differences in the overall fold.

Because of subtle but diverse chemical and structural features of their substrates, plant cell wall–degrading enzymes do not follow the same rules as standard model enzymes such as simple proteases, DNase, RNase, and lysozyme. In fact, the cellulases and hemicellulases are usually very large enzymes, whose molecular masses often exceed those of proteases by a factor of 2 to 5 or more. Their polypeptide chains are partitioned into a series of functional modules and linker segments (frequently glycosylated), which together determine their overall activity characteristics and their interaction with their substrates and/or with other components of the cellulolytic and hemicellulolytic system.

However, the historical division of enzymes is inappropriate for the classification of the cellulases and other glycoside hydrolases. As for other enzymes (e.g., proteases), earlier classification systems for the glycoside hydrolases centered on the types of substrate and the bonds cleaved by a given enzyme. The problem with the glycoside hydrolases is that the polysaccharide substrates, and particularly the bonds they cleave, are all quite similar, so classifying the different enzymes according to conventional criteria often misses the mark. Consequently, alternative approaches were pursued. Over the past decade or so, the clear trend has been to classify the different glycoside hydrolases into families based on common sequence, structural fold, and mechanistic themes (Davies and Henrissat 1995; Henrissat 1991; Henrissat and Bairoch 1996; Henrissat and Davies 1997; Henrissat et al. 1998).

The Carbohydrate-Active Enzymes (CAZy) server (http://www.cazy.org/) is a comprehensive, authoritative website that provides a complete and growing catalog of the different glycoside hydrolase families (Coutinho and Henrissat 1999a, b, c; Coutinho et al. 2003a, b, c; Henrissat and Coutinho 2001; Henrissat et al. 2003; Cantarel et al. 2009). The website also provides similar sequence information for additional types of enzymes that participate in the degradation of plant cell wall polysaccharides, namely, carbohydrate esterases (e.g., those that cleave acetyl, feruloyl, and cinnamoyl groups from xylans) and polysaccharide lyases (e.g., those that act on pectin). Additional modular components of these enzymes, particularly the carbohydrate-binding modules (CBMs), are also classified into families and documented exhaustively. An extensive list of sequenced genomes is included, each with the carbohydrate-active enzymes encoded by the genome (the “CAZome”) of the given bacterium; this facilitates insight into the nature and extent of complex carbohydrate metabolism in the species and comparison between related and unrelated species. The site contains excellent introductory material, and the interested reader is encouraged to use it extensively. Moreover, a companion website, CAZypedia (http://www.cazypedia.org/index.php/Main_Page), provides an encyclopedic resource for detailed understanding of the different glycoside hydrolase families.

Cellulases

The cellulases include a large number of endo- and exoglucanases that hydrolyze β-1,4-glucosidic bonds within the chains that make up the cellulose polymer (Béguin and Aubert 1994; Haigler and Weimer 1991; Tomme et al. 1995a). Thus, in principle, the degradation of cellulose requires the cleavage of a single type of bond. In practice, however, cellulolytic microorganisms produce a variety of complementary cellulases of different specificities from many different families. The major glycoside hydrolase families of cellulases include GH5, GH6, GH7 (found in fungi), GH8, GH9, GH12, GH44, GH45, GH48, GH74, and GH124.

It may seem somewhat surprising that the combined effect of so many different enzymes is required to degrade such a chemically simple substrate. This complexity reflects the difficulties an enzyme system encounters in degrading such a highly crystalline substrate as cellulose. As described in the previous section, cellulases that degrade the cellulose chain can be either “endo-acting” or “exo-acting.” Moreover, the degradation of crystalline cellulose should be viewed three-dimensionally and in situ, where the cellulose chains are packed within the microcrystal, thus generating the remarkably stable physical properties of the crystalline substrate. The enzymes have to bind to the cellulose surface and then locate and isolate suitable chains for degradation. It would seem logical that amorphous regions or defects in the crystalline portions of the substrate would be favorable sites for initiation of the process. It is thus the structural, rather than chemical, heterogeneity of the substrate that dictates the synergistic action of a complex set of complementary enzymes toward its complete digestion.

Various models have been suggested to account for the observed synergy among two or more different types of cellulases. For example, an endo-acting enzyme can produce new chain ends in the internal portion of a polysaccharide backbone, and the two newly exposed ends would then be available for the action of exo-acting enzymes. In addition, two different types of exoglucanases may exhibit different specificities by acting on a cellulose chain from opposite ends (i.e., the reducing versus the nonreducing end of the polymer). Likewise, an endoglucanase may be selective for only one of the two sterically distinct glucosidic bonds on the cellulosic surface. Furthermore, some cellulases may display high levels of activity at the beginning of the degradative process, i.e., on the highly crystalline material, whereas others would be selective for newly exposed, partially degraded chains otherwise embedded within the crystal. Still others would show very high levels of activity after the degradative process has advanced, when cellulose chains freed from the crystalline setting would be hydrolyzed quite rapidly. A collection of enzymes with complementary specificities and modes of action would account for the observed synergistic action of the complete cellulase “system” in digesting the cellulosic substrate.
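The first synergy model above (endo-acting cuts creating new ends for exo-acting enzymes) can be illustrated with a deliberately simplified, deterministic toy model. It is a sketch under stated assumptions, not a kinetic model of any real cellulase system: each chain offers only one end that the exoglucanase can use (as for an enzyme specific for one chain end), each engaged exoglucanase releases one cellobiose per step, and the endoglucanase always cuts the longest chain in half. The function and parameter names are hypothetical.

```python
from typing import List


def simulate(chains: List[int], n_exo: int, endo_cuts_per_step: int, steps: int) -> int:
    """Toy model: count cellobiose units released from a set of cellulose chains.

    chains: lengths (glucose residues) of the chains accessible to the enzymes.
    n_exo: number of exoglucanase molecules; each needs a chain end to act on.
    endo_cuts_per_step: internal cuts made per step by the endoglucanase.
    """
    chains = list(chains)
    released = 0
    for _ in range(steps):
        # Endo-action: split the longest chain in half, creating one new usable end.
        for _ in range(endo_cuts_per_step):
            chains.sort(reverse=True)
            if chains and chains[0] >= 4:
                longest = chains.pop(0)
                half = longest // 2
                chains.extend([half, longest - half])
        # Exo-action: at most one enzyme per chain (one usable end per chain);
        # each engaged enzyme removes one cellobiose (two glucose residues).
        chains.sort(reverse=True)
        engaged = 0
        for i, length in enumerate(chains):
            if engaged == n_exo:
                break
            if length >= 2:
                chains[i] = length - 2
                released += 1
                engaged += 1
        # Drop fully consumed chains.
        chains = [c for c in chains if c > 0]
    return released


if __name__ == "__main__":
    exo_only = simulate([1000, 1000], n_exo=10, endo_cuts_per_step=0, steps=50)
    endo_and_exo = simulate([1000, 1000], n_exo=10, endo_cuts_per_step=1, steps=50)
    print(exo_only)      # 100: only two chain ends are ever available
    print(endo_and_exo)  # larger, because each cut supplies another end
```

With exoglucanase alone, most of the ten enzymes sit idle because only two ends exist; adding endo-action raises the number of engaged enzymes step by step. Setting n_exo=0 releases no cellobiose at all in this model, since only exo-action is counted as product release.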

In addition to the endo- and exoglucanases, the overall group of cellulases includes the β-glucosidases (EC 3.2.1.21), which hydrolyze terminal, nonreducing β-d-glucose residues from cellooligodextrins. These enzymes are members of glycoside hydrolase families GH1, GH3, and GH116. In particular, this type of enzyme cleaves cellobiose (the major end product of cellulase digestion) to generate two molecules of glucose. Some β-glucosidases are specific for cellobiose, whereas others show broad specificity for other β-d-glycosides, e.g., xylobiose. The β-glucosidases are often associated with the microbial cell surface and hydrolyze cellobiose to glucose before, during, or after the transport process.
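As a worked mass balance for the β-glucosidase reaction (standard molar masses, not values from the chapter), hydrolysis of cellobiose consumes one water molecule for the single glycosidic bond cleaved:

```latex
\mathrm{C_{12}H_{22}O_{11}} + \mathrm{H_2O} \longrightarrow 2\,\mathrm{C_6H_{12}O_6}
\qquad
342.30 + 18.02 = 360.32 = 2 \times 180.16\ \mathrm{g\,mol^{-1}}
```

One gram of cellobiose therefore yields about 1.05 g of glucose.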

Among the newly recognized enzyme families, an important family of oxidative enzymes, previously classified as glycoside hydrolase family GH61, has been found in fungi; these enzymes break internal glucan bonds (Beeson et al. 2012). A similar family of oxidative enzymes has also been proposed to exist in bacteria: the Family-33 CBMs, which were originally considered to be carbohydrate-binding modules (Forsberg et al. 2011). Both the GH61 enzymes and the CBM33s will therefore have to be reclassified separately, as oxidative enzymes associated with the other CAZymes.

Hemicellulases

Strictly speaking, hemicellulases are not the precise subject of this chapter, since they do not directly sever the β-1,4-glucosidic bonds of cellulose. In nature, however, they are essential to the bacterial degradation of insoluble cellulose, since the natural bacterial substrate, the plant cell wall, is an architecturally cohesive composite of cellulose and hemicelluloses. In natural systems, the two types of polysaccharides cannot be easily separated, and microbial systems have to deal with both simultaneously. The xylan component is of particular interest for several reasons: (a) xylan is a major hemicellulosic component of the plant cell wall; (b) the xylanases are well-defined enzymes, closely associated with the cellulases; and (c) the repeating units (xylose and xylobiose) bear a striking structural resemblance to their cellulosic counterparts (glucose and cellobiose).

In contrast to cellulose, the degradation of the hemicelluloses poses a somewhat different challenge, since this group of polysaccharides includes widely different types of sugars or nonsugar constituents with different types of bonds. Thus, the complete degradation of hemicellulose requires the action of different types of enzymes. These enzymes, the hemicellulases, can differ in the chemical bond they cleave or, as in the case of the cellulases, they may cleave a similar type of bond but with different substrate or product specificity (Biely 1985; Coughlan and Hazlewood 1993; Eriksson et al. 1990; Gilbert and Hazlewood 1993).

Hemicellulases can be divided into two main types: those that cleave the main-chain backbone, i.e., xylanases or mannanases, and those that degrade side-chain substituents or short end products, such as arabinofuranosidases, glucuronidases, acetyl esterases, and xylosidases. Like the cellulases, hemicellulases can be of the endo or exo type. A schematic view of the types of bonds hydrolyzed by different hemicellulases is presented in Fig. 6.3.
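The division of labor shown in Fig. 6.3 can be expressed as a lookup from xylan features to the enzymes that act on them. The sketch below is illustrative only: it uses the enzyme names from the Fig. 6.3 legend, the substituent labels are hypothetical strings chosen for this example, and it assumes that a backbone-cleaving xylanase plus a β-xylosidase (for the short xylooligosaccharide end products) are always required.

```python
from typing import Iterable, List

# Side-chain (debranching) activities named in the Fig. 6.3 legend, keyed by
# the substituent they remove. Keys are hypothetical labels for this sketch.
DEBRANCHING = {
    "arabinofuranosyl": "arabinofuranosidase (Abn)",
    "methylglucuronic acid": "glucuronidase (Glr)",
    "acetyl": "acetyl xylan esterase (Axe)",
    "feruloyl": "ferulic acid esterase (Fae)",
}

# Backbone-acting activities assumed to be required for any xylan.
BACKBONE = ["xylanase (Xyn)", "beta-xylosidase"]


def required_enzymes(substituents: Iterable[str]) -> List[str]:
    """List the activities needed to reduce a xylan with these substituents to monomers."""
    wanted = set(substituents)
    unknown = wanted - set(DEBRANCHING)
    if unknown:
        raise ValueError(f"no enzyme mapped for: {sorted(unknown)}")
    return BACKBONE + sorted(DEBRANCHING[s] for s in wanted)


if __name__ == "__main__":
    # An acetylated arabinoxylan needs two debranching activities on top of the backbone pair.
    print(required_enzymes(["acetyl", "arabinofuranosyl"]))
```

Keeping backbone and side-chain activities in separate structures mirrors the two main types of hemicellulase distinguished in the text.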

Xylan-Degrading Enzymes

The xylanases, which cleave the main-chain backbone, are by far the most extensively characterized and studied of the hemicellulases. Endoxylanases (1,4-β-d-xylan xylanohydrolase, EC 3.2.1.8) hydrolyze the 1,4-β-d-xylopyranosyl linkages of xylans, such as d-glucurono-d-xylans and l-arabino-d-xylan. These single-subunit enzymes from both fungi and bacteria exhibit a broad range of physicochemical properties, and two main classes have been described: alkaline proteins of low Mr (<30,000) and acidic proteins of high Mr. This general classification scheme correlates with their assignment to glycoside hydrolase Families 10 and 11 (http://www.cazy.org/), the former representing the high-Mr xylanases and the latter the low-Mr enzymes. The two families also differ in their catalytic properties: the Family 10 enzymes seem to display greater versatility toward the substrate than those of Family 11 and are thus typically able to hydrolyze highly substituted xylan more efficiently. The Family 10 xylanases exhibit a (β/α)8 topology, whereas those from Family 11 form a β-jelly roll fold. Both families hydrolyze their substrates by a retaining catalytic mechanism. In addition to the GH10 and GH11 families, which comprise xylanases exclusively, a small portion of the GH5 and GH8 enzymes exhibit xylanase activity.

Mannan- and Galactan-Degrading Enzymes

Glucomannans and galactoglucomannans are branched heteropolysaccharides found in hardwood and softwood. The degradation of these polymers again involves many hydrolytic enzymes, including endo-1,4-β-mannanase (EC 3.2.1.78), β-mannosidase (EC 3.2.1.25), β-glucosidase (EC 3.2.1.21), and α-galactosidase (EC 3.2.1.22). 1,4-β-d-Mannanases hydrolyze main-chain linkages of d-mannans and d-galacto-d-mannans. These enzymes, of both the endo and exo types, are produced by various microorganisms, including Bacillus subtilis, Aspergillus niger, and intestinal and rumen bacteria, and commonly occur in Families GH5, GH26, and GH113 (http://www.cazy.org/). Likewise, 1,4-β-galactanases hydrolyze the 1,4-linked β-d-galactosyl backbone of galactans and are members of Family GH53.

Lichenin-Degrading Enzymes

Lichenase (1,3-1,4-β-d-glucan 4-glucanohydrolase, EC 3.2.1.73) is a mixed-linkage β-glucanase that cleaves the β-1,4 linkages adjacent to the β-1,3 bonds of its substrate, lichenin. According to the modern structure-based classification, lichenases can be members of Families GH8, GH16, or GH17 (http://www.cazy.org/).

Other Polysaccharide-Degrading Activities

Other enzyme activities, which cleave polysaccharides characterized by other types of linkages (e.g., α- or β-1,2, 1,3, 1,6, etc.), are classified in various additional glycoside hydrolase families.

β-d-Xylosidases

The 1,4-β-d-xylosidases (1,4-β-d-xylan xylohydrolase, EC 3.2.1.37) hydrolyze xylooligosaccharides (i.e., xylan breakdown products, mainly xylobiose) to xylose. These enzymes are either intracellular or extracellular and are closely associated with hemicellulolytic activities. Monomeric, dimeric, and tetrameric xylosidases have been found, with Mr values of 26,000–360,000. Many of the xylosidases act on a variety of substrates. For example, Aspergillus niger produces an enzyme, classified as a β-xylosidase, that can hydrolyze β-galactosides, β-glucosides, and α-arabinosides in addition to β-xylosides. The β-xylosidases are members of numerous glycoside hydrolase families, including GH3, GH30, GH39, GH43, GH52, GH54, GH116, and GH120 (http://www.cazy.org/).

Other Side Chain–Degrading Enzymes

α-d-Glucuronidases (EC 3.2.1.139) catalyze the cleavage of the α-1,2 glycosidic bond that links the 4-O-methyl-α-d-glucuronic acid side chain to the xylan backbone. This bond has a stabilizing effect on the neighboring xylosidic bonds of the main chain. Several α-glucuronidase genes have been cloned and sequenced, and the enzymes belong mainly to Families GH67 and GH115.

α-l-Arabinofuranosidases (α-l-arabinofuranoside arabinofuranohydrolase, EC 3.2.1.55) are another important group of enzymes; they cleave nonreducing terminal α-l-arabinofuranosidic linkages in arabinoxylan, l-arabinan, and other l-arabinose-containing polysaccharides. These enzymes occur in either cell-associated or extracellular form and can be members of Families GH43, GH51, GH54, GH62, or GH127.

1,4-β-Mannosidases hydrolyze 1,4-linked β-d-mannosyl groups from the nonreducing end. Like the β-xylosidases, these enzymes act mainly on the end products of the mannanases, i.e., mannobiose and mannotriose, and are members of Families GH1, GH2, and GH5.
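The family assignments listed in the preceding subsections can be gathered into a small lookup table that makes the many-to-many relationship between activity and sequence family explicit: one activity (e.g., β-xylosidase) is spread over several GH families, while one family (e.g., GH5 or GH43) hosts several activities. The following Python sketch is illustrative only; the activity labels are our own shorthand, the table holds only the families named in this section, and current assignments should be taken from the CAZy database (http://www.cazy.org/).

```python
# Hemicellulase activities and the GH families named in this section.
# A snapshot of the text only; consult http://www.cazy.org/ for current data.
HEMICELLULASE_FAMILIES: dict[str, set[str]] = {
    "endo-1,4-beta-xylanase": {"GH10", "GH11", "GH5", "GH8"},
    "beta-mannanase": {"GH5", "GH26", "GH113"},
    "beta-1,4-galactanase": {"GH53"},
    "lichenase": {"GH8", "GH16", "GH17"},
    "beta-xylosidase": {"GH3", "GH30", "GH39", "GH43", "GH52", "GH54", "GH116", "GH120"},
    "alpha-glucuronidase": {"GH67", "GH115"},
    "alpha-L-arabinofuranosidase": {"GH43", "GH51", "GH54", "GH62", "GH127"},
    "beta-mannosidase": {"GH1", "GH2", "GH5"},
}


def activities_in_family(family: str) -> list[str]:
    """Return, in sorted order, the activities in the table reported for a GH family."""
    return sorted(a for a, fams in HEMICELLULASE_FAMILIES.items() if family in fams)


def shared_families(activity_a: str, activity_b: str) -> set[str]:
    """Return the GH families in which both activities occur."""
    return HEMICELLULASE_FAMILIES[activity_a] & HEMICELLULASE_FAMILIES[activity_b]


if __name__ == "__main__":
    print(activities_in_family("GH5"))
    print(sorted(shared_families("beta-xylosidase", "alpha-L-arabinofuranosidase")))
```

For example, `activities_in_family("GH5")` returns the xylanase, mannanase, and mannosidase entries, which illustrates why an activity name alone does not predict family membership.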

Pectin-Degrading Enzymes

Polysaccharide lyases (PLs) cleave uronic acid-containing polysaccharide chains via a β-elimination mechanism, generating an unsaturated hexenuronic acid residue at the new nonreducing end of one product and a new reducing end on the other. Among the 22 PL families, only a few are relevant to the degradation of plant cell wall polysaccharides, specifically pectins and rhamnogalacturonan (Table 6.3). Like the other CAZymes, the PLs are modular proteins, which complement the activities of the cellulases and other enzymes to better degrade the plant cell wall polysaccharide components.

Table 6.3 Polysaccharide lyase families involved in the degradation of plant cell wall polysaccharides and their enzymatic activities (See CAZy website for more details: http://www.cazy.org/)

Carbohydrate Esterases

The side chain substituents of xylan are composed not only of sugars but also of acidic residues, such as acetic, ferulic (4-hydroxy-3-methoxycinnamic), or p-coumaric (4-hydroxycinnamic) acids. Carbohydrate esterases that cleave these residues (see Fig. 6.3) are found in enzyme preparations from both hemicellulolytic and cellulolytic cultures (Borneman et al. 1993). Such enzymes are sometimes separate modules, joined by linker segments to other cellulolytic or hemicellulolytic catalytic modules in the same polypeptide chain. Like the glycoside hydrolases, the carbohydrate esterases are classified into families according to sequence homology and common structural fold (http://www.cazy.org/), and they frequently appear together with other modular components, notably xylanases from glycoside hydrolase families 10 and 11, on the same polypeptide chain. A list of the important CE families is given in Table 6.4. Most of the families contain enzymes that exhibit acetyl xylan esterase activity. Family CE1 also has members that cleave cinnamoyl and feruloyl ester bonds. These enzymes are particularly important, since they allow a bacterium to release xylan components that are covalently attached to lignin. As lignin and its degradation products are frequently deleterious to enzymes that degrade plant-derived polysaccharides and to the bacterium itself, the action of the ferulic and coumaric acid esterases would promote more effective degradation of the xylan upon its separation from the lignin.

Table 6.4 Carbohydrate esterase families involved in the degradation of plant cell wall polysaccharides and their enzymatic activities (See CAZy website for more details: http://www.cazy.org/)

Cellulases and Hemicellulases Are Modular Enzymes

Early biochemical methods for determining the characteristics of a given cellulase were extended immeasurably by molecular biology and bioinformatics. Comparison of the sequences of the cellulases and related enzymes yielded an entirely new view of these enzymes.

Cellulases and hemicellulases are composed of a series of separate modular components. This fact explains the very large size of some of these enzymes and gives us some insight into their complex mode of action. Each module comprises a consecutive portion of the polypeptide chain and forms an independently folding, structurally and functionally distinct unit (Coutinho and Henrissat 1999a, b, c; Gilkes et al. 1991; Teeri et al. 1992). Each enzyme contains at least one catalytic module, which catalyzes the actual hydrolysis of the glycosidic bond and provides the basis for classification of the simple enzymes (i.e., those containing a single catalytic module). Other accessory or “helper” modules assist or modify the primary hydrolytic action of the enzyme, thus modulating the overall properties of the enzyme. Some of the different themes illustrating the modular compositions of the cellulases and related enzymes are presented in Fig. 6.5. In many cases, consistent patterns can be observed in the pairing of the catalytic module(s) with particular ancillary modules, notably the CBMs, in the natural enzyme systems of the cellulolytic bacteria. Knowledge of the modular components that comprise a given enzyme, and thus modulate its activity, can therefore suggest the functional characteristics of the enzyme.

Fig. 6.5

Scheme illustrating the diversity of the modular architecture of cellulases and other glycoside hydrolases. The different modules are grouped into families according to conserved sequences as shown by the pictograms in the Figure. (a) One of the most common types of cellulases consists of a catalytic module, flanked by a CBM at its N- or C-terminus. The particular enzyme shown in (a) comprises a catalytic module from Family-48 and a Family-2 CBM. (b) Cellulosomal enzymes are characterized by a “dockerin module” attached to a catalytic module. In this case, the same type of enzyme as in (a), carrying a Family-48 catalytic module, harbors a dockerin module instead of a CBM. (c) Many cellulases contain “X domains,” i.e., domains of unknown (as yet undefined) function. Often such domains prove to be CBMs when the appropriate binding specificity is determined experimentally. (d) Some enzymes have more than one CBM. Often, one CBM, such as the Family-3 CBM shown in the Figure, serves to bind the cellulase strongly to the flat surface of the insoluble substrate, whereas the other one (the Family-3c CBM) acts in concert with the catalytic module by binding transiently to a single cellulose or hemicellulose chain. (e) Some cellulosomal cellulases have a CBM together with a dockerin in the same polypeptide chain. (f) Some cellulases have more than one type of catalytic module, such as the Family-5 and Family-44 modules shown in the Figure, and the two probably work in concert to degrade the substrate efficiently.
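One way to make the modular theme concrete is to treat an enzyme as an ordered list of module labels, from N- to C-terminus, and to query it for the features discussed above (catalytic modules, CBMs, a dockerin). The Python sketch below is hypothetical: the `ModularEnzyme` class and the label conventions ("GH48", "CBM2", "dockerin") are our own, and the examples merely paraphrase panels (a), (b), and (f) of Fig. 6.5. The captions do not fix the module order in (a) or (f), so the order used here is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModularEnzyme:
    """Hypothetical model: module labels ordered from N- to C-terminus."""
    name: str
    modules: tuple[str, ...]

    def catalytic_modules(self) -> list[str]:
        # Glycoside hydrolase catalytic modules are labelled by family, e.g. "GH48".
        return [m for m in self.modules if m.startswith("GH")]

    def cbms(self) -> list[str]:
        return [m for m in self.modules if m.startswith("CBM")]

    def is_cellulosomal(self) -> bool:
        # A dockerin module marks an enzyme for incorporation into a cellulosome.
        return "dockerin" in self.modules


# Paraphrases of three panels of Fig. 6.5; labels and order are illustrative.
free_gh48 = ModularEnzyme("Fig. 6.5a", ("GH48", "CBM2"))
cellulosomal_gh48 = ModularEnzyme("Fig. 6.5b", ("GH48", "dockerin"))
bifunctional = ModularEnzyme("Fig. 6.5f", ("GH5", "GH44"))

if __name__ == "__main__":
    for enzyme in (free_gh48, cellulosomal_gh48, bifunctional):
        print(enzyme.name, enzyme.catalytic_modules(), enzyme.cbms(),
              "has dockerin" if enzyme.is_cellulosomal() else "no dockerin")
```

Representing architectures as plain tuples keeps the comparison of panels (a) and (b) trivial: the same GH48 catalytic module appears in both, and only the dockerin-versus-CBM partner differs.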

The Catalytic Modules: Families of Enzymes

The definitive component of a given enzyme is the catalytic module. Former EC-based classification schemes, which rely on substrate specificity, are now considered somewhat obsolete, both because they fail to take into account the structural features of the enzymes themselves and for the additional reasons discussed in the previous sections. The catalytic modules of glycoside hydrolases are now categorized into families according to amino acid sequence homology (Cantarel et al. 2009; Coutinho and Henrissat 1999a, b, c; Henrissat 1991; Henrissat and Bairoch 1996; Henrissat and Davies 1997; Henrissat et al. 1998). For more information, see the Carbohydrate-Active Enzymes (CAZy) website, designed and maintained by Bernard Henrissat and Pedro Coutinho (http://www.cazy.org/).

The enzymes of a given glycoside hydrolase family are similar in sequence, they display the same structural topology, and the positions of the catalytic residues are conserved with respect to the common fold. X-ray crystallography has provided a general overview of the structural themes of the glycoside hydrolases and their interaction with their intriguing set of substrates (Bayer et al. 1998; Davies and Henrissat 1995; Henrissat and Davies 1997).

Cellulose and hemicellulose hydrolysis proceeds via general acid catalysis and is accompanied by either an overall retention or an inversion of the configuration of the anomeric carbon (Davies and Henrissat 1995; McCarter and Withers 1994; White and Rose 1997; Withers 2001). In both cases, cleavage is catalyzed primarily by two active-site carboxyl groups: one acts as a proton donor and the other as a nucleophile or base. Retaining enzymes function via a double-displacement mechanism, in which a transient covalent enzyme-substrate intermediate is formed (Fig. 6.6a). In contrast, inverting enzymes employ a single-step mechanism, as shown schematically in Fig. 6.6b. The distance between the acid catalyst and the base represents the major structural difference between the two mechanisms: in retaining enzymes, the two catalytic residues are about 5.5 Å apart, whereas in inverting enzymes they are about 10 Å apart. In the inverting enzymes, the additional space accommodates a water molecule that participates directly in the hydrolysis, and the resultant product exhibits a stereochemistry opposite to that of the substrate. In all cases, the mechanism of hydrolysis is conserved within a given glycoside hydrolase family (Coutinho and Henrissat 1999a, b, c; Davies and Henrissat 1995; Henrissat and Davies 1997).
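The distance criterion and its stereochemical consequence can be written as a toy decision rule. In the Python sketch below, the two reference distances (about 5.5 Å and 10 Å) come from the text, but the cutoff at their midpoint and the function names are our own assumptions. In practice, the mechanism of an uncharacterized enzyme is inferred from its family membership, since the mechanism is conserved within a family, and is then confirmed experimentally.

```python
from enum import Enum

# Typical separations between the two catalytic carboxylates (angstroms), from the text.
RETAINING_DISTANCE = 5.5
INVERTING_DISTANCE = 10.0


class Mechanism(Enum):
    RETAINING = "retaining"
    INVERTING = "inverting"


def infer_mechanism(carboxylate_distance: float) -> Mechanism:
    """Illustrative heuristic: pick whichever typical distance is closer.

    The midpoint cutoff (7.75 angstroms) is an assumption, not a published rule.
    """
    midpoint = (RETAINING_DISTANCE + INVERTING_DISTANCE) / 2
    if carboxylate_distance < midpoint:
        return Mechanism.RETAINING
    return Mechanism.INVERTING


def product_anomer(substrate_anomer: str, mechanism: Mechanism) -> str:
    """Anomeric configuration ('alpha' or 'beta') of the newly released reducing end."""
    if mechanism is Mechanism.RETAINING:
        return substrate_anomer  # double displacement: net retention
    return {"alpha": "beta", "beta": "alpha"}[substrate_anomer]  # single step: inversion


if __name__ == "__main__":
    # A beta-1,4 bond of cellulose cleaved by an inverting enzyme.
    mech = infer_mechanism(10.0)
    print(mech.value, product_anomer("beta", mech))  # inverting alpha
```

For cellulose, whose glycosidic bonds are β, a retaining enzyme releases a β-configured reducing end, whereas an inverting enzyme, such as a Family-9 cellulase, releases an α-configured one.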

Fig. 6.6

The two major catalytic mechanisms of glycosidic bond hydrolysis. (a) The retaining mechanism involves initial protonation of the glycosidic oxygen via the acid/base catalyst with concomitant formation of a glycosyl-enzyme intermediate through the nucleophile. Hydrolysis of the intermediate is then accomplished via attack by a water molecule, resulting in a product which exhibits the same stereochemistry as that of the substrate. (b) The inverting mechanism involves the single-step protonation of the glycosidic oxygen via the acid/base catalyst and concomitant attack of a water molecule, activated by the nucleophile. The resultant product exhibits a stereochemistry opposite to that of the substrate. The type of mechanism is conserved within a given glycoside hydrolase family and dictated by the active-site architecture and atomic distance between the acid/base and nucleophilic residues (aspartic and/or glutamic acids)

Carbohydrate-Binding Modules (CBMs)

In addition to the catalytic module, free cellulases and hemicellulases usually contain at least one carbohydrate-binding module (CBM) as an integral part of the polypeptide chain (Linder and Teeri 1997; Tomme et al. 1995b). The CBM serves predominantly as a targeting agent to direct and attach the catalytic module to the insoluble crystalline substrate. Like the catalytic modules, the CBMs are categorized into a series of families according to sequence homology and consequent structural fold.

For historical reference, until about the year 2000, the term used in the literature for such substrate-binding modules was CBD, for cellulose-binding domain. However, CBD is misleading, since not all of them bind to cellulose, and some families have members that bind to cellulose as well as to other types of polysaccharides. A more general term was clearly required, and the term CBM (carbohydrate instead of cellulose, module instead of domain) was chosen; it is more appropriate on both counts.

Some CBM families (or subfamilies or family members) bind either preferentially or additionally to other insoluble polysaccharides, e.g., xylan or chitin. For example, the Family-5 CBM and some of the members of the Family-3 CBMs bind to chitin as well as cellulose (Brun et al. 1997; Morag et al. 1995). Moreover, the Family-2 CBMs can be divided into two subfamilies, one of which indeed binds preferentially to insoluble cellulose, but the other binds to xylan (Boraston et al. 1999). The molecular basis for this was proposed to reflect the fact that in the first subfamily, three surface-exposed tryptophans contribute to cellulose binding (Simpson et al. 1999; Williamson et al. 1999). However, in the case of the xylan-binding members, one of these tryptophans is missing, whereas the other two assume a different conformation, thereby allowing them to stack against the hydrophobic surfaces of two xylose rings of a xylan substrate.

Other types of CBM prefer less crystalline substrates (e.g., acid-swollen cellulose), single cellulose chains, and/or soluble polysaccharides and oligosaccharides, e.g., laminarin (1,3-β-glucan) and barley 1,3/1,4-β-glucan (Tomme et al. 1996a, b; Zverlov et al. 2001). Still others exhibit alternative accessory functions, a topic described in more detail below. Moreover, the CBMs responsible for the primary binding event may further disrupt hydrogen-bonding interactions between adjacent cellulose chains of the microfibril (Din et al. 1994), thereby increasing their accessibility to subsequent attack by the hydrolytic module.

Consequently, the concept of CBD has been broadened and redefined as CBM, i.e., carbohydrate-binding module (Boraston et al. 1999, 2004; Coutinho and Henrissat 1999a, b, c). In the previous edition of The Prokaryotes, more than a decade ago, 26 different CBM families were described. To date (March 2012), the number of CBM families has increased to 64 (http://www.cazy.org/). A CBM can be assigned to a family on the basis of its sequence and the positions of its binding residues before its binding function is established; nevertheless, it is imperative to confirm experimentally the binding specificity of an individual CBM.

The structures of CBMs from a number of families and subfamilies have been determined and have provided interesting information regarding the mode of binding to cellulose. CBMs that bind to crystalline substrates appear to do so via a similar type of mechanism. One of the surfaces of such CBMs is characteristically flat and appears to complement the flat surface of crystalline cellulose. A series of aromatic amino acid residues on this flat surface forms a planar strip (Mattinen et al. 1997; Simpson and Barras 1999; Tormo et al. 1996; Koyama et al. 1997; Lehtio et al. 2003; Ding et al. 2006) that stacks opposite the glucose rings of a single cellulose chain. In addition to the planar aromatic strip, several polar amino acid residues on the same surface appear to anchor the CBM to two adjacent cellulose chains. The binding of the CBM to crystalline cellulose would thus involve precisely oriented, contrasting hydrophobic and hydrophilic interactions between the reciprocally flat surfaces of the protein and the carbohydrate substrate. Together they provide a selective biological interaction, which contributes to the specificity that a CBM exhibits toward its substrate. In some cases, the putative binding surface turns out to be irregular instead of flat, which may obstruct binding (Petkun et al. 2010).

In contrast to the CBMs that interact with the crystalline cellulose surface, other CBMs seem to interact with single cellulose chains. The Family-3c and Family-4 CBMs preferentially bind to non-crystalline forms of cellulose and clearly have a different function in nature (Johnson et al. 1996; Sakon et al. 1997; Tomme et al. 1996a, b). For example, the role of the Family-4 CBM may be to recognize, bind to, and deliver an appropriate catalytic module to a cellulose chain that has been loosened or liberated from a more ordered arrangement within the cellulose microfibril. The binding of the Family-3c CBM to single cellulose chains and its remarkable role in cellulose hydrolysis are discussed later (Fig. 6.9; see the section “Helper Modules”). The role of the CBMs has recently been expanded from the conventional substrate-targeting function to cell-surface attachment (Ezer et al. 2008; Montanier et al. 2009) and to important biomass-sensing functions leading to transcriptional regulation (Kahel-Raifer et al. 2010; Nataf et al. 2010; Bahari et al. 2011).

The Family-9 Cellulases: An Example

This section pertains to enzyme diversity and to how a single type of catalytic module can be modified by the class of helper module(s) that flank its C- or N-terminus. We are only at the beginning of our understanding of how the modular arrangement affects the overall activity and function of a given enzyme.

In its simplest form, an enzyme would presumably consist of a single catalytic module, usually with a standard CBM, which would target the enzyme to the crystalline substrate. Indeed, this is the norm for many individual glycoside hydrolase families. However, in others, e.g., the Family-9 cellulases, the catalytic modules commonly occur in tandem with a number of accessory modules. Although the story remains rather incomplete, we can discuss the currently available information regarding Family 9 and draw several interesting conclusions from the few publications on this developing subject.

Family-9 Theme and Variations

The crystal structure of the Family-9 catalytic module displays an (α/α)6-barrel fold and inverting catalytic machinery. There are numerous Family-9 cellulases of plant origin (Coutinho et al. 2003a, b, c), the great majority of which are lone catalytic modules that lack accessory modules. Another type of eukaryotic Family-9 cellulase that lacks helper modules is produced by termites (Ni et al. 2005). In contrast, the prokaryotic Family-9 enzymes are almost invariably decorated with a variety of subsidiary modules that modulate the activity of the catalytic module; only a few consist of a solitary catalytic module (Fig. 6.7a).

Fig. 6.7

Theme and variations: schematic view of some of the modular arrangements of the Family-9 glycoside hydrolases. (a) The solitary catalytic module; (b) the catalytic module and fused Family-3c CBM; (c) immunoglobulin-like (Ig) domain, fused to the catalytic module; (d) successive Family-4 CBM, Ig, and catalytic modules. The representations of the different modules are based on their known structures and are presented sequentially, left-to-right, from the N- to C-terminus. Structures in (a) and (b) are derived from cellulase E4 from Thermomonospora fusca (PDB code, 1TF4); those in (c) and (d) are from the CelD endoglucanase of C. thermocellum (PDB code, 1CLC). The Figure used for the Family-4 CBM in (d) is derived from the NMR structure of the N-terminal CBM of Cellulomonas fimi β-1,4-glucanase CenC (PDB code, 1ULO). The structures in (b) and (c) are authentic views of the respective crystallized bi-modular protein components. The CBM in (d) has been placed manually to indicate its N-terminal position in the protein sequence, but its spatial position in the quaternary structure and the structure of the linker segment remain unknown.

Microbial Family-9 cellulases commonly conform to one of the themes shown in Fig. 6.7, which were recognized in the previous edition. In one of these, the catalytic module is followed immediately downstream by a fused Family-3c CBM (Fig. 6.7b). This particular type of CBM imparts special characteristics to the enzyme (see below). A second theme consists of an immunoglobulin-like (Ig) domain (of unknown function) immediately upstream of the catalytic module (Fig. 6.7c). A variation of the latter theme includes a Family-4 CBM at the N-terminus of the enzyme, followed by an Ig domain and the Family-9 catalytic module (Fig. 6.7d). In addition to the modular arrangements described above, each of the free prokaryotic enzyme systems includes a standard CBM that binds strongly to crystalline cellulose. In the last decade, several additional themes have been described, notably GH9-CBM3c′-CBM3b′ (i.e., a GH9 catalytic module followed by two successive subtypes of CBM3) with a C-terminal dockerin. This theme is present in the genomes of Clostridium thermocellum, Acidothermus cellulolyticus, and Clostridium clariflavum.
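Because the four classical themes are defined by which modules immediately flank the Family-9 catalytic module, they can be captured by a short rule. The Python function below is a hypothetical sketch, not a published classifier: it inspects only the neighbors of the "GH9" label in an N-to-C list of module labels (our own naming convention), ignores any other modules (such as a standard cellulose-binding CBM or a dockerin), and does not attempt to cover newer themes such as GH9-CBM3c′-CBM3b′.

```python
from typing import Optional


def family9_theme(modules: list[str]) -> str:
    """Assign a Family-9 architecture (labels N- to C-terminus) to Theme A-D.

    Raises ValueError if no "GH9" label is present.
    """
    i = modules.index("GH9")
    upstream: Optional[str] = modules[i - 1] if i > 0 else None
    downstream: Optional[str] = modules[i + 1] if i + 1 < len(modules) else None
    if downstream == "CBM3c":
        return "B"  # fused Family-3c CBM immediately downstream (Fig. 6.7b)
    if upstream == "Ig":
        # Theme D adds a Family-4 CBM upstream of the Ig domain (Fig. 6.7d).
        return "D" if i >= 2 and modules[i - 2] == "CBM4" else "C"
    return "A"  # no recognized helper module adjacent to GH9 (Fig. 6.7a)


if __name__ == "__main__":
    for arch in (["GH9"], ["GH9", "CBM3c"], ["Ig", "GH9"], ["CBM4", "Ig", "GH9"]):
        print("-".join(arch), "-> Theme", family9_theme(arch))
```

A hypothetical free enzyme such as Ig-GH9-CBM3 is still assigned to Theme C, because the standard cellulose-binding CBM is not one of the helper modules that define the themes.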

The Family-9 glycoside hydrolase of the cellulosomal scaffoldin from the cellulolytic anaerobic bacterium Acetivibrio cellulolyticus contains no helper module (Ding et al. 1999). The A. cellulolyticus enzyme forms part of a multi-modular scaffoldin, but the catalytic module appears to be a functionally distinct entity that lacks adjoining helper modules. The other modules are conventional scaffoldin-associated modules, e.g., cohesins and a true cellulose-binding CBM.

This thematic arrangement of the Family-9 cellulases is mirrored in the respective sequences of the catalytic modules, and the divergent sequences are reflected in the phylogenetic relationships of the parent cellulases (Fig. 6.8). Thus, the simplest cellulases (the Group A eukaryotic cellulases from plants), which lack adjacent helper modules, are all phylogenetically related (Theme A). Interestingly, the catalytic module of ScaA from A. cellulolyticus is distinct from the other groups designated in Fig. 6.8 but closest to the plant enzymes, as might be anticipated from its lack of a helper module. Similarly, catalytic modules from cellulases that are fused to a Family-3c CBM (Group B) all map within the same branch (Theme B). On the other hand, the catalytic modules that bear an adjacent Ig-like domain all fall into a cluster on the opposite side of the tree. Cellulases that have only the Ig-like domain (Theme C) occupy a small separate branch, and those that also include a Family-4 CBM (Theme D) branch off distally to form a separate subcluster.

Fig. 6.8

Phylogenetic analysis of the N-terminal Family-9 catalytic module of ScaA and its relationship with other Family-9 members. The various theme groupings roughly follow the groups shown in Fig. 6.7. Theme A (Group A) enzymes lack associated helper modules. Theme B (Group B) enzymes carry a fused Family-3c CBM downstream of the catalytic module. Theme C (Group C) and Theme D (Group D) enzymes carry an Ig domain upstream of the catalytic module, the Theme D enzymes having an additional N-terminal Family-4 CBM. Theme A enzymes: ScaA Acece, ScaA scaffoldin from the cellulolytic bacterium A. cellulolyticus (AF155197); and plant (eukaryotic) cellulases from Prunus persica (X96853), Populus alba (D32166), Citrus sinensis (AF000135), Persea americana (M17634), Pinus radiata (X96853), Arabidopsis thaliana (X98543), Phaseolus vulgaris (M57400), Capsicum annuum (X97189), Lycopersicon esculentum (U20590). Theme B enzymes: CelF Clotm, endoglucanase F from Clostridium thermocellum (X60545); CelZ Closr, exoglucanase Z from Clostridium stercorarium (X55299); CelA Calsa, cellulase A from Caldocellum saccharolyticum (L32742); CelG Cloce, endoglucanase G from Clostridium cellulolyticum (M87018); CelI Clotm, endoglucanase I from Clostridium thermocellum (L04735); CelB Celfi, endoglucanase B from Cellulomonas fimi (M64644); E4 Thefu, endo/exoglucanase E4 from Thermomonospora fusca (M73322). Theme C enzymes: CelJ Clotm, cellulase J from Clostridium thermocellum (D83704); CelD Clotm, endoglucanase D from Clostridium thermocellum (X04584); CelC Butfi, endoglucanase C from Butyrivibrio fibrisolvens (X55732). Theme D enzymes: CbhA Clotm, cellobiohydrolase A from Clostridium thermocellum (X80993); CelA Psefl, endoglucanase A from Pseudomonas fluorescens (X12570); CelC Celfi, endoglucanase C from Cellulomonas fimi (X57858); CelI Strre, endoglucanase I from Streptomyces reticuli (X65616); E1 Thefu, endoglucanase E1 from Thermomonospora fusca (L20094).
The analysis of the designated catalytic modules was performed using GenBee, based on the respective GenBank sequences (accession codes in parentheses)

Fig. 6.9

Structural aspects of Family-9 Theme-B cellulase E4 from T. fusca. (a) “Side view” of the E4 molecule. Shown are the Family-9 catalytic module (turquoise, on the left), the Family-3c CBM (in yellow, on the right) and the intermodular linker (dark blue strip). The presumed path of a single cellulose chain, from the CBM to the catalytic module, is shown at the bottom of the structure (arrows). The enzyme also possesses a fibronectin-like domain (FN3) and a cellulose-binding Family 2 CBM (not shown). Note that the linker appears to serve a defined structural role by which the Family-3c CBM is clamped tightly to the catalytic module. Selected surface residues on the catalytic module along the interface of both the linker and the CBM3c also serve to fasten both features tightly to the catalytic module. (b) “Bottom view” of the E4 molecule (∼90° rotation of a). From this perspective, the proposed catalytic residues (red), positioned in the active site cleft, are clearly visible. The path of the cellulose chain (arrows) passes through a succession of polar residues (green) on the bottom surface of the CBM which would conceivably bind to the incoming cellulose chain and serve to direct it toward the active-site acidic residues of the catalytic module

Family-9 Crystal Structures

Crystal structures of Family-9 cellulases have been elucidated for two subtypes of this glycoside hydrolase family: cellulase E4 (or Cel9A) from Thermobifida fusca (Sakon et al. 1997) and Cel9D from Clostridium thermocellum (Juy et al. 1992). These two examples are architecturally distinct: the T. fusca Cel9A cellulase is an example of a Theme B Family-9 enzyme (see Figs. 6.7b and 6.8), and the C. thermocellum Cel9D cellulase is a Theme C enzyme. Fortunately, in both cases, one of the neighboring modules co-crystallized with the catalytic module, thus providing initial insight into their combined structures. In the case of T. fusca Cel9A, the catalytic module and neighboring Family-3c CBM were found to be interconnected by a long, rigid linker sequence, which envelops about half of the catalytic module until it connects to the adjacent CBM (Fig. 6.9a). In contrast, in C. thermocellum Cel9D, the catalytic module is adjoined at its N-terminus by a 7-stranded immunoglobulin-like (Ig) domain of unknown function. The comparison between the E4 and CelD cellulases indicates that a given type of catalytic module can be structurally and functionally modulated by different types of accessory modules.

Helper Modules

The Family-3c CBM is special. To date, this particular type of CBM has been found in nature associated exclusively with the Family-9 catalytic module. Structurally, the CBM is homologous to the other Family-3 CBMs, but contains substitutions in many important surface residues. The three-dimensional crystal structure of the T. fusca Cel9A cellulase revealed the close interrelationship between the Family-9 catalytic module and the Family-3c CBM, thus suggesting a functional role as a helper module. This CBM seems not to bind directly to crystalline cellulose but appears to act in concert with the catalytic module by binding transiently to the incoming cellulose chain, which is then fed into the active-site cleft pending hydrolysis (Fig. 6.9b) (Gal et al. 1997a; Irwin et al. 1998; Sakon et al. 1997).

The information derived from the Family-9 enzymes suggests that the activity of catalytic modules can be modulated by accessory modules. The accessory modules can either supplement or otherwise alter the overall properties of an enzyme (Bayer et al. 1998b, c). The recurrent appearance in nature of a given type of module adjacent to a specific type of catalytic module may indicate a functionally significant theme. These observations raise the possibility of a more selective role for certain types of CBM and other modules, whereby their association with certain types of catalytic modules could signify a “helper” role. The helper module would enhance hydrolytic efficiency and alter the catalytic character of the enzyme. Interestingly, in recent work on a Theme B enzyme, Cel9I from C. thermocellum (Burstein et al. 2009), the GH9 catalytic module and the CBM3c (together with the intermodular linker) were expressed individually in recombinant form, and the two modules self-assembled to form a complex. Before complexation, the GH9 module essentially lacked activity; physical association of the two modules recovered 60–70% of the endoglucanase activity of intact Cel9I.

Cellulase Analysis

The biochemical characterization of cellulases is in many cases a difficult task owing to the large variety of enzyme types and modes of action. At first glance, it is an intriguing phenomenon that for such a simple reaction (i.e., the hydrolysis of the β-1,4-glucosidic linkage in a linear glucose chain), nature has evolved so many types of cellulases. This great variety of enzymes is found not only among the different species of cellulolytic bacteria but also within the same organism. The diversity stems from the insoluble nature of cellulose and from the fact that, although the chemical composition of the homopolymer is rather trivial, the physical, three-dimensional arrangement of the chains within the crystalline and amorphous regions of the microfibril can differ significantly.

Regarding the enzymes that degrade the substrate, the modular nature of the cellulases contributes additional degrees of complexity to our quest to characterize a given enzyme. Thus, the number, types, and arrangement of the accessory modules vis-à-vis the catalytic module are important structural features that modulate the overall activity of the enzyme in question. This descriptive information should always be defined for a recombinant enzyme. Whenever possible, it is desirable to determine the relative contribution of the individual accessory modules to the activity of the enzyme. In this regard, the affiliation of a given module, e.g., a CBM, with a defined family does not necessarily define its contribution to enzyme activity, as different specificities and functions have been attributed to different members of the same module family. Moreover, sequences for the different “X” modules (i.e., modules for which the function remains undefined) are widespread; most of them probably play a carbohydrate-binding or processing role, assisting the catalytic module(s) in hydrolyzing the substrate.

Two decades ago, the range of cellulases and hemicellulases within a given species was assessed mainly by biochemical techniques. In some cases, individual enzymes were isolated and their properties assessed using appropriate insoluble or soluble substrates. Another approach involved electrophoretic separation of cell-derived or cell-free extracts and analysis of the desired activities using zymograms. Each of these strategies has advantages and disadvantages, and the use of combined, complementary approaches is always advisable.

Molecular biology techniques are also used to reveal cellulase and hemicellulase genes, which can often be characterized on the basis of sequence homology with related, known genes (Béguin 1990; Hazlewood and Gilbert 1993) or according to their GH family membership (Table 6.2). If further information is required on the structure or action of a given enzyme, the gene can then be expressed in an appropriate host organism, and the properties of the product can be characterized.

It is always instructive to compare the properties of an expressed gene product with those of the same protein isolated from the original bacterial culture. The results may be surprising; there are hazards inherent to both approaches. Expression of a gene may yield preparations with reduced or altered enzymatic properties. In this context, the expressed gene product may not have been folded properly. It is of course assumed that the investigator has taken the time and trouble to sequence the cloned gene to ensure no mutations have occurred.

Unlike a gene product expressed in a host cell, the native counterpart may have undergone posttranslational modifications (e.g., glycosylation, proteolytic truncation) that improve its physicochemical properties. Moreover, since the native cellulase system includes numerous enzyme types, often with similar molecular masses and other physical characteristics, a supposedly purified extracellular cellulase may still contain contaminating enzymes that alter its apparent enzymatic properties (usually increasing them greatly, owing to the synergistic action of two or more enzymes). The onus is on the investigating scientist when publishing the properties of a given enzyme. Too often, erroneous data that enter the scientific literature are taken as fact. One should be particularly wary of comparing enzymatic activities of the same or similar types of enzymes (e.g., members of the same family) that have been published at different times and by different laboratories.

During the past decade, the phenomenal decrease in the costs of genomic and metagenomic sequencing has completely altered the accepted methodology for enzyme discovery. Today, sequencing of a cellulolytic microbe with concomitant bioinformatic annotation yields dozens and sometimes hundreds of new enzymes, which can generally be assigned to known glycoside hydrolase families. The gargantuan effort invested in establishing the CAZy database (http://www.cazy.org/) (Cantarel et al. 2009) now provides the informed researcher with tools to determine the general features of a given enzyme. Nevertheless, researchers who seek a deeper understanding of the action of a newly discovered enzyme must perform meticulous biochemical, structural, and enzymological studies.

The establishment of novel families (of glycoside hydrolases as well as of other carbohydrate-active enzyme superfamilies) requires much more intensive and elegant studies of this nature. This is particularly evident for many types of cellulases, for which no simple colorimetric assays exist. In some cases, chromogenic substrates or assays are available and the detection of cellulolytic activity is more straightforward; in others, the activity is much more subtle. This is reflected in the fact that most of the known glycoside hydrolase families that include genuine cellulases were identified early on, because their members could be detected colorimetrically.

Since that time, new families of cellulases have been difficult to establish, mainly because of the lack of a simple comprehensive assay, or set of assays, that would definitively identify new types of cellulolytic activity. In the early 1990s, an important family that includes exoglucanases was discovered (glycoside hydrolase Family 48). The founding member of this family was a predominant component of the C. thermocellum cellulosome (Morag et al. 1991; Wang et al. 1993, 1994). Subsequent research has established that a member of this family is consistently a major component of each newly discovered cellulosome. In addition, members of this family have been discovered in both free and multifunctional cellulases.

Nearly two decades then passed until a new type of cellulase was discovered (Brás et al. 2011), which allowed the formation of a new family (glycoside hydrolase Family 124). In this case, the actual cellulose-degrading function was somewhat cryptic, and a combination of approaches was required before the enzyme could be verified as a cellulase.

Clearly, with continuing genomic and metagenomic sequencing, there are myriads of unknown and novel types of cellulases and other associated plant-derived polysaccharide-degrading enzymes that await future discovery. Novel, preferably medium- or high-throughput approaches will be required to promote this endeavor.

The assessment of cellulase activity is indeed a complicated undertaking, and there is no clear or standard methodology for doing so. This predicament reflects a combination of factors, including the complex nature of the substrate, the multiplicity of enzymes and their synergistic action, and the variety of products formed. A further complication is that cellulose is an insoluble substrate converted to lower-order cellooligosaccharide products. It must be noted that cellooligomers become less soluble as they increase in length, such that cellooctaose (eight glucose units) is no longer soluble in aqueous solutions. Moreover, the accumulation of one or more cellulose degradation products (particularly cellobiose) may inhibit enzymatic activity.
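To illustrate how an accumulating product can depress the measured rate, the sketch below assumes simple competitive inhibition by cellobiose, i.e., the textbook Michaelis-Menten form v = Vmax·S / (Km(1 + I/Ki) + S). This is an assumed model for illustration only, not a mechanism established for any particular cellulase, and the constants are arbitrary placeholders.

```python
def competitive_rate(s, vmax, km, i, ki):
    """Michaelis-Menten rate with a competitive inhibitor (assumed model):
    v = Vmax * S / (Km * (1 + I / Ki) + S)."""
    if min(s, i) < 0 or km <= 0 or ki <= 0:
        raise ValueError("concentrations must be >= 0 and constants > 0")
    return vmax * s / (km * (1.0 + i / ki) + s)


def relative_activity(s, km, i, ki):
    """Rate with inhibitor present, relative to the uninhibited rate at the same S."""
    return competitive_rate(s, 1.0, km, i, ki) / competitive_rate(s, 1.0, km, 0.0, ki)


if __name__ == "__main__":
    # Placeholder constants in arbitrary concentration units, not measured values.
    km, ki, s = 1.0, 0.5, 1.0
    for cellobiose in (0.0, 0.5, 2.0, 5.0):
        frac = relative_activity(s, km, cellobiose, ki)
        print(f"cellobiose {cellobiose}: relative activity {frac:.2f}")
```

Under these assumptions the apparent activity falls steadily as product builds up, which is one reason why initial-rate conditions, or removal of cellobiose, are often sought in practice.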

Today, the study of cellulase action usually includes, in addition to conventional biochemical assays, the analysis of the primary structure and the assignment of the various modules into known families. The catalytic modules can usually be assigned into one of the known glycoside hydrolase families (Henrissat and Bairoch 1996; Henrissat and Davies 1997). Whenever the sequence of a known polysaccharide-degrading enzyme failed to match a known family, a new family of glycoside hydrolase was established. This approach was extensively developed in the last decade, due to the increasing number of available DNA sequences and bioinformatics analysis tools. At the same time, an increasing number of crystal or solution structures of various catalytic and accessory modules were published that allow us to examine a new protein sequence in light of its structure. Sometimes, the publication of the structure of an accessory module precedes determination of its function.

We can divide the analysis of a newly described prospective cellulase into several stages, each addressed by complementary approaches that are currently used to classify the enzyme. Some of the questions one may ask are:

  1. What is the primary structure (the amino acid sequence) of the enzyme? What are the binding residues and/or binding module(s) associated with the enzyme? What are its other accessory modules and their respective role(s) in catalysis or stability?

  2. Is the enzyme a “true” cellulase, i.e., is its preferred substrate cellulose or a cellulose-degradation product, and can the enzyme act alone on insoluble cellulose? This is to be distinguished from the activities of simple endoglucanases and exoglucanases on model substrates.

  3. What is the mode of action? Does the enzyme act as an endoglucanase, an exoglucanase, or a processive enzyme?

  4. What is the stereochemistry of the reaction? Does the enzyme exhibit an inverting or a retaining mechanism?

  5. What are the catalytic residues, i.e., the acid/base residue and the nucleophile that characterize a glycoside hydrolase?

In the early years of cellulase research, several extensive reviews and book chapters dealt with different assays of cellulose degradation (Ghose 1987; Wood and Kellogg 1988). In this treatise, we will briefly summarize the various approaches currently in use and direct the reader to the relevant literature.

While characterizing the activity of a new enzyme preparation, one has to bear in mind several secondary or indirect issues, such as the purity of the protein preparation, the sensitivity of the assay used, and the cross-reactivity of the expected enzymatic activities. In some cases, only detailed kinetic analysis can provide appropriate characterization of the enzyme. As with many other types of glycoside hydrolases, cellulases can exhibit cross-reactivity with substrates of similar structure. This is particularly true when using, e.g., p-nitrophenyl-derivatized substrates, which provide highly sensitive assays. In many cases, however, such a soluble synthetic chromogenic substrate can fit the active-site pocket of a related but atypical enzyme, which then catalyzes its hydrolysis. For example, Family-10 glycoside hydrolases are typically xylanases, but individual members of this family can readily hydrolyze p-nitrophenyl cellobioside, a typical cellulase substrate. Without a detailed comparative kinetic analysis (kcat/Km) using different substrates, the true specificity of the enzyme might be overlooked.

Today, given the amino acid sequence of the protein, its assignment to a given glycoside hydrolase family can in many cases provide a reasonable general indication of its activity. The description of the modular structure provides additional knowledge that can imply how the catalytic function might be modulated, but this knowledge can also be misleading. In the final analysis, there is no substitute for extensive biochemical and biophysical characterization of the given protein (recombinant or native) and its catalytic properties. In the case of a native enzyme, it is imperative to ensure that contaminating enzymatic activities have been removed, which is not a trivial undertaking. In the case of a recombinant form of an enzyme, it is imperative to ensure that the enzyme is correctly folded and that its activities are representative of the parent protein. For multi-modular enzymes, in which the ancillary modules may alter the character of the catalytic module, these efforts are again nontrivial.

General procedures for assaying cellulase and hemicellulase activities are very well documented in Methods in Enzymology, Volume 160 (Wood and Kellogg 1988), and a new volume of this series is forthcoming (Gilbert 2012). Conventional procedures for cellulase assays have been defined precisely by IUPAC (Ghose 1987). However, owing to the complexity of the substrate and enzyme systems, these procedures can provide only a starting point for understanding the true nature of the enzyme in question.

Since the publication of Part A of this treatise (Coughlan and Mayer 1992), many of the previously reported assays of cellulase activity have remained in common use. These include the use of soluble, derivatized forms of cellulose, e.g., carboxymethyl cellulose and hydroxymethyl cellulose, as conventional substrates for determining endoglucanase activity. In addition, a derivatized, colored form of insoluble cellulose, i.e., azure cellulose, is frequently used as an indication of cellulase activity. Zymograms with such colored embedded substrates are useful in detecting endoglucanase or xylanase activities (Béguin 1983). Individual soluble cellooligomers (cellotetraose, cellopentaose, cellohexaose, etc.) are still used as substrates for analyzing enzyme action, but reliance on these substrates for assessing cellulase activity is no longer considered definitive. Substrate analogues and reagents have been developed, including thioglycoside substrates (Driguez 1997), fluoride-derivatized sugars (Williams and Withers 2000), and chromophoric and fluorescent cellooligosaccharides (Claeyssens and Henrissat 1992; O’Neill et al. 1989; van Tilbeurgh et al. 1985). An ultraviolet-spectrophotometric method and an enzyme-based biosensor have also been described (Bach and Schollmeyer 1992; Hilden et al. 2001). In addition, a novel and intriguing bifunctionalized fluorogenic tetrasaccharide has been developed as an effective reagent for measuring the kinetic constants of cellulases by resonance energy transfer (Armand et al. 1997).

The thiooligosaccharides serve as competitive inhibitors that mimic natural substrates but resist enzymatic cleavage (Driguez 1997). In this type of oligosaccharide, the glycosidic oxygen of the bond to be cleaved is replaced by sulfur. The thiooligodextrins are sometimes more soluble than the native cellodextrins, and longer chains can be synthesized. The modified sugars can be used in biochemical or crystallographic studies to gain information about the geometry of the active site or to determine the mechanism of action of an enzyme.

Determination of “True” Cellulase Activity: Solubilization of Crystalline Cellulose Substrates

True cellulase activity is usually defined as the ability to solubilize insoluble, “crystalline” forms of cellulose to an appreciable degree. The extent of hydrolysis can be evaluated by turbidity assays, weight loss of insoluble material, generation of reducing power, and accumulation of soluble sugars. It is important to realize that crystalline cellulose is not of uniform composition; therefore, the rate of catalysis is in most cases not linear with time or enzyme concentration. Notably, different preparations of crystalline cellulose contain varying levels of loosely associated loops and chains. These are readily accessible to hydrolysis and lead to relatively high initial rates of activity that do not reflect the actual degree of true cellulase activity. For example, such loose chains can be degraded by a relatively ineffectual enzyme, whereas the crystalline portions of the substrate remain resistant to further hydrolysis by the same enzyme. To overcome these difficulties, IUPAC suggests determining the amount of enzyme required to achieve digestion of 5.2 % of the insoluble substrate (e.g., filter paper) in 16 h (Ghose 1987; Irwin et al. 1993).
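A fixed-conversion criterion of this kind is usually applied by assaying a dilution series and interpolating to the enzyme amount that just reaches the target. The sketch below assumes that percent digestion varies roughly linearly with the logarithm of enzyme amount between the two bracketing points; that interpolation rule is an assumption for illustration, and the dilution-series numbers are invented.

```python
import math


def enzyme_for_target(amounts, digestion_pct, target=5.2):
    """Log-linear interpolation of the enzyme amount giving `target` % digestion.

    `amounts` and `digestion_pct` are paired measurements; the target must lie
    between two measured points (no extrapolation is attempted).
    """
    points = sorted(zip(amounts, digestion_pct))
    for (a1, d1), (a2, d2) in zip(points, points[1:]):
        if d1 <= target <= d2 and d2 > d1:
            frac = (target - d1) / (d2 - d1)
            log_a = math.log10(a1) + frac * (math.log10(a2) - math.log10(a1))
            return 10 ** log_a
    raise ValueError("target digestion is not bracketed by the data")


if __name__ == "__main__":
    # Hypothetical dilution series: enzyme amount vs % filter-paper digestion in 16 h.
    amounts = [1.0, 2.0, 4.0, 8.0]
    digestion = [2.1, 3.9, 6.0, 8.4]
    print(f"enzyme for 5.2 % digestion: {enzyme_for_target(amounts, digestion):.2f}")
```

Refusing to extrapolate is a deliberate choice: given the nonlinearity described above, a value outside the measured range says little about the enzyme.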

Cellulose substrates commonly in use include Avicel, filter paper, cotton, and Solka Floc, as well as bacterial cellulose from Acetobacter aceti and algal cellulose prepared from Valonia. Because these substrates differ in their physical properties, such assays should be treated as relative rather than quantitative assessments. The nature of the original substrate selected, especially its extent of crystallinity, should always be taken into account, and proper controls and reference substrates should always be used. One should be wary of comparing results reported by different laboratories, or even by different researchers in the same laboratory. Nevertheless, such assays give an excellent indication of whether a given enzyme preparation exhibits substantial activity toward crystalline cellulose substrates.

Endoglucanase Versus Exoglucanase Activity

As discussed earlier in this chapter, cellulases have traditionally been divided into endoglucanases and exoglucanases (Fig. 6.4). The biochemical or enzymatic assays that discriminate between these two modes of action usually involve soluble forms of cellulose, i.e., carboxymethyl or hydroxymethyl derivatives. The action of a given enzyme on these substrates is followed by determining the amount of reducing ends generated by the enzyme and the degree of polymerization (DP). The reducing power is usually determined using reagents such as 3,5-dinitrosalicylic acid (DNS) (Miller et al. 1960), ferricyanide (Kidby and Davidson 1973), or copper-arsenomolybdate (Green et al. 1989; Marais et al. 1966).

Despite their traditional popularity, these methods are intrinsically disadvantageous, owing to interference by metal ions and certain buffers. Moreover, the response of such assays depends on the chain length of the reducing sugar. A more recent approach involves the use of disodium 2,2′-bicinchoninate (BCA) for the determination of reducing sugar. This procedure is more sensitive than the conventional methods and gives comparable values of reducing sugars for cellodextrins of different lengths (Doner and Irwin 1992; Garcia et al. 1993; Vlasenko et al. 1998; Waffenschmidt and Jaenicke 1987).

Viscosity-based measurements represent the most common approach for assessing changes in the degree of polymerization. This approach is highly sensitive to internal bond cleavage, which leads to a significant reduction in the average molecular weight of the substrate. Comparing the amount of reducing sugars generated with the change in average molecular weight (i.e., the viscosity or fluidity of the soluble cellulose substrate) gives a very good indication of whether an enzyme is essentially exo- or endo-acting.
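The logic behind this comparison can be illustrated with a deliberately simplified toy simulation (an assumed model, not a description of any real enzyme). Chains are represented by their lengths in glucose units; an "endo" cut splits a random chain at a random internal bond, whereas an "exo" cut removes one cellobiose unit from the end of a chain. Each cut creates exactly one new reducing end in both cases, and the weight-average DP (sum of n² divided by sum of n) serves as a rough stand-in for viscosity, which is dominated by the larger chains.

```python
import random


def weight_average_dp(chains):
    """Weight-average degree of polymerization: sum(n^2) / sum(n)."""
    return sum(n * n for n in chains) / sum(chains)


def endo_cut(chains, rng):
    """Split a randomly chosen chain (DP >= 2) at a random internal bond."""
    i = rng.choice([k for k, n in enumerate(chains) if n >= 2])
    n = chains.pop(i)
    cut = rng.randint(1, n - 1)
    chains.extend([cut, n - cut])


def exo_cut(chains, rng):
    """Release one cellobiose unit from the end of a randomly chosen chain (DP >= 3)."""
    i = rng.choice([k for k, n in enumerate(chains) if n >= 3])
    chains[i] -= 2
    chains.append(2)


def simulate(mode, n_chains=50, dp=500, n_cuts=100, seed=1):
    """Return (number of reducing ends, weight-average DP) after n_cuts cleavages."""
    if mode not in ("endo", "exo"):
        raise ValueError("mode must be 'endo' or 'exo'")
    rng = random.Random(seed)
    chains = [dp] * n_chains
    cutter = endo_cut if mode == "endo" else exo_cut
    for _ in range(n_cuts):
        cutter(chains, rng)
    return len(chains), weight_average_dp(chains)


if __name__ == "__main__":
    for mode in ("endo", "exo"):
        ends, dpw = simulate(mode)
        print(f"{mode}: reducing ends = {ends}, weight-average DP = {dpw:.0f}")
```

With the same number of new reducing ends, the endo mode lowers the weight-average DP sharply while the exo mode barely changes it, which is the contrast that viscosity measurements exploit.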

The average degree of polymerization can also be evaluated by size-exclusion chromatography, either alone (Srisodsuk et al. 1998; Teeri 1997) or combined with multi-angle laser light scattering (Vlasenko et al. 1998). Mass spectrometric procedures can also be applied to determine the identity and distribution of degradation products following enzymatic hydrolysis of cellulosic substrates (Hurlbert and Preston III 2001; Rydlund and Dahlman 1997). The mode of enzymatic action can also be appraised by comparing the increase in reducing power associated with the insoluble versus the soluble fraction of the substrate. An increase in the proportion of reducing sugars associated with the soluble fraction indicates an exo type of activity, whereas a relatively large increase in the insoluble fraction suggests an endo type of activity (Barr et al. 1996).

Exocellulases can exhibit different specificities depending on their preference for the reducing or nonreducing end of the cellulose chain (Barr et al. 1996; Teeri 1997). This feature of an exocellulase can be determined by using oligosaccharide substrates labeled at the reducing end with either tritium or ¹⁸O. Other procedures involve NMR, HPLC, and/or mass spectrometric analysis of the products released from native (unlabeled) cellooligosaccharides. In previous studies, 3D structures of enzyme-substrate complexes have been obtained, from which the specificities of the enzymes can be interpreted directly (Davies and Henrissat 1995; Davies et al. 1998; Divne et al. 1998; Juy et al. 1992; Notenboom et al. 1998; Parsiegla et al. 1998; Rouvinen et al. 1990; Sakon et al. 1997; Zou et al. 1999). These efforts have continued as novel families of glycoside hydrolases were established; selected members of these families were subjected to crystallographic studies in order to characterize the overall structural features and mode of action of the entire family.
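The reasoning behind the end-labeling experiment can be made explicit. The sketch assumes a cellooligosaccharide of DP n labeled only at its reducing-end glucose and an exo enzyme that releases a single cellobiose per attack; both are simplifying assumptions. Under them, the label appears in cellobiose if the reducing end is attacked and in the shortened (n − 2)-mer if the nonreducing end is attacked. For a tetramer both routes give labeled cellobiose, so that substrate cannot discriminate.

```python
def products(dp, attacked_end):
    """(labeled product DP, unlabeled product DP) for one cellobiose release
    from a cellooligosaccharide labeled at its reducing-end glucose."""
    if dp < 3:
        raise ValueError("substrate must have DP >= 3")
    if attacked_end == "reducing":
        return 2, dp - 2          # labeled cellobiose is released
    if attacked_end == "nonreducing":
        return dp - 2, 2          # label stays on the shortened chain
    raise ValueError("attacked_end must be 'reducing' or 'nonreducing'")


def infer_end(dp, observed_labeled_dp):
    """Infer which end was attacked from the DP of the labeled product."""
    matches = [end for end in ("reducing", "nonreducing")
               if products(dp, end)[0] == observed_labeled_dp]
    if len(matches) == 1:
        return matches[0]
    return "ambiguous" if matches else "inconsistent"


if __name__ == "__main__":
    # Hypothetical observation: labeled cellohexaose yields labeled cellotetraose.
    print(infer_end(6, 4))
```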

Processivity

One of the major recent conceptual advances in assessing the mode of action of a cellulase is the concept of processivity. Processive enzyme action can be defined as the sequential cleavage of a cellulose chain by an enzyme. Exoglucanases are, by nature and structure, processive enzymes: their tunnel-like active site allows processive action on the cellulose chain. Endoglucanases, on the other hand, were thought to be intrinsically non-processive. However, the traditional distinction between exo- and endo-cellulases has since been modified.

Experiments combining two or more purified cellulases have shown that synergism can be detected even upon mixing two different types of exo-acting enzymes. Such experiments led to the recognition that exo enzymes can operate on either end (i.e., the reducing or the nonreducing end) of the cellulose chain. Some enzymes, however, exhibit both endo and exo activities, although in such cases the endo-cellulase activity is usually very low. To explain these phenomena, the concept of processivity was proposed, whereby the activity of the enzyme is characterized by the sequential hydrolysis of the cellulose chain. Implicit in this concept is the notion that the catalytic site of the enzyme remains in continual and intimate contact with a given chain of the cellulose substrate.
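Synergism in such mixing experiments is commonly expressed as a degree of synergy: the activity of the mixture divided by the sum of the activities of its components measured separately under the same conditions. A minimal sketch follows; the activity values are placeholders.

```python
def degree_of_synergy(mixture_activity, individual_activities):
    """Mixture activity / sum of individual activities.

    A value > 1 indicates synergy; 1 indicates purely additive action.
    """
    total = sum(individual_activities)
    if total <= 0:
        raise ValueError("individual activities must sum to a positive value")
    return mixture_activity / total


if __name__ == "__main__":
    # Hypothetical soluble sugar released by two exoglucanases alone and combined.
    print(degree_of_synergy(3.0, [1.0, 0.5]))  # 2.0
```

As noted earlier in this chapter, a trace contaminant in a "pure" preparation can itself produce an apparent synergy, so this ratio is only as reliable as the purity of the components.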

A more complete mechanistic picture of the processive nature of such cellulases was revealed with the advent of high-resolution 3D structures. It was thus demonstrated that the cellulose chain makes contact with the protein at multiple sites, either via a tunnel-shaped structural element (such as that observed in the Family-48 enzymes) or via a special type of CBM (such as in the Family-9 Theme B cellulases). These arrangements allow the cellulose chain to be threaded into the active site, and, following initial cleavage at the end of the chain, the enzyme can move along the chain and position itself for the next cleavage. In addition to their processive action, these enzymes can also make classic endo cleavages, thus generating new chain ends.

Biochemically, processive enzymes exhibit characteristics intermediate between those of endo and exo enzymes. They have low but detectable endo activity toward soluble derivatives of cellulose (e.g., CMC) and may or may not possess exo activity on such substrates. With insoluble substrates, they generate reducing power with a ratio of soluble to insoluble fractions of about 7. Endocellulases usually give a ratio of less than 2, whereas exocellulases produce a ratio of 12–23 (Irwin et al. 1998).
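Using the soluble-to-insoluble reducing-sugar ratios quoted above (Irwin et al. 1998) as rough benchmarks, a first-pass classification might look like the sketch below. The boundary of 10 between "processive-like" and "exo-like" is an assumption chosen to fall between the reported values of about 7 and 12–23; it is not a threshold defined in the cited work, and borderline results call for the complementary tests described in this section.

```python
ENDO_MAX = 2.0   # endocellulases: ratio usually below about 2
EXO_MIN = 10.0   # assumed boundary; reported exocellulase ratios are 12-23


def soluble_to_insoluble_ratio(soluble_reducing, insoluble_reducing):
    """Ratio of reducing sugar in the soluble fraction to that in the insoluble fraction."""
    if insoluble_reducing <= 0:
        raise ValueError("insoluble reducing sugar must be positive")
    return soluble_reducing / insoluble_reducing


def classify_mode(ratio):
    """Coarse mode-of-action label from the soluble/insoluble ratio."""
    if ratio < ENDO_MAX:
        return "endo-like"
    if ratio < EXO_MIN:
        return "processive-like"
    return "exo-like"


if __name__ == "__main__":
    # Hypothetical measurements (same units for both fractions).
    r = soluble_to_insoluble_ratio(14.0, 2.0)
    print(r, classify_mode(r))
```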

Once the processive nature of an enzyme has been indicated experimentally, molecular insight into the mechanisms responsible can be gained by determining the 3D crystal structure of the active site together with model cellodextrins. In the case of the cellulases, the crystal structure of the catalytic module together with the fused CBM, combined with accumulating enzymatic activity data, allowed further postulation as to the accessory role of the fused module. The fused CBM presumably interacts with a single cellulose chain and feeds it into the active site. Interestingly, this module does not bind crystalline cellulose but is inferred to bind the single cellulose chain dynamically prior to its hydrolysis, thereby imparting processivity to the enzyme. Once such a property is associated with a given type of enzyme, the primary structure can serve as an indicator for all enzymes of that type. In the case of the Family-9 Theme B enzymes, it is now possible to identify both the catalytic module (glycoside hydrolase Family 9) and the associated accessory module (in this case, a Family-3c CBM). Thus, the primary structure alone may give a strong indication of the nature of the enzyme. Of course, the ultimate identification of the mechanism of enzyme activity will come from a detailed 3D structure of the enzyme-substrate complex.
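Because a GH9 catalytic module followed directly by a CBM3c is the signature of the Theme B enzymes, a simple rule-based screen over module annotations can flag candidate processive endoglucanases in genome data. The sketch assumes that each protein has already been annotated as an ordered (N- to C-terminal) list of module labels, for instance by a CAZy-style annotation pipeline; the protein names and annotations are hypothetical, and a hit is only a prediction to be tested biochemically and structurally.

```python
def is_theme_b_candidate(modules):
    """True if a GH9 module is immediately followed by a CBM3c module."""
    return any(a == "GH9" and b == "CBM3c" for a, b in zip(modules, modules[1:]))


def screen(proteins):
    """Names of proteins whose module architecture matches the GH9-CBM3c rule."""
    return sorted(name for name, modules in proteins.items() if is_theme_b_candidate(modules))


if __name__ == "__main__":
    proteins = {  # hypothetical annotations, N- to C-terminus
        "protA": ["GH9", "CBM3c", "Dockerin"],
        "protB": ["CBM4", "Ig-like", "GH9"],
        "protC": ["GH9", "CBM3b"],
    }
    print(screen(proteins))  # ['protA']
```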

An intriguing recent development in the analysis of the cellulolytic action of a given cellulase or a mixture of cellulases is the direct observation, by transmission electron microscopy (TEM), of enzymatic action on bacterial cellulose ribbons. The approach provides information on the endo or exo preference of the enzyme, the extent of processivity, and the directionality of hydrolysis (i.e., from the reducing to the nonreducing end or vice versa). This strategy has been used to study the hydrolysis of bacterial cellulose ribbons by individual purified enzymes, mixtures of purified enzymes, and intact cellulosomes.

Mechanism of Catalysis

Studies of the catalytic mechanism of cellulases address issues such as stereochemistry, binding- and active-site residues, and transition-state intermediates. Excellent reviews have been published covering many of these issues (Ly and Withers 1999; McCarter and Withers 1994; Rye and Withers 2000; Sinnott 1990; White and Rose 1997; Withers 2001; Withers and Aebersold 1995; Zechel and Withers 2000). Because the stereochemistry and catalytic residues are conserved among members of the same family, these elements can be putatively identified once one member of the given glycoside hydrolase family has been characterized biochemically (Henrissat and Bairoch 1996; Henrissat et al. 1995; Henrissat and Davies 1997).

The stereochemistry of the reaction can in most cases be determined by proton NMR spectroscopy or by chromatography systems that resolve the anomeric species. For NMR, the reaction between the test enzyme and its substrate is carried out in D₂O, and the signal of the anomeric proton of the newly formed product can be readily detected. Thus, for the degradation of cellulose, a retaining enzyme would produce a product in the β configuration, whereas an inverting enzyme would yield the α-sugar.
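The anomeric assignment usually rests on the H-1 signal of the newly formed reducing end, recorded early in the reaction before mutarotation equilibrates the two anomers. As a rough guide for glucopyranose products (typical literature values, not data from this chapter), β H-1 appears near 4.6 ppm with a large ³J(H1,H2) of about 8 Hz (axial-axial coupling), whereas α H-1 appears near 5.2 ppm with a small coupling of about 3.5 to 4 Hz. The sketch classifies a reading by its coupling constant; the cut-offs of 7.0 and 4.5 Hz are assumptions, and the mapping to mechanism holds for β-linked substrates such as cellulose.

```python
def anomer_from_coupling(j_h1_h2_hz):
    """Assign a glucopyranose H-1 as alpha or beta from 3J(H1,H2) in Hz."""
    if j_h1_h2_hz >= 7.0:    # large axial-axial coupling
        return "beta"
    if j_h1_h2_hz <= 4.5:    # small axial-equatorial coupling
        return "alpha"
    return "undetermined"


def mechanism_for_beta_substrate(new_product_anomer):
    """For a beta-linked substrate: beta product -> retaining, alpha -> inverting."""
    return {"beta": "retaining", "alpha": "inverting"}.get(new_product_anomer, "undetermined")


if __name__ == "__main__":
    # Hypothetical early-time-point reading for the new reducing-end H-1.
    print(mechanism_for_beta_substrate(anomer_from_coupling(8.0)))
```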

The catalytic residues can be identified by performing site-directed mutagenesis on conserved acidic residues and studying the catalytic properties of the mutants with substrates bearing different leaving groups. Commonly used phenol substituents include the following, listed in order of leaving-group ability (pKa values shown parenthetically): 2,4-dinitro (3.96) > 2,5-dinitro (5.15) > 3,4-dinitro (5.36) > 2-chloro-4-nitro (5.45) > 4-nitro (7.18) > 2-nitro (7.22) > 3,5-dichloro (8.19) > 3-nitro (8.39) > 4-cyano (8.49) > 4-bromo (9.34) (Tull and Withers 1994). In retaining enzymes, the nucleophilic residue can be identified directly by trapping the intermediate with an appropriate inhibitor. Such inhibitors include model saccharides containing a fluorine substituent in the 2- or 5-position and a good leaving group, such as fluoride or dinitrophenolate (Williams and Withers 2000). The substituted substrate forms a relatively stable covalent glycosyl-enzyme intermediate involving the nucleophile residue. The labeled enzyme is then subjected to proteolytic cleavage, and the glycosylated peptide is isolated and sequenced. Protocols combining liquid chromatography and mass spectrometry have facilitated the identification of the modified residues.

The acid-base residue in a retaining enzyme can be identified by a combination of kinetics-based methodologies. Mutation of this residue (usually to alanine) should affect the rates of both chemical steps, i.e., glycosylation and deglycosylation, though to different extents. The effect on the glycosylation step will depend strongly on the leaving-group ability of the aglycon: rates of hydrolysis for substrates with a poor leaving group should be affected much more strongly than those for substrates with a good leaving group. The deglycosylation step, however, will be affected equally for all substrates, regardless of leaving group, because the same glycosyl-enzyme intermediate is hydrolyzed during this step. Thus, detailed kinetic analysis (i.e., determination of kcat and Km) with substrates bearing different leaving groups can reveal whether the mutated residue is the acid-base catalyst. It should be noted that this approach requires synthetic substrates that are not necessarily recognized by all families of enzymes and are not necessarily commercially available. For example, the Family-11 xylanases fail to hydrolyze p-nitrophenyl xylobioside, which is an excellent substrate for the Family-10 xylanases. The assignment of the acid-base catalyst can also be examined by use of external nucleophilic anions, such as azide. In this approach, termed “azide rescue,” the small azide anion enters the vacant space created by alanine replacement of the acidic amino acid residue and reacts with the anomeric carbon in place of a water molecule to form the corresponding β-glycosyl azide product. In the absence of an acid-base catalyst, which normally provides general base catalysis during the second step, the deglycosylation step is severely impaired. Thus, acceleration of the reaction catalyzed by the mutant enzyme in the presence of these external anions (provided that the second step is rate limiting) is a good indication that the mutated residue is the acid-base catalyst. Finally, the assignment of the acid-base catalyst can be tested by comparing the pH-dependence profiles of the wild-type and mutant enzymes. The profile for the native enzyme would approximate a bell-shaped curve, reflecting the ionization of the two active-site carboxylic acids, whereas no reduction of activity at high pH values would be observed for the mutant. This pH-dependence approach is also applicable for identifying the nucleophile residue and the catalytic residues in inverting enzymes.
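The expected pH profiles can be sketched with the standard two-ionization model, in which activity requires one catalytic carboxylate to be deprotonated (pKa1, the nucleophile in a retaining enzyme) and the other to be protonated (pKa2, the acid-base residue): k = kmax / (1 + 10^(pKa1 − pH) + 10^(pH − pKa2)). Removal of the acid-base residue is modeled, as a simplifying assumption, by dropping the basic-limb term. The pKa values are placeholders, and each profile is normalized to its own maximum, since the mutant's absolute rate would be far lower.

```python
def relative_activity(ph, pka1, pka2=None, kmax=1.0):
    """Two-ionization (bell-shaped) pH profile; pka2=None drops the basic limb."""
    denom = 1.0 + 10 ** (pka1 - ph)
    if pka2 is not None:
        denom += 10 ** (ph - pka2)
    return kmax / denom


if __name__ == "__main__":
    pka_nucleophile, pka_acid_base = 4.0, 7.0   # placeholder values
    for ph in (3.0, 5.0, 5.5, 7.0, 9.0):
        wt = relative_activity(ph, pka_nucleophile, pka_acid_base)
        mut = relative_activity(ph, pka_nucleophile)  # acid-base residue removed
        print(f"pH {ph}: wild type {wt:.2f}, acid-base mutant {mut:.2f}")
```

In this model the wild-type curve falls on both sides of the optimum, whereas the mutant curve shows only the acidic limb, which is the diagnostic contrast described above.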

Prokaryotic Cellulase Systems

The cellulolytic bacteria produce a variety of different cellulases and related enzymes, which together convert the plant cell wall polysaccharides to simple soluble sugars that can subsequently be assimilated. The complement of cellulases and hemicellulases that a given bacterium synthesizes for this purpose is referred to as its cellulase system. Different bacteria exploit different strategies for the ultimate degradation of their substrates, and each strategy is reflected in the complement and type(s) of enzymes produced. The bacterial cellulase system may be characterized by free enzymes, cell-bound enzymes, multifunctional enzymes, cellulosomes, or any combination of these. Collectively, these four types of enzymes represent the major paradigms of plant cell wall polysaccharide-degrading enzymes (Himmel et al. 2010).

Cellulase enzyme systems comprise several different types of components, each of which may exist in a multiplicity of forms. To add to the complexity, the same component may exist as free individual entities in the culture fluid, as individual entities bound to cellulose, or associated with the cell surface. Alternatively, an individual component may be organized as part of a multicomponent cellulosome complex attached to the cell surface, to the cellulose, or to both, or present as free complexes in the culture fluid. Furthermore, the situation existing during growth under one set of conditions (e.g., pH, temperature, distribution of carbon source) may not exist under another, or may change considerably during the course of cultivation. The bacterium reacts to these changes, and its production of cellulases and/or cellulosomes may reflect the dynamics of the growth conditions.

Free Enzymes

As mentioned earlier in this chapter, free enzymes in their simplest form comprise a catalytic module alone, with no accessory modules. Such enzymes often specialize in degrading soluble oligosaccharide breakdown products. Alternatively, such single-module enzymes may rely on an intrinsic association with an insoluble polysaccharide substrate such as cellulose, perhaps related to the active site of the enzyme.

A higher level of organization and activity is represented by free enzymes whose polypeptide chain includes both a catalytic module and a CBM. This basic bimodular arrangement can be further extended by the inclusion of additional types of modules or repeating units of the same module, all of which serve to modulate the activity of the catalytic module on the substrate. The intact free enzyme, however, remains unattached to other enzymes and can act independently on a given substrate. Free enzymes containing larger numbers of ancillary modules are also prevalent components of bacterial cellulase systems.

Examples of bacteria that produce free carbohydrate-degrading enzymes include the well-established actinobacteria Thermobifida fusca and Cellulomonas fimi. More details of their enzyme systems will be presented in a later section.

The more recent discovery of Saccharophagus degradans 2-40 has provided a particularly intriguing and elaborate cellulolytic bacterium that can grow alone on cellulose without the assistance of other microorganisms. S. degradans 2-40 is the first free-living marine bacterium demonstrated to degrade cellulosic algae and higher plant material, and its genome encodes 15 extraordinarily long polypeptides, ranging from 274 to 1,600 kDa (Weiner et al. 2008). This bacterium has a remarkable range of catabolic capabilities, and many of its enzymes exhibit unusual modular architectures, including novel combinations of catalytic and substrate-binding modules. S. degradans 2-40 can degrade at least 10 different complex polysaccharides, including agar, chitin, alginic acid, cellulose, β-glucan, laminarin, pectin, pullulan, starch, and xylan, and can utilize them as sole carbon and energy sources (Ensor et al. 1999).

The genome of S. degradans encodes an abundance of glycoside hydrolases, mainly from family GH5 (20 genes), followed by GH43 (13), GH13 (10), GH16 (9), GH2 (7), and GH3 (6) (Taylor et al. 2006). The CAZymes of this bacterium are generally extracellular free enzymes, many of which carry at least one CBM. One of the GH5 enzymes is believed to exhibit processive endoglucanase activity; it contains two such catalytic modules together with three copies of Family-6 CBMs in the same polypeptide chain (Watson et al. 2009). Interestingly, the chitinases, agarases, and alginases produced by this bacterium are not exported into the extracellular matrix but are localized in surface protuberances, resembling those of the cellulosome-producing bacteria.

The genome of S. degradans encodes the largest set of identifiable CBMs reported so far (Weiner et al. 2008). Family-6 carbohydrate-binding modules (CBM6) are the most numerous (43 copies), followed by CBM32 (26) and CBM2 (19). Among the long polypeptides encoded by the S. degradans 2-40 genome, five contain at least 52 bacterial cadherin (CA) and cadherin-like (CADG) domains. Both domain types exhibit Ca2+-dependent binding to different complex polysaccharides that serve as growth substrates (Fraiberg et al. 2010, 2011).
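The family counts quoted above can be tallied directly. The short Python sketch below simply sums the figures reported for the six most abundant GH families (Taylor et al. 2006) and the three most abundant CBM families (Weiner et al. 2008); it covers only the families named in the text, not the full CAZome, and the variable names are illustrative.

```python
# Counts as quoted in the text; only the families named there are included.
gh_counts = {"GH5": 20, "GH43": 13, "GH13": 10, "GH16": 9, "GH2": 7, "GH3": 6}
cbm_counts = {"CBM6": 43, "CBM32": 26, "CBM2": 19}

def tally(counts):
    """Return the total count and the families ranked from most to least abundant."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return sum(counts.values()), ranked

gh_total, gh_ranked = tally(gh_counts)
cbm_total, cbm_ranked = tally(cbm_counts)
print(gh_total, gh_ranked[0])    # 65 ('GH5', 20)
print(cbm_total, cbm_ranked[0])  # 88 ('CBM6', 43)
print(f"CBM6 share of the named CBMs: {cbm_counts['CBM6'] / cbm_total:.0%}")  # 49%
```

Because smaller families are omitted, these totals (65 GH genes and 88 CBMs) understate the genome-wide numbers.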

Recent evidence suggests that the mechanisms regulating expression of the various enzymes of the cellulolytic system are very complex, involving an intricate chemotaxis signal transduction network that detects both extracellular and intracellular signals, as well as numerous chemotactic response regulators (Zhang and Hutcheson 2011).

Multifunctional Enzymes

Some cellulases exhibit a more complex architecture in that more than one catalytic module and/or CBM may be included in the same protein. Examples of such enzymes are the very similar cellulases from Anaerocellum thermophilum (Zverlov et al. 1998) and Caldocellum saccharolyticum (Te’o et al. 1995), both of which contain a Family-9 and a Family-48 catalytic module. Additional examples of the latter type of multifunctional enzyme have been found in A. cellulolyticus and C. clariflavum. Other paired catalytic modules include those from Family 44 and either Family 5 or 9. Such an arrangement might indicate a close cooperation between two particular catalytic modules, which may lead to synergistic action on the cellulosic substrate, thus portending on a smaller scale the advent of cellulosomes.

Like the cellulases, xylanases also tend to exhibit a modular structure, being composed of multiple modules joined by linker sequences. Family-10 and Family-11 xylanases may be linked in the same polypeptide chain either to each other, to catalytic modules from Families 5, 16, and 43, or to carbohydrate esterases (Flint et al. 1993; Laurie et al. 1997). One particularly interesting combination of catalytic modules that appears in the same polypeptide chain is a typical xylanase together with a feruloyl esterase. Such a combination would allow the rapid cleavage of hemicellulose from lignin in natural systems, i.e., the plant cell wall (see Fig. 6.3). In this manner, the xylan chain would be severed by the xylanase component (Xyn in Fig. 6.3) and the lignin-xylan association would be disconnected simultaneously by the feruloyl esterase (Fae in Fig. 6.3).

Indeed, some xylanases are extremely complex in their modular architecture (Fig. 6.10). In addition to multiple catalytic modules, these enzymes often contain several different types of CBMs. Why would such a xylanase contain several types of CBM? And why would a xylanase contain a cellulose-specific CBM? Unlike the case of various cellulases, for which the CBM is usually essential for degrading insoluble crystalline cellulose, the CBMs of a hemicellulase do not necessarily bind the hemicellulose component (xylan). In some cases, the CBM is in fact a genuine cellulose-binding module that situates the hemicellulase on the insoluble plant cell wall material by targeting its most abundant and most stable component, cellulose. Indeed, the three Family-3 CBMs (CBM3) shown in Fig. 6.10 apparently bind to crystalline cellulose. Why this xylanase would require three tandem copies of the same type of CBM is yet another mystery that should eventually be addressed experimentally. In any case, once bound via the cellulose component of the composite plant cell wall substrate, the immobilized enzyme can act on the accessible hemicellulose components. Once thus situated on the plant cell wall, another type of CBM on the same molecule would then assist in binding to the xylan (or mannan, etc.) component in order to direct the appropriate catalytic module to its true substrate. Hence, the proximity of the modules in the xylanase shown in Fig. 6.10 would presumably indicate that the two CBM22s modulate the action of the Family-10 catalytic module, and that the C-terminal CBM6 facilitates catalysis by the Family-43 module. Together, the two catalytic modules would act synergistically to degrade susceptible plant cell wall components. In this context, the complex architecture of a xylanase would reflect the complex chemistry of its substrate and of the neighboring polymers in its immediate environment in the plant cell wall.

Fig. 6.10

A very large, cell-surface enzyme from Thermoanaerobacter thermosulfurogenes. The 1861-residue enzyme contains an SLH module, which is believed to mediate the attachment of the enzyme to the cell surface in Gram-positive bacteria. The enzyme contains a multiplicity of modules, which apparently serve to regulate the hydrolytic action of its single Family-13 catalytic module on the complex substrate. Several X domains of unknown function may represent as-yet undescribed catalytic functions, carbohydrate-binding activities, or structural entities.

Cellulosomes

Cellulosomes are multienzyme complexes that bind to and catalyze the efficient degradation of cellulosic substrates. The first cellulosome was discovered during studies of the anaerobic thermophilic bacterium Clostridium thermocellum (Bayer et al. 1983; Lamed et al. 1983a, b). Since its initial description in the literature, the cellulosome concept has been the subject of numerous reviews (Bayer et al. 1996; Béguin and Lemaire 1996; Belaich et al. 1997; Doi et al. 1994; Doi and Tamura 2001; Felix and Ljungdahl 1993; Karita et al. 1997; Lamed and Bayer 1988a, b, 1991, 1993; Lamed et al. 1983; Shoham et al. 1999).

Cellulosomes in C. thermocellum exist in both cell-associated and extracellular forms, the cell-associated form being located in polycellulosomal protuberance-like organelles on the cell surface. Cellulosomes were later detected in other cellulolytic organisms (Lamed et al. 1987a, b; Mayer et al. 1987), including Acetivibrio cellulolyticus, Bacteroides cellulosolvens, Clostridium cellulovorans, and Ruminococcus albus, all of which bear protuberance-like organelles on their surfaces (Bayer et al. 1994; Lamed and Bayer 1988) (Fig. 6.11). The cell-surface association of cellulosomes was further shown to be important in increasing the efficiency of cellulose fermentation (Lu et al. 2006).

Fig. 6.11

A very large, multimodular xylanase from Caldicellulosiruptor. The 1795-residue enzyme contains 8 separate modules, including 2 catalytic modules from Family 10 (invariably a xylanase) and Family 43 (frequently an arabinofuranosidase). These are modulated by numerous CBMs, which include three from Family 3 (likely for binding to crystalline cellulose), two from Family 22 (shown to function in xylan binding), and one from Family 6.

The cellulosomes contain numerous components, many of which were shown to display enzymatic activity. They also contain a characteristic nonenzymatic high-molecular-weight component. This component proved to be highly antigenic and glycosylated (Bayer et al. 1985). The cellulosomal enzymatic subunits from this organism showed a broad range of different cellulolytic and xylanolytic activities (Morag et al. 1990). Ultrastructural evidence indicated the multi-subunit nature of the cellulosome (Fig. 6.12 ).

Fig. 6.12

Scanning electron microscopy (SEM) of Acetivibrio cellulolyticus showing the presence of large characteristic protuberance-like structures on the cell surface. Cells are shown in the free state (a) or bound to cellulose (b). Cell preparations were treated with cationized ferritin before processing. Cationized ferritin has been shown to stabilize such surface structures, thus allowing their ultrastructural visualization (Lamed et al. 1987a, b). Without pretreatment with cationized ferritin, these structures are invisible. In (b), the cellulose-bound cells appear to be connected to the substrate via structural extensions of the cell-surface protuberances. Such a mechanism was originally observed for other cellulolytic prokaryotes, notably C. thermocellum (Bayer and Lamed 1986)

Eventually, genetic engineering techniques led to the sequencing of cellulosomal genes in C. thermocellum and several other bacteria, thus confirming the existence of cellulosomes as a major paradigm of prokaryotic degradation of cellulose and related plant cell wall polysaccharides. These efforts were further extended by the genome sequences of various Clostridium and Ruminococcus species.

Cell-Bound Enzymes

Some enzymes are connected directly to the cell wall. In Gram-positive bacteria, this is frequently accomplished via a specialized type of module, the SLH (S-layer homology) module, previously shown to be associated with the cell surface of Gram-positive bacteria (Lupas et al. 1994). This arrangement may have evolved to provide more economical degradation of insoluble substrates and to reduce competition with other bacteria for the soluble products, which are subject to diffusion in the medium. Unlike a free enzyme, an attached enzyme is itself prevented from diffusing away.

Examples of enzymes that are bound to the cell surface via an SLH module include a Family-5 cellulase and a Family-13 amylase-pullulanase from Bacillus, a Family-10 xylanase from Caldicellulosiruptor (Saul et al. 1990), a Family-5 endoglucanase from Clostridium josui, a Family-16 lichenase and a Family-10 xylanase from Clostridium thermocellum (Jung et al. 1998), and a variety of enzymes (Family-10 xylanases, a Family-5 mannanase, and a Family-13 amylase-pullulanase) from different species of Thermoanaerobacter (Matuschek et al. 1996). The modular architecture of these enzymes may be particularly complicated, with several different modules in a single polypeptide chain, thus forming extremely large enzymes that sometimes comprise over 2,000 amino acids (Fig. 6.13). Other surface functions, such as adhesive properties, may also be associated with the same protein (Fraiberg et al. 2011; Ozdemir et al. 2012).

Fig. 6.13

Comparison between negative staining (bottom) and cryo images (top) of the purified cellulosome from C. thermocellum, adsorbed on cellulose microcrystals from the alga Valonia ventricosa. The images illustrate the diversity of shapes of the cellulosomes, which adopt either a compact or a loosely organized ultrastructure. In the cryo images, the subunits of the cellulosomes (i.e., the individual enzymatic components) are clearly visible (Micrographs courtesy of Claire Boisset and Henri Chanzy (CNRS-CERMAV, Grenoble, France))

In a different bacterium, Ruminococcus flavefaciens, from the rumen of herbivores, the cellulosome is attached covalently to the bacterial cell surface via scaffoldin E (ScaE) (Rincon et al. 2005). ScaE is an anchoring scaffoldin that includes a C-terminal cell-anchoring signal motif for covalent attachment to the cell wall through the enzymatic action of an appropriate cell-associated sortase. Two key scaffoldins, scaffoldin B (ScaB) and cellulose-binding protein A (CttA), are attached to the cell-surface ScaE scaffoldin via a C-terminal X-dockerin (XDoc) modular dyad. ScaB is essentially an adaptor scaffoldin through which scaffoldin A (ScaA) and/or selected dockerin-bearing enzymes, including cellulases, are incorporated into the R. flavefaciens cellulosome. CttA is believed to mediate the attachment of the bacterium to cellulosic substrates (Rincon et al. 2007). Genome sequencing revealed several other structural proteins that carry sortase signal motifs (Berg Miller et al. 2009; Rincon et al. 2010). In one case, a GH10 xylanase was sequenced that also bears a sortase signal motif at its C terminus.

An additional mechanism of bacterial surface attachment has recently been reported (Devillard et al. 2004; Xu et al. 2004; Ezer et al. 2008). Several CBMs were discovered following genome sequencing of the rumen bacterium R. albus. These CBMs were classified as Family 37 and have been found exclusively in R. albus. Half of the parent proteins are carbohydrate-active enzymes (glycoside hydrolases, pectate lyases, and carbohydrate esterases). The involvement of CBMs in anchoring plant cell wall–degrading enzymes to the bacterial cell surface extends the range of functions that this superfamily of protein modules performs in nature.

Clostridium thermocellum Cellulosomal Subunits and Their Modules

A simplified schematic view of the cellulosome from C. thermocellum and its interaction with its substrate is shown in Fig. 6.14. The cellulosomal enzyme subunits are united into a complex by means of the primary scaffoldin subunit (Bayer et al. 1994; Shoseyov et al. 1992; Fujino et al. 1993; Gerngross et al. 1993). The scaffoldins usually contain a Family-3 CBM that provides the cellulose-binding function (Poole et al. 1992). The scaffoldins also contain multiple copies of a definitive type of cohesin module, whereas the cellulosomal enzyme subunits contain a complementary type of dockerin module. The interaction between the cohesin and dockerin modules provides the definitive molecular mechanism that integrates the enzyme subunits into the cellulosome complex (Salamitou et al. 1994b; Tokatlidis et al. 1991, 1993). Cohesins and dockerins are considered to be cellulosome “signature sequences”; i.e., their presence is a good indication of a cellulosome in a given bacterium (Bayer et al. 1998a). This has indeed been confirmed in many cases. However, non-cellulosomal cohesins and dockerins with no link to polysaccharide degradation have been identified in many bacteria, as well as in archaea and in a few isolated cases of primitive eukarya (Bayer et al. 1999; Adams et al. 2008; Chitayat et al. 2008a, b; Peer et al. 2009; Voronov-Goldman et al. 2009).

Fig. 6.14

Simplified schematic view of the molecular disposition of the cellulosome and one of the associated anchoring scaffoldins on the cell surface of C. thermocellum. The key defines the symbols used for the modules, from which the different cellulosomal proteins are fabricated. The progression of cell to anchoring scaffoldin to cellulosome to cellulose substrate is illustrated. The SLH module links the parent anchoring scaffoldin to the cell. The cellulosomal scaffoldin subunit performs three separate functions, each mediated by its resident functional modules: (1) its multiple type-I cohesins integrate the cellulosomal enzymes into the complex via their resident type-I dockerins, (2) its Family-3a CBM binds to the cellulose surface, and (3) its type-II dockerin interacts with the type-II cohesin of the exocellular anchoring scaffoldin

The major difference between free enzymes and cellulosomal enzymes is that the free enzymes usually contain a CBM for guiding the catalytic module to the substrate, whereas the cellulosomal enzymes carry a dockerin module that incorporates the enzyme into the cellulosome complex. Otherwise, both the free and cellulosomal enzymes contain very similar types of catalytic modules. The cellulosomal enzymes rely on the Family-3a CBM of the scaffoldin subunit for collective binding to crystalline cellulose.
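The distinction drawn above can be expressed as a simple rule over a protein's modular architecture, listed from N to C terminus. The Python sketch below is a deliberately crude heuristic of our own, not an established annotation method: it assumes that a dockerin marks a cellulosomal enzyme, that an SLH module marks a cell-bound enzyme, and that otherwise a catalytic module, with or without a CBM, marks a free enzyme. The module labels (e.g., "Doc1" for a type-I dockerin) and the example proteins are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Protein:
    name: str
    modules: list[str]  # N- to C-terminal order, e.g. ["GH9", "CBM3c", "Doc1"]

# Hypothetical label prefixes: glycoside hydrolase, carbohydrate esterase, lyase.
CATALYTIC_PREFIXES = ("GH", "CE", "PL")

def classify(protein: Protein) -> str:
    """Crude, hypothetical heuristic mirroring the categories in the text."""
    mods = protein.modules
    has_catalytic = any(m.startswith(CATALYTIC_PREFIXES) for m in mods)
    if any(m.startswith("Doc") for m in mods):
        return "cellulosomal"  # dockerin: integrated via a scaffoldin cohesin
    if "SLH" in mods:
        return "cell-bound"  # SLH module: attached to the cell surface
    if has_catalytic and any(m.startswith("CBM") for m in mods):
        return "free (CBM-targeted)"
    if has_catalytic:
        return "free (catalytic module only)"
    return "noncatalytic/unclassified"

examples = [
    Protein("hypothetical endoglucanase", ["GH6", "CBM2"]),
    Protein("hypothetical cellulosomal cellulase", ["GH9", "CBM3c", "Doc1"]),
    Protein("hypothetical surface mannanase", ["GH26", "SLH"]),
    Protein("hypothetical oligosaccharidase", ["GH3"]),
]
for p in examples:
    print(f"{p.name}: {classify(p)}")
```

Real assignments rest on domain databases, sequence alignment, and biochemistry rather than labels; the rule would also miss exceptions such as enzymes that localize to surface structures without an SLH module.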

The incorporation of the multiple enzyme subunits into the cellulosome complex is a function of the repeated copies of the cohesin module borne by the scaffoldin subunit. For most scaffoldins, the cohesins have been classified as type I on the basis of sequence homology. The cohesin module is composed of about 150 amino acid residues. Its basic structure is known and comprises a nine-stranded β-sandwich with a jelly-roll topology (Shimon et al. 1997; Spinelli et al. 2000; Tavares et al. 1997).

The dockerin module contains about 70 amino acids and is distinguished by a duplicated 22-residue sequence (Chauvaux et al. 1990), which bears similarity to the well-characterized EF-hand motif of various calcium-binding proteins (e.g., calmodulin and troponin C). Within this repeated sequence is a 12-residue calcium-binding loop, suggesting that calcium binding is an important characteristic of the dockerin module. This prediction was eventually confirmed experimentally (Yaron et al. 1995). The specificity characteristics of the cohesin-dockerin interaction have also been investigated, and the results suggested that four putative residues may serve as recognition codes for interaction with the cohesin module (Mechaly et al. 2000, 2001; Pagès et al. 1997). The three-dimensional solution structure of the 69-residue dockerin module of a C. thermocellum cellulosomal cellulase subunit was determined (Lytle et al. 2001). As predicted earlier (Bayer et al. 1998; Lytle et al. 2000; Pagès et al. 1997), the structure consists of two Ca2+-binding loop-helix motifs connected by a linker; the E helices entering each loop of the classical EF-hand motif are absent from the dockerin module.
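The EF-hand-like calcium-binding loops mentioned above can, in principle, be located with a sequence pattern. The Python sketch below uses a simplified form of the canonical EF-hand loop consensus (Asp at position 1, Asp/Asn at 3, Asp/Asn/Ser at 5, Gly at 6, Glu or Asp at 12). This pattern is an assumption borrowed from classical EF-hand proteins; real dockerin loops deviate from it, so a scan like this would miss or mis-call many of them. The test sequence is invented for illustration and is not a real dockerin.

```python
import re

# Simplified canonical EF-hand loop (12 residues), positions 1..12:
# 1=D, 3=D/N, 5=D/N/S, 6=G, 12=D/E; all other positions unconstrained.
# Assumption: actual dockerin loops are more variable than this pattern.
EF_LOOP = re.compile(r"D.[DN].[DNS]G.{5}[DE]")

def find_loops(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, loop sequence) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in EF_LOOP.finditer(seq.upper())]

# Invented toy sequence containing two loop-like segments.
toy = "MSAL" + "DKDGDGKISFEE" + "LAKVLGQ" + "DLNNDGKINALD" + "FALLK"
for start, loop in find_loops(toy):
    print(start, loop)
# 4 DKDGDGKISFEE
# 23 DLNNDGKINALD
```

Two hits would be consistent with the two Ca2+-binding loop-helix motifs of a dockerin, but a pattern match alone is not evidence that a sequence is a dockerin.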

The scaffoldin of C. thermocellum also contains a special type of dockerin module. This dockerin failed to bind to the cohesins of the same scaffoldin subunit but instead interacted with a different type of cohesin, termed “type-II” cohesins, identified on the basis of sequence homology (Salamitou et al. 1994a). These cohesins differ somewhat from those of type I, having an additional segment and sequence divergence in the latter half of the module. Three-dimensional structures of several type-II cohesins have been reported (Noach et al. 2003, 2005, 2008, 2009, 2010; Carvalho et al. 2005). The type-II cohesins were discovered as components of a group of noncatalytic cell-surface “anchoring” proteins of C. thermocellum (Leibovitz and Béguin 1996; Leibovitz et al. 1997; Lemaire et al. 1995; Salamitou et al. 1994a). The three known anchoring scaffoldins of C. thermocellum contain different numbers of type-II cohesins, as illustrated in Fig. 6.15. Each of these anchoring scaffoldins also contains an SLH (S-layer homology) module, analogous to those of the cell-bound enzymes mentioned above (see section on “Cell-Bound Enzymes”). The intervening sequences between the cohesins and the SLH modules, however, differ. In any case, the type-II cohesins selectively bind the type-II dockerin, and the cellulosome (i.e., the scaffoldin subunit together with all of its enzyme subunits) is thereby incorporated into the cell surface of C. thermocellum.
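The type-specific pairing rules described above can be captured in a small model: type-I dockerins bind only type-I cohesins, and type-II dockerins only type-II cohesins. The Python sketch below assembles a simplified C. thermocellum-like hierarchy (enzymes onto the primary scaffoldin, the scaffoldin onto an anchoring scaffoldin). The class, the component names, and the cohesin copy numbers are hypothetical placeholders rather than values for any particular protein, and real recognition is governed by specific residues, not by a single type label.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Scaffoldin:
    name: str
    cohesin_types: list[str]               # one entry per cohesin, e.g. ["I"] * 3
    dockerin_type: Optional[str] = None    # the scaffoldin's own dockerin, if any
    occupants: list[Optional[str]] = field(init=False)

    def __post_init__(self) -> None:
        # Every cohesin starts out unoccupied.
        self.occupants = [None] * len(self.cohesin_types)

    def dock(self, partner: str, dockerin_type: str) -> bool:
        """Bind a dockerin-bearing partner to the first free cohesin of the same type."""
        for i, (ctype, occ) in enumerate(zip(self.cohesin_types, self.occupants)):
            if occ is None and ctype == dockerin_type:
                self.occupants[i] = partner
                return True
        return False  # no free cohesin of a matching type

# Hypothetical components; copy numbers are placeholders.
primary = Scaffoldin("primary scaffoldin", ["I"] * 3, dockerin_type="II")
anchor = Scaffoldin("anchoring scaffoldin (SLH-bearing)", ["II"])

print(primary.dock("endoglucanase", "I"))      # True
print(primary.dock("cellobiohydrolase", "I"))  # True
print(anchor.dock("stray enzyme", "I"))        # False: type-I dockerin, type-II cohesin
print(anchor.dock(primary.name, primary.dockerin_type))  # True
print(primary.occupants)  # ['endoglucanase', 'cellobiohydrolase', None]
```

The model deliberately ignores affinity, the dual binding mode of type-I dockerins, and any cross-type interactions; it illustrates only how type matching partitions enzyme integration from cell-surface anchoring.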

Fig. 6.15

Schematic view of the modular similarity and diversity of scaffoldins from different cellulosome-producing species. Four major scaffoldins of the current C. thermocellum paradigm are shown. The type-I cohesin-dockerin pairs are shown in yellow, the type-II pairs in pink, and the anchoring component (the SLH module) in green. Anchoring scaffoldins are designated by the adjacent symbol of an anchor. Other mesophilic clostridial species are characterized by a single scaffoldin. The four scaffoldins of the A. cellulolyticus system are more cross-interactive than those of the C. thermocellum paradigm. The reversed types of cohesin-dockerin pairings are evident in the B. cellulosolvens system, as are its two exceptionally large scaffoldins. The type-III cohesin-bearing scaffoldins of the R. flavefaciens system are especially elaborate. The single-cohesin ScaC “adaptor” scaffoldin provides a means to modify the repertoire of cellulosomal components, and the monovalent ScaE cohesin attaches the ScaB adaptor scaffoldin to the cell surface. Each of the seven ScaB cohesins binds the dockerin of a trivalent ScaA primary scaffoldin, which incorporates dockerin-bearing enzymes into the complex. Micrographs of the different bacteria are included in the figure.

In recent years, structures of cohesin-dockerin complexes have been reported, representing a significant breakthrough in our understanding of scaffoldin organization and of cellulosome architecture in general. Structures have been elucidated for both type-I and type-II cohesin-dockerin complexes. Moreover, the structures provide molecular-level insight into the specificity of this high-affinity interaction.

The crystal structure of the second cohesin module of the C. thermocellum scaffoldin in complex with the dockerin module of xylanase 10B was the first cohesin-dockerin complex reported (Carvalho et al. 2003). Interestingly, very little conformational change was observed in the cohesin module relative to the known structure of the same cohesin alone. The dockerin bound to the 8-3-6-5 face of the cohesin via an extensive hydrogen-bonding network and supporting hydrophobic interactions. Surprisingly, the internal twofold symmetry of the type-I dockerin sequences of this bacterium was reflected in the ability of the dockerin to bind the cohesin in either of two orientations related by a 180° rotation. In this dual mode of binding, the parent enzyme can adopt one of two very different spatial conformations with respect to its interacting modular counterparts (Carvalho et al. 2007; Pinheiro et al. 2008).

The subsequent crystal structure of the type-II complex between the C. thermocellum SdbA cohesin module and the XDoc modular dyad of the CipA scaffoldin provided additional surprises (Adams et al. 2005). The resulting complex structure exhibited striking differences from that of the type-I complex. Notably, the lack of internal sequence symmetry in the dockerin module appeared to preclude a dual mode of binding. Indeed, in contrast to the type-I cohesin-dockerin interaction, the type-II dockerin contacts its cohesin counterpart along the entire length of both helices, which appears to result in higher affinity and a single mode of interaction.

Similarity and Diversity of Scaffoldins from Different Species

The modular architecture of the known scaffoldins and their comparison with that of Clostridium thermocellum are presented in Fig. 6.15. The scaffoldins of Acetivibrio cellulolyticus and Bacteroides cellulosolvens, like that of C. thermocellum, carry dockerin modules at their C termini (Ding et al. 1999, 2000). The A. cellulolyticus genome also includes a gene, immediately downstream of the scaffoldin gene, that codes for an anchoring scaffoldin containing type-II cohesins. It thus seems that the arrangement of the cellulosome on the cell surface of these strains may be analogous to that of C. thermocellum. Interestingly, the cohesins of the B. cellulosolvens scaffoldin are clearly type-II cohesins and not type I. This implies that there is no strict linkage between type-II cohesins and anchoring scaffoldins.

The scaffoldins from the other clostridial species described thus far all lack type-II dockerin modules, the inference being that cells of C. cellulovorans, for example, would not bear anchoring scaffoldins containing type-II cohesins. Since their cellulosomes appear to be surface bound, their anchoring is likely accomplished via an alternative molecular mechanism. In subsequent publications (Doi and Tamura 2001; Tamaru and Doi 1999; Tamaru et al. 1999), a cell-surface binding function was proposed for a module of unknown function [designated X2 (Coutinho and Henrissat 1999a, b, c)] of the C. cellulovorans scaffoldin. On the basis of an alignment of a few conserved identical amino acids with S-layer proteins from Mycoplasma hyorhinis and Plasmodium reichenowi, the authors suggested that this module may be recognized as an SLH module. The four X2 modules of the C. cellulovorans scaffoldin are very similar in sequence to the X modules from the scaffoldins of Clostridium cellulolyticum and C. josui, which contain only two copies and one copy of this module, respectively. If this module functions in attaching the scaffoldin, with its complement of enzymes, to the cell surface, it is unclear why the different scaffoldins would carry different copy numbers of the module. Likewise, one of the C. cellulovorans cellulosomal enzyme components (EngE) contains a triplicated segment of unknown function [designated X48 (Coutinho and Henrissat 1999a, b, c)] that the authors consider to be involved in cell-surface attachment (Tamaru and Doi 1999). In any case, final proof of the function of the X2 and X48 modules awaits biochemical examination, as has been clearly achieved for the SLH module of the C. thermocellum anchoring scaffoldins (Chauvaux et al. 1999; Lemaire et al. 1998).

Finally, two novel scaffoldins were sequenced from the rumen bacterium Ruminococcus flavefaciens strain 17 (Ding et al. 2001; Rincon et al. 2003, 2004, 2007). Although the proteins contain multiple cohesins, their sequences indicate that these are neither type I nor type II but occupy their own phylogenetic branch. Interestingly, the ruminococcal scaffoldins lack a known type of CBM. The question of how the ruminococcal cellulosome(s) and/or the bacterium bind to the substrate in the absence of a scaffoldin CBM was eventually resolved, at least partially, by the discovery of an additional CBM-bearing scaffoldin coded by a gene in the scaffoldin gene cluster of this bacterium (Rincon et al. 2007). Furthermore, a draft genome sequence of a similar strain of the same species was recently reported (Berg Miller et al. 2009), revealing an exceptionally elaborate cellulosome system with a multitude of dockerin-bearing components (Rincon et al. 2010), roughly three times the number observed in the C. thermocellum genome.

Schematic Comparison of Prokaryotic Cellulase Systems

In this section, we describe schematically the similarity and diversity of representative enzyme systems from different plant cell wall–degrading bacteria, illustrating their different strategies. It should be emphasized that the accumulating information is based on what is currently known from biochemical data combined with gene sequencing and bioinformatics. The information is still rather sketchy but quite revealing when different bacteria are compared. As the complete genomes of more cellulolytic microorganisms become known, the data concerning the complement of enzymes produced by a given bacterium will become complete, and we will be able to speculate with greater certainty on how the various cellulase systems might have evolved. Indeed, during the past decade, the genomes of many cellulolytic species have been sequenced (see Table 6.1), thereby supplementing our knowledge of the cellulase and cellulosome components. Representative schematic lists of these components are provided in the figures that follow. More extensive descriptions of the total content of carbohydrate-active enzymes, i.e., the CAZome, of the different cellulolytic bacteria are now readily obtainable via the CAZy database (http://www.cazy.org/). A survey of genes, however, does not tell us how a given bacterial system is regulated or what role(s) the bacterium and its enzyme system may play in nature. The explosive development of molecular biology techniques, however revealing, cannot supplant the fundamental contribution of biochemical and ecological approaches to the study of microbial degradation of cellulose and other plant cell wall polysaccharides.

Free Enzyme Systems

Many cellulolytic microorganisms show a very similar pattern in the types of enzymes that make up their cellulase system. For the purposes of this discussion, the concept of “cellulase system” will include the complement of all plant cell wall–hydrolyzing enzymes and other glycoside hydrolases, including the different cellulases per se, the hemicellulases (e.g., xylanases, mannanases), pectin-degrading enzymes, etc.

The cellulase system of the mesophilic cellulolytic aerobe Cellulomonas fimi was one of the first to be studied and for many years has been one of the most studied bacterial cellulase systems (O’Neill et al. 1986; Shen et al. 1995; Whittle et al. 1982). The enzymes of this bacterium are essentially free enzymes, which allowed their early isolation and characterization. Moreover, its cellulase genes were among the earliest to be sequenced. The modular composition and family associations of representative glycoside hydrolases from this bacterium are shown symbolically in Fig. 6.16. As is typical of a free enzyme system, most of the enzymes bear a substrate-targeting CBM, which in C. fimi is mainly from Family 2. Several of the enzymes have multiple copies of the FN3 (fibronectin type III) domain, the function of which is still unknown.

Fig. 6.16

Cellulomonas fimi cellulase system, an example of a cell-free enzyme system. Pictographic view of the enzyme components and their modular architecture. The modular content of the enzymes in this and subsequent figures is shown from left to right, i.e., from the N terminus to the C terminus of the polypeptide chain. The family numbers of the given modules are enumerated; the catalytic modules are in red. Key to symbols: GH glycoside hydrolase (e.g., cellulase, xylanase, mannanase), CE carbohydrate esterase (e.g., acetyl xylan esterase and ferulic acid esterase), CBM carbohydrate-binding module, SLH S-layer homology (module), FN3 fibronectin-3 (domain), Ig immunoglobulin-like domain, X domain of unknown function

The Cellulomonas system includes four Family-6 enzymes. Two of these are shown in the figure—an endoglucanase and an exoglucanase (cellobiohydrolase) of the types described in Fig. 6.4 . The modularity of the endoglucanase is very simple, having the Family-6 catalytic module together with a Family-2 CBM. The cellobiohydrolase is a bit more complex with 3 additional FN3 domains that separate the same two types of modules. The two additional Cel6 enzymes appear to lack CBMs and are not included in the figure. Another cellobiohydrolase (that exhibits processive cleavage of the substrate) is from Family-48. Its general modular architecture is similar to that of the Family-6 cellobiohydrolase with the substitution of the catalytic module from a different family. The cellulase system from this organism also includes two Family-9 cellulases with modular themes B and D, familiar to us from the earlier description (Fig. 6.7 ). Two additional Family-9 cellulases are included in Cellulomonas fimi; one contains a simple GH9 catalytic module with a single CBM2 and the other has no additional ancillary modules (neither are described in the figure). In addition, a simple Family-5 cellulase and an interesting cell-borne Family-26 mannanase are components of the system. An additional Family-5 enzyme bears a CBM13 and two other Family-26 enzymes are present (not shown). The fact that an enzyme, i.e., the Family-26 enzyme, bears an SLH module and is presumably cell-associated would underscore its importance to the cell. Finally, 3 xylanases are part of the enzymatic apparatus of Cellulomonas fimi. One of these xylanases is a simple enzyme consisting of a Family-10 catalytic module connected to a Family-2 CBM. The other two are more complicated, each containing two catalytic modules—either a Family-10 or a Family-11 module and a carbohydrate esterase (in both cases, probably an acetyl xylan esterase (Fig. 6.3 )—plus several CBMs. 
The genome of this bacterium has recently been sequenced, revealing an enzymatic system much more extensive than that shown in the figure. For example, seven members of GH43 have been detected in its genome. For more information regarding the CAZome of Cellulomonas fimi, the reader is referred to the CAZy database (http://www.cazy.org/).

A second example of a free enzyme system, from the aerobic thermophilic bacterium Thermobifida fusca (formerly classified as Thermomonospora fusca), has also been studied extensively (Wilson 1992, 2004, 2008, 2009; Wilson and Irwin 1999). A brief comparison of its known enzyme components (Fig. 6.17) shows a striking resemblance to those of Cellulomonas (compare Figs. 6.16 and 6.17). Both species produce similar types of cellulases from families 5, 6, 9, and 48, plus xylanases from families 10 and 11. Nevertheless, the modular repertoire of the corresponding enzymes in T. fusca is generally somewhat simpler. For example, two of the T. fusca cellulases include single FN3 domains, whereas several Cellulomonas cellulases harbor multiple copies of this module. Some T. fusca enzymes lack accessory modules other than a cellulose-binding CBM, whereas the corresponding Cellulomonas enzymes carry multiple copies of accessory modules. In some cases, though, the respective CBMs appear on opposite termini of the polypeptide chain (i.e., the Family-48 and Family-5 cellulases). The T. fusca genome has now been sequenced, and more extensive information is available regarding its CAZome (http://www.cazy.org/). In contrast to the numerous GH43 enzymes in Cellulomonas fimi, T. fusca has only one.

Fig. 6.17

Thermobifida fusca cellulase system. A cell-free enzyme system. Compare with the Cellulomonas system (Fig. 6.16 ). Key to symbols: GH glycoside hydrolase (e.g., cellulase, xylanase, mannanase), CBM carbohydrate-binding module, FN3 fibronectin-3 (domain), Ig immunoglobulin-like domain, X domain of unknown function

The enzyme complement and modular content of the Cellulomonas and T. fusca free enzyme systems are not necessarily representative of other free enzyme systems. Many free enzyme systems, such as those of Butyrivibrio fibrisolvens, Pseudomonas fluorescens, Fibrobacter succinogenes, Saccharophagus degradans, and various species of Streptomyces, Erwinia, and Thermotoga, appear to have several cellulases, xylanases, and mannanases from the common families, together with other glycoside hydrolases, e.g., arabinosidases, lichenases, amylases, pullulanases, galactanases, polygalacturonases, and glucuronidases, as well as pectate lyases. In many of these bacterial enzymes, the Family-2 CBM appears to predominate as the common cellulose-binding module, but in others (e.g., Erwinia), the relevant enzymes usually bear a cellulose-binding CBM from Family-3. In many of the free systems, however, enzymes also carry CBMs from other families, as well as other noncatalytic modules of unknown function (X modules). Once again, until the genome sequences of cellulolytic prokaryotes become widely available, our capacity to compare these enzyme systems will remain limited by our incomplete knowledge of their enzyme sequences.

Multifunctional Enzyme Systems

In a hyperthermophilic bacterium of the genus Caldicellulosiruptor, the enzymes characterized to date also appear to be free enzymes, but their modular organization is of a higher order (Daniel et al. 1996; Gibbs et al. 2000; Reeves et al. 2000). Many of the enzymes of this system are “bifunctional” in that they contain two separate catalytic modules in the same polypeptide chain (Fig. 6.18). As mentioned earlier (see section “Multifunctional Enzymes”), the presence of two catalytic modules in the same enzyme implies a distinctive synergistic action between the two. Thus, in Caldicellulosiruptor CelA, the Family-9 and Family-48 catalytic modules would be expected to act in a concerted fashion on crystalline cellulose. In another type of enzyme, the Family-10 xylanase and Family-5 cellulase modules would likely be most effective on regions of the plant cell wall characterized by cellulose-xylan junctions. The diversity in the modular architecture of the Family-10 xylanases is particularly striking, and the various combinations of this type of catalytic module are apparently important to the sustenance of the bacterium in its environment. One of these xylanases appears to be attached to the cell surface via an SLH module (Ozdemir et al. 2012). In contrast to the Cellulomonas and T. fusca enzymes, which often harbor a Family-2 CBM, the module responsible for binding to cellulosic substrates in Caldicellulosiruptor enzymes is usually one or more copies of a Family-3 CBM. The presence of more than one copy of a CBM in this case may reflect the extreme temperatures of the ecosystem.

Fig. 6.18

Caldicellulosiruptor enzyme system: An example of a cell-free enzyme system that includes several multifunctional enzymes. Key to symbols: GH glycoside hydrolase (e.g., cellulase, xylanase, mannanase), CBM carbohydrate-binding module, SLH S-layer homology (module). See also Table 6.5

Table 6.5 Bifunctional enzymes from the genus Caldicellulosiruptor

Other bacterial strains whose enzyme systems include at least one free bifunctional enzyme are Anaerocellum thermophilum (now considered a species of Caldicellulosiruptor), Bacillus stearothermophilus, Fibrobacter succinogenes, Prevotella ruminicola, Ruminococcus albus, Ruminococcus flavefaciens, Streptomyces chattanoogensis, and the thermophilic anaerobe NA10. The genomes of several species of Caldicellulosiruptor have now been sequenced (Kataeva et al. 2009; Blumer-Schuette et al. 2011). Each is characterized by a different set of bifunctional enzymes (Table 6.5): some of these genomes lack genes coding for such enzymes altogether or contain only one or two, whereas others carry up to seven (Dam et al. 2011). The bifunctional enzymes include various combinations (Himmel et al. 2010), notably cellulase-cellulase, cellulase-hemicellulase, hemicellulase-hemicellulase, hemicellulase-carbohydrate esterase, and even polysaccharide lyase-hemicellulase forms. The variety among these genomes reflects the diverse nature of this genus of hyperthermophilic bacteria and its different patterns of substrate utilization.
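The combination categories listed above can be derived mechanically once each catalytic module is assigned to an activity class. The minimal sketch below does this for the two Caldicellulosiruptor pairings mentioned in the text (GH9 with GH48 in CelA, and GH10 with GH5). The family-to-activity table `ACTIVITY_CLASS` is a deliberately coarse, hypothetical simplification: several families (GH5 in particular) contain enzymes with more than one activity, so a real annotation would rely on CAZy family and subfamily data rather than this lookup.

```python
from collections import Counter

# Coarse, hypothetical mapping from catalytic module to activity class.
ACTIVITY_CLASS = {
    "GH5": "cellulase",
    "GH9": "cellulase",
    "GH48": "cellulase",
    "GH10": "hemicellulase",
    "GH11": "hemicellulase",
    "GH26": "hemicellulase",
    "CE": "carbohydrate esterase",
    "PL": "polysaccharide lyase",
}


def combination(module_a, module_b):
    """Order-independent activity combination of a bifunctional enzyme."""
    return tuple(sorted((ACTIVITY_CLASS[module_a], ACTIVITY_CLASS[module_b])))


def tally(bifunctional_enzymes):
    """Count combination types among (module_a, module_b) pairs from one genome."""
    return Counter(combination(a, b) for a, b in bifunctional_enzymes)


if __name__ == "__main__":
    # Pairings taken from the text; a real genome would list more.
    pairs = [("GH9", "GH48"), ("GH10", "GH5")]
    for combo, count in tally(pairs).items():
        print("-".join(combo), count)
```

Sorting the two classes makes the label independent of module order along the chain, which matches the way the combinations are named in the text.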

Cellulosomal Systems

The inclusion of enzymes into a cellulosome via the noncatalytic scaffoldin subunit represents a higher level of organization. The association of complementary enzymes into a complex is considered to contribute sterically to their synergistic action on cellulose and other plant cell wall polysaccharides. As mentioned earlier (see section “Similarity and Diversity of Scaffoldins from Different Species”), in the case of Clostridium thermocellum, Acetivibrio cellulolyticus, Bacteroides cellulosolvens, and Ruminococcus flavefaciens, the cellulosomes appear to be attached to the cell surface. The cellulosomes of C. cellulolyticum, C. cellulovorans, and C. josui may also be cell-associated, but, if so, the lack of a scaffoldin-borne dockerin and of a reciprocal anchoring scaffoldin would suggest an alternative mechanism.

The cellulosomes of some mesophilic clostridia, such as C. cellulolyticum, C. cellulovorans, C. josui, and C. papyrosolvens, are very similar. The genes encoding many or most of the enzymes in these cellulosomal systems are arranged in a large cluster on the chromosome (Fig. 6.19). Additional cellulosomal genes, however, are located outside the cluster in other regions of the chromosome. The majority of the cellulosome gene clusters from C. cellulolyticum and C. cellulovorans have been sequenced (Bagnara-Tardif et al. 1992; Belaich et al. 1999; Tamaru et al. 2000). In contrast, the cellulosomal genes of C. thermocellum are generally scattered over a large portion of the chromosome (Guglielmi and Béguin 1998). A few small clusters of cellulosomal genes are apparent in the genome, including a scaffoldin-containing cluster (Fig. 6.19) that also contains several genes encoding cell-surface anchoring proteins (Fujino et al. 1993). The following descriptive analysis compares the cellulosomal systems of three of these microorganisms: C. cellulolyticum, C. cellulovorans, and C. thermocellum. The genomes of all three bacteria have been sequenced, and those of other cellulosome-producing bacteria are expected in the near future.
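The contrast between the clustered genes of the mesophilic clostridia and the scattered genes of C. thermocellum can be expressed as a simple positional rule: neighboring genes belong to the same cluster when the gap between them is small. The minimal sketch below groups genes by chromosomal coordinates under that rule. The gene names, coordinates, and the 500-bp gap threshold are hypothetical values chosen for illustration; published cluster definitions also consider strand and transcriptional organization, which are ignored here.

```python
def find_clusters(genes, max_gap=500, min_size=2):
    """Group (name, start, end) genes whose intergenic gaps are <= max_gap bp."""
    ordered = sorted(genes, key=lambda gene: gene[1])
    clusters, current, current_end = [], [], None
    for name, start, end in ordered:
        if current and start - current_end <= max_gap:
            current.append(name)
            current_end = max(current_end, end)
        else:
            # Close the previous run; keep it only if it is large enough.
            if len(current) >= min_size:
                clusters.append(current)
            current, current_end = [name], end
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical coordinates: a scaffoldin-led run of genes plus two isolated genes.
GENES = [
    ("scaffoldin", 1_000, 5_000),
    ("enzyme_1", 5_200, 7_000),
    ("enzyme_2", 7_150, 9_000),
    ("enzyme_3", 9_100, 10_500),
    ("isolated_enzyme_1", 400_000, 402_000),
    ("isolated_enzyme_2", 1_500_000, 1_502_000),
]

if __name__ == "__main__":
    print(find_clusters(GENES))
```

A genome in which all cellulosomal genes are far apart, as largely described for C. thermocellum, would return few or no clusters under the same rule.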

Fig. 6.19

Cellulosome-related gene clusters. Enzyme-linked gene clusters of the mesophilic clostridia include an initial primary scaffoldin gene followed downstream by a series of genes encoding various dockerin-bearing enzymes. Note the extensive similarity and subtle differences in the succession of enzyme-encoding genes. Multiple-scaffoldin gene clusters of the indicated bacteria comprise two or more tandem genes that encode scaffoldins. Scaffoldin genes and enzyme genes are shown as light blue and pink arrows, respectively, whose lengths give the approximate size of each gene relative to the others

Cellulosomal Components from Clostridium cellulolyticum

All of the sequenced enzymes from this organism are relatively common cellulases (Belaich et al. 1999). None of the known cellulosomal enzymes of this species contains more than one catalytic module (Fig. 6.20). The largest, Cel9E (estimated at 94 kDa), is a Theme-D Family-9 cellulase (Gaudin et al. 2000). The critical Family-48 cellulase (Cel48F) is also a major cellulosome component (Reverbel-Leroy et al. 1997). The gene cluster of C. cellulolyticum contains several other Family-9 cellulases, including Cel9G, Cel9H, and Cel9J, all of which carry the fused Family-3c CBM characteristic of Theme B (Belaich et al. 1998) (Fig. 6.8). The cellulosomal system of this bacterium also contains numerous Family-5 cellulases (including Cel5A and Cel5D), a Family-5 mannanase (Man5K, which bears an N-terminal rather than a C-terminal dockerin), and a Family-8 cellulase (Cel8C).

Fig. 6.20

Clostridium cellulolyticum enzyme system. An example of a cellulosomal system. Key to symbols: GH glycoside hydrolase (e.g., cellulase, xylanase, mannanase), CBM carbohydrate-binding module, Doc dockerin module

SDS-PAGE analysis of the C. cellulolyticum cellulosome revealed a 160-kDa scaffoldin band and up to 16 smaller bands representing putative enzyme subunits (Gal et al. 1997b). Many of these were clearly identified as known gene products. Early biochemical evidence suggested that the xylanases of C. cellulolyticum are also organized in a cellulosome-like complex (Mohand-Oussaid et al. 1999). The genome sequence of this bacterium and subsequent proteomic studies revealed 62 dockerin-containing proteins, most of which are enzymes, including cellulases, xylanases, and other glycoside hydrolases, as well as carbohydrate esterases and polysaccharide lyases (Desvaux 2005; Blouzard et al. 2010).

Cellulosomal Components from Clostridium cellulovorans

As in C. cellulolyticum, the cellulases of this organism are relatively simple (see the pictographic description of representative enzymes in Fig. 6.21). In addition to the cellulosomal enzymes thus described, several non-cellulosomal endoglucanases have also been partially or completely sequenced (Doi et al. 1998; Tamaru et al. 1999), notably those from Family-9 (Kosugi et al. 2002; Han et al. 2004, 2005).

Fig. 6.21

Clostridium cellulovorans: A second cellulosomal system. Key to symbols: GH glycoside hydrolase (e.g., cellulase, xylanase, mannanase), CBM carbohydrate-binding module, Doc dockerin module, SLH S-layer homology (module), Ig immunoglobulin-like domain, X domain of unknown function

Several of the cellulosomal enzymes are architecturally equivalent to those of the C. cellulolyticum system (compare Figs. 6.20 and 6.21). These include the critical Family-48 cellulase (Exg48S) (Liu and Doi 1998), two Theme-B Family-9 cellulases (Eng9H and Eng9Y), a Family-5 endoglucanase, and a Family-5 mannanase that bears an N-terminal dockerin (Tamaru and Doi 2000). Rather than a single Theme-D Family-9 cellulase as in C. cellulolyticum, the C. cellulovorans system contains two such enzymes (Eng9K and Eng9M). The C. cellulovorans cellulosome also appears to contain an unusual Theme-A Family-9 cellulase (Eng9L) that lacks helper modules. A dockerin-bearing pectate lyase (LyaA) suggests that the bacterium can degrade pectin (Tamaru and Doi 2001). Indeed, early evidence (Sleat et al. 1984) indicated that, in addition to cellulose, C. cellulovorans can assimilate a wide variety of other plant cell wall polysaccharides, including xylans, pectins, and mannans.

The genome of C. cellulovorans was sequenced recently (Tamaru et al. 2010). Interestingly, 57 cellulosomal genes were identified in the genome; in addition to carbohydrate-active enzymes, these encode lipases, peptidases, and proteinase inhibitors.

Cellulosomal Components from Clostridium thermocellum

Compared with the cellulosomal enzymes of C. cellulovorans and C. cellulolyticum, the enzymes of C. thermocellum are relatively large proteins, ranging in molecular size from about 40 to 180 kDa (Bayer et al. 1998c, 2000; Béguin and Lemaire 1996; Felix and Ljungdahl 1993; Lamed and Bayer 1988; Shoham et al. 1999). Examination of Fig. 6.22 reveals why these enzymes are so big: many of the larger ones contain multiple types of catalytic modules, as well as other functional modules, as an integral part of a single polypeptide chain [see Table I in (Bayer et al. 1998c) for a list of relevant references]. In addition to the cellulosomal enzymes, several noncellulosomal enzymes have also been described from this organism (Morag et al. 1990). These include two free enzymes (one of which lacks a CBM) and two cell-associated (SLH-containing) enzymes. The potent cellulose- and plant cell wall-degrading activities of C. thermocellum are thus clearly reflected in its cellulase system, which displays an exceptional wealth, diversity, and intricacy of enzymatic components, making this bacterium the premier cellulose-degrading organism currently known.

Fig. 6.22

Clostridium thermocellum: A complex cellulosomal system. Key to symbols: GH glycoside hydrolase (e.g., cellulase, xylanase, mannanase), CE carbohydrate esterase (e.g., acetyl xylan esterase and ferulic acid esterase), CBM carbohydrate-binding module, Doc dockerin module, SLH S-layer homology (module), Ig immunoglobulin-like domain, X domain of unknown function

Many of the C. thermocellum cellulosomal enzymes are cellulases, which include both endo- and exo-acting β-glucanases. Important exoglucanases and processive cellulases include Cel48S and various Family-9 cellulases. The Cel48S subunit is a member of the Family-48 glycoside hydrolases, a family recognized as a critical component of bacterial cellulosomes (Morag et al. 1991, 1993; Wang et al. 1993, 1994; Wu et al. 1988). Several other processive cellulases are members of the Family-9 glycoside hydrolases. Cel9F and Cel9N are Theme-B Family-9 enzymes (Fig. 6.7; Navarro et al. 1991). Two others, Cbh9A and Cel9K, are remarkably similar Theme-D enzymes, which exhibit nearly 95 % similarity along their common regions (Kataeva et al. 1999a, b; Zverlov et al. 1998c, 1999). The main difference between them is the presence in Cbh9A of three extra modules (a Family-3 CBM and two modules of unknown function) (Kataeva et al. 2002, 2003, 2004, 2005). The functional significance of these supplementary modules to the activity of Cbh9A has not been elucidated.

The fact that the cellulosome from this organism contains many different types of cellulases is, of course, to be expected if we consider that growth of C. thermocellum is restricted to cellulose and its breakdown products, particularly cellobiose. It is therefore surprising to discover, in addition to the cellulases, numerous classic xylanases, i.e., those belonging to glycoside hydrolase families 10 and 11. In addition, two of the larger enzymes, Cel26H and Cel9/44J, contain hemicellulase components, i.e., Family-26 and Family-44 catalytic modules (a mannanase and a xylanase, respectively), together with a standard Family-5 and Family-9 cellulase module, respectively, in the same polypeptide chain (Ahsan et al. 1996; Yagüe et al. 1990). It is also interesting to note the presence of carbohydrate esterases together with xylanase modules in some of the enzyme subunits (i.e., XynU/A, XynY, XynZ, and Cel5E), which confers the capacity to hydrolyze acetyl or feruloyl groups from hemicellulose substrates (Blum et al. 2000; Fernandes et al. 1999). Finally, the C. thermocellum cellulosome includes a typical Family-16 lichenase, a Family-26 mannanase, and a Family-18 chitinase.

The non-cellulosomal enzymes include another Theme-B Family-9 cellulase (Cel9I) and cell-bound forms of a xylanase (Xyn10X) and a lichenase (Lic16A), both of which contain multiple CBMs adjacent to the catalytic module. An additional non-cellulosomal Family-48 cellulase, Cel48Y, has also been described (Berger et al. 2007; Vazana et al. 2010). In the midst of all this complexity, the C. thermocellum non-cellulosomal cellulase system includes a simple Family-5 cellulase, Cel5C, which is completely devoid of accessory modules (Zverlov et al. 2005a; Feinberg et al. 2011). Why does this bacterium, which subsists exclusively on cellulosic substrates, need all these hemicellulases? The inclusion of such an impressive array of non-cellulolytic enzymes in a strictly cellulose-utilizing species suggests that their major purpose is to collectively purge the unwanted polysaccharides from the milieu and thereby expose the preferred substrate, cellulose. The ferulic acid esterases, in concert with the xylanase components of the parent enzymes, could grant the bacterium a relatively simple mechanism by which to detach the lignin component from the cellulose-hemicellulose composite. The lichenase (Lic16B) and chitinase (Chi18A) are also intriguing components of the cellulosome (Zverlov et al. 1991, 1998, 2002, 2005). The former would provide the bacterium with added action on the cell wall β-glucan components of certain types of plant matter. It is not clear whether the presence of the latter reflects chitin-derived substrates from insect exoskeletons and/or fungal cell walls. Whatever the source, the chitin breakdown products, like those of the hemicelluloses, would presumably not be utilized by the bacterium itself but would be passed on to appropriate satellite bacteria for subsequent assimilation.

Subsequent genome sequencing of various strains of C. thermocellum has enhanced our understanding of the full complement of cellulosomal and non-cellulosomal enzymes produced by this bacterium (Zverlov et al. 2005a; Feinberg et al. 2011). Dockerin-containing components that are not directly involved in degradation of plant cell wall polysaccharides have also been identified (Kang et al. 2006; Schwarz and Zverlov 2006; Meguro et al. 2011). It is clear that the components and functions of the cellulosome system are much more complex than originally considered.

Gene Regulation of Cellulosomal Components

Over the past 10 years, the genomic revolution has provided the complete sequences of numerous bacterial genomes. A recent analysis of ∼1,500 of these genomes indicated that ∼40 % encode at least one cellulase (Medie et al. 2012). Within the cellulosome-producing bacteria, there are dozens of different cellulosome-related genes, and their expression appears to be highly regulated. Our ability to elucidate the regulatory mechanisms has changed dramatically in recent years owing to the availability of new genomic sequences, the development of genetic tools for some of the classical cellulosome-producing strains, and the establishment of workable proteomic procedures that allow the identification and quantification of numerous gene products in a single experiment. Much of the incentive for elucidating these regulatory mechanisms stems from the industrial potential of cellulosome-producing bacteria for solubilizing lignocellulose for bioenergy production. In the context of this chapter, we will concentrate on new findings in C. thermocellum.

Regulation of Cellulase and Cellulosomal Genes in C. thermocellum

The various cellulosomal genes in C. thermocellum are, for the most part, monocistronic and scattered throughout the chromosome (Brown et al. 2007; Raman et al. 2009). Since the number of known dockerin-bearing enzymes is almost ten times the number of cohesins in the scaffoldin subunit, a unique one-to-one pairing between specific cohesins and dockerins is unlikely. This has indeed been substantiated for C. thermocellum, in which all of the scaffoldin-borne cohesins recognize nearly all of the dockerin-containing enzymes. Thus, the composition of the cellulosome is governed by the relative amounts of the available dockerin-containing polypeptides, which can be incorporated randomly into the complex (Bassen et al. 1995; Mitchell 1998). Regulation studies in C. thermocellum have indicated that the level and composition of the cellulosomal proteins vary with the composition of the growth medium and the availability of cellobiose (Johnson et al. 1982; Nochur et al. 1990; Mishara et al. 1991; Bhat and Wood 1992; Nochur et al. 1992a, b; 1993; Raman et al. 2009).
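Because every cohesin can bind nearly every dockerin, the model above implies that the enzyme content of an individual cellulosome is a weighted random draw from the pool of dockerin-bearing polypeptides. The minimal sketch below simulates that idea. The enzyme names and their relative abundances are hypothetical; the default of nine cohesins per scaffoldin matches the number of type I cohesins commonly reported for the C. thermocellum primary scaffoldin CipA, and the model ignores any residual cohesin-dockerin preferences, steric constraints, or assembly kinetics.

```python
import random
from collections import Counter


def assemble_cellulosome(abundance, n_cohesins, rng):
    """Fill each cohesin with a dockerin-bearing enzyme drawn in proportion to its abundance."""
    names = list(abundance)
    weights = [abundance[name] for name in names]
    return rng.choices(names, weights=weights, k=n_cohesins)


def mean_composition(abundance, n_cohesins=9, n_complexes=10_000, seed=0):
    """Average fraction of cohesin sites occupied by each enzyme over many complexes."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_complexes):
        counts.update(assemble_cellulosome(abundance, n_cohesins, rng))
    total = n_cohesins * n_complexes
    return {name: counts[name] / total for name in abundance}


if __name__ == "__main__":
    # Hypothetical relative amounts of three dockerin-bearing enzymes.
    pool = {"enzyme_A": 5, "enzyme_B": 3, "enzyme_C": 2}
    print(assemble_cellulosome(pool, 9, random.Random(1)))
    # Expected fractions are 0.5, 0.3 and 0.2; the simulation approximates them.
    print(mean_composition(pool))
```

Individual complexes vary, but changing the pool (for example, raising the amount of one enzyme in response to a different growth substrate) shifts the average composition proportionally, which is the sense in which composition is governed by relative amounts.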

Recent studies (Dror et al. 2003a, b; 2005; Stevenson and Weimer 2005; Zhang and Lynd 2005; Brown et al. 2007; Raman et al. 2011; Riederer et al. 2011) have demonstrated that the expression of many cellulose-related genes is influenced by growth rate and by the presence of extracellular polysaccharides. The molecular regulatory mechanisms in C. thermocellum remained largely obscure until recently, and the bacterium does not appear to encode many of the well-characterized global regulatory elements found in Gram-positive bacteria, including the pleiotropic regulator CodY (Sonenshein 2007). In this regard, one of the LacI homologues, GlyR3, was shown to be a negative regulator of celC, a non-cellulosomal cellulase gene, with laminaribiose (a β-1,3-linked glucose disaccharide) apparently serving as its molecular inducer (Newcomb and Wu 2004; Demain et al. 2005; Newcomb et al. 2011). Remarkably, this was the first cellulose-related transcription factor identified in C. thermocellum.

Regulation by Sensing and the Involvement of Alternative Sigma Factors

As outlined above, it was postulated that C. thermocellum must possess a regulatory system that allows it to sense and react to the presence of high-molecular-weight polysaccharides in the extracellular environment, presumably without importing their low-molecular-weight degradation products. A search of the C. thermocellum genome for carbohydrate-binding modules revealed several Family-3 CBMs (CBM3s) within undefined polypeptides annotated as hypothetical proteins or membrane-associated proteins. Bioinformatic examination of these hypothetical polypeptides indicated possible homology to membrane-associated anti-σ factors. Following this initial observation, searches of the public nucleotide and protein databases revealed that three strains of C. thermocellum contain a unique set of multiple ORFs resembling both the Bacillus subtilis sigI and rsgI genes, known to encode an alternative σI factor and its negative membrane-associated regulator RsgI, respectively (Asai et al. 2007).

Bioinformatic analysis of over 1,200 bacterial genomes revealed that the C. thermocellum RsgI-like proteins are unique to this species and are not present in several other cellulolytic clostridial species (e.g., Clostridium cellulolyticum and Clostridium papyrosolvens) (Kahel-Raifer et al. 2010). Several new genome sequences of other cellulosome-producing bacteria, e.g., Acetivibrio cellulolyticus and Clostridium clariflavum, have, however, revealed similar types of multiple biomass-sensing systems. Indeed, the possible involvement of alternative σ factors in C. thermocellum was suggested over 20 years ago by Mishara et al., following transcript analysis of three cellulosomal genes (celA, celD, celF) (Mishara et al. 1991). The putative alternative σ factors of C. thermocellum are homologous to the recently characterized σI factor of B. subtilis (Zuber et al. 2001; Asai et al. 2007; Schirner and Errington 2009). Each gene encoding a σI-like factor is positioned immediately upstream of a gene encoding a multimodular protein that contains only one strongly predicted transmembrane helix (TMD) (Fig. 6.23). The ∼165-residue N-terminal segment of these transmembrane proteins is homologous to the N-terminal segment of the B. subtilis anti-σI factor RsgI. The C-terminal modules of these RsgI-like proteins, purportedly located outside the cell membrane, have predicted polysaccharide-related functions, including carbohydrate-binding modules (CBM3, CBM42), sugar-binding elements (PA14), and glycoside hydrolase modules of families 10 and 5.
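A "strongly predicted" transmembrane helix comes from computational topology prediction. One classic, simple approach is a Kyte–Doolittle hydropathy scan, which flags stretches of about 19 residues whose average hydropathy is at least about 1.6. The minimal sketch below implements that scan on an invented toy sequence (not an actual RsgI-like protein); it only illustrates the idea and is not necessarily the method used in the cited studies, which would typically rely on dedicated topology predictors.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of `window` residues (index = window start)."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Half-open (start, end) residue ranges covered by runs of above-threshold windows."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, score in enumerate(profile):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


# Invented toy protein: polar N-terminal stretch, one hydrophobic core, polar C-terminus.
N_TERM = "MSDEKRSNQDEKRSTPDE"
TM_CORE = "LIVLAVLILAVLLIVAVLL"
C_TERM = "KDENQRSTDEKNQRSEDK"
TOY = N_TERM + TM_CORE + C_TERM

if __name__ == "__main__":
    print(candidate_tm_segments(TOY))
```

The reported segment slightly overhangs the hydrophobic core because windows that include a few flanking polar residues can still exceed the threshold; real predictors refine the boundaries.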

Fig. 6.23

Alternative σ-factor operons in C. thermocellum. Each operon consists of two genes: the σ-factor gene (sig) and a gene encoding a transmembrane protein with an intracellular anti-σ factor at the N-terminus, followed by a transmembrane domain (TMD) and an extracellular sensor module at the C-terminus. Many of these proteins contain carbohydrate-related modules, i.e., a CBM, a glycoside hydrolase (GH), or sugar-binding (PA) modules

The functional properties of the various elements of this system were verified experimentally (Kahel-Raifer et al. 2010; Nataf et al. 2010; Bahari et al. 2011). The binding properties of the extracellular sensing modules were established with various polysaccharides, including pectin, cellulose, arabinoxylan, and xylan (Kahel-Raifer et al. 2010; Bahari et al. 2011). Using isothermal titration calorimetry (ITC), it was possible to determine the binding specificity and the dissociation constants (in the range of 0.02–1 μM) of the putative anti-σI factors for their corresponding σ factors (Nataf et al. 2010). Expression of the relevant alternative σ factor genes increased 3- to 30-fold in the presence of cellulose and xylan in the growth medium, thus linking their expression to the direct detection of their extracellular polysaccharide substrates. Finally, the ability of σI1 to direct transcription from its own (σI1) promoter and from the promoter of celS (which encodes the Family-48 cellulase Cel48S) was demonstrated in vitro by runoff transcription assays (Nataf et al. 2010).
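To give the reported dissociation constants a concrete meaning, the minimal sketch below computes the fraction of a σ factor held in the anti-σ complex for a simple 1:1 binding equilibrium, using the exact quadratic solution for total concentrations. The 1 μM total concentrations are hypothetical round numbers, not measured cellular values, and the calculation ignores competition from RNA polymerase and any other binding partners.

```python
import math


def fraction_sigma_bound(sigma_total_uM, anti_sigma_total_uM, kd_uM):
    """Fraction of sigma in the sigma/anti-sigma complex for 1:1 binding.

    Solves Kd = (S - C)(A - C) / C for the complex concentration C and takes
    the physically meaningful (smaller) root of the quadratic.
    """
    s, a = sigma_total_uM, anti_sigma_total_uM
    b = s + a + kd_uM
    complex_uM = (b - math.sqrt(b * b - 4 * s * a)) / 2
    return complex_uM / s


if __name__ == "__main__":
    # Hypothetical equimolar 1 uM sigma and anti-sigma at both ends of the reported Kd range.
    for kd in (0.02, 1.0):
        print(f"Kd = {kd} uM -> {fraction_sigma_bound(1.0, 1.0, kd):.3f} of sigma bound")
```

Under these assumptions, the 50-fold span of the measured Kd values changes the sequestered fraction from about 0.87 to about 0.38.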

Because many alternative sigma factors autoregulate their own expression, the promoter of each sigma factor gene provides a signature sequence, which can then be sought upstream of the transcriptional start sites mapped for candidate target genes. Using this approach, over 60 cellulosomal genes were assigned to their corresponding alternative sigma factors.
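The assignment strategy can be sketched in a few steps: extract a fixed-length window immediately upstream of each mapped transcriptional start site (TSS), build a consensus from the autoregulated σ-factor promoters, and score candidate target genes by their mismatches to that consensus. All sequences, the 8-nt window, and the one-mismatch cutoff below are invented for illustration; they are not the actual σI-type promoter motifs of C. thermocellum, and real analyses would use position weight matrices and allow for variable spacing between promoter elements.

```python
from collections import Counter


def upstream(seq, tss, length=8):
    """The `length` nucleotides immediately 5' of the TSS (tss = 0-based index of +1)."""
    if tss < length:
        raise ValueError("not enough upstream sequence")
    return seq[tss - length:tss]


def consensus(windows):
    """Column-wise majority base of equal-length, ungapped windows."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*windows))


def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_targets(candidates, signature, max_mismatches=1):
    """Names of candidate genes whose upstream window matches the signature."""
    return [name for name, (seq, tss) in candidates.items()
            if mismatches(upstream(seq, tss, len(signature)), signature) <= max_mismatches]


# Invented promoter regions of three autoregulated sigma-factor operons (TSS at index 12).
SIGMA_PROMOTERS = [("TTTTGGAAATCAGTTATG", 12), ("GCGCGGAAATGAAAATG", 12), ("ATATGGTAATCAGGATG", 12)]
# Invented candidate target genes with mapped TSSs.
CANDIDATES = {"gene_X": ("CCCCGGAAATGAGCATG", 12), "gene_Y": ("AAAACTTGCAGTGCATG", 12)}

if __name__ == "__main__":
    signature = consensus([upstream(s, t) for s, t in SIGMA_PROMOTERS])
    print(signature, assign_targets(CANDIDATES, signature))
```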

In view of the above observations, a plausible model was proposed whereby the extracellular CBMs of putative anti-σI-like proteins can serve as biosensors that help assess the status of the biomass in the extracellular medium (Fig. 6.24 ). When the target substrate is unavailable, the σI-like factor is attached to the N-terminal cytoplasmic domain of the RsgI-like protein. Upon interaction with the target polysaccharide, the corresponding RsgI-borne CBM may undergo a conformational change, leading to the release of the σI factor, which then associates with RNA polymerase. Target gene(s) are then transcribed including those that code for various carbohydrate-active enzymes (CAZymes) and cellulosomal scaffoldins, as well as the σI/RsgI-like operon itself. These various CBMs recognize and bind different plant cell wall polysaccharides, which then induce different sets of CAZyme genes, thus activating the synthesis of the relevant glycoside hydrolases, carbohydrate esterases, and/or polysaccharide lyases.
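The proposed on/off logic can be captured as a small state model: each σ/anti-σ pair is released only when its sensor module meets a recognized polysaccharide, and a released σ factor drives its own operon as well as its target genes. In the minimal sketch below, the pair names, the polysaccharides assigned to each sensor, and the target-gene labels are all hypothetical placeholders; the model is qualitative and omits binding affinities, expression levels, and cross-talk between pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigmaSensorPair:
    """Hypothetical sigma factor plus its membrane-bound, polysaccharide-sensing anti-sigma."""
    name: str
    recognized: frozenset  # polysaccharides bound by the extracellular sensor module
    targets: tuple         # genes transcribed once the sigma factor is released

    def sigma_released(self, environment):
        # Off state: with no recognized polymer, sigma stays bound to the anti-sigma domain.
        return bool(self.recognized & environment)


def transcribed_genes(pairs, environment):
    """Genes switched on by the polysaccharides present; each active pair also induces its own operon."""
    genes = set()
    for pair in pairs:
        if pair.sigma_released(environment):
            genes.add(f"{pair.name} operon")
            genes.update(pair.targets)
    return genes


PAIRS = [
    SigmaSensorPair("sigma_1", frozenset({"cellulose"}), ("cellulase_gene", "scaffoldin_gene")),
    SigmaSensorPair("sigma_2", frozenset({"xylan", "arabinoxylan"}), ("xylanase_gene", "esterase_gene")),
]

if __name__ == "__main__":
    for env in (set(), {"cellulose"}, {"xylan", "cellulose"}):
        print(sorted(env), "->", sorted(transcribed_genes(PAIRS, env)))
```

Because each pair responds to a different set of polymers, the transcribed gene set tracks the mixture of polysaccharides present, which is the qualitative behavior the model proposes.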

Fig. 6.24

Proposed mechanism for the activation of σ factors by extracellular polysaccharides. The carbohydrate-binding sensing module (the CBM) is positioned on the outer surface of the bacterium and linked via a short transmembrane domain to a short anti-σ peptide that binds and inactivates its cognate σ-factor (off state). In the presence of various target polysaccharides, the CBM binds the polymers, inducing a conformational change that results in the release of the σ-factor, which can then initiate transcription from cellulose utilization-related promoters

New Genetic Tools for C. thermocellum

One of the major obstacles to studying gene regulation in C. thermocellum was the lack of reliable transformation procedures and genetic tools. Recently, the laboratory of Prof. Lee Lynd at Dartmouth College developed two procedures for obtaining knockout mutants in C. thermocellum (Olson et al. 2010; Tripathi et al. 2010; Argyros et al. 2011). Both procedures allow selection and counterselection for an integration event. The first system uses the elegant approach devised initially for yeast, taking advantage of the fact that mutants lacking orotidine-5′-phosphate decarboxylase (Pyr) not only require uracil for growth (uracil auxotrophs) but are also resistant to the pyrimidine analog 5-fluoroorotic acid (5-FOA) (Boeke et al. 1984). Thus, in a background strain lacking orotidine-5′-phosphate decarboxylase (encoded by the pyrF gene in C. thermocellum), the presence of the pyrF gene can be selected for by omitting uracil from the growth medium, or selected against by adding 5-FOA (Kondo et al. 1991; Schneider et al. 2005). The second approach exploits the activity of hypoxanthine phosphoribosyl transferase (Hpt), which is required for purine salvage and renders purine antimetabolites, such as 8-azahypoxanthine (AZH), toxic. Another component of this system is the thymidine kinase gene (tdk), which C. thermocellum conveniently lacks. Thymidine kinase converts 5-fluoro-2′-deoxyuridine (FUDR) to fluoro-dUMP, a suicide inhibitor of thymidylate synthetase, so that tdk can be used for counterselection in the presence of FUDR. This second approach allows multiple deletion mutants to be obtained without leaving selection markers in the genome.
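The selection and counterselection logic of the two systems can be summarized as a rule table that predicts growth from genotype and medium. The minimal sketch below encodes only the relationships stated above, treating each gene as simply present or absent; the function name `grows` and the string labels are hypothetical, and real experiments involve complications (specific background strains, plasmid-borne markers, spontaneous resistant mutants) that are not modeled.

```python
def grows(genotype, medium):
    """Predict growth from functional genes ('pyrF', 'hpt', 'tdk') and supplements.

    Supplements: 'uracil', '5-FOA', 'AZH' (8-azahypoxanthine), 'FUDR'.
    """
    if "pyrF" not in genotype and "uracil" not in medium:
        return False  # pyrF-deficient cells are uracil auxotrophs
    if "pyrF" in genotype and "5-FOA" in medium:
        return False  # a functional pyrF pathway converts 5-FOA into a toxic product
    if "hpt" in genotype and "AZH" in medium:
        return False  # Hpt activity makes the purine analog AZH toxic
    if "tdk" in genotype and "FUDR" in medium:
        return False  # thymidine kinase turns FUDR into the suicide inhibitor fluoro-dUMP
    return True


if __name__ == "__main__":
    # Selection for pyrF in a pyrF-deficient background: only pyrF carriers grow without uracil.
    print(grows({"pyrF"}, set()), grows(set(), set()))
    # Counterselection: with uracil plus 5-FOA, only cells that have lost pyrF grow.
    print(grows({"pyrF"}, {"uracil", "5-FOA"}), grows(set(), {"uracil", "5-FOA"}))
```

Each marker is useful precisely because one medium favors cells that carry it and another favors cells that have lost it.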

Genomics and Metagenomics

In the past decade, major strides in enzyme discovery have been achieved through genomic and metagenomic approaches combined with bioinformatic analyses. One of the early bacterial genome projects involved a plant cell wall polysaccharide-degrading species (Nelson et al. 1999). This initial work was eventually followed by genome sequencing studies of additional cellulolytic bacteria (Lykidis et al. 2007; Xie et al. 2007; Berg Miller et al. 2009; Kataeva et al. 2009; Hemme et al. 2010; Morrison et al. 2010; Tamaru et al. 2010; Feinberg et al. 2011). Combined bioinformatic, proteomic, and transcriptomic characterization can serve to reveal the components of the relevant enzyme system(s) in a given bacterium (Marcotte et al. 1999; Zverlov et al. 2005; Flint et al. 2008; Li et al. 2009; Raman et al. 2009; 2011; Rincon et al. 2010; Brulc et al. 2011; Dam et al. 2011).

The metagenomic approach utilizes genetic material obtained directly from complex natural ecosystems, rather than from cultivated cells (Schloss and Handelsman 2003; Handelsman 2004). The advantage of this approach is that bias against unculturable bacteria is avoided, making enzyme discovery more representative. However, metagenomic libraries can introduce new types of bias, owing to nonuniform recovery of inserts and the large numbers of clones required to cover the metagenome. Metagenomic analyses of different cellulose-containing ecosystems can provide insight into novel types of enzymes for the degradation of plant cell wall polysaccharides (Cottrell et al. 2005; Ferrer et al. 2005; Warnecke et al. 2007; Brulc et al. 2009; Li et al. 2009; Berg Miller et al. 2012).
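The "large numbers of clones" problem can be estimated with the classic Clarke–Carbon formula, N = ln(1 − P) / ln(1 − f), where N is the number of clones needed to include any given sequence with probability P, and f is the insert size divided by the size of the target DNA. The minimal sketch below applies it; the 40-kb insert (typical of fosmid vectors) and the genome and community sizes are hypothetical inputs, and the formula assumes random, uniform sampling, which is exactly what nonuniform insert recovery and uneven species abundance violate.

```python
import math


def clones_required(insert_bp, target_bp, probability=0.99):
    """Clarke-Carbon estimate of library size for a given coverage probability."""
    fraction = insert_bp / target_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    # A single hypothetical 4-Mb genome in 40-kb fosmid inserts.
    print(clones_required(40_000, 4_000_000))  # 459
    # A hypothetical community of 100 equally abundant 4-Mb genomes.
    print(clones_required(40_000, 100 * 4_000_000))
```

For small f the requirement grows roughly in proportion to the total DNA sampled, and a rare community member would demand still more clones than the equal-abundance estimate suggests.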

Table 6.1 lists cellulose-degrading bacteria whose genomes have been sequenced. New genome sequences of cellulase- and cellulosome-producing bacteria will undoubtedly continue to accumulate, at least as long as the current efforts to convert cellulosic biomass to biofuels remain in vogue.

Phylogenetics of Cellulase and Cellulosomal Systems

Early in the development and establishment of the cellulosome concept, it was noted that cellulosomes appeared to occur in microorganisms that cross ecological, physiological, and evolutionary boundaries (Lamed et al. 1987). This initial biochemical and immunochemical evidence has since been supported by accumulated molecular biological studies.

Various lines of evidence indicate that the modular enzymes that degrade plant cell wall polysaccharides have evolved from a restricted number of common ancestral sequences. Much of this evolutionary history remains encoded in the sequences of the functional modules that make up the different enzymes. By comparing the sequences of the various cellulosomal and noncellulosomal enzymes within and among different strains, we can gain insight into the evolutionary origins of the multigene families that make up the glycoside hydrolases.
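
Comparisons of this kind rest on sequence alignment. As a minimal sketch, the Needleman-Wunsch algorithm below aligns two protein sequences globally and reports percent identity. It assumes simple match/mismatch scores and a linear gap penalty; real analyses use substitution matrices such as BLOSUM62, affine gap penalties, and dedicated tools such as BLAST or Clustal. The toy sequences are hypothetical and are not taken from any real enzyme.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of the alignment length."""
    x, y = global_align(a, b)
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


# Hypothetical toy sequences differing at one position.
print(f"{percent_identity('ACDEFGHIK', 'ACDQFGHIK'):.1f}")  # 88.9
```

Dividing by alignment length is only one of several conventions for percent identity; dividing by the length of the shorter sequence is also common, and the choice affects comparisons between proteins with different modular architectures.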

Horizontal Gene Transfer

Very similar enzymes belonging to a given glycoside hydrolase family are prevalent among a variety of different bacteria and fungi, indicating that they were not inherited solely through conventional (vertical) descent. The widespread occurrence of such conserved enzymes among phylogenetically distant species argues that horizontal gene transfer has been a major process by which a microorganism can acquire a desirable enzyme. Once such a transfer event has taken place, the newly acquired gene would be subjected to the selective pressures of its new surroundings, i.e., the genetic and physiological constitution of the host cell itself. Under such pressure, the sequence of the gene would gradually be adjusted to fit the host cell.

Gene Duplication

Sequence comparisons have also revealed the presence of very similar genes within a genome that may have very similar or even identical functions. One striking example is the tandem appearance of the cbhA and celK genes in the chromosome of Clostridium thermocellum. Other examples are xynA and xynB, also of C. thermocellum, and xynA of the anaerobic fungus Neocallimastix patriciarum, which contains two very similar copies of a Family-11 catalytic module within the same polypeptide chain. These examples imply a mechanism of gene duplication (Chen et al. 1998; Gilbert et al. 1992), whereby the duplicated gene can serve as a template for secondary modifications that could result in two very similar enzymes with different properties, such as substrate and product specificities. A similar process could also account for the multiplicity of other types of modules (e.g., CBMs, cohesins, or helper modules) within a polypeptide chain. Comparison of the modular architectures of similar genes from different species suggests that individual modules can undergo duplication. This is exemplified by the multiple copies of FN3 in CelB from Cellulomonas fimi versus the single copy of the same module in cellulase E4 from Thermobifida fusca. Innumerable other examples are evident in the databases wherever multiple copies of the same module type occur in the same protein.
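
Module duplication of this kind can be spotted mechanically once each protein is written as an ordered, N- to C-terminal list of module labels. The sketch below reports module types present more than once in a polypeptide and runs of adjacent repeats. The function names and example architectures are hypothetical, chosen only for illustration; in practice, module boundaries and labels would come from domain annotation resources such as Pfam or CAZy.

```python
from collections import Counter
from itertools import groupby


def repeated_modules(architecture: list[str]) -> dict[str, int]:
    """Module types occurring more than once in one polypeptide (candidate duplications)."""
    counts = Counter(architecture)
    return {module: n for module, n in counts.items() if n > 1}


def tandem_runs(architecture: list[str]) -> list[tuple[str, int]]:
    """Runs of two or more identical modules in adjacent positions."""
    runs = ((module, sum(1 for _ in group)) for module, group in groupby(architecture))
    return [(module, n) for module, n in runs if n > 1]


# Hypothetical N- to C-terminal architectures, for illustration only.
multi_fn3 = ["GH9", "FN3", "FN3", "FN3", "CBM2"]
single_fn3 = ["GH9", "FN3", "CBM2"]
split_gh11 = ["GH11", "linker", "GH11"]

print(repeated_modules(multi_fn3), tandem_runs(multi_fn3))    # {'FN3': 3} [('FN3', 3)]
print(repeated_modules(single_fn3), tandem_runs(single_fn3))  # {} []
print(repeated_modules(split_gh11), tandem_runs(split_gh11))  # {'GH11': 2} []
```

Separating "repeated anywhere" from "repeated in tandem" matters because tandem copies point more directly to a local duplication event, whereas dispersed copies may also reflect later rearrangement.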

Domain Shuffling

Another observation from the genetic composition of the glycoside hydrolases argues for an additional process by which new or modified types of enzymes could arise. Many microbial enzyme systems contain individual hydrolases that carry very similar catalytic modules but include different types of accessory modules (Gilkes et al. 1991). One example of this phenomenon is the apparent species preference for a given family of crystalline cellulose-binding CBM, which is entirely independent of the type of catalytic module borne by the enzyme. In this context, as we have seen above, the free enzymes of some bacteria, such as Cellulomonas fimi, Pseudomonas fluorescens, and Thermomonospora fusca (now Thermobifida fusca), invariably include a Family-2 CBM, irrespective of the type of catalytic module. In contrast, those of other bacteria, e.g., Bacillus subtilis, Caldocellum saccharolyticum, Erwinia carotovora, and various clostridia, appear to prefer Family-3 CBMs. Moreover, the position of the CBM may differ from gene to gene: it may occur upstream or downstream of the catalytic module, and it may be positioned either internally (sandwiched between two other modules) or at one of the termini of the polypeptide chain. The same pattern is characteristic of several other kinds of modules associated with plant cell wall hydrolases. This is particularly evident in Family-9 cellulases and Family-10 xylanases, where the number and types of accessory modules may vary greatly within a given species. It seems that individual modules can be transferred en bloc and incorporated independently into appropriate enzymes.
Once again, the modular architectures and sequence similarities of the Clostridium thermocellum cellulosomal enzyme pairs (CbhA and CelK; XynA and XynB) are particularly revealing: in both cases, following an apparent gene duplication event, one or more additional modules appear to have been incorporated into the duplicated enzyme. Taken together, these observations suggest that domain shuffling is an important process by which the properties of such enzymes can be modified and extended.
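
The variable placement of CBMs described in this section can be expressed as a simple classification over an N- to C-terminal list of module labels. The sketch below labels each CBM as N-terminal, C-terminal, or internal, and as upstream or downstream of the first catalytic module. The function name, the label prefixes ("CBM" for binding modules, "GH" for glycoside hydrolase catalytic modules), and the example architectures are hypothetical conventions chosen for illustration only.

```python
def cbm_layout(architecture: list[str], cbm_prefix: str = "CBM",
               catalytic_prefix: str = "GH") -> list[tuple[str, str, str]]:
    """For each CBM, report terminal vs internal placement and its side
    relative to the first catalytic module (N- to C-terminal order)."""
    catalytic = next(
        (i for i, m in enumerate(architecture) if m.startswith(catalytic_prefix)), None
    )
    last = len(architecture) - 1
    layout = []
    for i, module in enumerate(architecture):
        if not module.startswith(cbm_prefix):
            continue
        if i == 0:
            terminus = "N-terminal"
        elif i == last:
            terminus = "C-terminal"
        else:
            terminus = "internal"  # sandwiched between two other modules
        if catalytic is None:
            side = "no catalytic module"
        else:
            side = "upstream" if i < catalytic else "downstream"
        layout.append((module, terminus, side))
    return layout


# Hypothetical architectures, for illustration only ("Doc" = dockerin).
for arch in (["CBM2", "GH6"], ["GH5", "CBM2"], ["GH9", "CBM3", "Doc"]):
    print(arch, cbm_layout(arch))
```

Tabulating such layouts across the enzymes of one species, and then across species, is one straightforward way to look for the CBM family preferences and positional permutations discussed above.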

Proposed Mechanisms for Acquiring Cellulase and Cellulosomal Genes

As with the free enzyme systems, the phylogeny of cellulosomal components seems to have been driven by processes that include horizontal gene transfer, gene duplication, and domain shuffling. In cellulolytic/hemicellulolytic ecosystems, the resident microorganisms are usually in close contact, often under difficult conditions and in competition or cooperation with one another toward a common goal: the rapid degradation of recalcitrant polysaccharides and assimilation of their breakdown products.

A possible scenario for the molecular evolution of a cellulase/hemicellulase system in a prospective bacterium could begin with the transfer of genetic material from one microbe to another in the same ecosystem. The size and type of the transferred material could vary, from part of a gene (e.g., selected functional modules) or a single gene to all or part of a gene cluster. Gene duplication could then propagate repeated modules, e.g., the multiple cohesin modules in the scaffoldins, or even smaller units, such as the linker sequences or the duplicated calcium-binding loop of the dockerin module. Domain shuffling can account for the observed permutations in the arrangement of modules in scaffoldin subunits from different species (Fig. 6.15). Finally, ordinary mutation and selection would render such products more suitable for the cellular environment or for interaction with other components of the cellulase system.

The available data suggest that there is, as yet, no set of rules that would enable us to anticipate the nature of the cellulase system of a given microorganism. It seems that phylogenetically dissimilar organisms can possess similar types of cellulosomal or noncellulosomal enzyme systems, whereas phylogenetically related organisms that inhabit similar niches may be characterized by different types of enzyme systems. To shed further light on this apparent enigma, we require information about more types of enzyme systems. In addition to more sequences and structures, we will need more biochemical, physiological, and ecological information in order to sharpen existing notions regarding the enzymatic degradation of plant cell wall polysaccharides or to formulate new ones.