Keywords

1 An Overview

The genome sequencing projects of the past two decades have yielded many surprises, the most startling of which is unquestionably the revelation that the total number of genes in humans is not very different from many model organisms such as worms, fruit flies and simple plants. This discovery has cast a spotlight on the correlate that biological complexity is not linearly related to the number of genes among species. Why might this be the case? A variety of explanations can be offered, arising from different fields of biological research. For example, molecular biologists might suggest transcriptional regulation or epigenetic modifications as key factors. Others would cite alternative splicing. We, too, believe that these phenomena contribute to biological complexity. Nevertheless we would argue that the greatest amplification of genomic information occurs after genes have been translated into proteins when the latter become modified by a myriad of functionalities. Moreover, one type of post-translational modification, namely glycosylation, results in the greatest diversity of the products of gene expression in all forms of life [1].

Today it has been well-established that protein N- and O-glycosylation (the covalent attachment of carbohydrate sequences to the side-chains of asparagine and serine or threonine, respectively) is a phenomenon shared by all domains of life. In addition, protein glycosylation has been demonstrated to be an essential requirement, rather than just an intriguing decoration. For example, correct glycosylation ensures that the plethora of proteins which eukaryotic cells use to transmit, receive and respond to chemical, electrical and mechanical signals, are expressed in functionally active forms in the right places. The information such glycoproteins mediate is essential for cells to pass through the different stages of development that occur in an organism [1, 2].

Carbohydrates have enough structural diversity to play a pivotal role as informational molecules on cell surfaces Fig. 1.

Fig. 1
figure 1

Structure and symbolic representations of common carbohydrates. Glucose is central to carbohydrate biosynthesis because it is made de novo from carbon dioxide and water during photosynthesis

Importantly, they are in “the right place” to act as such. All eukaryotic cells are coated with a carbohydrate layer, referred to as the glycocalyx. It consists of glycoproteins and glycolipids embedded in the cell membrane, together with proteoglycans, another class of carbohydrate biopolymer, which may be loosely associated with the eukaryotic cell surface. Prokaryotes also express glycoproteins on their surfaces. Among the prokaryotic glycoproteins, the best understood are S-layers, pilins and flagellins, plus a selection of cell surface and secreted proteins which are known to be involved in adhesion and/or biofilm formation [3, 4]. Significantly, complex carbohydrates are often highly branched and each residue can be linked to another in any of several positions on each sugar ring. This allows the formation of a large number of oligosaccharide structures from a relatively small repertoire of building blocks. Indeed even greater diversity is often conferred by the addition of functional groups such as sulfates, phosphates, acetyl and methyl groups.

How do carbohydrates on cell surfaces fulfil their “information” roles? This is most often achieved by engagement with partner molecules on other cells thereby triggering adhesive and/or signalling events. These carbohydrate binding partners are called lectins [5]. Thanks largely to the Consortium for Functional Glycomics (CFG) (which was funded by the US National Institutes of Health to provide tools and resources to the international research community to understand the role of carbohydrate-protein interactions), scientists from all disciplines can readily access information pertaining to how surface carbohydrates and complementary lectins on opposing cell surfaces mediate cell-to-cell recognition. Thus the CFG website [2] provides a rich source of information and data which facilitates the engagement of researchers, unfamiliar with carbohydrates, with experts working in the field of glycobiology. Figure 2 illustrates several of the best understood biological interactions where carbohydrate-lectin recognition plays a central role. The meaning of the symbols used in the figure is explained in Fig. 2.

Fig. 2
figure 2

Glycan–lectin recognition is key to cell-to-cell communication

The tools of modern mass spectrometry have been crucial for unravelling the carbohydrate mediated processes exemplified in Fig. 2 [610]. Mass spectrometry is an enormously powerful tool for high sensitivity sequencing of complex carbohydrates. Its versatility permits the analysis of all families of glycopolymers. Moreover, complex mixtures of glycoproteins are not a problem for mass spectrometric analysis. Indeed, glycomic methodologies are capable of defining the carbohydrate sequences constituting the glycocalyx of tissues or cells without the need for time consuming purifications [11]. Glycomics research of the past decade, much of it supported by the CFG, has yielded substantial quantities of public data which are facilitating worldwide research addressing the roles of carbohydrates and lectins in complex systems [2, 12].

We hope that those reading this article are now stimulated to learn more about glycosylation and biological complexity. If this is the case, the CFG website is an excellent place to start your journey [2]. To whet your appetite we end our article with an introduction to an evolving story in glycobiology which has as its central character a famous carbohydrate moiety called sialyl Lewisx (Fig. 2).

First identified in rat brain glycoproteins in the 1970s, this carbohydrate was revealed, by the emerging glycomic strategies of the mid 1980s, to be present on human white blood cells and enriched in cancer cells. A few years later the Selectins were discovered. These constitute a lectin family that recognise sialyl Lewisx as their primary ligand. The Selectins play pivotal roles in lymphocyte trafficking and recruitment of neutrophils to sites of inflammation. Their discovery energised the field of glycobiology and spawned numerous biotech companies intent on developing new anti-inflammatories and anti-cancer agents. The high hopes for glyco-therapeutics that prevailed in the early 1990s continue to this day, but are now tempered by realism i.e. it takes a very long time to understand the processes mediated by carbohydrate recognition and even longer to develop effective therapies based on intercepting these processes.

Very recently sialyl Lewisx has re-appeared in the headlines of both scientific and lay articles. This is because of exciting discoveries concerning human reproduction [13]. Ultra-high sensitivity mass spectrometric analyses have now provided the first molecular insights into the recognition processes occurring at the very start of human life, when a single sperm first engages with the surface of a human egg. This research has shown that multiple sialyl Lewisx sequences are attached to the proteins constituting the jelly-like coat of the human egg, which is called the zona pellucida. Remarkably the density of sialyl Lewisx moieties on the human egg is orders of magnitude higher than on white blood cells, consistent with it playing a pivotal role in sperm recognition [13]. Interestingly human sperm do not express any of the known Selectins. Hence the race is now on to find the putative “Selectin-like” molecule on sperm that binds to the sialyl Lewisx sequence on the human egg.