1 Introduction

Visualisations of metabolic pathways and pathway components such as enzymes and compounds have been used since the early years of research in biology, and metabolic pathway maps have become very popular in biochemistry textbooks, on posters, as well as in electronic resources and web pages. One example is Gerhard Michal’s famous poster Biochemical Pathways (Michal, 1968, 1998) which has been printed over a million times. The first example of computational representation of pathways was the EcoCyc database (Karp and Mavrovouniotis, 1994; Keseler et al., 2016). Another well-known example is the KEGG pathway database (Kanehisa and Goto, 2000; Kanehisa et al., 2012), the largest collection of manually curated pathway maps and related metabolic information.

Visualisations are commonly used in biology (Gehlenborg et al., 2010; Kerren et al., 2017). Metabolic pathway visualisations help to present knowledge and to support browsing through chemical structures, enzymes, reactions and pathways. Visual and immersive analytics of metabolism connects network analysis algorithms and (interactive and/or immersive) visualisation methods to investigate hubs, motifs, paths and so on in the network, or to compare pathways for finding differences between species or conditions. In addition, network visualisation also supports the mapping and investigation of further data such as metabolomics, proteomics, transcriptomics, enzyme activity and flux data, and the exploration of the data in the network context. It builds a foundation for exploring and navigating the dynamics of metabolic processes obtained either experimentally or via modelling and simulation. In conclusion, visualising and visually exploring metabolic pathways and networks helps in understanding them, is important in making sense of the complex biological data and knowledge that is being produced these days, and is an important research area. A simple visualisation example is shown in Fig. 12.1.

Fig. 12.1

A metabolic pathway example with some time series data of metabolite concentrations shown within the vertices representing metabolites (excerpt from a MetaCrop pathway (Schreiber et al., 2012) rendered by the Vanted tool (Junker et al., 2006))

1.1 Network Representation

A network representing metabolic processes consists of a set of elements (called vertices or nodes) and their connections (called edges or arcs) which have a defined appearance (e. g. size of vertices) and are placed in a specific layout (e. g. coordinates of vertices). Typical representations of metabolic reactions as graphs with different interpretations of vertices and edges are shown in Fig. 12.2. Although initiatives for a uniform representation of metabolic pathways have been presented earlier (Kitano, 2003; Kitano et al., 2005; Michal, 1998), no graphical representation became a standard to represent metabolic processes. In 2006 an international consortium started developing a standard for the graphical representation of cellular processes and biological networks including metabolism called the Systems Biology Graphical Notation (SBGN) (Le Novère et al., 2009). SBGN allows the visualisation of complex biological knowledge, including metabolic networks (see also the information in Fig. 12.3). Within this chapter, we will use SBGN for representing metabolic pathways and networks where possible.

Fig. 12.2

Different representations of biochemical reactions: (a) hypergraph (vertices denote metabolites and enzymes, edges denote reactions); (b) bipartite graph with enzymes represented within reactions (vertices denote metabolites and reactions including enzymes, edges connect metabolites with reactions); (c) simplified representation of (b) without co-substances (as used in KEGG); (d) bipartite graph with enzymes represented as separate entities (SBGN notation, vertices denote metabolites, enzymes and reactions, edges connect metabolites with reactions (consumption, production) and enzymes with reactions (catalysis)); (e) simplified metabolite network (vertices denote metabolites, edges connect metabolites transformed by reactions). Note that the classical representations such as the ones in Michal’s poster (Michal, 1998) and in Stryer’s biochemistry textbook (Stryer, 1988) are similar to (a)
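To make the difference between representations (d) and (e) concrete, the following minimal Python sketch derives both from one small reaction list. The reaction identifiers, the dictionary layout and the function names are illustrative assumptions and do not correspond to the data model of any particular database or tool.

```python
from itertools import product

# Hypothetical toy data: reaction id -> (substrates, products, enzyme).
reactions = {
    "R1": ({"DHAP"}, {"G3P"}, "TIM"),
    "R2": ({"G3P", "NAD+", "Pi"}, {"1,3-BPG", "NADH"}, "GAPDH"),
}

def bipartite_edges(reactions):
    """Representation (d): metabolites, enzymes and reactions are vertices.

    Edges run metabolite -> reaction (consumption),
    reaction -> metabolite (production) and enzyme -> reaction (catalysis).
    """
    edges = []
    for rid, (substrates, products, enzyme) in reactions.items():
        edges += [(s, rid, "consumption") for s in sorted(substrates)]
        edges += [(rid, p, "production") for p in sorted(products)]
        edges.append((enzyme, rid, "catalysis"))
    return edges

def metabolite_network(reactions, co_substances=frozenset()):
    """Representation (e): only metabolites are vertices.

    Every substrate is connected to every product of the same reaction;
    co-substances can be dropped to obtain the simplified network.
    """
    edges = set()
    for substrates, products, _enzyme in reactions.values():
        main_in = substrates - co_substances
        main_out = products - co_substances
        edges.update(product(main_in, main_out))
    return sorted(edges)

print(bipartite_edges(reactions))
print(metabolite_network(reactions, frozenset({"NAD+", "NADH", "Pi"})))
```

The projection in `metabolite_network` loses the reaction and enzyme vertices, which is why representation (e) is compact but cannot show catalysis.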

Fig. 12.3

Box SBGN: Explanation of SBGN and the SBGN languages; as an example, the image shows protein phosphorylation catalysed by an enzyme and modulated by an inhibitor in all three SBGN languages: (a) PD, (b) ER, (c) AF (image from Le Novère et al. (2009))

1.2 Network Layout

Metabolic pathway maps have been produced manually for a long time. These drawings are created by hand (usually with the help of computer programs) long before their actual use and provide a static view of the data defined by the creator. They show the knowledge at the time of the map’s generation, and an end-user cannot change the visualisation. Some navigation may be supported in electronic systems using such pre-drawn pictures, but the result of an action (the new picture) either replaces the current image or is shown in a new view, and the visualisations are not interactive.

However, the automatic computation of visualisations and interactive exploration methods are highly desirable, due to the size and complexity of biological networks, the steady growth of knowledge and the derivation of user-specific parts of networks from databases. The computer-based generation (layout) of a network map on demand at the time it is needed is called dynamic visualisation. These visualisations are created by the end-user from up-to-date data with the help of a layout algorithm. They can be modified to provide specific views of the data, and several navigation methods, such as the extension of an existing drawing or map with additional parts, are supported.

The automatic layout of networks, that is the computation of maps from a given network, is called graph drawing. Graph drawing methods take a network (or graph) and compute a layout consisting of coordinates for the vertices and routings of the edges. See the books by Di Battista et al. (1999) and Kaufmann and Wagner (2001) for general graph drawing algorithms. Although standard graph drawing algorithms can be used for laying out metabolic networks, domain specific network visualisations that conform to biological representational conventions are advantageous (Bourqui et al., 2011; Schreiber, 2002).

In the following sections we will discuss major resources for metabolites, reactions and pathways (Sect. 12.2); the visualisation of metabolites and enzymes, which are the building blocks of metabolic pathways (Sect. 12.3); and the visualisation of the pathways themselves (Sect. 12.4). We will focus on key questions which can be addressed using visualisation, present important graph drawing algorithms in brief (both standard and domain specific algorithms) and discuss a selection of useful tools. Next we will discuss the exploration and analytics of pathways and data, in particular visual analytics and immersive analytics of metabolic pathways and related data (Sect. 12.5). We conclude with perspectives and research questions in this field. Boxes contain additional background information regarding, for example, standards for metabolic network representation and layout algorithms.

2 Resources for Metabolites, Reactions and Pathways

Large amounts of knowledge about metabolites, enzymes, metabolic reactions, pathways and networks have been accumulated and are derived at increasing speed. Several databases and information systems have been developed to provide a comprehensive way to manage, explore and export this knowledge in meaningful ways. We will concentrate on the most important primary databases and briefly discuss their typical content and important features.

Databases in this area can be divided into metabolite/compound databases providing information about the chemical compounds used or produced in biochemical reactions; reaction/enzyme databases containing information about enzymes and the reactions catalysed by them; and pathway databases providing information about metabolic pathways. See also Table 12.1 for more information and a comparison of relevant databases.

Table 12.1 Databases. Abbreviations: W - Web, S - Web services, F - FTP

2.1 Metabolite/Compound Databases (Chemical Databases)

The typical content is information about metabolites (compounds) and their properties such as name, synonyms, molecular weight, molecular formula and structure. Often associated information such as chemical reactions, metabolic pathways, publications and various links to other databases can also be found.

Important resources are: PubChem (Kim et al., 2015, 2020; Wang et al., 2009), a comprehensive source of compound and substance information (consisting of the three primary databases Compound, Substance and BioAssay); KEGG COMPOUND (Goto et al., 2002), a database of small molecules, biopolymers and other chemical substances of biological interest; and ChEBI (Hastings et al., 2015), a database of small molecules with detailed information about nomenclature, molecular structure, formula and mass.

The visualisation and visual analysis of data from these databases is discussed in Sect. 12.3.

2.2 Reaction/Enzyme Databases

The typical content is information about enzymes and their properties such as nomenclature, enzyme structure, functional parameters and specificity. Often additional information about the reactions catalysed by the given enzyme, metabolic pathways, references and links to other databases can also be found.

Important resources are: BRENDA (Chang et al., 2020; Scheer et al., 2011), a comprehensive enzyme information system providing detailed molecular and biochemical information on enzymes based on primary literature; ExPASy-ENZYME (Gasteiger et al., 2003), an enzyme database which covers information related to the nomenclature of enzymes; Rhea (Lombardot et al., 2018), an expert-curated reaction database with information about biochemical reactions and reaction participants; the KEGG databases ENZYME and REACTION (Kanehisa and Goto, 2000; Kanehisa et al., 2004), which provide enzyme- and reaction-specific information about chemical reactions in the KEGG metabolic pathway database; and SABIO-RK (Wittig et al., 2012, 2017), an expert-curated biochemical reaction kinetics database with detailed kinetic information.

The visualisation and visual analysis of data from these databases is discussed in Sect. 12.3.

2.3 Metabolic Pathway Databases

The typical content is information about metabolic pathways and their single reactions, involved enzymes and reactants and associated information such as organism-specific information about genes, their related gene products, protein functions and expression data. Often several types of information are provided in the context of the graphical representation of pathways.

Major databases are: KEGG PATHWAY (Kanehisa et al., 2002, 2020), a multi-organism pathway database which contains metabolic pathways represented as curated, manually drawn pathway maps with links to information about compounds, enzymes, reactions and genes; BioCyc/MetaCyc (Caspi et al., 2012, 2019; Krieger et al., 2004), a collection of organism-specific pathway databases including MetaCyc, a curated multi-organism pathway database which contains metabolic pathways curated from the literature as well as lists of compounds, enzymes, reactions, genes and proteins associated with the pathways; Reactome (Croft et al., 2014; Matthews et al., 2009), a curated multi-organism pathway database initially focussing on human biology; and PANTHER Pathway (Mi and Thomas, 2009; Mi et al., 2020), an expert-curated multi-organism pathway database.

In addition to the primary databases mentioned above, there are secondary pathway databases and collaborative databases. Secondary metabolic pathway database systems collect and present information from various sources. Examples are NCBI BioSystems (Geer et al., 2010) and Pathway Commons (Cerami et al., 2011; Rodchenkov et al., 2019). The former is a centralised repository for metabolic pathway information containing biological pathways from multiple databases (e. g. KEGG, Human Reactome, BioCyc and the National Cancer Institute’s Pathway Interaction database). The latter provides access to various public metabolic pathway databases, such as Reactome, HumanCyc and IMID. A well-known community-driven collaborative platform dedicated to the curation and representation of biological pathways is WikiPathways (Martens et al., 2021).

There is also BioModelsDB (Chelliah et al., 2013; Malik-Sheriff et al., 2019), a database of mathematical models representing biological processes including metabolism, and the BioModelsDB part Path2Models (Büchel et al., 2013), an automatic translation of metabolism from databases such as KEGG into biological models using the SBML and SBGN standards. In addition, there are also special metabolic pathway databases covering specific species or groups of species, for example, for plants PlantCyc (Schläpfer et al., 2017; Zhang et al., 2010), MetaCrop (Grafahrend-Belau et al., 2008; Weise et al., 2006) and Plant Reactome (Naithani et al., 2016, 2019).

The visualisation of data from these databases is discussed in Sect. 12.4. The above-mentioned databases also provide static (e. g. KEGG) or dynamic (e. g. BioCyc) visualisations of pathways and networks. Furthermore, they often come with integrated analysis tools to support high-throughput experimental data analysis. For example, Reactome visualises pathways and maps expression data onto pathway maps using colour-coding. A Cytoscape plugin enables users to generate new pathways based on database queries and to perform some graph analysis on these networks. KegArray is a light-weight data mapping utility that maps expression data (CSV) onto KEGG pathways to colour-code vertices and also provides some scatter plots of the data. PANTHER Pathway allows users to view results in both an SBGN process view and an automatically converted activity flow view. However, these tools are developed for specific databases and often provide less functionality than the best general-purpose tools presented in Sect. 12.4.3; therefore we will not present them in detail here.
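The colour-coding of expression data mentioned above can be illustrated with a short, generic Python sketch. It is not the implementation used by Reactome, KegArray or any other tool; the column names `id` and `log2fc`, the blue-white-red scale and the saturation value `vmax` are assumptions chosen for illustration.

```python
import csv
import io

def value_to_hex(value, vmax=2.0):
    """Map a value in [-vmax, vmax] to a blue-white-red hex colour.

    Values outside the range are clipped; 0 maps to white.
    """
    t = max(-1.0, min(1.0, value / vmax))
    if t >= 0:
        # Fade green and blue out as the value rises towards +vmax (red).
        r, g, b = 255, round(255 * (1 - t)), round(255 * (1 - t))
    else:
        # Fade red and green out as the value falls towards -vmax (blue).
        r, g, b = round(255 * (1 + t)), round(255 * (1 + t)), 255
    return f"#{r:02x}{g:02x}{b:02x}"

def colours_from_csv(text):
    """Read 'id,log2fc' rows and return a vertex id -> colour mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["id"]: value_to_hex(float(row["log2fc"])) for row in reader}

# Hypothetical input: one row per enzyme vertex of a pathway map.
example = "id,log2fc\nTIM,2.0\nGAPDH,0\nPGK,-1.0\n"
print(colours_from_csv(example))
```

A diverging scale centred on white is used here because it separates up- and down-regulation at a glance; a tool may equally well use a different palette or a sequential scale.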

2.4 Exchange Formats

To represent metabolic pathway information in a unified way and to support the exchange of pathway models between software tools, exchange formats have been proposed. Two exchange formats which focus on the exchange of information between software tools and databases are SBML (Hucka et al., 2003) and BioPAX (Demir et al., 2010) (see the box in Fig. 12.4). Although they also partly support the exchange of graphical information, they are mainly relevant for software developers and modellers. The exchange format relevant for transferring graphical information and most relevant for users only interested in the visual representation of pathways is SBGN, the Systems Biology Graphical Notation (Le Novère et al., 2009) (see the box in Fig. 12.3). Exchange of metabolite structures relies on several formats popularised in cheminformatics and drug design. MDL Molfiles are suitable for storing individual structures; collections of these files can then be assembled into SD files (Dalby et al., 1992). While some pathway databases provide small molecule structures as MOL or SD files (e. g. KEGG, ChEBI), others provide the structures as SMILES (Weininger, 1988). SMILES, a so-called line notation, encodes the structure as a string; it thus does not provide atom coordinates, but offers a more compact representation. A multitude of other file formats exist; most of these formats can be easily accessed and interconverted by cheminformatics toolkits and libraries (e. g. CDK (Steinbeck et al., 2003)) or conversion utilities (e. g. Open Babel (O’Boyle et al., 2011)).

Fig. 12.4

Box BioPAX and SBML

3 Visualising Metabolites and Enzymes

Textbook views of metabolic pathways often illustrate the underlying biochemical mechanisms. To this end it is not sufficient to provide just the names of the metabolites; structural drawings are much better suited to illustrate the molecular details of an enzymatic reaction. The example in Fig. 12.5 shows the reactions catalysed by triosephosphate isomerase and glyceraldehyde 3-phosphate dehydrogenase: the isomerisation of dihydroxyacetone phosphate to D-glyceraldehyde-3-phosphate and the subsequent conversion of D-glyceraldehyde-3-phosphate to D-glycerate 1,3-bisphosphate. Visualisation of the metabolites by their names, IDs or abbreviations makes it hard to understand the mechanism, while the structural drawings immediately reveal the conversion of the hydroxyl group to an aldehyde and of the keto group to a hydroxyl group, and the subsequent introduction of a phosphate group. The layout of the structural formulas has been designed to highlight the fact that the larger part of the structure remains unchanged during the two reactions. Only parts of the structure (highlighted by the boxes) are modified in the reaction. Manual layouts of metabolic pathways typically found in biochemistry textbooks thus take care with the layout of both the structures and the pathway to maximise the mental map preservation between adjacent structures.

Fig. 12.5

Triosephosphate isomerase (TIM) catalyses the conversion of dihydroxyacetone phosphate to D-glyceraldehyde-3-phosphate, which in turn can be converted to D-glycerate 1,3-bisphosphate by glyceraldehyde 3-phosphate dehydrogenase (GAPDH). A consistent layout of the three metabolites involved makes it easier to grasp the structural changes entailed by each metabolic reaction (highlighted by the boxes)

While drawing structural formulas comes naturally to chemists and biochemists, the automated generation of structural formulas is a difficult task. The drawings have to adhere to numerous conventions developed since their initial conception by Kekulé in the second half of the 1800s. While many of these conventions have been standardised by the International Union of Pure and Applied Chemistry (IUPAC), there is no unique way of drawing a chemical structure; it can be adapted depending on the context, the level of detail required, and the information that needs to be conveyed.

Most small molecule chemical structures can be represented by planar graphs and thus can be laid out in 2D without major issues (Rücker and Meringer, 2002). Specific conventions have to be followed with respect to angles, representation of stereochemistry or bond orders, to name just a few. Ring systems pose particular challenges, since they are typically drawn in very specific ways and, more often than not, a projection of the three-dimensional shape is preferred over a non-crossing planar embedding of the final structure. Since even the drawing of planar graphs with fixed edge lengths (as is the case for structural diagrams) has been shown to be NP-hard (Eades and Wormald, 1990), most structure layout algorithms have to resort to heuristics to generate good layouts.

Several algorithms have been proposed over the years to lay out structures in an aesthetic manner (Clark et al., 2006; Helson, 1990). In addition, a number of algorithms have been implemented in commercial tools for structure editing and cheminformatics, for example, in the ChemDraw suite, in Accelrys Draw, or in MOE. Academic cheminformatics projects such as CACTVS (Ihlenfeld et al., 1994) or the more recent Chemistry Development Kit, CDK (Steinbeck et al., 2003), also permit the layout of molecular structures. Based on the structure stored in pathway databases (see Sect. 12.2), these tools permit the rendering of the structure into a 2D image. Another option for the retrieval of structure drawings is PubChem, which contains pre-computed structural formulas. These can be downloaded in PNG format.

A current challenge in the visualisation of metabolic networks is the joint layout of the metabolic network and its constituent metabolites. While it is in principle possible to lay out metabolic networks and simultaneously display the structural formulas of their metabolites, current pathway visualisation tools do not consider this problem (see, for example, Fig. 12.6). Not all tools are able to display structural formulas at all. Those that do resort to pre-rendered images of the structures. If the structures are drawn individually, their orientation depends mostly on the algorithm used, since there is no canonical orientation of a molecular structure. The orientation, size and general layout of any two structures adjacent in a metabolic network are thus mostly random, and it becomes very difficult to match the conserved common substructure between the two structures and thus to comprehend the underlying mechanism. Mental map preservation between any two adjacent structures would of course be preferable and clearly enhance readability of the pathway. Hand-made pathway diagrams found in textbooks are thus so far vastly superior to automatically drawn pathways with structural formulas. The simultaneous constrained drawing of metabolite structures and metabolic pathways is one of the more difficult problems in this area. Some algorithms for the constrained drawing of structures that should be suitable to solve this problem have been suggested in the literature in different contexts (Boissonnat et al., 2000; Fricker et al., 2004).

Fig. 12.6

Visualisation of metabolic pathways usually relies on pre-computed metabolite structures. As a consequence, the resulting pathway layout does not match the (usually random) orientation of the structural formulas embedded in the pathway and hampers understanding of the pathway mechanisms; excerpt from a KEGG (Kanehisa et al., 2014) pathway rendered by BiNA (Gerasch et al., 2014)

For small molecules (metabolites) 2D visualisation is the method of choice, because the structures are easier to comprehend and, to the schooled eye, the three-dimensional aspects of the structures are typically obvious. The same does not apply to proteins, however. Representing proteins as structural formulas is impractical, and the function of proteins can only be understood from their three-dimensional structure.

4 Visualising Reactions and Pathways

4.1 Visualising the Structure of Metabolic Reactions and Pathways

Visual representations of metabolic pathways are widely used in the life sciences. They help in understanding the interconnections between metabolites, analysing the flow of substances through the network, and identifying main and alternative paths. Important visualisation requirements are (Schreiber, 2002):

  • For parts of reactions: The level of detail shown concerning specific substances and enzymes is very much dependent on the goal of the visualisation, see also Sect. 12.3. Often for main substances their name and/or structural formula should be shown, for co-substances the name or abbreviation, and for enzymes the name or EC-number.

  • For reactions: The reaction arrows should be shown from the reactants to the products with enzymes placed on one side of the arrow and co-substances on the opposite side. Both sides of a reaction as well as their reversibility should be visible.

  • For pathways: The main direction of reactions should be visible to show their temporal order. A few exceptions to the main direction are used to visualise specific pathways such as the fatty acid biosynthesis and the citric acid cycle. The arrangement of these cyclic reaction chains should be emphasised: a repetition of a reaction sequence in which the product of the sequence re-enters as reactant in the next loop, either as cycle (the reactant and the product of the reaction sequence are identical from loop to loop, e. g. citric acid cycle) or as spiral (the reactant of the reaction sequence varies slightly from the product, e. g. fatty acid biosynthesis).

Besides specific visualisation requirements, reaction and pathway visualisations should meet the usual quality criteria of network layouts such as low number of edge crossings and a good usage of the overall area. See Fig. 12.7 for an example which meets these requirements.

Fig. 12.7

Example of a metabolic pathway visualisation which meets the requirements outlined in Sect. 12.4.1 (citric acid cycle, including reversible and irreversible reactions and circular shape of the pathway; excerpt from a MetaCrop (Hippe et al., 2010) pathway rendered by Vanted (Colmsee et al., 2013))

4.2 Layout Algorithms for Visualising Metabolic Pathways and Networks

Metabolic networks are usually represented as directed graphs. Common approaches to automatically lay out these networks are force-directed and hierarchical (or layered) layout methods. Although the orthogonal (or grid) style is quite common as a visualisation principle, for example in the manually laid out KEGG maps, automatic orthogonal layout methods are used less often. See the box in Fig. 12.8 as well as the images in Fig. 12.9 for these layout methods. Force-directed methods are widely used, and several network analysis tools support such layouts. However, these approaches do not meet common visualisation requirements: different vertex sizes, the special placement of co-substances and enzymes, the partitioning of substances into reactants and products and the general direction of pathways are not considered. A few approaches extend the force-directed layout method to deal with application-specific requirements. An example is implemented in the PATIKA system (Demir et al., 2002; Dogrusöz et al., 2006), where the layout algorithm considers directional and rectangular regional constraints which can be used to enforce layout directions and sub-cellular locations.
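The basic force-directed principle can be sketched in a few lines of Python. This is a generic spring embedder written for illustration, not the algorithm of PATIKA or of any other tool named here; the function name, the cooling schedule and all parameter defaults are assumptions. As noted above, it ignores vertex sizes, co-substances and the main direction of the pathway.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Return {vertex: (x, y)} for a list of vertices and (u, v) edge pairs."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))  # preferred distance between vertices
    temperature = size / 10           # maximum movement per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsive forces act between every pair of vertices.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attractive forces pull the two end vertices of each edge together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each vertex along its net force, limited by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[v][0] += dx / length * step
            pos[v][1] += dy / length * step
        temperature *= 0.95  # cool down so that the layout settles

    return {v: (x, y) for v, (x, y) in pos.items()}

layout = force_directed_layout(
    ["Glc", "G6P", "F6P", "FBP"],
    [("Glc", "G6P"), ("G6P", "F6P"), ("F6P", "FBP")],
)
print(layout)
```

Because edge direction plays no role in the forces, a linear pathway may come out rotated or bent, which is one reason why such layouts do not show the main direction of a pathway.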

Fig. 12.8

Box layout algorithms

Fig. 12.9

The same network with three different layouts: (from left to right) force-directed, layered (top to bottom) and orthogonal layout

Layered layout methods are often used as they emphasise the main direction within a network. Tools which support such layered layout methods are often based on existing layout libraries. These approaches show the main direction of reactions and are sometimes able to deal with different vertex sizes. However, there is no special placement of co-substances or specific pathways (e. g. cycles). Some improved approaches consider cyclic structures or depict pathways of different topology with different layouts, e. g. the algorithm by Becker and Rojas (2001), which emphasises cyclic structures, and PathDB (Mendes, 2000; Mendes et al., 2000), which visualises metabolic networks based on hierarchical layout and allows co-substances to be represented in a smaller font at the side of the reaction arrow.

There are some advanced methods for the automatic layout of metabolic pathways and networks such as the mixed, the extended layered and the constraint layout. The mixed layout approach (Karp and Mavrovouniotis, 1994) depicts (sub-)pathways of different topology with suitable layout algorithms such as linear, circular, tree and hierarchical layout, and places co-substances and enzymes beside reaction arrows. It is used in the MetaCyc/BioCyc database system. The extended layered approach (Schreiber, 2002) extends the hierarchical layout for different vertex sizes, consideration of co-substances and enzymes, and special layout of open and closed cycles; it is implemented in the BioPath system (Brandenburg et al., 2004). Finally, the constraint layout approach (Schreiber et al., 2009) allows the expression of visualisation requirements, including positions of co-substances and specific pathways, as constraints and produces a layout by solving these constraints. This approach is particularly well suited to cases in which parts of the layout are predefined, as shown in Czauderna et al. (2013).
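The first step of a layered layout, assigning each vertex to a layer so that all edges point in the main direction, can be sketched as follows. This is a minimal longest-path layering that assumes an acyclic network; it is not the method of any tool or library named in this chapter, and cyclic pathways such as the citric acid cycle would first need their cycles broken or treated specially. The function name and the example network are illustrative assumptions; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def assign_layers(edges):
    """Assign each vertex of a directed acyclic graph to a layer.

    A vertex is placed one layer below its deepest predecessor
    (longest-path layering), so every edge points downwards.
    """
    predecessors = {}
    for u, v in edges:
        predecessors.setdefault(u, set())
        predecessors.setdefault(v, set()).add(u)

    layer = {}
    # static_order() yields every vertex after all of its predecessors
    # and raises graphlib.CycleError if the network contains a cycle.
    for v in TopologicalSorter(predecessors).static_order():
        layer[v] = 1 + max((layer[u] for u in predecessors[v]), default=-1)
    return layer

# Hypothetical excerpt: a linear chain with one branch at G6P.
edges = [("Glc", "G6P"), ("G6P", "F6P"), ("F6P", "FBP"), ("G6P", "6PG")]
print(assign_layers(edges))
```

A complete layered layout would continue with crossing reduction within the layers and coordinate assignment; these steps are omitted here.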

Figure 12.10 shows examples of visualisations computed by these layout algorithms.

Fig. 12.10

Example visualisations computed by layout algorithms specifically tailored to metabolic networks: (top left) mixed layout (from the MetaCyc webpage), (top right) extended layered layout (from BioPath) and (bottom) constraint layout (from a prototype implementation of the constraint layout approach); note that these networks are not in SBGN notation

4.3 Tools

There are more than 170 tools available, and previous reviews have already compared a number of them. Kono et al. focus in their comparison on pathway representation, data access, data export and exchange, mapping, editing and availability (Kono et al., 2009). Suderman and Hallett compare more than 35 tools relevant in 2007 regarding several aspects of network and data visualisation (Suderman and Hallett, 2007). Rohn et al. present a comparison of 11 non-commercial tools for the network-centred visualisation and analysis of biological data (Rohn et al., 2012). Gehlenborg et al. present visualisation tools for interaction networks and biological pathways, including tools for multivariate omics data visualisation (Gehlenborg et al., 2010). It should be noted that progress in this field is fast: many new tools have appeared, and old tools have obtained new features since then. Well-known tools supporting network visualisation and analysis are:

These tools often provide a selection of standard and partly specific layout algorithms for metabolic pathways, the possibility to map additional data onto pathways as well as analysis algorithms.

Note that for a specific metabolic database or pathway collection often several different visualisation methods exist. For example, KEGG pathways can be visualised with tools and layout methods such as those implemented in Pathway Projector (Kono et al., 2009), KEGGgraph (Zhang and Wiemann, 2009) and Vanted (Rohn et al., 2012), can be rebuilt and visualised as in Gerasch et al. (2014), or can even be translated into SBML or SBGN and then laid out and visualised (Czauderna et al., 2013; Wrzodek et al., 2011).

5 Visual and Immersive Analytics of Metabolic Pathways and Related Data

Layout algorithms are very useful for the fast and automatic production of pictures or maps of metabolic networks. However, a layout is just the first step, and in interactive systems many additional requirements exist, for example, for interactive exploration, structural analysis of the networks, visualisation of experimental data (transcriptomics, metabolomics, fluxes, etc.) in the network context, studying networks in their spatial (3D) context and so on. Here we discuss some typical examples.

5.1 Multiscale Representation of Metabolism and Navigation Through Metabolic Networks

Metabolic networks can be huge, and a visualisation may become unreadable due to the large number of objects and connections. Several abstraction and exploration techniques have been transferred from the field of information visualisation to navigate in metabolic networks. As metabolic pathways are hierarchically structured (e. g. carbohydrate metabolism includes a number of sub-pathways such as TCA cycle and glycolysis) this information can be used to help navigating through the network. Often used navigation techniques include clickable overview maps (in many databases and tools, e. g. KEGG (Kanehisa et al., 2002) and iPath (Letunic et al., 2008)), maps showing increasing levels of detail (e. g. the MetaCyc website (Caspi et al., 2012)), interconnected maps (e. g. in GLIEP (Jusufi et al., 2012)), overview and detail diagrams (e. g. method by Garkov et al. (2019)) and interactive extension of pathways within a map (e. g. the method in KGML-ED (Klukas and Schreiber, 2007)).

It should be noted that there is a major obstacle for simple interactive visualisation methods including automatic layout: the mental map of the user (Misue et al., 1995). When browsing through pathways the user builds a mental representation of the objects, their relative position and connections. Basically the user’s mental map is its understanding of the network based on the current view. However, sudden or large changes between the current and the next view destroy the user’s mental map and therefore hinder interactive understanding of networks. So far there are only few approaches which address this problem.

Metabolic networks can be part of multivariate networks (Kohlbacher et al., 2014) (see Fig. 12.11) and heterogeneous networks (Schreiber et al., 2014), both of which increase the complexity of representation and navigation. The development of interactive layout algorithms for these structures is still an open research problem, and so far only some initial approaches exist, such as the previously mentioned constraint layout approach (Schreiber et al., 2009).

Fig. 12.11 Multivariate networks: different networks are connected through shared entities (from Kohlbacher et al. (2014))

5.2 Visual Analytics of the Structure of Metabolic Networks

Analysing structural properties of biological networks can help in gaining new insights, and several structural properties are of interest in metabolic networks: shortest paths between metabolites, which may represent preferred routes; network motifs, which may indicate functional properties; different centralities of metabolites and reactions, which may correspond to their importance; and clusters or communities, which may structure the network into functional modules. Many network analysis algorithms for investigating such structural properties have been developed; overviews can be found in the books by Brandes and Erlebach (2005) and Junker and Schreiber (2008).
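Two of these structural properties can be sketched in a few lines. The example below computes a shortest path (fewest reaction steps, via breadth-first search) and the degree centrality of each metabolite. The toy network, its undirected treatment of reactions and all function names are assumptions for illustration; real analyses would typically rely on an established graph library and on more informative centrality measures.

```python
from collections import deque

# Hypothetical toy network: metabolites as vertices, reactions as undirected links.
EDGES = [
    ("glucose", "G6P"), ("G6P", "F6P"), ("F6P", "FBP"),
    ("FBP", "GAP"), ("GAP", "pyruvate"), ("G6P", "6PG"),
    ("6PG", "Ru5P"), ("Ru5P", "GAP"),
]

def build_adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(adj.get(node, ())):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

def degree_centrality(adj):
    """Degree of each vertex divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}

if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(shortest_path(adj, "glucose", "pyruvate"))
    print(degree_centrality(adj))
```

In this toy network two routes of equal length connect glucose and pyruvate, so the search returns only one of them; a visual analytics tool might highlight all such alternatives.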

“Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces” (Thomas and Cook, 2006). An important aspect of this field is that data analysis is combined with interactive visualisation methods. Here, analytics includes structural analysis of networks as well as the investigation of additional data, as discussed in the following Sect. 12.5.3. Interaction also plays an important role in the visual analytics of metabolic networks (Kerren and Schreiber, 2012).

Several tools implement visual analytics methods, for example, Cytoscape, Ondex, Vanted, and VisANT, often via additional plugins or add-ons (see also Sect. 12.4.3 for details and references). Some tools also allow the integration of a wide range of other data into the analysis (Rohn et al., 2012). To make analysis results easier to understand, visualisation algorithms can highlight the relevant structures, for instance by straightening the shortest path in the map, placing central elements in the centre of the image or laying out identical motifs in the same way. A few specialised layout algorithms have been developed for better visualisation and graphical investigation of structures and connections in networks, such as coordinated perspectives for the analysis of network motifs (Klukas et al., 2006) or the visual comparison of pathways, for example, to understand metabolic pathways across organisms using a two-and-a-half-dimensional layout (Brandes et al., 2004).

5.3 Integration and Visualisation of Omics Data in Metabolic Networks

Data mapping deals with the integration of additional data into metabolic networks. Examples are metabolomics, transcriptomics and fluxomics measurements, which can be mapped onto different network elements (such as metabolites, enzymes and reaction edges), see also Figs. 12.1 and 12.12. A common problem for data mapping and subsequent analyses such as correlation analysis and clustering is the use of correct identifiers, that is, having the same name for an entity in both the data and the network. To help the user, several tools support mapping tables which translate identifiers in the data into identifiers in the network, and translation services such as BridgeDB (van Iersel et al., 2010) exist. Depending on the data, different diagram types are desired in the vertices or at the edges of the metabolic pathway. Examples are bar charts, pie charts, line charts, box plots and heat maps.
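The role of a mapping table can be sketched as follows. The function below translates data identifiers into network identifiers and reports measurements that cannot be placed in the network. All identifiers, the function name `map_measurements` and the fall-back rule (an identifier without a table entry is tried unchanged) are assumptions for this example; the sketch does not reproduce the interface of BridgeDB or of any specific tool.

```python
def map_measurements(measurements, mapping_table, network_ids):
    """Translate data identifiers into network identifiers.

    measurements:  data identifier -> measured value
    mapping_table: data identifier -> network identifier
    network_ids:   identifiers of the elements present in the network

    Returns (mapped, unmapped): values keyed by network identifier, and the
    sorted list of data identifiers that could not be placed in the network.
    """
    mapped, unmapped = {}, []
    for data_id, value in measurements.items():
        # Assumption: identifiers missing from the table are tried unchanged.
        network_id = mapping_table.get(data_id, data_id)
        if network_id in network_ids:
            mapped[network_id] = value
        else:
            unmapped.append(data_id)
    return mapped, sorted(unmapped)

if __name__ == "__main__":
    # Hypothetical identifiers, invented for illustration only.
    network_ids = {"net:glucose", "net:pyruvate", "net:citrate"}
    mapping_table = {"Glc": "net:glucose", "Pyr": "net:pyruvate"}
    measurements = {"Glc": 2.4, "Pyr": 0.7, "net:citrate": 1.1, "unknown_42": 3.3}
    mapped, unmapped = map_measurements(measurements, mapping_table, network_ids)
    print(mapped)
    print(unmapped)
```

Reporting the unmapped identifiers explicitly matters in practice, because silently dropped measurements can bias subsequent correlation analysis or clustering.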

Fig. 12.12 An example of flux visualisation showing the flux distribution in a metabolic network under two scenarios (rendered by Vanted, data from Rolletschek et al. (2011))

Whereas most tools support the visualisation of data connected to vertices of the network, only a few tools provide mapping of data onto edges. Metabolomics data, in particular the results of stable isotope tracer experiments, yield important details on the dynamics of networks, and flux visualisation is important because it provides insights into the integrated response of a biochemical reaction network to environmental changes or genetic modifications. Such representations are therefore also important tools in metabolic engineering (Wiechert, 2001). To support the analysis and understanding of simulated or experimentally measured flux distributions, the visualisation of flux information in the network context is essential. It is mainly performed by scaling the width of the reaction edges according to the flux data or by displaying the flux values in the corresponding reaction vertices, see Fig. 12.12. Tools such as FBASimViz (Grafahrend-Belau et al., 2009), MetaFluxNet (Lee et al., 2003), Omix (Droste et al., 2011) and OptFlux (Rocha et al., 2010) support such visualisations.
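Scaling edge widths by flux can be sketched as a simple linear mapping. The function below maps the absolute flux of each reaction to a line width between a minimum and a maximum value. The linear scaling, the default widths and the reaction names are assumptions for illustration; the tools listed above may use different scaling schemes.

```python
def flux_to_width(fluxes, min_width=1.0, max_width=10.0):
    """Map absolute flux values linearly to edge widths.

    A flux of zero is drawn with min_width and the largest absolute flux
    with max_width. The sign of a flux (its direction) does not affect the
    width and would be shown separately, for example by an arrowhead.
    """
    magnitudes = {reaction: abs(value) for reaction, value in fluxes.items()}
    largest = max(magnitudes.values(), default=0.0)
    if largest == 0.0:
        # No flux anywhere: draw every edge with the minimum width.
        return {reaction: min_width for reaction in fluxes}
    span = max_width - min_width
    return {
        reaction: min_width + span * magnitude / largest
        for reaction, magnitude in magnitudes.items()
    }

if __name__ == "__main__":
    # Hypothetical flux values in arbitrary units.
    print(flux_to_width({"r1": 0.0, "r2": 5.0, "r3": -10.0}))
```

Comparing two scenarios, as in Fig. 12.12, would require a shared scale, that is, the same largest flux for both width computations, so that edge widths remain comparable across the two views.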

5.4 Immersive Analytics of Metabolic Networks

The visualisation of structures and pathways in 3D has advantages and disadvantages. A good 2D visualisation may be easier to understand and is directly printable on paper. For small molecules 2D visualisation is the method of choice, because the structures are easier to comprehend and the three-dimensional aspects of the structures are typically obvious to an expert. However, the same does not apply to proteins. Visualising proteins as structural formulas is impractical, and the function of proteins can only be understood from their three-dimensional structure. This provides arguments for an integration of 2D (mainly Information Visualisation) and 3D (mainly Scientific Visualisation) techniques (Kerren and Schreiber, 2014).

Early work on representing metabolic pathways in 3D, by Qeli et al. (2004) and Rojdestvenski et al. (2003; 2002), dates back to the early 2000s. In recent years the novel research field of immersive analytics (Chandler et al., 2015) has developed, which is concerned with “the use of engaging, embodied analysis tools to support data understanding and decision making” with a focus on immersive (3D) environments (Dwyer et al., 2018). It builds on and combines ideas from the fields of data visualisation, visual analytics, virtual reality, computer graphics and human–computer interaction. The key idea is to become immersed in the data and to employ all senses, not only vision. This area has many potential applications in the life and health sciences (Czauderna et al., 2018). Some initial applications for the visualisation and exploration of metabolism in immersive environments include MinOmics, an immersive tool for multi-omics analysis (Maes et al., 2018), and the integration and exploration of pathways in a cell environment (Sommer and Schreiber, 2017b), as shown in Fig. 12.13.

Fig. 12.13 Exploration of metabolic pathways within the spatial context using an immersive environment based on CAVE2 (stereoscopic 3D) and zSpace (stereoscopic fishtank 3D) (from Sommer and Schreiber (2017b))

6 Perspectives

The visual exploration and analytics of metabolic networks is a fast-developing field. Although there are already several methods and tools that help in understanding metabolic networks, continued development is needed. Here we outline some current directions of research and tool development in this area:

  • Connection to other networks: Metabolism is strongly linked to other biological processes represented, for example, by protein interaction or gene regulatory networks (see also Sect. 12.5). The combined visualisation and easy visual travelling from one network to the next may help in better understanding effects such as regulation of metabolism.

  • Context for combined omics data: Although several tools support the integration and visualisation of omics data within metabolic networks, the visualisation of complex data sets covering several domains (networks, images, sequences, omics data, etc.) is not yet sufficiently solved. Initial solutions have been presented (e. g. Rohn et al. (2011)), but as more and more such data sets are produced in experiments, there is an increasing need for better analysis and visualisation approaches.

  • Mental map preserving layouts: A mental map of a layout is a mental picture of the structure of the layout and helps in understanding changing maps (Misue et al., 1995), see also Sect. 12.5.1. Preservation of the mental map is often used to measure the quality of a dynamic network layout (Archambault et al., 2011), and has been shown to be important in dynamic layouts (Purchase and Samra, 2008; Purchase et al., 2007). Most existing layout algorithms do not preserve the mental map, and often the same algorithm produces different visualisations when applied repeatedly to the same network. Also, there are only a few studies regarding mental map preserving network layouts in visual and immersive analytics (e. g. Kotlarek et al. (2020)). However, the acceptance of visualisation and exploration methods also depends on better support of the user’s mental map, and this is an important area for future research.

Biological network visualisation and the layout of metabolic networks are an interesting area in graph drawing (Binucci et al., 2019). More open questions and major problems arising in biological network visualisation are also discussed in Albrecht et al. (2009). Metabolic network and pathway visualisation is only a small aspect of biological data visualisation. As biology aims to provide insights into the overall system, that is, into processes on cellular, tissue, organ and even organism levels, the visualisation of metabolism has to be embedded into broader visualisation frameworks. Besides networks and related data, other data modalities are also important, for example, imaging data and phenotypical data.

This chapter presented the history and state of the art of visualisation and visual analysis of metabolic pathways and networks, provided descriptions of important metabolic network databases and exchange formats, gave a brief overview of frequently used tools and discussed future research directions including immersive analytics. The methods and tools presented here are a building block of such a broader visualisation framework for biological data.