1 Introduction

Many biological processes are represented as networks. Examples are networks from the area of molecular biology, such as metabolic networks, protein interaction networks, and gene regulatory networks, but also from other areas of the life sciences such as ecological networks, phylogenetic networks, neuronal networks, chemical structures, and infection networks. Network modeling, analysis, and visualization are important steps towards a systems biological understanding of organisms and organism communities. The graphical depiction of such networks supports the understanding of the underlying processes and is essential to make sense of much of the complex biological data that is now being generated.

Fig. 7.1

A map of a metabolic pathway shown in the SBGN standard [88], derived from KEGG [61], computed and displayed by Vanted [110]

A picture of a network is called a network diagram or a network map; see Fig. 7.1 for an SBGN map of a metabolic pathway. A network diagram representing biological processes consists of a set of elements (called nodes or vertices) and their connections or interactions (called edges). These elements and connections often have a defined appearance and are placed in a specific layout. Due to the size and complexity of such networks, methods for their automatic visualization and interactive exploration are desired.

Network diagrams or maps have been produced manually for a long time. Examples are textbooks on biochemistry [8, 96], biological network posters [94, 99], and some electronic information systems such as ExPASy [4] and KEGG [61]. The drawings in these resources have been created manually long before their use and provide only a restricted view of the data. These maps represent the knowledge at the time of their generation and are static, hence cannot be changed by an end user. Therefore, this type of biological network visualization is often called static visualization.

Because of the size and complexity of biological networks, their steady growth and continuous change, as well as the compilation of user-specific networks from databases, novel automatic visualization, interaction, and exploration methods are desired. The generation of a network map on demand is called dynamic visualization. Such visualizations are created automatically, at the end user's request, from up-to-date data. Their advantages are, inter alia, that they can be modified to provide particular views of the data and that interactive systems often support navigation and exploration methods.

This review gives a brief introduction to (information) visualization, visual analytics, and automatic layout of networks, and presents the state of the art in automatic network visualization for the life sciences as well as standards for the graphical representation of cellular networks and biological processes. It is structured in two main parts as follows: Sect. 7.2 provides information about the foundations from computer science in general and looks into the subareas of information visualization, graph drawing (network visualization), and visual analytics in particular. Section 7.3 takes a closer look at the visualization of biological networks and discusses methods, some important tools, and the SBGN standard. It looks into the application and extension of computer science methods for the special requirements of the life sciences.

2 Background

The effective visualization of biological networks is influenced by research from many different fields. In the past, such networks were simply considered as large graphs (or hypergraphs), and a suitable visual representation was restricted to finding an appropriate (static) graph layout. Nowadays, research in the visualization of large and complex networks is more focused on interactive exploration and analysis that includes the consideration of additional data that might be attached to various graph elements or that might be the basis for the construction of biochemical networks. The volume of such collected and stored data will increase heavily in the future. This is especially true in systems biology where, for example, the huge amount of *omics data automatically generated by high-throughput technologies [3, 39] leads to the challenge of interpreting all of these data sets in the context of networks. The fundamental problem today is to transform the data (which is typically not preprocessed, erratic, stored in idiosyncratic formats, sometimes uncertain, and often composed of various types such as multidimensional, time dependent, or geospatial data) into information and make it useful/available/analyzable to analysts. Often, this challenge is called the information overload problem. Benefits of such a transformation include discovering something interesting (like patterns or outliers) or monitoring a huge data set in real time [70].

Because of this general view on the problem, we provide a more general background section. First, we discuss the field of information visualization in the next subsection. We highlight the most important definitions/aims and present a brief high-level overview of visual representations and interaction techniques. Then, we outline the field of graph drawing and discuss the most often used layout algorithms. Finally, a relatively new field, called visual analytics, is introduced. Due to page limitations, we cannot give a comprehensive overview of all aspects of the aforementioned research fields. Instead, we present a selection of fundamental ideas/approaches and refer to the literature including surveys.

2.1 Information Visualization

Information visualization (InfoVis) is a research area which focuses on the use of interactive visualization techniques to help people understand and analyze data. While related fields such as scientific visualization involve the presentation of data that has some physical or geometric correspondence, information visualization centers on abstract information without such correspondences, i.e., information that cannot be mapped into the physical world in most cases. Examples of such abstract data are symbolic, tabular, networked, hierarchical, or textual information sources. The ever-increasing amount of data generated or made available every day amplifies the urgent need for InfoVis tools. To give the field a firm base, InfoVis combines several aspects of different research areas, such as scientific visualization, human-computer interaction, data mining, information design, cognitive psychology, visual perception, cartography, graph drawing, and computer graphics [73, 74].

2.1.1 The Importance of Human Visual Perception and Visual Metaphors

Human information processing and the human capability of information reception have to be adequately taken into account when developing visualization tools. This should be reflected in an appropriate user interface design, a clean requirement analysis and modeling, and, perhaps most importantly, an efficient interaction between the human analyst and the computer. Discussing the different features of the human eye, the various process models of human visual perception (incl. preattentive perception and features), or our capabilities of pattern recognition would go beyond the scope of this background section. There are many good textbooks that deal with these topics in the context of visualization: we recommend the books of Ware [141], Kerren et al. [74], and Ward et al. [140].

Edward Tufte, one of the leaders in the field of visual data exploration, describes in his illustrated textbooks [131–133] how information can be prepared so that the visual representation depicts both the data and the data context. The use of suitable visual metaphors assists our brain in its endeavor to connect new information received through the visual input channels to existing information stored in short- or long-term memory [72]. Tufte inspired many InfoVis researchers in their ambition to develop novel visual representations for the data sets under consideration (the process of representing a concrete data set by an appropriate visual structure is called “visual mapping”) as well as interaction techniques which support a better understanding of the data.

2.1.2 Visual Representations

Visual mappings explain how data models can be expressed using visual metaphors and be converted into corresponding visual representations which are suitable for interaction. This is typically done in 2D space, because 3D representations usually introduce unnecessary clutter and navigation problems. We highlight the most important visualization techniques for basic data types in the following paragraphs. Of course, there are other types of data that have to be considered. We refer the interested reader to the literature, such as [27, 102] for geo-spatial data, [2] for time-series data, or [41, 126, 140] for a comprehensive discussion of visual representations in general.

2.1.2.1 Visualization Techniques for Multivariate Data

Multivariate (or multidimensional) data sets can mostly be described as data tables with n data objects and m attributes/features, i.e., for each object there exists an attribute vector with m dimensions. The attribute values can be classified into nominal, ordinal, or quantitative. In practice, we often have a large number of data objects and many attributes with different types. Finding a suitable visual representation is thus challenging, and the right choice might depend on further parameters like application domain, integration into a larger visualization environment, or support of specific interaction techniques. In general, visual mappings for multivariate data can roughly be categorized as follows:

Fig. 7.2

Some examples of often used visualization techniques. The screenshots in (a) and (b) were produced with D3 [22]. (a) Parallel coordinates that visualize a nutrient content data set with more than 1,000 data objects and 14 attributes (available online [31]). Note that the visible polylines were interactively selected in the 3rd and 10th axes. (b) A scatterplot matrix showing data from the Iris data set (available online [11]). Also in this case, the colored points indicate data selected by the user (see the grey-colored selection in the plot of the first column, second row). (c) Small icons/glyphs are embedded into the graph nodes of a metabolic network. In this case, they indicate reachable nodes in other (color-coded) pathways [60]. (d) A pixel-based approach to visualize weather data of a city. The rows represent years, and the temperatures (color-coded from blue over white to red) of each day are ordered from left to right [90]. (e) Sample tag cloud of a text document which is related to information visualization (generated with Wordle [32]) (Color figure online)

Point-based approaches:

This class of techniques projects n-dimensional objects from the data space to a lower-dimensional (typically 2D) display space [140]. There are different variations: scatterplot matrices, for instance, consist of a grid of 2D scatterplots, each showing one possible pair of dimensions/attributes [19]; see Fig. 7.2b for an example. Dimensionality reduction techniques, such as multidimensional scaling (MDS) [92, 145], principal component analysis (PCA) [53], or self-organizing maps (SOMs) [80], project n-dimensional data records into 2D/3D directly. The idea is to preserve properties of the multivariate data space during the projection, i.e., data objects that are similar in data space should lie close together in display space. Note that relative positions in the display space matter more than absolute positions.
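To make the projection idea concrete, the following minimal sketch computes a PCA projection with plain power iteration. The function name `pca_project` and its parameters are illustrative assumptions and are not taken from the cited works; in practice one would rather rely on an established numerical library.

```python
import math
import random


def pca_project(rows, k=2, iters=200, seed=0):
    """Project n data objects (rows of m numbers) onto their top-k principal axes."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the m attributes.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]

    def orthonormalize(w, axes):
        # Remove the parts of w along already found axes, then scale to length 1.
        for ax in axes:
            dot = sum(wi * ai for wi, ai in zip(w, ax))
            w = [wi - dot * ai for wi, ai in zip(w, ax)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        return [wi / norm for wi in w] if norm > 1e-12 else None

    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = orthonormalize([rng.uniform(-1, 1) for _ in range(m)], axes)
        for _ in range(iters):
            # Power iteration: repeatedly multiply by the covariance matrix.
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            w = orthonormalize(w, axes)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        axes.append(v)
    # Display coordinates: one k-dimensional point per data object.
    return [[sum(c[j] * ax[j] for j in range(m)) for ax in axes] for c in centered]
```

For data objects that lie on a straight line in a 3D data space, the first display coordinate reproduces their pairwise distances, while the second one stays near zero; only these relative positions are meaningful, since the sign of each axis is arbitrary.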

Axis-based approaches:

Here, a multidimensional data object is usually represented by a polyline, and its attribute values are marked on coordinate axes which can be arranged in various ways. Thus, the user can read the attribute values from the intersections between the coordinate axes and the polyline. The most prominent examples are parallel coordinate systems [49] (cf. Fig. 7.2a) or star plots [16] (also called Kiviat diagrams).
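As a small illustration of the polyline idea, the sketch below scales every attribute to the height of its own axis and returns one polyline per data object. The names `parallel_coordinates` and `brush` are hypothetical helpers introduced only for this example; the per-axis min/max scaling is one common choice among several.

```python
def parallel_coordinates(rows):
    """Return one polyline per data object as a list of (axis_index, height in [0, 1])."""
    m = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(m)]
    highs = [max(r[j] for r in rows) for j in range(m)]
    polylines = []
    for r in rows:
        line = []
        for j in range(m):
            span = highs[j] - lows[j]
            # A constant attribute has no spread; place it at mid-height of its axis.
            height = (r[j] - lows[j]) / span if span else 0.5
            line.append((j, height))
        polylines.append(line)
    return polylines


def brush(rows, axis, lo, hi):
    """Select the data objects whose value on the given axis lies in [lo, hi]."""
    return [r for r in rows if lo <= r[axis] <= hi]
```

The `brush` helper mimics the interactive axis selection mentioned in the caption of Fig. 7.2a: only the polylines of the returned objects would remain highlighted.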

Icon-based approaches:

Icon- or glyph-based approaches use coherent graphical entities that represent the attribute values of a data record by modifying the entity’s visual features, such as line thickness, size, color, and orientation. There are many different realizations, such as stick figures [106], Chernoff faces [18], or shape coding [7]. A variant of so-called rose diagrams [100] is shown in Fig. 7.2c.

Pixel-based approaches:

Such approaches try to make maximal use of the available display space by mapping each attribute value to a single pixel. There is only one degree of freedom to represent such a value by a pixel: its color. Therefore, the challenge in the development of pixel-based representations is to arrange the used pixels on the screen in a meaningful way. Well-known examples are recursive patterns [65] or the VisDB tool [66] for the analysis of databases. Figure 7.2d exemplifies the idea in the context of the visualization of weather data collected over time.
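The weather example can be sketched in a few lines. The blue-white-red scale and the row-per-year arrangement follow the description of Fig. 7.2d; the function names and the linear color interpolation are assumptions made for this illustration.

```python
def diverging_color(value, lo, hi):
    """Map a value to an (r, g, b) color running from blue over white to red."""
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))  # clamp to [0, 1]
    if t < 0.5:
        s = round(255 * t / 0.5)          # blue -> white
        return (s, s, 255)
    s = round(255 * (1.0 - t) / 0.5)      # white -> red
    return (255, s, s)


def pixel_grid(series_by_year, lo, hi):
    """One pixel row per year (sorted), one pixel per daily value, left to right."""
    return [[diverging_color(v, lo, hi) for v in series_by_year[year]]
            for year in sorted(series_by_year)]
```

Since color is the only visual variable per pixel, the arrangement (here: years as rows, days as columns) carries all remaining structure.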

2.1.2.2 Visualization Techniques for Hierarchical Data and Networks

Networks and trees are at the center of our interest in this chapter. Therefore, we devote a separate section (Sect. 7.2.2) to a deeper discussion of suitable visualization possibilities for these data types and focus there on traditional node-link approaches. For the sake of completeness, we note that there are also so-called space-filling methods that try to solve some conceptual problems of node-link diagrams, such as the high space consumption and difficult inclusion of many (and complex) attributes into the drawing. Treemaps fall into this category in which the hierarchy is recursively mapped to rectangular areas [52]. Other examples are Beamtrees [134], sunburst approaches [108], or network matrices [1].

2.1.2.3 Visualization Techniques for Text and Documents

Today, the availability of texts and documents is overwhelming, and people want to actively deal with them to solve specific problems. Typical questions are as follows: what documents contain a text about a specific topic? Or are there similar documents to those that I already have? Information visualization is capable of supporting the aforementioned tasks in several ways.

Text visualization:

First, we focus on approaches to the visualization of a single text document. Tag Clouds provide information about the frequency of words contained in a text [63]. The approach uses different font sizes for each word in the text to indicate how often a certain word is used in comparison with the other words as shown in Fig. 7.2e. Several extensions and related approaches exist, such as Wordle or ManiWordle [77, 138]. SparkClouds extend the original tag cloud idea with a temporal variable by so-called sparklines [87]. Thus, trends can easily be identified and analyzed. An approach for visual literary analysis is called Literature Fingerprinting [67]. It supports the visual comparison of texts by calculating features (e.g., word/sentence length or measurement of vocabulary richness) for different hierarchy levels and by creating characteristic fingerprints of the texts.
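The core of a tag cloud, mapping word frequencies to font sizes, can be sketched as follows. The function name `tag_cloud_sizes`, the point-size range, and the linear scaling are assumptions for illustration; tools such as Wordle additionally filter stop words and compute a spatial arrangement, which is omitted here.

```python
import re
from collections import Counter


def tag_cloud_sizes(text, min_pt=10.0, max_pt=40.0, top=20):
    """Map the `top` most frequent words of a text to font sizes in points."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words).most_common(top)
    if not counts:
        return {}
    highest, lowest = counts[0][1], counts[-1][1]
    span = (highest - lowest) or 1  # avoid division by zero if all counts are equal
    # Linear interpolation between the smallest and the largest font size.
    return {w: min_pt + (c - lowest) * (max_pt - min_pt) / span for w, c in counts}
```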

Document visualization:

Collections of text documents can be structured to some extent (software packages, wikis, etc.) or relatively unstructured (e-mails, patents, etc.). Early approaches, e.g., Lifestreams [34], simply arranged documents according to specific attribute values such as time tags. More recent works analyze the documents by metrics, such as similarity, and perform cluster analyses or compute SOMs. Conceptually similar (by looking at the resulting visual representation) is ThemeScapes [147] that follows a natural landscape metaphor. Single documents are categorized and then mapped to a document map as topic areas, whereas the documents themselves are shown as small dots. “Mountains” in the landscape represent document concentrations in a thematic environment (density), height lines connect concept domains, etc. There are many more recent approaches that make use of the same metaphor, such as [104]. In order to carry out comparisons of text documents using tag clouds, Parallel Tag Clouds [20] arrange tags on vertical lines for each document. Identical words are then highlighted by connection lines.

2.1.3 Interaction Techniques

Interaction techniques in information visualization are mechanisms “for modifying what the users see and how they see it” [140]. There are many taxonomies of interaction techniques in the literature which help to better understand the design space of interaction; a nice overview is provided by Yi et al. [148]. In the following, we present a simplified and shortened classification of interaction methods for information visualization from our paper [70], which is in turn based on [43]:

Data and view specification:

This category focuses on the data space and how the data is visually represented (corresponds to data transformations and visual mappings in the InfoVis Reference Model [14]):

  • Encode/visualize: Users can choose the visual representation of the data records including graphical features, such as color and shape. Visual representations typically depend on the data types as discussed in Sect. 7.2.1.2.

  • Reconfigure: Some interaction techniques allow the user to map specific attributes to graphical entities. An example is the mapping of attributes in a multivariate data set to different axes in a scatterplot.

  • Filter: This technique is of great importance as it allows the user to interactively reduce the data shown in a view. Popular methods are dynamic queries by using range sliders [146] or picking a set of nodes in a network visualization for further analyses by performing a “lasso” selection [44].

  • Sort: Ordering of records according to their values is a fundamental operation in the visual analysis process. This is, for example, important in network analysis where nodes might be sorted based on specific centrality values [150].

View manipulation:

Our second category addresses interacting with visual representations (view transformations in the InfoVis Reference Model).

  • Select: Selection is often used in advance of a filter operation. The aim is to select an individual object or a set of objects in order to highlight, manipulate, or filter them out. Examples include putting a placemark on a virtual map to highlight a spatial area or the specification of attribute ranges in parallel coordinate systems as seen in Fig. 7.2a.

  • Navigate/explore: This important class of interaction techniques typically modifies the level of detail in visualizations, following the mantra “overview first, zoom and filter, and details on demand” [121]. Well-known approaches are focus and context [111], overview and detail [51], zooming and panning [137], and semantic zooming [127].

  • Coordinate/connect: Linking a set of views or windows together to enable the user to discover related items. Brushing and linking techniques (e.g., histogram brushing [89]) are used in almost all information visualizations, such as in [59].

  • Organize: Large visualization systems often consist of several windows and workspaces that have to be organized on the screen. Adding and removing views can be confusing to the analyst. Some systems help the user to maintain an overview and to preserve his/her mental map by grouping views or by assigning specific places where they have to appear [50, 91].

Note that it is possible and also common practice to combine the aforementioned techniques. The given literature references only point to selected example works and make no claim to be complete.

2.2 Graph Drawing and Network Visualization

In this subsection, we distinguish between graphs and multivariate networks. A (simple) graph G = (V, E) consists of a finite set of vertices (or nodes) V and a set of edges E ⊆ {(u, v) | u, v ∈ V, u ≠ v}, whereas a multivariate network N consists of an underlying graph G plus additional attributes that are attached to the nodes and/or edges. To describe the fundamental ideas of graph visualization algorithms more efficiently, we have to provide some definitions:

  • An edge e = (u, v) with u = v is called a self-loop.

  • If an edge e exists several times in E, then it is called a multiple edge.

  • A simple graph has no self-loops and no multiple edges. Here, we assume that all graphs are simple graphs for the sake of convenience.

  • The neighbors of a node v are its adjacent nodes.

  • The degree of a node v is the number of its neighbors.

  • A directed graph (or digraph) is a graph with directed edges, i.e., (u, v) are ordered pairs of nodes.

  • A directed graph is called acyclic if it has no directed cycles, i.e., there is no directed path where the same node is visited twice.

  • A graph is connected if there is a path between u and v for each pair (u, v) of nodes.

  • A graph is planar if it can be drawn in the 2D plane without intersections of edges (edge crossings).
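Several of these definitions translate directly into short checks on an edge list. The sketch below is one possible encoding, assuming nodes are hashable labels and edges are ordered pairs; the function names are illustrative only.

```python
from collections import deque


def is_simple(edges):
    """No self-loops and no multiple edges (edges are treated as ordered pairs)."""
    return all(u != v for u, v in edges) and len(set(edges)) == len(edges)


def degrees(nodes, edges):
    """Degree of a node = number of its distinct neighbors, ignoring direction."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {v: len(ns) for v, ns in nbrs.items()}


def is_connected(nodes, edges):
    """True if every pair of nodes is joined by an (undirected) path."""
    if not nodes:
        return True
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:  # breadth-first search from an arbitrary start node
        for w in nbrs[queue.popleft()] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == len(nodes)


def is_acyclic(nodes, edges):
    """True if the directed graph has no directed cycle (Kahn's algorithm)."""
    indeg = {v: 0 for v in nodes}
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    removed = 0
    while queue:  # repeatedly remove nodes without incoming edges
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(nodes)
```

Planarity is deliberately left out, since testing it requires considerably more machinery than fits into a short example.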

2.2.1 Traditional Graph Drawing (GD)

Graph drawing algorithms compute a 2D/3D layout of the nodes and the edges, mainly based on so-called node-link diagrams [141]. They play a fundamental role in network visualization. Particular graph layout algorithms can give an insight into the topological structure of a network if properly chosen and implemented. The graph readability is affected by quantitative measurements called aesthetic criteria [24], such as:

  • Minimization of edge crossings

  • Minimization of the drawing area

  • Displaying the symmetries of the graph topology

  • Constraining edge lengths

  • Constraining the number of edge bends

  • Maximization of the resolution

Thus, graph drawing generally deals with the ways of drawing graphs according to the set of predefined aesthetic criteria [17]. A problem is that these criteria are often contradictory, and problems which aim to optimize the criteria are often NP-hard. Therefore, many GD algorithms are heuristics. Note that we only focus on traditional GD approaches in this subsection. There are further possibilities to represent graphs, such as matrix representations [1] or hybridizations between both approaches [44] (cf. Sect. 7.2.1.2).

In the following paragraphs, a selection of drawing approaches is presented. These are layout methods for trees, force-based layout techniques, and hierarchical drawings. There are many more approaches not discussed here, for instance, orthogonal layouts [29], visualization of hypergraphs [9], or dynamic layouts for graphs that change over time [25] (a possible application of dynamic approaches is visualizing the evolution of biochemical networks [112], for instance). Implementing good graph drawing algorithms is usually complicated and time-consuming. Therefore, a number of open source libraries were developed, such as JUNG [105] and many others, that allow predefined methods to be called for the computation of a specific graph layout.

2.2.1.1 Tree Drawings

Trees are a special case of directed (acyclic) graphs that usually have a distinguished node called the root of the tree. We can regard a tree as a digraph with all edges oriented away from the root. A binary tree is a rooted tree where each node has at most two children (we assume here that binary trees are ordered). The graph drawing community developed a lot of different layout methods for binary and general trees. In this context, there is another set of more specific aesthetic criteria especially for (binary) trees:

  • Nodes at the same level of the tree should lie along a straight line, and the straight lines defining the levels should be parallel.

  • A left subtree should be positioned to the left of its parent node and a right subtree to the right.

  • A parent node should be centered over its subtrees.

  • Two isomorphic subtrees should be drawn equally. Graph isomorphism means that there is a bijection between the node sets of two graphs, so that any two nodes u and v are adjacent in the first graph if and only if their images under the bijection are adjacent in the second graph.

  • A tree and its mirror image should produce drawings that are reflections of one another.

  • Integer coordinates should be preferred which leads to a grid drawing at the end.

Many tree layout algorithms use a divide and conquer strategy, such as the well-known Reingold/Tilford algorithm for binary trees [107]. In a postorder traversal of the tree, the following simple steps are executed:

  1. Draw the left subtree.

  2. Draw the right subtree.

  3. Combine both drawings with a specific minimum distance.

  4. Place the root of both subtrees at the next upper level exactly in the center of its subtrees.

  5. In case the parent node has only one subtree, place the root at a specific horizontal distance.

Reingold/Tilford runs in linear time and can relatively easily be extended to the layout of general trees [13, 139]. Of course, there are further possibilities of drawing trees with the help of node-link diagrams, such as radial layouts, H-trees, or HV-trees. We refer the reader to the standard literature [24, 64]. Figure 7.3 shows two example layouts computed with the yEd tool [149].
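The five steps listed above can be sketched as a recursive procedure. The following code is a simplified illustration in the spirit of the divide and conquer strategy, not the Reingold/Tilford algorithm itself: it keeps explicit per-level contours and copies subtree positions, so it does not reach linear running time. The function name `tidy_layout`, the tuple encoding of trees, and the parameter `min_sep` are assumptions; node labels must be unique.

```python
def tidy_layout(tree, min_sep=2.0):
    """tree = (label, left, right); left/right are subtrees or None.
    Returns {label: (x, depth)} with the root at x = 0."""

    def place(node):
        label, left, right = node
        parts = []  # (positions, left contour, right contour, horizontal shift)
        if left is not None and right is not None:
            lpos, lmin, lmax = place(left)    # step 1: draw the left subtree
            rpos, rmin, rmax = place(right)   # step 2: draw the right subtree
            # Step 3: smallest distance between the two subtree roots that
            # keeps min_sep between the drawings on every shared level.
            shared = min(len(lmax), len(rmin))
            gap = max(lmax[d] - rmin[d] for d in range(shared)) + min_sep
            parts = [(lpos, lmin, lmax, -gap / 2), (rpos, rmin, rmax, gap / 2)]
        elif left is not None:                # step 5: only one subtree
            parts = [place(left) + (-min_sep / 2,)]
        elif right is not None:
            parts = [place(right) + (min_sep / 2,)]
        # Step 4: the parent sits at x = 0, centered over its two children.
        pos = {label: (0.0, 0)}
        cmin, cmax = [0.0], [0.0]
        for spos, smin, smax, dx in parts:
            for name, (x, depth) in spos.items():
                pos[name] = (x + dx, depth + 1)
            # Merge the shifted subtree contours into the contours of this tree.
            for d, (lo, hi) in enumerate(zip(smin, smax), start=1):
                if d < len(cmin):
                    cmin[d] = min(cmin[d], lo + dx)
                    cmax[d] = max(cmax[d], hi + dx)
                else:
                    cmin.append(lo + dx)
                    cmax.append(hi + dx)
        return pos, cmin, cmax

    return place(tree)[0]
```

For a complete binary tree with seven nodes, the leaves end up evenly spaced and each parent is centered over its children; drawing the mirrored tree yields the mirrored x-coordinates, in line with the aesthetic criteria listed earlier.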

Fig. 7.3

Two sample tree layouts that were computed and displayed by the yEd graph editor [149]. The identical input tree has 30 nodes and 29 edges. (a) A standard tree layout for general trees. (b) A so-called HV-drawing in which the layout algorithm switches between the horizontal and vertical orientation

2.2.1.2 Force-Based Drawings

Force-based layout techniques use a physical analogy to draw graphs and are widely used in practice. This is for several reasons: the physical metaphor makes them easy to understand and to code, the results are suitable for many application fields, they are easy to extend with additional constraints, and the process of obtaining an equilibrium state (see below) can be animated, which is visually appealing. A simple version of a force-based layout algorithm using spring and electrical repulsion forces is introduced in the following. Here, the edges between nodes are modeled as springs, and the nodes can be considered as charged particles that repel each other. For the x-component of the force vector on a node v, the following holds (y-component analogous):

$$\sum_{(u,v)\in E} \mathrm{sti}_{uv}\,(d_{uv} - l_{uv})\,\hat{x}_{uv} \;+\; \sum_{(u,v)\in V \times V} \frac{\mathrm{rep}_{uv}}{d_{uv}^{2}}\,\hat{x}_{uv} \tag{7.1}$$

Here, \(\hat{x}_{uv}\) denotes the unit vector of \((x_{v} - x_{u})\), \(d_{uv}\) is the Euclidean distance between u and v, \(l_{uv}\) is the zero-energy (natural) length of the spring between u and v (i.e., no force if \(d_{uv} = l_{uv}\)), \(\mathrm{sti}_{uv} \in [0, 1]\) is the stiffness of the spring between u and v (i.e., the larger this parameter, the stronger the tendency for \(d_{uv}\) to be close to \(l_{uv}\)), and finally \(\mathrm{rep}_{uv}\) is the strength of the electrical repulsion between the two nodes. In Eq. 7.1, the first sum represents the spring force between two nodes u and v connected with an edge and the second sum the repulsion force between v and other nodes. Both forces together build a complete force system for all graph elements. Depending on the underlying physical model, the repulsion forces prevent nodes from getting too close, and the spring forces provide a uniform edge length, for instance. In the current formula, Hooke’s law is used to specify the spring force between two nodes, i.e., if the distance between the two nodes is larger than the natural length of the spring, then the nodes attract each other, and the strength of the attraction is proportional to the difference between distance and natural length.

A simple algorithm that computes a final graph layout consists of a loop that first computes the forces on all nodes and then moves each node a bit in the direction of its force vector computed by Eq. 7.1. At the beginning, all nodes are positioned randomly. The loop terminates when the sum of all forces is small enough (equilibrium state) or after a specific number of iterations. This strategy works for undirected and directed graphs, with and without cycles, cf. Fig. 7.4a.
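A minimal sketch of this loop is given below. It uses one stiffness, one natural length, and one repulsion strength for all node pairs, applies the spring force as an attraction whenever the distance exceeds the natural length (as described for Hooke's law), and caps the displacement per iteration to keep the iteration stable. The function name `force_layout`, all parameter names, their default values, and the displacement cap are assumptions for illustration.

```python
import math
import random


def force_layout(nodes, edges, length=1.0, stiffness=0.5, repulsion=0.5,
                 step=0.1, max_move=0.5, iters=1000, tol=1e-3, seed=1):
    """Return {node: [x, y]} for a list of nodes and a list of edges."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}  # random start
    linked = {frozenset(e) for e in edges}
    for _ in range(iters):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                # f > 0 pushes u and v apart, f < 0 pulls them together.
                f = repulsion / d ** 2                  # electrical repulsion
                if frozenset((u, v)) in linked:
                    f -= stiffness * (d - length)       # spring (Hooke's law)
                fx, fy = f * dx / d, f * dy / d
                force[v][0] += fx
                force[v][1] += fy
                force[u][0] -= fx
                force[u][1] -= fy
        if sum(math.hypot(fx, fy) for fx, fy in force.values()) < tol:
            break  # equilibrium state reached
        for v in nodes:
            fx, fy = force[v]
            size = math.hypot(fx, fy)
            # Move a bit along the force vector, but never further than max_move.
            scale = min(step, max_move / size) if size else 0.0
            pos[v][0] += scale * fx
            pos[v][1] += scale * fy
    return pos
```

With these default values, two nodes joined by a single edge settle at the distance where repulsion and spring force cancel, i.e., slightly above the natural spring length. The double loop over all node pairs makes each iteration quadratic in the number of nodes, which is one reason why such simple variants do not scale to very large graphs.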

Fig. 7.4

Two sample graph layouts that were computed and displayed by the yEd graph editor [149]. The identical input digraph has 29 nodes and 39 edges. (a) Result of a force-based layout algorithm. (b) Layered (or hierarchical) drawing

2.2.1.3 Layered (Hierarchical) Drawings of Directed Graphs

A general aim for the layout of a directed graph is to compute a so-called monotone drawing in which all edges point in the same direction. Such a monotone drawing has some advantages in the interpretation of the digraph’s topology [47]. Obviously, the input digraph must be acyclic in that case; otherwise, we would get edges that flow backwards (called feedback edges). In practice, this apparently hard condition is not really a problem, because we can use such a drawing method for general directed graphs if we reverse the direction of a minimal number of feedback edges. This step is known as cycle removal. By doing so, we get a directed acyclic graph (DAG) that is drawn by using a method for computing monotone layouts, such as a layered drawing as explained in this subsection. Once the final layout is ready, we simply reverse the feedback edges again.
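Cycle removal can be sketched with a depth-first search that reverses every edge leading back to a node on the current search path. Note that this is a simple heuristic chosen for illustration: it always produces a DAG, but it does not guarantee a minimal number of reversed edges. The function name `remove_cycles` is an assumption, and the input is assumed to be a simple digraph.

```python
def remove_cycles(nodes, edges):
    """Return (dag_edges, reversed_edges) for a simple directed graph."""
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    state = {v: 0 for v in nodes}  # 0 = unvisited, 1 = on search path, 2 = done
    feedback = []

    def dfs(u):
        state[u] = 1
        for v in succ[u]:
            if state[v] == 1:      # edge points back into the current path
                feedback.append((u, v))
            elif state[v] == 0:
                dfs(v)
        state[u] = 2

    for v in nodes:
        if state[v] == 0:
            dfs(v)
    marked = set(feedback)
    # Reverse the feedback edges; all other edges keep their direction.
    dag = [(v, u) if (u, v) in marked else (u, v) for u, v in edges]
    return dag, feedback
```

After the layout of the resulting DAG has been computed, the edges listed in `reversed_edges` are simply drawn in their original direction again.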

Many people prefer a hierarchical structure of the final graph layout, i.e., the nodes of the graph are arranged on vertical or horizontal, parallel layers in the 2D plane. Often, such a structure is already given by the input data. For instance, if someone wants to visualize hyperlinks (edges) between the HTML pages (nodes) of a website, then usually the pages are already hierarchically organized. In the following, we briefly present a standard technique for layered drawings that is based on the fundamental work of Sugiyama et al. [129].

The basic idea is very simple and intuitive; it has three phases. In the first phase, the nodes of the graph are assigned to a number of layers (we can skip this phase if there is already a layering in the input graph). This layer assignment problem is NP-complete if we want to minimize the height and the width of the final layering. A further complication occurs if edges span over several layers: then we have to introduce so-called dummy nodes that lie on the spanned layers, i.e., a long edge is subdivided by the dummy nodes. This strategy yields modified edges which only reach from one layer to the next one (the digraph is called proper in such cases) and is needed for the second phase. In the second phase, after the layer assignment, we have to reduce the number of edge crossings. This is done by reordering the graph nodes and the dummy nodes within each layer. With the help of the dummy nodes, the algorithm gets control over the edge positioning, and in consequence, it is possible to avoid crossings of edges that span over several layers. Minimizing edge crossings in a proper layered digraph is NP-complete, even if there are only two layers. Note that the node positions (x-coordinates) on the layers are relative only up to now (the y-coordinates of the nodes are already specified by the node layers if we assume to have horizontal layers). The final phase is the real coordinate assignment of all nodes on the layers, i.e., we assign concrete x-coordinates for each (normal and dummy) node. This task also leads to an optimization problem that can be solved, for instance, by linear programming (LP). Constraints of the LP are then the fixed orderings in the layers, and the target function is specified by the straightness of the edges. As a final step, we remove the dummy nodes and obtain the desired layered drawing as shown in Fig. 7.4b.
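The first two phases can be illustrated with simple heuristics. The sketch below uses longest-path layering for the layer assignment, inserts dummy nodes to make the digraph proper, and orders each layer by the barycenter (average position) of a node's predecessors in a single top-down pass. These heuristics are common choices but are assumptions here, as are all function names; the LP-based coordinate assignment of the third phase is omitted.

```python
def assign_layers(nodes, edges):
    """Phase 1 (longest-path layering): sources go to layer 0; every other node
    is placed one layer below its deepest predecessor. The input must be a DAG."""
    preds = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)
    layer = {}

    def depth(v):
        if v not in layer:
            layer[v] = 1 + max((depth(u) for u in preds[v]), default=-1)
        return layer[v]

    for v in nodes:
        depth(v)
    return layer


def make_proper(layer, edges):
    """Subdivide edges that span several layers by one dummy node per spanned layer."""
    layer, proper = dict(layer), []
    for u, v in edges:
        prev = u
        for k in range(layer[u] + 1, layer[v]):
            dummy = ("dummy", u, v, k)
            layer[dummy] = k
            proper.append((prev, dummy))
            prev = dummy
        proper.append((prev, v))
    return layer, proper


def order_layers(layer, proper_edges):
    """Phase 2 (barycenter heuristic): sort each layer by the average position
    of a node's predecessors in the layer above."""
    rows = [[] for _ in range(max(layer.values()) + 1)]
    for v, k in layer.items():
        rows[k].append(v)
    preds = {v: [] for v in layer}
    for u, v in proper_edges:
        preds[v].append(u)
    for k in range(1, len(rows)):
        rank = {v: i for i, v in enumerate(rows[k - 1])}
        rows[k].sort(key=lambda v: sum(rank[p] for p in preds[v]) / max(1, len(preds[v])))
    return rows
```

For two layers with the edges a → y and b → x, the initial order x, y produces one crossing, whereas the barycenter pass reorders the lower layer to y, x and removes it. Practical implementations usually alternate top-down and bottom-up passes several times.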

2.3 Multivariate Network Visualization

Good drawing algorithms as described in the previous subsection will not, on their own, solve the problem of visualizing multivariate networks. There are several reasons for this statement. First, most traditional graph drawings do not scale well, i.e., they are not able to represent huge data sets with many thousands of nodes and/or edges. Second, additional multivariate data cannot be intuitively embedded into a standard drawing. The InfoVis community tried to address those issues by visualization approaches that provide filtering and interaction possibilities in order to reduce the number of graph elements under consideration as well as by methods to visually analyze attributes in the context of the underlying graph topology. Several approaches can be found in the literature that attempt to offer solutions for the problem of visualizing multivariate networks: multiple and coordinated views, integrated approaches, semantic substrates, attribute-driven layouts, and hybrid approaches [57]. We will discuss these concepts in the following paragraphs:

Multiple and coordinated views::

This category of solutions aims to combine several views and present them together. Coordinated views allow the use of the most powerful visualization techniques for each specific view and data set [41, 109]. As an application example, we highlight the work of Shannon et al. [120] who realized this idea in the network visualization domain. They use two distinct views: one view shows a parallel coordinate approach for the visual representation of the network attributes and the other view displays a node-link drawing of a graph. Their tool is equipped with a variety of visualization and interaction techniques; both views are coordinated by linking and brushing [126] techniques. The drawback of multiple views is that they split the displayed data because of the spatial separation of the visual elements.

Integrated approaches::

To provide a combined picture, attributes and the underlying graph can be displayed in one single view. “Integrated views can save space on a display and may decrease the time a user needs to find out relations; all data is displayed in one place” [41]. One example is the work of Borisjuk et al. [10] on the visualization of experimental data in relation to a metabolic network. The authors used a straightforward approach by employing small diagrams instead of representing the nodes as simple circles or rectangles. Each diagram, e.g., a bar chart, shows experimental data that is related to the respective node. This approach provides a view of all available information, but the embedding of the visualizations into the nodes causes the nodes to grow in size. This issue may affect the readability of the network due to the overlaps that may appear when the number of nodes and attributes is high [71]. Thus, it does not scale well. However, the problem of space usage and clutter introduced by such approaches can be avoided by using focus and context techniques (cf. Sect. 7.2.1). Magic lenses are one of several possibilities to interactively visualize the node attributes within the same view, as exemplified in Fig. 7.5.

Fig. 7.5
figure 5

Overview of the Network Lens tool [58]. The graphical user interface is divided into three distinct parts: the main network visualization area, the lens information area on the right-hand side, and the bottom part where user-produced lenses are preserved. It offers a way to visualize additional network attributes (displayed inside of the circular lens), while preserving the overall network topology and context. The lens in the screenshot covers one node only and shows a small parallel coordinate diagram with four quantitative as well as four nominal attributes belonging to that node. The user is able to move the lens with the mouse or to translate the graph behind the lens

Fig. 7.6
figure 6

The screenshot shows a tool for the visual analysis of dynamic metabolic networks [112]. On the left-hand side, two time-series charts of selected attributes display attribute dynamics over time. Interval charts represent the dynamic topology of the graph in terms of life times of metabolites, enzymes, and reactions. On the right, the graph scene shows the set union graph (= the super graph that summarizes all nodes/edges of the individual graphs that appear over time) with the applied node coloring scheme which supports distinguishing between older and newer nodes

Semantic substrates::

In order to further avoid clutter in multivariate network visualizations, some researchers realized the idea of so-called semantic substrates that “are non-overlapping regions in which node placement is based on node attributes”: Shneiderman and Aris [122] introduced this idea and combined it with sliders to control the edge visibility and thus to ensure comprehensibility of the edges’ end nodes. One conceptual drawback of such approaches is that the underlying graph topology is not (completely) visible.

Attribute-driven layouts::

These layouts use the display of the network elements to convey insight into the attached multivariate data instead of visualizing the graph topology itself. While being similar to semantic substrates, this technique does not necessarily place the nodes into specific regions. Instead, it uses calculations based on node attributes to control the placement of a node in the graph layout. An example is PivotGraph [142] which uses a grid layout to show the relationship between (node) attributes and links.

Hybrid approaches::

They combine at least two of the previously discussed techniques. The most common combinations are multiple coordinated views with any of the integrated approaches. For instance, Rohrschneider et al. [112] integrate additional attributes of a biological network inside the nodes and edges; see Fig. 7.6. The authors also use other visual metaphors for creating multiple coordinated views to show time-related data of the network.
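To make the idea of an attribute-driven layout more concrete, the following Python sketch places nodes on a grid whose axes are given by two categorical node attributes and aggregates nodes and edges per grid cell. It is only written in the spirit of grid layouts such as PivotGraph [142] and is not the published algorithm; the function name pivot_layout as well as the attribute names used in the usage note are assumptions made for illustration.

```python
from collections import Counter


def pivot_layout(node_attrs, edges, x_attr, y_attr):
    """Place nodes on a grid defined by two categorical node attributes.

    node_attrs maps each node to a dict of attribute values, edges is a list
    of (source, target) pairs. Returns the grid cell of each node, the number
    of nodes per cell, and the number of edges between each pair of cells.
    """
    # The distinct attribute values, in sorted order, define the grid axes.
    xs = sorted({attrs[x_attr] for attrs in node_attrs.values()})
    ys = sorted({attrs[y_attr] for attrs in node_attrs.values()})

    # The position of a node depends only on its attributes, not on topology.
    cell = {
        node: (xs.index(attrs[x_attr]), ys.index(attrs[y_attr]))
        for node, attrs in node_attrs.items()
    }

    # Aggregate: nodes sharing a cell are merged, parallel edges are counted.
    cell_sizes = Counter(cell.values())
    cell_links = Counter((cell[u], cell[v]) for u, v in edges)
    return cell, cell_sizes, cell_links
```

For a small network whose nodes carry the hypothetical attributes "compartment" and "type", calling pivot_layout(node_attrs, edges, "compartment", "type") yields one grid cell per attribute combination; the cell sizes and link counts can then be mapped to node size and edge width in the drawing.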

2.4 Visual Analytics

Fig. 7.7
figure 7

Overview of the ViNCent user interface [150]. The center shows the radial centrality view of the input network. The right side displays the corresponding histograms of the network centralities as well as detailed values of the network centralities for the currently hovered node. Histograms can be used to filter the views. The left panel allows changing the render settings and displays an overview of the respective node-link layout of the network. A node group has been manually selected and is shown as a light-blue stripe along the outer circle in the centrality view as well as in the overview (bottom left) by using a background region of the same color (Color figure online)

Visual analytics (VA) “is the science of analytical reasoning facilitated by interactive visual interfaces” [130]. A crucial property of this research field is that computational methods of data analysis are combined with interactive visualization techniques in order to analyze data more efficiently. Automatic data analysis covers various aspects from data storage and organization to automatic analysis algorithms, such as support vector machines, neural networks, and principal component analysis (PCA). It can be classified, among others, into data management, data mining, and machine learning. For many data analysis problems, fully automated analysis methods only work if the problem is well defined and well understood, i.e., there has to exist a model of the underlying problem [68]. Otherwise, traditional data mining techniques will not work. Even if a model exists, the results of the automated analyses have to be adequately communicated to and interpreted by analysts. Here, interactive visualizations come into play as they support the analyst in discovering (possibly unexpected) patterns, trends, or relationships in the data. Interaction techniques (as presented in Sect. 7.2.1.3) are of particular importance for visually analyzing large volumes of data. Interaction allows analysts, among other things, to explore “unknown” data collections following Shneiderman’s mantra of information visualization [121] or to build hypotheses with the help of “What if?” questions and to verify them visually or with algorithmic methods. The need to combine interactive visualization with computational analysis methods is thus evident, and this combination opens novel possibilities to address the information overload problem. A more detailed discussion on VA can be found in [68, 69, 130].

As an example from the field of visual network analysis, we have selected the ViNCent tool [75, 150] that combines exploratory data visualization with automatic analysis techniques, such as computing a variety of centrality values for network nodes as well as hierarchical clustering or node reordering based on centrality values. Automatic and interactive approaches are seamlessly integrated in one single analysis framework which provides insight into the importance of an individual node or groups of nodes and allows quantifying the network structure; see Fig. 7.7.
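The automatic analysis step mentioned above, computing centrality values and reordering nodes accordingly, can be sketched in a few lines of Python. The sketch assumes an undirected, unweighted network given as an adjacency dictionary and covers only degree and closeness centrality; it does not reproduce the implementation or the full set of centralities of ViNCent, and all function names are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(neighbors) / max(n - 1, 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """Number of reachable nodes divided by the sum of the distances to them."""
    result = {}
    for source in adj:
        # Breadth-first search yields shortest-path distances (unit weights).
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def rank_by(centrality):
    """Order nodes by decreasing centrality value (ties broken by node name)."""
    return sorted(centrality, key=lambda v: (-centrality[v], str(v)))
```

Such a ranking could, for instance, drive the order of nodes along a radial axis or the bins of a histogram used for filtering; how a particular tool maps the values to the display is a separate design decision.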

3 Visualization of Biological Networks

Visual representations of biological networks are widely used in the life sciences. Examples are shown in textbooks, on pathway posters, in databases, and by a large number of tools for the analysis and visualization of biological processes. Well-known software tools are listed in Sect. 7.3.1.2. Software tools often use established layout methods as described in Sect. 7.2.2 to visualize biological networks automatically. Sometimes those algorithms are modified, for example, by adding extra forces to force-based approaches. However, these methods often take the specific requirements for the visualization of a particular biological network into account only partly or not at all, and hence the resulting visualizations are usually difficult to understand, especially if large networks are visualized.

In the following subsections, we will introduce some typical solutions for common networks from molecular biology, discuss domain-adapted solutions for particular networks, list major tools for the visualization of biological networks, and finally discuss the Systems Biology Graphical Notation (SBGN) as the graphical standard for biological networks.

3.1 Methods

3.1.1 Early Approaches

Driven by the emerging availability of biological networks from databases in the mid-1990s, several groups started to either use existing graph drawing algorithms or design extensions to these algorithms to automatically visualize biological networks. In the following, we present such early work for the three major types of networks from molecular biology.

Fig. 7.8
figure 8

Three sample layouts of biological networks. (a) and (b) were computed and displayed by the Vanted system [110]; (c) was computed by BioPath [33]. (a) A gene regulatory network (nodes represent genes, edges represent regulation, and labels show gene names). (b) A protein interaction network (nodes represent proteins; edges represent interaction). (c) A metabolic network (nodes represent metabolites, enzymes, and reactions; edges represent consumption and production)

3.1.1.1 Signal Transduction and Gene Regulatory Networks

These networks represent regulation or directed interaction between biological entities (such as genes) and are usually modeled as directed graphs; see Fig. 7.8a. There are two widely used methods to visualize such networks: force-based and layered drawings. Several systems provide force-based graph drawing methods for the visualization of these networks, for example, PATIKA [23] and GeNet [118]. These tools typically use well-known force-based algorithms such as Eades’ algorithm [28], often based on existing layout libraries and systems like Pajek [5] or yFiles [144]. There are some improvements of the general force-based method to consider application-specific requirements such as the representation of subcellular locations. One example is implemented in the PATIKA system.

Since signal transduction and gene regulatory networks are directed graphs, showing the main direction is important for understanding the flow of information through the network. Therefore, layered drawing methods are often employed for the computation of maps of these networks. Some tools using this layout method are TransPath [85] and BioConductor [15]. Often layout libraries for layered drawings such as dot [84] are used.
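As a small illustration of how a layout library for layered drawings can be driven, the following Python sketch writes a regulatory network in the DOT language, which the dot program reads as input. The helper name to_dot, the gene names in the usage note, and the choice to draw inhibition with a "tee" arrowhead are assumptions made for this example; only the DOT syntax itself (digraph, rankdir, arrowhead) is taken from Graphviz.

```python
def to_dot(edges, rankdir="TB"):
    """Serialize a regulatory network as DOT text for a layered drawing.

    edges is a list of (source, target, kind) triples, where kind is either
    "activation" or "inhibition". rankdir selects the main direction of the
    layout, e.g. "TB" (top to bottom) or "LR" (left to right).
    """
    lines = ["digraph regulatory_network {", f"  rankdir={rankdir};"]
    for source, target, kind in edges:
        # Assumed convention: blunt arrowhead for inhibition, arrow otherwise.
        arrowhead = "tee" if kind == "inhibition" else "normal"
        lines.append(f'  "{source}" -> "{target}" [arrowhead={arrowhead}];')
    lines.append("}")
    return "\n".join(lines)
```

Writing the result of to_dot([("geneA", "geneB", "activation"), ("geneB", "geneC", "inhibition")]) to a file network.dot and running dot -Tsvg network.dot -o network.svg produces a layered drawing in which the main direction of regulation runs from top to bottom.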

3.1.1.2 Protein Interaction Networks

These networks represent proteins and their interactions and are modeled as undirected graphs; see Fig. 7.8b. Several systems which employ force-based graph drawing methods for their visualization have been presented, for instance [12, 42, 98, 119]. Some work on the interactive exploration of protein interaction networks has also been done, for example, by combining circular and force-based layouts and using animation for smooth transitions between subsequent drawings [35].

3.1.1.3 Metabolic Networks

These networks represent the transformation of metabolites into each other and are usually modeled as directed graphs; see Fig. 7.8c. There are two common approaches to visualizing metabolic networks: force-based and layered drawing methods. Several network analysis tools support force-based layouts, for example, BioJAKE [113], Cytoscape [119], PathwayAssist [101], and VisANT [45]. Frequently they visualize not only metabolic but also other types of biological networks. However, force-based approaches mostly do not meet common application-specific requirements. Such requirements are, inter alia, different sizes of nodes, the special placement of co-substances and enzymes, and the general direction of pathways.

Layered drawings are often used as they emphasize the main direction in the network. Tools supporting layered drawings are largely based on existing software libraries. Such solutions show the main direction within networks and partly deal with different node sizes. However, there is no specific placement of co-substances or special pathways such as cycles. Examples are PathFinder [40] (which uses the VCG library [114]) and BioMiner [123] (which employs yFiles [144]). The earliest approach to our knowledge is from Karp and Paley, where the complete network is separated into parts such as trees, paths, and circles, and the parts are laid out separately [62]. Although not a layered drawing algorithm as described in Sect. 7.2.2, it results in an overall layout with some layered structure. Extended layered drawings consider cyclic structures within the network or show pathways of different topology using different layouts, such as the algorithm by Becker and Rojas [6]. An advanced layered drawing algorithm for metabolic networks considering all relevant visualization requirements has been presented in [115].

3.1.2 Current Approaches and Tools

There are many challenges in current research on biological network visualization and visual analytics, such as the visual analysis of integrated and correlated data, the visual comparison of networks, integrated and overlapping networks, the graphical representation of paths and flows, and hierarchical networks; see [3, 39]. Consequently, this has become a very active field of research and, for example, several special algorithms for the layout of biological networks have been presented in the last few years. Among them are grid-based methods [81], clustered circular layouts [38], and constraint-based methods [116]. The quality of the layouts produced by these specialized algorithms is often much better than that obtained by just applying standard methods; an example is shown in Fig. 7.1.

A broad range of more than 170 tools for the modeling, analysis, and visualization of biological networks is nowadays available on the Internet. These tools often change rapidly: new tools emerge, and old tools obtain new features or are no longer maintained. Therefore, only a small set of important tools will be listed here. Other reviews are available; for example, Suderman and Hallett compared more than 35 tools regarding network and data visualization in 2007 [128]; Kono et al. compared tools for pathway representation, mapping and editing, and data exchange in 2009 [83]; and Gehlenborg et al. looked at visualization tools for interaction networks and biological pathways in 2010 [39].

The following tools may be of interest to the reader. As the functionality of the tools changes rapidly over time, we do not provide a feature list but encourage the reader to visit the respective tool websites given below:

3.2 SBGN Standard

Biological networks shown in books, articles, and online resources are often difficult to understand because the same biological concept can be shown using different graphical representations. It is therefore time-consuming to become familiar with the graphical notation used, and there is also a danger of misinterpretation. Consequently, particularly for molecular-biological networks such as gene regulatory, signal transduction, protein interaction, and metabolic networks, there were several attempts to define a uniform representation. These include Kitano’s Process Diagrams [76], Kohn’s Molecular Interaction Maps [79], and Michal’s representation of metabolic pathways [95]. However, a single map type is often not enough to adequately illustrate the complexity of biological processes, and none of the mentioned attempts has established itself as a widely used standard.

Since 2006, there has been a new initiative which partly builds on earlier standardization attempts and is closely connected with the successful exchange format SBML (Systems Biology Markup Language) [48]: SBGN, the Systems Biology Graphical Notation [88]. Additional material can be found at http://sbgn.org, and formal specifications are available [93, 97, 103]; see this website for the latest version of the specification.

Fig. 7.9
figure 9

Two examples of SBGN maps. (a) Part of a metabolic pathway in SBGN notation (pathway derived from MetaCrop [117], an information system based on Meta-All [143]). (b) Part of a gene regulatory network in SBGN notation (derived from RIMAS [55])

SBGN supports three corresponding views or maps on a biological process: process description which describes elements (cellular building blocks like molecules and nucleic acid sequences but also other information like observable events) and interactions between these elements; entity relationship which presents the interaction between biological entities and the influence of entities on other elements; and activity flow which focuses on the flow of information from one activity to another. These different language types make it possible to show different aspects of biological processes. A process description, for example, often contains a molecule several times in different states, e.g., phosphorylated or unphosphorylated, while the other two map types each show only one occurrence of such a molecule. Figure 7.9 shows two molecular-biological networks in SBGN notation.

There are several tools supporting SBGN, including CellDesigner [36], EPE (Edinburgh Pathway Editor) [30], PathVisio [135], and SBGN-ED [21] (an extension of Vanted [110]). A comparison has been done by Junker et al. [56]. There is also SBGN support for tool developers [136].