Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Visualizing time-varying relationships between entities using converging and diverging curves on a timeline has received a considerable amount of interest recently. The ability to display interactions among entities, while at the same time being able to put these in a chronological context has found applications beyond its initial purpose which coined its name. Munroe [26] introduced the storyline visualization as hand-drawn illustrations in xkcd’s “Movie Narrative Charts”, where lines represent the characters of various popular movies and the scenes are ordered chronologically and represented by bundling the lines of the corresponding characters. This concept has been used to visualize various spatiotemporal data, like communities in time-varying graphs [25, 33], software projects [28], topic analysis [9], etc.

However, hand crafted or semi-automated methods are limited in their applicability in a world of ever growing datasets. In order to obtain a storyline visualization automatically, Tanahashi and Ma [32] discuss various aspects of a well-designed storyline visualization and present an evolutionary algorithm that incorporates these in its objective function. They identify three important criteria that one usually wants to optimize: line crossings, whose number should be small, line wiggles, that should be avoided by drawing every chronological line as straight as possible, and space efficiency. Based on these aspects, Liu et al. [24] describe a technique which further improves the layout and runs significantly faster compared to the evolutionary algorithm in [32]. Being able to create storyline visualizations of bigger instances, Tanahashi et al. [31] take this one step further and show how to create storyline visualizations from streaming data.

In this paper, we study the crossing minimization problem in storyline visualization from a combinatorial optimization point of view. While most approaches tackle this problem with heuristics, Kostitsyna et al. [22] recently shed some light onto its combinatorial properties. Besides noting that the decision problem is NP-complete (by reduction from bipartite crossing number), they provide a lower bound for the number of crossings in a restricted variant of the problem and show that the general problem is fixed-parameter tractable in the number of characters. But a straightforward implementation of the algorithm is impractical, even for a small number of characters.

However, the problem is also similar to a few already well-studied problems in graph-drawing. It may seem that the problem is related to a special case of metro-line crossing minimization, in particular, the so called two-sided models in which metro-lines run only from left to right [5, 12]. However, all metro-line crossing minimization problems have in common that they are defined on a rail network whose embedding is fixed due to its geographical context. This difference makes a straightforward transformation difficult.

As already observed by Kostitsyna et al. [22], storyline crossing minimization has a strong relationship to multi-layer crossing minimization (MLCM). Here each node of the graph is assigned to one of the layers (parallel straight lines) in such a way that each edge connects two vertices on consecutive layers. The aim is to find an ordering of the nodes on every layer such that the total number of edge crossings is minimized. Although the corresponding planarity testing problem is linear-time solvable [19], the MLCM problem itself remains NP-hard even when restricted to two layers [13]. This led to the development of various heuristics, but also to exact approaches [6, 8, 16, 18].

In order to exploit existing techniques for solving MLCM instances, a straightforward transformation can be sketched as follows. We represent the characters as paths in an MLCM instance, in which the layers mark important points in time, e.g., a new bundle has to be created. Of course, a bundle of lines (paths) requires the corresponding vertices to be consecutive on the layer, a constraint which is problematic in the general MLCM setting.

But we can borrow ideas from another crossing minimization problem type, the so called tanglegrams. The general tanglegram problem consists of two trees and a set of edges connecting the leaves of one tree with the leaves of the other, i.e., the leaves and the connecting edges form a bipartite graph. The objective is essentially to perform a bipartite (or two-layer) crossing minimization with the additional constraint that leaves of the same subtree appear consecutively on the layers. However, when consulting the literature on tanglegrams, attention must be paid to the details. Some definitions require the trees to be binary, while others restrict the edge set to be a perfect matching, or both [7, 11, 27]. Since the focus of the paper is not on tanglegrams, we restrict ourselves to the general case. Here two works are of interest, Baumann et al. [4] describe an ILP-based approach, whereas Wotzlaw et al. [34] employ a SAT-formulation. However, not only the techniques differ, in [4] only two layers are considered, whereas the SAT approach in [34] works on multiple layers but requires that the tree constraints are k-ary with \(k>1\) fixed.

The related problem of testing level planarity under tree constraints is discussed by Angelini et al. [1]. They show that if edges are restricted to run between consecutive layers, then the problem can be solved in quadratic time, whereas if this restriction does not hold, the problem is NP-complete.

In this paper we solely focus on the crossing minimization problem in storyline visualization. Therefore, we neglect other design aspects and restrict ourselves to the combinatorial problem, i.e., determining an ordering of the lines such that the number of crossings is minimum. We model this problem as a special variant of the MLCM problem under tree constraints and provide an ILP formulation for it. Computational results show that we are able to solve instances of moderate size to optimality within a few seconds. Moreover, we provide solutions for storyline instances from the literature, some of which have been solved to optimality for the first time. These are of particular value, since they offer a reference when comparing the crossing minimization performance of heuristics.

2 Modelling Storyline Visualization as Multi-layer Crossing Minimization with Tree Constraints

We begin with a formal definition of the multi-layer crossing minimization problem with tree constraints (MLCM-TC). The input for MLCM-TC consists of a graph \(G=(V,E,\mathcal {T})\), where the set of the nodes \(V=\bigcup _{r=1}^p V_r\) is partitioned into p different layers. \(E=\bigcup _{r=1}^{p-1} E_r\) is the set of the edges such that \(E_r\subseteq V_r\times V_{r+1}\) for every \(r\in \{1,2,\ldots ,p-1\}\), i.e., each edge of \(E_r\) has one end in \(V_r\) and the other in \(V_{r+1}\). \(\mathcal {T}=\{T_{r}\mid r=1,2,\ldots ,p\}\) is a family of rooted trees with at least one internal node (root node), whose leaves are exactly the nodes of \(V_r\). In the following, whenever we consider a graph, we implicitly assume that it is of this type, which is known in the literature as “(proper) \(\mathcal {T}\)-level graph” [1].

Given an instance \(G=(V,E,\mathcal {T})\) of MLCM-TC, the task is to determine, for each layer \(r\in \{1,2,\ldots ,p\}\), permutations \(\pi _r=\langle v_1,v_2,\ldots ,v_{\left| V_r\right| }\rangle \) of the nodes in \(V_r\) such that for each internal node \(\tau \) of \(T_r\), all t leaves in the subtree rooted at \(\tau \) are adjacent in \(\pi _r\), i.e., they form a sub-permutation \(\langle v_i,v_{i+1},\ldots ,v_{i+t-1}\rangle \) for some \(i\in \{1,2,\ldots ,\left| V_r\right| -t+1\}\).

An easy reduction of the NP-hard MLCM problem to the MLCM-TC problem (add a trivial tree with the root as the only internal node to each layer) shows that MLCM-TC is NP-hard. This justifies the usage of integer programming techniques in the next section. Now we give a formal description of the storyline visualization problem in order to support our hypothesis that MLCM-TC captures its core when the criteria “line wiggle avoidance” and “space efficiency” are neglected in favour of crossing minimization. A story consists of a set of characters \(C=\{c_1,c_2,\ldots ,c_n\}\) and a set of scenes \(S\subseteq 2^C\). For each scene \(s\in S\), \(b_s\) and \(e_s\) are the points in time when s begins and ends, respectively. The time intervals \([b_{s_1},e_{s_1}]\) and \([b_{s_2},e_{s_2}]\) of two distinct scenes \(s_1\) and \(s_2\) may have a non-empty intersection, but if they do, we require \(s_1\cap s_2=\emptyset \).

The storyline visualization problem requires depicting each character \(c\in C\) by a curve in the Euclidean plane that is strictly monotone on the time axis that we arbitrarily fix to the horizontal x-axis. The curve begins at the x-coordinate \(x_c^b=\min \{b_s\mid c\in s\}\) and ends at \(x_c^e=\max \{e_s\mid c\in s\}\). We call the interval \([x_c^b,x_c^e]\) the lifespan of character c.

The curves must be such that for every scene \(s=\{c_{\sigma _1},c_{\sigma _2},\ldots ,c_{\sigma _k}\}\in S\) the k corresponding curves in the interval \([b_s,e_s]\) are horizontal parallel lines that are equally spaced with vertical distance 1. Furthermore, the curves of all \(c\not \in s\) are restricted to y-coordinates that have an absolute difference of at least 2 to the y-coordinates of the curves \(c_{\sigma _i}\in s\) in the interval \([b_s,e_s]\), and to all curves for characters that are not members of any scene that intersects with \([b_s,e_s]\). An example is given in Fig. 1.

Fig. 1.
figure 1

An example of a story with four scenes and four characters, where characters \(c_3\) and \(c_4\) enter late, character \(c_2\) leaves early, and the time intervals \([b_{s_2},e_{s_2}]\) and \([b_{s_3},e_{s_3}]\) have a non-empty intersection.

Given a story \((C,S,\{[b_s,e_s]\mid s\in S\})\), we construct an MLCM-TC instance \(G=(V,E,\mathcal {T})\) as follows:

  1. 1.

    Sort the points in time \(\{b_s\mid s\in S\}\cup \{e_s\mid s\in S\}\) in non-decreasing order, and let \(\langle t_1,t_2,\ldots ,t_p\rangle \) be the sorted sequence.

  2. 2.

    Associate a layer \(V_r\) with each \(t_r\) (\(r\in \{1,2,\ldots ,p\}\)), create a node \(v_{c,r}\) for each character c for which \(t_r\) is within its lifespan, i.e., for which \(t_r\in [x_c^b,x_c^e]\), and let \(V_r=\{v_{c,r}\mid t_r\in [x_c^b,x_c^e]\}\).

  3. 3.

    Let \(V=\bigcup _{r=1}^p V_r\).

  4. 4.

    Let \(E=\big \{\{v_{c,r},v_{c,r+1}\}\text { for all }c\in C\text { such that }t_r,t_{r+1}\in [x_c^b,x_c^e]\big \}\).

  5. 5.

    For each layer \(V_r\) create a tree \(T_r\) as follows:

    1. i.

      For each scene \(s=\{c_{\sigma _1},c_{\sigma _2},\ldots ,c_{\sigma _k}\}\) such that \(t_r\in [b_s,e_s]\) create an internal tree node \(v_{s,r}\) and tree edges \(\{v_{s,r},v_{c_{\sigma _i},r}\}\) for all \(i\in \{1,2,\ldots ,k\}\).

    2. ii.

      Unless the above results in a rooted tree with all nodes in \(V_r\) as leaves, create a tree root \(\rho _r\) and tree edges connecting \(\rho _r\) to all previously created internal tree nodes of \(T_r\) and to all character nodes in \(V_r\) that are not joined to a previously added internal tree node.

In Fig. 2, we demonstrate the construction for our example instance from Fig. 1.

Fig. 2.
figure 2

The MLCM-TC instance of the story of Fig. 1.

Notice that the trees in \(\mathcal {T}\) are all of height up to 2, which means that storyline visualization instances yield a special subclass of MLCM-TC instances. By construction, an optimal solution of this MLCM-TC instance induces a storyline visualization with the minimum number of crossings, and, conversely, any instance of this special MLCM-TC subclass with trees of height up to 2 is the result of the given transformation for some story. Thus, both problems are equivalent. As MLCM can be reduced to this special subclass, NP-hardness is maintained.

3 Integer Linear Programming Formulation

We present an integer linear programming (ILP) formulation of MLCM-TC. ILP formulations have already been introduced for the general MLCM problem [16, 18] as well as for MLCM-TC, when restricted to the special case of two layers only [4]. Both models use quadratic ordering formulations. In this section, we will extend these formulations to an ILP model for MLCM-TC.

To this end, let \(G=(V,E,\mathcal {T})\) be an instance of MLCM-TC, as described in Sect. 2. For every layer \(r\in \{1,2,\ldots ,p\}\), let \(V_r^{(2)}=\{(i,j)\in V_r\times V_r:i<j\}\) be the set of all the ordered pairs of nodes on the considered layer with the first index smaller than the second. As the total number of edge crossings is the sum of all crossings in adjacent layers r and \(r+1\), summed up for all \(r\in \{1,2,\ldots ,p-1\}\), let us consider the problem for a pair of adjacent layers r and \(r+1\), with \(r\in \{1,2,\ldots ,p-1\}\).

A permutation of the nodes in \(V_r\) is characterized by variables \(x_{ij}^r\in \{0,1\}\) associated with the pairs \((i,j)\in V_r^{(2)}\) as follows:

$$\begin{aligned} x_{ij}^r=1\quad \text {if and only if }i \text { is placed above }j \text { on layer }r. \end{aligned}$$

Then a pair of edges \((i,k),(j,\ell )\in E_r\) crosses if and only if

$$\begin{aligned} i \text { is placed above }j \text { on layer }r \text { and }\ell \text { is placed above }k \text { on layer }r+1 \end{aligned}$$

or

$$\begin{aligned} j \text { is placed above }i \text { on layer }r \text { and }k \text { is placed above }\ell \text { on layer }r+1, \end{aligned}$$

see Fig. 3.

Fig. 3.
figure 3

An edge pair crosses in two of four cases.

Therefore, if \(\{x_{ij}^r\mid (i,j)\in V_r^{(2)}\}\) and \(\{x_{k\ell }^{r+1}\mid (k,\ell )\in V_{r+1}^{(2)}\}\) describe node permutations on layers r and \(r+1\), respectively, we have

$$\begin{aligned} c_{ijk\ell }:=x_{ij}^r(1-x_{k\ell }^{r+1})+(1-x_{ij}^r)x_{k\ell }^{r+1}\in \{0,1\} \end{aligned}$$

and \(c_{ijk\ell }=1\) if and only if the edges (ik) and \((j,\ell )\) cross.

It is well known (see, e.g., [14]) that \(\{x_{ij}^r\in \{0,1\}\mid (i,j)\in V_r^{(2)}\}\) characterizes a node permutation on \(V_r\) if and only if the transitivity conditions

$$\begin{aligned} 0\le x_{hi}^r+x_{ij}^r-x_{hj}^r\le 1\qquad (h<i<j) \end{aligned}$$

are satisfied for all \(r\in \{1,2,\ldots ,p\}\).

It remains to model the tree conditions implied by the elements of \(\mathcal {T}\). Given a layer \(r\in \{1,2,\ldots ,p\}\) and two nodes i and j in \(V_r\), we denote by P(ij) the lowest common ancestor of i and j in \(T_r\). Let \(V_r^{(3)}=\{(h,i,j)\in V_r\times V_r\times V_r:h<i<j\}\). For every \(r\in \{1,2,\ldots ,p\}\) and every triple \((h,i,j)\in V_r^{(3)}\), we impose the tree constraints

$$\begin{aligned} \begin{array}{ll} x_{hj}^r=x_{ij}^r&{}\qquad \text {if }P(h,i)\ne P(P(h,i),j), \\ x_{hi}^r=x_{hj}^r&{}\qquad \text {if }P(i,j)\ne P(h,P(i,j)). \end{array} \end{aligned}$$

The first equation forbids the placement of j between h and i in case j does not belong to the smallest subtree containing h and i. Similarly, the second equation forbids the placement of h between i and j in case h is not contained in the smallest subtree of i and j.

Putting it all together, we obtain the following model for MLCM-TC based on a combination of [18] for MLCM and [4] for the special case of MLCM-TC for two layers:

$$\begin{aligned} \text {minimize}\qquad \sum \limits _{r=1}^{p-1}\;\sum \limits _{\mathop {(i,k),(j,\ell )\in E_r}\limits ^{\scriptstyle {(i,j)\in V_r^{(2)},\;(k,\ell )\in V_{r+1}^{(2)}}}}x_{ij}^r(1-x_{k\ell }^{r+1})+(1-x_{ij}^r)x_{k\ell }^{r+1} \end{aligned}$$

subject to

$$\begin{aligned} \begin{array}{c@{\quad }ll} 0\;\le \;x_{hi}^r+x_{ij}^r-x_{hj}^r\le 1&{}\text {for all }r\in \{1,2,\ldots ,p\}&{}\text { and }(h,i,j)\in V_r^{(3)}\\ x_{hj}^r=x_{ij}^r&{}\text {for all }r\in \{1,2,\ldots ,p\}&{}\text { and }(h,i,j)\in V_r^{(3)}\\ &{}&{}\text { if }P(h,i)\ne P(P(h,i),j)\\ x_{hi}^r=x_{hj}^r&{}\text {for all }r\in \{1,2,\ldots ,p\}&{}\text { and }(h,i,j)\in V_r^{(3)}\\ &{}&{}\text { if }P(i,j)\ne P(h,P(i,j))\\ x_{ij}^r\in \{0,1\}&{}\text {for all }r\in \{1,2,\ldots ,p\}&{}\text { and }(i,j)\in V_r^{(2)}. \end{array} \end{aligned}$$

This is a quadratic 0–1-programming problem with linear constraints, namely, the transitivity conditions and the tree conditions. (Without the tree conditions, the problem is also called a quadratic linear ordering problem.)

When we temporarily ignore the transitivity conditions and the tree conditions, the remaining problem is known as quadratic 0–1-optimization of the form

$$\begin{aligned} \begin{array}{rl} \text {minimize}&{}z^TQz+q^Tz\\ \text {s.t.}&{}z\in \{0,1\}^N \end{array} \end{aligned}$$

for an upper triangular matrix \(Q\in \mathbb {Z}^{N\times N}\) and a vector \(q\in \mathbb {Z}^N\). A well known construction of Hammer [15], see also [2, 10, 23], results in an equivalent formulation as a maximum cut problem on a graph \(G_\textit{mc}=(V_ \textit{mc},E_ \textit{mc})\) with \(N+1\) nodes, all but one are identified with the \(z_i\), \(i\in \{1,2,\ldots ,N\}\). Let us call the additional node \(z_0\), so \(V_ \textit{mc}=\{z_0,z_1,\ldots ,z_N\}\). The undirected edges \((z_i,z_j)\), \(1\le i<j\le N\), correspond to the nonzero entries of the matrix Q, and there are additional N edges \((z_0,z_i)\) for \(1\le i\le N\), giving the edge set \(E_ \textit{mc}\). The edge weights \(w_e=w_{ij}\), \(0\le i<j\le N\), are easily computed from Q and q. For \(W\subseteq V_ \textit{mc}\) the edge set \(\delta (W)=\{(i,j)\in E_ \textit{mc}\mid i\in W,\,j\in V_ \textit{mc}\setminus W\}\) is called a cut in \(G_\textit{mc}\). Then the resulting maximum cut problem has the form

$$\begin{aligned} \max \{w(\delta (W))\mid W\subseteq V_ \textit{mc}\}. \end{aligned}$$

By introducing variables \(y_e\in \{0,1\}\) for each \(e\in E_\textit{mc}\), the maximum cut problem can be formulated as

$$\begin{aligned} \text {maximize}\qquad \sum \limits _{e\in E_\textit{mc}}w_ey_e \end{aligned}$$

subject to

$$\begin{aligned} \begin{array}{c@{\quad }l} \sum \limits _{e\in F}y_e-\sum \limits _{e\in C\setminus F}y_e\le |F|-1&{}\text {for all cycles }C\subseteq E_\textit{mc}\text { and all }F\subseteq C,\,|F|\text { odd}\\ y_e\in \{0,1\}&{}\text {for all }e\in E_\textit{mc}\,, \end{array} \end{aligned}$$

see [3]. The constraints are called odd cycle constraints.

Applying this transformation is the key to our algorithm: The edges \(e\in E_\textit{mc}\) not incident to \(z_0\) correspond to edge pairs \((i,k),(j,\ell )\in E_r\), \(r\in \{1,2,\ldots ,p-1\}\). The edges \(e\in E_\textit{mc}\) that are incident to \(z_0\) correspond to our variables \(x_{ij}^r\) for \(r\in \{1,2,\ldots ,p\}\), \(i<j\). In view of the latter property, we can formulate MLCM-TC as a maximum cut problem with the additional transitivity and tree constraints, and we can solve it using a branch and cut approach for the maximum cut problem like in [2] that additionally enforces these extra constraints.

4 Implementation

The implementation used to determine the minimum number of crossings in a storyline visualization consists of two main phases, a preprocessing phase and a branch and cut phase. During the preprocessing, we first reduce the number of layers of the problem (if possible), by identifying two consecutive layers r and \(r+1\) in case the corresponding trees \(T_r\) and \(T_{r+1}\) are identical and every node in \(V_r\) and \(V_{r+1}\) is an end of one edge of \(E_r\) (e.g., layers 4 and 5 of Fig. 2 can be identified). Then, a variant of the barycenter heuristic proposed by Sugiyama et al. [29], in which the presence of the trees on layers is taken into account, is executed in order to obtain an initial feasible solution that defines the indexing within the layers: In this heuristic, the nodes of the trees are sorted according to their barycenters. The barycenter of a given leaf t is computed by assigning to each edge, that has t as end, the relative position of the other end as weight. The barycenter of each internal node \(\tau \) is the mean of the barycenters of all the leaves of the subtree rooted at \(\tau \).

During the creation of the maximum cut graph induced by the heuristic solution, we exploit the fact that the tree constraints force many variables to assume the same value, so that we can identify them. Moreover, this procedure reduces also the number of constraints consistently after all variables have been replaced by their representatives: On the one hand, the tree constraints are not needed in the formulation anymore; on the other hand, some transitivity constraints become deactivated or redundant. It is important to point out that, during this first phase, the problem is initialized without constraints and they are added according to need during the subsequent branch and cut phase.

The branch and cut phase is realized in C++ using ABACUS [20] and CPLEX [17]. The initial relaxation consists just of the objective function together with lower bounds 0 and upper bounds 1 for the variables. Odd cycle constraints and transitivity constraints are generated via separation, the former with the same strategy as described in [2], the latter by complete enumeration.

5 Computational Results

Our test-bed consists of:

  • three movie instances [30], namely “Inception”, the original trilogy of “Star Wars” and “The Matrix”;

  • three book instances from the Stanford GraphBase database [21], namely “Adventures of Huckleberry Finn”, “Anna Karenina” and “Les Misérables”.

These instances have been converted to MLCM-TC by using the procedure described in Sect. 2. In the conversion of the book instances, a slight change is required: Since these instances do not report time intervals, but just the list of the characters involved in each scene of each chapter, a layer has been created for each of these scenes, instead of for each beginning and ending time point.

The three movie instances have been generated using the raw data set from [30] in order to compare them with results in the literature. We obtained “Inception”, “Star Wars” and “The Matrix” following the principles described in Sect. 2. However, after having solved them, we realized that the number of crossings given by our algorithm for “Inception” was 35, while it was 24 in [31] and 23 in [24]. After a careful study of the layouts provided in [24, 31, 32], we noticed that the storylines of “Inception” and “The Matrix” in [24, 31, 32] differ from the raw data set provided by [30], and therefore are not comparable with our instances.

In order to make a comparison possible, “Inception” required three major modifications. This modified instance is called “Inception-sf” and is generated by incorporating the following changes that are based on a careful study of the layouts provided in [24, 31, 32]. The storyline for the character “Mal” is allowed to take shortcuts, i.e., in long periods of absence it is drawn as a thin curve that may cross other storylines without accounting for these crossings (see Fig. 12 in [24]). Moreover, the grouping at the end of the movie does not correspond to the last scene in the data set. To keep our layout comparable, we enforced in our new instance the same grouping at the end. The third discrepancy is the number of characters. In the data from [30] there are ten characters listed in the corresponding file, whereas the layouts from the literature [24, 31, 32] contain only eight storylines, in which “Arch” and “Asian” are missing. A major modification was also necessary in “The Matrix”, where the storylines for the characters “Brown”, “Smith” and “Jones” are allowed to take shortcuts as well. We call it “The Matrix-sf”.

Since the instances “Anna Karenina” and “Les Misérables” are very big, we have split them into chapters and sequences of chapters. The resulting test-bed is made of eight chapters, seven pairs of chapters, six triples of chapters and five quadruples of chapters from “Anna Karenina”, and five chapters, four pairs of chapters and three triples of chapters from “Les Misérables”, plus the entire “Adventures of Huckleberry Finn”, “Inception-sf”, “Inception”, “Star Wars”, “The Matrix-sf”, and “The Matrix”.

To the best of our knowledge, this is the first time in which ILP techniques are applied to storyline visualizations. Thus comparisons of computational results are not possible. Runs were performed on one node of the HPC Cluster of the Computer Science Department of the University of Cologne. The node used consists of two Intel E5-2690v2 CPUs with ten cores each and 128GB RAM.

While the book instances generated from the Stanford GraphBase database are introduced here for the first time, the literature provides crossing counts for the three movie instances (“Inception”, “Star Wars”, and “The Matrix”). Table 1 shows a comparison of the minimum number of crossings (OPT) from our approach with the numbers of crossings obtained by the streaming-oriented approach from Tanahashi et al. [31] (THM), the Storyflow approach from Liu et al. [24] (LIU), and the evolutionary algorithm from Tanahashi and Ma [32] (TM). Crossing counts for THM, LIU and TM are taken from Table 3 in [31]. We can confirm that the best solution reported by Liu et al. [24] for the movie “Inception” is optimal. For “Star Wars” the approach from Tanahashi et al. [31] comes very close to the optimal solution, even though the instance is the biggest and has the highest crossing count. One may conclude that the heuristics in [24, 31] deliver solutions with a good crossing count, especially when considering the fact that they do not optimize the crossing count alone.

Table 1. Comparison of the solution of the movies.
Table 2. Information about the solution of the considered instances.
Fig. 4.
figure 4

Storyline visualizations with minimum number of crossings of the three movies from [30].

In Table 2, we report the information about the solution of the considered instances: The number of layers (p), of nodes (|V|), of edges (|E|), the minimum number of crossings (cr) in boldface or a pair [lower bound, best known number of crossings], the number of variables (\(n_{var}\)), of odd cycle constraints added during the separation (\(n_\textit{oddc}\)), of transitivity constraints added during the separation (\(n_\textit{trans}\)), of subproblems in the branch and cut tree (\(n_\textit{sub}\)), of linear programming relaxations solved (\(n_\textit{LPs}\)), and the runtime expressed in seconds (Time) where “t.l.” means that the run was aborted due to the time limit of one hour, in which cases the cr column contains an interval. While 29 of the 42 instances have been solved to optimality, for the remaining 13 instances the best lower bound for the number of crossings differs from the best solution found at timeout termination.

Fig. 5.
figure 5

Storyline visualizations of two chapters from “Anna Karenina” and “Les Misérables” [21].

When we analyze the behaviour of our algorithm, we have to distinguish between movie and book instances: Since the original instances from [30] allow more than one scene per layer, the trees on the layers of the movie instances restrict consistently the possible permutations of the corresponding nodes and consequently reduce the number of variables. On the other hand, this is not the case for the book instances, where only one scene per layer occurs. We can observe that MLCM-TC for movies tends to be much easier in comparison to a book instance with similar numbers of layers, nodes, and edges.

The difficulty of a book instance is mainly influenced by the combination of two parameters: the number of layers p and the number of nodes |V|. If the number of nodes is fixed, the higher the number of layers is, the easier the solution is, since the distribution of the nodes on more layers reduces the number of variables of the problem. On the other hand, if the number of layers is fixed, the difficulty increases with the number of nodes.

The hardest instance we have been able to solve to optimality is “anna2–4”, where \(2\,637\) nodes are distributed on only 158 layers which results in \(40\,789\) variables. The biggest solved instance in terms of number of layers is “jean1–3” with 254 layers but only \(2\,853\) nodes, which results in \(27\,720\) variables.

We present crossing minimal storyline visualizations of the three movie instances in Fig. 4 and the two book instances in Fig. 5.

6 Conclusion

In this work we have tackled the crossing minimization problem in storyline visualization via an ILP formulation. Despite being an NP-hard problem, computational results show that with our approach one can handle instances of medium size within a reasonable time frame. However, our approach is of purely combinatorial nature, thus, extending it to automatically generate storyline visualizations such that other design criteria are taken into account is not straightforward.