1 Introduction

In textbooks, linear algebra expressions often use indices to access the entries of vectors and matrices, or to sum over certain dimensions. These expressions can be translated directly into loops over indices in the program code, which, however, is often inefficient [1, 6]. It is more efficient to map expressions with indices to highly tuned parallel linear algebra libraries like NumPy [9] or Eigen [8]. These libraries expect their input in index-free form. Therefore, in order to use efficient linear algebra libraries, expressions in index form need to be transformed into index-free form. We present an implementation of this approach, which we call Lina. Lina comprises three parts:

  1. A formal input language close to textbook form (Sect. 2),

  2. the transformation from index form into index-free form (Sect. 3), and

  3. mappings to linear algebra libraries and frameworks (Sect. 4).

The transformation from index form to index-free form is the most challenging part and requires a good understanding of fundamental linear algebra. Consider for example the classical ridge regression problem [13]. Given a feature matrix \(X\in \mathbb {R}^{n\times m}\), a label vector \(y\in \mathbb {R}^n\), a parameter vector \(\beta \in \mathbb {R}^m\), and hyperparameters \(\mu , \lambda \in \mathbb {R}\), the ridge regression problem in textbook form reads as

$$\begin{aligned} \min _{\beta }\quad \sum _{i=1}^n \Big (y_i - \mu - \sum _{j=1}^m X_{ij}\beta _j\Big )^2 + \lambda \sum _{j=1}^m \beta _j. \end{aligned}$$

This expression includes three summation operations, each of which becomes a loop in the implementation. Transforming the expression into index-free form results in

$$\begin{aligned} \min _{\beta }\quad (y-\mu \cdot \mathbbm {1}^n-X\beta )^\top (y-\mu \cdot \mathbbm {1}^n-X\beta )+\lambda \cdot \beta ^\top \mathbbm {1}^m, \end{aligned}$$

where \(\mathbbm {1}^n=(1,1,\dots ,1)\) is the all-ones vector. Since the index-free expression contains only compound linear algebra operations, we can map it directly to linear algebra libraries. However, developers usually do not take the time to formulate their problems in index-free form, even though using a highly optimized linear algebra library would lead to better performance. Here, our focus is on automatically transforming expressions into index-free form. We further optimize the resulting index-free expressions before we map them to Eigen and NumPy routines. An easy-to-use implementation of our approach is available online at https://lina.ti2.uni-jena.de.

Related Work. Various approaches already exist for mapping expressions in index-free form to linear algebra libraries [7, 14, 15]. Often, such methods make use of additional information about the expressions’ variables and parameters, for example, the symmetry of matrices [2, 16]. There are also several approaches for generating efficient code for expressions in index form [3, 4, 12]. These approaches do not transform expressions into index-free form, but directly optimize the loops, for instance by reordering them.

2 A Language for Linear Algebra Expressions

In this section, we describe a formal language for extended linear algebra expressions in index form. The notation used in this language is close to MATLAB [10]. It is rich enough to cover most classical machine learning problems, even problems not contained in standard libraries like scikit-learn [5].

Fig. 1. EBNF grammar for linear algebra expressions in index form. In this grammar, number is a placeholder for an arbitrary floating-point number and alpha for Latin characters.

The language supports binary operations as well as unary point-wise operations like \(\log \) or \(\exp \), and, of course, variables and numbers. A special operation is the summation operation sum, which takes a mandatory index. This index is used to address elements of vectors or matrices. The full grammar for the language is shown in Fig. 1. In this language, the classical ridge regression example reads as

$$\begin{aligned} {\text {sum}}[i]((y[i] - \mu - {\text {sum}}[j](X[i,j]*\beta [j]))^2) + \lambda *{\text {sum}}[j](\beta [j]). \end{aligned}$$

A point worth emphasizing is that the indices always select scalar entries of a vector or matrix. This makes every operation an operation between scalars, in contrast to index-free notation, where operations act on compound structures.

Expressions that follow the above grammar are parsed into an expression tree. An expression tree \(G=(V,E)\) is a binary tree in which each node \(v\in V\) has a specific label. This label can be either an operation, a variable name, a number, or an index. Furthermore, we assign each node a scope, a set of indices. For leaf nodes that describe vectors and matrices, the scope contains the associated indices, and for all other nodes, except for the special sum nodes, the scope is the union of the scopes of the child nodes. Since the sum operation removes an index, the scope of a sum node is the union of its children’s scopes minus the summation index. An expression tree for the ridge regression example is shown in Fig. 2; a minimal sketch of the scope propagation follows below.

Fig. 2. Expression tree for the ridge regression problem with different node types. Bold nodes indicate \({\text {sum}}\) operations, dashed nodes are indices, and all other nodes are either common operators or variables. For some nodes, we show the scope of the node in dotted rectangles.
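To make the scope rule concrete, the following minimal Python sketch (illustrative only; the class and its names are ours, not Lina's actual data structure) propagates scopes through an expression tree:

```python
class Node:
    """Expression tree node that computes its scope on construction."""

    def __init__(self, label, children=(), indices=()):
        self.label = label
        self.children = list(children)
        if label == "sum":
            # children = (index node, summand); the sum removes its index
            index, body = children
            self.scope = body.scope - {index.label}
        elif children:
            # inner node: union of the children's scopes
            self.scope = frozenset().union(*(c.scope for c in children))
        else:
            # leaf: variable, number, or index with its associated indices
            self.scope = frozenset(indices)

# Scope of sum[j](X[i,j] * beta[j]): the sum removes index j, leaving {'i'}.
X = Node("X", indices=("i", "j"))
beta = Node("beta", indices=("j",))
prod = Node("*", children=(X, beta))         # scope {'i', 'j'}
s = Node("sum", children=(Node("j"), prod))  # scope {'i'}
print(sorted(prod.scope), sorted(s.scope))   # ['i', 'j'] ['i']
```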

3 Transformation from Index Form into Index-Free Form

The main part of Lina is the automatic transformation of expressions from index form into index-free form. In index form, all operations are operations on scalars, whereas in index-free form, operations are on compound structures like vectors or matrices. For example, the multiplication \(X_{ij}\cdot \beta _j\) multiplies the value of X at index \((i,j)\) with the value of \(\beta \) at index j. We can collect these values into an \((n\times m)\)-matrix \((X_{ij}\cdot \beta _j)_{i\in [n], j\in [m]}\). This matrix can be written as a point-wise product of two matrices, where X is the first matrix and the outer product \(\mathbbm {1}^n\beta ^\top \) is the second matrix. Indeed, we compute

$$\begin{aligned} \begin{pmatrix} X_{11}\cdot \beta _1 & \dots & X_{1m}\cdot \beta _m \\ \vdots & \ddots & \vdots \\ X_{n1}\cdot \beta _1 & \dots & X_{nm}\cdot \beta _m \end{pmatrix}&= \begin{pmatrix} X_{11} & \dots & X_{1m} \\ \vdots & \ddots & \vdots \\ X_{n1} & \dots & X_{nm} \end{pmatrix} \odot \begin{pmatrix} \beta _1 & \dots & \beta _m \\ \vdots & \ddots & \vdots \\ \beta _1 & \dots & \beta _m \end{pmatrix}\\ &= \begin{pmatrix} X_{11} & \dots & X_{1m} \\ \vdots & \ddots & \vdots \\ X_{n1} & \dots & X_{nm} \end{pmatrix} \odot \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}\cdot \begin{pmatrix} \beta _1 \\ \vdots \\ \beta _m \end{pmatrix}^\top . \end{aligned}$$
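As a quick numerical illustration of this identity (our example, not from the paper), the outer product \(\mathbbm {1}^n\beta ^\top \) lifts \(\beta \) to a matrix so that the product becomes point-wise:

```python
import numpy as np

n, m = 4, 3
X = np.random.rand(n, m)
beta = np.random.rand(m)

lifted = np.outer(np.ones(n), beta)  # the matrix 1^n beta^T
A = X * lifted                       # point-wise product; A[i, j] = X[i, j] * beta[j]

# sanity check against the direct index computation
B = np.array([[X[i, j] * beta[j] for j in range(m)] for i in range(n)])
assert np.allclose(A, B)
```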

The idea of increasing the dimension of a subexpression by an outer product to enable a point-wise operation works directly for vectors, but not for matrices. In these cases, we use the scope assigned to each node. To work with the scopes, we switch to an Einstein-like notation. In this notation, each multiplication is represented by a tuple \((m, l, r)\) that contains the dimension m of the result of the operation, the dimension l of the left operand, and the dimension r of the right operand. The following equations show how to represent the different multiplication types in Einstein-like notation as well as in linear algebra notation. Let \(x\in \mathbb {R}^n\) and \(y\in \mathbb {R}^n\), then

$$\begin{aligned} x^\top \cdot y&= x \cdot _{(\emptyset , n, n)} y,&\text {(inner product)}\\ x\odot y&= x \cdot _{(n, n, n)} y,&\text {(point-wise multiplication)}\\ x\cdot y^\top&= x \cdot _{(nn, n, n)} y.&\text {(outer product)} \end{aligned}$$

The scope of a node and the scopes of its left and right child directly correspond to the entries of the tuple. This notation enables us to describe the multiplication type needed during the transformation. The transformation starts at the root and recursively adjusts the left and right children of each node to satisfy its index requirements. If we encounter a node, other than a sum node, whose left or right child does not have the node's scope, we adjust the child's scope by adding a multiplication node to the respective subtree. The new multiplication node multiplies the old subtree with an all-ones vector to supply the missing index. In our example, we multiplied an all-ones vector with \(\beta \) to supply the dimension represented by index i. The sum node is special since it removes an index from the scope. This can be accomplished by an inner product with an all-ones vector. We therefore relabel sum nodes as multiplication nodes and change their left child nodes, which represent an index, into all-ones vectors with the corresponding indices. A much-simplified sketch of this rewriting is shown below.
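The following Python sketch illustrates the two rewrites; the tuple-based tree representation and the helper names are our own simplification, not Lina's implementation:

```python
# Trees are tuples: ("var", name, scope) or ("index", name, scope) for
# leaves, and (op, left, right, scope) for inner nodes; scopes are
# frozensets of index names.

def lift(child, target_scope):
    """Outer-multiply with all-ones vectors to supply missing indices."""
    for idx in target_scope - child[-1]:
        ones = ("var", "ones_" + idx, frozenset({idx}))
        child = ("*", ones, child, child[-1] | {idx})
    return child

def transform(node):
    """Recursively rewrite sum nodes and adjust child scopes."""
    if node[0] in ("var", "index"):
        return node
    op, left, right, scope = node
    left, right = transform(left), transform(right)
    if op == "sum":
        # the left child is the index; the sum becomes an inner product
        ones = ("var", "ones_" + left[1], frozenset({left[1]}))
        return ("*", ones, right, scope)
    # otherwise both children must cover the node's scope point-wise
    return (op, lift(left, scope), lift(right, scope), scope)

# sum[j](X[i,j] * beta[j]) with scopes attached:
X = ("var", "X", frozenset({"i", "j"}))
beta = ("var", "beta", frozenset({"j"}))
prod = ("*", X, beta, frozenset({"i", "j"}))
expr = ("sum", ("index", "j", frozenset()), prod, frozenset({"i"}))
# beta is lifted by ones_i (outer product) and the sum turns into an
# inner product with ones_j:
print(transform(expr))
```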

In summary, we use outer products to transform binary operations over unequal dimensions into point-wise operations over equal dimensions, and inner products to reduce a dimension for sum operations. The transformed expression tree for the ridge regression example is shown in Fig. 3. There, we highlight added and adjusted nodes in gray. Note that sum nodes have turned into multiplication nodes. The semantics of a product node can be inferred from its scope.

Fig. 3. Expression tree for the ridge regression problem after the transformation into index-free form. Nodes that have been transformed are highlighted in gray. For clarity, we still show the index nodes, although they are no longer needed.

4 Compilation into Multiple Backends

During the transformation from index to index-free form, the nodes of the expression tree are replaced by compound subexpressions, which can make the tree unnecessarily large. Listing 1 shows the resulting non-optimized index-free NumPy code for the ridge regression example.

Listing 1. Non-optimized index-free NumPy code for the ridge regression problem.
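A plausible shape of such non-optimized output (an illustrative reconstruction under our assumptions, not the paper's actual Listing 1): the all-ones vectors are materialized and the residual subexpression is computed twice.

```python
import numpy as np

# Illustrative reconstruction: a direct translation of the index-free
# expression with explicit all-ones vectors and a duplicated residual.
def ridge_objective_unoptimized(X, y, beta, mu, lam):
    n, m = X.shape
    return ((y - mu * np.ones(n) - X @ beta)
            @ (y - mu * np.ones(n) - X @ beta)
            + lam * (beta @ np.ones(m)))
```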

Therefore, before we compile the expression tree into the different backends, we perform various optimizations that reduce the size of the tree. For instance, we perform standard constant folding, and we identify common subexpressions, extract them, and link nodes in the tree to the corresponding common subexpression. Furthermore, we exploit the broadcasting operations of the individual backends like NumPy and Eigen. For example, the subexpression \(y-\mathbbm {1}^n\cdot \mu \) in index-free form can be written as \(y - \mu \) in most backends. In this case, no extra all-ones vector has to be created, which speeds up the computation and reduces the memory footprint. Since broadcasting operations differ between backends, we apply them during the compilation into each backend where possible.
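An illustrative NumPy check of the broadcasting rewrite (our example, not Lina output):

```python
import numpy as np

n = 5
y = np.random.rand(n)
mu = 0.5

explicit = y - mu * np.ones(n)  # index-free form with an explicit all-ones vector
broadcast = y - mu              # broadcasting avoids allocating that vector
assert np.allclose(explicit, broadcast)
```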

Lina currently compiles to NumPy and Eigen. The compilation traverses the index-free expression tree from the root to the leaves and replaces nodes by methods from the respective backend. At a leaf node, the name of the node is returned, and at a multiplication node, the multiplication type is determined from the node's scope. During the transformation, higher-dimensional tensors may arise that cannot be expressed by matrix-vector operations. There are several approaches for mapping them efficiently to code [11, 17]. For instance, NumPy directly allows us to compute tensor subexpressions using the einsum method. To do so, we use the scope of a node and the scopes of its left and right child to compute an einsum string corresponding to the operation (see the sketch below). The optimized compiled NumPy code for the ridge regression problem is shown in Listing 2.
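A minimal sketch of this step, assuming scopes are given as ordered tuples of index names (the helper function is hypothetical, not Lina's actual routine):

```python
import numpy as np

def einsum_string(result_scope, left_scope, right_scope):
    """Build an einsum specification from the (m, l, r) scope tuple."""
    return "{},{}->{}".format("".join(left_scope), "".join(right_scope),
                              "".join(result_scope))

# Point-wise product with scope tuple (('i', 'j'), ('i', 'j'), ('j',)):
X = np.random.rand(4, 3)
beta = np.random.rand(3)
spec = einsum_string(("i", "j"), ("i", "j"), ("j",))  # "ij,j->ij"
assert np.allclose(np.einsum(spec, X, beta), X * beta)
```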

Listing 2. Optimized NumPy code for the ridge regression problem.
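Along the lines described above, the optimized code plausibly looks as follows (again an illustrative reconstruction, not the paper's actual Listing 2): the common subexpression is computed once, and broadcasting replaces both all-ones vectors.

```python
import numpy as np

def ridge_objective(X, y, beta, mu, lam):
    t = y - mu - X @ beta  # common subexpression; broadcasting replaces mu * ones(n)
    return t @ t + lam * beta.sum()

X, y, beta = np.random.rand(100, 5), np.random.rand(100), np.random.rand(5)
print(ridge_objective(X, y, beta, mu=0.1, lam=0.01))
```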

5 Conclusion

We have presented Lina, a tool for automatically compiling extended linear algebra expressions with indices into efficient linear algebra routines. Our main contribution is the transformation of expressions with indices into index-free form. We map this index-free form to NumPy and Eigen routines, which exploit the parallelization and vectorization capabilities of the underlying hardware. Lina is available at https://lina.ti2.uni-jena.de.