Abstract
In textbooks, linear algebra expressions often use indices to specify the elements of variables. Expressions in this index form cannot be directly translated into efficient code, since optimized linear algebra libraries and frameworks require expressions in index-free form. To address this problem, we developed Lina, a tool that automatically converts linear algebra expressions with indices into index-free linear algebra expressions, which we map efficiently to NumPy and Eigen code.
1 Introduction
In textbooks, linear algebra expressions often use indices to access the entries of vectors and matrices, or to sum over certain dimensions. These expressions can be translated directly into loops over indices in the program code, which, however, is often not efficient [1, 6]. It is more efficient to map expressions with indices to highly tuned parallel linear algebra libraries like NumPy [9] or Eigen [8]. These libraries expect their input in index-free form. Therefore, in order to use efficient linear algebra libraries, expressions in index form need to be transformed into index-free form. We present an implementation of this approach, which we call Lina. Lina comprises three parts:
1. A formal input language close to textbook form (Sect. 2),
2. the transformation from index form into index-free form (Sect. 3), and
3. mappings to linear algebra libraries and frameworks (Sect. 4).
The transformation from index form to index-free form is the most challenging part and requires a good understanding of fundamental linear algebra. Consider for example the classical ridge regression problem [13]. Given a feature matrix \(X\in \mathbb {R}^{n\times m}\), a label vector \(y\in \mathbb {R}^n\), a coefficient vector \(\beta \in \mathbb {R}^m\), and hyperparameters \(\mu , \lambda \in \mathbb {R}\), the ridge regression problem in textbook form reads as

\[ \min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{m} X_{ij}\,\beta_j - \mu \Bigr)^{2} + \lambda \sum_{j=1}^{m} \beta_j^{2}. \]
This expression includes three summation operations, each of which becomes a loop in the implementation. Transforming the expression into index-free form results in

\[ \min_{\beta}\; \bigl(y - X\beta - \mathbbm{1}^{n}\mu \bigr)^{\!\top}\bigl(y - X\beta - \mathbbm{1}^{n}\mu \bigr) + \lambda\, \beta^{\top}\beta, \]
where \(\mathbbm {1}^n=(1,1,\dots ,1)^\top \) is the all-ones vector. Since the index-free expression contains only compound linear algebra operations, we can map it directly to linear algebra libraries. However, developers usually do not take the time to formulate their problems in index-free form, although using a highly optimized linear algebra library would lead to better performance. Our focus here is on automatically transforming expressions into index-free form. We further optimize the resulting index-free expressions before mapping them to Eigen and NumPy routines. An easy-to-use implementation of our approach is available at https://lina.ti2.uni-jena.de.
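The performance gap between the two forms can be sketched in NumPy (the function names and variable layout below are our own illustration, not Lina's generated code, which is shown in Listing 2):

```python
import numpy as np

def ridge_loss_loops(X, y, beta, mu, lam):
    # Index form: three explicit summations, mirroring the textbook expression.
    n, m = X.shape
    total = 0.0
    for i in range(n):
        inner = sum(X[i, j] * beta[j] for j in range(m))
        total += (y[i] - inner - mu) ** 2
    return total + lam * sum(beta[j] ** 2 for j in range(m))

def ridge_loss_index_free(X, y, beta, mu, lam):
    # Index-free form: compound operations dispatched to optimized kernels.
    r = y - X @ beta - mu
    return r @ r + lam * (beta @ beta)
```

Both functions compute the same value, but the index-free version delegates all loops to the library.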
Related Work. Various approaches already exist for mapping expressions in index-free form to linear algebra libraries [7, 14, 15]. Often such methods make use of additional information about the expressions’ variables and parameters, for example, symmetry of matrices [2, 16]. Multiple attempts are also known to generate efficient code for expressions in index form [3, 4, 12]. These approaches do not transform expressions into index-free form, but optimize the loops directly, for instance by reordering them.
2 A Language for Linear Algebra Expressions
In this section, we describe a formal language for extended linear algebra expressions in index form. The notation used in this language is close to MATLAB [10]. It is rich enough to cover most classical machine learning problems, even problems not contained in standard libraries like scikit-learn [5].
The language supports binary operations, unary point-wise operations like \(\log \) or \(\exp \), and, of course, variables and numbers. A special operation is the summation operation sum, which takes a mandatory index. This index is used to address elements of vectors or matrices. The full grammar for the language is shown in Fig. 1. In this language, the classical ridge regression example reads as
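The exact surface syntax is defined by the grammar in Fig. 1; as that figure is not reproduced here, the following is only our own hypothetical rendering of the ridge regression problem in a MATLAB-like notation with an explicit, indexed sum operator:

```
sum(i, (y(i) - sum(j, X(i,j) * beta(j)) - mu)^2) + lambda * sum(j, beta(j)^2)
```

Note how every sum carries its index, and all other operations act on scalar entries selected by indices.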
A point worth emphasizing is that the indices always select scalar entries of a vector or matrix. This makes every operation an operation between scalars, which is different in index-free notation, where operations are on compound structures.
Expressions that follow the above grammar are parsed into an expression tree. An expression tree \(G=(V,E)\) is a binary tree, where each node \(v\in V\) has a specific label. This label can be either an operation, a variable name, a number, or an index. Furthermore, we assign each node a scope containing indices. For leaf nodes describing vectors and matrices, the scope contains the associated indices, and for all other nodes, except for the special sum nodes, the scope is the union of the scopes of the child nodes. Since the sum operation eliminates an index, the scope of a sum node is the union of its children’s scopes minus the summation index. An expression tree for the ridge regression example is shown in Fig. 2.
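The scope computation described above can be sketched as a short recursion (a minimal sketch; the `Node` class and the convention of storing the bound summation index on the sum node itself are our own assumptions, not Lina's internal representation):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                          # operation, variable name, number, or index
    children: list = field(default_factory=list)
    indices: frozenset = frozenset()    # leaf: attached indices; sum: its bound index

def scope(node):
    # Leaf nodes carry the indices of the vector or matrix they name.
    if not node.children:
        return node.indices
    child_scopes = frozenset().union(*(scope(c) for c in node.children))
    if node.label == 'sum':
        # A sum node removes its bound index from the union of child scopes.
        return child_scopes - node.indices
    return child_scopes
```

For the subexpression \(X_{ij}\cdot \beta _j\), the product node has scope {i, j}, and summing over j leaves scope {i}.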
3 Transformation from Index Form into Index-Free Form
The main part of Lina is the automatic transformation of expressions from index form into index-free form. In index form expressions, all operations are operations on scalars, whereas in index-free form expressions operations are on compound structures like vectors or matrices. For example, the multiplication \(X_{ij}\cdot \beta _j\) multiplies the value of X at index (i, j) with the value of \(\beta \) at index j. We can collect these values into an \((n\times m)\)-matrix \((X_{ij}\cdot \beta _j)_{i\in [n], j\in [m]}\). This matrix can be expressed as a point-wise product of two matrices, where X is the first matrix and the outer product \(\mathbbm {1}^n\beta ^\top \) is the second. Indeed, we compute

\[ (X_{ij}\cdot \beta_j)_{i\in [n],\, j\in [m]} = X \odot \bigl(\mathbbm{1}^{n}\beta^{\top}\bigr), \]

where \(\odot \) denotes the point-wise (Hadamard) product.
The idea of increasing the dimension of a subexpression by an outer product to enable a point-wise operation works directly for vectors, but not for matrices. In these cases, we use the scope assigned to each node. To work with the scopes, we switch to an Einstein-like notation. In this notation, each multiplication is represented by a tuple (m, l, r). The tuple (m, l, r) contains the dimension m of the result of the operation, the dimension l of the left, and the dimension r of the right operand. The following equations show how to represent the different multiplication types in Einstein-like notation as well as in linear algebra notation. Let \(x\in \mathbb {R}^n\) and \(y\in \mathbb {R}^n\); then

\[ \begin{aligned} ((),\, n,\, n)&:\quad x^{\top}y &&\text{(inner product)},\\ (n,\, n,\, n)&:\quad x \odot y &&\text{(point-wise product)},\\ ((n,n),\, n,\, n)&:\quad x y^{\top} &&\text{(outer product)}. \end{aligned} \]
The scope of a node and the scopes of its left and right children directly correspond to the entries of the tuple. This notation enables us to describe the multiplication type needed during the transformation. The transformation starts at the root and recursively adjusts the left and right children of nodes to satisfy their index requirements. If we encounter a node, other than a sum node, whose left or right child does not have the same scope as the node itself, we adjust the scope by adding a multiplication node to the respective subtree. The new multiplication node multiplies the old subtree with an all-ones vector to supply the missing index. In our example, we multiplied an all-ones vector with \(\beta \) to supply the dimension represented by index i. The sum node is special since it removes an index from the scope. This can be accomplished by an inner product with an all-ones vector. We therefore relabel sum nodes as multiplication nodes, and change their left child nodes, which represent an index, into all-ones vectors with the corresponding indices.
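Both rewrites can be checked numerically in NumPy (a sketch with our own variable names; the all-ones vectors play exactly the roles described above):

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 4, 3
X = rng.standard_normal((n, m))
beta = rng.standard_normal(m)
ones_n, ones_m = np.ones(n), np.ones(m)

# Outer product 1^n beta^T lifts beta to an (n x m) matrix, so the
# point-wise product realizes (X_ij * beta_j)_{i,j}.
lifted = X * np.outer(ones_n, beta)

# The sum node sum_j(...) becomes an inner product with the
# all-ones vector 1^m, recovering the matrix-vector product X beta.
assert np.allclose(lifted @ ones_m, X @ beta)
```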
In summary, we use outer products to transform binary operations over unequal dimensions into point-wise operations over equal dimensions, and inner products to reduce a dimension for sum operations. The transformed expression tree for the ridge regression example is shown in Fig. 3, where added and adjusted nodes are highlighted in gray. Note that sum nodes have turned into multiplication nodes. The semantics of product nodes can be inferred from their scopes.
4 Compilation into Multiple Backends
During the transformation from index to index-free form, the nodes of the expression tree are replaced by compound subexpressions, which can make the tree unnecessarily large. Listing 1 shows non-optimized index-free NumPy code.
Therefore, before we compile the expression tree into the different backends, we perform various optimizations that reduce the size of the tree. For instance, we perform standard constant folding, and we identify common subexpressions, extract them, and link nodes in the tree to the corresponding common subexpression. Furthermore, we exploit the broadcasting operations of individual backends like NumPy or Eigen. For example, the subexpression \(y-\mathbbm {1}^n\cdot \mu \) in index-free form can be written as \(y - \mu \) in most backends. In this case, no extra all-ones vector has to be created, which speeds up the computation and reduces the memory footprint. Since broadcasting operations differ between backends, we apply them during compilation into each backend when possible.
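In NumPy, this broadcasting rewrite is a one-line difference (a minimal sketch of the optimization just described):

```python
import numpy as np

y = np.array([3.0, 1.0, 4.0])
mu = 0.5

# Index-free form materializes an explicit all-ones vector ...
explicit = y - np.ones_like(y) * mu

# ... while NumPy broadcasting subtracts the scalar directly,
# avoiding the temporary vector entirely.
broadcast = y - mu

assert np.allclose(explicit, broadcast)
```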
Lina currently compiles to NumPy and Eigen. The compilation traverses the index-free expression tree from the root to the leaves and replaces nodes with methods from the respective backend. At a leaf node, the name of the node is returned, and at a multiplication node the type is determined using the node’s scope. During the transformation, higher-dimensional tensors may arise that cannot be expressed by matrix-vector operations. There are several approaches to map them efficiently to code [11, 17]. For instance, NumPy directly allows us to calculate tensor subexpressions using the einsum method. To do so, we use the scope of the node and the scopes of its left and right children to compute an einsum string corresponding to the operation. The optimized compiled NumPy code for the ridge regression problem is shown in Listing 2.
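Deriving an einsum string from the (m, l, r) index tuples can be sketched as follows (the helper `einsum_string` is our own illustration, not Lina's actual implementation; shared indices absent from the result are contracted over by einsum's semantics):

```python
import numpy as np

def einsum_string(result, left, right):
    # Build an einsum subscript from the index tuples (m, l, r):
    # indices appearing in both operands but not in the result are summed out.
    return f"{''.join(left)},{''.join(right)}->{''.join(result)}"

X = np.arange(6.0).reshape(2, 3)
beta = np.array([1.0, 2.0, 3.0])

# (m, l, r) = ('i', 'ij', 'j'): a matrix-vector product contracting over j.
spec = einsum_string('i', 'ij', 'j')
assert spec == "ij,j->i"
assert np.allclose(np.einsum(spec, X, beta), X @ beta)
```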
5 Conclusion
We have presented Lina, a tool for automatically compiling extended linear algebra expressions with indices into efficient linear algebra routines. Our main contribution is the transformation of expressions with indices into index-free form. We map this index-free form to NumPy and Eigen, which exploit the parallelization and vectorization capabilities of the underlying hardware. Lina is available at https://lina.ti2.uni-jena.de.
References
Ascher, D., Dubois, P.F., Hinsen, K., Hugunin, J., Oliphant, T., et al.: Numerical Python (2001)
Barthels, H., Psarras, C., Bientinesi, P.: Linnea: automatic generation of efficient linear algebra programs. ACM Trans. Math. Softw. 47(3), 1–26 (2021)
Baumgartner, G., et al.: Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models. Proc. IEEE 93(2), 276–292 (2005)
Bilmes, J.A., Asanovic, K., Chin, C., Demmel, J.: Author retrospective for optimizing matrix multiply using PHiPAC: a portable high-performance ANSI C coding methodology. In: ACM International Conference on Supercomputing 25th Anniversary Volume. ACM (2014)
Buitinck, L., et al.: API design for machine learning software: experiences from the Scikit-learn project. In: ECML PKDD Workshop (2013)
Cai, X., Langtangen, H.P., Moe, H.: On the performance of the Python programming language for serial and parallel scientific computations. Sci. Program. 13(1) (2005)
Franchetti, F., et al.: SPIRAL: extreme performance portability. Proc. IEEE 106(11), 1935–1968 (2018)
Guennebaud, G., Jacob, B., et al.: Eigen v3. http://eigen.tuxfamily.org (2010)
Harris, C.R., et al.: Array programming with NumPy. Nature 585(7825), 357–362 (2020)
The Mathworks Inc, Natick, Massachusetts: MATLAB version R2021a (2021)
Matthews, D.A.: High-performance tensor contraction without transposition. SIAM J. Sci. Comput. 40(1), C1–C24 (2018)
Nuzman, D., et al.: Vapor SIMD: auto-vectorize once, run everywhere. In: Proceedings of the CGO 2011. IEEE Computer Society (2011)
Owen, A.B.: A robust hybrid of lasso and ridge regression. Contemp. Math. 443(7), 59–72 (2007)
Psarras, C., Barthels, H., Bientinesi, P.: The linear algebra mapping problem. arXiv preprint arXiv:1911.09421 (2019)
Sethi, R., Ullman, J.D.: The generation of optimal code for arithmetic expressions. J. ACM 17(4), 715–728 (1970)
Spampinato, D.G., Fabregat-Traver, D., Bientinesi, P., Püschel, M.: Program generation for small-scale linear algebra applications. In: Proceedings of the 2018 International Symposium on Code Generation and Optimization. ACM (2018)
Vasilache, N., et al.: Tensor comprehensions: framework-agnostic high-performance machine learning abstractions. CoRR abs/1802.04730 (2018)
Acknowledgements
This work was supported by the Carl Zeiss Foundation within the project Interactive Inference and by the Ministry for Economics, Sciences and Digital Society of Thuringia (TMWWDG) under the framework of the Landesprogramm ProDigital (DigLeben-5575/10-9).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Klaus, J., Blacher, M., Giesen, J., Rump, P.G., Wiedom, K. (2022). Compiling Linear Algebra Expressions into Efficient Code. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13351. Springer, Cham. https://doi.org/10.1007/978-3-031-08754-7_2