Abstract
In textbooks, linear algebra expressions often use indices to specify the elements of variables. Expressions in this index form cannot be directly translated into efficient code, since optimized linear algebra libraries and frameworks require expressions in index-free form. To address this problem, we developed Lina, a tool that automatically converts linear algebra expressions with indices into index-free linear algebra expressions, which we map efficiently to NumPy and Eigen code.
1 Introduction
In textbooks, linear algebra expressions often use indices to access the entries of vectors and matrices, or to sum over certain dimensions. These expressions can be translated directly into loops over indices in the program code, which, however, is often not efficient [1, 6]. It is more efficient to map expressions with indices to highly tuned parallel linear algebra libraries like NumPy [9] or Eigen [8]. These libraries expect their input in index-free form. Therefore, in order to use efficient linear algebra libraries, expressions in index form need to be transformed into index-free form. We present an implementation of this approach, which we call Lina. Lina comprises three parts:
1. A formal input language close to textbook form (Sect. 2),
2. the transformation from index form into index-free form (Sect. 3), and
3. mappings to linear algebra libraries and frameworks (Sect. 4).
The transformation from index form to index-free form is the most challenging part and requires a good understanding of fundamental linear algebra. Consider for example the classical ridge regression problem [13]. Given a feature matrix \(X\in \mathbb {R}^{n\times m}\), a label vector \(y\in \mathbb {R}^n\), a coefficient vector \(\beta \in \mathbb {R}^m\), and hyperparameters \(\mu , \lambda \in \mathbb {R}\), the ridge regression problem in textbook form reads as

\[ \min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{m} X_{ij}\,\beta_j - \mu \Bigr)^{2} + \lambda \sum_{j=1}^{m} \beta_j^{2}. \]
This expression includes three summation operations, each of which becomes a loop in the implementation. Transforming the expression into index-free form results in

\[ \min_{\beta}\; \bigl(y - X\beta - \mathbbm{1}^{n}\mu \bigr)^{\!\top}\bigl(y - X\beta - \mathbbm{1}^{n}\mu \bigr) + \lambda\, \beta^{\top}\beta, \]
where \(\mathbbm {1}^n=(1,1,\dots ,1)^\top \) is the all-ones vector. Since the index-free expression contains only compound linear algebra operations, we can map it directly to linear algebra libraries. However, developers usually do not take the time to formulate their problems in index-free form, although using a highly optimized linear algebra library would lead to better performance. Our focus here is on automatically transforming expressions into index-free form. We further optimize the resulting index-free expressions before mapping them to Eigen and NumPy routines. An easy-to-use implementation of our approach is available at https://lina.ti2.uni-jena.de.
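The performance gap between the two forms can be sketched in NumPy (the function names and variable layout below are our own illustration, not Lina's generated code, which is shown in Listing 2):

```python
import numpy as np

def ridge_loss_loops(X, y, beta, mu, lam):
    # Index form: three explicit summations, mirroring the textbook expression.
    n, m = X.shape
    total = 0.0
    for i in range(n):
        inner = sum(X[i, j] * beta[j] for j in range(m))
        total += (y[i] - inner - mu) ** 2
    return total + lam * sum(beta[j] ** 2 for j in range(m))

def ridge_loss_index_free(X, y, beta, mu, lam):
    # Index-free form: compound operations dispatched to optimized kernels.
    r = y - X @ beta - mu
    return r @ r + lam * (beta @ beta)
```

Both functions compute the same value, but the index-free version delegates all loops to the library.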
Related Work. Various approaches already exist for mapping expressions in index-free form to linear algebra libraries [7, 14, 15]. Often such methods make use of additional information about the expressions’ variables and parameters, for example, symmetry of matrices [2, 16]. Multiple attempts are also known to generate efficient code for expressions in index form [3, 4, 12]. These approaches do not transform expressions into index-free form, but optimize the loops directly, for instance by reordering them.
2 A Language for Linear Algebra Expressions
In this section, we describe a formal language for extended linear algebra expressions in index form. The notation used in this language is close to MATLAB [10]. It is rich enough to cover most classical machine learning problems, even problems not contained in standard libraries like scikit-learn [5].
The language supports binary operations, unary point-wise operations like \(\log \) or \(\exp \), and, of course, variables and numbers. A special operation is the summation operation sum, which takes a mandatory index. This index is used to address elements of vectors or matrices. The full grammar for the language is shown in Fig. 1. In this language, the classical ridge regression example reads as
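The exact surface syntax is defined by the grammar in Fig. 1; as that figure is not reproduced here, the following is only our own hypothetical rendering of the ridge regression problem in a MATLAB-like notation with an explicit, indexed sum operator:

```
sum(i, (y(i) - sum(j, X(i,j) * beta(j)) - mu)^2) + lambda * sum(j, beta(j)^2)
```

Note how every sum carries its index, and all other operations act on scalar entries selected by indices.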
A point worth emphasizing is that the indices always select scalar entries of a vector or matrix. This makes every operation an operation between scalars, which is different in index-free notation, where operations are on compound structures.
Expressions that follow the above grammar are parsed into an expression tree. An expression tree \(G=(V,E)\) is a binary tree, where each node \(v\in V\) has a specific label. This label can be either an operation, a variable name, a number, or an index. Furthermore, we assign each node a scope containing indices. For leaf nodes describing vectors and matrices, the scope contains the associated indices, and for all other nodes, except for the special sum nodes, the scope is the union of the scopes of the child nodes. Since the sum operation eliminates an index, the scope of a sum node is the union of its children’s scopes minus the summation index. An expression tree for the ridge regression example is shown in Fig. 2.
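The scope computation described above can be sketched as a short recursion (a minimal sketch; the `Node` class and the convention of storing the bound summation index on the sum node itself are our own assumptions, not Lina's internal representation):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                          # operation, variable name, number, or index
    children: list = field(default_factory=list)
    indices: frozenset = frozenset()    # leaf: attached indices; sum: its bound index

def scope(node):
    # Leaf nodes carry the indices of the vector or matrix they name.
    if not node.children:
        return node.indices
    child_scopes = frozenset().union(*(scope(c) for c in node.children))
    if node.label == 'sum':
        # A sum node removes its bound index from the union of child scopes.
        return child_scopes - node.indices
    return child_scopes
```

For the subexpression \(X_{ij}\cdot \beta _j\), the product node has scope {i, j}, and summing over j leaves scope {i}.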
3 Transformation from Index Form into Index-Free Form
The main part of Lina is the automatic transformation of expressions from index form into index-free form. In index form expressions, all operations are operations on scalars, whereas in index-free form expressions operations are on compound structures like vectors or matrices. For example, the multiplication \(X_{ij}\cdot \beta _j\) multiplies the value of X at index (i, j) with the value of \(\beta \) at index j. We can collect these values into an \((n\times m)\)-matrix \((X_{ij}\cdot \beta _j)_{i\in [n], j\in [m]}\). This matrix can be expressed as a point-wise product of two matrices, where X is the first matrix and the outer product \(\mathbbm {1}^n\beta ^\top \) is the second. Indeed, we compute

\[ (X_{ij}\cdot \beta_j)_{i\in [n],\, j\in [m]} = X \odot \bigl(\mathbbm{1}^{n}\beta^{\top}\bigr), \]

where \(\odot \) denotes the point-wise (Hadamard) product.
The idea of increasing the dimension of a subexpression by an outer product to enable a point-wise operation works directly for vectors, but not for matrices. In these cases, we use the scope assigned to each node. To work with the scopes, we switch to an Einstein-like notation. In this notation, each multiplication is represented by a tuple (m, l, r). The tuple (m, l, r) contains the dimension m of the result of the operation, the dimension l of the left, and the dimension r of the right operand. The following equations show how to represent the different multiplication types in Einstein-like notation as well as in linear algebra notation. Let \(x\in \mathbb {R}^n\) and \(y\in \mathbb {R}^n\); then

\[ \begin{aligned} ((),\, n,\, n)&:\quad x^{\top}y &&\text{(inner product)},\\ (n,\, n,\, n)&:\quad x \odot y &&\text{(point-wise product)},\\ ((n,n),\, n,\, n)&:\quad x y^{\top} &&\text{(outer product)}. \end{aligned} \]
The scope of a node and the scopes of its left and right children directly correspond to the entries of the tuple. This notation enables us to describe the multiplication type needed during the transformation. The transformation starts at the root and recursively adjusts the left and right children of nodes to satisfy their index requirements. If we encounter a node, other than a sum node, whose left or right child does not have the same scope as the node itself, we adjust the scope by adding a multiplication node to the respective subtree. The new multiplication node multiplies the old subtree with an all-ones vector to supply the missing index. In our example, we multiplied an all-ones vector with \(\beta \) to supply the dimension represented by index i. The sum node is special since it removes an index from the scope. This can be accomplished by an inner product with an all-ones vector. We therefore relabel sum nodes as multiplication nodes, and change their left child nodes, which represent an index, into all-ones vectors with the corresponding indices.
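Both rewrites can be checked numerically in NumPy (a sketch with our own variable names; the all-ones vectors play exactly the roles described above):

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 4, 3
X = rng.standard_normal((n, m))
beta = rng.standard_normal(m)
ones_n, ones_m = np.ones(n), np.ones(m)

# Outer product 1^n beta^T lifts beta to an (n x m) matrix, so the
# point-wise product realizes (X_ij * beta_j)_{i,j}.
lifted = X * np.outer(ones_n, beta)

# The sum node sum_j(...) becomes an inner product with the
# all-ones vector 1^m, recovering the matrix-vector product X beta.
assert np.allclose(lifted @ ones_m, X @ beta)
```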
In summary, we use outer products to transform binary operations over unequal dimensions into point-wise operations over equal dimensions, and inner products to reduce a dimension for sum operations. The transformed expression tree for the ridge regression example is shown in Fig. 3, where added and adjusted nodes are highlighted in gray. Note that sum nodes have turned into multiplication nodes. The semantics of product nodes can be inferred from their scopes.
4 Compilation into Multiple Backends
During the transformation from index to index-free form, the nodes of the expression tree are replaced by compound subexpressions, which can make the tree unnecessarily large. Listing 1 shows non-optimized index-free NumPy code.
Therefore, before we compile the expression tree into the different backends, we perform various optimizations that reduce the size of the tree. For instance, we perform standard constant folding, and we identify common subexpressions, extract them, and link nodes in the tree to the corresponding common subexpression. Furthermore, we exploit the broadcasting operations of individual backends like NumPy or Eigen. For example, the subexpression \(y-\mathbbm {1}^n\cdot \mu \) in index-free form can be written as \(y - \mu \) in most backends. In this case, no extra all-ones vector has to be created, which speeds up the computation and reduces the memory footprint. Since broadcasting operations differ between backends, we apply them during compilation into each backend when possible.
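In NumPy, this broadcasting rewrite is a one-line difference (a minimal sketch of the optimization just described):

```python
import numpy as np

y = np.array([3.0, 1.0, 4.0])
mu = 0.5

# Index-free form materializes an explicit all-ones vector ...
explicit = y - np.ones_like(y) * mu

# ... while NumPy broadcasting subtracts the scalar directly,
# avoiding the temporary vector entirely.
broadcast = y - mu

assert np.allclose(explicit, broadcast)
```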
Lina currently compiles to NumPy and Eigen. The compilation traverses the index-free expression tree from the root to the leaves and replaces nodes with methods from the respective backend. At a leaf node, the name of the node is returned, and at a multiplication node the type is determined using the node’s scope. During the transformation, higher-dimensional tensors may arise that cannot be expressed by matrix-vector operations. There are several approaches to map them efficiently to code [11, 17]. For instance, NumPy directly allows us to calculate tensor subexpressions using the einsum method. To do so, we use the scope of the node and the scopes of its left and right children to compute an einsum string corresponding to the operation. The optimized compiled NumPy code for the ridge regression problem is shown in Listing 2.
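Deriving an einsum string from the (m, l, r) index tuples can be sketched as follows (the helper `einsum_string` is our own illustration, not Lina's actual implementation; shared indices absent from the result are contracted over by einsum's semantics):

```python
import numpy as np

def einsum_string(result, left, right):
    # Build an einsum subscript from the index tuples (m, l, r):
    # indices appearing in both operands but not in the result are summed out.
    return f"{''.join(left)},{''.join(right)}->{''.join(result)}"

X = np.arange(6.0).reshape(2, 3)
beta = np.array([1.0, 2.0, 3.0])

# (m, l, r) = ('i', 'ij', 'j'): a matrix-vector product contracting over j.
spec = einsum_string('i', 'ij', 'j')
assert spec == "ij,j->i"
assert np.allclose(np.einsum(spec, X, beta), X @ beta)
```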
5 Conclusion
We have presented Lina, a tool for automatically compiling extended linear algebra expressions with indices into efficient linear algebra routines. Our main contribution is the transformation of expressions with indices into index-free form. We map this index-free form to NumPy and Eigen, which exploit the parallelization and vectorization capabilities of the underlying hardware. Lina is available at https://lina.ti2.uni-jena.de.
References
Ascher, D., Dubois, P.F., Hinsen, K., Hugunin, J., Oliphant, T., et al.: Numerical Python (2001)
Barthels, H., Psarras, C., Bientinesi, P.: Linnea: automatic generation of efficient linear algebra programs. ACM Trans. Math. Softw. 47(3), 1–26 (2021)
Baumgartner, G., et al.: Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models. Proc. IEEE 93(2), 276–292 (2005)
Bilmes, J.A., Asanovic, K., Chin, C., Demmel, J.: Author retrospective for optimizing matrix multiply using PHiPAC: a portable high-performance ANSI C coding methodology. In: ACM International Conference on Supercomputing 25th Anniversary Volume. ACM (2014)
Buitinck, L., et al.: API design for machine learning software: experiences from the Scikit-learn project. In: ECML PKDD Workshop (2013)
Cai, X., Langtangen, H.P., Moe, H.: On the performance of the Python programming language for serial and parallel scientific computations. Sci. Program. 13(1) (2005)
Franchetti, F., et al.: SPIRAL: extreme performance portability. Proc. IEEE 106(11), 1935–1968 (2018)
Guennebaud, G., Jacob, B., et al.: Eigen v3. http://eigen.tuxfamily.org (2010)
Harris, C.R., et al.: Array programming with NumPy. Nature 585(7825), 357–362 (2020)
The Mathworks Inc, Natick, Massachusetts: MATLAB version R2021a (2021)
Matthews, D.A.: High-performance tensor contraction without transposition. SIAM J. Sci. Comput. 40(1), C1–C24 (2018)
Nuzman, D., et al.: Vapor SIMD: auto-vectorize once, run everywhere. In: Proceedings of the CGO 2011. IEEE Computer Society (2011)
Owen, A.B.: A robust hybrid of lasso and ridge regression. Contemp. Math. 443(7), 59–72 (2007)
Psarras, C., Barthels, H., Bientinesi, P.: The linear algebra mapping problem. arXiv preprint arXiv:1911.09421 (2019)
Sethi, R., Ullman, J.D.: The generation of optimal code for arithmetic expressions. J. ACM 17(4), 715–728 (1970)
Spampinato, D.G., Fabregat-Traver, D., Bientinesi, P., Püschel, M.: Program generation for small-scale linear algebra applications. In: Proceedings of the 2018 International Symposium on Code Generation and Optimization. ACM (2018)
Vasilache, N., et al.: Tensor comprehensions: framework-agnostic high-performance machine learning abstractions. CoRR abs/1802.04730 (2018)
Acknowledgements
This work was supported by the Carl Zeiss Foundation within the project Interactive Inference and by the Ministry for Economics, Sciences and Digital Society of Thuringia (TMWWDG) under the framework of the Landesprogramm ProDigital (DigLeben-5575/10-9).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Klaus, J., Blacher, M., Giesen, J., Rump, P.G., Wiedom, K. (2022). Compiling Linear Algebra Expressions into Efficient Code. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13351. Springer, Cham. https://doi.org/10.1007/978-3-031-08754-7_2