Keywords

1 Introduction

Software development in higher level languages, greatly reduces the burden on the programmer, to seek solutions to problems in the system and application domains. However, translation of programs from source languages to object code, generates inferior, inefficient code due to the inherent nature, of the structure of the programming languages. To generate code that is optimal from an execution perspective, cleanup of the translated code is necessary, an activity that is usually referred to as Code Optimization. Code optimization is possible from several perspectives namely, reduction of execution time, reduction of storage requirement or reduction of energy requirement. In the present context, we use the term code optimization to mean code transformation, to improve the running characteristics of the program. We sometimes use the term Optimization to mean either transformation or parallelization. Program transformation includes code changes, that affect a particular aspect of code, such as the instruction count. Program parallelization involves finding concurrent threads of execution, that can be run on separate processing elements.

The Program transformation landscape is quite fertile, and myriad solutions exist. Although the primary goal of all transformation methods, is to improve the running time of the program, the methodology followed by each technique, in reaching the goal is unique. However it is possible to categorize these individual techniques, based on one or more of the following criteria, namely those that target the Instruction Count, Memory Latency, Locality and those that enable other transformations and parallelization.

Program parallelization is a related problem, that is interesting as well, and can be classified along multiple axes. At a very high level the parallelism that is inherent in a program, can be visualized from a code or data perspective, and accordingly we have parallelism that is code centric or data centric. Based on the Programmer Involvement required or Ease of Use, parallelization can be categorized as Manual, Semi-Automated or Explicit and Automated or Implicit. Considering the Granularity of the Parallel Tasks, parallelization can be categorized as Fine Grained or Coarse Grained. Parallelism in a program is contained, in different regions and structures and accordingly can be classified as Loop Level parallelism, Thread Level parallelism or Process Level parallelism. From a performance criteria, parallelization can be classified as Task Level, Instruction Level and Pipeline parallelism. Depending on the Architectural characteristics of the target machine and the resulting Scalability, parallelization can be classified as Shared Memory Parallelization, or Distributed Memory Parallelization. Parallelization could also be categorized, based on the latest emerging trends in the field, as Parallelization by Speculation [1] and Parallelization by Comprehension. Parallelization presents some interesting sub-problems, such as Sub-Program Creation, Orchestration and Distribution which could also serve as criteria for classification.

Since by nature the problems and their solutions, in the field of program transformation and parallelization are non trivial, precise mathematical models are required, to represent the problems and subsequently derive solutions. The range of models used and reported, in the literature is quite vast. We have models based on Trees, Graphs, Machine Learning, Algebra, Statistics, Enumeration, Heuristics among others [2]. From the above presented arguments, it should clear to the reader that the transformation and parallelization domain is exhaustive, and diverse and requires classification and categorization, to simplify and comprehend.

This research work is an effort in this direction and we attempt to fill the void by providing comprehensive taxonomies of the methods and models used in the domain of program transformation and parallelization. Section 2 provides the taxonomy of methods used in program transformation. Section 3 contains the taxonomic details of the methods used in program parallelization. Section 4 is dedicated to a discussion of the taxonomies of various models used in program transformation and parallelization. Section 5 presents the various taxonomies discussed earlier in graphical form, for easy comprehension.

2 Taxonomy of Methods Used in Program Transformation

The domain of Program transformation contains several techniques or methods that can be classified along the following characteristics,

  • Instruction Count Reduction

  • Locality Improvement

  • Memory Latency Reduction

  • Parallelization Enablement

  • Transformation Enablement.

2.1 Instruction Count Reduction

Instruction count for a given program, is the number of instructions executed, for a certain run of the program. This metric is usually obtained, with the help of dedicated hardware counters present in the architecture. This metric has a direct bearing, on the execution time of the program. So one way to reduce the running time of the program, is to reduce the instruction count.

Table 1 lists several popular transformation methods along with their descriptions, which can be categorized as techniques that aim for the reduction of Instruction Execution Count.

Table 1. Instruction count reducing transformations

2.2 Locality Improvement

Program Locality is the term used to refer to the program behavior, wherein recently used code and data are once again accessed, in a short time span. Hardware caches are used, to store a subset of the recently used code and datum. Accessing these items once again, can result in a cache hit. Since accessing an item from the cache takes fewer execution cycles, than getting them from memory, this can induce a substantial savings in the run time of a program.

The following table, Table 2 provides a listing of techniques and their explanations, whose primary goal is to improve the Locality behavior of the program, through caching of both code and data.

Table 2. Locality improving transformations

2.3 Memory Latency Reduction

For programs that do not exhibit good locality, caches cannot improve the running time. Techniques such as Prefetching are employed, whereby an item in memory is fetched in anticipation, before it is actually needed. Such techniques work, by hiding the memory latency from the user.

Redundant Load Store Elimination is a transformation, that falls under the category and involves removal of back-to-back Store followed by Load or vice-versa. Prefetching is another transformation of this kind, which attempts to fetch code and data to the cache in anticipation, before their actual reference. Cache Block Alignment is aimed at eliminating multiple fetch requests to objects, that span two cache lines by alignment, so that the request can be fulfilled in a single request.

2.4 Parallalization Enablement

Techniques such as Loop Unrolling, that prepare code and data to effectively enable the parallelization, which follows this step, can be referred to as Parallelization Enablers. They basically transform code so that they are more parallelization friendly even though they may not produce results right away.

Table 3 lists some transformations, that are parallelization enablers.

Table 3. Parallelization enabling transformations

2.5 Transformation Enablement

There are some transformations such as Copy Propagation, which don’t offer benefits right away. However other transformations which follow, can benefit from these preparatory transformations. These transformation techniques, can be referred to as Transformation Enablers.

Transformations such as Copy Propagation, Constant Propagation and Pointer Alias Analysis constitute the category of transformations enabling other transformations. Copy Propagation aims to replace the assignee of an assignment by the assigned in subsequent operations. Constant Propagation is a related transformation, that propagates constant values among, a sequence of related variables.

3 Taxonomy of Methods Used in Program Parallelization

The domain of program parallelization offers an interesting ensemble of methods and techniques, which can be grouped as follows,

  • Simplicity and Ease of Use

  • Performance

  • Granularity

  • Program Structure and Module

  • Scalability

  • Novelty

  • Orchestration and Management.

Table 4. Parallelization methods based on simplicity and ease of use

3.1 Simplicity and Ease of Use

A Parallelization technique can be viewed, on the basis of how simple or easy it is to implement and use. For instance, manually parallelizing a program is cumbersome to users, compared to the compiler technique which automatically parallelizes a program.

Table 4 categorizes parallelization, based on the criteria of their simplicity and ease of use.

3.2 Performance

Parallelization exists at various levels in a program, such as procedures or statements. How we unleash it depends on how much performance we are expecting and the effort we are willing to invest.

Parallelization carried out at the Instruction Level or the level of the Task and inside the hardware instruction Pipeline, fall under this group. Accordingly, Instruction Level Parallelization is the parallelization carried out the level of program statements or instructions. Task Parallelization is realized at the level of procedures or modules. Pipeline Parallelization is conducted at the level of machine instructions.

3.3 Granularity

Granularity is a term which is used to refer, to the size of the structure (abstraction) of a program, such as a module or a procedure. Normally, extracting parallelism from a structure that is coarse grained (large size), is easier than extracting from a structure, that is fine grained (small size).

Parallelization carried out at the task level, can be classified as Coarse Grain Parallelization and parallelization carried out at the statement or instruction level could be termed as Fine Grain Parallelization.

3.4 Program Structure or Module

Parallelism is inherent in program elements, such as Loops and Procedures both normal and recursive. Based on the available source and the statement grouping techniques such as multi-programming and threading we can categorize resulting parallelization.

In Process Parallelization, inherent parallelization is extracted through multiple invocations of the same program, each invocation acting as parallel component of the original program. In Thread Parallelization multiple threads are used, to achieve parallel run of the given program. Loop Parallelization refers to the parallelism that is present and subsequently extracted, by concurrent executions of different group of iterations of the loop.

3.5 Scalability

Certain large scale programs such as those that are numerically intensive, can rigorously test the limits of the executing hardware. Some of the hardware architectures can reach a bottleneck, after running the program of a certain size. To overcome such hardware limits, newer architectures have been proposed to scale the program.

Shared Memory Parallelization and Distributed Memory Parallelization, are two parallelization techniques that can be distinguished, based on the target architecture used to run parallel sub-programs. Shared Memory Parallelization involves a setup where the parallel programs communicate through shared memory. In the case of Distributed Memory Parallelization, the parallel components communicate with the help of explicit Sends and Receive calls.

3.6 Novelty

Published literature, periodically presents novel methodologies, to solve traditional problems. We have such examples in the parallelization domain also, which can serve as a basis for classification of parallelization methods.

This category includes Speculative Parallelization and Parallelization by Comprehension, two techniques that are beneficiaries of seen some recent research activity. Speculative Parallelization is initially carried out without dependence testing, but checks for collision and possible rollback are carried out at a later stage [5]. Parallelization by Comprehension is a parallelization process, based on the concept of algorithm inference.

3.7 Orchestration and Management

Parallelization process is not complete, with out concurrently executing the parallel components identified, during the parallelization phases. The final step in parallelization, is the control and management of the parallel pieces, of the original program which can also serve as a criteria for classifying a parallelization methodology.

Independent of the facilities offered by the modern programming languages, one could classify parallel sub-program generation techniques and the parallelization realized as a result. Program Slicing is a technique for creating subprograms by splitting the given program. Multi-programming involves creating sub-programs that are replicas of the original program, but executing only a subset of instructions in each replica, which when executed together produce the same result, as executing the original program.

4 Taxonomy of Models Used in Transformation and Parallelization

The domain of transformation and parallelization is rich, in terms of the mathematical models employed, to solve a problem at hand. Here we look at the various models, provide a definition for each modeling technique, and present a comprehensive listing of the use cases for each, in the published literature.

The various Models that are employed in the transformation and parallelization activity, can be categorized along the following criteria,

  • Models based on the Tree concept

  • Models based on the Graph concept

  • Models based on Machine Learning

  • Models based on Algebra

  • Models based on Statistics

  • Models based on Enumeration

  • Models based on Heuristics.

4.1 Tree Based Models

Tree is a very basic mathematical model, that is widely used in the optimization domain and Parse tree is a specialized model that falls under this category. It graphically shows how a string is derived in some language [6]. Parse tree has been employed by researchers in implementing optimizations such as the Common Sub-expression Elimination [6].

4.2 Graph Based Models

Graph is the most popular model to solve problems, in the program transformation and parallelization domain. Table 5, provides a description of several models based on the graph concept. The following table, Table 6, provides a listing of their use cases in published literature.

4.3 Machine Learning Based Models

Machine learning offers several opportunities to solve problems, in the domain of transformation and parallelization. Table 7, provides a description of models based on Machine Learning followed by table, Table 8, which lists out their uses.

4.4 Models Based on Algebra

Several models exist in the domain based on the Algebra model. Models based on Integer Linear Programming, Polyhedra, Linear Algebra and Symbolic Algebra fall under this category.

Integer Linear Programming is a system with a set of variables to be optimized, based on a function and a set of constraints. Integer Linear Programming has been used to solve problem such as, finding the Longest path length of a loop-nest, Locality optimization, Loop-nest parallelization, Register allocator optimization [25], Speculative instruction scheduler [1]. Polyhedral models use linear algebra abstractions such as matrices and their operations. [26, 27]. Polyhedral models have been used in literature to implement, Code size reduction, Vectorization selection [26], Compute loop iteration counts [28], Identify fusionable loops [29], Improve cache misses [30] Linear Algebra models use matrices, determinants, linear equations and their transformations and vector spaces. Linear Algebra models find use in solving Data Layout Transformation problems. Symbolic Algebra models are a collection of techniques for symbolically manipulating mathematical expressions. Symbolic Algebra has been employed in Power optimization, Floating point to fixed point conversion, Model conditions, loops, and procedures in programs.

Table 5. Description of graph models
Table 6. Application of graph models
Table 7. Description of machine learning based models
Table 8. Application of machine learning based models

4.5 Models Based on Statistics

There are several models employed in the domain that are stochastic in nature.

Orthogonal Arrays and Principal Component Analysis are models based on the Statistics model. Orthogonal Arrays measure the influence of a process of independent variables, on response or dependent variables. It has been used to influence Compiler Option Selection. Principal Component Analysis model can be used to prune the program feature space. It has been used to solve Iterative Optimization problems.

4.6 Models Based on Enumeration

When the problems involves search, Enumeration provides several opportunities.

Models based on Branch and Bound, Nelder-Mead Simplex and Enumeration fall under this category. Branch and Bound is a recursive search technique that uses trees [31]. It has been used in the past for Combining Optimization Options [31]. Nelder-Mead Simplex is a multi-dimensional search technique. It has been employed to solve Iterative Parameter Search problems [32]. Enumeration is a search space pruning technique that uses history data. Researchers have used enumeration to implement Optimal Scheduling of Super-blocks.

4.7 Models Based on Heuristics

When exhaustive methods do not provide solutions in a prescribed amount of time, Heuristic methods provide approximate solutions to fill the void.

Heuristics provide approximate solutions to NP-Hard problems and have been used to solve optimization problems, in the domain of translation and parallelization such as Combinatorial Optimization [33].

5 A Graphical Taxonomy of Methods and Models Used in Transformation and Parallelization

This section presents the taxonomy, of the various methods and models used in program translation and parallelization in graphical form.

5.1 A Graphical Taxonomy of Methods Used in Transformation

The various program transformation techniques, that are available today were chronicled earlier. Figure 1 provides a taxonomy of the transformation techniques in graphical form.

Fig. 1.
figure 1

Taxonomy of methods used in transformation

5.2 A Graphical Taxonomy of Methods Used in Parallelization

A classification of the various parallelization methods, cited in published literature are presented here in graphical form. See Fig. 2 for details.

Fig. 2.
figure 2

Taxonomy of methods used in parallelization

5.3 A Graphical Taxonomy of Models Used in Transformation and Parallelization

We discussed the various transformation and parallelization models found in published literature earlier. Here we provide a taxonomy of the models in a hierarchical graph representation. See Fig. 3 for details.

Fig. 3.
figure 3

Taxonomy of models used in transformation and parallelization

6 Conclusion

We started this research work, with the goal of developing a detailed taxonomy of methods and models, used in the program transformation and parallelization domain, which would foster learning and mastering of the subject area. We classified the program transformation methods along the following axes namely, instruction count reduction, locality improvement, memory latency reduction, parallelization enablement and transformation enablement. In a similar vein, program parallelization methods were categorized based on the following criteria namely, simplicity and ease of use, performance, granularity, program structure and module, scalability, novelty, and orchestration and management. Models used in the transformation and parallelization domain, allow the problems to be represented mathematically and subsequently develop solutions. The various models used in the domain as reported in published literature were classified as follows: models based on the tree concept, models based on the graph concept, models based on machine learning, models based on algebra, models based on statistics, models based on enumeration and models based on heuristics. The outcomes of our research are organized in the form of tables and graphs for easy comprehension. We hope the comprehensive taxonomy we have developed here, will benefit researchers and practitioners alike, and help them in their respective endeavors.