1 Introduction

An efficient multiprocessor SoC (MPSoC) implementation requires automated exploration to find an efficient HW allocation, task mapping and scheduling [13]. Heterogeneous MPSoCs are needed for low-power, high-performance and high-volume markets [36]. The central idea in MPSoCs is to increase performance and energy efficiency, which is achieved by efficient communication between cores and by keeping the clock frequency low while providing enough parallelism.

Mapping means placing each application task onto some processing element (PE), as depicted in Fig. 1. Task refers here to the smallest unit of computation that can be re-located. Scheduling means determining the execution timetable of the application components on the platform. The example in Fig. 1 shows that tasks 0 and 1 are mapped to PE 0, task 2 is mapped to PE 1, and task N−1 is mapped to PE M−1. In a strict mapping problem, the state of the system is defined as a mapping of each task to a PE. The state is optimized with respect to given criteria, such as the system’s throughput, latency or power. The optimized criterion is defined by a cost function whose value is minimized. For example, the system’s performance can be optimized by setting the cost function to be its total execution time for a given program and input.

Fig. 1

Conceptual view of mapping application tasks to processing elements

Tasks in a general mapping problem may execute arbitrary programs, deterministic or non-deterministic. In this paper, however, tasks are restricted to a subset known as Kahn Process Networks (KPNs) [15]. Each task executes a program where reads are blocking, and testing for readable data on a communication channel can only be done by a blocking read. This enforces a deterministic result of the computation regardless of the timing and mapping of tasks.
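The determinism property of blocking reads can be illustrated with a small sketch. This is a hypothetical two-process network built on Python threads and queues; the process names and values are illustrative, not from the paper:

```python
import queue
import threading

def producer(out_ch):
    # Writes a fixed sequence to its output channel; writes do not block here.
    for value in range(5):
        out_ch.put(value)
    out_ch.put(None)  # end-of-stream marker (our own convention)

def consumer(in_ch, results):
    # A KPN process may only test for data via a blocking read:
    # Queue.get() blocks until an item is available.
    while True:
        value = in_ch.get()  # blocking read
        if value is None:
            break
        results.append(value * 2)

channel = queue.Queue()
results = []
threads = [threading.Thread(target=producer, args=(channel,)),
           threading.Thread(target=consumer, args=(channel, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Regardless of thread timing, the output is always the same.
print(results)  # [0, 2, 4, 6, 8]
```

Because the consumer cannot poll the channel, only block on it, the result does not depend on how the threads are interleaved, which is the KPN determinism argument in miniature.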

A large design space must be pruned systematically, since exploration of the whole design space is not feasible [13]. A fast optimization procedure is desired in order to cover a reasonable part of the design space. However, speed comes at the expense of accuracy. Iterative optimization algorithms evaluate a number of application mappings for each resource allocation candidate. The application is simulated for each mapping to evaluate the cost of a solution. The cost may depend on multiple factors, such as cycle count, latency, energy consumption, silicon area and others.

This paper investigates the use of the Simulated Annealing (SA) algorithm in task mapping. We analyze the research question of how to select SA optimization parameters for a given point in design space exploration. Section 2 presents a survey of the current state-of-the-art in SA task mapping. Section 3 surveys the SA parameter choices made in these works. Section 4 presents our results on SA global optimum properties, and Sect. 5 properties of SA acceptance functions. Section 6.1 presents recommendations for reporting SA results to enable disciplined comparisons and reproduction of the experiments by other researchers. Section 6.2 presents recommendations for selecting SA parameters based on related work and new analysis. Finally, we conclude the paper with high-level directions on how to improve the existing state-of-the-art in Sect. 7.

2 Related work

We limit the discussion about mapping heuristics to Simulated Annealing and mention other approaches when direct comparison has been reported in literature.

2.1 Simulated Annealing algorithm

SA is a widely used metaheuristic for complex optimization problems. The term heuristic means that optimality is not guaranteed but the algorithm usually produces satisfactory results, whereas meta denotes that it can be fitted to many kinds of problems. SA is a probabilistic non-greedy algorithm that explores the search space of a problem by annealing from a high to a low temperature. Temperature is a historic term originating from annealing in metallurgy, where material is heated and cooled to increase the size of its crystals and reduce their defects. Temperature indicates the algorithm’s willingness to accept moves to a worse state.

Probabilistic behavior means that SA can find solutions of different goodness between runs. Non-greedy means that SA may accept a move into a worse state, and this allows escaping local minima. A local minimum is a point in the design space where no single move can improve the cost, but perhaps two or three consecutive moves can. The algorithm always accepts a move into a better state. A move to a worse state is accepted with a changing probability. This probability decreases along with the temperature, and thus the algorithm starts as a non-greedy algorithm and gradually becomes more and more greedy.

Figure 2 shows the SA pseudocode. The algorithm takes the initial temperature T 0 and an initial state S as parameters. The state is modified on every iteration. The Cost function evaluates the objective to be optimized, e.g. by simulating the platform. The algorithm seeks to minimize the cost. The Temperature function returns the annealing temperature as a function of T 0 and the loop iteration number i. The Move function generates a new state from a given state. In our case it re-locates some task(s), and the associated cost (e.g. cycle count) is evaluated.

Fig. 2

Pseudocode of the Simulated Annealing (SA) algorithm

This new state can be accepted as a base for the next iteration or discarded. A move to a better state is always accepted, and the Accept function calculates the probability of accepting a state change when the cost function difference ΔC>0. The Random function returns a random real number in the range [0,1).

The End-Condition function returns True iff optimization should be terminated. Parameters of the end conditions are not shown in the pseudocode. These may include various measures, such as the number of consecutive rejected moves, the current and a given final temperature, or the current and an accepted final cost. Finally, the algorithm returns the best state S best in terms of the Cost function.
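The pseudocode described above can be sketched as a minimal Python implementation. The function arguments stand in for the Cost, Move, Temperature, Accept and End-Condition functions of Fig. 2; the toy bit-vector problem in the usage part is purely illustrative, not one of the paper's benchmarks:

```python
import math
import random

def simulated_annealing(cost, move, temperature, accept, end_condition, s0, t0):
    """Minimize cost(s) starting from state s0 at initial temperature t0."""
    s, c = s0, cost(s0)
    s_best, c_best = s, c
    i = 0
    while not end_condition(i, temperature(t0, i)):
        t = temperature(t0, i)
        s_new = move(s)
        c_new = cost(s_new)
        delta_c = c_new - c
        # Improving moves are always accepted; worsening (and lateral) moves
        # are accepted with probability Accept(delta_c, t).
        if delta_c < 0 or random.random() < accept(delta_c, t):
            s, c = s_new, c_new
            if c < c_best:
                s_best, c_best = s, c
        i += 1
    return s_best

# Toy usage: anneal an 8-element 0/1 mapping toward the all-zero state,
# with cost = number of ones.
random.seed(0)

def flip_one_bit(s):
    j = random.randrange(len(s))
    return s[:j] + [1 - s[j]] + s[j + 1:]

best = simulated_annealing(
    cost=sum,
    move=flip_one_bit,
    temperature=lambda t0, i: t0 * 0.95 ** i,
    accept=lambda dc, t: math.exp(-dc / t),
    end_condition=lambda i, t: i >= 500,
    s0=[1] * 8,
    t0=1.0,
)
print(sum(best))  # 0 (the toy global optimum)
```

Note that the best state seen so far is tracked separately, so a late worsening move cannot discard a good solution found earlier.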

2.2 SA in task mapping

Table 1 shows SA usage in 14 publications, each of which is summarized below. All publications use SA for task mapping, 5 for task scheduling, and 1 also for communication routing. None uses SA simultaneously for all purposes. There are many other methods for task mapping, such as genetic algorithms (GA), but they are outside the scope of this paper. Synthetic app. means the paper uses task graphs that are not directed toward any particular application, but are artificial and meant for benchmarking the mapping algorithms. SA usually performs hundreds or thousands of iterations and therefore it is usually run off-line. Performing SA mapping at runtime is rare.

Table 1 Publications where Simulated Annealing has been applied in Task mapping, Scheduling, or Communication routing

Ali [1] optimizes performance by mapping continuously executing applications onto heterogeneous PEs and interconnects while preserving two quality of service constraints: maximum end-to-end latency and minimum throughput. Mapping is optimized statically to increase the QoS safety margin in a multisensor shipboard computer. Tasks are initially mapped by a fast greedy heuristic, after which SA further optimizes the placement. SA is compared with 9 other heuristics. SA and GA were the best heuristics in the comparison. SA was slightly faster than GA, with 10 % less running time.

Bollinger [4] optimizes performance by mapping a set of processes onto a multiprocessor system and assigning interprocessor communication to multiple communication links to avoid traffic conflicts. The purpose of the paper is to investigate task mapping in general.

Braun [5] optimizes performance by mapping independent (non-communicating) general purpose computing tasks onto distributed heterogeneous PEs. The goal is to execute a large set of tasks in a given time period. An example task given was analyzing data from a space probe and sending instructions back to the probe before a communication black-out. SA is compared with 10 heuristics, including GA and Tabu search (TS). GA, SA and TS execution times were made approximately equal to compare the effectiveness of these heuristics. GA mapped tasks were 20–50 % faster compared to SA. Tabu mapped tasks vary from being 50 % slower to 5 % faster compared to SA. SA was run with only one mapping per temperature level, but repeating the annealing process 8 times for two different temperature coefficients. One mapping per temperature level means an insufficient number of mappings for good exploration. However, GA is better with the same number of mappings. GA gave the fastest solutions.

Coroyer [6] optimizes performance excluding communication costs by mapping and scheduling directed acyclic graphs (DAGs) to homogeneous PEs. 7 SA heuristics are compared with 27 list scheduling heuristics. SA results were the best compared to other heuristics, but SA’s running time was two to four orders of magnitude higher than other heuristics. The purpose of the paper is to investigate task mapping and scheduling in general.

Ercal [10] optimizes performance by mapping DAGs onto homogeneous PEs and a network of homogeneous communication links. Load balancing constraints are maintained by adding a penalty term to the objective function. Performance is estimated with a statistical measure that depends on the task mapping, the communication profile of tasks and the distance of communicating tasks on the interconnect network. Simulation is not used. The model assumes communication is Programmed IO (PIO) rather than DMA. SA is compared with a proposed heuristic called Task Allocation by Recursive Mincut. SA’s best result is better than the mean result in 4 out of 7 cases. The running time of SA is two orders of magnitude higher than that of the proposed heuristic.

Ferrandi [11] optimizes performance by mapping and scheduling a Hierarchical Task Graph (HTG) [12] onto a reconfigurable MPSoC of heterogeneous PEs. HTGs were generated from C programs parallelized with OpenMP [33]. SA is compared with Ant Colony Optimization (ACO) [9], TS and a FIFO scheduling heuristic combined with first-available PE mapping. SA running time was 28 % larger than ACO’s and 12 % less than TS’s. FIFO scheduling happens at runtime. SA gives 11 % worse results (performance of the solution) than ACO and results comparable with TS.

Kim [16] optimizes performance by mapping and scheduling independent tasks that can arrive at any time to heterogeneous PEs. Tasks have priorities and soft deadlines, both of which are used to define the performance metric for the system. Dynamic mapping is compared to static mapping where the arrival times of tasks are known ahead of time. SA and GA were used as static mapping heuristics. Several dynamic mapping heuristics were evaluated. Dynamic mapping runtimes were not given, but they were very probably many orders of magnitude lower than those of static methods because dynamic methods are executed during the application runtime. Static heuristics gave noticeably better results than dynamic methods. SA gave 12.5 % better performance than dynamic methods, and did slightly better than GA. SA runtime was only 4 % of the GA runtime.

Koch [18] optimizes performance by mapping and scheduling DAGs representing DSP algorithms to homogeneous PEs. SA is benchmarked against list scheduling heuristics. SA is found superior to other heuristics, such as Dynamic Level Scheduling (DLS) [34]. SA does better when the proportion of communication time increases over the computation time, and the number of PEs is low.

Lin [22] optimizes performance while satisfying real-time and memory constraints by mapping general purpose synthetic tasks to heterogeneous PEs. SA reaches a global optimum with 12 node graphs.

Nanda [24] optimizes performance by mapping synthetic random DAGs to homogeneous PEs on hierarchical buses. The performance measure to optimize is an estimate of expected communication costs and loss of parallelism with respect to critical path (CP) on each PE. Schedule is obtained with list scheduling heuristics. Two SA methods are presented where the second one is faster in runtime but gives worse solutions. The two algorithms reach within 2.7 % and 2.9 % of the global optimum for 11 node graphs.

Ravindran [31] optimizes performance by mapping and scheduling DAGs with resource constraints on heterogeneous PEs. SA is compared with DLS and a decomposition based constraint programming approach (DA). DLS is the computationally most efficient approach, but it loses to SA and DA in solution quality. DLS runs in less than a second, while DA takes up to 300 seconds and SA takes up to 5 seconds. DA is an exact method based on constraint programming that beats SA in solution quality in most cases, but SA is found to be the most viable approach for larger graphs where constraint programming fails due to the complexity of the problem space.

Wild [35] optimizes performance by mapping and scheduling DAGs to heterogeneous PEs. PEs are processors and accelerators. SA is compared with TS, FAST [20, 21] and a proposed Reference Constructive Algorithm (ReCA). TS gives 6–13 %, FAST 4–7 %, and ReCA 1–6 % better application execution time than SA. FAST is the fastest optimization algorithm. FAST is 3 times as fast as TS for 100 node graphs and 7 PEs, and 35 times as fast as SA. TS is 10 times as fast as SA.

Xu [37] optimizes performance of a rule-based expert system by mapping dependent production rules (tasks) onto homogeneous PEs. A global optimum is solved for the estimate by using linear programming (LP) [23]. SA is compared with the global optimum. SA reaches within 2 % of the global optimum in 1 % of the optimization time compared to LP. However, the cost function is a linear estimate of the real cost, so in this sense the global optimum is not exact.

In summary, SA performs rather well on average. The differences in solution quality are usually 1–10 %, whereas algorithm runtimes vary greatly, e.g. from a few percent up to 100x–1000x. Heuristics have been commonly evaluated for systems with 6–8 PEs, and task graph sizes vary greatly, from 12 to 4000 tasks. So far, most of the comparisons have been between heuristics and unfortunately not against the global optimum; reference optima were reported only for small cases (11–12 node graphs). The global optimum is the exact comparison point and should be sought when feasible. Even if near-optimal results in small test cases do not guarantee performance in larger problems, at least they show the performance trend (e.g. linear vs. quadratic degradation). Nevertheless, extrapolating the performance is somewhat dubious.

3 SA parameters

Table 2 shows parameter choices for the publications presented in Sect. 2.2. The move and acceptance functions, the annealing schedule, and the number of iterations per temperature level were investigated, where N is the number of tasks and M is the number of PEs. “?” indicates that the information is not available. “N/A” indicates that the value does not apply to a particular publication. A detailed explanation of the parameters is presented in Sects. 3.1, 3.2 and 3.3.

Table 2 Utilized move heuristics, acceptance functions, and annealing schedule in Simulated Annealing. Geometric annealing schedules (G) have a temperature scaling co-efficient “q”. Adaptive initial temperature “T 0” and Stop condition are also marked. “L” is the number of iterations per temperature level, where N is the number of tasks and M is the number of PEs

3.1 Move functions

Table 2 shows 11 different move functions applied in the publications. There are two kinds of functions. Agnostic ones do not consider the application or platform structure at all, whereas the others may weigh some tasks more than the rest.

Single Task (ST) move takes one random task and moves it to a random PE. It is the most common move heuristic; 9 out of 14 publications use it. It is not known how many publications exclude the current PE from the randomization so that a different PE is always selected. Excluding the current PE is useful because evaluating the same mapping again on consecutive iterations is counterproductive.

EM [35] is a variation of ST that limits task randomization to nodes that have an effect on the critical path length in a DAG. The move heuristic is evaluated with SA and TS. SA solutions are improved by 2–6 %, but TS solutions are not improved. Using EM multiplies the SA optimization time by 50–100x, which makes its efficiency dubious. However, using EM for TS approximately halves the optimization time!

Swap 1 (SW1) move is used in 3 publications. It chooses 2 random tasks and swaps their PE assignments. These tasks should preferably be mapped on different PEs. MBOL is a variant of SW1 where task randomization is altered with respect to annealing temperature. At low temperatures, only the tasks that are close in system architecture are considered for swapping. At high temperatures more distant tasks are considered.

Priority move 4 (P4) is a scheduling move that swaps the priorities of two random tasks. This can be viewed as swapping positions of two random tasks in a task permutation list that defines the relative priorities of tasks. P1 is a variant of P4 that considers only tasks that are located on the same PE. A random PE is selected at first, and then priorities of two random tasks on that PE are swapped. P3 scheduling move selects a random task, and moves it to a random position in a task permutation list.

Hybrid 1 (H1) combines task assignment and scheduling simultaneously. First the ST move is applied, and then P3 is applied to the same task to set a random position on the permutation list of the target PE. Koch [18] uses a variant of H1 that preserves the precedence constraints of the moved task in selecting the random position in the permutation of the target PE. That is, schedulable order is preserved in the permutation list.

MLI is a combination of ST and SW1 that tries three mapping alterations. First, ST is tried greedily. The move heuristic terminates if the ST move improves the solution; otherwise the move is rejected. Then SW1 is tried with the SA acceptance criterion. The move heuristic terminates if the acceptance criterion is satisfied. Otherwise, ST is tried again with the SA acceptance criterion.

ST is the most favored mapping move. Heuristics based on swapping or moving priorities in the task priority list are the most favored scheduling moves. The choice and impact of mapping and scheduling moves have not been studied thoroughly.
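The two most common mapping moves can be sketched as follows, assuming a mapping is a list that gives the PE index of each task. The function names and the tie-breaking choices are our own, not taken from the surveyed papers:

```python
import random

def single_task_move(mapping, num_pes):
    """ST: move one random task to a random PE, excluding its current PE
    so the same mapping is never re-evaluated on consecutive iterations."""
    new_mapping = list(mapping)
    task = random.randrange(len(mapping))
    candidates = [pe for pe in range(num_pes) if pe != mapping[task]]
    new_mapping[task] = random.choice(candidates)
    return new_mapping

def swap_move(mapping):
    """SW1: swap the PE assignments of two random tasks, preferring tasks
    mapped on different PEs (our reading of the description above)."""
    new_mapping = list(mapping)
    a = random.randrange(len(mapping))
    others = [t for t in range(len(mapping)) if mapping[t] != mapping[a]]
    b = random.choice(others) if others else random.randrange(len(mapping))
    new_mapping[a], new_mapping[b] = new_mapping[b], new_mapping[a]
    return new_mapping
```

Note the structural difference: an ST move changes exactly one task's PE, while an SW1 move preserves the number of tasks on each PE.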

3.2 Acceptance functions

Table 2 shows that 4 different acceptance functions have been used in publications. Acceptance function takes the change in cost ΔC and temperature T as parameters. There are 3 relevant cases to decide whether to accept or reject a move.

1. ΔC<0 is trivially accepted since the cost decreases.

2. ΔC=0 is often accepted probabilistically. The probability is usually 0.5 or 1.0.

3. ΔC>0 is accepted with a probability that decreases as T decreases or as ΔC grows.

The most common choice is exponential acceptor function (denoted with E in the table)

$$ \mathit{Accept}(\Delta C, T)=\exp\biggl( \frac{-\Delta C}{T}\biggr). $$
(1)

For ΔC>0 the probabilities fall into the range (0,1), which means that quite many worsening moves are accepted. Unfortunately, the behavior depends heavily on the ratio of the cost (which depends on the system) and the temperature (set by the SA developer).

In order to avoid manual parameter tuning, Orsila [27, 30] uses the normalized inverse exponential function (NIE)

$$ \mathit{Accept}(\Delta C, T)=\frac{1}{1+\exp(\frac{\Delta C}{0.5 C_0 T})}. $$
(2)

First, the temperature T is in the normalized range (0, 1] regardless of the system, and ΔC is normalized with respect to the initial cost C 0. Second, the denominator is always ≥2 for ΔC≥0, and hence the acceptance probabilities are in the range (0, 0.5].

The normalized exponential (NE) acceptor

$$ \mathit{Accept}( \Delta C, T)=\exp\biggl(\frac{-\Delta C}{0.5 C_0 T}\biggr) $$
(3)

uses similar normalizations as the NIE acceptor (2). The difference is that Accept(0,T)=1.0 for NE (just like E) and 0.5 for NIE. Figure 3 illustrates the differences. The x-axis shows the temperature. It decreases during the process, and hence optimization proceeds from right to left. The y-axis shows the acceptance probability. The lines show the acceptance probabilities for moves that are 5 % and 30 % worse than the current one. NE was found to be slightly better than NIE in [25] (see Sect. 5.1).

Fig. 3

Probabilities of Normalized Exponential (NE) and Normalized Inverse Exponential (NIE) acceptance function. Small worsening moves are accepted with higher probability. Moreover, NE accepts nearly twice as many moves as NIE at high temperatures (beginning of optimization)

Braun [5] uses an inverse exponential function (IE)

$$ \mathit{Accept}(\Delta C, T)=\frac{1}{1+\exp(\frac{\Delta C}{T})}. $$
(4)

Temperature normalization is not used, but the same effect is achieved by setting the initial temperature properly: T 0=C 0.

Using an exponential acceptor and dynamic initial temperature is the most common choice.
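The four acceptance functions (1)–(4) can be transcribed directly; each returns the acceptance probability of a worsening move. The function names are ours:

```python
import math

def accept_e(delta_c, t):
    """E: Eq. (1); depends on the absolute cost scale of the system."""
    return math.exp(-delta_c / t)

def accept_nie(delta_c, t, c0):
    """NIE: Eq. (2); delta_c is normalized by the initial cost c0 and
    t lies in (0, 1], so worsening moves get probabilities in (0, 0.5]."""
    return 1.0 / (1.0 + math.exp(delta_c / (0.5 * c0 * t)))

def accept_ne(delta_c, t, c0):
    """NE: Eq. (3); like NIE but Accept(0, t) = 1.0, as with E."""
    return math.exp(-delta_c / (0.5 * c0 * t))

def accept_ie(delta_c, t):
    """IE: Eq. (4); no normalization, so T0 must be scaled to the costs."""
    return 1.0 / (1.0 + math.exp(delta_c / t))

# A 5 % worsening move (delta_c = 0.05 * c0) at a mid-range temperature:
print(round(accept_ne(5.0, 0.5, 100.0), 3))   # 0.819
print(round(accept_nie(5.0, 0.5, 100.0), 3))  # 0.45
```

The example values illustrate the roughly 2x gap between NE and NIE acceptance probabilities visible in Fig. 3.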

3.3 Annealing schedule

Table 2 shows that 4 different annealing schedules have been used in the publications. The annealing schedule is a trade-off between solution quality and optimization time. It defines the temperature levels and the number of iterations for each temperature level. Optimization starts at the initial temperature T 0 and ends at the final temperature T f . These need not be constants, in which case the initial temperature selection and/or the stopping criterion are adaptive. The stopping criterion defines when optimization ends.

Geometric temperature scale (G) is the most common annealing schedule (11 out of 14). Temperature T is multiplied by a factor q∈(0,1) to get the temperature for the next level, that is,

$$ T_{next} = q T. $$
(5)

Usually several iterations are spent on a given temperature level before switching to the next level. The scaling factor q is most often 0.95 or 0.99 in the literature. Some works use q=0.9, which was also used by Kirkpatrick et al. who presented SA [17]. They used SA for partitioning circuits onto two chips, which is analogous to mapping tasks onto two PEs. Figure 4 shows the temperature as a function of the iteration count for 4 different schedules. Naturally, the temperature decreases faster with a small q and optimization will terminate sooner. Moreover, the iteration count per temperature level has a major impact (see Sect. 3.5).
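A sketch of the geometric schedule with L iterations per temperature level; the default q and L values below are merely common literature choices:

```python
def geometric_temperature(t0, i, q=0.95, L=16):
    """G: unrolls Eq. (5) to give the temperature at iteration i when
    L iterations are spent per level: T_i = q**(i // L) * T0."""
    return t0 * q ** (i // L)

print(geometric_temperature(1.0, 0))          # 1.0 (first level)
print(geometric_temperature(1.0, 16))         # 0.95 (second level)
print(round(geometric_temperature(1.0, 32, q=0.9), 2))  # 0.81 (= 0.9**2)
```

This also shows why q and L interact: halving L or lowering q both shorten the annealing, as Fig. 4 illustrates.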

Fig. 4

Examples of geometric annealing schedule with two scaling coefficients q=0.95 and q=0.9. Moreover, two iteration counts are shown L=1 and L=16. Too small q and L will terminate SA quickly and likely cause sub-optimal results

Bollinger’s schedule [4], denoted B, considers the size of improvement during scaling. It computes the ratio R of the minimum cost to the average cost on a temperature level, and then applies a geometric temperature multiplier

$$ T_{next} = \min(R, q_L) T, $$
(6)

where q L is an experimentally chosen lower bound for the temperature multiplier. q L varied in range [0.9,0.98].

Koch [18] and Ravindran [31], denoted K, use a dynamic q factor that depends on the statistical variance of cost at each temperature level. The temperature is calculated with

$$ T_{next} = \frac{T}{1 + \frac{T \ln(1 + \delta)}{3 \sigma_T}} $$
(7)

where δ is an experimental parameter used to control the temperature step size and σ T is the standard deviation of cost function on temperature level T. Higher δ or lower σ T means a more rapid decrease in temperature. Koch suggests δ in range [0.25,0.50].
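Equation (7) can be transcribed as a sketch; the handling of a zero-variance level is our own assumption, not specified in [18] or [31]:

```python
import math
import statistics

def koch_next_temperature(t, level_costs, delta=0.25):
    """K: Eq. (7); the temperature step depends on the standard deviation
    of the costs sampled on the current temperature level."""
    sigma = statistics.pstdev(level_costs)
    if sigma == 0.0:
        # Degenerate level with no cost variation: keep T unchanged
        # (our own assumption).
        return t
    return t / (1.0 + t * math.log(1.0 + delta) / (3.0 * sigma))

# Larger cost spread (sigma) -> slower cooling; larger delta -> faster cooling.
t1 = koch_next_temperature(1.0, [0.9, 1.1])             # sigma = 0.1
t2 = koch_next_temperature(1.0, [0.9, 1.1], delta=0.5)
print(t2 < t1 < 1.0)  # True
```

The sketch makes the adaptivity concrete: a level with widely varying costs cools slowly, giving SA more time to sample that region.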

Coroyer [6] experimented with a fractional temperature scale (F). Temperature T is calculated as \(T_{i} = \frac{T_{0}}{i}\) where T 0 is the initial temperature and i is the iteration number starting from 1. Fractional temperature was found to be worse than geometric.

3.4 Adaptivity in start and stop conditions

The initial temperature T 0 can be chosen experimentally or methodically. Ten works present methods to set T 0 based on given problem space characteristics and a desired acceptance criterion for moves [4–6, 10, 18, 22, 26, 29, 30, 37]. For example, T 0 can be set purely based on simulation so that it is raised high enough that the average move acceptance probability p is high enough for aggressive statistical sampling of the problem space, e.g. p≈0.9 [6, 18]. In contrast, our method determines T 0 from the task graph and the system architecture sizes [26, 29, 30].

There are several common stopping criteria. For example, optimization ends when a given T f has been reached [6, 18], a given number of consecutive rejections happen, a given number of consecutive moves has not improved the solution, a given number of consecutive solutions are identical [6], or a solution (cost) with given quality is reached [18]. These criteria can also be combined. Constants associated with these criteria are often decided experimentally, but adaptive solutions exist too. We use adaptive T f that is computed from problem space similarly to T 0 [26, 29, 30].

3.5 Iterations per temperature level

The number of iterations per temperature level L is defined as an experimental constant or a function of the problem space. It affects the annealing schedule and hence also the acceptance behavior notably, as highlighted in Fig. 4. No single method is clearly more common than others, but functions are more common than constants (8 vs. 4 papers). A function of the problem space (based on the number of tasks and PEs, and the cost) is more widely applicable, but a constant can be tuned more finely to a specific problem. Parameter L often depends on the task count N and the PE count M. Ercal [10] proposes an L that is proportional to N(M−1), the number of neighboring solutions in the mapping space.

Lin [22] uses an adaptive number of iterations per temperature level (LLI). The approach starts with L 0=N(N+M) at T 0. The number of iterations for the next temperature level is calculated by L next =min(1.1L, N(N+M)^2). However, on each temperature level an extra XL iterations are computed iff X>L, where \(X = \frac{1}{\exp((C_{\min}-C_{\max}) / T)}\) and C min and C max are the minimum and maximum costs on the temperature level T. The initial temperature is computed as T 0=a+b, where a is the maximum execution cost of any task on any PE, and b is the maximum communication cost between any two tasks. Then the average and standard deviation of the cost at T 0 are computed by sampling a number of moves. The temperature is doubled (once) if the minimum and maximum cost are not within two standard deviations of the average.
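Lin's level-size rules can be transcribed as a sketch; the function names are ours, and the rules follow the description above:

```python
import math

def lin_next_iterations(l_cur, n, m):
    """LLI: the next level's iteration count, capped at N(N+M)**2."""
    return min(1.1 * l_cur, n * (n + m) ** 2)

def lin_extra_iterations(l_cur, c_min, c_max, t):
    """Extra X*L iterations are run iff X > L, with
    X = 1 / exp((c_min - c_max) / t) >= 1 since c_min <= c_max."""
    x = 1.0 / math.exp((c_min - c_max) / t)
    return x * l_cur if x > l_cur else 0.0

n, m = 32, 2
l0 = n * (n + m)  # initial level size L0 = N(N+M)
print(l0)                                       # 1088
print(round(lin_next_iterations(l0, n, m), 1))  # 1196.8
print(lin_extra_iterations(8, 90.0, 110.0, 100.0))  # 0.0 (X ≈ 1.22 < L)
```

Since X grows as the cost spread on a level grows relative to T, the extra iterations kick in exactly on levels where the search still sees large cost variation.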

Ferrandi [11] uses an adaptive number of iterations per temperature level (LFE). The temperature level is switched on the first accepted move on the level. However, there can be arbitrarily many rejected moves before that.

4 On global optimum results

This section analyzes how many mapping iterations are needed to reach a given solution quality. The SA convergence rate and the number of mapping iterations are compared against global optimum solutions. Specifically, the effects of the L value and the acceptance function on the convergence rate and solution quality are analyzed.

One difficulty in experiments with heuristics is the choice of an appropriate reference point. Unfortunately, a global optimum is hard to find for large task graphs since the mapping problem is NP-hard. Therefore, we present a thorough experiment where SA is compared to a global optimum found with exhaustive search. The effectiveness of heuristics may decrease rapidly as the number of task nodes grows. Each task can be placed on any of the M PEs, and hence an N-item vector stores the full mapping. The total number of mappings X for N tasks and M PEs is X=M^N. The number of mappings grows exponentially with respect to tasks, and polynomially with respect to PEs. Adding PEs is preferable to adding nodes from this perspective.
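The size of the mapping space follows directly from X = M^N, for example:

```python
def mapping_count(num_tasks, num_pes):
    """Total number of distinct task-to-PE mappings, X = M**N."""
    return num_pes ** num_tasks

# 32 tasks on 2 PEs; fixing one task (valid for homogeneous PEs on a single
# shared bus, as in Sect. 4.1) halves the space to 2**31 mappings per graph.
print(mapping_count(32, 2))  # 4294967296
print(mapping_count(31, 2))  # 2147483648
```

The asymmetry is visible here: adding one task doubles the 2 PE space, whereas adding a PE only multiplies it polynomially per task.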

4.1 Experiment setup

Our SA with automatic temperature (SA+AT) algorithm in [30] is further analyzed with respect to global optimum convergence. The automatic temperature algorithm defines the start temperature T 0, the final temperature T f , and the number of iterations per temperature level L so that solution quality and SA runtime are balanced. SA+AT is applied to a problem space with a variable number of PEs and nodes using Kahn Process Networks (KPNs). Specifically, we solve the cases shown in Table 3 by brute force, and then compare the SA+AT heuristic with the brute force global optimum results.

Table 3 The optimization cases for brute force optimization. For each case we find the global optimum for 10 graphs generated with a KPN generator. G means 10^9

More than 2 PEs would have been too costly to compute by brute force for 32 or more nodes. Even so, the experiments together required over 500 days of CPU time. Therefore, we limited the number of nodes for each case as follows. The number of nodes for each case is picked so that the number of brute force iterations needed to find the global optimum is the same within a factor of 0.5 to 2. As one task mapping was always fixed to one PE, the brute force complexity of the problem is X=2^31=2 147 483 648≈2.1×10^9 mappings for each graph. Note that fixing one node is a valid choice because the PEs are homogeneous and connected with a single shared bus in our cases. Then, given M PEs, we selected the number of nodes N so that the following equation roughly holds: M^(N−1)=X.

In Table 3, mappings per second determines the speed at which brute force optimization can make progress. The average for the 2 and 8 PE cases varies by a factor of 1.7x. Optimization time in brute force and SA is dominated by the time to evaluate a mapping. The number of CPU days consumed in the brute force experiment is also presented. Note that since 10 graphs were optimized for each case, one tenth of this value represents the brute force optimization time for a single graph.

Ten synthetic graphs were generated with kpn-generator [19] for each system size. For 2 PE graphs the target distribution parameter was set to 10 %, in the other cases to 30 %. Target distribution defines the relative number of tasks that can act as a target for each task. For example, a 10 % target distribution for a 32 node graph means each task can communicate with at most 3 other tasks. All graphs were cyclic graphs (default) and the b-model parameter for both communication size and computation time burstiness was set to 0.7 (default).

SA+AT was run 1000 times independently for each of the 10 graphs for each system size. We measured the proportion of SA runs that come within p percent of the global optimum cost. Here, we use the execution time t as the cost. We measured the number of SA+AT runs that got execution time t≤(1+p)t o , where p is the execution time overhead compared to the global optimum execution time t o . Although the graphs generated by kpn-generator are cyclic, each task is activated a certain fixed number of times. Execution of the graph is complete when there are no more activations, and that instant defines the execution time.
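The measured metric can be sketched as a hypothetical helper; the run times in the example are illustrative, not measurements from the experiment:

```python
def proportion_within(run_times, t_opt, p):
    """Fraction of SA runs whose execution time t satisfies t <= (1+p)*t_opt,
    i.e. runs within overhead p of the global optimum execution time t_opt."""
    hits = sum(1 for t in run_times if t <= (1.0 + p) * t_opt)
    return hits / len(run_times)

# Hypothetical run times (us) against an optimum of 535 us, with p = 5 %:
times = [535.0, 560.0, 700.0, 545.0]
print(proportion_within(times, 535.0, 0.05))  # 0.75
```

In the experiment this fraction is computed over the 1000 independent SA+AT runs per graph.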

The 2 PE graphs for the experiment are available at the DCS task mapper web page [7] under the section “Experimental data”. The experiment data for 3, 4, 6 and 8 PEs is available at [8]. The first case with 2 PEs used the NIE acceptor, but once NE was found better it was used for the cases with 3–8 PEs. A reference C implementation of the SA+AT algorithm is also available at [32]. The experiment can be repeated on a GNU/Linux system by using DCS task mapper and jobqueue [14].

4.2 Example convergence with one graph

Let us first consider the difficulty of the mapping problem. Figure 5 shows the task graph execution time for one graph plotted against the number of mappings. All possible mapping combinations for 32 nodes and 2 PEs were computed with a brute force search. One node mapping was fixed and the remaining 31 were varied, which resulted in 2^31 mappings. The initial execution time is 875 μs (all tasks on a single PE), the mean 1033 μs and the standard deviation 113 μs. It is interesting to note that there is exactly one optimum mapping, which results in 535 μs. That is nearly 2x as fast as the average and about 3.2x as fast as the worst case. Hence, mapping has a large impact on performance, but the optimal case is extremely unlikely to be found by random choice. It is like a needle in a haystack.

Fig. 5
figure 5

Task graph execution time for one graph plotted against the number of mappings. All possible mapping combinations for 32 nodes and 2 PEs were computed with brute force search. One node mapping was fixed. 31 node mappings were varied, resulting in 2^31 ≈ 2.1×10^9 mappings. The initial execution time is 875 μs, mean 1033 μs and standard deviation 113 μs. There is only one mapping that reaches the global optimum 535 μs

Table 4 shows how many SA+AT runs got execution time overhead p or less compared to global optimum execution time t o . It also shows the associated number of mappings within that range computed from the brute force search.

Table 4 The number of mappings within p % of the global optimum computed with a brute force search, and the proportion of SA runs that converged within that range. Note that there are only very few good mappings out of 2 billion. A higher SA run proportion is better

There is only one mapping that achieves the global optimum of 535 μs, but SA reaches it in 2.1 % of the runs, which is a very good result considering the difficulty of this mapping problem. Repeating SA yields the global optimum in approximately 10^5 iterations. Purely random mapping (RM) would find the global optimum in approximately half the runs only after 10^9 iterations; in a million iterations, RM only converges within 13 % of the global optimum.
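These repeated-run figures follow from a geometric restart model: if a single run succeeds with probability s and costs m iterations on average, the expected total iteration count is m/s. A sketch with illustrative numbers of the same magnitude as in the text:

```python
def expected_iterations(mean_iters_per_run, success_rate):
    """Expected total iterations when restarting SA until success (geometric model)."""
    return mean_iters_per_run / success_rate

# Illustrative: ~2100 mappings per run combined with the 2.1 % optimum hit rate
# from the text lands on the ~1e5 iterations quoted for repeated SA.
print(expected_iterations(2100, 0.021))  # approximately 1e5
```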

Interestingly, there is also only one mapping that is the worst solution: 1708 μs. Reversing the cost difference comparison in SA yields the worst solution in 0.9 % of the runs. The SA algorithm can therefore also find the global maximum.

4.3 Convergence results for 10 graphs

Table 5 shows the execution time overheads for 2 PEs and 32 tasks considering all 10 task graphs. The values are well aligned with Table 4 and are also shown in Fig. 6. The first row, p=0 %, means the global optimum (shortest execution time). p=10 % means an execution time no more than 10 % over the global optimum. The last two rows show the mean and median number of mappings tried in a single SA+AT run. SA uses only a small fraction of the brute force iterations, but does not always converge to the global optimum. The optimum result is obtained in 0.6–15.2 % of cases depending on L. However, the slightly looser bound p=+5 % is already reached in 7.6–64.9 % of runs, which is very good. For each L, the 90 % probability level is marked in boldface in the table, and the dotted vertical lines in Fig. 6 show the corresponding overhead p.

Fig. 6
figure 6

SA+AT convergence with respect to global optimum with 2 PEs and 32 task nodes, as in Table 5. Curves show proportion of SA+AT runs that converged within p from global optimum for a given value of L. Lower p value on X-axis is better. Higher probability on Y-axis is better. SA+AT chooses L=32

Table 5 SA+AT convergence with respect to global optimum for 2 PEs and 32 nodes. Values in the table show the proportion of SA+AT runs that converged within p from the global optimum. A higher value is better. The experiment varied the parameter L=16–256, and the automatic parameter selection method of SA+AT chooses L=32. The 90 % level is marked in boldface in each column

Determining the cost is the most time consuming operation, as it usually requires some sort of simulation. Therefore, the number of mapping iterations should be minimized to obtain a reasonable runtime for the heuristic. The runtime of the heuristic itself is often negligible, and hence it pays off to design a more complex algorithm which cleverly reduces the iteration count.

Table 6 shows the expected number of mappings needed to obtain a result within p percent of the global optimum by repeatedly running SA+AT. The values are derived from the convergence rates. For example, take the cell p=5 %, L=16 in Table 5. It shows that roughly 7.6 % of the runs reached that performance. Dividing the mean number of mappings by that rate gives 840/0.0757 ≈ 11 099, which is shown in Table 6. The automatically selected L value works rather well, as it has the smallest or second smallest iteration count for each value of p.
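The derivation of Table 6 from Table 5 is a single division; a sketch using the quoted cell values (the helper name is ours):

```python
def expected_mappings(mean_mappings_per_run, convergence_rate):
    """Expected mappings over repeated SA+AT runs to reach the target quality."""
    return mean_mappings_per_run / convergence_rate

# p = 5 %, L = 16: mean 840 mappings per run at a 7.57 % convergence rate,
# close to the 11 099 mappings quoted in Table 6.
print(round(expected_mappings(840, 0.0757)))
```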

Table 6 Approximate expected number of mappings for SA+AT to reach performance within p percent from the global optimum with 2 PEs and 32 nodes. Values are derived from Table 5. SA+AT chooses L=32. The best values (smallest) are in boldface on each row

Note that convergence is not guaranteed but probabilistic, which means the actual global optimum might never be reached. Brute force for 2 PEs uses 2.0×10^4–2.3×10^4 times the iterations compared to SA, for L=32 and L=64, respectively. Hence, SA offers 4 orders of magnitude improvement over brute force. Moreover, results within 3 % of the global optimum are reached with 5 orders of magnitude fewer iterations by using SA+AT, and results within 16 % with 6 orders of magnitude fewer.

4.4 Differences between graphs

There are of course differences between the 10 tested graphs. Table 7 shows the convergence rate variance between graphs for L values 16–256 in the 2-PE case. Minimum and maximum convergence rates vary by a factor of approximately 50x. Doubling L approximately doubles the mean convergence rate. For L≥32 the global optimum was found for each graph when SA was run 1000 times. However, L should be scaled according to the problem; otherwise a small L obtains poor results with large problems, or a large value wastes time on simple problems.

Table 7 Global optimum convergence rate varies between the 10 tested 32-node graphs and L values. The Minimum column denotes the hardest graph and the Maximum column the easiest one. The value 0 % in the Min column means there was a graph for which the global optimum was not found in 1000 SA+AT runs. Note that the Mean column has the same values as the p=0 % row in Table 5

With L=256 the hardest 32-node graph converged with probability 0.4 %, but the same graph converged within 1 % with 4.4 % probability, within 2 % with 8.8 % probability, and within 3 % with 22.3 % probability. Sacrificing just 3 % from optimum more than doubles the convergence rate on average, as shown by Table 5.

The easiest graph converged to the optimum with 27.8 % probability, but within 3 % only slightly more often, with 32.6 % probability. This indicates that it does not have many solutions near the global optimum, but hitting the global optimum itself is rather probable (27.8 %). This is the opposite of the hardest graph, where hitting the global optimum is hard (probability 0.4 %), but there are surprisingly many solutions within 3 %, found with 22.3 % probability. Such differences necessitate that the evaluation of a heuristic uses many task graphs.

4.5 Applicability of SA on larger problems

SA performed well in the 2-PE case, and the experiments were repeated with larger PE counts. As mentioned, the task count was reduced since otherwise the brute force solution would be infeasible. Tables 8 and 9 show the results for 8 PEs and 11 tasks. The result tables for 3, 4 and 6 PEs are found in the Appendix.

Table 8 Proportion of SA+AT runs that converged within p from global optimum for 8 PEs and 11 nodes. A higher value is better. SA+AT chooses L=77
Table 9 Approximate expected number of mappings for SA+AT with 8 PEs and 11 nodes. SA+AT chooses L=77

Convergence is faster with 8 PEs than with 2 PEs. First of all, the problem size is smaller (1.1G < 2.1G), but that is not the only reason. The task count is also smaller (N=11 < N=32), and this has a bigger impact, as is evident from Tables 13, 14, 15, 16, 17, 18. Note that once the convergence proportion is rounded to 1.000 no more values are shown, but the iteration counts still increase very slightly (e.g. L=32, p=11–17 %).

Figure 7(a) shows how quickly SA converges towards the global optimum with all tested PE counts. Figure 7(b) plots the probability of reaching solutions with p=0 % (optimum) and p=5 % as a function of task count (left Y axis). Note that the problem sizes are roughly equal, from 1.1G to 4.3G solutions, which allows estimating the impact of task count. Moreover, the expected iteration counts are shown on the right Y axis. The iteration count for the optimum solution increases with task count and has a peculiar spike at 20 tasks (3 PEs). In contrast, the iteration count changes only modestly for p=5 %.

Fig. 7
figure 7

Convergence rates for M=2–8 PEs with automatically selected L values (32–77)

Reaching the optimum is most likely with 11 tasks and least likely with 32 tasks (90.1 % vs. 1.6 %). Sacrificing the quality by 5 % increases the likelihoods notably (99.1 %, 15.4 %). The shape of the probability curves seems reciprocal, i.e. y=1/x. Although the probability of a good solution drops quickly, the knee-point is around 20 tasks and the drop after that is much smaller. This gives hope that decent results could be achieved with larger task counts, but confirming that would need very heavy brute force runs, which are currently infeasible.

Table 10 shows the relative optimization time for each PE case when targeting within p percent of the global optimum. The table shows that reaching a solution within 5 percent of the global optimum can mean a significant saving in optimization time. This is especially true for the 2 and 3 PE cases, where the probability of reaching the global optimum with a single SA+AT run is very low. For 2 PEs, reaching within 5 % of the optimum takes only 1/10 of the optimization time, and within 10 % only 1/29. For 3 PEs, the corresponding cases take just 1/18 and 1/43 of the time. We observe that repeating SA several times for a given problem is a beneficial strategy for finding good solutions when the probability of reaching an optimum solution in one optimization run is low.

Table 10 Relative optimization time for reaching within p percent of the global optimum. The time to reach the global optimum is scaled to 1.0 time units for each PE count

Furthermore, executing one round of SA+AT takes between 0.4 s and 12.6 s for 2 and 8 PEs, respectively. The expected time to reach the global optimum with SA+AT is 29.1 s for 2 PEs, 49.8 s for 3 PEs, 12.7 s for 4 PEs, 5.9 s for 6 PEs and 22.0 s for 8 PEs. Sacrificing solution quality to within 5 percent of the global optimum decreases the optimization time to 2.9 s for 2 PEs, 2.8 s for 3 PEs, 4.4 s for 4 PEs, 4.2 s for 6 PEs and 17.3 s for 8 PEs.
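These expected times follow a restart model applied to wall-clock time: expected time = (time per run) / (success probability). The numbers below are illustrative stand-ins of the right magnitude, not exact values from the tables:

```python
def expected_time(run_time_s, success_prob):
    """Expected wall-clock time when restarting SA+AT until the target is met."""
    return run_time_s / success_prob

# Illustrative: a 0.4 s run with a ~1.4 % chance of hitting the optimum lands
# on the ~29 s scale reported for 2 PEs.
print(round(expected_time(0.4, 0.014), 1))
```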

Results show that a large N makes optimization more complex although the total number of solutions M^N is rather constant. We suspect that this might be due to the homogeneity of our HW platform. For example, let us assume that we move all tasks from PE 0 to PE 1, from PE 1 to PE 2, …, and from PE M−1 to PE 0. Since the PEs are identical, this is just a matter of their numbering, and there should be (M−1)! permutations with (practically) equal performance. Consequently, the effective design space of the mapping is actually much smaller. With heterogeneous resources or a hierarchical network this would not apply, since moving the tasks then has more complex consequences due to their dependencies.
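The (M−1)! symmetry argument can be made concrete by canonicalizing mappings up to PE renaming: two mappings that differ only in PE numbering reduce to the same canonical form. A sketch (the function is ours, for illustration only):

```python
def canonical(mapping):
    """Relabel PEs in order of first appearance, collapsing PE-renaming symmetry."""
    relabel = {}
    return tuple(relabel.setdefault(pe, len(relabel)) for pe in mapping)

# Shifting every task one PE over (PE 0 -> PE 1, ..., PE M-1 -> PE 0) on
# identical PEs yields the same canonical mapping, hence equal performance.
print(canonical((0, 0, 1, 2)) == canonical((1, 1, 2, 0)))  # -> True
```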

Earlier, SA+AT has been compared to Group Migration (GM), a hybrid of SA and GM, Random mapping, and the Optimal Subset Mapping algorithm with large 300-task graphs and 2, 4, 8 PEs [28]. In that experiment, SA+AT was the best heuristic studied. The speedups w.r.t. a single PE were about 1.9x, 2.7x and 3.7x, respectively, but the optimum was unknown. The relation between the mean iteration count and problem size is far from linear, but we have not yet formulated it. For example, in this paper the 8-PE case requires about 20x the iterations compared to 2 PEs. However, in [28] the ratio is only 2.4x, and the iteration counts are rather small, 37k and 88k, considering the size of the task graph.

5 On SA acceptance probability

This experiment studied the impact of the acceptance function.

5.1 On acceptor functions

The normalized exponential and normalized inverse exponential acceptor functions are compared with respect to solution quality and convergence. Very few SA papers try different acceptor functions. We ran an experiment to compare the normalized inverse exponential (2) and normalized exponential (3) acceptor functions. The parameter selection method from [30] was applied to (3). The experiment was re-run for both acceptor functions, 100 graphs, 2–4 PEs, and 10 times for each case, totaling 6 000 optimization runs.

Table 11 shows the average gain and iteration count results for both acceptor functions. In terms of gain, the normalized exponential acceptor is better by no more than half a percent.

Table 11 Comparing average gain and iteration count values for the normalized inverse exponential (NIE) and normalized exponential (NE) acceptors. A higher value is better for gain and a smaller one for iterations

However, the exponential acceptor needs 2–7 % more iterations. Both SA algorithms have an equal number of iterations per temperature level from T_0 to T_f. Optimization terminates when T ≤ T_f and L consecutive rejections happen. The acceptance function affects the number of consecutive rejections, which makes the difference in the number of iterations. The inverse exponential acceptor has a higher rejection probability, which makes it stop earlier than the exponential acceptor.
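For reference, the two acceptors can be written out directly. The sketch below assumes the standard exponential form for (3), capped at 1 for improving moves (an assumption on our part, since (3) is not reproduced here), and a normalized temperature T ∈ (0,1] with initial cost C_0, matching (2):

```python
import math

def accept_nie(dC, T, C0):
    """Normalized inverse exponential acceptor (2)."""
    return 1.0 / (1.0 + math.exp(dC / (0.5 * C0 * T)))

def accept_ne(dC, T, C0):
    """Assumed exponential form for acceptor (3), capped at 1 for improving moves."""
    return min(1.0, math.exp(-dC / (0.5 * C0 * T)))

# At dC = 0 the NE acceptor always accepts while NIE accepts with probability
# 0.5, so NIE rejects more often and therefore terminates earlier.
print(accept_ne(0, 0.1, 1000), accept_nie(0, 0.1, 1000))  # -> 1.0 0.5
```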

Furthermore, we compared the brute force results with 32 nodes and 2 PEs between the normalized exponential and normalized inverse exponential acceptors. Table 12 shows the difference between the global optimum convergence rates of the two acceptors for SA+AT. The results indicate that the normalized exponential acceptor (NE) converges slightly more frequently than the normalized inverse exponential acceptor (NIE). The SA+AT choice L=32 displays a difference of at most 2.4 % in absolute convergence rate.

Table 12 The difference in SA+AT convergence rate between the normalized exponential (NE) and normalized inverse exponential (NIE) acceptors. The overhead percentage p shows convergence within p of the global optimum. A positive value indicates the normalized exponential acceptor is better. SA+AT chooses L=32

We recommend using normalized exponential acceptor NE for task mapping.

5.2 On zero transition probability

The SA acceptance function defines the probability of accepting a move for a given objective change and temperature. Zero transition probability is the probability of accepting a move that does not change the cost. One might speculate that although the cost does not change, the new mapping may be a better base for the next moves. We studied the effect of the zero transition probability on the solution quality.

The normalized inverse exponential acceptance function (2) gives a 0.5 probability for a zero transition, that is, when ΔC=0. The acceptance function was modified to test the effect of the zero transition probability:

$$ \mathit{Accept}(\Delta C, T)= \frac{2 a}{1+\exp(\frac{\Delta C}{0.5 C_0 T})}, $$
(8)

where a is the acceptance probability for ΔC=0. SA+AT was re-run for several graphs with the setup specified in detail in [30], with distinct probabilities a ∈ {0.1, 0.2, …, 1.0}. Two 16 node cyclic graphs, one 128 node cyclic graph, two 16 node acyclic graphs, and one 128 node acyclic graph, all with target distribution 10 %, were tested on a 3 PE system. However, no causality was found between the solution quality and the zero transition probability. It seems that this probability is insignificant.
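Equation (8) is a one-line modification of the NIE acceptor; a sketch:

```python
import math

def accept_zero(dC, T, C0, a):
    """Modified NIE acceptor (8): accepts a zero-cost move with probability a."""
    return 2.0 * a / (1.0 + math.exp(dC / (0.5 * C0 * T)))

# With dC = 0 the exponential term equals 1, so the probability is exactly a.
print(accept_zero(0, 0.5, 1000, 0.3))  # -> 0.3
```

Setting a = 0.5 recovers the original acceptor (2).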

6 Recommendations

In this section we summarize our findings and give recommendations for reporting results on task mapping with simulated annealing. In addition, we present a compact list of how to set up SA for task mapping.

6.1 On reporting results

There are several guidelines that apply to publishing data on parallel computation [2] and heuristics in general [3]. Unfortunately, we found that many of the cited publications leave out details that are important for reproducing the results with SA. Therefore, we present the following recommendations for publishing results on task mapping with SA:

  1.

    Report at least the pseudocode of the algorithm. A textual description of the algorithm is often too ambiguous for reproducing the experiment. Specify the temperature scale, cost, move, and acceptance functions.

  2.

    Report numerical values for the algorithm's constants. Specify the temperature scaling factor, initial and final temperatures, and the number of iterations per temperature level. These are needed to reproduce the same results.

  3.

    Report the convergence rate. Plot the number of mapping iterations against the optimized values (e.g. speedup). Report the mean and median number of iterations.

  4.

    Compare the result from heuristics with the global optimum result for a trivial problem that can be solved by brute force. Report the proportion of optimization runs, and the number of iterations, that reach within p percent of the global optimum where p is varied. For example, it can be informative to know that 95 percent of optimization runs yield a result that is only 5 percent worse than the global optimum and use a specific number of iterations on average for this result.

  5.

    Report the optimization time per iteration. This corresponds mostly to simulation time per mapping.

6.2 Recommended practices for task mapping with Simulated Annealing

Based on existing data and results, we recommend the following practices for task mapping with SA:

  1.

    Scale the number of iterations per temperature level with the system size. That means that L should be proportional to α=N(M−1), where N is the number of tasks and M is the number of PEs. α is the number of neighboring mapping solutions. The results of Sect. 4 and [6, 26, 30] indicate that α times a small multiplier is sufficient for good convergence, provided that SA is repeated several times.

    Furthermore, as a general result, it is important to note that the number of mapping iterations should grow as a function of the problem complexity parameters N and M, unless experimental results indicate good results for a specific L value.

  2.

    Use a geometric temperature schedule with 0.90 ≤ q ≤ 0.99. Most known results use values in this range. Our experiments have found q=0.95 to be a suitable value. The L value has to be adjusted together with q: dropping q from 0.95 to 0.90 implies that the number of iterations in a given temperature range halves unless L is doubled.

  3.

    Use a systematic method for choosing the initial and final temperatures, e.g. one published in [46, 10, 18, 22, 26, 29, 30, 37]. A systematic method decreases manual work and the risk of careless parameter selection. A systematic method may also decrease optimization time.

  4.

    Use a normalized exponential acceptance function (NE) (3). Section 5 indicates that using exponential rather than inverse exponential acceptance function is preferred. The zero transition probability did not explain why inverse exponential did worse. Normalized acceptance function is defined with a normalized temperature range T∈(0,1] which makes annealing schedules more comparable between problems. It is easy to select a safe but wasteful range when temperature is normalized, e.g. T∈(0.0001,1]. It is also easier to add new influencing factors into the cost function since the initial cost does not directly affect the selection of initial temperature when a normalized acceptance function is used. Instead, the relative change in cost function value in moves is a factor in the selection of initial temperature.

  5.

    Use the ST (single task) move function (Sect. 3.1) if in doubt. It is the most common heuristic, which also makes comparison to other works easier. However, exploring the impact of the move function calls for more research.

  6.

    Run the heuristics several times for a given problem. Section 4 shows that repetition, that is, restarting the optimization, is necessary. The variance of the solution quality can be significant, which becomes visible by running the heuristics several times. SA usually terminates in a low temperature state in the neighborhood of a local minimum. We call this neighborhood a valley. It is hard to escape from a valley when the temperature is low. Running SA several times increases the probability that better valleys are found in the optimization process. Increasing the number of iterations per optimization run may not address this, because the algorithm will most likely end up in a single valley. Multiple valleys should be explored.

  7.

    Record the iteration number at which the best solution is reached. Some algorithms continue running but make no further progress at the end of the optimization. If the termination iteration number is much higher than the best solution iteration number, the annealing can perhaps be stopped earlier without sacrificing quality.
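The recommendations above can be collected into a small parameter-selection helper; a sketch under the paper's conventions (the multiplier k and the returned dictionary layout are our illustrative choices):

```python
def sa_at_parameters(n_tasks, n_pes, q=0.95, k=1):
    """Collect the recommended SA+AT settings; k is an illustrative multiplier."""
    alpha = n_tasks * (n_pes - 1)  # number of neighboring mapping solutions
    return {
        "L": k * alpha,    # iterations per temperature level
        "q": q,            # geometric temperature scaling factor
        "acceptor": "NE",  # normalized exponential acceptance function
        "move": "ST",      # single task move heuristic
    }

# 32 tasks on 2 PEs gives L = 32 and 11 tasks on 8 PEs gives L = 77, matching
# the automatic choices reported in Sect. 4.
print(sa_at_parameters(32, 2)["L"], sa_at_parameters(11, 8)["L"])  # -> 32 77
```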

7 Conclusion

A survey of the state-of-the-art in task mapping with SA was presented. It was found that SA parameters are often incompletely presented and explained in publications. Recommendations were given on how to better report this information. Although SA is an efficient optimization method for task mapping, there are many practices for selecting the SA parameters. Based on our experiments and findings, we presented a detailed set of recommended parameters to help researchers reproduce and compare their work. Thorough experiments with 2–8 PEs and 11–32 tasks showed that SA can achieve solutions very close to the global optimum with a 4–6 orders of magnitude reduction in optimization time. Our future work includes adding memory and time constraints to the objective function, evaluating applicability to larger problems, and further automating the task mapping process as a whole.