# Optimization of Reconfigurable Multi-core SOCs for Multi-standard Applications

Ali Ahmadinia<sup>1</sup>, Tughrul Arslan<sup>2</sup>, and Hernando Fernandez Canque<sup>1</sup>

<sup>1</sup> School of Engineering and Computing, Glasgow Caledonian University, UK <sup>2</sup> School of Engineering and Electronics, University of Edinburgh, UK

**Abstract.** There is a growing need for high-performance chips that consume very little power yet support several application standards, for example different telecommunication standards depending on the country in which the device is used. This paper presents a new framework that automatically designs flexible systems by incorporating different degrees of reconfigurability into an embedded platform within an SOC design. The SOC design automation involves identifying the best architectural features for the SOC platform, the configuration settings of reconfigurable cores, the type of interconnection schemes, their associated parameters such as data bandwidth, and the placement of embedded cores in the communication infrastructure. For this optimization problem, a two-stage multi-objective optimization algorithm is presented. A multi-standard wireless telecommunication protocol is used to demonstrate our optimized designs in terms of area, power and performance.

### 1 Introduction

Current design automation technology has not been able to match the advances in System-on-Chip technology [1]; as a result, the ability to use the increasing number of gates effectively has decreased. Existing design methodologies are restrictive and mainly based on verification. This paper presents a new design tool that automatically creates a digital system directly from a software description of the application. In addition, an intelligent multi-objective optimization algorithm is provided to tune the architecture for optimal results under different configurations.

Recently, numerous reconfigurable architectures have emerged that can be embedded within an SOC platform. Custom reconfigurable embedded cores can be reconfigured with a small set of configuration bits, rather than by reconfiguring millions of switch boxes as in FPGAs. They offer high performance as well as flexibility for future upgrades and dynamic reconfiguration. A reconfigurable SOC architecture incorporates both fixed cores and these newly emerging custom reconfigurable cores. To the best of our knowledge, there is no existing approach for concurrent optimization of the system-level placement of embedded cores and the interconnection topology in such reconfigurable SOC architectures. Custom reconfigurable cores can support multiple application standards in a single chip, so the resulting architecture should be optimal across the different configurations of the reconfigurable cores. These optimizations minimize the overall power consumption and resource area utilization and maximize the throughput of the whole system.

### 2 Related Work

There has been considerable effort in designing tools that enable a designer to measure various performance metrics of Systems-on-Chip. The Platune framework [2] tunes the performance and power consumption of SOC platforms. Platune simulates an embedded application mapped onto the SOC platform and outputs performance and power metrics for any configuration of the platform. It proposes a space exploration algorithm based on the dependencies between parameters. The basic idea is to cluster dependent parameters and then carry out an exhaustive exploration within these clusters. If the clusters grow too large because of strong dependencies between the parameters, the approach degenerates into a purely exhaustive search, with a consequent loss of efficiency.

In ARTS [3], a multi-objective genetic algorithm is proposed to optimize mapping a set of task graphs onto a heterogeneous multiprocessor platform. The objective is to meet all real-time deadlines subject to minimizing system cost and power consumption, while staying within bounds on local memory and interface buffer sizes.

Ascia et al. [4] propose a strategy for exploring the architectural parameters of the processor, memory, and bus that make up a parameterized SOC platform with tight power consumption and performance constraints. It uses multi-objective genetic algorithms as the optimization technique for design space exploration (DSE).

Most of the tools outlined so far are designed for tuning design parameters and subsystems toward a single fixed optimal SOC solution. Our proposed framework (ReCAD) extends this work by allowing power, area and throughput analysis of an entire parameterized reconfigurable SOC platform, in order to achieve a number of optimal results (one for each scenario or standard) on the reconfigurable SOC platform within its range of configurations. Furthermore, while earlier work has focused either on simulation of a user-selected configuration or on design space exploration, ReCAD allows automatic search and exploration of Pareto-optimal configurations, together with transaction-level simulation of the whole system for more accurate performance evaluation and faster system verification.

### 3 ReCAD Framework

The main aim of the ReCAD tool is to automate the development of multi-core architectures that incorporate custom reconfigurable components, conventional RISC-based processors, hardware accelerators, memory blocks and similar components. The components are held in a built-in library, from which they are chosen for integration according to the requirements of the application. To obtain an optimized system, the tool selects from the library the communication media and embedded cores whose characteristics satisfy the user constraints.

In order to verify the functionality of the many different configurations generated by platform transformations, there is a need for a fast simulation. To facilitate the ease of simulation, the most successful approaches are based on SystemC [6].

For power estimation, a state-based power model integrated with SystemC is used. Throughput estimates are mainly based on the SystemC simulations. Design space exploration is used to search the possible interconnection schemes, together with the optimal placement of components in the communication medium, so as to meet the power and performance constraints of each application scenario.
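As a rough illustration of what a state-based power model computes, the sketch below accumulates energy over a trace of component states. It is not the ReCAD model itself: the `PowerState` type, the state names, and all power figures are hypothetical, and the model simply assumes that each component has been characterized once with an average power per state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    power_mw: float  # assumed average power drawn while in this state


def energy_nj(trace, states):
    """Sum energy over (state_name, duration_us) entries; mW * us = nJ."""
    return sum(states[name].power_mw * duration_us for name, duration_us in trace)


def average_power_mw(trace, states):
    """Average power over the whole trace; nJ / us = mW."""
    total_time_us = sum(duration_us for _, duration_us in trace)
    if total_time_us == 0:
        return 0.0
    return energy_nj(trace, states) / total_time_us


# Hypothetical characterization of one core and a short activity trace.
core_states = {
    "idle": PowerState("idle", 2.0),
    "active": PowerState("active", 50.0),
}
core_trace = [("idle", 100.0), ("active", 300.0), ("idle", 100.0)]

print(energy_nj(core_trace, core_states) / 1000.0, "uJ")
print(average_power_mw(core_trace, core_states), "mW")
```

In a system-level simulation, the trace would come from the simulated activity of each component rather than from a hand-written list.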

### 4 Architectural Tuning for Multi-standard Applications

This section discusses the development of what we term the architectural tuning engine, which deals with:

a. Identifying the best architectural features for the SOC platform including embedded core types and interconnection schemes.

b. Identifying best parameter sets associated with each entity including configuration settings and bus bandwidth.

c. Determining the optimal core placement in the communication medium.

These tasks must take the multi-scenario application into account so that the constraints of all scenarios (standards) are met. To identify the optimal set of embedded cores, the interconnection topology and its associated parameters, the components must be mapped onto the communication medium. The power and throughput metrics may vary with the placement of modules in the interconnection scheme. For this reason, the objectives should be examined against each feasible placement to identify the optimal solution whose performance meets the specifications of all required scenarios.

Like similar approaches, the ReCAD framework includes an RTL implementation of each component in its library. The resource area utilization of each component is calculated through a one-off RTL synthesis. In the ReCAD tool, a SystemC model has been developed for each SOC platform component as an object, so that the complete SOC platform can be modeled as a set of communicating objects.

A SystemC-based power model has been developed to measure the switching activity of components and so obtain more accurate power estimates [5]. It is important to note that gate-level simulation (for power analysis) has to be performed only once for each component. Therefore, when the complete system-level model of a particular configuration is executed, fast and sufficiently accurate power estimates are obtained. Moreover, the SystemC models of components enable us to estimate overall throughput more accurately and with a faster simulation than RTL-based estimation. We assume that the power and throughput estimates of the communication media depend on the transaction size of the data exchanged between two ports. However, because the distance between the two communicating ports and their types also affect the throughput and power consumption of the system, transaction size is not the only major factor in performance estimation, especially in hybrid interconnections and NoCs. For this reason, the proposed SystemC power simulation is run for different traffic patterns between each pair of ports of the interconnection schemes. An approximate throughput and power consumption of the interconnection medium is thus obtained for each communication scenario. With this information, we are able to estimate the overall area, power, and throughput of a complete reconfigurable SOC platform in different configuration modes, and to examine whether the estimated reconfigurable SOC solution meets the constraints of each application scenario.
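The way per-port-pair estimates feed a placement decision can be sketched as follows. This is a minimal illustration rather than the ReCAD implementation: the core names, the traffic volumes, and the `energy_per_word` table (here a hypothetical linear interconnect costing 1 nJ per word per hop) are all assumptions standing in for values that would come from the traffic-pattern simulations.

```python
from itertools import permutations


def interconnect_energy(placement, traffic, energy_per_word):
    """Estimate interconnect energy for one placement.

    placement:       core name -> port index
    traffic:         (src_core, dst_core) -> words transferred
    energy_per_word: (src_port, dst_port) -> nJ per word (from simulation)
    """
    return sum(
        words * energy_per_word[(placement[src], placement[dst])]
        for (src, dst), words in traffic.items()
    )


def best_placement(cores, ports, traffic, energy_per_word):
    """Exhaustively try every assignment of cores to distinct ports."""
    candidates = (dict(zip(cores, perm)) for perm in permutations(ports, len(cores)))
    return min(candidates, key=lambda p: interconnect_energy(p, traffic, energy_per_word))


# Hypothetical example: three cores on a three-port linear interconnect.
cores = ["cpu", "fft", "viterbi"]
ports = [0, 1, 2]
traffic = {("cpu", "fft"): 100, ("fft", "viterbi"): 50, ("cpu", "viterbi"): 10}
energy_per_word = {(p, q): float(abs(p - q)) for p in ports for q in ports if p != q}

placement = best_placement(cores, ports, traffic, energy_per_word)
print(placement, interconnect_energy(placement, traffic, energy_per_word))
```

Exhaustive enumeration is only practical for a handful of cores; it is used here to keep the example short, whereas the framework treats placement as one parameter of the larger exploration.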

So far, we have described the simulation model and power analysis techniques used in the ReCAD tool. In the next section, we formulate the exploration problem and outline the algorithms used for performing it automatically.

### 5 Design Space Exploration

#### 5.1 Problem Definition

Let *S* be a parameterized reconfigurable SOC platform with *n* parameters. The generic parameter $p_i$, $i = 1, 2, \ldots, n$, can take any value in the set $V_i$. A complete assignment of values to all the parameters is a configuration. The problem is to efficiently compute the Pareto-optimal configurations, with respect to power, area and throughput, for a multi-scenario application executing on the reconfigurable SOC platform. In our problem, a configuration $C_i$ is Pareto-optimal if no other configuration $C_j$ has better power as well as better area and throughput than $C_i$.
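The Pareto filter implied by this definition can be sketched in a few lines. The sketch assumes the common dominance convention (no worse in every objective and strictly better in at least one), which is slightly stricter than the wording above; the `Metrics` type and the sample numbers are hypothetical.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    power: float       # lower is better
    area: float        # lower is better
    throughput: float  # higher is better


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = a.power <= b.power and a.area <= b.area and a.throughput >= b.throughput
    strictly_better = a.power < b.power or a.area < b.area or a.throughput > b.throughput
    return no_worse and strictly_better


def pareto_front(configs):
    """Keep the configurations that no other configuration dominates."""
    return {
        name: m
        for name, m in configs.items()
        if not any(dominates(other, m) for other in configs.values())
    }


# Hypothetical evaluated configurations.
configs = {
    "A": Metrics(power=10.0, area=5.0, throughput=100.0),
    "B": Metrics(power=12.0, area=5.0, throughput=90.0),  # dominated by A
    "C": Metrics(power=8.0, area=6.0, throughput=80.0),   # trades area for power
}
print(sorted(pareto_front(configs)))
```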

Our SOC platform is composed of numerous embedded cores and interconnection schemes. The types of embedded cores and the interconnection scheme parameters need to be identified and fixed for the application. The configuration settings of the reconfigurable embedded cores, however, give the SOC platform the flexibility to execute multiple application standards. In the traditional design space exploration problem, an optimal fixed configuration is sought for a fixed application; here, by contrast, a reconfigurable SOC platform must be provided for a multi-standard application.

We therefore adapt the traditional DSE problem to meet our requirement. We define two sets of parameters for optimization. The first set contains the configuration settings of the reconfigurable cores, and the second contains the remaining parameters. We then use a two-stage design space exploration. In the first stage, the configuration space is explored to find Pareto-optimal solutions for the application scenario with the $\min(\text{Objective}_1)$ among all application scenarios, for example a standard with a maximum power consumption constraint. In this stage all reconfigurable cores are set to their largest configuration size. In the second stage, the resulting set of optimal solutions is explored over the configuration setting parameters only, to meet the constraints of each scenario; this can be done with heuristics such as a hill climbing algorithm.
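The second stage can be illustrated with a small steepest-descent hill climber over the configuration settings of one fixed architecture. Everything specific here is an assumption made for the example: the value sets `FFT_SIZES` and `CONSTRAINT_LENGTHS`, the `toy_power` cost, and the feasibility rule stand in for the simulated metrics and per-standard constraints used by the framework.

```python
FFT_SIZES = [64, 256, 1024, 4096]  # hypothetical value set
CONSTRAINT_LENGTHS = [5, 7, 9]     # hypothetical value set


def neighbors(setting):
    """Settings that differ by one step in exactly one parameter."""
    fft_idx, k_idx = setting
    for step in (-1, 1):
        if 0 <= fft_idx + step < len(FFT_SIZES):
            yield (fft_idx + step, k_idx)
        if 0 <= k_idx + step < len(CONSTRAINT_LENGTHS):
            yield (fft_idx, k_idx + step)


def toy_power(setting):
    """Hypothetical cost: larger configurations draw more power."""
    fft_idx, k_idx = setting
    return FFT_SIZES[fft_idx] / 64 + 2 * CONSTRAINT_LENGTHS[k_idx]


def meets(required_fft, required_k):
    """Build a feasibility test for one standard's (assumed) requirements."""
    def feasible(setting):
        fft_idx, k_idx = setting
        return FFT_SIZES[fft_idx] >= required_fft and CONSTRAINT_LENGTHS[k_idx] >= required_k
    return feasible


def hill_climb(start, neighbors, cost, feasible):
    """Move to the cheapest feasible neighbor until none improves the cost."""
    current = start  # assumed to be feasible
    while True:
        better = [n for n in neighbors(current) if feasible(n) and cost(n) < cost(current)]
        if not better:
            return current
        current = min(better, key=cost)


# Start from the largest configuration, as in the first stage.
largest = (len(FFT_SIZES) - 1, len(CONSTRAINT_LENGTHS) - 1)
best = hill_climb(largest, neighbors, toy_power, meets(required_fft=256, required_k=7))
print(FFT_SIZES[best[0]], CONSTRAINT_LENGTHS[best[1]])
```

Because the cost strictly decreases at every step, the loop always terminates, although in general it may stop at a local rather than a global optimum.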

#### 5.2 Multi-objective Optimization

To obtain an optimized solution for the exploration problem, we first need to define a DSE strategy that gives a good approximation of the Pareto-optimal front for an SOC platform *S* and an application *A* while simulating as few configurations as possible. The search for optimal configurations is a multi-objective optimization problem in which some of the objectives conflict with others, for example performance and power consumption. Although this considerably increases the complexity of DSE strategies, it has the advantage of offering the SOC designer not one but a set of optimal configurations (the Pareto-optimal set), from which the designer can choose the one that represents the best trade-off with respect to the constraints that have to be met. There are two main approaches to DSE of SOC platforms. The first, GA, uses Genetic Algorithms as the optimization engine. A configuration is mapped onto a chromosome, and a population of configurations is made to evolve until it converges on the Pareto-optimal set.

The second approach is the interdependency model proposed in [2], which tries to prune the configuration space. They have used a graph model to capture the parameter interdependencies. Such a graph is constructed with its nodes representing parameters and edges representing interdependencies between parameters.

This algorithm consists of two phases. The first phase performs a local search for Pareto-optimal configurations. It clusters the interdependent nodes in the graph, which is the same problem as finding the strongly connected components of a graph (e.g., a depth-first search can be used to accomplish this). The second phase iteratively expands the local search to discover global Pareto-optimal configurations. It combines pairs of clusters into a single cluster and computes the Pareto-optimal configurations within it. Then, it limits the space of this new cluster to the Pareto-optimal configurations only. This procedure is repeated until all the clusters have been merged and a single cluster remains. The Pareto-optimal configurations within this last cluster represent the Pareto-optimal configurations of the entire configuration space.
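The two phases can be sketched as below, under simplifying assumptions. This is not the Platune implementation: clusters are found with a plain reachability test instead of a linear-time strongly-connected-components algorithm, parameters outside a cluster are held at an arbitrary default during the local search, and the toy `evaluate` function is additive across clusters, which is what makes the pruning exact in this example. All parameter names and objective formulas are hypothetical.

```python
import itertools


def reachable(graph, start):
    """All nodes reachable from start by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def clusters_of(nodes, graph):
    """Group nodes that can reach each other (strongly connected components)."""
    reach = {n: reachable(graph, n) for n in nodes}
    clusters, assigned = [], set()
    for n in nodes:
        if n not in assigned:
            cluster = [m for m in nodes if m in reach[n] and n in reach[m]]
            clusters.append(cluster)
            assigned.update(cluster)
    return clusters


def dominates(a, b):
    """All objectives are minimized: a is no worse everywhere and differs."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto(assignments, evaluate, defaults):
    """Keep non-dominated assignments; missing parameters take defaults."""
    scored = [(asg, evaluate({**defaults, **asg})) for asg in assignments]
    return [asg for asg, s in scored if not any(dominates(t, s) for _, t in scored)]


def explore(params, graph, evaluate):
    defaults = {p: values[0] for p, values in params.items()}
    # Phase 1: exhaustive local search inside each cluster.
    fronts = []
    for cluster in clusters_of(list(params), graph):
        combos = itertools.product(*(params[p] for p in cluster))
        local = [dict(zip(cluster, combo)) for combo in combos]
        fronts.append(pareto(local, evaluate, defaults))
    # Phase 2: merge clusters pairwise, pruning to the Pareto set each time.
    while len(fronts) > 1:
        a, b = fronts.pop(), fronts.pop()
        fronts.append(pareto([{**x, **y} for x in a for y in b], evaluate, defaults))
    return fronts[0]


# Hypothetical platform: a and b depend on each other, c is independent.
params = {"a": [1, 2, 3], "b": [1, 2], "c": [1, 2]}
graph = {"a": ["b"], "b": ["a"]}


def evaluate(cfg):
    power = cfg["a"] + cfg["b"] + cfg["c"]
    delay = 12 / (cfg["a"] * cfg["b"]) + 4 / cfg["c"]
    return (power, delay)


front = explore(params, graph, evaluate)
print(sorted({evaluate(cfg) for cfg in front}))
```

In this toy case the merge step evaluates 10 combined configurations instead of the full 12; the saving grows quickly as more independent clusters are added.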

#### 5.3 Genetic Algorithm

The approach we propose for exploring the configuration space of a parameterized reconfigurable SOC uses both the interdependency and the GA algorithms. For the GA approach, we chose SPEA2 [6], which is very effective at sampling along the entire Pareto-optimal front and distributing the generated solutions over the trade-off surface. A configuration can be mapped onto a chromosome whose genes define the parameters of the system. The gene coding the parameter $p_i$ can only take values belonging to the set $V_i$. The chromosome of the GA is then defined with as many genes as there are free parameters, and each gene is coded according to the set of values it can take (Fig. 1). For each objective to be optimized, it is necessary to define a corresponding measurement function. These functions, which we will call objective functions, frequently represent cost functions to be minimized; here the objectives are area, power, and throughput.



Fig. 1. Representation of a configuration as a chromosome
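One possible chromosome encoding is sketched below: each gene stores the index of a value within that parameter's value set, so every chromosome decodes to a valid configuration. The parameter names and value sets are hypothetical, and the mutation and crossover operators are generic GA operators shown only to illustrate the encoding; they are not the SPEA2 selection scheme.

```python
import random

# Hypothetical value sets V_i, one per free parameter.
VALUE_SETS = {
    "data_width": [16, 32, 64],
    "topology": ["bus", "crossbar", "mesh"],
    "fft_size": [64, 256, 1024, 4096],
    "constraint_length": [5, 7, 9],
}
PARAM_ORDER = list(VALUE_SETS)  # fixed gene order


def encode(config):
    """Configuration dict -> chromosome (list of indices into the value sets)."""
    return [VALUE_SETS[p].index(config[p]) for p in PARAM_ORDER]


def decode(chromosome):
    """Chromosome -> configuration dict."""
    return {p: VALUE_SETS[p][gene] for p, gene in zip(PARAM_ORDER, chromosome)}


def random_chromosome(rng):
    return [rng.randrange(len(VALUE_SETS[p])) for p in PARAM_ORDER]


def mutate(chromosome, rng, rate=0.25):
    """Resample each gene with probability `rate`, staying inside its value set."""
    return [
        rng.randrange(len(VALUE_SETS[p])) if rng.random() < rate else gene
        for p, gene in zip(PARAM_ORDER, chromosome)
    ]


def crossover(parent_a, parent_b, rng):
    """One-point crossover; the cut never falls at either end."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


rng = random.Random(0)
parent_a, parent_b = random_chromosome(rng), random_chromosome(rng)
child = mutate(crossover(parent_a, parent_b, rng), rng)
print(decode(child))
```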

### 6 Result Analysis

The proposed tuning algorithm has been applied to a multi-standard telecommunication application which handles different standards: WiMAX, WLAN, GSM, and 3G-CDMA with embedded custom reconfigurable cores of FFT and Viterbi decoder.

Fig. 2 shows the parameterized embedded cores targeting multi-standard wireless SOC devices used in our experiments. In our parameterized reconfigurable SOC architecture, two global parameters (data width and address width) parameterize the data bandwidth of the embedded cores and interconnection schemes. Unlike other approaches, we use two parameters for the interconnection cores: topology and placement. The topology parameter selects the topology of the interconnection structure, and the placement parameter determines the placement of the embedded cores in the communication infrastructure. To obtain estimates of power and area for the placement parameter, a set of different traffic patterns has been simulated between each pair of ports; the average power consumption and throughput over these patterns give a realistic estimate of the interaction between ports, which is used to evaluate the effect of different placements. Moreover, the configuration setting parameters should be optimized over a range of configurations to give an optimal solution for the different wireless standards.

We explored the configuration space for the application standards mentioned above. In the first stage, we optimized our architecture with the configuration setting parameters for the 3G-CDMA standard (FFT Size=4096, Constraint Length=9, Code Rate=1/3). In this stage, we simulated our architecture with two algorithms: the GA approach and the interdependency approach [2].

In the second stage, the set of optimal solutions obtained from the first stage is explored over the configuration setting parameters only, using a hill climbing algorithm, to meet the constraints of each standard. Results are summarized in Table 1.



Fig. 2. Parameterized cores for optimization of a reconfigurable SOC architecture for multiple wireless standard applications

| Stage | GA based: Config. Space | GA based: Pareto Optimal | GA based: Throughput Trade-off | GA based: Power Trade-off | GA based: Area Trade-off | Inter-dependency based: Config. Space | Inter-dependency based: Pareto Optimal | Inter-dependency based: Throughput Trade-off | Inter-dependency based: Power Trade-off | Inter-dependency based: Area Trade-off |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 1     | 967670 | 396 | 7.4 | 8.3 | 9.1 | 967670 | 743 | 7.4 | 8.3 | 9.1 |
| 2     | 28512  | 26  | 6.3 | 6.4 | 4.5 | 53496  | 43  | 5.5 | 5.5 | 4   |

Table 1. Two Stage Optimization Results






Fig. 3. Power/Throughput Tradeoff in the two stage optimization with different algorithms in the first stage and hill climbing heuristics in the second stage

Table 1 reports the power, throughput, and area trade-offs of the Pareto-optimal configurations across all four application standards. In the second stage, the average trade-offs were lower with the interdependency algorithm [2], owing to its effective pruning of the configuration space.

Fig. 3 presents the power/throughput trade-off results of our two-stage optimization with the two different algorithms used in the first stage: the GA and interdependency approaches. The simulation times with the GA algorithm were in the range of a few seconds, whereas the interdependency approach needed hours to compute the optimal solutions. On the other hand, the interdependency approach was more effective in pruning the configuration space. The simulation times of our second stage were in the range of minutes, since it optimized a small configuration space with only a few configuration setting parameters.

### 7 Conclusion

This paper has presented a tool for the automatic creation of digital systems directly from a software description of the application. In addition, an intelligent multi-objective optimization algorithm has been specially tailored to tune the architecture for optimal results under different configurations. The proposed approach involves concurrent optimization of the system-level placement of embedded cores and the interconnection topology within a custom reconfigurable SOC architecture for a multi-scenario application. Custom reconfigurable cores can support multiple application standards in a single chip, so the resulting architecture is optimized across the different configurations of the reconfigurable cores. These optimizations minimize the overall power consumption and resource area utilization and maximize the throughput of the whole system. The approach has been applied to a wireless multi-standard application, and the results demonstrate its feasibility.

## References

1. International Technology Roadmap for Semiconductors (ITRS), 2005 edn.
2. Givargis, T., Vahid, F.: Platune: A Tuning Framework for System-on-a-Chip Platforms. IEEE Trans. on Computer-Aided Design 21(11), 1317–1327 (2002)
3. Madsen, J., Stidsen, T.K., Kjaerulf, P., Mahadevan, S.: Multi-Objective Design Space Exploration of Embedded System Platforms. In: Conference on Distributed and Parallel Embedded Systems (DIPES), Braga, Portugal, October 11-13, 2006, pp. 185–194 (2006)
4. Ascia, G., Catania, V., Palesi, M.: A GA based design space exploration framework for parameterized system-on-a-chip platforms. IEEE Transactions on Evolutionary Computation 8(4), 329–346 (2004)
5. Ahmadinia, A., Ahmad, B., Arslan, T.: Efficient High-Level Power Estimation for Multi-Standard Wireless Systems. In: Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Montpellier, France, April 7-9, 2008, pp. 275–280 (2008)
6. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the performance of the strength Pareto evolutionary algorithm. In: Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, Athens, Greece, pp. 95–100 (2001)