Emulation and verification framework for MPSoC based on NoC and RISC-V

Khamis, Mostafa; El-Ashry, Sameh; AbdElsalam, Mohamed; El-Kharashi, M. Watheq; Shalaby, Ahmed

doi:10.1007/s10617-022-09265-1

Published: 14 September 2022

Volume 26, pages 133–159, (2022)
Design Automation for Embedded Systems

Abstract

Nowadays, embedded systems have multiprocessing capabilities to meet the complexity of modern applications, such as signal processing and multimedia. However, as the embedded system’s functionality expands, complexity increases and numerous constraints become necessary. Constraints such as high performance, low power consumption, and short development time have become critical demands. Therefore, emulation and verification are necessary to assess the correctness and performance of such architectures and to accelerate the development phase. We propose a robust, scalable, and flexible hardware-software emulation framework that focuses on design space exploration for MPSoC architectures. Our framework supports 2D and 3D NoC-based architectures built on an open-source RISC-V core. According to the user configuration, the framework auto-generates the corresponding universal verification methodology environment to explore the design space, evaluate the performance, and compare the results for a wide range of configurations and parameters. Then, it provides the best solution based on the criteria provided by the user. Our framework uses an emulation co-modeling technology to enable the designer to explore and detect architecture failures. We provide numerous experimental results for different 2D and 3D NoC architectures to assess their correctness and performance, including energy and power consumption. Notably, results show a \(40\times \) acceleration in comparison to software simulators.

1 Introduction

Today’s embedded systems are becoming more complex, with many cores, accelerators, and Intellectual Property (IP) blocks, offering more modularity, scalability, and processing power than ever before [1, 2]. Nevertheless, these advantages come at the expense of performance for two reasons: (1) activity management, such as task mapping, task movement, Quality of Service (QoS) processing, and system monitoring; and (2) complexity, since numerous constraints, such as area, throughput, memory, power consumption, and time-to-market, are required as the embedded system’s functionality expands.^{Footnote 1} Therefore, interconnections between the various IPs are limited by particular constraints. Hence, IP communication has emerged as one of the significant challenges confronting the performance of modern embedded systems. Traditionally, bus architectures were the solution. However, as the scalability, heterogeneity, and constraints of most present-day applications increase, bus architectures fail to meet the requirements, particularly regarding throughput and bandwidth [3,4,5,6]. Networks-on-Chip (NoCs) were developed as an answer to these interconnection challenges and to the pitfalls of traditional bus architectures [7,8,9].

The assessment of the correctness and performance of NoC-based architectures involves extensive simulation and hardware emulation techniques [10]. The simulation approach describes the architectural designs in software routines to speed up the development time. However, this technique degrades as designs scale up. It slows down as the number of IPs per system design increases due to the complexity of the synchronization and of the inter- and intra-IP communication [11,12,13]. On the other hand, hardware emulation models fine-grained parallelism effectively and operates at a much higher speed than simulators. It defines architectural designs in Hardware Description Languages (HDLs) [14, 15]. It provides high cycle-level accuracy and detects design issues early on. It supports exploration and validation for various parameters by re-configuring the corresponding FPGAs without re-synthesizing the whole architecture. Emulation systems, like Veloce [16], exploit co-simulation combined with the transaction-level methodology (co-emulation) [17], where the transactor, interfaced with the Design Under Test (DUT), runs on the emulator through the testbench.

This paper proposes a framework to evaluate and verify NoC-based architectures through hardware emulation, where run-time errors are captured, utilizing Universal Verification Methodology (UVM). UVM is a standard portable open-source verification library to evaluate and verify advanced digital architectures [18]. UVM verification environments can be reused for NoC-based designs with different configurations, network dimensions, and topologies [19, 20]. We propose a framework that auto-generates a scalable NoC-based MPSoC design and its UVM verification environment. It can run in simulation and emulation, and it has an extensive capacity to support various NoC configurations, testbench acceleration, and power analysis. We used the RISC-V as a processor tile to provide real traffic patterns into NoCs through a portable Core Network Interface (CNI). To the best of our knowledge, no previous work has explored using hardware emulation for NoC-based architecture verification through UVM. Below, we list the key contributions of this work.

1. An NoC-based emulation and verification framework that enables functional and timing verification of several NoC-based architectures at different levels of abstraction and configuration [21].
2. Design-space exploration automation, where the user defines the design space and the target performance. The corresponding UVM verification parameters are auto-generated and updated. Then, the framework compiles, emulates, and evaluates the whole NoC-based architecture, and provides the best performance based on the criteria provided by the user.
3. Evaluation and performance analysis for various configurations and parameters, supporting 2D and 3D NoC-based architectures and utilizing real traffic patterns injected by RISC-V PEs (Processing Elements).

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 details the proposed framework and the hardware emulation flow. Section 4 presents the test-cases and experimental results. Finally, Sect. 5 concludes the paper and presents the future work.

2 Related work

In recent years, research focused on developing several simulation and emulation frameworks to verify and evaluate NoC-based designs. Connect [22] and MAIA [23] are open-source NoC verification frameworks centered around Verilog test-cases supporting re-usability and scalability. ATLAS [24] provides simulation and hardware emulation utilizing SystemC for prototyping and verifying NoC architectures. Xu et al. evaluate bus-based and switch-based on-chip networks and analyze system performance at a cycle-level accuracy [25]. Cheung et al. present the INSIDE framework that quickly scans the structure space for extensible processors and considers the area and performance constraints of an embedded application [26]. INSIDE estimates the performance of an application, focusing only on the processor behavior context.

Monemi et al. propose the ProNoC framework, which is an automated tool for rapid prototyping and validation of NoC-based platforms targeting FPGAs [27]. They present an evaluation comparison against other NoC simulators. Busseuil et al. present the Open-Scale platform, which is a scalable open-source framework that can be used for design space exploration for NoC-based memory MPSoCs [28]. Zhang et al. [29] present an NoC-based homogeneous multi-core framework based on the x86 processor architecture and shared distributed memory. The framework is based on a high-speed Network Interface (NI) using the GEMS and Booksim NoC simulators. However, they do not provide a debugging tool for platform verification or emulation, and do not support the RTL model level.

Balkind et al. present the OpenPiton system, which is an automated open-source framework based on a general-purpose multi-core processor [30]. OpenPiton supports a complete ad-hoc Verilog verification infrastructure synthesizable to Xilinx FPGAs. It supports homogeneous architectures. Skalicky et al. present a hardware and software co-design structure for MPSoC frameworks, focusing on FPGA prototyping [31]. The configurable platform automatically compiles, synthesizes, and generates heterogeneous systems. Prabhu et al. present an NoC simulator framework based on FPGA acceleration [32]. They evaluate their framework on limited 2D mesh network configurations of \(6\times 6\) and \(8\times 8\) utilizing a five-port router architecture. Ruaro et al. propose the Memphis framework, which supports many-core tiles with a hierarchical organization and connection to the peripherals at the chip borders to feed applications [33]. The framework is cycle-accurate, with a SystemC prototype to accelerate simulation time and a VHDL prototype to synthesize on FPGA.

Table 1 Comparison between the proposed framework and other related MPSoCs design frameworks

In summary, several cycle-accurate simulation frameworks in VHDL and SystemC have been proposed for NoC design space exploration [34,35,36]. However, they cannot run real-life injection traces at a fast speed. Hence, FPGA-based emulation frameworks have been proposed to reduce validation time [37, 38]. However, these frameworks cannot emulate complete, large-scale NoC-based architectures because FPGA resources limit them. Besides, they support limited NoC configurations and applications, such as multimedia [39,40,41,42], and none of them uses a RISC-V core. Table 1 compares related works with our proposed framework to clarify the differences. As shown, our framework supports emulation utilizing UVM, multiple router implementations, and application traffic patterns driven by RISC-V.

3 Proposed framework

In this section, we present our hardware-software emulation framework. Our framework: (1) implements a generic, configurable, and portable UVM verification environment that provides accurate simultaneous performance analysis; (2) supports auto-generation for simulation and hardware emulation with different configurations. In brief, the framework automatically generates the Hardware Description Language (HDL) and Hardware Verification Language (HVL) models, compiles, simulates, synthesizes, emulates, and reports the final performance results; (3) enables evaluation and verification of large-scale MPSoCs, including 2D and 3D NoC-based architectures. It supports real traffic patterns utilizing RISC-V as a processor tile and connects external peripherals through the AXI bus interface; (4) evaluates the performance of different parameters, configurations, and real-life applications. Fig. 1 illustrates the three main layers: (a) the hardware NoC-based architecture layer, (b) the UVM and emulation layer, and (c) the software layer. Below, we describe each layer in detail.

3.1 Hardware NoC-based architecture layer

Figure 2 details the hardware NoC-based MPSoC layer. It is divided into three main components: (1) RISC-V core PE, which injects and collects packets according to the real-traffic patterns by applications; (2) CNI, which interfaces the RISC-V PE with the NoC router; (3) NoC architecture, an auto-generated network-on-chip with different configurations and parameters (2D and 3D, topology, buffer size, virtual channels, etc.).

3.1.1 RISC-V processor tile

The application or user-configured traffic pattern is written in C and ported to the RI5CY core through a custom GCC tool-chain. Accelerators, co-processors, and different I/O peripherals communicate with the RISC-V core through the AXI4 bus interface, as depicted in Fig. 2 [43, 44]. Although this work principally centers around the RI5CY core, other IP cores can be used easily, but the application should be recompiled. It should be highlighted that our framework is the only framework that supports the AXI bus interface with processor tiles, which facilitates the plug-and-play replacement of different processor tiles or IPs. Other frameworks use a custom direct-memory interface, e.g., ProNoC [27] implements a Wishbone interface. Besides, a Debug Access Port (DAP) is implemented to provide debugger access to system components through the Joint Test Action Group (JTAG) interface.

3.1.2 Core network interface (CNI)

The Core Network Interface (CNI) [45] is modified to support 2D and 3D NoC architectures. The CNI connects the processor tile and the auto-generated NoC-based architecture. The CNI functions are to: (1) fetch the packets from the read-write register-bank; (2) format the packets with designated control information, such as the source, destination, packet length, packet ID, interval cycle, and others according to the NoC configuration; (3) divide the packets into an appropriate number of flits and compose head, body, and tail flits; (4) inject the flits into the input port at the source router; (5) collect the flits from the output port at the destination router; (6) reformat the flits into packets; (7) store the collected packets in the read-only register-bank until the processor tile reads them through the AXI4 bus interface; (8) handle acknowledgments and synchronization between the router ports and the RISC-V PE. As shown in Fig. 2, the CNI is divided into five main modules: register-bank, source controller, sink controller, NoC injector, and collector.

CNI registers. CNI registers are divided into: (1) read-write registers, (2) source data and control registers for packet injection, (3) read-only sink data and control registers for packet collection.
CNI controller. The CNI controller is connected to the router’s local input and output ports, and interfaces the CNI registers with the NoC injector and collector. The CNI controller is responsible for synchronizing, formatting, dividing the packets into flits, sourcing, and sinking the packets between the processor tile and the NoC injector and collector.
Network injector and collector. The NoC injector and collector manage the transmission and reception of flits and the communication with the NoC local ports.
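
The packet formatting and flit division performed by the CNI can be illustrated with a minimal Python sketch. It is not the CNI implementation: the `Flit` class, the function names, and the 8-bit header fields are hypothetical choices made only for illustration, and the sketch assumes that body flits carry the data while the tail flit only marks the end of the packet.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Flit:
    kind: str      # "head", "body", or "tail"
    payload: int   # header word for head flits, data word for body flits


def packetize(src: int, dst: int, packet_id: int, data: List[int]) -> List[Flit]:
    """Split one packet into a head flit, body flits, and a tail flit."""
    for field in (src, dst, packet_id, len(data)):
        if not 0 <= field <= 0xFF:
            raise ValueError("header fields are assumed to fit in 8 bits")
    # Hypothetical header layout: src | dst | packet ID | length, 8 bits each.
    header = (src << 24) | (dst << 16) | (packet_id << 8) | len(data)
    flits = [Flit("head", header)]
    flits += [Flit("body", word) for word in data]   # body flits carry the data
    flits.append(Flit("tail", 0))                    # tail flit closes the packet
    return flits


def depacketize(flits: List[Flit]) -> Dict[str, object]:
    """Rebuild the packet fields from a complete flit sequence."""
    head, body, tail = flits[0], flits[1:-1], flits[-1]
    if head.kind != "head" or tail.kind != "tail":
        raise ValueError("incomplete packet")
    return {
        "src": (head.payload >> 24) & 0xFF,
        "dst": (head.payload >> 16) & 0xFF,
        "packet_id": (head.payload >> 8) & 0xFF,
        "data": [flit.payload for flit in body],
    }
```

Here `packetize` corresponds to functions (2) and (3) of the CNI, and `depacketize` to function (6); the register banks, acknowledgments, and synchronization are left out.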

3.1.3 2D and 3D NoC router architectures

Our flexible framework supports different NoC-based configurations utilizing various router architectures to facilitate design space exploration. Below, we present four router architectures: (1) Daniel’s router, (2) distributed/centralized networks-based Conventional Buffer (CB) router, (3) Round-Robin Flexible Buffer (RRFB) router, and (4) distributed/centralized NoC-based Virtual Channel Conventional Buffer (VCCB) router. A brief description of each router is as follows:

(1) Daniel’s router [46]. It is an open-source, multi-functional, flit-based, fully synthesizable RTL router. It supports different NoC configurations, which are presented in Table 2. We modified the router architecture to support 3D mesh and torus topologies. Two input and output ports for the up and down directions are added, and the control unit is modified, as shown in Fig. 3.
(2) Conventional buffer (CB) router [47]. It represents the base router. It has input and output ports connected by an intermediate crossbar. It has five ports in 2D NoCs and seven ports in 3D NoCs, as shown in Fig. 4. The input port has three main blocks: (1) FIFO buffer: stores the incoming packets from the upstream router, (2) Input controller: handles handshaking control from/to the upstream router and communicates with output ports to transfer the received packets, and (3) Routing logic module: applies the routing algorithm to determine the packet destination. The output port has two main blocks: (1) Arbiter: handles all requests received from the input ports connected to the crossbar and grants bus allocation, and (2) Output controller: communicates with the downstream router. Meanwhile, the crossbar switches the packets from the upstream router to the downstream router based on the arbiter decision.
(3) Round-Robin flexible buffer (RRFB) router [48]. It has the same architecture as the CB router with an additional unit, the FIFO Flexibility Controller (FFC), at each input port. It operates similarly to the CB router until congestion occurs. At that time, the flexible router does not wait for free slots at the input-port FIFO. Instead, the FFC unit searches for a free slot at other ports. Once it finds a free slot, it grants the request back to the upstream router, and the packet is transferred to the selected FIFO port. Afterward, the RRFB router operates normally like the CB router. Figure 4 illustrates the difference between the CB and RRFB architectures.
(4) Virtual channel conventional buffering (VCCB) router [9]. It adopts virtual channel flow control to improve NoC performance and resolve congestion, where packets travel on a flit basis, and each virtual channel stores flits on a per-packet basis. The packet is divided into several flits: (1) the head flit contains the source, destination address, and selected flow control, (2) the body flit carries all packet data, and (3) the tail flit indicates the end of the packet.
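
The buffer selection described for the RRFB router can be sketched in a few lines of Python. This is a simplified behavioral model, not the RTL of the FFC unit: the function name, the occupancy list, and the round-robin pointer argument are assumptions introduced for illustration, and the routing-dependent restrictions on which foreign buffers may be used are not modeled.

```python
from typing import List, Optional


def select_buffer(occupancy: List[int], depth: int,
                  arrival_port: int, rr_pointer: int) -> Optional[int]:
    """Return the port whose FIFO should store the incoming flit, or None.

    occupancy[p] is the number of flits currently held by port p's FIFO,
    and depth is the (assumed uniform) FIFO depth.
    """
    # CB behavior: use the arrival port's own FIFO while it has room.
    if occupancy[arrival_port] < depth:
        return arrival_port
    # RRFB behavior under congestion: scan the other ports in
    # round-robin order, starting at rr_pointer, for a free slot.
    num_ports = len(occupancy)
    for offset in range(num_ports):
        port = (rr_pointer + offset) % num_ports
        if port != arrival_port and occupancy[port] < depth:
            return port
    # Every FIFO is full: the request cannot be granted yet.
    return None
```

For a five-port 2D router with FIFO depth 4, `select_buffer([4, 0, 0, 0, 0], 4, arrival_port=0, rr_pointer=2)` selects port 2, whereas a CB router would have to wait for a free slot at port 0.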

Table 2 Daniel’s router parameters and configurations

3.1.4 RRFB architecture and deadlock analysis

The RRFB router supports dynamic buffering, where all FIFOs can store any incoming flit. Such dynamism comes at the expense of possible deadlocks. Models of deadlock situations are studied; two of them are presented in Fig. 5. In all deadlock models, all buffers are full, and only the head flit is shown. For example, in Fig. 5a, all buffers are full in routers \(R_1\) and \(R_2\), and the head flits of all queue buffers in \(R_1\) are moving east to \(R_2\), while all head flits in \(R_2\) are moving west to \(R_1\) at the same moment. A more complex example arises in the 3D NoC, where the directions are east, south, or down, i.e., moving toward positive X, Y, and Z coordinates, respectively. In Fig. 5b, the deadlock cycles are formed in the XY plane; all routers are in the same Z plane. As shown, all flits are heading either E, N, S, or W. Two cycles are formed, one clockwise (\(\hbox {E}\rightarrow \hbox {S}\rightarrow \hbox {W}\rightarrow \hbox {N}\rightarrow \hbox {E}\)) and another counter-clockwise (\(\hbox {N}\rightarrow \hbox {W}\rightarrow \hbox {S}\rightarrow \hbox {E}\rightarrow \hbox {N}\)). Clearly, no flit can advance in any direction because of the deadlock.

Table 3 XYZ-based CB and XYZ-based RRFB restrictions; allowed and forbidden next hop directions

The RRFB architectures are developed to resolve all such deadlock models. Under certain conditions, they refrain from storing incoming flits in the associated input-port buffer and instead store them in free buffers at other ports. The deadlock problem is carefully analyzed under the (1) XYZ, (2) West-First, and (3) Negative-First routing algorithms, as follows:

(1) XYZ-based flexible scheme. The XY turn-model is extended to the new 3D-NoC paths, i.e., up and down. For example, for a set of routers located in the XZ plane with the same Y coordinate, it is forbidden for a flit to move down, then east or west. Also, it is forbidden for a flit to move up, then east or west. The same rules are applied to the routers in the YZ plane. The extended XYZ turn-model is shown in Fig. 6a–c. As shown, there is one forbidden turn to break the deadlock possibility in each cycle. Table 3 presents the restrictions of the XYZ-based RRFB router. By applying these restrictions, the XYZ-based RRFB router prevents deadlocks under XYZ routing.
(2) West First-based flexible scheme. Similarly, the West-First (WF) turn-model [49] is extended to 3D NoCs: turns from any direction followed by a move toward the west are forbidden. For example, in the XY plane, moving north/south then west is forbidden. In the XZ plane, north and south are mapped to up and down, respectively, and the same rule is applied: moving from up/down to west is forbidden. For the YZ plane, moving from up/down to the south is forbidden, where the south is analogous to the west. The extended WF turn-model is shown in Fig. 7a–c. There is one forbidden turn to break the deadlock possibility in each cycle. These restrictions of the WF-based RRFB router are presented in Table 4.
(3) Negative First-based flexible scheme. In the Negative-First (NegF) routing algorithm, turns from positive to negative directions are forbidden. In order to determine the direction sign, the location of the origin (router (0, 0, 0)) must be defined first. East, south, and down are all positive directions, while west, north, and up are negative. Therefore, the following turns are all prohibited: \(\overset{+}{\text {E}} \rightarrow \overset{-}{\text {N}}\), \(\overset{+}{\text {E}}\rightarrow \overset{-}{\text {U}}\), \(\overset{+}{\text {S}}\rightarrow \overset{-}{\text {W}}\), \(\overset{+}{\text {S}}\rightarrow \overset{-}{\text {U}}\), \(\overset{+}{\text {D}}\rightarrow \overset{-}{\text {W}}\), and \(\overset{+}{\text {D}}\rightarrow \overset{-}{\text {N}}\). Table 5 lists the restrictions of NegF-based CB and NegF-based RRFB, which are illustrated in Fig. 8a–c.
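
As a small illustration of the Negative-First rule, the following Python sketch encodes the direction signs given above and checks a single turn. It reproduces only the six prohibited turns listed in the text, each of which goes from a positive to a negative direction. The function name is hypothetical, and the treatment of same-axis moves (straight allowed, reversal rejected) is an assumption that the text does not discuss.

```python
# Direction signs as defined in the text, relative to router (0, 0, 0).
SIGN = {"E": +1, "S": +1, "D": +1, "W": -1, "N": -1, "U": -1}
AXIS = {"E": "x", "W": "x", "S": "y", "N": "y", "D": "z", "U": "z"}


def negf_turn_allowed(current: str, nxt: str) -> bool:
    """Return True if moving `current` and then `nxt` is permitted."""
    if AXIS[current] == AXIS[nxt]:
        # Assumption: going straight is fine, a 180-degree reversal is not.
        return current == nxt
    # Negative-First: a positive direction may not be followed by a negative one.
    return not (SIGN[current] > 0 and SIGN[nxt] < 0)


# Enumerate the forbidden turns between different axes.
forbidden_turns = sorted(
    (a, b)
    for a in SIGN
    for b in SIGN
    if AXIS[a] != AXIS[b] and not negf_turn_allowed(a, b)
)
```

Enumerating `forbidden_turns` yields exactly the six turns listed above, which gives a quick consistency check for a restriction table such as Table 5.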

Table 4 WF-based CB and WF-based RRFB restrictions; allowed and forbidden next hop directions

Table 5 NegF-based CB and NegF-based RRFB restrictions; allowed and forbidden next hop directions

3.1.5 Centralized and distributed NoC-based architectures

Our framework supports centralized and distributed configurations. We modified the routers as follows:

(1) CB and RRFB architectures. Centralized NoC-based CB and RRFB architectures are implemented. The routers have the same modules as the distributed NoC-based routers (Fig. 4). However, all ports are connected to the PEs directly, as illustrated in Fig. 9.
(2) 2D/3D VCCB architectures. The centralized NoC-based VCCB architecture is shown in Fig. 10, where each input/output port is connected directly to a PE for real traffic pattern acceleration. On the other hand, Fig. 11 illustrates the distributed NoC-based VCCB architecture. A 3D VCCB router architecture is implemented, as depicted in Fig. 11.

3.2 UVM and emulation layer

This work explores hardware emulation to verify NoC-based architectures via UVM. A hardware emulation platform, such as Siemens’s Veloce [16] or Synopsys’s ZeBu ASIC [50], enhances evaluation over conventional simulators. It facilitates functional verification and virtual prototyping for complex architectures. It provides accurate measurements of the functional behavior and functional coverage of each module [51]. A hardware emulator is composed of an array of FPGAs. Initially, the design behavior is described in HDL and synthesized to a gate-level netlist by the RTL compiler/synthesis tool. Then, the design is mapped to a crystal chip, an advanced FPGA with additional memories, control, and debug facilities. However, as the design size increases, it is mapped to multiple crystal chips on the Advanced Verification Board (AVB). The number of AVBs specifies the emulator capacity.

3.2.1 UVM environment for emulation

We aim to improve the accuracy and accelerate the evaluation of complex, large-scale NoC-based architectures. In this context, our framework automates design space exploration, where the user sets the target performance and configuration parameters. Then, the framework generates the corresponding test scenarios and the related implementation of UVM and hardware emulation.

Our framework facilitates integrating the UVM environment with NoC-based architectures. The UVM environment depends on HDL and Hardware Verification Language (HVL) TOP modules. The HDL TOP module describes the RTL design, and the HVL TOP module describes the UVM testbench environment. The HVL TOP is an untimed, class-based, behavioral, and dynamic architecture. The communication between both TOPs is performed through transaction model-based communication, where:

A physical communication link is set between the software-based simulator and the hardware emulator, and it exchanges data in packet format instead of transaction objects.
The traffic amount on the physical link must be controlled to achieve the most profitable execution time.

UVM verification modules that are added to the hardware must pass the synthesis process. However, the UVM testbench is based on Object-Oriented Programming (OOP) and utilizes SystemVerilog constructs and classes that are not synthesizable and cannot be implemented on the emulator. Thus, these constructs and classes are implemented in software simulation and separated from the synthesizable RTL modules on the hardware emulator. Hence,

Bus Functional Model (BFM) is utilized to interface the transactor untimed fragment in the HVL space to the HDL space. The testbench proxy and the corresponding HDL BFM must be interfaced at the Transaction Level Model (TLM) to communicate between the software-based simulator and the hardware emulator [52].
Time-plan constructs, including synchronization with a clock or other delays, should be removed as they block performance evaluation of the hardware emulation. They should be implemented using a synchronous clock model in the HDL time-space [53].
The verification environment objects that overcome any obstruction between the RTL and the testbench—i.e., monitors and drivers—should be included in timed and untimed modules.

3.2.2 Proposed UVM architecture for NoC emulation

The proposed emulation UVM environment, shown in Fig. 12, is developed based on our previous related works [54,55,56]. The developed UVM performs two main tasks: (1) performance evaluation for all the described router architectures with different NoC parameters and configurations, (2) functional verification and debugging for deadlock cycles and network congestion. Our UVM environment is flexible and generic and supports the AXI interface to automate design space exploration. The user needs to define the configuration (router, PE, NoC sizes, NoC dimension, etc.). Then, the corresponding UVM environment is auto-generated and connected to the RTL architecture, and the framework discovers the best performance based on the user criteria. The main developed UVM components are:

Test establishes verification scenarios according to the test plan, connects the DUT to the verification environment through virtual interfaces, and generates the system clock.
Environment is the parent of all hierarchical verification modules. It instantiates multiple active and passive agents, including agent configuration objects, subscriber modules like ScoreBoard (SB) and Coverage Collector (CC), and sequences.
Agent. Each UVM environment could include multiple active/passive agents. The active agent encapsulates sequencer, driver, and monitor modules, while the passive agent has only a monitor module. In the proposed UVM architecture, we have three types of agents:
1. Active source agent drives and monitors the packets on the router local-port.
2. Passive sink agent monitors the signal activities on the router local-port.
3. Passive routing agent monitors the signal activities for the other router ports (east, west, north, and south in 2D).
Driver is an active module that drives the input signals of the router local-port. Based on the user configurations for application-based traffic, the driver generates the corresponding injected patterns in hex file format to be loaded to PEs’ RAMs, as shown in Fig. 12. The driver passes the packets with their details into a backdoor access function to inject/collect packets to/from the PE. Backdoor access function forces assigning values to RTL modules or software routines.
Monitor captures the signal activity of the DUT interface, then transfers it to the transaction level. It has TLM analysis ports to broadcast the captured transaction to other components like subscribers. The proposed UVM environment provides four types of monitors implemented in different agents:
1. Source monitor, located in the active NoC source agents, captures the signal activity from the input local-port for sent packets.
2. Response sink monitor, located in the passive NoC sink agents, captures the signal activity from the output local-ports for the received packets.
3. Response routing monitor, located in passive routing agents, captures the signal activity from the other output ports (east, west, north, and south) to track the packet routing path.
4. Response AXI monitor, located in passive AXI agents, captures the signal activity related to writing/reading packets to/from the CNI.
All captured data are passed to the subscribers and collected to produce the evaluation results.
Sequence creates the scenarios sent to the driver in the transaction format. We developed several sequences for synthetic traffic patterns, like uniform, bit-complement, transpose, and application-based traffic patterns like Digital Video Object Plane Decoder (DVOPD) and Moving Picture Experts Group (MPEG4). These sequences randomize delays between packets to give the additional injection and throughput rates.
Sequencer connects the sequence with the driver, and generates the data transactions.
Subscribers (SB and CC). SB checks and verifies the functionality of the DUT. SB reads the injected data, compares it with the received data, and calculates the overall performance results in terms of throughput and latency metrics. It compares the AXI data written to and read from the CNIs with the sent packets to validate the functionality of the PE and CNI. It verifies the route of each packet by tracking it through the routing monitors. CC evaluates the testability coverage of the NoC-based MPSoCs by collecting the functional coverage for all scenarios to check whether any scenario is missing.
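
The scoreboard checks described above can be approximated by the following Python sketch, which is a behavioral stand-in and not the SystemVerilog UVM scoreboard of the framework. The `PacketRecord` fields, the `score` function, and the throughput definition (correct packets per cycle per node) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PacketRecord:
    packet_id: int
    payload: bytes
    cycle: int      # injection cycle for sent packets, arrival cycle for received ones


def score(sent: List[PacketRecord], received: List[PacketRecord],
          total_cycles: int, num_nodes: int) -> Dict[str, object]:
    """Compare injected and received packets and derive latency and throughput."""
    pending = {p.packet_id: p for p in sent}
    latencies: List[int] = []
    mismatches: List[int] = []
    for rx in received:
        tx = pending.pop(rx.packet_id, None)
        if tx is None or tx.payload != rx.payload:
            mismatches.append(rx.packet_id)      # unknown or corrupted packet
            continue
        latencies.append(rx.cycle - tx.cycle)    # per-packet latency in cycles
    return {
        "mismatches": mismatches,
        "lost": sorted(pending),                 # injected but never received
        "avg_latency": sum(latencies) / len(latencies) if latencies else None,
        # Assumed metric: correctly delivered packets per cycle per node.
        "throughput": len(latencies) / (total_cycles * num_nodes),
    }
```

In the actual environment the two lists would be filled by the source and sink monitors through their analysis ports; here they are plain Python lists.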

In brief, the proposed UVM emulation environment is flexible and generic and has the following duties: (1) evaluate the performance of the MPSoC architectures with different parameters and configurations, (2) accelerate evaluation and discover corner-case bugs, (3) examine and catch various deadlock models and defined errors.

3.3 Software layer

The software layer is responsible for the emulation process to implement and evaluate NoC-based MPSoC architectures. Initially, it parses the user configurations, such as IPs, traffic patterns, NoC size, topologies, etc. Next, it auto-builds the corresponding UVM environment. Then, it performs the evaluation and result collection. Later, it recommends the best configuration based on the design space. The software layer consists of four modules: (1) NoC and UVM configuration and generation, (2) traffic patterns generator and controller, (3) emulation flow for design-space exploration, and (4) performance analysis, as shown in Fig. 1.

3.3.1 NoC and UVM configuration and generation

A software tool based on Perl scripts is developed to read and parse the user configurations, then auto-generate, build, and connect the implemented NoC-based architecture. Besides, it sets the design space exploration parameters, configures the corresponding UVM environment, and generates the top environment module (Env_Top), as shown in Fig. 12.

3.3.2 Traffic patterns generator and controller

The traffic generator injects packets according to the adopted traffic pattern. The test layer supports application-based and synthetic traffic patterns, such as uniform, hot-spot, transpose, bit-shuffle, bit-rotation, bit-reversal, tornado, and neighbor traffic patterns [7].
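To make the synthetic patterns concrete, the sketch below computes destination node indices for several bit-permutation patterns, following the usual textbook definitions in [7]. The function names and the flat node-index encoding are assumptions for illustration; the framework's own generator runs as RISC-V software and may be organized differently.

```python
def bit_complement(src: int, bits: int) -> int:
    """Destination is the bitwise complement of the source index."""
    return ~src & ((1 << bits) - 1)


def bit_reversal(src: int, bits: int) -> int:
    """Destination is the source index with its bit order reversed."""
    dst = 0
    for i in range(bits):
        if (src >> i) & 1:
            dst |= 1 << (bits - 1 - i)
    return dst


def rotate_left(src: int, bits: int, k: int) -> int:
    """Rotate the low `bits` bits of src left by k positions."""
    k %= bits
    mask = (1 << bits) - 1
    return ((src << k) | (src >> (bits - k))) & mask


def transpose(src: int, bits: int) -> int:
    """Swap the upper and lower halves of the index (bits must be even)."""
    return rotate_left(src, bits, bits // 2)


def shuffle(src: int, bits: int) -> int:
    """Bit-shuffle: rotate the index left by one bit."""
    return rotate_left(src, bits, 1)


def bit_rotation(src: int, bits: int) -> int:
    """Bit-rotation: rotate the index right by one bit."""
    return rotate_left(src, bits, bits - 1)


# 16-node network (4 address bits), source node 1 (binary 0001).
for pattern in (bit_complement, bit_reversal, transpose, shuffle, bit_rotation):
    print(pattern.__name__, pattern(1, 4))
```

With node index `y * 4 + x` in a \(4\times 4\) mesh, `transpose` maps node (x=1, y=0) to node (x=0, y=1), which matches the matrix-transpose interpretation of the pattern.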

3.3.3 Emulation flow and design-space exploration

This module performs the emulation flow to auto-configure, run, and control the emulation, as described in Fig. 13. The emulation flow is as follows:

1. Select the router architecture (Daniel, CB, RRFB, or VCCB) and set the user-defined configurations for the NoC parameters.
2. Generate the configured NoC-based MPSoC architecture and its corresponding UVM environment.
3. Compile the injector and collector RISC-V software files for the user-defined traffic pattern.
4. Load the generated data memory of each core into the hardware emulator.
5. Perform testbench-acceleration co-modeling by running the UVM environment with the generated NoC-based architecture on the emulator, then check for any congestion or bus routing failure.
6. Emulate the NoC-based architecture on the hardware emulator under the pre-configured traffic patterns.
7. Analyze the results in terms of network latency, network throughput, maximum energy, and power consumption [57, 58], as well as the power overheads consumed in: (a) PE to CNI, and (b) CNI to network input ports.

3.3.4 Performance analysis

This module collects the results, coverage reports, and SB checker outputs. It plots the throughput, latency, maximum consumed energy, and power consumption of the NoC-based MPSoC. It repeats the emulation process for all user-defined design space exploration parameters. It then recommends the best configuration parameters to the user based on the required criteria.
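The sweep-and-recommend behaviour described here can be sketched as an exhaustive loop over the user-defined parameter space. Everything in this example is hypothetical: `explore`, `toy_evaluate`, and the parameter names are illustrative, and `toy_evaluate` is a made-up stand-in for an actual emulation run, so its numbers carry no meaning.

```python
from itertools import product


def explore(space, evaluate, criterion):
    """Evaluate every configuration in `space` and return the best one.

    space     -- dict mapping parameter name to a list of candidate values
    evaluate  -- callable(config) -> dict of metrics (stand-in for an emulation run)
    criterion -- callable(metrics) -> number to minimize
    """
    names = list(space)
    results = []
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        results.append((config, evaluate(config)))
    best_config, best_metrics = min(results, key=lambda item: criterion(item[1]))
    return best_config, best_metrics, results


def toy_evaluate(config):
    # Made-up model for demonstration only: not derived from the paper's results.
    capacity = config["buffer_size"] * config["virtual_channels"]
    return {"latency": 1000 / capacity, "throughput": capacity / 200}


space = {"buffer_size": [16, 32, 64], "virtual_channels": [1, 2, 3]}
best_config, best_metrics, results = explore(
    space, toy_evaluate, criterion=lambda metrics: metrics["latency"]
)
print(best_config, best_metrics)
```

Passing the criterion as a function keeps the selection rule separate from the sweep, so a user could rank by latency, throughput, power, or a weighted combination without changing the loop.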

4 Experiments and results

We developed four case studies based on the various router architectures discussed in Sect. 3.1.3:

Daniel’s router: the emulation is performed for NoC-based architecture with network sizes: \(2\times 2\), \(4\times 4\), \(8\times 8\), \(8\times 16\), buffer sizes: 16, 32, 64, number of virtual channels: 1, 2, 3, Traffic Patterns (TPs): transpose, bit-complement, uniform, and 3D topologies: mesh and torus under \(8\times 8\times 4\) network size.
CB router: the emulation is performed for NoC-based architecture with topologies: mesh, torus, network dimensions: 2D, 3D, and network architectures: centralized-based and distributed-based. All configurations are under 64 PEs capacity.
RRFB router: the emulation is performed for NoC-based architecture with routing algorithms: XYZ, West-First (WF), Negative-First (NegF), and 2D network architectures: centralized-based and distributed-based. All configurations are under 64 PEs capacity.
VCCB router: the emulation is performed for NoC-based architectures with 2D/3D networks: centralized-based and distributed-based, with 64 PEs capacity for 2D NoC-based architectures and 256 PEs for 3D NoC-based architectures.

These four case studies demonstrate the flexibility of our framework in supporting design space exploration for NoC-based architectures. Different configurations are applied to validate the architectures and measure their performance, thereby verifying our framework’s accuracy and scalability. We focus our discussion of results on NoC performance evaluation under synthetic and real bench-mark traffic patterns and on emulation versus simulation speed-up.

4.1 NoC performance evaluation

NoC-based MPSoC architectures with different configurations are implemented and verified to guarantee that there are no bus routing failures or NoC congestion. Afterwards, the MPSoC architecture is emulated, where co-modeling accelerates the generation of the performance results.

4.1.1 Performance comparison

Experimental results, such as throughput and latency versus injection rate for various topologies, network sizes, network dimensions, network connection architectures, TPs, numbers of VCs, and buffering techniques, are shown below.

The performance evaluation of Daniel’s router (first case study) is as follows:

NoC sizes: Figure 14a, b illustrate the NoC throughput and latency for \(2\times 2\), \(4\times 4\), \(8\times 8\), and \(16\times 8\) NoC sizes under the configuration of one virtual channel, 64-bit buffer size, and uniform traffic. From the figures, we notice that small NoCs have better throughput and latency than larger ones.
Buffer sizes: Figure 14c, d present the NoC performance for different buffer sizes: 16 bits, 32 bits, and 64 bits under the configuration of \(8\times 8\) NoC size and uniform traffic. From the figures, we notice that increasing the NoC buffer size improves performance. However, it comes at the expense of area and power consumption.
Traffic patterns: Figure 15a, b show the NoC performance for different traffic patterns: uniform, transpose, and bit-complement under the configuration of \(4\times 4\) NoC size and 16-bit buffer size. From the figures, we notice that the transpose traffic pattern has the best performance, while the uniform pattern has the worst.
Number of VCs: Figure 15c, d illustrate the NoC performance for various numbers of virtual channels per port: 1, 2, and 3 VCs. From the figures, we notice that increasing the number of VCs improves NoC performance. As with the buffer size, it comes at the cost of area and power consumption.
Topologies-3D NoC: Figure 16a, b present the NoC performance for various topologies of 3D NoCs: \(8\times 8\times 4\) mesh and torus. The figures show that the NoC performance for the torus topology is better than for the mesh topology because of the wrap-around routing in the x, y, and z directions. We also notice that increasing the NoC dimensions improves performance at the cost of area and power consumption.
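The wrap-around argument can be illustrated with a minimal sketch that compares minimal hop counts in a mesh and a torus of the same size. The function names are hypothetical, and the sketch assumes minimal routing with one hop per link; it says nothing about congestion, which the emulation results also capture.

```python
def mesh_hops(src, dst):
    """Minimal hop count in a mesh: Manhattan distance between coordinates."""
    return sum(abs(s - d) for s, d in zip(src, dst))


def torus_hops(src, dst, dims):
    """Minimal hop count in a torus: each dimension may use its wrap-around link."""
    total = 0
    for s, d, k in zip(src, dst, dims):
        delta = abs(s - d)
        total += min(delta, k - delta)
    return total


dims = (8, 8, 4)  # the 8x8x4 network size used in this case study
src, dst = (0, 0, 0), (7, 7, 3)
print("mesh:", mesh_hops(src, dst), "torus:", torus_hops(src, dst, dims))
```

For the corner-to-corner pair above, the mesh needs 17 hops while the torus needs 3, since every dimension can be crossed with a single wrap-around hop.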

The performance evaluation of CB and RRFB router architectures (second and third case studies) is as follows:

NoC topologies and router architectures: Figure 17a, b illustrate the NoC performance for various topologies (mesh and torus) of the CB and RRFB 2D-router architectures. The figures show that the torus topology performs better than the mesh. The RRFB router architecture provides better performance than the CB architecture because of its adaptability in handling busy ports with congested buffers, as discussed in Sect. 3.1.4.
Topologies 3D-NoC: Figure 17c, d present the NoC performance for various topologies (mesh and torus) of the CB and RRFB 3D-router architectures. The figures show that the 3D-router architectures provide better performance than the 2D ones, as illustrated in Sect. 3.1.4.
Routing algorithms: Figure 18 shows the NoC performance for different routing algorithms (XYZ, WF, NegF) implemented in the RRFB 2D-router under the configuration of \(8\times 8\) NoC size, random traffic pattern, and 64-bit buffer size. The figure shows that the XYZ routing algorithm provides the best performance, followed by WF and then NegF. The XYZ algorithm supports buffering flexibility; there is no constraint on port selection when transferring a packet from source to destination, as there is in the WF and NegF algorithms.

The performance evaluation of VCCB router architecture (fourth case study) is as follows:

Router architectures: Figure 19 illustrates the NoC performance for centralized and distributed network-based VCCB and CB router architectures under the configuration of 64 PEs capacity, 64-bit buffer size, random traffic pattern, and three VCs per port for VCCB routers. The figure shows that the centralized network-based VCCB provides the best performance, followed by the distributed network-based VCCB, with the centralized CB last. This result is due to the VC flow control, which speeds up packet transfer. In addition, the centralized network provides better performance than the distributed network because there is only one hop of latency from the source to the destination, but at the cost of area and power, which are very large in the centralized network.
3D router architectures: Figure 20 presents the NoC performance for the 3D-VCCB router compared with the 3D-Daniel router under the configuration of \(8\times 8\times 4\) NoC size. We found that our proposed 3D-VCCB architecture provides better performance than the 3D-Daniel router.

4.1.2 Maximum energy and power per buffer

Buffers contribute more than \(55\%\) of the dynamic power and \(65\%\) of the area of NoCs [59]. The total power per buffer is directly proportional to the total number of flits stored in buffers across all routers. Figure 21a, b illustrate the average maximum NoC energy and power per buffer versus the average injection rate for NoC-based architecture under the configuration of a 3D-NoC with 27 PEs and the following router architectures:

RRFB 3D-router applying the XYZ, WF, and NegF routing algorithms.
Distributed-network based CB 3D-router.
Distributed-network based VCCB 3D-router.
3D-Daniel router.

As shown in Fig. 21a, b, the NoC performance in terms of power and energy matches the previous results for throughput and latency: as the NoC throughput and latency improve, the maximum energy and power per buffer improve as well. The figures show that the best average maximum consumed energy and power per buffer are achieved by the XYZ routing algorithm on the RRFB router architecture, since this routing algorithm utilizes the NoC resources efficiently and balances the packets traversing the NoC. On the other hand, the worst average maximum energy and power per buffer are for the CB router architecture, which applies the basic routing algorithm.

4.2 Evaluation under real bench-mark traffic

As mentioned, our framework supports synthetic and application-based traffic patterns. Besides the synthetic traffic evaluation, we evaluate the 3D-RRFB and 3D-CB router architectures with real bench-mark traffic patterns of two popular video applications, MPEG4 and DVOPD [60], based on Communication Task Graphs (CTG). We model the mapping problem using the discrete optimization language MiniZinc [61]. We optimize the mapping of the MPEG4 and DVOPD applications to reduce the communication cost, defined by the number of hops between every two routers multiplied by the communication bandwidth between them in the communication task graph. The bench-mark traffic patterns are loaded into the memory of the HW emulator for each router, either CB or RRFB architecture, and then the same emulation flow described in Sect. 3.3.3 is followed to obtain the performance evaluation. The NoC performance in terms of throughput and latency is presented in Figs. 22a, b and 23a, b.
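The communication cost can be illustrated with a small Python sketch. It assumes the per-edge products are summed over the whole task graph and that hops are counted as Manhattan distance in a 3D mesh; the task names, bandwidth values, and function names are hypothetical and are not taken from the MPEG4 or DVOPD graphs or from the MiniZinc model.

```python
def hops(a, b):
    """Hop count between two router coordinates (Manhattan distance, mesh assumed)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def communication_cost(edges, mapping):
    """Sum of (hops x bandwidth) over all edges of a communication task graph.

    edges   -- list of (source_task, destination_task, bandwidth)
    mapping -- dict mapping task name to router coordinates (x, y, z)
    """
    return sum(bw * hops(mapping[src], mapping[dst]) for src, dst, bw in edges)


# Hypothetical three-task graph with illustrative bandwidth values.
edges = [("a", "b", 100), ("b", "c", 50)]
mapping_1 = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (1, 1, 1)}
mapping_2 = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (1, 1, 0)}

print(communication_cost(edges, mapping_1))  # c is two hops from b
print(communication_cost(edges, mapping_2))  # c is one hop from b
```

An optimizer such as a constraint solver would search over mappings for the one with the lowest cost; here the second mapping is cheaper because the heavily communicating tasks sit closer together.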

4.3 Emulation versus simulation speed-up

We examined the emulation versus simulation speed-up and capacity to assess the performance improvement between the two environments. We found an average emulation performance gain of \(40\times \) without sacrificing cycle accuracy.

5 Conclusion

High-performance embedded systems increase the design complexity and the demand for efficient SoC architectures. The NoC-based architecture paradigm is considered a solution to deliver the required performance and meet modern embedded system constraints in power, time, and throughput. To assess the performance of NoC-based architectures, verification and emulation become a primary necessity. This paper proposed a flexible and scalable hardware-software verification framework that works in both simulation and hardware emulation utilizing UVM. Emulation and testbench acceleration are completely auto-generated and carried out through a design space exploration flow for various NoC-based architecture configurations employing synthetic and application-based traffic patterns. Many experiments were conducted to assess the correctness and performance of 2D and 3D NoC-based MPSoCs. Results show that our framework speeds up performance evaluation by \(40\times \) with respect to software simulators. As future work, we aim to release the framework as open source and provide a graphical debugging tool.

Data availability

Data-sets generated during and/or analyzed during the current study are available from the first author upon reasonable request.

Notes

The length of time from the idea of a product until its availability on consumer markets.

References

Hyeonguk J, Kyuseung H, Sukho L, Jae-Jin L, Woojoo L (2019) Mmnoc: embedding memory management units into network-on-chip for lightweight embedded systems. IEEE Access 7:80011–80019
Muhammad E, Abdelhafid B (2011) A hardwired noc infrastructure for embedded systems on fpgas. Microprocess Microsyst 35:200–216
Sgroi M, Sheets M, Mihal A, Keutzer K, Malik S, Rabaey J, Sangiovanni-Vincentelli A (2001) Addressing the system-on-a-chip interconnect woes through communication-based design. In: Proceedings of the 38th annual Design Automation Conference. ACM, pp 667–672
Elmiligi H, Morgan AA, El-Kharashi MW, Gebali F (2007) A topology-based design methodology for networks-on-chip applications. In: Zorian Y, ElTahawy H, Ivanov A, Salem A (eds) Proceedings of the second IEEE International Design and Test Workshop (IDT 2007), Cairo, Egypt, pp 61–65
Morgan AA, Elmiligi H, El-Kharashi MW, Gebali F (2010) Multiobjective optimization for networks-on-chip architectures using genetic algorithms. In: IEEE International Symposium on Circuits and Systems, 2010. ISCAS 2010. IEEE, Paris, pp 3725–3728
Said M, Hassan H, Kim H, Khamis M (2017) A novel power reduction technique using wire multiplexing. In: 30th IEEE International System-on-Chip Conference (SOCC). IEEE, Munich, pp 149–152
Dally WJ, Towles BP (2003) Principles and practices of interconnection networks. Elsevier, San Francisco
Benini L, Micheli GD (2002) Topology-based design methodology for networks-on-chip applications. Computer 35:70–78
El-Naggar A, Medhat A, Al-Abassy B, Massoud E, Ibrahim H, amd MK, Shalaby A (2017) Performance evaluation of virtual channel flow control in centralized and distributed networks for system on chip. In: 29th International Conference on Microelectronics (ICM). IEEE, Beirut
Micheli GD, Benini L (2006) Networks on chips: technology and tools. Academic Press
Wolkotte P, Holzenspies P, Smit G (2007) Fast, accurate and detailed NoC simulations. In: The IEEE/ACM Int. Symp. on Networks-on-Chip (NOCS). IEEE/ACM, Princeton, pp 323–332
Wang D (2010) An FPGA-based accelerator platform for network-on-chip simulation. Masters thesis, University of Toronto, Toronto, ON, Canada
Jian N, Becker DU, Michelogiannakis G, Balfour J, Towles B, Shaw DE, Kim J, Dally WJ (2013) A detailed and flexible cycle-accurate network-on-chip simulator. In: 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, Austin, pp 86–96
Genko N, Atienza D, Micheli GD, Mendias JM, Hermida R, Catthoor F (2005) A complete network-on-chip emulation framework. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), vol 1. IEEE, Munich, pp 246–251
Liu Y, Liu P, Jiang Y, Yang M, Wu K, Wang W, Yao Q (2010) Building a multi-FPGA-based emulation framework to support NoC design and verification. Int J Electron 97:1241–1262
Mentor a Siemens Business, Veloce. https://www.mentor.com/products/fv/emulation-systems/. Accessed 1 Sept 2020
Hassoun S, Kudlugi M, Pryor D, Selvidge C (2005) A transaction-based unified architecture for simulation and emulation. IEEE Trans Very Large Scale Integr (VLSI) Syst 13:278–287
Yun YN, Kim JB, Kim ND, Min B (2011) Beyond UVM for practical SoC verification. In: International SoC Design Conference (ISOCC). IEEE, Jeju, pp 158–162
Eissa AS, Ibrahem MA, Elmohr MA, Zamzam Y, El-Yamany A, El-Ashry S, Khamis M, Shalaby A (2017) A reusable verification environment for NoC platforms using UVM. In: IEEE EUROCON 2017-17th International Conference on Smart Technologies. IEEE, pp 239–242
El-Naggar A, Massoud E, Medhat A, Ibrahim H, Al-Abassy B, El-Ashry S, Khamis M, Shalaby A (2016) A narrative of UVM testbench environment for interconnection routers: a practical approach. In: 11th International Design & Test Symposium (IDT). IEEE
Khamis M, El-Ashry S, Shalaby A, AbdElsalam M, El-Kharashi MW (2018) A Configurable RISC-V for NoC-based MPSoCs: a framework for hardware emulation. In: 11th International Workshop on Network on Chip Architectures (NoCArc). IEEE, Fukuoka
Papamichael MK, Hoe JC (2012) Connect: re-examining conventional wisdom for designing NoCs in the context of FPGAs. In: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays. ACM/SIGDA, Monterey, pp 37–46
Ost L, Mello A, Palma J, Moraes F, Calazans N (2005) MAIA: a framework for networks on chip generation and verification. In: Proceedings of the 2005 Asia and South Pacific Design Automation Conference. IEEE, Shanghai, pp 49–52
Mello A, Calazans N, Moraes F (2011) Atlas—an environment for NoC generation and evaluation. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France
Xu J, Wolf W, Henkel J, Chakradhar S, Lv T (2004) A case study in networks-on-chip design for embedded video. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), vol 2. IEEE, Paris, pp 770–775
Cheung N, Parameswaran S, Henkel J (2003) INSIDE: INstruction Selection/Identification & Design Exploration for extensible processors. In: IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Jose, CA, USA
Monemi A, Tang JW, Palesi M, Marsono MN (2017) ProNoC: a low latency network-on-chip based many-core system-on-chip prototyping platform. Microprocess Microsyst 54:60–74
Busseuil R, Barthe L, Almeida GM, Ost L, Bruguier F, Sassatelli G, Benoit P, Robert M, Torres L (2011) Open-scale: a scalable, open-source NOC-based MPSoC for design space exploration. In: International Conference on Reconfigurable Computing and FPGAs. IEEE, Cancun, pp 357–362
Zhang Q, Zhou M, Chen J, Yang H (2015) A homogeneous many-core x86 processor full system framework based on NoC. In: 4th International Conference on Computer Science and Network Technology (ICCSNT), Harbin, China
Balkind J, McKeown M, Fu Y, Nguyen T, Zhou Y, Lavrov A, Shahrad M, Fuchs A, Payne S, Liang X, Matl M, Wentzlaff D (2016) OpenPiton: an open source manycore research framework. ACM SIGPLAN Notices 51:217–232
Skalicky S, Schmidt AG, Lopez S, French M (2015) A unified hardware/software mpsoc system construction and run-time framework. In: Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, Grenoble, pp 301–304
Prasad BMP, Parane K, Talawar B (2020) An efficient FPGA-based network-on-chip simulation framework utilizing the hard blocks. Circuits Syst Signal Process 39:5247–5271
Ruaro M, Caimi LL, Fochi V, Moraes FG (2019) Memphis: a framework for heterogeneous many-core SoCs generation and validation. Des Autom Embed Syst 23:103–122
Bertozzi D, Jalabert A, Murali S, Tamhankar R, Stergiou S, Benini L, Micheli GD (2005) NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. IEEE Trans Parallel Distrib Syst 16:113–129
Goossens K, Dielissen J, Radulescu A (2005) Æthereal network on chip: concepts, architectures, and implementations. IEEE Des Test 22:414–421
Siguenza-Tortosa D, Nurmi J (2002) Vhdl-based simulation environment for proteo NoC. In: High-level design validation and test workshop. IEEE, Cannes
Genko N, Atienza D, Micheli GD, Benini L (2007) Feature-NoC emulation: a tool and design flow for MPSoC. IEEE Circuits Syst Mag 7:42–51
Krasteva YE, Criado F, Torre E, Riesgo T (2008) A fast emulation-based NoC prototyping framework. In: International conference on reconfigurable computing and FPGAs. IEEE, Cancun, pp 211–216
Sievers G, Ax J, Kucza N, Flaßkamp M, Jungeblut T, Kelly W, Porrmann M, Rückert U (2015) Evaluation of interconnect fabrics for an embedded MPSoC in 28 nm FD-SOI. In: IEEE International Symposium on Circuits and Systems, 2015. ISCAS 2015. IEEE, Lisbon, pp 1925–1928
Hübener B, Sievers G, Jungeblut T, Porrmann M, Rückert U (2014) Coreva: a configurable resource-efficient vliw processor architecture. In: 2014 12th IEEE International Conference on Embedded and Ubiquitous Computing (EUC). IEEE, Milano, pp 9–16
Carara EA, Oliveira RPD, Calazans NL, Moraes FG (2009) HeMPS—a framework for NoC-based MPSoC generation. In: IEEE International Symposium on Circuits and Systems, 2009. ISCAS
Plasma core. https://opencores.org/projects/plasma. Accessed 1 Sept 2020
Waterman A, Lee Y, Patterson DA, Asanovic K (2011) The RISC-V instruction set manual: base user-level ISA. In: EECS Department, UC Berkeley, Tech. Rep. UCB/EECS-2011-62, vol 1
Traber A, Stucki S, Zaruba F, Gautschi M, Pullini A, Benini L (2015) Pulpino: a RISC-V based single-core system. In: OpenRISC Conference. ORCONF, Geneva
Elmohr M, Eissa A, Ibrahim M, Khamis M, El-Ashry S, Shalaby A, AbdElsalam M, El Kharashi MW (2018) RVNoC: a framework for generating RISC-V NoC-based MPSoC. In: 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). IEEE, Cambridge, pp 617–621
Becker DU (2012) Efficient microarchitecture for network-on-chip routers. Ph.D. thesis, Stanford University, Stanford, CA, USA
Khamis M, Said M, Shalaby A (2017) Work-in-progress: a flexible router architecture for 3D NoCs. In: IEEE Real-Time Systems Symposium (RTSS). IEEE, Paris, pp 1025–1040
Khamis M, Zaytoon A, Shalaby A (2015) Evaluating the feasibility of centralized router for network on chip. In: 27th International Conference on Microelectronics (ICM). IEEE, Casablanca, pp 238–241
Glass C, Ni L (1992) The turn model for adaptive routing. In: Computer Architecture, 1992. Proceedings. The 19th Annual International Symposium. IEEE, Gold Coast, pp 278–287
Synopsys corporation, Zebu emulation platform. http://www.synopsys.com/verification/emulation.html/. Accessed 1 Sept 2020
Hatem E, Mostafa K, Amr S, Mohammed K (2017) A novel assertions-based code coverage automatic cad tool. In: IEEE EUROCON 2017, 17th International Conference on Smart Technologies. IEEE, Ohrid, pp 277–281
Roe S, Um U, Koh H, Ahn H, Kim Y, Choi S (2018) UVM acceleration using hardware emulator at pre-silicon stage. In: Proceedings of Design and Verification Conference (DVCON), San Jose, CA
Rao N, Kumar KR, Verma V, Kumar G (2014) Using simulation acceleration to achieve 100X performance improvement with UVM based testbenches. In: Proceedings of Design and Verification Conference (DVCON), Bangalore, India
El-Ashry S, Khamis M, Ibrahim H, Shalaby A, Abdelsalam M, El-Kharashi MW (2020) On error injection for NoC platforms: a UVM-based generic verification environment. IEEE Trans Comput Aided Des Integr Circuits Syst 39:1137–1150
El-Ashry S, Ibrahim H, Ibrahim M, Khamis M, Shalaby A, Abdelsalam M, El-Kharashi MW (2017) On error injection for NoC platforms: a UVM-based practical case study. In: 10th International Workshop on Network on Chip Architectures (NoCArc), Cambridge, MA, USA
IEEE standard for Universal Verification Methodology language reference manual. In: IEEE Std 1800.2-2017, May 2017
Cota E, Amory Ad, Lubaszewski MS (2012) Reliability, availability and serviceability of networks-on-chip. Springer
Pande PP, Grecu C, Jones M, Ivanov A, Saleh R (2005) Performance evaluation and design trade-offs for network-on-chip interconnect architectures. IEEE Trans Comput 54:1025–1040
Jerger NE, Peh LS (2009) On-chip networks, synthesis lectures on computer architecture. Morgan & Claypool publishers
Sahu PK, Manna K, Shah N, Chattopadhyay S (2014) Extending Kernighan-Lin partitioning heuristic for application mapping onto network on-chip. J Syst Archit 60:562–578
Nicholas N, Peter S, Ralph B, Sebastian B, Gregory D, Guido T (2007) Minizinc: towards a standard CP modelling language. In: Principles and practice of Constraint Programming—CP, vol 4741. Springer, Berlin, pp 529–543

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

STMicroelectronics, Cairo, Egypt
Mostafa Khamis
Department of Computer and Systems Engineering, Ain Shams University, Cairo, Egypt
Sameh El-Ashry & M. Watheq El-Kharashi
Siemens Digital Industries Software, Siemens EDA, Cairo, Egypt
Mohamed AbdElsalam
Department of Electrical and Computer Engineering, Faculty of Engineering and Computer Science, University of Victoria, Victoria, V8P 5C2, Canada
M. Watheq El-Kharashi
Department of Computer Science, Benha University, Benha, Egypt
Ahmed Shalaby

Corresponding author

Correspondence to M. Watheq El-Kharashi.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khamis, M., El-Ashry, S., AbdElsalam, M. et al. Emulation and verification framework for MPSoC based on NoC and RISC-V. Des Autom Embed Syst 26, 133–159 (2022). https://doi.org/10.1007/s10617-022-09265-1

Received: 17 October 2020
Accepted: 06 August 2022
Published: 14 September 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10617-022-09265-1
