# **Performance Analysis and Implementation of Highly Reconfigurable Modified SDM-Based NoC for MPSoC Platform on Spartan6 FPGA**

#### **Y. Amar Babu and G.M.V. Prasad**

**Abstract** To meet today's demanding requirements of low power consumption and high performance while maintaining flexibility and scalability, a system-on-chip integrates a number of processor cores and other IPs through a network-on-chip (NoC). To implement an NoC-based MPSoC on an FPGA, the NoC should provide guaranteed services and be run-time reconfigurable. Current TDM- and SDM-based NoCs occupy more area and do not support run-time reconfiguration. This paper presents a modified spatial division multiplexing (SDM)-based NoC on an FPGA, in which the complex network interface is replaced by a flexible network interface and an efficient SDM-based NoC is proposed. The proposed architecture explores the feasibility of serving connection requirements issued dynamically by soft cores at run-time.

**Keywords** NoC ⋅ SDM ⋅ VHDL code ⋅ Microblazes

# **1 Introduction**

According to Moore's law, chip density is increasing exponentially, allowing a multiprocessor system-on-chip (MPSoC) to be realized on today's FPGAs. The main challenge in today's MPSoCs is the communication architecture among processors. The conventional approach of using bus architectures for inter-IP core communication has many limitations; chiefly, it does not scale well as the number of soft cores on a single FPGA grows. The design flow for computationally complex, high-bandwidth on-chip communication on an MPSoC platform takes a long design time and is very expensive. Network-on-chip has become the only alternative to solve these problems [\[1](#page-8-0)].

Y.A. Babu  $(\boxtimes)$, LBR College of Engineering, Mylavaram, Andhra Pradesh, India, e-mail: amarbabuy77@gmail.com

G.M.V. Prasad, B.V.C Institute of Technology & Science, Batlapalem, Andhra Pradesh, India, e-mail: drgmvprasad@gmail.com

© Springer Nature Singapore Pte Ltd. 2018

P.K. Sa et al. (eds.), *Progress in Intelligent Computing Techniques: Theory, Practice, and Applications*, Advances in Intelligent Systems and Computing 518, DOI 10.1007/978-981-10-3373-5\_44

A time division multiplexing (TDM)-based network-on-chip uses packet-switching techniques to transfer data from a source node to a destination node. MANGO and Xpipes are well-known packet-based NoCs that provide best-effort service [[2\]](#page-8-0). In a TDM-based NoC, there is no need to establish a path from the source node to the destination node beforehand, whereas in an SDM-based NoC, the links between source and destination must be fixed through the intervening routers. SPIN and PNoC [[3\]](#page-8-0) are based on circuit switching. Today's multimedia systems demand predictable performance because the links between soft cores are tightly time-constrained. For such multi-core systems, guaranteed-throughput service must be provisioned before run-time; to meet these constraints, link allocation should be done in advance, at design time.

A TDM-based NoC provides guaranteed throughput by using different time slots on the same link. One disadvantage is that the router switching configuration must be updated for every time slot. This requires a time slot table memory in every router, which occupies considerable area and increases power consumption. Nostrum and AEthereal are TDM-based NoCs; both architectures have to maintain time slot tables. A spatial division multiplexing (SDM)-based NoC instead grants subsets of the physical wires that interlink the routers to different connections, so that each connection is allocated a number of wires on every link along its path. The sender serializes its data onto the assigned wires, and the receiver deserializes them to restore the data format of the destination IP core. The main advantage of an SDM-based NoC over other techniques is that it removes the memory required for time slot tables, which reduces power; the cost is that area complexity moves to the serializer and deserializer of the network interface.
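The serialization step described above can be illustrated with a small behavioural sketch. It is not the authors' VHDL; the function names, the LSB-first bit order and the zero-padding of the last cycle are assumptions made only for illustration. It shows how a 32-bit word is spread over the wires allocated to one connection and rebuilt at the receiver.

```python
from typing import List

WORD_BITS = 32  # width of the IP-core data word, as stated in the paper


def serialize(word: int, num_wires: int) -> List[List[int]]:
    """Split a 32-bit word into per-cycle groups, one bit per allocated wire.

    Returns a list of cycles; each cycle holds num_wires bits (LSB first,
    an assumed ordering). The last cycle is zero-padded when 32 is not a
    multiple of num_wires.
    """
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 32 bits")
    if num_wires < 1:
        raise ValueError("at least one wire must be allocated")
    bits = [(word >> i) & 1 for i in range(WORD_BITS)]
    cycles = -(-WORD_BITS // num_wires)  # ceiling division
    bits += [0] * (cycles * num_wires - WORD_BITS)  # pad the final cycle
    return [bits[c * num_wires:(c + 1) * num_wires] for c in range(cycles)]


def deserialize(cycles: List[List[int]]) -> int:
    """Rebuild the 32-bit word from the per-cycle wire values."""
    flat = [bit for cycle in cycles for bit in cycle][:WORD_BITS]
    return sum(bit << i for i, bit in enumerate(flat))


if __name__ == "__main__":
    word = 0xDEADBEEF
    for wires in (1, 3, 8):
        frames = serialize(word, wires)
        assert deserialize(frames) == word
        print(f"{wires} wire(s): {len(frames)} cycles per 32-bit word")
```

Under these assumptions, allocating more wires to a connection shortens the transfer of each word (32 cycles on one wire, 4 cycles on eight), which is the bandwidth knob that SDM exposes in place of time slots.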

In this paper, we address the above problems. We propose a novel design methodology and a modified logic structure for the network interface to handle the complexity of the serializer and deserializer that is common in SDM-based NoCs [[4\]](#page-8-0). We also propose a simple router with low area complexity, which optimizes area at the cost of reduced routing flexibility. This restriction mainly reduces the problem of reordering data when they reach the destination network interface. The number of channels required between routers and the width of each channel depend on the application, that is, on how its tasks transfer data to one another. We have modeled the SDM-based NoC in VHDL. We have connected MicroBlaze processors in an MPSoC using the modified SDM-based NoC with Xilinx EDK, and an emulation prototype has been realized on a Xilinx Spartan6 FPGA SP605 development board. The 32-bit RISC Xilinx MicroBlaze soft processors have been used to evaluate the run-time dynamic configurability of the NoC as well as on-chip data communication among the nodes.

# **2 Modified SDM-Based NoC Architecture**

The proposed architecture is modeled as a dual-layer structure in which the first layer is used for data transfer and the second layer is used for configuring router links. The network topology is a mesh, which is well suited to multimedia applications. Figure 1 shows the basic architecture of the modified SDM-based NoC.

# *2.1 Dual-Layer Structure*

The second layer is a simple network used mainly to program the NoC according to the bandwidth the application demands. To program the NoC, the links between routers are fixed according to which soft cores require data from which other IP or soft cores. The number of programming bytes required depends on the size of the mesh network (from  $2 \times 2$  up to a maximum of  $7 \times 7$ ) that can be mapped onto the target FPGA. Each router has one soft IP attached, connected internally through a network interface. The network interface of each IP serializes the soft IP's data at the sender and deserializes them at the receiver so that the data from the source IP can be delivered. All routers and soft IPs with their network interfaces are placed in the first layer, which is responsible for transferring data from any source NoC node to any destination NoC node through the router links. The designer fixes the number of wires used to transfer data from a source node to a destination node according to bandwidth requirements; this number is programmable at design time. Figure [2](#page-3-0) shows the two layers in detail.

# *2.2 Modified Network Interface*

The modified network interface for the spatial division multiplexing-based NoC has a special control block that controls the incoming 32-bit data from the different channels of the soft IP cores. This intelligent control unit replaces the multiple data distributors at the transmitter side and the multiple data collectors at the receiver side with a single data distributor and a single data collector. The proposed control unit can also be used for fault tolerance, to minimize faults in the transmitter and receiver blocks of the network interface. Two features of the modified network interface stand out: a large area saving, area being a main problem in any network-on-chip architecture, and fault tolerance, which is in high demand for multi-core systems-on-chip in embedded applications. We have modified the 32-bit to 1-bit serializer to work with the intelligent control unit in the network interface. Figure 3 shows the network interface.

<span id="page-3-0"></span>

**Fig. 2 a** Data layer, **b** control layer

**Fig. 3** Network interface

# *2.3 Router for Modified SDM-Based NoC*

We target Xilinx FPGAs to implement the SDM-based NoC for the MPSoC platform, so the router architecture was modified to resemble the architecture of a Xilinx switch, avoiding unnecessarily complex logic. The modified router for the proposed network-on-chip has five ports: north, south, east, west and local. Soft IP cores are connected through the local port. From the local port, the designer can send data to an adjacent router through the other four ports. Compared with other network-on-chip architectures, this arrangement provides more flexibility and scalability and a large area saving. The port size is a function of the number of sending channels, the number of receiving channels and the width of each channel. Figure 4 shows the router architecture in detail. Each side has one input port of 8 bits, one output port of 8 bits, one out allocated input port of 3 bits and one out allocated input index port of 3 bits. The out allocated input port is 3 bits wide because data can be sent in only 5 possible directions from any side, and 3 bits are enough to encode 5 values. The size of the out allocated input index port depends on the size of the input and output ports on each side.
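The port sizing above can be checked with a short behavioural model. This is a hypothetical sketch rather than the paper's router: the `Router` class, its per-wire allocation table and the `connect`/`route` methods are illustrative assumptions. Only the five directions, the 8-bit ports and the two 3-bit selector widths come from the text.

```python
from typing import Dict, List, Optional, Tuple

DIRECTIONS = ["north", "south", "east", "west", "local"]  # five router ports
PORT_WIDTH = 8  # wires per input/output port, as stated in the text


def select_bits(choices: int) -> int:
    """Number of bits needed to encode one of `choices` alternatives."""
    return max(1, (choices - 1).bit_length())


class Router:
    """Hypothetical behavioural model of one circuit-switched SDM router."""

    def __init__(self) -> None:
        # For every output wire: (source direction, source wire index) or None.
        self.alloc: Dict[str, List[Optional[Tuple[str, int]]]] = {
            d: [None] * PORT_WIDTH for d in DIRECTIONS
        }

    def connect(self, out_dir: str, out_idx: int, in_dir: str, in_idx: int) -> None:
        """Allocate one input wire to one output wire."""
        if out_dir not in DIRECTIONS or in_dir not in DIRECTIONS:
            raise ValueError("unknown direction")
        if not (0 <= out_idx < PORT_WIDTH and 0 <= in_idx < PORT_WIDTH):
            raise ValueError("wire index out of range")
        self.alloc[out_dir][out_idx] = (in_dir, in_idx)

    def route(self, inputs: Dict[str, List[int]]) -> Dict[str, List[int]]:
        """Drive every allocated output wire from its configured input wire."""
        outputs = {d: [0] * PORT_WIDTH for d in DIRECTIONS}
        for out_dir, wires in self.alloc.items():
            for out_idx, src in enumerate(wires):
                if src is not None:
                    in_dir, in_idx = src
                    outputs[out_dir][out_idx] = inputs[in_dir][in_idx]
        return outputs


if __name__ == "__main__":
    print("direction select bits:", select_bits(len(DIRECTIONS)))
    print("wire index select bits:", select_bits(PORT_WIDTH))
    router = Router()
    router.connect("east", 2, "local", 0)  # local wire 0 drives east wire 2
    stimulus = {d: [0] * PORT_WIDTH for d in DIRECTIONS}
    stimulus["local"][0] = 1
    print("east output:", router.route(stimulus)["east"])
```

In this model the direction selector needs 3 bits for five directions and the index selector needs 3 bits for eight wires, which matches the two 3-bit ports quoted in the text for 8-bit links.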





# **3 Results**

# *3.1 Simulation Results*

We have set up a  $2 \times 2$  NoC architecture from the proposed blocks, which are modeled in VHDL and simulated using the Xilinx ISE simulator ISim. Figure 5 shows the network interface results, namely the data sent from the transmitter of the network interface and the data received by the receiver of the network interface for the soft IP core, together with the top-level results of the  $2 \times 2$  NoC architecture with all four routers.

# *3.2 Synthesis Results*

For our experimental test setup, the synthesis reports of the  $2 \times 2$  modified SDM-based NoC for the MPSoC platform are generated using the Xilinx synthesis tool XST. The synthesis report lists the resources available on the Spartan6 FPGA used in our test setup and the percentage utilization of those resources. The report indicates that area is optimized at both the network interface and the router, and it can be compared with other network-on-chip architectures for area optimization.



**Fig. 5** Simulation results

**Table 1** Synthesis report (Spartan6 FPGA utilization summary)

| Logic utilization                 | Used | Available | Utilization (%) |
|-----------------------------------|------|-----------|-----------------|
| Number of slice registers         | 6062 | 54576     | 11              |
| Number of slice LUTs              | 5257 | 27288     | 19              |
| Number of fully used LUT-FF pairs | 2457 | 8862      | 27              |
| Number of bonded IOBs             | 288  | 296       | 97              |
| Number of BUFG/BUFGCTRLs          | 2    | 16        | 12              |
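As a quick cross-check of Table 1, the utilization column can be recomputed from the used and available counts. The sketch below copies the figures from the table; the assumption that percentages are truncated to whole numbers (rather than rounded) is ours, inferred from the reported values.

```python
# (resource, used, available, reported %) copied from Table 1
TABLE_1 = [
    ("Slice registers", 6062, 54576, 11),
    ("Slice LUTs", 5257, 27288, 19),
    ("Fully used LUT-FF pairs", 2457, 8862, 27),
    ("Bonded IOBs", 288, 296, 97),
    ("BUFG/BUFGCTRLs", 2, 16, 12),
]


def utilization_percent(used: int, available: int) -> int:
    """Utilization as a whole-number percentage, truncated (assumed convention)."""
    return used * 100 // available


if __name__ == "__main__":
    for name, used, available, reported in TABLE_1:
        computed = utilization_percent(used, available)
        status = "matches" if computed == reported else "differs"
        print(f"{name}: {computed}% ({status} reported {reported}%)")
```

With truncation, every recomputed value agrees with the reported column. The figures also show that bonded IOBs (288 of 296) are by far the most heavily used resource in this report.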





# *3.3 Implementation Results*

We have integrated the proposed modified SDM-based NoC for the MPSoC platform with four Xilinx MicroBlaze soft RISC IP cores and 9 fast simplex links (FSL) using the Xilinx Embedded Development Kit (EDK) of the ISE Design Suite 13.3. Table 1 shows the Spartan6 FPGA utilization summary after implementation. Table 2 shows the device utilization summary of the proposed modified SDM-based NoC.

# **4 Performance Analysis**

To analyze the performance of the proposed NoC architecture on the Spartan6 FPGA [\[5](#page-8-0)–[9](#page-8-0)], we have selected several case studies: the Advanced Encryption Standard (AES) algorithm, JPEG compression, JPEG2000 compression and the H263 video compression standard. We have evaluated the application programs on the proposed NoC architecture and on other popular TDM-based NoCs, SDM-based NoCs, shared bus architectures and Advanced eXtensible Interface (AXI) architectures, and compared them. Our NoC architecture shows better results than the other popular architectures in terms of area, power, execution time and reconfiguration time, as shown in Tables [3](#page-7-0), [4](#page-7-0) and [5.](#page-7-0)

<span id="page-7-0"></span>**Table 3** Device utilization in slices

| Application | Proposed NoC architecture | SDM-based NoC | TDM-based NoC | PLB shared bus | AXI architecture |
|-------------|---------------------------|---------------|---------------|----------------|------------------|
| AES         | 6000                      | 6500          | 7250          | 6250           | 6450             |
| JPEG        | 6300                      | 7000          | 7780          | 6700           | 6900             |
| JPEG2000    | 7000                      | 7500          | 8300          | 7400           | 7600             |
| H263        | 7900                      | 8200          | 9000          | 8200           | 8500             |

**Table 4** Power consumption

| Application | Proposed NoC architecture (mW) | SDM-based NoC (mW) | TDM-based NoC (mW) | PLB shared bus (mW) | AXI architecture (mW) |
|-------------|--------------------------------|--------------------|--------------------|---------------------|-----------------------|
| AES         | 200                            | 250                | 300                | 400                 | 390                   |
| JPEG        | 236                            | 290                | 320                | 410                 | 400                   |
| JPEG2000    | 300                            | 360                | 390                | 490                 | 480                   |
| H263        | 435                            | 450                | 490                | 560                 | 550                   |
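To make the comparison in Tables 3 and 4 easier to read, the relative saving of the proposed architecture over each baseline can be computed directly from the tabulated values. The sketch below copies the numbers from the two tables; the dictionary layout, the shortened column labels and the `saving_percent` helper are illustrative assumptions.

```python
from typing import Dict

BASELINES = ["SDM", "TDM", "PLB", "AXI"]  # shortened labels for the table columns

# Device utilization in slices, copied from Table 3
SLICES: Dict[str, Dict[str, int]] = {
    "AES": {"Proposed": 6000, "SDM": 6500, "TDM": 7250, "PLB": 6250, "AXI": 6450},
    "JPEG": {"Proposed": 6300, "SDM": 7000, "TDM": 7780, "PLB": 6700, "AXI": 6900},
    "JPEG2000": {"Proposed": 7000, "SDM": 7500, "TDM": 8300, "PLB": 7400, "AXI": 7600},
    "H263": {"Proposed": 7900, "SDM": 8200, "TDM": 9000, "PLB": 8200, "AXI": 8500},
}

# Power consumption in mW, copied from Table 4
POWER_MW: Dict[str, Dict[str, int]] = {
    "AES": {"Proposed": 200, "SDM": 250, "TDM": 300, "PLB": 400, "AXI": 390},
    "JPEG": {"Proposed": 236, "SDM": 290, "TDM": 320, "PLB": 410, "AXI": 400},
    "JPEG2000": {"Proposed": 300, "SDM": 360, "TDM": 390, "PLB": 490, "AXI": 480},
    "H263": {"Proposed": 435, "SDM": 450, "TDM": 490, "PLB": 560, "AXI": 550},
}


def saving_percent(proposed: float, baseline: float) -> float:
    """Relative reduction of the proposed value with respect to a baseline."""
    return 100.0 * (baseline - proposed) / baseline


def report(title: str, table: Dict[str, Dict[str, int]]) -> None:
    """Print the saving of the proposed NoC over each baseline, per application."""
    print(title)
    for app, row in table.items():
        cells = ", ".join(
            f"{b}: {saving_percent(row['Proposed'], row[b]):.1f}%" for b in BASELINES
        )
        print(f"  {app}: {cells}")


if __name__ == "__main__":
    report("Slice saving of the proposed NoC", SLICES)
    report("Power saving of the proposed NoC", POWER_MW)
```

For example, on AES the tabulated values give a slice saving of about 7.7% over the SDM-based NoC (6000 against 6500 slices) and a power saving of 50% over the PLB shared bus (200 mW against 400 mW).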

**Table 5** Execution time of application on test architectures



# **5 Conclusion**

In this paper, we have analyzed the performance of several computationally complex applications and proposed a novel design and a flexible network interface architecture for existing SDM-based NoCs, to improve performance and provide guaranteed service for multimedia applications. This architecture saves considerable area, requiring only 5% of that of existing architectures. In future work, multiple applications can be evaluated concurrently on the modified SDM-based NoC to explore its scalability and performance.

## <span id="page-8-0"></span>**References**

- 1. International Technology Roadmap for Semiconductors: Semiconductor Industry Association, Dec 2015.
- 2. K. Goossens, J. Dielissen, and A. Radulescu, "Æthereal network on chip: concepts, architectures, and implementations," IEEE Design & Test of Computers, vol. 22, pp. 414–421, 2005.
- 3. E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, "QNoC: QoS architecture and design process for network on chip," Journal of Systems Architecture, vol. 50, pp. 105–128, 2004.
- 4. C. Hilton and B. Nelson, "PNoC: A flexible circuit-switched NoC for FPGA-based systems," IEE Proceedings: Computers and Digital Techniques, vol. 153, pp. 181–188, 2006.
- 5. A. Leroy, D. Milojevic, D. Verkest, F. Robert, and F. Catthoor, "Concepts and implementation of spatial division multiplexing for guaranteed throughput in networks-on-chip," IEEE Transactions on Computers, vol. 57, pp. 1182–1195, 2008.
- 6. J. Rose and S. Brown, "Flexibility of interconnection structures for field programmable gate arrays," IEEE Journal of Solid-State Circuits, vol. 26, pp. 277–282, 1991.
- 7. A. Kumar, S. Fernando, Y. Ha, B. Mesman, and H. Corporaal, "Multiprocessor system-level synthesis for multiple applications on platform FPGA," in Proceedings–2007 International Conference on Field Programmable Logic and Applications, FPL, 2007, pp. 92–97.
- 8. A. Javey, J. Guo, M. Paulsson, Q. Wang, D. Mann, M. Lundstrom, and H. Dai, "High-field quasiballistic transport in short carbon nanotubes," Physical Review Letters, vol. 92, no. 10, 2004.
- 9. V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," in ISCA'00: Proceedings of the 27th Annual International Symposium on Computer Architecture, pp. 248–259, ACM.
- 10. R. H. Havemann, and J. A. Hutchby, "High Performance Interconnects: An Integration Overview", Proceedings of the IEEE, vol. 89, No. 5, May 2001.
- 11. D. Bertozzi, A. Jalabert, S. Murali, R. Tamhankar, S. Stergiou, L. Benini, and G. De Micheli, "NoC synthesis flow for customized domain specific multiprocessor systems-on-chip," IEEE Transactions on Parallel and Distributed Systems, vol. 16, pp. 113–129, 2005.
- 12. T. Bjerregaard and J. Sparso, "A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip," in Proceedings -Design, Automation and Test in Europe, DATE'05, 2005, pp. 1226–1231.
- 13. D. Castells-Rufas, J. Joven, and J. Carrabina, "A validation and performance evaluation tool for ProtoNoC," in 2006 International Symposium on System-on-Chip, SOC, 2006.
- 14. A. Lines, "Asynchronous interconnect for synchronous SoC design," IEEE Micro, vol. 24, pp. 32–41, 2004.
- 15. M. Millberg, E. Nilsson, R. Thid, and A. Jantsch, "Guaranteed bandwidth using looped containers in temporally disjoint networks within the Nostrum network on chip," in Proceedings–Design, Automation and Test in Europe Conference and Exhibition, 2004, pp. 890–895.