Abstract
In the current many-core architectures, network-on-chips (NoCs) have been efficiently utilized as communication backbones for enabling massive parallelism and high degree of integration on a chip. In spite of the advantages of conventional NoCs, wired multi-hop links impose limitations on their performance by long delay and much power consumption especially in large systems. To overcome these limitations, different solutions such as using wireless interconnections have been proposed. Utilizing long-range, high bandwidth and low power wireless links can lead to solve the problems corresponding to wired links. Meanwhile, the grid-like mesh is the most stable topology in conventional NoC designs. That is why most of the wireless network-on-chip (WNoC) architectures have been designed based on this topology. The goals of this article are to challenge mesh topology and to demonstrate the efficiency of honeycomb-based WNoC architectures. In this article, we propose HoneyWiN, hybrid wired/wireless NoC architecture with honeycomb topology. Also, a partition-based XYZ-planar routing algorithm for energy conservation is proposed. In order to demonstrate the advantages of the proposed architecture, first, an analytical comparison of HoneyWiN with a mesh-based WNoC, as the baseline architecture, is carried out. In order to compare the proposed architecture, we implement our partition-based routing algorithm in the form of 2-axes coordinate system in the baseline architecture. Simulation results show that HoneyWiN reduces about 17% of energy consumption while increases the throughput by 10% compared to the mesh-based WNoC. Then, HoneyWiN is compared with four state-of-the-art mesh-based NoC architectures. In all of the evaluations, HoneyWiN provides higher performance in term of delay, throughput and energy consumption. Overall, the results indicate that HoneyWiN is very effective in improving throughput, increasing speed and reducing energy consumption.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
During the last decade, the communication infrastructure of many-core system-on-chips (MCSoCs) has gradually changed from traditional point-to-point and bus interconnections to network-on-chip (NoC). Nowadays, NoC-based systems that are capable of accommodating hundreds of processing elements are commercially available [7, 46], but, in large systems, the multi-hop nature of metal-based interconnections has become a bottleneck for improving performance and energy consumption [31, 45]. This has motivated researchers to seek alternative architectures such as wireless NoC (WNoC) [1, 18, 19, 21, 24, 38, 44, 52]. The key idea of WNoC is to adopt express communication infrastructure in the form of hybrid wired/wireless routers and links in order to reduce transmission delay with reasonable energy consumption while providing high bandwidth and throughput.
Topology that defines how nodes are placed and connected greatly, affects the performance, energy consumption, and circuit area of NoC architectures. The mesh topology has been mostly used in designing existing on-chip networks because its simplicity and regularity, and, the fact that processor tiles are traditionally square or rectangular, [20, 39, 43, 51]. However, efficiently mapping applications can be a challenge when it needs to communicate between processors that are not neighboring in a mesh-based structure [54]. To address this problem, a wired NoC with honeycomb topology has been presented in [58]. It has been shown that this topology provides higher performance compared with a wired NoC based on mesh topology. The comparison results of two wired NoC architectures, one with mesh and the other with honeycomb topology have shown that a wired NoC with honeycomb topology outperforms a wired mesh-based NoC by at least 25.9%, 54.2% and 30.0% reduction in power consumption, area and delay, respectively [57]. The effectiveness of honeycomb topology in mapping algorithms on NoC architectures has been demonstrated in [54]. Besides that, in [55], a configurable honeycomb-based NoC has been designed. These wired NoCs show that honeycomb topology provides high performance and consumes less energy. Moreover, the structure of honeycomb-based NoCs is regular and easy to extend.
This matter motivated us to explore and analysis performance characteristics of a NoC architecture employing the both honeycomb topology and wireless interconnections techniques. To explore this viewpoint, we have proposed HoneyWiN, a novel WNoC architecture with honeycomb topology with its partition-based routing algorithm. In this article, HoneyWiN is analyzed and assessed from different points of view. As it was expected, HoneyWiN provides higher performance in terms of throughput, power consumption, delay, and even area compared to the mesh-based WNoC. Besides the NoC topology, the routing algorithm is an important and definitive factor that affects the performance; especially it can improve energy-efficiency and throughput. The routing algorithm of HoneyWiN is partition-based and XYZ-planar means three directions in 2D topology. In order to compare the proposed architecture, we implement our partition-based routing algorithm in the form of 2-axes coordinate system in the baseline architecture. Furthermore, in order to demonstrate the effectiveness of HoneyWiN, this architecture has been compared with four state-of-the-art mesh-based NoC architectures [37, 38, 44, 52].
The rest of this article is organized as follows. The background, the related studies, and the motivation are presented in Sect. 2. HoneyWiN architecture and its energy-efficient partition-based routing algorithm are expounded by Sect. 3. In Sect. 4, the simulation methodology and experimental setups are described. Then, Sect. 5 presents the simulation results and performance evaluations of HoneyWiN compared with five mesh-based NoC architectures. Finally, the article is concluded by Sect. 6.
2 Background and related work
In wired NoC architectures, metal links provide cheap and easy communication media, but they consume much energy, especially as the number of cores, routers and links increases. It has been predicted that on-chip interconnections consume up to 80% of chip energy [34]. In WNoC architectures, wireless links are adapted into NoCs along with wired links, to increase performance and reduce energy consumption. In [18], various challenges and emerging solutions regarding the design of an efficient and reliable WNoC architecture has been presented. On-chip wireless interconnections were used for the first time to distribute global clock signals [23]. In [16], an ultra-wideband WNoC that uses hybrid wired/wireless signals to realize a synchronous and distributed medium access control protocol has been implemented. A hybrid wireless NoC for heterogeneous CPU-GPU architectures has been presented in [24]. Moreover, a congestion-aware platform, named CAP-W [44], has been designed for WNoC in order to reduce congestion in the network and especially over wireless routers. In [52], the authors quantified the latency, energy dissipation, and thermal profiles of millimeter-wave small-world WNoC (mSWNoC) architectures by incorporating irregular network routing strategies. DWiNoC [38] has been presented as an interference-aware WR placement and routing algorithm for WNoC architecture. DWiNoC architecture enables point-to-point links between transceivers and hence multiple wireless links can operate at the same time.
Several architectures have been introduced to obtain an optimal trade-off between area overhead and expected communication bandwidth [1, 13, 17, 48, 59, 61]. In addition, in recent years, the idea of wireless 3D NoCs has been proposed in order to design many-core chips with greater performance and lower energy consumption. This technology is the combination of different dies that are stacked on each other. The authors of [37] proposed a deadlock-free routing algorithm for wireless 3D NoCs.
In order to transmit data across the chip in WNoC architectures, various approaches have been introduced. In Table 1, different wireless on-chip transceivers are compared.
The metal zigzag antennas [36] utilize millimeter wave (mm-Wave) as part of the electro-magnetic (EM) spectrum to operate in tens of gigahertz frequency. By employing mm-Wave approach in 40 nm CMOS technology, the data rate of 11 Gbps at 56 GHz frequency with bit error rate (BER) of \(10^{-11}\) has been reported [32]. By designing an on-off keying (OOK) transmitter in 65 nm CMOS, the data rate of 16 Gbps at 60 GHz frequency has been achieved [60]. In RF-I approach, EM waves travel via transmission line to exchange data between long-distance on-chip cores. One of the first implementation of RF-I has been proposed in 90 nm CMOS technology with the data rate of 5 Gbps [14]. Although the signals in RF-I can propagate in light speed, RF-I suffers from crosstalk and scalability issues [31].
On the other hand, carbon nano-tube (CNT) technology operates in terahertz/optical frequency range while reduces the size of antennas. In [6, 26], a fundamental property analysis of the CNT antennas including input impedance, current distribution, and radiation pattern has been provided. Moreover, graphene antennas also operate in terahertz frequency and provide low energy dissipation and less area overhead [47]. But these miniaturized antennas suffer from different challenges during implementation. For example, in nano-scale communication of the terahertz band, molecular absorption causes path loss and high noise [5]. To address this issue, a recent research has been proposed [50], by the way, more research efforts are required to fully design and measure physical properties of the graphene antennas.
The existing mm-Wave WNoCs suffers from not only free-space spreading loss, but also, molecular absorption attenuation. These problems are increased at a high frequency band, because the reliability of the system is reduced [3]. Surface wave interconnection (SWI) is another approach in which a 2D waveguide medium is used as the wireless communication layer to propagate surface wave signals. To physically implement this medium, a dielectric coated metal layer is used. Comparing with free-space signal propagation environment, energy dissipation can be substantially reduced in SWI because of the signal propagation in 2D communication fabric. However, the SWI-based architectures offer BER of less than \(10^{-14}\) which is similar to BER of wired communication [30]. In [3], the authors proposed a surface wave communication fabric that is able to match the reliability of traditional wired NoCs. They employ a carefully designed transducer and commercially available thin metal conductor coated with a low-cost dielectric material to generate surface wave signals with improved transmission gain.
The above efforts demonstrate how promising WNoC designs are to be employed as the backbone of future MCSoCs. However, since wireless routers (WRs) are more energy hungry than conventional metal-based wired routers (CRs), new proposals are required to address the trade-off between energy consumption and performance.
From the architectural point of view, there is a common wisdom that conventional grid-like mesh-based systems have better performance and reasonable energy consumption in comparison with other 2D topologies. That is why almost all the emerging WNoC architectures also have been focused on mesh topology [20, 39, 43, 51]. But some researchers have designed wired NoCs with honeycomb topology and reported better performance results of honeycomb-based NoCs compared with mesh-based NoCs [2, 54, 55, 57, 58]. Motivated by the fact that applications often have largely localized communication patterns, the authors of [54] proposed an eight-neighbor mesh topology and also a six-neighbor topology with honeycomb-shaped processor tiles, both increase local connectivity while keep much of the simplicity and regularity of traditional mesh-based architectures.
Table 2 compares a 2D honeycomb-based topology with the common 2D topologies. It states the honeycomb-based topology provides fine cost regards to degree. Generally, a topology is evaluated in terms of four parameters [55] as follows:
-
Degree means the number of I/O channels per node. This parameter affects the amount of hardware resource utilization and area.
-
Network diameter indicates the maximum value of all shortest paths between any two nodes of the network. Generally, degree and diameter are opposed to each other; a small degree means a large network diameter.
-
Network cost is calculated by the product of the degree and the network diameter.
-
Bisection width is the minimum number of wires that must be cut when the network is divided into two equal sets of nodes.
These preliminary results not only challenge the efficiency of the mesh-based WNoC, but also motivate us to seek an alternative topology in emerging hybrid on-chip architectures. In [2], HoneyWiN architecture as the first WNoC architecture with honeycomb topology has been proposed. Continuing that design, in this article, we present a specific partition-based routing algorithm that uses a planar 3-axes coordinate system for HoneyWiN. One goal of this article is to study problems and challenges of the mesh topology in WNoC architectures and demonstrate how HoneyWiN mitigate these challenges. Investigating the role of reconfigurable partitions (i.e., homogeneous/heterogeneous and complete/partial partitioning) in HoneyWiN is another goal of this article. The experimental results are expanded for deeply comparison of HoneyWiN architecture with a mesh-based WNoC and a mesh-based NoC, and also with four state-of-the-art mesh-based NoC architectures.
3 HoneyWiN architecture
HoneyWiN consists of a wired network with 5-port CRs. Each CR connects to its corresponding core and at most three adjacent CRs via wired links. Meanwhile, another port is forecasted for possible connection to a WR. On top of the wired network, at the second level, a wireless network is adopted by WRs (see Fig. 1). Each WR is a multi-port router equipped with a transceiver that is capable of both wired and wireless communications. With high regularity, symmetry and scalability, the whole network has the potential of improving the communication efficiency and reducing energy consumption.
The transmission interfaces of WRs operate as bridges between the wireless and the wired communication layers. Each wireless transmission interface is responsible for transmitting and receiving wireless signals. This component works closely with the router logic, a virtual channel allocator, an arbiter and a crossbar switch for efficient wireless signal transmission. CRs, the routers without the wireless transmission interfaces, must forward packets to the nearest wireless nodes in a multi-hop manner before they can finally exploit the single-hop wireless links to remote destinations. Moreover, if the destination node is not a wireless node, the packet is transmitted to the nearest wireless node and then is sent to the corresponding CR [41].
The structure of CRs and WRs of HoneyWiN are different from the routers of the mesh-based WNoC. Each CR of the mesh-based WNoC has four global ports for wired connecting to other CRs. It also includes one local port for connecting to its nearest IP and another port for connecting to a WR, if it is necessary. Basically, a CR of the mesh-based WNoC utilizes more amounts of hardware resources and hence, consumes more dynamic and static energy than a CR of HoneyWiN architecture.
3.1 Partitioning
Partitioning methods of a NoC before applying a routing algorithm, divide the network into several logical partitions and assign destinations to different sets. Smart partitioning methods must balance the sets and reduce the path length within each partition. The main effect of partitioning methods is reducing the propagation delay of network components [22]. Partitioning methods especially in hybrid wired/wireless NoCs have been investigated in several works [4, 15, 22]. One of the benefits of partitioning of WNoC architectures is that intra-subnet communications are handled through wired paths, while for inter-subnets communications a function of hop counts and congestion is used in order to select the efficient path [42].
It is important to select an efficient routing algorithm as it must be able to leverage the full potential of the topology. After partitioning the network, bidirectional turn restrictions within partitions are determined. As partitions are independent, it is possible to place turn restrictions within a partition, independently from other partitions. Different partitions may lead to different trade-offs in terms of performance, energy consumption, and even area overhead.
In HoneyWiN, partitioning can be viewed and examined from different viewpoints. One way to see partitioning is based on the number of cores within each subnet that can be homogeneous or heterogeneous. In homogeneous structure, all the subnets have equal number of cores, but in heterogeneous partitioning, each subnet can have different numbers of cores. Hence, homogeneous partitioning is suitable for the networks with almost uniformly packet distributions, and heterogeneous partitioning is proper for the networks with high communication demand for some specific cores (i.e., non-uniform traffic patterns). Another way to see partitioning is based on the participant cores in the process of subdividing that can be complete (i.e., all the cores participate in partitioning) or partial (i.e., some of the cores are involved in the process). Complete partitioning can be utilized in the networks with high traffic rates, whereas partial partitioning is beneficial for medium and low traffic rates.
For example, Fig. 1a illustrates a 24-core partial homogeneous HoneyWiN with three WRs. Figure 1b, c show two complete homogeneous partitioning by dividing the network into four and six partitions, respectively. As another instance, Fig. 2a depicts a complete homogeneous partitioning of HoneyWiN architectures with 54 cores. In this example, all the subnets are equipped with similar WRs. Figure 2b illustrates a partial heterogeneous version of the same architecture. In this case, the middle subnet with more number of cores requires a stronger WR with more ports.
For increasing network efficiency, HoneyWiN is partitioned in a way that any core within a partition has the minimum hop-count toward the WR of the same partition. For borderline cases that one core may have the same hop-count from two or more WRs, the core will be randomly assigned to one of the possible partitions [44].
As the number of cores increasing, the number of WRs would be increased, and various partitioning structures can be organized. Considering only the delay reduction for evaluation, it is obvious that the more number of WRs, the less the delay. On the other hand, the results show that using more than 10 WRs will significantly increase power consumption of WNoCs [43]. By the way, there are many different configurations for a specific number of cores with various numbers of WRs. For example, HoneyWiN architecture with 256 cores can be developed with different configurations of homogeneous or heterogeneous, partial or complete partitioning, based on the numbers of WRs. Figure 3 illustrates an instance configuration of HoneyWiN with 256 cores and nine WRs; the WRs are placed at the second level of the NoC and form heterogeneous partitions. The optimal decision about the number of cores, number of WRs, partitioning strategy and WR placement algorithms are directly related to the application, and consequently, task mapping and link allocation policies. Task mapping procedure is out of the scope of this article.
3.2 Routing
The proposed routing algorithm of HoneyWiN is based on XYZ-planar algorithm, according to the planar 3-axes coordinate system introduced in [49]. The X, Y and Z axes start from the center of the network and divide the topology into three regions. Packet traversal may happen via the wired network or the combination of wired and wireless networks. In each step, if the corresponding CRs of both source and destination cores are connected to two different WRs, express wireless communication links are used. In this case, long multi-hop wired paths will be avoided. Otherwise, when both source and destination CRs are connected to the same WR, a wired path is established.
In order to prevent deadlock in wired network, one out of six possible turns will be disabled in each clockwise and non-clockwise dependent cycle. Also, to alleviate dead-lock when packets are routed via both wired and wireless networks, in each input port of the routers two sets of virtual channels (VCs) are used [43]. One is for packet transmission from CR to WR, while the other one is for packet transmission from WR to CR or from CR to CR. The routing algorithm of HoneyWiN architecture is shown in Fig. 4.
An effective adaptive routing algorithm is able to minimize path congestion through load balancing. To predict temporal network congestion, routing algorithms based on ant colony optimization (ACO) have the power of identifying the near-future non-congested path to a desired target according to historical network information [12]. Therefore, the ACO strategy has been employed in partition-based XYZ-planar routing algorithm of HoneyWiN.
Figure 5 shows different routing examples of HoneyWiN architecture with 24 cores. As can be seen in Fig. 5a, the intra-partition communications will be done via wired links. On the other hand, for inter-partition communications, when the destination router is connected to a WR, the routing path will use both wired and wireless networks as shown in Fig. 5b, c. Otherwise, as depicted in Fig. 5d, only the wired network will be utilized. In other words, in order to prevent over utilization of WRs, only the packets in which their destination routers are connected to a WR are allowed to use the wireless network. Figure 6 illustrates the flowchart of HoneyWiN routing algorithm.
4 Methodology and experimental setups
For obtaining results of simulation and evaluations of the proposed architecture, Noxim [10, 11], a SystemC-based cycle-accurate NoC simulator is used. Besides that, for tuning the NoC simulator, the energy analysis has been exploited by Orion 2.0 [29]. The simulation setup is shown in Table 3.
In addition to synthetic traffic patterns provided by Noxim, for more analysis of HoneyWiN real application are also considered. To do so, M5 simulator [8] is employed to acquire memory access traces from full system running of the applications of SPLASH-2 and PARSEC v2.1 benchmark suite. These traces are used to drive Noxim. For obtaining the traces from real applications, 64 two-wide superscalar out-of-order cores with private 32 KB L1 instruction and data cache, and a shared 16 MB L2 cache are employed. For comparing HoneyWiN with four state-of-the-art mesh-based NoC architectures [37, 38, 44, 52], the simulation parameters in our simulator, have been modified according to the scenario utilized in each of the intended references. Therefore, it is possible to compare our obtained results with their reported results.
5 Results
In this section, the experimental results for evaluation of HoneyWiN are presented. The performance analysis of HoneyWiN is compared with the traditional mesh-based WNoC, as the baseline. For evaluation, we implement our partition-based routing algorithm in the form of 2-axes coordinate system in the baseline architecture. Additionally, HoneyWiN is compared with four NoC architectures that recently presented. The architectures are evaluated from different performance metrics with different injection rates on various packet distributions in the form of synthetic and real traffic patterns.
5.1 Evaluation of 24-core HoneyWiN
In this section, HoneyWiN architecture with 24 cores is evaluated. In Fig. 7, the comparison results of network throughput for HoneyWiN using 24 cores and complete homogeneous partitioning with the mesh-based WNoC architecture is illustrated. This analysis is performed with four, six, and eight WRs, in 0.1 injection rate. As can be seen, HoneyWiN has higher or equal network throughput in most of the traffic patterns in comparison with the mesh-based WNoC. With Uniform traffic pattern, the least throughput is obtained and is again the same for both HoneyWiN and the mesh-based WNoC. In Bit-Reversal traffic pattern, throughput is higher than the other distributions and is the same for both architectures. It is noticeable that Bit-Reversal usually is among the worst cases from the viewpoint of its realization in common wired networks. Off course, the obtained results of analyzing wireless and 3D NoCs architectures (for instance the researches of [15, 20, 35, 56]) show that this traffic pattern is not the worst case, even it can be one of the best ones. The reason is in this traffic pattern; there are irregular and various paths for packet routing in WNoCs and 3D NoCs architectures.
Figure 8 depicts comparison of total energy consumption of HoneyWiN and the mesh-based WNoC with 24 cores. HoneyWiN has less energy consumption than the mesh-based WNoC for all the synthetic traffic patterns with four, six or eight WRs. Furthermore, according to Figs. 7 and 8, it seems that less number of WRs is more efficient in terms of both performance and energy consumption.
As another experiment, Fig. 9 shows the average delay ratio of HoneyWiN over the mesh-based WNoC for different injection rates. The average has been obtained from the results of five executions. As the figure illustrated, the delay ratio is less than one for most of the traffic patterns. In Uniform traffic pattern, notwithstanding the injection rate and the number of wireless routers, the delay is high. With the other selected traffic patterns, the delay ratio (HoneyWiN over the mesh-based WNoC) is equal or less than one. As a result, implementing a WNoC in honeycomb form provides faster architecture than the mesh-based WNoC; but the speed improvement is not considerable. Also, generally HoneyWiN performs better than the mesh-based WNoC in high injection rates. This means that HoneyWiN operates more efficient in systems with frequent communications.
In Figs. 10 and 11, a 24-core HoneyWiN is evaluated for different number of WRs considering the synthetic traffic patterns and the obtained traces of real application from PARSEC benchmark suite. As the figure shows, the 24-core HoneyWiN with four WRs provides better performance than in the cases of using six and eight WRs. In addition, with four WRs, the 24-core HoneyWiN has less logic utilization, smaller area and less power consumption.
Figure 12 compares a HoneyWiN with the mesh-based WNoC with 24 cores, three and four WRs in 0.05 and 0.25 injection rates of Transpose 2 traffic pattern. As the results show, HoneyWiN provides better performance in terms of power consumption, received flits, throughput and delay than the mesh-based WNoC. Figure 12a illustrates HoneyWiN with four WRs consumes less power than the other configurations of this analysis. Figure 12b states the flit ratio over the ideal case. As the figure shows, HoneyWiN with three WRs provides better received flit ratio because of the uniformly WRs placement in the NoC. Thus, Fig. 12c, d show that HoneyWiN provides better throughput in terms of the overall flit per cycle (flit/cycle) and flit per cycle for each core (flit/cycle/IP) especially with three WRs.
As it is also previously demonstrated, the propagation delay of HoneyWiN is a little less than the mesh-based WNoC (see Fig. 12e). Figure 12f compares the area ratio of the four NoC configurations: the mesh-based WNoC, which is considered as the baseline and its area, is assumed to be equal to one, HoneyWiN, a wired mesh-based NoC, and a wired honeycomb-based NoC, all consist of 24 cores. The results of area are estimated according to the number of WRs and CRs, number of ports and buffers of different routers.
5.2 Evaluation of 54-core HoneyWiN
In this section, HoneyWiN architecture with 54 cores is evaluated with two configurations: first, a complete homogeneous HoneyWiN with six WRs, and second, a partial heterogeneous HoneyWiN with seven WRs. The obtained results of network throughput and energy consumption comparison are shown in Figs. 13 and 14. As Fig. 13 illustrates, in average, HoneyWiN provides higher network throughput than the mesh-based WNoC. Moreover, for all the traffic patterns, HoneyWiN architecture consumes much less energy than the mesh-based WNoC in both configurations with six or seven WRs (see Fig. 14). Besides, heterogeneous partitioning can provide more flexibility for application-specific architectures. As the figure show, with seven WRs that one of them is in the center of HoneyWiN, the higher throughput than utilizing six WRs can be achieved.
5.3 Evaluation of 64-core HoneyWiN
In this section, a configuration of HoneyWiN with 64 cores is compared with two other WNoC architectures [38, 52], and also with a 3D NoC architecture [37]. According to the available results of the intended architecture, for this evaluation, some applications of PARSEC and SPLASH benchmark suites have been considered.
In Fig. 15, the energy consumption of HoneyWiN and DWiNoC [38] is compared. These architectures have 64 cores and six WRs. As the figure shows, for all the benchmark applications, HoneyWiN considerably consumes less energy than DWiNoC.
Figure 16 shows the results of comparing HoneyWiN with mSWNoC [52] and with the traditional mesh-based WNoC in the terms of energy-delay product (EDP). For this comparison, the MROOTS strategy of mSWNoC is considered. Both architectures consist of 64 cores and 12 WRs. As it is observable, HoneyWiN outperforms both architectures in terms of energy consumption and propagation delay.
In Fig. 17, the delay of HoneyWiN is compared with two 3D WNoCs referred as LASH and FIT [37]. The results are for RANDOM traffic pattern with 0.3 injection rate. The architectures are equipped with four WRS, 8-flit packets and 16-flit buffers according to [37]. It is noticeable that buffers with depth of 16 flits cause increasing the delay cycles. According to our analysis, four flits are enough for each port buffers. By the way, in this situation, as the figure shows, HoneyWiN provides faster architecture than the LASH and FIT in the different configurations.
5.4 Evaluation of 256-core HoneyWiN
In this section, first, a configuration of HoneyWiN with 256 cores and nine WRs is evaluated and compared with the mesh-based WNoC in terms of network throughput and energy consumption. Then, propagation delay of HoneyWiN is compared with CAP-W architecture [44], both with 256 cores and eight WRs.
As Fig. 18 illustrates, HoneyWiN provides more energy-efficient architecture. HoneyWiN consumes in average 30.6% less energy than the mesh-based WNoC. Moreover, HoneyWiN provides in average, 8% higher network throughput. Off course, as Fig. 19 shows, for two traffic patterns, HoneyWiN provides less network throughput than the mesh-based WNoC. Figure 20 shows the delay of HoneyWiN comparing with CAP-W and a mesh-based NoC [44]. As it can be seen in the figure, HoneyWiN outperforms CAP-W in the terms of propagation delay in average 14%. Moreover, HoneyWiN is about two times faster than the mesh-based NoC.
6 Conclusion
In this article, first it was shown that the mesh topology does not always provide the best performance and efficient energy consumption in WNoC architectures. Then, we analyzed HoneyWiN, the proposed honeycomb-based WNoC, with its specific partition-based routing algorithm. The concepts of reconfigurable homogeneous/heterogeneous and complete/partial partitioning were also discussed. Finally, experimental results depicted HoneyWiN consumes less energy (i.e., on average 17%) and improves the network throughput (i.e., on average 10%) in comparison with the mesh-based WNoC. Moreover, comparing HoneyWiN with four state-of-the-art mesh-based NoC architectures demonstrated that HoneyWiN provides higher performance in term of delay, throughput and energy consumption. Overall, the results indicate that HoneyWiN is very effective in improving throughput, increasing speed and reducing energy consumption.
References
Abadal S, Cabellos-Aparicio A, Alarcon E, Torrellas J (2016) WiSync: an architecture for fast synchronization through on-chip wireless communication. In: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp 3–17
Afsharmazayejani R, Yazdanpanah F, Rezaei A, Alaei M, Daneshtalab M (2018) Honeywin: novel honeycomb-based wireless NoC architecture in many-core era. In: 14th International Symposium on Applied Reconfigurable Computing (ARC), Santorini, Greece
Agyeman MO, Vien QT, Ahmadinia A, Yakovlev A, Tong KF, Mak T (2017) A resilient 2-d waveguide communication fabric for hybrid wired-wireless NoC design. IEEE Trans Parallel Distrib Syst 28(2):359–373
Arun M, Jisha P, Jose J (2016) A novel energy efficient multicasting approach for mesh NoCs. Proc Comput Sci Int Conf Adv Comput Commun 93:283–291
Balasubramaniam S, Kangasharju J (2013) Realizing the internet of nano things: challenges, solutions, and applications. Computer 46(2):62–68
Bengio EA, Senic D, Taylor LW, Tsentalovich DE, Chen P, Holloway CL, Babakhani A, Long CJ, Novotny DR, Booth JC, Orloff ND, Pasquali M (2017) High efficiency carbon nanotube thread antennas. Appl Phys Lett 111(16). https://doi.org/10.1063/1.4991822
Benini L, Micheli GD (2002) Networks on chips: a new SoC paradigm. IEEE Comput 35(1):70–78
Binkert NL, Dreslinski RG, Hsu LR, Lim KT, Saidi AG, Reinhardt SK (2006) The M5 simulator: modeling networked systems. IEEE Micro 26(4):52–60
Byeon CW, Yoon CH, Park CS (2013) A 67-mW 10.7-Gb/s 60-GHz OOK CMOS transceiver for short-range wireless communications. IEEE Trans Microw Theory Tech 61(9):3391–3401
Catania V, Mineo A, Monteleone S, Palesi M, Patti D (2015) Noxim: an open, extensible and cycle-accurate network on chip simulator. In: IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp 162–163
Catania V, Mineo A, Monteleone S, Palesi M, Patti D (2016) Cycle-accurate network on chip simulation with Noxim. ACM Trans Model Comput Simul 27(1):1–25. https://doi.org/10.1145/2953878
Chang E, Wu A (2017) Overview of high-efficiency ant colony optimization (aco)-based adaptive routings for traffic balancing in network-on-chip systems. In: International Conference on ASIC (ASICON), pp 80–83
Chang MCF, Socher E, Tam SW, Cong J, Reinman G (2008) RF interconnects for communications on-chip. In: International Symposium on Physical Design, ISPD ’08, pp 78–83
Chang MF, Cong J, Kaplan A, Naik M, Reinman G, SoCher E, Tam SW (2008) CMP network-on-chip overlaid with multi-band RF-interconnect. In: International Symposium on High Performance Computer Architecture (HPCA), pp 191–202
Dai J, Jiang X, Watanabe T (2017) An adaptive routing algorithm based on network partitioning for 3D network-on-chip. In: International Conference on Computer, Information and Telecommunication Systems (CITS), pp 229–233
Deb S, Chang K, Cosic M, Ganguly A, Pande PP, Heo D, Belzer B (2012) Cmos compatible many-core NoC architectures with multi-channel millimeter-wave wireless links. In: Great Lakes Symposium on VLSI, GLSVLSI ’12, pp 165–170
Deb S, Chang K, Yu X, Sah SP, Cosic M, Ganguly A, Pande PP, Belzer B, Heo D (2013) Design of an energy-efficient cmos-compatible NoC architecture with millimeter-wave wireless interconnects. IEEE Trans Comput 62(12):2382–2396
Deb S, Ganguly A, Pande PP, Belzer B, Heo D (2012) Wireless NoC as interconnection backbone for multicore chips: promises and challenges. IEEE J Emerg Sel Top Circuits Syst 2(2):228–239
Dehghani A, Jamshidi K (2015) A fault-tolerant hierarchical hybrid mesh-based wireless network-on-chip architecture for multicore platforms. J Supercomput 71(8):3116–3148
DiTomaso D, Kodi A, Kaya S, Matolak D (2011) iwise: inter-router wireless scalable express channels for network-on-chips (NoCs) architecture. In: Annual Symposium on High Performance Interconnects (HOTI), pp 11–18
Duraisamy K, Xue Y, Bogdan P, Pande PP (2017) Multicast-aware high-performance wireless network-on-chip architectures. IEEE Trans Very Large Scale Integr VLSI Syst 25(3):1126–1139
Ebrahimi M, Daneshtalab M, Liljeberg P, Plosila J, Flich J, Tenhunen H (2014) Path-based partitioning methods for 3d networks-on-chip with minimal adaptive routing. IEEE Trans Comput 63(3):718–733
Floyd BA, Guo X, Caserta J, Dickson T, Hung CM, Kim K, Kenneth KO (2002) Wireless interconnects for clock distribution. In: 8th ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, TAU ’02, pp 105–108
Gade SH, Deb S (2017) Hywin: hybrid wireless NoC with sandboxed sub-networks for CPU/GPU architectures. IEEE Trans Comput 66(7):1145–1158
Ganguly A, Chang K, Deb S, Pande PP, Belzer B, Teuscher C (2011) Scalable hybrid wireless network-on-chip architectures for multicore systems. IEEE Trans Comput 60(10):1485–1502
Hanson GW (2005) Fundamental transmitting properties of carbon nanotube antennas. IEEE Trans Antennas Propag 53(11):3426–3435
Hu J, Xu J, Huang M, Wu H (2013) A 25-Gbps 8-ps/mm transmission line based interconnect for on-chip communications in multi-core chips. In: IEEE International Microwave Symposium Digest (IMS), pp 1–4
Ito H, Kimura M, Miyashita K, Ishii T, Okada K, Masu K (2008) A bidirectional- and multi-drop-transmission-line interconnect for multipoint-to-multipoint on-chip communications. IEEE J Solid-State Circuits 43(4):1020–1029
Kahng AB, Li B, Peh LS, Samadi K (2012) ORION 2.0: a power-area simulator for interconnection networks. IEEE Trans Very Large Scale Integr VLSI Syst 20(1):191–196
Karkar A, Al-Dujaily R, Yakovlev A., Tong K, Mak T (2012) Surface wave communication system for on-chip and off-chip interconnects. In: International Workshop on Network on Chip Architectures (NoCArc), pp 11–16
Karkar A, Mak T, Tong KF, Yakovlev A (2016) A survey of emerging interconnects for on-chip efficient multicast and broadcast in many-cores. IEEE Circuits Syst Mag 16(1):58–72
Kawasaki K, Akiyama Y, Komori K, Uno M, Takeuchi H, Itagaki T, Hino Y, Kawasaki Y, Ito K, Hajimiri A (2010) A millimeter-wave intra-connect solution. IEEE J Solid-State Circuits 45(12):2655–2666
Liang Y, Yu H, Zhao J, Yang, W, Wang Y (2015) An energy efficient and low cross-talk CMOS sub-THz I/O with surface-wave modulator and interconnect. In: IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp 110–115
Magen N, Kolodny A, Weiser U, Shamir N (2004) Interconnect-power dissipation in a microprocessor. In: International Workshop on System Level Interconnect Prediction, SLIP ’04, pp 7–13
Majumdar A, Dash R, Pangracious V, Turuk AK (2017) An efficient multi-objective thermal aware routing algorithm 3d network-on-chips. In: 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp 1–4
Mineo A, Palesi M, Ascia G, Catania V (2014) An adaptive transmitting power technique for energy efficient mm-wave wireless NoCs. In: Design, automation and test in Europe (DATE). https://doi.org/10.7873/DATE.2014.284
Mohseni Z, Reshadi M (2018) A deadlock-free routing algorithm for irregular 3d network-on-chips with wireless links. J Supercomput 74(2):953–969
Mondal HK, Gade SH, Shamim MS, Deb S, Ganguly A (2017) Interference-aware wireless network-on-chip architecture using directional antennas. IEEE Trans Multi-Scale Comput Syst 3(3):193–205
More A, Taskin B (2012) A unified design methodology for a hybrid wireless 2-D NoC. In: IEEE International Symposium on Circuits and Systems (ISCAS), pp 640–643
Nakajima K, Maruyama A, Kohtani M, Sugiura T, Otobe E, Lee J, Cho S, Kwak K, Lee J, Yoshimasu T, Fujishima M (2014) 23 Gbps 9.4 pJ/bit 80/100 GHz band CMOS transceiver with on-board antenna for short-range communication. In: IEEE Asian Solid-State Circuits Conference (A-SSCC), pp 173–176
Opoku Agyeman M, Zong W, Yakovlev A, Tong KF, Mak T (2017) Extending the performance of hybrid NoCs beyond the limitations of network heterogeneity. J Low Power Electron Appl. https://doi.org/10.3390/jlpea7020008
Rezaei A, Daneshtalab M, Palesi M, Zhao D (2016) Efficient congestion-aware scheme for wireless on-chip networks. In: 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pp 742–749
Rezaei A, Daneshtalab M, Safaei F, Zhao D (2016) Hierarchical approach for hybrid wireless network-on-chip in many-core era. Comput Electr Eng 51(C):225–234. https://doi.org/10.1016/j.compeleceng.2015.10.007
Rezaei A, Daneshtalab M, Zhao D (2017) Cap-w: congestion-aware platform for wireless-based network-on-chip in many-core era. Microprocess Microsyst 52:23–33
Rezaei A, Daneshtalab M, Zhao D, Modarressi M (2016) SAMi: self-aware migration approach for congestion reduction in NoC-based MCSoC. In: IEEE International System-on-Chip Conference (SOCC), pp 145–150
Rezaei A, Zhao D, Daneshtalab M, Zhou H (2017) Multi-objective task mapping approach for wireless NoC in dark silicon age. In: Euro-MICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 589–592
Saxena S, Manur DS, Shamim MS, Ganguly A (2017) A folded wireless network-on-chip using graphene based thz-band antennas. In: International Conference on Nanoscale Computing and Communication (NaNoCom)
Sikder AI, Kodi AK, Louri A (2016) Reconfigurable optical and wireless (R-OWN) network-on-chip for high performance computing. In: 3rd ACM International Conference on Nanoscale Computing and Communication, NANOCOM’16, pp 25:1–25:6
Stojmenovic I (1997) Honeycomb networks: topological properties and communication algorithms. IEEE Trans Parallel Distrib Syst 8, issue 10(10):1036–1042
Vien QT, Agyeman MO, Le TA, Mak T (2017) On the nanocommunications at THz band in graphene-enabled wireless network-on-chip. Math Probl Eng. https://doi.org/10.1155/2017/9768604
Hu WH, Wang C, Bagherzadeh N (2015) Design and analysis of a mesh-based wireless network-on-chip. J Supercomput 71(8):2830–2846
Wettin P, Kim R, Murray J, Yu X, Pande PP, Ganguly A, Heoamlan D (2014) Design space exploration for wireless nocs incorporating irregular network routing. IEEE Trans Comput Aided Des Integr Circuits Syst 33(11):1732–1745
Wu H, Nan L, Tam SW, Hsieh HH, Jou C, Reinman G, Cong J, Chang MCF (2012) A 60 GHz on-chip RF-interconnect with \(\lambda \)/4 coupler for 5 Gbps bi-directional communication and multi-drop arbitration. In: IEEE Custom Integrated Circuits Conference (CICC), pp 1–4
Xiao Z, Baas B (2013) A hexagonal processor and interconnect topology for many-core architecture with dense on-chip networks. In: IFIP WG 10.5/IEEE International Conference on VLSI-SoC: From Algorithms to Circuits and System-on-Chip Design, pp 125–143. Springer, Berlin Heidelberg
Yang P, Wang Q (2015) Heterogeneous honeycomb-like NoC topology and routing based on communication division. J Future Gener Commun Netw 8:19–26
Yao C, Feng C, Zhang M, Guo W, Zhu S, Wei S (2016) Partitioning methods for multicast in bufferless 3d network on chip. In: Conference Computer Engineering and Technology, pp 13–22
Yin A, Chen N, Liljeberg P, Tenhunen H (2012) Comparison of mesh and honeycomb network-on-chip architectures. In: IEEE Conference on Industrial Electronics and Applications (ICIEA), pp 1716–1720
Yin AW, Xu TC, Liljeberg P, Tenhunen H (2009) Explorations of honeycomb topologies for network-on-chip. In: IFIP International Conference on Network and Parallel Computing (NPC), pp 73–79
Yu X, Baylon J, Wettin P, Heo D, Pande PP, Mirabbasi S (2014) Architecture and design of multichannel millimeter-wave wireless NoC. IEEE Des Test 31(6):19–28
Yu X, Sah SP, Rashtian H, Mirabbasi S, Pande PP, Heo D (2014) A 1.2-pJ/bit 16-Gb/s 60-GHz OOK transmitter in 65-nm CMOS for wireless network-on-chip. IEEE Trans Microw Theory Tech 62, issue 10(10):2357–2369
Zhao D, Wang Y, Wu H, Kikkawa T (2015) I(Re)2-WiNoC: exploring scalable wireless on-chip micronetworks for heterogeneous embedded many-core SoCs. Digital Commun Netw 1(1):45–56
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Yazdanpanah, F., AfsharMazayejani, R., Alaei, M. et al. An energy-efficient partition-based XYZ-planar routing algorithm for a wireless network-on-chip. J Supercomput 75, 837–861 (2019). https://doi.org/10.1007/s11227-018-2617-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-018-2617-x