1 Introduction

During the last decade, the communication infrastructure of many-core system-on-chips (MCSoCs) has gradually changed from traditional point-to-point and bus interconnections to network-on-chip (NoC). Nowadays, NoC-based systems that are capable of accommodating hundreds of processing elements are commercially available [7, 46], but, in large systems, the multi-hop nature of metal-based interconnections has become a bottleneck for improving performance and energy consumption [31, 45]. This has motivated researchers to seek alternative architectures such as wireless NoC (WNoC) [1, 18, 19, 21, 24, 38, 44, 52]. The key idea of WNoC is to adopt express communication infrastructure in the form of hybrid wired/wireless routers and links in order to reduce transmission delay with reasonable energy consumption while providing high bandwidth and throughput.

Topology that defines how nodes are placed and connected greatly, affects the performance, energy consumption, and circuit area of NoC architectures. The mesh topology has been mostly used in designing existing on-chip networks because its simplicity and regularity, and, the fact that processor tiles are traditionally square or rectangular, [20, 39, 43, 51]. However, efficiently mapping applications can be a challenge when it needs to communicate between processors that are not neighboring in a mesh-based structure [54]. To address this problem, a wired NoC with honeycomb topology has been presented in [58]. It has been shown that this topology provides higher performance compared with a wired NoC based on mesh topology. The comparison results of two wired NoC architectures, one with mesh and the other with honeycomb topology have shown that a wired NoC with honeycomb topology outperforms a wired mesh-based NoC by at least 25.9%, 54.2% and 30.0% reduction in power consumption, area and delay, respectively [57]. The effectiveness of honeycomb topology in mapping algorithms on NoC architectures has been demonstrated in [54]. Besides that, in [55], a configurable honeycomb-based NoC has been designed. These wired NoCs show that honeycomb topology provides high performance and consumes less energy. Moreover, the structure of honeycomb-based NoCs is regular and easy to extend.

This matter motivated us to explore and analysis performance characteristics of a NoC architecture employing the both honeycomb topology and wireless interconnections techniques. To explore this viewpoint, we have proposed HoneyWiN, a novel WNoC architecture with honeycomb topology with its partition-based routing algorithm. In this article, HoneyWiN is analyzed and assessed from different points of view. As it was expected, HoneyWiN provides higher performance in terms of throughput, power consumption, delay, and even area compared to the mesh-based WNoC. Besides the NoC topology, the routing algorithm is an important and definitive factor that affects the performance; especially it can improve energy-efficiency and throughput. The routing algorithm of HoneyWiN is partition-based and XYZ-planar means three directions in 2D topology. In order to compare the proposed architecture, we implement our partition-based routing algorithm in the form of 2-axes coordinate system in the baseline architecture. Furthermore, in order to demonstrate the effectiveness of HoneyWiN, this architecture has been compared with four state-of-the-art mesh-based NoC architectures [37, 38, 44, 52].

The rest of this article is organized as follows. The background, the related studies, and the motivation are presented in Sect. 2. HoneyWiN architecture and its energy-efficient partition-based routing algorithm are expounded by Sect. 3. In Sect. 4, the simulation methodology and experimental setups are described. Then, Sect. 5 presents the simulation results and performance evaluations of HoneyWiN compared with five mesh-based NoC architectures. Finally, the article is concluded by Sect. 6.

2 Background and related work

In wired NoC architectures, metal links provide cheap and easy communication media, but they consume much energy, especially as the number of cores, routers and links increases. It has been predicted that on-chip interconnections consume up to 80% of chip energy [34]. In WNoC architectures, wireless links are adapted into NoCs along with wired links, to increase performance and reduce energy consumption. In [18], various challenges and emerging solutions regarding the design of an efficient and reliable WNoC architecture has been presented. On-chip wireless interconnections were used for the first time to distribute global clock signals [23]. In [16], an ultra-wideband WNoC that uses hybrid wired/wireless signals to realize a synchronous and distributed medium access control protocol has been implemented. A hybrid wireless NoC for heterogeneous CPU-GPU architectures has been presented in [24]. Moreover, a congestion-aware platform, named CAP-W [44], has been designed for WNoC in order to reduce congestion in the network and especially over wireless routers. In [52], the authors quantified the latency, energy dissipation, and thermal profiles of millimeter-wave small-world WNoC (mSWNoC) architectures by incorporating irregular network routing strategies. DWiNoC [38] has been presented as an interference-aware WR placement and routing algorithm for WNoC architecture. DWiNoC architecture enables point-to-point links between transceivers and hence multiple wireless links can operate at the same time.

Several architectures have been introduced to obtain an optimal trade-off between area overhead and expected communication bandwidth [1, 13, 17, 48, 59, 61]. In addition, in recent years, the idea of wireless 3D NoCs has been proposed in order to design many-core chips with greater performance and lower energy consumption. This technology is the combination of different dies that are stacked on each other. The authors of [37] proposed a deadlock-free routing algorithm for wireless 3D NoCs.

In order to transmit data across the chip in WNoC architectures, various approaches have been introduced. In Table 1, different wireless on-chip transceivers are compared.

Table 1 Wireless on-chip transceivers comparison

The metal zigzag antennas [36] utilize millimeter wave (mm-Wave) as part of the electro-magnetic (EM) spectrum to operate in tens of gigahertz frequency. By employing mm-Wave approach in 40 nm CMOS technology, the data rate of 11 Gbps at 56 GHz frequency with bit error rate (BER) of \(10^{-11}\) has been reported [32]. By designing an on-off keying (OOK) transmitter in 65 nm CMOS, the data rate of 16 Gbps at 60 GHz frequency has been achieved [60]. In RF-I approach, EM waves travel via transmission line to exchange data between long-distance on-chip cores. One of the first implementation of RF-I has been proposed in 90 nm CMOS technology with the data rate of 5 Gbps [14]. Although the signals in RF-I can propagate in light speed, RF-I suffers from crosstalk and scalability issues [31].

On the other hand, carbon nano-tube (CNT) technology operates in terahertz/optical frequency range while reduces the size of antennas. In [6, 26], a fundamental property analysis of the CNT antennas including input impedance, current distribution, and radiation pattern has been provided. Moreover, graphene antennas also operate in terahertz frequency and provide low energy dissipation and less area overhead [47]. But these miniaturized antennas suffer from different challenges during implementation. For example, in nano-scale communication of the terahertz band, molecular absorption causes path loss and high noise [5]. To address this issue, a recent research has been proposed [50], by the way, more research efforts are required to fully design and measure physical properties of the graphene antennas.

The existing mm-Wave WNoCs suffers from not only free-space spreading loss, but also, molecular absorption attenuation. These problems are increased at a high frequency band, because the reliability of the system is reduced [3]. Surface wave interconnection (SWI) is another approach in which a 2D waveguide medium is used as the wireless communication layer to propagate surface wave signals. To physically implement this medium, a dielectric coated metal layer is used. Comparing with free-space signal propagation environment, energy dissipation can be substantially reduced in SWI because of the signal propagation in 2D communication fabric. However, the SWI-based architectures offer BER of less than \(10^{-14}\) which is similar to BER of wired communication [30]. In [3], the authors proposed a surface wave communication fabric that is able to match the reliability of traditional wired NoCs. They employ a carefully designed transducer and commercially available thin metal conductor coated with a low-cost dielectric material to generate surface wave signals with improved transmission gain.

The above efforts demonstrate how promising WNoC designs are to be employed as the backbone of future MCSoCs. However, since wireless routers (WRs) are more energy hungry than conventional metal-based wired routers (CRs), new proposals are required to address the trade-off between energy consumption and performance.

From the architectural point of view, there is a common wisdom that conventional grid-like mesh-based systems have better performance and reasonable energy consumption in comparison with other 2D topologies. That is why almost all the emerging WNoC architectures also have been focused on mesh topology [20, 39, 43, 51]. But some researchers have designed wired NoCs with honeycomb topology and reported better performance results of honeycomb-based NoCs compared with mesh-based NoCs [2, 54, 55, 57, 58]. Motivated by the fact that applications often have largely localized communication patterns, the authors of [54] proposed an eight-neighbor mesh topology and also a six-neighbor topology with honeycomb-shaped processor tiles, both increase local connectivity while keep much of the simplicity and regularity of traditional mesh-based architectures.

Table 2 compares a 2D honeycomb-based topology with the common 2D topologies. It states the honeycomb-based topology provides fine cost regards to degree. Generally, a topology is evaluated in terms of four parameters [55] as follows:

  • Degree means the number of I/O channels per node. This parameter affects the amount of hardware resource utilization and area.

  • Network diameter indicates the maximum value of all shortest paths between any two nodes of the network. Generally, degree and diameter are opposed to each other; a small degree means a large network diameter.

  • Network cost is calculated by the product of the degree and the network diameter.

  • Bisection width is the minimum number of wires that must be cut when the network is divided into two equal sets of nodes.

Table 2 Comparison of N-node 2D topologies [55]

These preliminary results not only challenge the efficiency of the mesh-based WNoC, but also motivate us to seek an alternative topology in emerging hybrid on-chip architectures. In [2], HoneyWiN architecture as the first WNoC architecture with honeycomb topology has been proposed. Continuing that design, in this article, we present a specific partition-based routing algorithm that uses a planar 3-axes coordinate system for HoneyWiN. One goal of this article is to study problems and challenges of the mesh topology in WNoC architectures and demonstrate how HoneyWiN mitigate these challenges. Investigating the role of reconfigurable partitions (i.e., homogeneous/heterogeneous and complete/partial partitioning) in HoneyWiN is another goal of this article. The experimental results are expanded for deeply comparison of HoneyWiN architecture with a mesh-based WNoC and a mesh-based NoC, and also with four state-of-the-art mesh-based NoC architectures.

3 HoneyWiN architecture

HoneyWiN consists of a wired network with 5-port CRs. Each CR connects to its corresponding core and at most three adjacent CRs via wired links. Meanwhile, another port is forecasted for possible connection to a WR. On top of the wired network, at the second level, a wireless network is adopted by WRs (see Fig. 1). Each WR is a multi-port router equipped with a transceiver that is capable of both wired and wireless communications. With high regularity, symmetry and scalability, the whole network has the potential of improving the communication efficiency and reducing energy consumption.

The transmission interfaces of WRs operate as bridges between the wireless and the wired communication layers. Each wireless transmission interface is responsible for transmitting and receiving wireless signals. This component works closely with the router logic, a virtual channel allocator, an arbiter and a crossbar switch for efficient wireless signal transmission. CRs, the routers without the wireless transmission interfaces, must forward packets to the nearest wireless nodes in a multi-hop manner before they can finally exploit the single-hop wireless links to remote destinations. Moreover, if the destination node is not a wireless node, the packet is transmitted to the nearest wireless node and then is sent to the corresponding CR [41].

The structure of CRs and WRs of HoneyWiN are different from the routers of the mesh-based WNoC. Each CR of the mesh-based WNoC has four global ports for wired connecting to other CRs. It also includes one local port for connecting to its nearest IP and another port for connecting to a WR, if it is necessary. Basically, a CR of the mesh-based WNoC utilizes more amounts of hardware resources and hence, consumes more dynamic and static energy than a CR of HoneyWiN architecture.

3.1 Partitioning

Partitioning methods of a NoC before applying a routing algorithm, divide the network into several logical partitions and assign destinations to different sets. Smart partitioning methods must balance the sets and reduce the path length within each partition. The main effect of partitioning methods is reducing the propagation delay of network components [22]. Partitioning methods especially in hybrid wired/wireless NoCs have been investigated in several works [4, 15, 22]. One of the benefits of partitioning of WNoC architectures is that intra-subnet communications are handled through wired paths, while for inter-subnets communications a function of hop counts and congestion is used in order to select the efficient path [42].

It is important to select an efficient routing algorithm as it must be able to leverage the full potential of the topology. After partitioning the network, bidirectional turn restrictions within partitions are determined. As partitions are independent, it is possible to place turn restrictions within a partition, independently from other partitions. Different partitions may lead to different trade-offs in terms of performance, energy consumption, and even area overhead.

In HoneyWiN, partitioning can be viewed and examined from different viewpoints. One way to see partitioning is based on the number of cores within each subnet that can be homogeneous or heterogeneous. In homogeneous structure, all the subnets have equal number of cores, but in heterogeneous partitioning, each subnet can have different numbers of cores. Hence, homogeneous partitioning is suitable for the networks with almost uniformly packet distributions, and heterogeneous partitioning is proper for the networks with high communication demand for some specific cores (i.e., non-uniform traffic patterns). Another way to see partitioning is based on the participant cores in the process of subdividing that can be complete (i.e., all the cores participate in partitioning) or partial (i.e., some of the cores are involved in the process). Complete partitioning can be utilized in the networks with high traffic rates, whereas partial partitioning is beneficial for medium and low traffic rates.

For example, Fig. 1a illustrates a 24-core partial homogeneous HoneyWiN with three WRs. Figure 1b, c show two complete homogeneous partitioning by dividing the network into four and six partitions, respectively. As another instance, Fig. 2a depicts a complete homogeneous partitioning of HoneyWiN architectures with 54 cores. In this example, all the subnets are equipped with similar WRs. Figure 2b illustrates a partial heterogeneous version of the same architecture. In this case, the middle subnet with more number of cores requires a stronger WR with more ports.

Fig. 1
figure 1

24-core HoneyWiN architecture a partial homogeneous partitioning with three WRs, b complete homogeneous partitioning with four WRs, c complete homogeneous partitioning with six WRs

Fig. 2
figure 2

54-core HoneyWiN a complete homogeneous partitioning with six WRs, b partial heterogeneous partitioning with seven WRs

For increasing network efficiency, HoneyWiN is partitioned in a way that any core within a partition has the minimum hop-count toward the WR of the same partition. For borderline cases that one core may have the same hop-count from two or more WRs, the core will be randomly assigned to one of the possible partitions [44].

As the number of cores increasing, the number of WRs would be increased, and various partitioning structures can be organized. Considering only the delay reduction for evaluation, it is obvious that the more number of WRs, the less the delay. On the other hand, the results show that using more than 10 WRs will significantly increase power consumption of WNoCs [43]. By the way, there are many different configurations for a specific number of cores with various numbers of WRs. For example, HoneyWiN architecture with 256 cores can be developed with different configurations of homogeneous or heterogeneous, partial or complete partitioning, based on the numbers of WRs. Figure 3 illustrates an instance configuration of HoneyWiN with 256 cores and nine WRs; the WRs are placed at the second level of the NoC and form heterogeneous partitions. The optimal decision about the number of cores, number of WRs, partitioning strategy and WR placement algorithms are directly related to the application, and consequently, task mapping and link allocation policies. Task mapping procedure is out of the scope of this article.

Fig. 3
figure 3

256-core HoneyWiN architecture with nine WRs

3.2 Routing

The proposed routing algorithm of HoneyWiN is based on XYZ-planar algorithm, according to the planar 3-axes coordinate system introduced in [49]. The X, Y and Z axes start from the center of the network and divide the topology into three regions. Packet traversal may happen via the wired network or the combination of wired and wireless networks. In each step, if the corresponding CRs of both source and destination cores are connected to two different WRs, express wireless communication links are used. In this case, long multi-hop wired paths will be avoided. Otherwise, when both source and destination CRs are connected to the same WR, a wired path is established.

Fig. 4
figure 4

The routing algorithm

Fig. 5
figure 5

Routing examples a 1-hop wired, b 3-hop hybrid, c 4-hop hybrid, d 7-hop wired links

Fig. 6
figure 6

The flowchart of routing algorithm

In order to prevent deadlock in wired network, one out of six possible turns will be disabled in each clockwise and non-clockwise dependent cycle. Also, to alleviate dead-lock when packets are routed via both wired and wireless networks, in each input port of the routers two sets of virtual channels (VCs) are used [43]. One is for packet transmission from CR to WR, while the other one is for packet transmission from WR to CR or from CR to CR. The routing algorithm of HoneyWiN architecture is shown in Fig. 4.

An effective adaptive routing algorithm is able to minimize path congestion through load balancing. To predict temporal network congestion, routing algorithms based on ant colony optimization (ACO) have the power of identifying the near-future non-congested path to a desired target according to historical network information [12]. Therefore, the ACO strategy has been employed in partition-based XYZ-planar routing algorithm of HoneyWiN.

Figure 5 shows different routing examples of HoneyWiN architecture with 24 cores. As can be seen in Fig. 5a, the intra-partition communications will be done via wired links. On the other hand, for inter-partition communications, when the destination router is connected to a WR, the routing path will use both wired and wireless networks as shown in Fig. 5b, c. Otherwise, as depicted in Fig. 5d, only the wired network will be utilized. In other words, in order to prevent over utilization of WRs, only the packets in which their destination routers are connected to a WR are allowed to use the wireless network. Figure 6 illustrates the flowchart of HoneyWiN routing algorithm.

4 Methodology and experimental setups

For obtaining results of simulation and evaluations of the proposed architecture, Noxim [10, 11], a SystemC-based cycle-accurate NoC simulator is used. Besides that, for tuning the NoC simulator, the energy analysis has been exploited by Orion 2.0 [29]. The simulation setup is shown in Table 3.

Table 3 Simulation setup

In addition to synthetic traffic patterns provided by Noxim, for more analysis of HoneyWiN real application are also considered. To do so, M5 simulator [8] is employed to acquire memory access traces from full system running of the applications of SPLASH-2 and PARSEC v2.1 benchmark suite. These traces are used to drive Noxim. For obtaining the traces from real applications, 64 two-wide superscalar out-of-order cores with private 32 KB L1 instruction and data cache, and a shared 16 MB L2 cache are employed. For comparing HoneyWiN with four state-of-the-art mesh-based NoC architectures [37, 38, 44, 52], the simulation parameters in our simulator, have been modified according to the scenario utilized in each of the intended references. Therefore, it is possible to compare our obtained results with their reported results.

5 Results

In this section, the experimental results for evaluation of HoneyWiN are presented. The performance analysis of HoneyWiN is compared with the traditional mesh-based WNoC, as the baseline. For evaluation, we implement our partition-based routing algorithm in the form of 2-axes coordinate system in the baseline architecture. Additionally, HoneyWiN is compared with four NoC architectures that recently presented. The architectures are evaluated from different performance metrics with different injection rates on various packet distributions in the form of synthetic and real traffic patterns.

5.1 Evaluation of 24-core HoneyWiN

In this section, HoneyWiN architecture with 24 cores is evaluated. In Fig. 7, the comparison results of network throughput for HoneyWiN using 24 cores and complete homogeneous partitioning with the mesh-based WNoC architecture is illustrated. This analysis is performed with four, six, and eight WRs, in 0.1 injection rate. As can be seen, HoneyWiN has higher or equal network throughput in most of the traffic patterns in comparison with the mesh-based WNoC. With Uniform traffic pattern, the least throughput is obtained and is again the same for both HoneyWiN and the mesh-based WNoC. In Bit-Reversal traffic pattern, throughput is higher than the other distributions and is the same for both architectures. It is noticeable that Bit-Reversal usually is among the worst cases from the viewpoint of its realization in common wired networks. Off course, the obtained results of analyzing wireless and 3D NoCs architectures (for instance the researches of [15, 20, 35, 56]) show that this traffic pattern is not the worst case, even it can be one of the best ones. The reason is in this traffic pattern; there are irregular and various paths for packet routing in WNoCs and 3D NoCs architectures.

Fig. 7
figure 7

Network throughput (flit/cycle) comparison for 24-core complete homogeneous system

Figure 8 depicts comparison of total energy consumption of HoneyWiN and the mesh-based WNoC with 24 cores. HoneyWiN has less energy consumption than the mesh-based WNoC for all the synthetic traffic patterns with four, six or eight WRs. Furthermore, according to Figs. 7 and 8, it seems that less number of WRs is more efficient in terms of both performance and energy consumption.

Fig. 8
figure 8

Energy consumption comparison for 24-core complete homogeneous system

Fig. 9
figure 9

Average delay ratio of HoneyWiN to the mesh-based WNoC for 24-core complete homogeneous system

As another experiment, Fig. 9 shows the average delay ratio of HoneyWiN over the mesh-based WNoC for different injection rates. The average has been obtained from the results of five executions. As the figure illustrated, the delay ratio is less than one for most of the traffic patterns. In Uniform traffic pattern, notwithstanding the injection rate and the number of wireless routers, the delay is high. With the other selected traffic patterns, the delay ratio (HoneyWiN over the mesh-based WNoC) is equal or less than one. As a result, implementing a WNoC in honeycomb form provides faster architecture than the mesh-based WNoC; but the speed improvement is not considerable. Also, generally HoneyWiN performs better than the mesh-based WNoC in high injection rates. This means that HoneyWiN operates more efficient in systems with frequent communications.

Fig. 10
figure 10

Network throughput evaluation of a 24-core HoneyWiN

Fig. 11
figure 11

Energy consumption evaluation of a 24-core HoneyWiN

In Figs. 10 and 11, a 24-core HoneyWiN is evaluated for different number of WRs considering the synthetic traffic patterns and the obtained traces of real application from PARSEC benchmark suite. As the figure shows, the 24-core HoneyWiN with four WRs provides better performance than in the cases of using six and eight WRs. In addition, with four WRs, the 24-core HoneyWiN has less logic utilization, smaller area and less power consumption.

Fig. 12
figure 12

Analyzing 24-core WNoCs for different performance parameters

Figure 12 compares a HoneyWiN with the mesh-based WNoC with 24 cores, three and four WRs in 0.05 and 0.25 injection rates of Transpose 2 traffic pattern. As the results show, HoneyWiN provides better performance in terms of power consumption, received flits, throughput and delay than the mesh-based WNoC. Figure 12a illustrates HoneyWiN with four WRs consumes less power than the other configurations of this analysis. Figure 12b states the flit ratio over the ideal case. As the figure shows, HoneyWiN with three WRs provides better received flit ratio because of the uniformly WRs placement in the NoC. Thus, Fig. 12c, d show that HoneyWiN provides better throughput in terms of the overall flit per cycle (flit/cycle) and flit per cycle for each core (flit/cycle/IP) especially with three WRs.

As it is also previously demonstrated, the propagation delay of HoneyWiN is a little less than the mesh-based WNoC (see Fig. 12e). Figure 12f compares the area ratio of the four NoC configurations: the mesh-based WNoC, which is considered as the baseline and its area, is assumed to be equal to one, HoneyWiN, a wired mesh-based NoC, and a wired honeycomb-based NoC, all consist of 24 cores. The results of area are estimated according to the number of WRs and CRs, number of ports and buffers of different routers.

5.2 Evaluation of 54-core HoneyWiN

In this section, HoneyWiN architecture with 54 cores is evaluated with two configurations: first, a complete homogeneous HoneyWiN with six WRs, and second, a partial heterogeneous HoneyWiN with seven WRs. The obtained results of network throughput and energy consumption comparison are shown in Figs. 13 and 14. As Fig. 13 illustrates, in average, HoneyWiN provides higher network throughput than the mesh-based WNoC. Moreover, for all the traffic patterns, HoneyWiN architecture consumes much less energy than the mesh-based WNoC in both configurations with six or seven WRs (see Fig. 14). Besides, heterogeneous partitioning can provide more flexibility for application-specific architectures. As the figure show, with seven WRs that one of them is in the center of HoneyWiN, the higher throughput than utilizing six WRs can be achieved.

Fig. 13
figure 13

Throughput (flit/cycle) of 54-core WNoCs, a complete homogeneous, b partial heterogeneous

Fig. 14
figure 14

Energy consumption of 54-core WNoCs, a complete homogeneous, b partial heterogeneous

5.3 Evaluation of 64-core HoneyWiN

In this section, a configuration of HoneyWiN with 64 cores is compared with two other WNoC architectures [38, 52], and also with a 3D NoC architecture [37]. According to the available results of the intended architecture, for this evaluation, some applications of PARSEC and SPLASH benchmark suites have been considered.

In Fig. 15, the energy consumption of HoneyWiN and DWiNoC [38] is compared. These architectures have 64 cores and six WRs. As the figure shows, for all the benchmark applications, HoneyWiN considerably consumes less energy than DWiNoC.

Fig. 15
figure 15

Energy consumption comparing of HoneyWiN with DWiNoC [38]

Figure 16 shows the results of comparing HoneyWiN with mSWNoC [52] and with the traditional mesh-based WNoC in the terms of energy-delay product (EDP). For this comparison, the MROOTS strategy of mSWNoC is considered. Both architectures consist of 64 cores and 12 WRs. As it is observable, HoneyWiN outperforms both architectures in terms of energy consumption and propagation delay.

Fig. 16
figure 16

Comparison of Energy-delay product (EDP) of HoneyWiN with mSWNoC [52]

In Fig. 17, the delay of HoneyWiN is compared with two 3D WNoCs referred as LASH and FIT [37]. The results are for RANDOM traffic pattern with 0.3 injection rate. The architectures are equipped with four WRS, 8-flit packets and 16-flit buffers according to [37]. It is noticeable that buffers with depth of 16 flits cause increasing the delay cycles. According to our analysis, four flits are enough for each port buffers. By the way, in this situation, as the figure shows, HoneyWiN provides faster architecture than the LASH and FIT in the different configurations.

Fig. 17
figure 17

Delay comparison of HoneyWiN with two 3D WNoCs referred as LASH and FIT [37]

5.4 Evaluation of 256-core HoneyWiN

In this section, first, a configuration of HoneyWiN with 256 cores and nine WRs is evaluated and compared with the mesh-based WNoC in terms of network throughput and energy consumption. Then, propagation delay of HoneyWiN is compared with CAP-W architecture [44], both with 256 cores and eight WRs.

As Fig. 18 illustrates, HoneyWiN provides more energy-efficient architecture. HoneyWiN consumes in average 30.6% less energy than the mesh-based WNoC. Moreover, HoneyWiN provides in average, 8% higher network throughput. Off course, as Fig. 19 shows, for two traffic patterns, HoneyWiN provides less network throughput than the mesh-based WNoC. Figure 20 shows the delay of HoneyWiN comparing with CAP-W and a mesh-based NoC [44]. As it can be seen in the figure, HoneyWiN outperforms CAP-W in the terms of propagation delay in average 14%. Moreover, HoneyWiN is about two times faster than the mesh-based NoC.

Fig. 18
figure 18

Energy consumption comparison for 256-core WNoCs

Fig. 19
figure 19

Network throughput (flit/cycle) comparison for 256-core system

Fig. 20
figure 20

Delay comparing HoneyWiN with CAP-W [44]

6 Conclusion

In this article, first it was shown that the mesh topology does not always provide the best performance and efficient energy consumption in WNoC architectures. Then, we analyzed HoneyWiN, the proposed honeycomb-based WNoC, with its specific partition-based routing algorithm. The concepts of reconfigurable homogeneous/heterogeneous and complete/partial partitioning were also discussed. Finally, experimental results depicted HoneyWiN consumes less energy (i.e., on average 17%) and improves the network throughput (i.e., on average 10%) in comparison with the mesh-based WNoC. Moreover, comparing HoneyWiN with four state-of-the-art mesh-based NoC architectures demonstrated that HoneyWiN provides higher performance in term of delay, throughput and energy consumption. Overall, the results indicate that HoneyWiN is very effective in improving throughput, increasing speed and reducing energy consumption.