1 Introduction

With the increase in the number of cores in a multicore chip, there is an ever-increasing demand for a high-bandwidth interconnect to connect these cores [13]. This demand has been met by the introduction of network-on-chip (NoC)-based system-on-chip (SoC) design. The NoC has replaced buses with crossbar switches in the interconnect of many-core chips. Such an NoC connects the cores via routers at every node through the network interface (NI). The NoC provides multiple communication flows to achieve scalability and high bandwidth [3, 18, 25].

The communication model in NoC is based on modules connected via a network of routers that forward packets over the links between the routers; these links comprise long interconnects. Maintaining interconnect ability is required to achieve efficient system performance in NoC [12]. Communication starts at the source processing node and ends at the destination processing node, passing through the interconnect.

The power dissipated in a local link of an NoC is less than that dissipated in a link between routers. Power dissipation in an NoC link can be static or dynamic. Of these, the interconnect power depends mainly on dynamic power dissipation, which in turn is due to the charging of capacitance. The capacitance responsible for dynamic power dissipation can be input node capacitance, output node capacitance or interconnect capacitance. The interconnect capacitance grows as the technology shrinks. The self-switching activity and coupling switching activity of the interconnect charge and discharge this capacitance, which is the major cause of power dissipation in NoC links [8]. The total power consumed by the links of an NoC is given as follows [20]:

$$\begin{aligned} \ P_{\mathrm{bus}} = (\alpha _\mathrm{s} C_\mathrm{s}+ \alpha _1 C_1 ) V^2f \end{aligned}$$
(1)

where \(\alpha _\mathrm{s}\) denotes the self-switching activity on the link, which depends on the number of 0\(\rightarrow \)1 transitions on the link in two consecutive transmissions, \(C_\mathrm{s}\) is the self or line-to-substrate capacitance, \(C_1\) is the interwire or coupling capacitance, \(\alpha _1\) is the coupling switching activity, which depends on the number of correlated switching events between physically adjacent lines, V is the supply voltage, and f is the maximum operating clock frequency. As the technology scales down to the nanoscale level, the coupling capacitance between interconnects and the self-capacitance within each interconnect cause high power dissipation in multicore NoCs [26]. Hence, the links consume more power than the routers in an NoC. The links can use either parallel or serial communication.

Fig. 1 Encoding and decoding in parallel link

In parallel communication, multiple parallel links are used between the NI and the router, as shown in Fig. 1. Each link includes an encoder and a decoder, which are required for efficient communication. The encoder is used at the transmitting end and the decoder at the receiving end. The encoder helps to send data in a secured manner and avoids lengthy coding, thereby reducing the bandwidth needed for transmission. Encoding also enables encryption and error correction. Traditional architectures widely use parallel links to improve speed, throughput and wire length and to avoid the complexity of implementing a serializer and deserializer.

In serial communication, an N-bit parallel link is converted into a single serial link through a serializer and a deserializer that interface the router module to the serial link, as shown in Fig. 2. The N-bit parallel data are converted to serial data by the serializer. The serializer and deserializer together are called the Ser-Des, which consists of two functional blocks: parallel-in serial-out and serial-in parallel-out. It carries the data over a single link, reducing the number of input and output pins along with their interconnects. An encoder and a decoder are included in the serial link for efficient transmission.
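The two Ser-Des blocks can be sketched in software as follows. This is only an illustrative model, not the authors' circuit: the function names are hypothetical, and sending the most significant bit first is an assumption, since the bit order on the link is not stated here.

```python
from typing import Iterable, Iterator, List


def serialize(words: Iterable[int], width: int = 8) -> Iterator[int]:
    """Parallel-in serial-out: shift each word out MSB first, one bit per cycle."""
    for word in words:
        for pos in range(width - 1, -1, -1):
            yield (word >> pos) & 1


def deserialize(bits: Iterable[int], width: int = 8) -> List[int]:
    """Serial-in parallel-out: shift bits in and latch a word every `width` bits."""
    words: List[int] = []
    shift_reg, count = 0, 0
    for bit in bits:
        shift_reg = (shift_reg << 1) | bit
        count += 1
        if count == width:
            words.append(shift_reg)
            shift_reg, count = 0, 0
    return words


if __name__ == "__main__":
    stream = list(serialize([0xB6]))
    print(stream)               # [1, 0, 1, 1, 0, 1, 1, 0]
    print(deserialize(stream))  # [182]
```

A single wire carries all N bits in turn, which is why the pin and interconnect count drops while the per-wire bit rate has to rise.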

Though the serial link may appear inferior to the parallel link, the development of high-speed serial technologies has led serial links to replace parallel ones for long-range communication. Serial links also remove the need for multiple line drivers and buffers. They reduce area at both the interconnect level and the circuit level and reduce signal interference, crosstalk and noise. The power consumption due to coupling capacitance is eliminated by serializing the parallel link. However, the power dissipation increases with the number of transitions in the data bits transmitted over the NoC link. The proposed data encoding method, introduced in the NI of the processing element, reduces the self-switching activity in the NoC interconnect, and this decrease in switching activity significantly reduces the power dissipation. In the proposed method, serial links are used in place of parallel links; serial links eliminate the effect of coupling switching activity, thereby reducing expression (1) to:

$$\begin{aligned} \ P_{\mathrm{bus}} = \alpha _\mathrm{s} C_\mathrm{s} V^2f \end{aligned}$$
(2)

From the above equation, the power dissipation in the link can be reduced by reducing the self-switching activity \(\alpha _\mathrm{s}\). Hence, the proposed method applies binary-to-gray (B2G) coding as the first step of encoding, B2G followed by shifting of bits to reduce the self-transitions as the second step, and double B2G encoding as the third step. In every step, the transition count (TC) is calculated, and the encoded word that has the minimum TC is selected. Hence, the dynamic power consumption \(P_{\mathrm{bus}}\) is reduced.
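As a rough numerical illustration of Eqs. (1) and (2), the sketch below evaluates the link power model and shows that, once the coupling term is removed, power scales linearly with \(\alpha _\mathrm{s}\). The function name and all numeric values are assumptions chosen only for illustration; they are not taken from the synthesis results.

```python
import math


def link_power(alpha_s: float, c_s: float, vdd: float, freq: float,
               alpha_c: float = 0.0, c_c: float = 0.0) -> float:
    """Eq. (1); with alpha_c = 0 it reduces to Eq. (2) for a serial link."""
    return (alpha_s * c_s + alpha_c * c_c) * vdd ** 2 * freq


# Hypothetical operating point (assumed values, not from the paper).
C_S = 100e-15   # self capacitance in farads
VDD = 1.0       # supply voltage in volts
FREQ = 1e9      # clock frequency in hertz

p_before = link_power(alpha_s=0.5, c_s=C_S, vdd=VDD, freq=FREQ)
p_after = link_power(alpha_s=0.25, c_s=C_S, vdd=VDD, freq=FREQ)

# Halving the self-switching activity halves the dynamic link power.
assert math.isclose(p_after / p_before, 0.5)
```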

Fig. 2 Encoding and decoding in serial link

The main contributions of the proposed work have been summarized as follows:

  • The data encoding and decoding methodology in the proposed work does not depend on data types. It works well for all types of data streams without any limitations.

  • In the proposed method, three steps of encoding are employed and analyzed to determine which gives the minimum TC; that encoding is then applied to the incoming data, and the choice is communicated to the decoder through two flag bits.

  • The novelty of the proposed method is that it does not rely on a single encoding technique for all data. It passes the data through three encoding techniques, and the data with the least TC are chosen for transmission over the NoC links. It guarantees that encoding never increases the TC of the original data and that only the least-TC encoded data are sent, thereby reducing dynamic power dissipation for any random data.

  • In the first stage, it reduces the number of switching transitions as much as possible, which lowers the power consumption. In the next stage, any increase in switching transitions over the original data is blocked by comparing the encoded data with the original data and selecting the data with the fewest switching transitions for on-chip communication through the NoC links. For example, the original data stream ‘10101010’ has the maximum transition count (TC) of 7; after encoding with the proposed technique, the TC is reduced to 0, a 100% reduction in switching activity. For any raw data, this technique never increases the TC beyond that of the original data and provides an expected 50% reduction in switching activity.

  • The routers, NIs and IP cores are connected in a mesh topology and use the XY routing algorithm to establish communication between IP cores. The IP core connected to the NI sends data bits to the routers and vice versa. Different types of files, such as .doc, .pdf, .png, .jpeg and .mp3, have been used for performance analysis.

  • The performance of the proposed method is compared with state-of-the-art encoding methods such as serialized low energy transmission coding for on-chip interconnection networks (SILENT) [22], adaptive low-power transmission coding for serial links (ALPTCS) [36], the Data Correlation method [16] and approximate differential encoding (ADE) [31], and it achieves low power dissipation for different types of real data files.

  • The encoder and decoder design is simulated and verified in the Synopsys VCS simulator to test its functionality. The design is synthesized with Synopsys Design Compiler targeting a 90 nm UMC library. Power analysis of the synthesized design, performed with the Synopsys PrimePower tool for the 90 nm technology, reported a 48.83% reduction in switching activity.

2 Related Works

NoC architecture relies on parallel bus communication since it increases the bandwidth, although at a higher cost in power dissipation and area than the serial link. Bus Invert (BI) [41] is a traditional method used for parallel communication. It uses a control bit called invert, which conditionally inverts the data before transmission. If the Hamming distance between the present data value and the next data value is greater than N/2, where N is the width of the bus, the invert bit is set to 1 and the data are inverted and transmitted over the bus. Otherwise, invert = 0 and the original data are transmitted. Hence, inverted data lower the bus activity, thereby reducing I/O peak power dissipation by 50% and the I/O average power dissipation by 25%. Another method is frequent value encoding (FVE) [46]; here, the most frequent bit patterns on the bus are initially stored in a content addressable memory (CAM) used by the encoder and the decoder. The data to be transmitted are then checked against the CAM; if the data are present, they are encoded and sent; otherwise, the original data are sent. Hence, the switching activity is reduced by a factor 2–4 times greater than with the BI method. However, prior knowledge of the information is needed, and FVE is applicable to off-chip links.
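The Bus Invert rule described above can be sketched as follows. The function names are hypothetical, and comparing the next data word against the value currently driven on the bus lines is an assumption about how "present data value" is interpreted.

```python
from typing import Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(prev_bus: int, data: int, width: int = 8) -> Tuple[int, int]:
    """Return (bus_value, invert_bit) for the next transfer on an N-bit bus."""
    mask = (1 << width) - 1
    data &= mask
    # Invert only when more than N/2 lines would otherwise toggle.
    if 2 * hamming_distance(prev_bus & mask, data) > width:
        return (~data) & mask, 1
    return data, 0


def bus_invert_decode(bus_value: int, invert: int, width: int = 8) -> int:
    """Undo the conditional inversion at the receiver."""
    mask = (1 << width) - 1
    return (~bus_value) & mask if invert else bus_value & mask


if __name__ == "__main__":
    print(bus_invert_encode(0x00, 0xFF))  # (0, 1): all lines stay low, invert set
    print(bus_invert_encode(0x00, 0x0F))  # (15, 0): only N/2 toggles, sent as is
```

The extra invert line is the redundancy that the later discussion of control-bit overhead refers to.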

The author of [30] proposed an encoding method called the shift and invert method (SINV). It uses two control bits to select among no encoding, left shift, right shift or inversion of the original data, based on the number of transitions obtained. Stan, who proposed traditional BI coding, introduced low-weight coding (LWC) in [40] using a transition signaling technique, in which logic ‘1’ is encoded as a low (L) to high (H) or H to L transition and logic ‘0’ as the lack of a transition, whereas BI used level signaling. LWC is efficient with respect to power consumption when the probability of ‘0’ is very high. The universal algorithm [5] encodes the data based on six different techniques such as invert, rotate left by one bit, rotate right by one bit, rotate left by one bit and invert, and rotate right by one bit and invert, whereas hybrid frequent value cache multicoding (HVFCMC) [49] enhances the FVE method by suitably encoding the non-frequent data and partially frequent data, both of which are left unencoded in the FVE method. The successive shift and invert (SSINV) method [43] uses a combination of shift and invert operations to reduce the bus transitions. The redundancy added by the control bits increases the area overhead in [5, 30, 40, 43, 49].

Serial transition inversion [35], in which transition inversion is performed when the number of transitions is greater than or equal to N/2, where N is the data width, works on a block of buffered data for power reduction and is applicable only to buffered buses. Shifted gray encoding (SGE) [17] has been proposed for the instruction address bus. It maps the binary-coded instruction addresses on the address bus to gray code. The author improved the result further by applying a shifted gray code algorithm, but it is applicable only to embedded processors.

Partial bus coding technique [47] is an extension of BI, and the decomposition approach [19] is an extended form of PBI; the bus lines are grouped into subsets, and each group is encoded separately. Hardware overhead is the limitation of both the partial bus coding technique and the decomposition approach.

The above-mentioned techniques [5, 17, 19, 30, 35, 40, 41, 43, 46, 47, 49] minimize the transition activity on each single line of the bus while ignoring the coupling effect of neighboring lines. The power reduction obtained with these methods is therefore valid for off-chip buses. As the technology scales down, the bus lines become thinner and lie very close to each other; hence, the coupling effect has become a very important quantity to minimize in NoC design. Shielding is one of the traditional methods [14] applied to avoid the coupling effect, but it increases area. Hence, researchers have introduced power reduction methods that address both self-switching activity and coupling activity.

In [24], coupling switching activity is reduced by considering correlated switching between adjacent physical interconnects. The authors of [20, 32] reduce power dissipation in NoC links by considering both self-transition and coupling transition activities. The author used the same power model as [24] but applied it to end-to-end encoding using the pipelined nature of wormhole-switched networks. The most frequent least power encoding technique [44] assigns code words with fewer ones to high-probability input data.

The works in [5, 14, 17, 19, 24, 30, 32, 33, 35, 40, 41, 43, 44, 46, 47, 49] discuss low-power encoding techniques applicable to parallel communication. Though long-range parallel links provide a high data rate, they have many limitations, such as large area, high capacitive load, high leakage power, high cross-coupling noise and greater routing difficulty. Crosstalk makes routing difficult, and using repeaters to fix this in long-distance communication generates high leakage power [10, 23, 37]. Power dissipation in parallel links can be reduced by shielding, line-to-line spacing and repeater insertion, but these fixes increase the chip area. In communication, the data buses and long interconnects consume more energy due to dynamic power dissipation during the charging and discharging of self-capacitance and coupling capacitance. Crosstalk is likewise induced by the coupling capacitance during charging and discharging [23].

Hence, the solution for the power consumption due to coupling capacitance is to serialize the parallel link. The transition delay caused by crosstalk, which is induced by adjacent wires switching in opposite directions, is also minimized by reducing the switching activity in serial NoC links [20]. The major advantages of serial links in NoC architecture are a reduction in wire area, savings in power dissipation, a decrease in noise and high throughput. The multiple line drivers, repeaters and buffers [39] are removed by replacing the standard bus with a serial link, as shown in Fig. 3.

Fig. 3 On-chip serial link replacement of standard bus

By eliminating parallel wires, serial link communication has the merit of excluding skew uncertainty [15]. The drawbacks of serial communication are intersymbol interference and the need for high-speed operation, which can be overcome by suitable data encoding methods [29]. However, the power dissipation in the serial link increases with the number of transitions in the data bits transmitted over the NoC data path. The self-switching activity in the NoC interconnect can be reduced by a data encoding method placed in the NI of the processing element. This decrease in switching activity significantly reduces the power dissipation.

Since parallel bus coding techniques cannot be used directly for serial buses [34], further research is needed to adapt them for effective serial communication.

Most techniques for optimizing energy consumption in on-chip serial interconnection links reduce the number of bit toggles in the input data by means of suitable encoding algorithms, as discussed in [8, 12, 20, 22, 26]. SILENT [22] is an encoding method proposed to reduce the transmission energy of serial communication by minimizing the number of transitions. SILENT works by XORing the successive bits in the link and then serializing the encoded data words. The encoded words reduce the number of transitions on the serial link by producing long runs of zeros or ones, keeping the link silent for highly correlated data. However, encoding techniques for serial communication are data dependent: if the data correlation range falls within the overhead region, the power dissipation increases in the case of higher data bits [22].

ALPTCS [36] was proposed to overcome the power consumption in the overhead region of SILENT coding. The width of the overhead range in SILENT is one-third of the total range width; hence, SILENT coding consumes more power than unencoded transmission for about one-third of all data patterns. The author proposed Coding for the Overhead Region in SILENT (CORS), together with an adaptive mechanism that switches between SILENT and CORS when the data pattern changes, thus attempting to minimize the power dissipation. The results show that the transition reduction of ALPTCS is 2.5% to 8.4% better than that of SILENT.

The transition encoding method [15] proposed a serial link on-chip bus architecture to minimize interconnect power. The results show that serialization reduces the number of wires but enlarges the interconnect width and introduces wider spacing, which increases capacitance and resistance in the interconnect. The designed architecture shows that power reduction occurs when the degree of multiplexing in the bit stream increases. The transition encoding method XORs the current and the previous data; a ‘1’ is sent when a transition is detected, and a ‘0’ otherwise. Embedded transition inversion (ETI) [7] is proposed to reduce the switching activity, thereby reducing the power dissipation of the serial link. It performs serialization followed by encoding. It eliminates the transition bit redundancy in the original data by embedding the inversion information in the phase between clock and data: a phase difference between the clock and the data is generated when the data are inverted; otherwise, there is no phase difference. Transition inversion coding (TIC) [4] inverts the even-numbered bits in a data word if the number of transitions in the word is greater than half of the word length. The decision bit in front of the encoded data is set to 1 if inversion is performed. Partial transition inversion (PTI) coding [48] uses a parity bit along with the decision bit used for transition inversion coding to additionally provide 100% single-bit error correction and 7.4% power saving. It is suited to off-chip serial communication.
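The XOR-based transition encoding mentioned at the start of the paragraph above can be illustrated at the bit level as follows. This is a simplified reading of the description, not the circuit of [15]: the function names are hypothetical, and the reference value of 0 before the first bit is an assumption.

```python
from typing import List


def transition_encode(bits: List[int], initial: int = 0) -> List[int]:
    """Emit 1 where a bit differs from its predecessor, else 0."""
    out, prev = [], initial
    for bit in bits:
        out.append(bit ^ prev)
        prev = bit
    return out


def transition_decode(code: List[int], initial: int = 0) -> List[int]:
    """Rebuild the stream by toggling the running value on every 1."""
    out, prev = [], initial
    for c in code:
        prev ^= c
        out.append(prev)
    return out


if __name__ == "__main__":
    data = [1, 1, 1, 1, 0, 0, 0, 0]
    code = transition_encode(data)
    print(code)                     # [1, 0, 0, 0, 1, 0, 0, 0]
    print(transition_decode(code))  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Long runs of identical bits become runs of zeros, which is why such schemes favor highly correlated data.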

The Data Correlation method [16] is an encoding technique applied to random data of 32 bits. Out of the 32 bits, the 8 bits from ‘0’ to ‘7’ are taken. It looks for the patterns ‘010’ or ‘101’ from ‘a0’ to ‘a5’ and from ‘a3’ to ‘a7’; if found, it swaps the bits of ‘010’ into ‘001’ and of ‘101’ into ‘110,’ thereby reducing the power dissipation. The results show that the average power reduction is about 22.5%, which is better than that of the SILENT method. Serial T0 encoding [21] is designed for data capture using CCD and CMOS sensors. It follows the basic principle of ‘T0’ applied to parallel buses, utilizing the sequential property of the address bus. ADE [31] is meant for serial interfaces connecting off-chip buses to processing elements. It is mostly applicable to applications in which approximate results are accepted and do not affect the final quality of the image.

Analysis has shown that the SILENT coding method is very simple and is readily applicable to serial communication networks carrying highly correlated data [22]. For highly correlated data, this method provides better results than other methods. Its limitation is that, in the overhead region, it consumes more power than the original data. The average power reduction is 13%. ALPTCS [36] was introduced to lower the power consumption in SILENT’s overhead region. It provides 2.5–8.4% better results than SILENT [22]. In [16, 48], the authors achieved better power reduction by overcoming the limitations of SILENT, making the encoding suited to applications with raw data inputs independent of data correlation. In [31] and [21], the energy consumption of serial interconnects is reduced by applying approximate bus encoding, but this is effective only for error-tolerant data and temporally correlated images. The serial link is a promising direction for real-time applications, and it has to provide the same efficiency as the parallel link. To achieve this, the data switching activity has to be reduced by a suitable encoding method. The proposed data encoding method is designed to be applicable to any uncorrelated data, with an average switching activity reduction of 48.83%, and is suitable for serial communication in on-chip networks.

The above works show that an encoding algorithm meant to reduce the transition activity of data bits in the link may instead increase the transition activity beyond that of the original data and may introduce overhead after encoding. In contrast, the proposed encoding method ensures that any increase in transition activity due to encoding is blocked, so that only data bits with the same or a reduced TC are transmitted. Reducing the power dissipation and latency of the NoC link has been the focus of all these papers.

3 Proposed Low-Power Coding Algorithm

The proposed work deals with the reduction of switching activity in the on-chip links of the NoC fabric. The proposed algorithm starts executing after reset, on the rising edge of the clock signal. In both the encoder and the decoder, the output enable, transition count, flag bits and data input are initialized to zero. The execution of the proposed algorithm involves the following five major steps:

  1. Step I.

    The switching activity of the incoming data bits is calculated and set as the first TC with flag bits ‘00.’ The encoder converts the original data into gray code, calculates the switching activity and sets it as the second TC with flag bits ‘01.’

  2. Step II.

    The gray-coded byte is split into consecutive nibbles of four bits each. A nibble whose first and third bits are equal remains unchanged, whereas in a nibble whose first and third bits are unequal, these two bits are exchanged. The switching activity of the resulting data is calculated and set as the third TC with flag bits ‘10.’

  3. Step III.

    The encoder applies the binary-to-gray conversion to the original data twice, calculates the switching activity and sets it as the fourth TC with flag bits ‘11.’

  4. Step IV.

    The encoder compares the four TCs and chooses the data with the least TC, together with its flag bits.

  5. Step V.

    The decoder examines the flag bits from the encoder and decides whether to perform decoding or to pass the data on as received.
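The five steps can be summarized in a short software model. This is a minimal sketch rather than the synthesized design: the function names are hypothetical, data words are represented as bit strings written most significant bit first, and breaking ties in favor of the lower flag value is an assumption, since the steps do not state how equal TCs are resolved.

```python
from typing import Tuple


def transition_count(bits: str) -> int:
    """Number of 0->1 and 1->0 changes between consecutive bits."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


def b2g(bits: str) -> str:
    """Binary to gray: keep the MSB, XOR every other bit with its left neighbour."""
    return bits[0] + "".join(str(int(a) ^ int(b)) for a, b in zip(bits, bits[1:]))


def g2b(bits: str) -> str:
    """Gray to binary: XOR each gray bit with the previously decoded bit."""
    out = bits[0]
    for g in bits[1:]:
        out += str(int(out[-1]) ^ int(g))
    return out


def exchange(bits: str) -> str:
    """Step II: in each 4-bit group, exchange the first and third bits if they differ."""
    groups = []
    for i in range(0, len(bits), 4):
        n = bits[i:i + 4]
        groups.append(n[2] + n[1] + n[0] + n[3] if n[0] != n[2] else n)
    return "".join(groups)


def encode(data: str) -> Tuple[str, str]:
    """Steps I-IV: return (flag, word) with the least TC (ties go to the lower flag)."""
    gray = b2g(data)
    candidates = [("00", data), ("01", gray), ("10", exchange(gray)), ("11", b2g(gray))]
    return min(candidates, key=lambda c: transition_count(c[1]))


def decode(flag: str, word: str) -> str:
    """Step V: undo whichever encoding the flag bits indicate."""
    if flag == "00":
        return word
    if flag == "01":
        return g2b(word)
    if flag == "10":
        return g2b(exchange(word))   # the exchange is its own inverse
    return g2b(g2b(word))


if __name__ == "__main__":
    print(encode("10101010"))          # ('01', '11111111')
    print(decode("01", "11111111"))    # 10101010
```

Because the unencoded word is always one of the four candidates, the selected word can never have a higher TC than the original data.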


3.1 Proposed Architecture

In the proposed method, the NoC receives the data bits from the source processing element, called the core. The data bits from the core are sent through the parallel link. The serializer converts the parallel data bit stream into serial form to eliminate the coupling switching activity. M-bit data are sent over N parallel lines from the core. In [22], the parallel data are first encoded and then serialized, so more encoders are needed, which in turn increases chip area and power consumption. In the proposed architecture, the encoder is placed after the data serializer and operates on the serialized data; hence, there is one encoder and decoder pair for each NI, as shown in Fig. 4. The encoded data travel over the serial data line with two additional flag bits.

Fig. 4 Proposed architecture for serial links

The proposed encoding unit uses two extra bits along with each eight bits of encoded data. The two extra bits serve as status bits for the resultant encoded data and are realized with a hard-coded multiplexer acting as a lookup table for all four conditions. Since this is simple hard-coded multiplexer logic (a lookup table), it does not reduce speed.

The two extra bits add a small area and power overhead. However, in the proposed encoding unit, the extra bits are appended to the data bits only after encoding; hence, the two bits do not undergo any switching before encoding. The proposed encoding algorithm reduces the number of transitions in the input data stream and outweighs the small power overhead of the flag bits.

This has been justified through the synthesis results. The entire module is synthesized using the Synopsys tool. Table 9 shows the synthesis results of the entire module including the flag bits; the results show that the proposed work improves power consumption by 26%, 16% and 74% compared with SILENT [22], ALPTCS [36] and the Data Correlation encoding method [16], respectively. The NI is modified with the encoder and the decoder module in the proposed architecture. The link power in the NoC is reduced by the placement of the CODEC circuits. Hence, it eliminates the need for buffers and additional structures to provide a low-power data path. Thus, the power dissipation in the NoC is greatly reduced.

Fig. 5 Encoder architecture of the proposed method

3.2 Block Diagram of Encoder

The proposed data encoding takes place in the NI, as shown in Fig. 5. The switching activity of the incoming data bits is calculated and set as the first TC with flag bits ‘00.’ The initial stage of the encoder is a B2G converter, which works as follows:

$$\begin{aligned} G(0)= & {} B(0) \oplus B(1) \end{aligned}$$
(3)
$$\begin{aligned} G(1)= & {} B(1)\oplus B(2) \end{aligned}$$
(4)
$$\begin{aligned} G(2)= & {} B(2)\oplus B(3) \end{aligned}$$
(5)
$$\begin{aligned} G(3)= & {} B(3) \end{aligned}$$
(6)

The switching activity of the data bits encoded by the B2G converter is calculated and set as the second TC with flag bits ‘01,’ and the data are then passed to the splitter. The splitter divides the eight data bits into two nibbles of four bits each. Within each nibble, it compares the first and the third bits for equality. If these two bits are unequal, they are exchanged; if they are equal, there is no change. The switching activity of the resulting data is calculated and set as the third TC with flag bits ‘10.’ B2G is then applied twice to the original data, and the fourth TC, with flag bits ‘11,’ is calculated. Finally, the data with the least TC among the candidates for the eight-bit word are chosen along with their flag bits. Hence, the method ensures the maximum reduction of transitions without any overhead region, unlike SILENT.
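Eqs. (3)–(6) can be checked against the compact shift-and-XOR form of the binary-to-gray conversion. The sketch below is illustrative only: the function names are hypothetical, and B(3) is taken to be the most significant bit, as Eq. (6) suggests.

```python
def b2g_equations(b: int) -> int:
    """Apply Eqs. (3)-(6) bit by bit to a 4-bit value, with B(3) as the MSB."""
    B = [(b >> i) & 1 for i in range(4)]            # B[i] is bit i, B[0] the LSB
    G = [B[0] ^ B[1], B[1] ^ B[2], B[2] ^ B[3], B[3]]
    return sum(bit << i for i, bit in enumerate(G))


def b2g_shift_xor(b: int) -> int:
    """Compact equivalent: XOR the value with itself shifted right by one."""
    return b ^ (b >> 1)


if __name__ == "__main__":
    print(format(b2g_equations(0b1011), "04b"))   # 1110
    # Both forms agree for every 4-bit input.
    assert all(b2g_equations(v) == b2g_shift_xor(v) for v in range(16))
```

The shift-and-XOR form extends directly to the eight-bit words used by the encoder, where each output bit again depends only on two adjacent input bits.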

Fig. 6 Decoder architecture of the proposed coding method

3.3 Block Diagram of the Decoder

Based on the comparison of TCs, the encoder transmits the data along with two flag bits to the decoder. The decoder receives the data bits and flag bits from the encoder, as shown in Fig. 6. Decoding takes place based on the flag bits. For example, if the flag bits ‘S0,’ ‘S1’ are ‘10,’ as shown in Fig. 6, they select the inverted logic in the multiplexer so that the splitter decodes the data bits first. It checks the equality of the first and the third bits in each nibble of four bits. If the bits are unequal, these two bits are exchanged. The resulting data are sent to the gray-to-binary (G2B) converter. The G2B converter works as follows:

$$\begin{aligned} B(3)= & {} G(3) \end{aligned}$$
(7)
$$\begin{aligned} B(2)= & {} B(3) \oplus G(2) \end{aligned}$$
(8)
$$\begin{aligned} B(1)= & {} B(2) \oplus G(1) \end{aligned}$$
(9)
$$\begin{aligned} B(0)= & {} B(1) \oplus G(0) \end{aligned}$$
(10)

Finally, the data bits retrieved by the decoder are sent to the destination processing element. Table 3 shows that the average reduction in TC with the proposed method is 52.94%.
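The G2B recurrence copies the most significant bit and then XORs each remaining gray bit with the binary bit decoded just before it. A minimal sketch follows; the function name and the `width` parameter are assumptions made for illustration.

```python
def g2b(g: int, width: int = 4) -> int:
    """Gray to binary: copy the MSB, then XOR each gray bit with the decoded bit above it."""
    b = 0
    prev = 0                              # decoded bit one position higher
    for pos in range(width - 1, -1, -1):  # walk from the MSB down to the LSB
        prev ^= (g >> pos) & 1
        b |= prev << pos
    return b


if __name__ == "__main__":
    print(format(g2b(0b1110), "04b"))                 # 1011
    print(format(g2b(0b11101101, width=8), "08b"))    # 10110110
```

Unlike the B2G stage, each decoded bit depends on all higher gray bits, so the decoder forms a chain of XOR gates rather than a single parallel level.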

3.3.1 Effectiveness Proof

The proposed work significantly reduces the total power consumption by lowering the dynamic power through a decrease in switching activity.

In Table 1, the transition count is calculated and tabulated for the data with and without encoding. Switching transitions are counted only for the incoming data bits. The flag bits are status bits used only to inform the receiver of the encoding method applied to the particular incoming data. They are appended to the data bits only after encoding; hence, the flag bits do not undergo any switching before encoding. Because the flag bits are generated from the encoded values, they are not included in the calculation of input-to-output data switching activity in Table 1. However, the entire module is synthesized using the Synopsys tool. Table 9 shows the synthesis results of the entire module including the flag bits; the results show that the proposed work improves power consumption by 26%, 16% and 74% compared with SILENT [22], ALPTCS [36] and the Data Correlation encoding method [16], respectively. Similarly, flag bits are used in the encoding process of ALPTCS [36] and of the Data Correlation encoding method [16]; the flag is transmitted to the receiver by adding it to the head unit of the flit in [36] and through two additional lines in [16]. In those works, the efficiency of the encoding is discussed, and the transition count is calculated and tabulated for the unencoded and encoded data streams in Fig. 2 of [36] and in Table 1 of [16]. The transition count is calculated without taking the effect of the flag bits into account in both [36] and [16].

However, in our work, to justify the power reduction, the proposed design has been evaluated in an end-to-end switching activity scenario covering the NI, the router and the links at the sending end and the NI at the receiving end in a mesh topology. The power requirement of the designed encoder and decoder in the worst-case data scenario is compared with that of state-of-the-art encodings. The results in Table 9 show that the proposed method provides improved power consumption compared with state-of-the-art encoding methods.

Reduced power consumption and reduced size, together with increased performance, have made high-speed serial links more popular in the communication industry for achieving faster line rates, enabling big data connectivity worldwide.

Table 1 Execution of proposed method with all possible combinations of flag bits

For example, consider the data stream ‘10110110’ to demonstrate the effectiveness of the proposed encoding technique, which chooses the least TC based on the three-step encoding algorithm, as shown in Table 1. The TC of the sample data is calculated as the number of 0 \(\rightarrow \) 1 and 1 \(\rightarrow \) 0 transitions. The first TC, that of the original data, is 5. Next, the data undergo B2G conversion: the most significant bit (the seventh bit) remains the same, and the sixth bit is coded as the seventh bit XORed with the sixth bit. Likewise, the data bits are coded down to the 0th bit. The data ‘10110110’ are thus coded as ‘11101101,’ and the second TC is calculated as 4. Next, the gray data are divided into two nibbles of 4 bits each, ‘1110’ and ‘1101.’ The first and the third bits of each nibble are checked for equality; they are equal in the first nibble, which is left unchanged, and unequal in the second nibble, whose first and third bits are therefore exchanged to give ‘0111.’ The nibbles are then merged again to form 8 bits. The third TC, that of the shifted data ‘11100111,’ is 2. Finally, the original data undergo B2G conversion twice.

The resultant encoded data are ‘10011011,’ and the fourth TC is 4. All four TCs are compared, and the data with the least TC, ‘11100111’ with flag bits ‘10,’ are chosen as the encoded data. Hence, the TC is reduced from 5 to 2, a switching reduction of 60%. The encoding algorithm provides an average switching activity reduction of 52.5% for any random data stream.

In the decoder, based on the two flag bits received, the received data are split into two nibbles, ‘1110’ and ‘0111.’ The first and third bits of each nibble are checked for equality; here, the two bits are equal in the first nibble, whereas in the second nibble they differ, so the first and third bits are inverted to give ‘1101.’ The resulting ‘11101101’ is then converted into the original data ‘10110110’ by XORing each subsequent bit with the previously decoded bit, with the seventh bit remaining the same.
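The worked example above can be reproduced with a short Python sketch. It is a minimal illustration, not the authors' implementation: only the flag value ‘10’ for the shifted gray data comes from the text, while the other three flag values, the tie-breaking rule (the first candidate with the least TC wins) and all function names are assumptions. The flag bits themselves are not counted in the TC here.

```python
def transition_count(bits: str) -> int:
    """Number of 0->1 and 1->0 transitions between adjacent bits."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


def binary_to_gray(bits: str) -> str:
    """B2G: the MSB is kept; every other bit is XORed with its left neighbour."""
    return bits[0] + "".join(str(int(a) ^ int(b)) for a, b in zip(bits, bits[1:]))


def gray_to_binary(gray: str) -> str:
    """G2B: the MSB is kept; each next bit is XORed with the previous decoded bit."""
    out = gray[0]
    for g in gray[1:]:
        out += str(int(out[-1]) ^ int(g))
    return out


def shift_nibble(nib: str) -> str:
    """Invert the first and third bits of a nibble when they differ."""
    if nib[0] == nib[2]:
        return nib
    flip = {"0": "1", "1": "0"}
    return flip[nib[0]] + nib[1] + flip[nib[2]] + nib[3]


def shift(bits: str) -> str:
    """Apply the nibble rule to both halves of an 8-bit word (self-inverse)."""
    return shift_nibble(bits[:4]) + shift_nibble(bits[4:])


def encode(data: str):
    """Return (flag, word) for the candidate with the least TC.

    Flags '00', '01' and '11' are hypothetical; only '10' is given in the text.
    """
    gray = binary_to_gray(data)
    candidates = [
        ("00", data),                  # original data
        ("01", gray),                  # gray-coded data
        ("10", shift(gray)),           # shifted gray data
        ("11", binary_to_gray(gray)),  # B2G applied twice
    ]
    return min(candidates, key=lambda c: transition_count(c[1]))


def decode(flag: str, word: str) -> str:
    """Undo the transformation selected by the (partly assumed) flag bits."""
    if flag == "00":
        return word
    if flag == "01":
        return gray_to_binary(word)
    if flag == "10":
        return gray_to_binary(shift(word))
    return gray_to_binary(gray_to_binary(word))


flag, word = encode("10110110")
print(flag, word, transition_count(word))  # 10 11100111 2
print(decode(flag, word))                  # 10110110
```

Because the original word is always one of the four candidates, the selected word in this sketch can never have a higher TC than the input.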

4 Results Analysis

The switching activity or TC plays a vital role in dynamic power dissipation of on-chip interconnection links. The TC has been taken as the prime parameter for the performance comparison with other state-of-the-art encoding methods. The proposed encoding technique achieves significant switching activity reduction for any random raw data stream. To justify the effectiveness of the proposed method, a set of test data patterns with three levels of bit transition has been selected. The three different test patterns are

  • Best data with no transition

  • Worst data with maximum transition

  • Random data with 50% transition

Table 2 An example of proposed coding method with all possible test sets

All three test data streams, with the corresponding number of transitions before and after encoding with the proposed method, are shown in Table 2. The total TC for the proposed encoding method is 3, whereas the total TC for the original data is 12. The proposed method offers a better TC for all three test patterns: zero TC, average TC and maximum TC.

To evaluate the efficiency of the proposed method, random data streams on parallel data lines A0 to A7 (eight-bit data width) are applied to the encoding logic. The performance of the proposed method and of the existing encoding methods is compared in Table 3. The proposed method provides a 52.94% switching activity reduction, whereas SILENT [22] provides 38.23% and both Data Correlation [16] and ALPTCS [36] provide 32.35%. ADE [31] provides a switching reduction of 50%, close to that of the proposed method for the given random data pattern; however, ADE employs approximation and is applicable only to applications that can tolerate approximated results, such as video and image processing. It should be noted that the proposed method offers a better result than ADE without any loss of information or approximation in the data bits.

4.1 Effectiveness of Random Data

In the proposed encoding method, switching activity is taken as the measure of efficiency, as dynamic power consumption is directly proportional to switching activity. Switching activity depends closely on the data pattern sent into the encoder. To illustrate the effectiveness on random data, 12-bit unsigned integers are generated randomly using a Gaussian distribution with the same mean and a variable standard deviation ranging from \(2^{0}\) to \(2^{12}\). Each input set consists of 10,000 words. Table 4 shows that the proposed method provides the maximum TC reduction over a large range of standard deviations, compared with the other encoding algorithms.
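A test set of this kind could be generated and measured along the following lines. This is only a sketch: the mean of 128 is taken from the caption of Table 4, while clipping out-of-range samples to the 12-bit range, MSB-first serialization, the fixed seed and the function names are all assumptions, since the text does not specify them.

```python
import random


def gaussian_words(n_words, mean, sigma, width=12, seed=0):
    """Draw n_words Gaussian samples, rounded and clipped to unsigned width-bit integers."""
    rng = random.Random(seed)  # fixed seed (assumption) so runs are repeatable
    hi = (1 << width) - 1
    return [min(hi, max(0, round(rng.gauss(mean, sigma)))) for _ in range(n_words)]


def serial_transition_count(words, width=12):
    """Serialize the words MSB first and count transitions on the single serial line."""
    stream = "".join(format(w, f"0{width}b") for w in words)
    return sum(a != b for a, b in zip(stream, stream[1:]))


# Unencoded TC of 10,000-word sets for a few standard deviations.
for sigma in (2, 8, 32, 128):
    words = gaussian_words(10_000, mean=128, sigma=sigma)
    print(sigma, serial_transition_count(words))
```

The same streams would then be passed through each encoder under comparison, and the TC after encoding compared against this unencoded baseline.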

The proposed method provides a significant TC reduction compared with all other encoding methods over a wide range of standard deviations from 2 to 128, as shown in Table 4. Serialized correlated data provide a low TC when differential encoding (DE) is applied [22]. However, [31] shows that, even when highly correlated data are transmitted, DE results are not appreciable when the Hamming distance falls between n/3 and 2n/3, where n is the width of the data in the links, because they fall in the overhead range. As the standard deviation is increased, the Gaussian distribution approaches a uniform distribution, but the proposed method still yields the best result at all larger values of the standard deviation \(\sigma \).

To show that the transition reduction is closely associated with the data stream sent, the proposed algorithm is executed and tested for eight-bit data with all 256 combinations. The data streams are encoded using SILENT [22], ALPTCS [36], the Data Correlation method [16] and ADE [31] along with the proposed algorithm, and the results are tabulated in Table 5. The proposed algorithm yields a better result than SILENT [22], ALPTCS [36], the Data Correlation method [16] and ADE [31] for eight-bit data with 256 combinations; over all combinations of bits, the proposed method yields a 58.3% switching activity reduction, as shown in Table 5. In SILENT [22], the data stream is encoded as a parallel stream and then serialized. In the proposed architecture, however, the data stream is encoded after serialization, which effectively reduces the number of bit lines at the initial stage.
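An exhaustive sweep of this kind can be sketched in a few lines of Python using integer bit operations. The sketch is an assumption-laden illustration: it counts only transitions inside each eight-bit word, ignores the flag bits and transitions between consecutive words, and uses the four candidates described in the encoding example. The percentage it prints therefore need not match the figure reported in Table 5.

```python
def tc(w: int) -> int:
    """Transitions between adjacent bits of an 8-bit word."""
    return bin((w ^ (w >> 1)) & 0x7F).count("1")


def gray(w: int) -> int:
    """Binary-to-gray: MSB kept, each lower bit XORed with its left neighbour."""
    return w ^ (w >> 1)


def shift(w: int) -> int:
    """In each nibble, invert the first and third bits when they differ."""
    out = 0
    for nib in ((w >> 4) & 0xF, w & 0xF):
        if ((nib >> 3) ^ (nib >> 1)) & 1:  # first bit != third bit
            nib ^= 0b1010                  # invert both of them
        out = (out << 4) | nib
    return out


def sweep():
    """Return (total original TC, total least TC) over all 256 eight-bit words."""
    total_orig = total_enc = 0
    for w in range(256):
        g = gray(w)
        # Candidates: original, gray, shifted gray, gray applied twice.
        total_enc += min(tc(c) for c in (w, g, shift(g), gray(g)))
        total_orig += tc(w)
    return total_orig, total_enc


orig, enc = sweep()
print(f"original TC = {orig}, encoded TC = {enc}, "
      f"reduction = {100 * (orig - enc) / orig:.1f}%")
```

Since the unencoded word is always among the candidates, the per-word TC in this sketch can only stay the same or decrease.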

Table 3 Comparison of proposed method with other encoding methods for random data stream
Table 4 TC reduction of proposed method and state-of-the-art encodings for random Gaussian data traces with a different standard deviation and same mean; Mean = 128; SD from 2 to 128
Table 5 TC comparison of the proposed method with other data encoding algorithms for eight-bit data with 256 combinations
Table 6 Comparison of number of transitions of different types of files without encoding and with different algorithms with proposed method

Since the switching activity reduction varies widely for different input files, even for the same application, the proposed code was also tested against data streams of different types and sizes, such as doc, pdf, jpeg, png and mp3 files. The results are compared with SILENT [22], ALPTCS [36], the Data Correlation method [16] and ADE [31] and are tabulated in Table 6. The experimental results show that the proposed method outperforms the other encoding methods and provides energy savings in all cases, without the overhead incurred by SILENT [22], as shown in Fig. 7. The proposed code obtains an average switching activity reduction of 48.83%, which is better than the state-of-the-art encoding methods. Table 6 illustrates that the proposed code encodes the data pattern so as to minimize the power dissipation in serial links: files such as text, image, pdf and mp3 are transferred, and the switching activity is reduced irrespective of file type. Hence, the proposed design is suitable for any type of high-speed application, such as real-time image/video processing [38], neural network accelerators [11], machine learning [9] and deep convolutional neural networks (CNNs) [6].

Fig. 7 Comparison of percentage of transitions of different types of files without encoding and with different algorithms with proposed method

4.2 Implementation

The proposed algorithm, with a router including the CODEC, is simulated in the Synopsys VCS simulator using Verilog HDL and synthesized with Synopsys Design Compiler using the UMC 90 nm technology library. A mesh router with the CODEC in the NI is designed as shown in Fig. 8. A mesh router needs five input and output ports, namely the east, west, north, south and local ports. The NI sends data bits from the IP core to the routers and vice versa. The XY routing algorithm is used, and the conditions for port direction from sender to receiver are shown in Table 7. The router detects the destination address from each flit and passes the flit toward its destination.
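The port selection of standard XY routing can be sketched as follows. The conditions of Table 7 are not reproduced here, so the coordinate convention (x increasing toward east, y increasing toward north) and the function name `xy_route` are assumptions; the sketch only shows the usual rule of resolving the X offset first and then the Y offset.

```python
def xy_route(current, destination):
    """Return the output port for a flit under dimension-ordered XY routing.

    current and destination are (x, y) router coordinates in the mesh.
    Assumed convention: x grows toward east, y grows toward north.
    """
    cx, cy = current
    dx, dy = destination
    if dx > cx:
        return "east"
    if dx < cx:
        return "west"
    # X offset is zero, so route along Y.
    if dy > cy:
        return "north"
    if dy < cy:
        return "south"
    return "local"  # the flit has reached its destination router


# A flit at router (1, 1) heading to (3, 0) first travels east.
print(xy_route((1, 1), (3, 0)))  # east
print(xy_route((3, 1), (3, 0)))  # south
print(xy_route((3, 0), (3, 0)))  # local
```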

Fig. 8 End-to-end CODEC circuit with router

Table 7 Condition for port direction from sender to receiver
Table 8 Time complexity of state-of-the-art encoding algorithms

4.2.1 Run-time Complexity

To measure the complexity of the proposed encoding algorithm, the run time of the algorithm is computed. It depends on the number of operations executed for the specific input: the more operations are executed, the longer the run time. The time complexity function is used to measure the performance of the algorithm. The worst-case complexity of all the algorithms except ADE is computed using asymptotic analysis [2]. Because ADE is lossy, comparing it with the proposed method in terms of area, power and run-time complexity would not be fair. The time complexity is expressed as the worst-case number of primitive operations performed as a function of the input size, written in big-O notation as shown in Table 8. The run time of an algorithm depends on the size of the input. When the file size is small, SILENT and ALPTCS execute with a minimum run time [1, 45], but as the file size increases, the proposed method executes in less run time than SILENT and ALPTCS, as the proposed method has ’N log N’ time complexity. Data Correlation and the proposed method execute in nearly the same run time, as both methods have a worst-case time complexity of O(N log N) [2], as shown in Table 8.

Table 9 Comparison of power consumption of proposed method with state-of-the-art encodings

4.2.2 Power Consumption Comparison of State-of-the-Art Encodings

The proposed design has been evaluated in an end-to-end switching activity scenario in a mesh topology, covering the NI, the router and the links at the sending end and the NI at the receiving end. A 5×5 router (five buffered input ports and five unbuffered output ports) is used in the mesh topology. Packet switching and the XY routing algorithm are applied in the simulations.

Table 10 Power and area requirement of router with and without CODEC

The power requirement of the designed encoder and decoder in the worst-case data scenario is compared with the state-of-the-art encodings except ADE, and the results are tabulated in Table 9. Because ADE is lossy, comparing its power with that of the proposed method would not be fair. Scaling equation (11) and the coefficients of Table 5 in [42] are used to scale the power of SILENT from 0.18 µm CMOS technology to 90 nm technology, and the result is included in Table 9. The results show that the proposed method consumes 26%, 16% and 74% less power than the SILENT, ALPTCS and Data Correlation encoding methods, respectively, with an area overhead of about 1.5 times that of the Data Correlation method.

This encoder has been tested against data streams of different types and sizes, including .jpg, .png, .pdf, .text, .doc and .mp3 files. The experimental results show that the proposed algorithm provides an average reduction of 48.83% in the number of transitions across the different file types: 42.73% for the doc file, 43.03% for the pdf file, 53.57% for the png file, 52.59% for the jpeg file and 52.07% for the mp3 file, as shown in Table 6. The proposed method reduces the transitions in all cases, and it should be noted that it guarantees that the number of transitions never increases. Overall, the proposed method has a better TC, and since dynamic power is proportional to switching activity, the lower transition count is expected to translate into a power improvement over the state-of-the-art techniques. Table 10 reports the area and power requirements of the router with and without the CODEC of the proposed method. It can be seen that adding the proposed encoder and decoder to the router incurs only a slight area overhead.

5 Conclusions

On-chip serial communication has major applications in high-speed embedded NoC-based systems; however, serial links tend to suffer from power dissipation due to bit multiplexing. Power optimization is therefore a primary aim in the VLSI design of NoCs. This work offers an efficient encoding method that reduces the number of transitions in the serial link of the NoC router, minimizing the dynamic power consumed by switching activity in the capacitances. The proposed encoding technique has been tested for all 256 combinations of eight-bit data, and the results show a 58.3% TC reduction. In addition, the switching activity never exceeds the original TC, which is a better result than state-of-the-art encodings such as SILENT, Data Correlation, ALPTCS and ADE. The proposed method has been compared with these encodings for different file types, namely .doc, .pdf, .png, .jpeg and .mp3, and it provides an average switching activity reduction of 48.83%; hence, it is suited to real-time applications. The power consumption of the synthesized netlist has been calculated using Synopsys prime tools, and the results show that the proposed method consumes 26%, 16% and 74% less power than the SILENT, ALPTCS and Data Correlation encoding methods, respectively. Together with the power savings across different data streams, this makes the proposed method superior to the state-of-the-art encodings considered and well suited to serial-link on-chip communication networks.

The proposed method takes the switching activity of the links as the basis for power reduction.

Our future research will focus on reducing the area overhead. Optimal guaranteed bounded results based on varying delays for a shared communication network are analyzed in [27, 28], and these results will help direct our future work toward reducing the effects of varying transition delay for improved system performance.