Keywords

1 Introduction

The advancement in probabilistic model checking has enabled its applications in a wide range of domains, including cryptography [11], systems biology [22], network protocols [21], game theory [6], and distributed systems [20]. Likewise, in recent times, the growing demand for robust and secure digital system design has challenged the potential for innovation in formal methods. In this work, we venture into the probabilistic model checking of the reliability evaluation of a complex and distributed digital system—the on-chip communication network, network-on-chip (NoC), deployed in a many-core system.

NoC—the de-facto standard for on-chip communication in modern many-core systems—generally comprises of several topologically homogeneous routers operating synchronously in a decentralized control system. Despite the conceptual similarity with conventional computer networks, a NoC is subject to several unique reliability challenges, e.g., process and environmental variations, that are vastly dissimilar to conventional network communication. Over a decade of simulation-based research has gone into NoC design exploration and reliability analysis [1,2,3, 27]. However, simulation-based ecosystems fail to provide worst-case reliability and safety guarantees. Consequently, formal verification is necessary to ensure the correctness of specific functionality of the NoC components.

The primary challenge of applying automated verification, specifically, model checking, is the notorious state explosion issue, as evidenced by a recent work on model checking an asynchronous NoC [31] where the intermediate state space corresponding to only 13 out of the 66 components in a \(3\times 3\) NoC consists of several hundred million states. Consequently, accurate modeling of the reliability issues (e.g., power supply noise, quality-of-service guarantees, etc.) is poised to further aggravate its computational complexity.

This paper presents a probabilistic model checking method for the analysis of power supply noise (PSN) for a generic central router of a large mesh-NoC system and its impact on the router’s reliability under uniform random traffic loads. To enable an accurate and efficient analysis and a convenient formulation of the probabilistic properties, we extract the key characterizing features of the router at the behavioral level. We present formal models for the central router with four full-duplex channels, operated under uniform random packet injection with the starvation-free Round Robin conflict resolution scheduling. To tackle the state space explosion challenge, abstract models have been derived based on critical observations of conflict resolution patterns. Transition probabilities between abstract states are inferred from exhaustive executions of the underlying concrete models with limited steps. We use the high-level formal modeling language Modest [13] to formulate our models, the state spaces of which are large discrete-time Markov chains (DTMC), and the Modest Toolset’s [15] probabilistic model checker mcsta for the analysis. We check reward-bounded properties, for which mcsta implements scalable analysis techniques [12]; in particular the state elimination approach resulted in significant analysis speedups. The final verification results show significant scalability of our abstract models, and reveal key relations between traffic loads and PSN.

2 Motivation

PSN in the power delivery network of an integrated circuit is composed of two major components: (a) resistive noise, which is estimated by the product of the current drawn and the lumped resistance of the circuit; and (b) inductive noise that is caused by the inductance the power grid and is proportional to the rate of change of current through the inductance (\(\frac{{\varDelta }i}{{\varDelta }t}\)). For a distributed system such as a NoC, the latter takes a central component [2].

A high inductive noise is responsible for the intermittent peaks in the cycle-wise noise profile of a NoC. Basu et al. have recently demonstrated that, in an \(8 \times 8\) NoC, the peak PSN can increase from 40% of the supply voltage at the 32-nm technology node to about 80% of the supply voltage at the 14-nm technology node, while running a uniform-random synthetic traffic pattern [2]. Voltage droop due to PSN can radically degrade the delay of various on-chip circuit components. Such increase in the delay has the potential to engender timing errors in the pipe-stages of the NoC routers, thus severely impacting the reliability as well as the performance of the overall on-chip communication.

Although recent works [2, 27] tackle the PSN problem in NoCs to some extent, they do not guarantee the worst-case peak PSN—a determinant of the NoC reliability—across different operating conditions, realistically conceivable, for any parallel workload. Moreover, these works do not provide any bounds on the temporal PSN profile for a router, given an application execution. Consequently, existing approaches to mitigate PSN are a far cry from a truly reliable NoC design paradigm that can be deployed in mission-critical systems. On the contrary, this work shows that probabilistic model checking, despite its inherent challenges, can offer precise bounds on the performance and reliability with common environment assumptions, leading the way to future reliable NoC design.

3 Related Work

Reliable and energy efficient communication is the backbone of many-core systems. Significant recent research exploring reactive, proactive and predictive techniques has focused on addressing the challenges of fault tolerance in NoC [4, 5, 7, 17, 28]. However, a wide majority of these works are simulation-based analyses, which cannot provide rigorous reliability and performance guarantees.

Formal verification in NoC architectures has largely been focused on functional correctness of routing algorithms [26, 31, 32]. Zhang et al. investigate properties of deadlock and livelock freedom and tolerance to link failure, and use model checking to enhance an existing routing protocol [31, 32]. Based on theorem proving techniques, the DCI2 developed by Verbeek demonstrates significant scalability in proving properties of deadlock and livelock freedom and topology violations of statically determined routing logic [29]. Accurate assessment of NoC reliability has to incorporate quantitative aspects depicting the inherent distributive and reactive nature of NoC. Coste et al. presents in [8] a translation procedure to convert existing functional model into Markov chains for the evaluation of the latency of memory accesses over a Globally Asynchronous Locally Synchronous (GALS) NoC. Nevertheless, the scope of existing literature in probabilistic verification of the NoC is minimal.

On the other hand, researchers have extensively employed probabilistic verification to assess and improve reliability, resilience, and security of computer hardware designs [10, 24, 25, 30]. For example, Han et al. demonstrate how to obtain the fundamental error bounds by using bifurcation analysis based on probabilistic models of unreliable gates [14]. Kumar et al. propose an automatic compositional reasoning technique to improve the scalability of probabilistic model checking of hardware systems [19]. Mundhenk et al. propose probabilistic model checking for the security analysis of automotive architectures at the system level [23]. However, the dividends of these works have not yet been carried forward to the NoC domain.

Fig. 1.
figure 1

NoC router model.

4 Conflict Resolution in Central Routers

Figure 1 depicts the central router of an \(8 \times 8\), two-dimensional mesh NoC network [2]. It has four full-duplex channels with the bandwidth of one flit of a network packet, where each channel has a buffer with the capability of storing four flits. The router simultaneously transmits and receives flits in all four directions. Assume that each flit carries the next forwarding direction, and that a flit is not diverted back to its incoming direction. The forwarding direction is used for the arbiter in the central router to detect possible conflicts. The arbiter resolves conflicts, created by multiple flits originated from the four buffers attempting the same output direction. The order of conflict resolution relies on the Round Robin scheduling mechanism to guarantee fairness and starvation-free arbitration. The input interface handles flits arrived from all of four directions, and accommodates them in the first available space in the corresponding buffer. The output interface directs flits from the arbiter that are ready to be dispatched to the neighboring routers. The rest of this section describes details of the arbiter’s conflict resolution mechanism.

Since the bandwidth of all outgoing channels allows only one flit at a time, conflicts are resolved inside the central router. Conflicts affect the performance of each individual router and hence the entire NoC. During each clock cycle, the arbiter first examines each buffer’s front flit’s outgoing direction to detect conflicts. If no conflict exists, all the buffers can forward their front flits to their respective outputs in one cycle to maximize the throughput. Otherwise, the arbiter has to resolve all conflicts, requiring one or more additional cycles. Figure 2 demonstrates three representative scenarios of conflicts and their resolution. For simplicity, we ignore the incoming packets to all four buffers at each cycle, and only illustrate conflict resolution. In Scenario A, only one conflict exists between the east and west buffers at cycle \(t_n\), and the east buffer has higher priority. The arbiter, therefore, serves the east buffer at cycle \(t_n\), and it also directs the north and south buffers in this cycle, as their flits do not conflict. In the following cycle \(t_{n+1}\), the priority gets updated by shifting the priority queue, and the west buffer is served. Scenario B demonstrates two pairs of conflicts: the north and south buffers compete for the east output, while east and west compete for the north output in cycle \(t_n\). Following the priority queue, the arbiter serves the east and south buffers in cycle \(t_n\), and the west and north buffers in the next cycle. Scenario C illustrates a three-way conflict: all front flits of the east, west, and south buffers compete for the north output. The arbiter serves the west buffer first, and simultaneously serves the north buffer as it is not conflicting with others. In the following cycle \(t_{n+1}\), the south buffer gets serviced, as it has higher priority than the east, leaving the east buffer to be serviced in cycle \(t_{n+2}\).

Fig. 2.
figure 2

Conflict scenarios.

5 Formal Model of the Central Router

The formal model implements all potential conflicts in the central router. The Modest language [13] is used to model the router as shown in Fig. 1. We introduce a datatype shown in Listing 1.1. Integer variable represents the front flit’s destination in each buffer: 0 (north), 1 (east), 2 (south), or 3 (west). Value \(-1\) indicates an empty buffer. The field stores the buffer location in the central router, with the same encoding as the flit’s destination. Variable is if the front flit was serviced in the current cycle, and otherwise. The field represents the priority position each buffer will occupy in the next clock cycle. Lastly, the actual buffer, , is modeled as an integer linked list. The size of is set to four for all models presented in this paper, but its length can be set to any finite integer. The arbiter model is an array of four values. The position of in represents the current priority for servicing all four buffers, array index 0 being the highest priority and 3 being the lowest. For example, if refers to the east buffer with and , then at the beginning of the next cycle the buffer will have been moved to position . Two internal integer variables, namely, and are used in the Round Robin scheduling mechanism: counts the number of unserviced buffers in one cycle due to conflict, and decrements as the unserviced buffer’s priority values are calculated; and tracks the total number of buffers unserviced in one cycle.

figure v

The field for each buffer and are updated by the procedure shown in Listing 1.2. It automatically sets to for the buffer in position 0, because the arbiter will definitely serve this buffer in the current cycle. It then moves on to the buffers in all remaining positions. If a buffer is non-empty and is in conflict with another buffer with higher priority, then the latter will be serviced in the current cycle and the former has to wait for its chance in the next cycle. Therefore, of the former is set to and is incremented. Otherwise is set to . Lastly, the arbiter assigns the updated .

figure ah

Next, priority for each buffer gets updated using the procedure shown in Listing 1.3. It should be noted that priority update is assumed to strictly follow the order shown in the procedure, starting with the buffer in position 0 of the arbiter array . If the buffer at was serviced, its is updated by peeking the front of the corresponding buffer, followed by an update of the buffer itself. If no element is in , the function will return \(-1\) to indicate an empty buffer. The is updated to the sum of its current priority and the number of unserviced buffers, whose priorities have not been updated. Intuitively, buffers not serviced in the current cycle will be given higher priority in the next, and those serviced receive priority corresponding to their position in the arbiter array. If the buffer was not serviced, the is determined by subtracting from the total number of unserviced buffers in the current cycle, after which the is decremented. We use this method to keep track of the order for buffers that did not get serviced in the current cycle.

figure as

As an example, assume , and are , , , and , respectively. Both and are set to 2, because the east and west buffers were not serviced in the current cycle. Priority updates start with , i.e., the north buffer. Since it was serviced in the current cycle, its priority is updated to \(0 + 2 = 2\). The value of remains at 2. Next, the priority is updated for the east buffer. Because it was not serviced in the current cycle, its priority is set to \(2 - 2 = 0\), giving itself the top priority for the next cycle. The value of then decrements from 2 to 1, indicating that one remaining unserviced buffer is scheduled for the next cycle. Similarly, priorities for the south and west buffers are updated to \(2 + 1 = 3\) and \(2 - 1 = 1\), respectively. The variable decrements to 0 after all priory updates. The resulting arbiter array is .

To model incoming flits to all four buffers, we randomly assign their fields using the discrete uniform distribution, with the exception that a buffer does not receive a flit destined to its incoming direction. Probabilistic model checking on this routing node model incurs exponential state space growth as cycles increase, quickly becoming too large to be handled. For 100 clock cycles, mcsta explored 400 million states with another 100 million queued for expansion when 132 GB memory were filled. This is mainly due to the combinatorial explosion of flit values. To address this issue, we present several abstract router models next.

6 Abstract Models and Refinement

Abstract models presented in this paper are based on an ad-hoc method specifically for the central router. The initial abstraction is based on the observation that rather than specific scenarios of conflicts formed by the field of four buffers, the arbiter’s behavior is only determined by a few conflict patterns that can co-exist in one cycle, including non-conflicting scenarios. This observation leads to four abstract states: (1) no conflicts, where all buffers are serviced in the current cycle; (2) one pair of conflicts, where the only unserviced buffer is the one with lower priority in the pair; (3) two pairs of conflicts with two unserviced buffers, both of which have lower priority compared to their conflicting counterparts; and (4) three buffers in conflict, where the two buffers with low priorities are not serviced in the current cycle. Four buffers cannot all be in conflict as it is assumed that a flit is not diverted back to its incoming direction.

Since the abstract model is formulated at the behavioral level without circuit-level details, one has to project the measure of PSN onto the same abstract level. We know that the inductive noise, a major source of PSN, is proportional to the rate of change of current in the circuit. An abrupt change in the router activity in two consecutive cycles directly leads to a high rate of current change [2]. A low router activity is characterized by the arbiter serving no routers in a cycle, as all buffers are empty; while a high router activity is indicated by the arbiter serving three or more buffers in a cycle. The relative frequency of both high-to-low and low-to-high activities over a given timespan can, therefore, accurately reflect the state of the local noise and hence PSN in the NoC routers. For this purpose, we consider the following two probabilistic properties: (1) the probability that the number of high router activity cycles is lower-bounded by \(k \cdot N\) within N overall cycles; and (2) the probability that the total number of high-to-low and low-to-high activities is lower-bounded by \(k \cdot N\) within N overall cycles, where \(k \in (0, 1)\). High router activity, as indicated by property (1), can potentially create a high local congestion in the network, leading to a high PSN due to an unbalanced power density [9]. On the other hand, property (2) reflects a large and sudden load change in a router that can lead to a large inductive drop in the power delivery network of the NoC [2]. Collectively, understanding these properties is essential to ascertain the minimum voltage guardband for the NoC, sufficient to ensure a fault-free communication in a many-core system. To facilitate checking of these properties, two variables are created, namely, , which increments if all four buffers are serviced in a cycle, and , which accumulates cycles with high-to-low or low-to-high activities. Formulation of these properties is presented in Sect. 8.

The initial abstract model, however, is incorrect in that after two clock cycles, the accumulation of diverges from that obtained from the concrete model. This is because the probability varies when transitioning between two states with two-pair conflict. Specifically, different scenarios of two-pair conflict result in different probabilities. Table 1 illustrates some examples. Each entry listed under columns shows the buffer location and the destination of its front element. For example, “n(e)” under column means that the north buffer’s front flit is destined for the east output. The entry “w(n, e, s)” in the same column indicates that the west buffer can receive a flit destined to any other three directions. For state “2a”, if the arbiter has the two-pair conflict in cycle k, then and are serviced, allowing “n(e)” and “e(n)” to move to and , respectively, in the next cycle. Observe that at cycle \(k+1\), two-pair conflict scenarios include (n(e), e(n), w(n), s(e)) and (n(e), e(n), w(e), s(n)), and the possible three-way conflicts are (n(e), e(n), w(e), s(e)) and (n(e), e(n), w(n), s(n)). For state “2b” with a different two-pair conflict pattern at cycle k, the next cycle can only form the two-pair conflict (w(n), s(e), n(e), e(n)), and no three-way conflict can exist. For state “2c”, the only two-pair conflict is (w(s), s(e), n(e), e(s)), and the only three-way conflict is (w(s), s(e), n(s), e(s)). Our analysis reveals that such discrepancies exist in other abstract states.

Table 1. History-dependent conflict examples.

The four-abstract model is refined based on an analysis of all possible inputs into the arbiter and their respective behaviors. The possible inputs can be grouped into thirteen behaviors which are defined as states as shown in Table 2. Each refined state is conditioned on the number of unserviced buffers at the end of a cycle and where the flit’s destination points, specifically, buffer locations (i.e., the field) the destinations at the arbiter’s positions 0 and 1 point to. This table shows predicates defining these refinement conditions. Notations have been simplified as follows: \( dest _i\) represents the front flit’s destination of the buffer at index i of the arbiter array, i.e., .

To calculate transition probabilities among the thirteen abstract states, we modify the concrete model to include two variables: \(s_{ prev }\) and s. For every clock cycle, \(s_{ prev }\) first updates to s and then all predicates in Table 2 are evaluated and s is updated accordingly. Assuming the model starts with no conflicts (\(s_{ prev } = 0\)), we observe that for up to two transitions, which corresponds to two clock cycles, every one of the thirteen states in Table 2 is reachable. Transition probability emanating from state 0 to an abstract state, say 1b, is calculated by summarizing all probabilities of transitioning from the concrete state 0, which is the same as the abstract state 0, to all concrete states that satisfy the predicate for state 1b, which is \( unserviced = 1 \wedge dest _0 = id _2\). For this calculation, we added to the model a variable \( clk \) that is incremented with every clock cycle. The calculation is then performed by first using mcsta to query for

$$\begin{aligned} \mathbf {P}_{=?}(\diamond \, ( clk = 2 \wedge s_{ prev } = 0 \wedge s=2)), \end{aligned}$$

i.e., the probability to eventually (\(\diamond \)) reach a state in the model after two clock cycles where \(s_{ prev } = 0\) and the new abstract state is \(s=2\), i.e., 1b in Table 2. We then divide the result by the sum of probabilities out of state 0. Other transition probabilities are calculated similarly. Another observation is that the next states and transition probabilities from states 2b and 2c are identical, so we combine them into state 2b to form a twelve-state abstract model as shown in Table 3.

Table 2. Refined abstract model with thirteen states.
Table 3. Twelve-state abstract model with transition probabilities.

7 Including Idle Cycles in the Abstract Model

The twelve-state abstract model shown in Table 3 assumes that a flit is injected to all buffers in every cycle. This is, however, not quite realistic as it is common that one or more buffers do not receive an incoming flit. Such situations change the conflict patterns and hence the arbiter’s resolution behavior. From [2], we know that the cycle-wise and intermittent PSN is a direct result of a significant change of buffers served. Precisely three or four buffers are serviced by the arbiter between two consecutive clock cycles. Counting these changes allows us to accurately reflect the state of the local noise and hence PSN in the NoC routers. This implies that change from serving zero to four buffers and vice versa needs to be modeled. This section describes a modified abstract model to include idle cycles for each buffer. Using similar method as described in Sect. 6, refinement is applied to the twelve-state abstract model to account for scenarios with three, two, one, and none serviced buffers in one cycle. This leads to the twenty-five-state abstract model provided in Table 4. The state notation in this table represents the conflict scenario and the number of buffers with incoming flits in a given state. For example state \(2_4^b\) represents the state with the conflict scenario 2b in Table 2 in which four buffers have incoming flits. Note that this refinement does not change the fact that probabilities in this table can be calculated by checking its underlying concrete model for two clock cycles as described in Sect. 6.

Table 4. Twenty-five-state abstract model with transition probabilities.

8 Verification Results

All experiments have been performed on the abstract central router models, which are constructed as DTMC models using the high-level compositional modeling language Modest. The explicit-state probabilistic model checker mcsta in the Modest Toolset has been used for verification. Properties (1) and (2) are bounded probabilistic reachability queries for the transient behavior up to N clock cycles, with N being a rather large number. Implementing the cycle counter \( clk \) as a state variable, which we did for the computations in Sect. 6 with bound 2, would unfold the model over the cycle count up to the (now large) bound, exacerbating the state space explosion problem. To avoid this problem now, we made \( clk \) a transient variable that is set to 1 when moving from one clock cycle to the next and to 0 otherwise. A transient variable is only “live” during the assignments executed when taking a transition; it is not part of the state vector. In this way, clock cycle progress becomes a reward annotation to certain transitions instead of being encoded in the structure of an (unfolded) state space. We can then formalize properties (1) and (2) as reward-bounded reachability queries:

$$\begin{aligned} \begin{aligned} \text {(1)}\quad&\mathbf {P}_{=?}(\diamond ^{[\mathrm {accumulate}( clk ) \leqslant N]}\, \texttt {optimalRuns} \geqslant k \cdot N)\\ \text {(2)}\quad&\mathbf {P}_{=?}(\diamond ^{[\mathrm {accumulate}( clk ) \leqslant N]} \, \texttt {noiseRuns} \geqslant k \cdot N) \end{aligned} \end{aligned}$$

We use the state elimination method [12] implemented in mcsta for the reward-bounded property checking reported in this section. For our experiments, it provides a significant scalability and efficiency improvement over the classic unfolding-based approaches, but also over the default modified-iteration method, both of which we attempted to use in earlier versions of this model. In this way, our experience mirrors the performance behaviour observed earlier in [12, 16, 18].

Fig. 3.
figure 3

The two probabilistic properties denote high activity (Fig. 3a) and high change in activity (Fig. 3b) in the central router of a mesh NoC, experiencing a uniform-random traffic (Sect. 6). The steep curves reveal a high probability of a heavy congestion, as well as, a sudden and large change in the traffic, which can cause a high PSN in the NoC power delivery network.

Results are generated on a desktop computer with an AMD Ryzen Threadripper 12-Core 3.5 GHz Processor and 132 GB memory, running Ubuntu Linux. One core is used at any time. All results presented in this section assume uniform random packet arrival at all four buffers. Verification results for property (1) are presented for the twelve-state abstract model described in Sect. 6, and it is expected that these results are over-approximations for the abstract model with idle cycles in Sect. 7. For checking property (2), the latter model is used.

Figure 3a shows the cumulative probability for property (1) with several lower bounds. The formal model used for checking this property does not consider any empty buffer, in order to demonstrate the worst case scenario. At the architecture level, a high activity denotes the reception of three or more flits in one cycle. The steep slope of the curves indicates that the central router of a mesh NoC is likely to experience a heavy surge of the traffic load at a relatively short span of time. Such high load of traffic can engender a local hostspot in the network, which in turn, can lead to a large peak PSN.

Figure 3b depicts the cumulative probability for property (2) with several lower bounds. In this case, we consider empty buffers in the formal model of the central router. A high probability of such transitions within a short time span, as seen in this figure, denotes a bursty nature of the traffic encountered by the central router. As a result, there is a large inductive noise in the power delivery network of the NoC. Collectively, these two properties are pivotal in determining the minimum voltage guardband for the central router, because a more conservative guardband marks a power inefficient design, while a smaller one will be prone to intermittent timing errors in the NoC, aggravating its reliability.

Table 5 shows the peak memory usage and the total run-time reported by the mcsta tool. Model checking property (2) requires significantly more memory than that for property (1). This is due to the increased complexity of the twenty-five-state abstract model depicting idle cycles over the twelve-state model, as well as, that checking property (2) requires more cycles to converge.

Table 5. Performance results.

9 Conclusion

This paper presents a probabilistic model checking method for the reliability analysis for a generic central router of a large mesh NoC design under uniform random traffic loads. To combat the notorious state explosion problem, abstract models have been derived based on critical observations of conflict resolution using the Round Robin scheduling mechanism. Probabilistic properties are derived by identifying the frequency of abrupt changes in router activities, which causes the inductive noise of PSN. To enable efficient checking, the clock cycle counter variable is set as transient and is treated as a reward annotation only to certain transitions, instead of part of the state space. Verification results reveal crucial PSN behaviors that allow the minimal voltage guardband to be determined for the central router, providing insights in NoC designs with improved reliability.

For future work, we plan to extend the central router model with increased number of channels and variants of Round Robin scheduling mechanisms. Incorporating routing protocols in the router model is also important, as it enables us to model a full NoC and better evaluate its reliability with respect to PSN. Additionally, we plan to investigate probabilistic predicate abstraction techniques to automate the abstraction and refinement of larger NoC models, and evaluate how they may affect the verification of PSN-related properties.