1 Overview

Dynamic timing analysis (DTA), also known as simulation-based timing analysis, is impractical even for small FPGAs because of the huge number of input vectors and the prohibitively long simulation time. Static timing analysis (STA), which can analyze a design in a very short time, has therefore become the dominant approach. As a mainstay of modern FPGA design flows, STA breaks a design down into timing paths, calculates the signal propagation delay along each path, and checks for violations of timing constraints inside the design and at the input/output interface. STA has also been integrated with timing-driven EDA engines to optimize the FPGA’s timing performance.

The target design checkpoint (containing timing constraints and timing graph) and device library (containing timing models) are the main inputs that a timing analysis engine needs. The final output is the timing report (Fig. 5.1).

A Standard Delay Format (SDF) file is an optional second output of the timing engine. SDF is an IEEE standard for the representation and interpretation of timing data (both cell delays and interconnect delays) for use at any stage of the electronic design process [1]. An SDF file can be used along with the netlist in a simulator to verify that the design meets its functional and timing requirements.

Fig. 5.1
A flowchart of the inputs and outputs in timing analysis. The device library and design checkpoint are given to the timing analysis engine. The timing analysis engine consists of timing models, graphs, and constraints. The timing report and SDF are the outputs.

Typical inputs and outputs in timing analysis flow

Given that the device timing model was discussed in Sect. 2.2.2 and timing constraints in Sect. 3.2.2, the remaining main input, the timing graph derived from the target design checkpoint, is introduced in the following section.

Before diving into the timing calculation algorithms, we review some basic STA concepts. Figure 5.2 shows the circuit most commonly used to illustrate them.

Fig. 5.2
A schematic diagram of setup and hold timing analysis. The data input feeds register 1, followed by combinational logic with delay \(T_{\text {logic}}\), register 2, and the data output. The clock (Clk) drives both registers, with a skew of \(T_{\text {clk}_\text {skew}}\) between them.

Typical setup/hold timing analysis

Equations 5.1 and 5.2 give the setup time slack (\({\text {Slack}}_{\text {setup}}\)) and the hold time slack (\({\text {Slack}}_{\text {hold}}\)).

$$\begin{aligned} {\text {Slack}}_{\text {setup}} = T_{\text {period}} - (T_{\text {cq}} + T_{\text {logic}} + T_{\text {net}} + T_{\text {setup}} - T_{\text {clk}_\text {skew}}) \end{aligned}$$
(5.1)
$$\begin{aligned} {\text {Slack}}_{\text {hold}} = T_{\text {cq}} + T_{\text {logic}} + T_{\text {net}} - T_{\text {hold}} - T_{\text {clk}_\text {skew}} \end{aligned}$$
(5.2)

where \(T_{\text {period}}\) is the clock period, \(T_{\text {cq}}\) is the time it takes for data to appear on output Q after the active clock edge (positive or negative), \(T_{\text {logic}}\) is the delay of the combinational logic, \(T_{\text {net}}\) is the delay of the routing net, \(T_{\text {setup}}\) and \(T_{\text {hold}}\) are the setup and hold times of the capturing register, and \(T_{\text {clk}_\text {skew}}\) is the difference between the clock arrival times at the two flip-flops.
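
As a minimal sketch, Eqs. 5.1 and 5.2 can be evaluated directly for a single register-to-register path. The class name `PathTiming`, the function names, and all delay values below are hypothetical and chosen only for illustration; they do not come from any device library.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delays of one register-to-register path, in nanoseconds (illustrative)."""
    t_period: float
    t_cq: float
    t_logic: float
    t_net: float
    t_setup: float
    t_hold: float
    t_clk_skew: float = 0.0


def setup_slack(p: PathTiming) -> float:
    # Eq. 5.1: the clock period minus the total data path requirement.
    return p.t_period - (p.t_cq + p.t_logic + p.t_net + p.t_setup - p.t_clk_skew)


def hold_slack(p: PathTiming) -> float:
    # Eq. 5.2: the data path delay minus the hold requirement and clock skew.
    return p.t_cq + p.t_logic + p.t_net - p.t_hold - p.t_clk_skew


# Hypothetical numbers, not taken from a real timing model.
example = PathTiming(t_period=10.0, t_cq=0.5, t_logic=6.0, t_net=2.0,
                     t_setup=0.25, t_hold=0.25, t_clk_skew=0.25)
print(setup_slack(example))  # 1.5
print(hold_slack(example))   # 8.0
```

Both slacks are positive here, so this hypothetical path would meet its setup and hold requirements.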

To simplify the equations, \(T_{\text {net}}\) and \(T_{\text {clk}_\text {skew}}\) can be ignored. Requiring \({\text {Slack}}_{\text {setup}}\) and \({\text {Slack}}_{\text {hold}}\) to be positive then yields Eqs. 5.3 and 5.4 from Eqs. 5.1 and 5.2, where \(T_{\text {setup}}\) has been added to both sides of the hold inequality.

$$\begin{aligned} T_{\text {period}} > T_{\text {cq}} + T_{\text {logic}} + T_{\text {setup}} \end{aligned}$$
(5.3)
$$\begin{aligned} T_{\text {cq}} + T_{\text {logic}} + T_{\text {setup}} > T_{\text {hold}} + T_{\text {setup}} \end{aligned}$$
(5.4)

Combining Eqs. 5.3 and 5.4 gives Eq. 5.5.

$$\begin{aligned} T_{\text {hold}} + T_{\text {setup}} < T_{\text {cq}} + T_{\text {logic}} + T_{\text {setup}} < T_{\text {period}} \end{aligned}$$
(5.5)

\(T_{\text {cq}} + T_{\text {logic}} + T_{\text {setup}}\) is the data propagation delay. If it is greater than \(T_{\text {period}}\), the data will not have arrived when the second register samples it. On the other hand, if it is smaller than the register sampling window (\(T_{\text {hold}} + T_{\text {setup}}\)), the registers could fall into metastability.
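
The two-sided bound of Eq. 5.5 can be expressed as a small classification function. This is only an illustrative sketch under the same simplification (\(T_{\text {net}}\) and clock skew ignored); the function name `check_data_window` and its return strings are hypothetical.

```python
def check_data_window(t_period, t_cq, t_logic, t_setup, t_hold):
    """Classify a path against Eq. 5.5, ignoring net delay and clock skew."""
    data_delay = t_cq + t_logic + t_setup
    if data_delay >= t_period:
        # Upper bound of Eq. 5.5 broken: data arrives too late.
        return "setup violation"
    if data_delay <= t_hold + t_setup:
        # Lower bound of Eq. 5.5 broken: data changes inside the sampling window.
        return "hold violation"
    return "ok"


# Hypothetical delays in nanoseconds.
print(check_data_window(10.0, 0.5, 6.0, 0.25, 0.25))   # ok
print(check_data_window(5.0, 0.5, 6.0, 0.25, 0.25))    # setup violation
print(check_data_window(10.0, 0.125, 0.0, 0.25, 0.5))  # hold violation
```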

In FPGA design, STA can be performed at different stages: post-synthesis (logical level) and post-implementation (physical level). Post-synthesis STA (based on ideal implementation information) is faster but less accurate than post-implementation STA (based on real implementation information).

2 Timing Analysis Techniques

STA usually requires a timing graph that describes the target design from the timing perspective and identifies all the timing paths. The timing graph consists of nodes and edges: nodes correspond to component pins or input/output ports, and edges are the timing paths between them. Edges carry weights that can denote characteristics such as delay values [2].

Timing Graph Definition: A timing graph \(G = (N, E, s, t)\) is a directed graph having exactly one source node \(s\) and one sink node \(t\), where \(N\) is a set of nodes and \(E\) is a set of edges. The weight associated with an edge corresponds to either the gate delay or the interconnect delay (Fig. 5.3).

Fig. 5.3
A directed node graph of timing. The source is followed by primal input, tile, primal output, and ends with sink. The arrows represent the timing path.

Example of timing graph
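
One possible in-memory form of such a graph is an adjacency list with a delay on each edge. The sketch below is an assumption for illustration only: the class `TimingGraph`, the node names (`in_a`, `lut.I0`, and so on), and the delay values are hypothetical and do not describe any particular tool's data structure.

```python
from collections import defaultdict


class TimingGraph:
    """Directed graph with one source, one sink, and delay-weighted edges."""

    def __init__(self, source="s", sink="t"):
        self.source = source
        self.sink = sink
        self.nodes = {source, sink}
        self.edges = defaultdict(list)  # node -> list of (successor, delay)

    def add_edge(self, src, dst, delay):
        # The delay is either a cell (gate) delay or an interconnect delay.
        self.edges[src].append((dst, delay))
        self.nodes.update((src, dst))


def build_example():
    g = TimingGraph()
    g.add_edge("s", "in_a", 0.0)          # source to primary inputs
    g.add_edge("s", "in_b", 0.0)
    g.add_edge("in_a", "lut.I0", 0.5)     # interconnect delays
    g.add_edge("in_b", "lut.I1", 0.75)
    g.add_edge("lut.I0", "lut.O", 0.25)   # cell delays through the tile
    g.add_edge("lut.I1", "lut.O", 0.25)
    g.add_edge("lut.O", "out_y", 1.0)     # interconnect to the primary output
    g.add_edge("out_y", "t", 0.0)         # primary output to sink
    return g


graph = build_example()
print(len(graph.nodes))  # 8
```

The zero-delay edges from the source and into the sink exist only to give the graph a single entry and a single exit, as the definition requires.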

Traditional STA is deterministic (DSTA) and computes the circuit delay for one specific condition. In practice, the worst-case slow or best-case fast process corner is typically used, which can lead to over-design and leave considerable margin unused in terms of power, performance, and area (PPA). Statistical STA (SSTA) was introduced to address this problem. It combines the delays along the timing paths, which are expressed statistically (with means and standard deviations), to obtain the overall delay data.
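
To illustrate why a statistical treatment can recover margin, the sketch below sums edge delays along one path under a loudly stated assumption that is not from the text: every edge delay is an independent Gaussian random variable, so means add and variances add. The function name `path_delay_stats` and the numbers are hypothetical.

```python
import math


def path_delay_stats(edge_delays):
    """Combine (mean, std_dev) pairs along a path.

    Assumes independent Gaussian delays: means add and variances add.
    """
    mean = sum(m for m, _ in edge_delays)
    variance = sum(s * s for _, s in edge_delays)
    return mean, math.sqrt(variance)


# Two hypothetical stages, each given as (mean, standard deviation) in ns.
stages = [(1.0, 0.3), (2.0, 0.4)]
mean, sigma = path_delay_stats(stages)

# Corner-style bound: every stage sits at its own 3-sigma value at once.
corner_bound = sum(m + 3 * s for m, s in stages)
# Statistical bound: 3 sigma of the combined path distribution.
statistical_bound = mean + 3 * sigma

print(round(corner_bound, 3), round(statistical_bound, 3))
```

With these numbers the corner-style bound is about 5.1 ns and the statistical bound is about 4.5 ns; the difference is the pessimism removed under the independence assumption. Correlated variation would shrink that gain.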

SSTA is also employed by Intel in its Quartus Prime software to mitigate the effect of random variation on longer paths [3]. By discounting the minimum/maximum delay spread on these paths, the FPGA performance reported by STA may increase. There are two main categories of SSTA techniques: path-based and block-based.

  1. Path-based

    In the path-based technique, the critical path is searched for exhaustively. The statistical calculation is simple, but the paths of interest must be identified before running the analysis [4,5,6].

  2. Block-based

    In the block-based technique, the circuit timing graph is traversed in topological order. In [7], two basic graph traversal algorithms, depth-first search (DFS) and breadth-first search (BFS), are applied to an STA module, and their runtime efficiency is compared on a large number of sequential circuit instances. The conclusion is that BFS implements the STA module more efficiently than DFS. Because of its runtime advantage, many research [8,9,10,11] and commercial efforts have taken the block-based approach. Its advantages are completeness and the absence of path selection; however, computing the statistical maximum (or minimum) of random variables is not trivial.
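
The difference between the two categories can be sketched on a small deterministic graph. The code below is an illustrative assumption, not the implementation from [7] or from any commercial tool: `block_based_arrival` visits each node once in topological order (Kahn's algorithm, a breadth-first style traversal) and keeps only the latest arrival time per node, while `path_based_delays` enumerates every source-to-sink path. Delays are plain numbers here; a statistical version would replace `max` and `+` with operations on distributions.

```python
from collections import deque


def block_based_arrival(edges, source):
    """Latest arrival time per node; edges maps node -> [(successor, delay)]."""
    indegree = {}
    for node, succs in edges.items():
        indegree.setdefault(node, 0)
        for nxt, _ in succs:
            indegree[nxt] = indegree.get(nxt, 0) + 1

    arrival = {node: float("-inf") for node in indegree}
    arrival[source] = 0.0
    queue = deque(n for n, d in indegree.items() if d == 0)
    while queue:
        node = queue.popleft()
        for nxt, delay in edges.get(node, []):
            # Keep only the worst (latest) arrival seen so far at this node.
            arrival[nxt] = max(arrival[nxt], arrival[node] + delay)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return arrival


def path_based_delays(edges, node, sink, so_far=0.0):
    """Exhaustively list the total delay of every path from node to sink."""
    if node == sink:
        return [so_far]
    delays = []
    for nxt, delay in edges.get(node, []):
        delays.extend(path_based_delays(edges, nxt, sink, so_far + delay))
    return delays


# Hypothetical graph with two reconverging paths from s to t.
edges = {
    "s": [("a", 0.0), ("b", 0.0)],
    "a": [("c", 1.5)],
    "b": [("c", 2.0)],
    "c": [("t", 1.0)],
}
print(block_based_arrival(edges, "s")["t"])  # 3.0
print(path_based_delays(edges, "s", "t"))    # [2.5, 3.0]
```

Both approaches agree on the critical delay for this graph. The block-based pass touches each edge once, whereas the number of enumerated paths can grow exponentially with reconvergent fan-out, which is why path-based analysis needs a path selection step.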

The choice of using path-based analysis or block-based analysis depends on several factors, such as the design complexity, stage, and goal. Generally, path-based analysis is more suitable for small or medium-sized designs, where the number of paths is manageable and the accuracy is important. It can also be used for final verification or optimization, where the timing margins are tight and the details are needed. On the other hand, block-based analysis is more suitable for large designs, where the number of paths is overwhelming and the runtime is important. It can also be used for FPGA architecture exploration, where the timing budget is loose and the trends are sufficient.

In some cases, it can be more effective to combine both techniques and use them at different stages or levels of the design. For example, one can use block-based analysis for the system-level design, where the blocks are abstracted and the overall timing is estimated. Then, one can use path-based analysis for the block-level design, where the paths are detailed. This gives a balance between accuracy and efficiency [12].

3 Summary and Trends

State-of-the-art STA engines still cannot replace DTA (simulation) completely, because some aspects of timing verification cannot yet be fully captured and verified in STA [13]. These limitations include:

  1. Inaccurate timing models

    The timing models used in FPGA STA may not accurately represent the behavior of the actual circuit due to the complexity of the FPGA architecture.

  2. Lack of support for dynamic circuits

    FPGA STA assumes that the circuit is static and does not take into account dynamic circuits such as state machines or circuits with feedback paths.

  3. Impact of environmental conditions

    FPGA STA assumes ideal environmental conditions, such as constant temperature and voltage, which may not hold true in the real world.

Although FPGA STA has been mature for many years, it still benefits from emerging technologies. The following are some recent trends in FPGA STA:

  1. Parallel acceleration

    Parallel STA on different computing platforms, such as multi-core CPUs [4, 14,15,16,17] and GPUs [16, 18], is an active research topic.

  2. AI (machine learning) acceleration

    ML algorithms are increasingly being used to analyze the timing characteristics of FPGA designs [19,20,21]. ML-based timing analysis can quickly identify critical paths in the design, predict the timing behavior of the design, and optimize the design for timing performance.