INTRODUCTION

Designing field programmable gate arrays (FPGAs) or reconfigurable systems-on-a-chip (RSoC) is a complex process that requires considerable time to select the parameters of a circuit and its architecture, analyze routing capabilities, and simulate circuit components [14]. Despite the fact most of the design flow stages are automated, the routing evaluation of the user circuit design and the search for the developed architecture’s bottlenecks are performed by humans. The approach without verification of the architecture could lead to errors that could be found only after the layout design stage of the basic chip. Errors at this stage are unacceptable in a time-limited design and manufacturing process.

Methods of software circuit analysis and architecture evaluation can significantly simplify and speed up the design process of an FPGA or RSoC basic chip. The existing evaluation methods based on the execution of the full design flow [59] use a simplified description of a basic chip, which does not allow to accurately evaluate its architecture, its specific details, or weaknesses. Also, the available methods require a circuit description in a specialized format. This poses an additional task of studying new methods of presenting the developed circuit schematic view for the chip architect.

In the proposed approach to assess the FPGA and SoC’s architecture, the Circuit Design Language (CDL) [10] is applied for a basic chip description. This format is often used by integrated circuit (IC) designers and is utilized by computer-aided design (CAD) systems produced by Cadence, Synopsys, and Mentor Graphics. The circuit description in this format is automatically generated from the graphical representation in the schematic editor of the manufacturers mentioned above. The CDL support together with the developed formalization of the circuit representation in CAD provides a flexible and quick customization of the software considering the corresponding changes in the structure, circuitry, and layout of the developed heterogeneous SoC or FPGA. It also allows us to evaluate the efficiency of various user circuit design implementations and the architecture of a basic chip.

The developed method, which includes a quick customization of the CAD software, that takes into account the changes in the basic chip architecture, and evaluation of the architecture’s efficiency is a new stage in the reconfigurable or programmable logic design flow, called software circuit prototyping. Software circuit prototyping means verification and evaluation of the basic chip containing FPGA elements before its tape-out, in contrast to classical prototyping, which means verification of the system on a chip or its individual IP-cores based on the manufactured FPGA chip [11, 12].

SOFTWARE CIRCUIT PROTOTYPING METHOD

The software prototyping method proposed in this study consists of several stages (Fig. 1).

Fig. 1.
figure 1

Software circuit prototyping stages.

(1) The first step is the selection and design of the underlying architecture. It is the base for prototyping and further modifications. The architecture can be either unique or selected from a variety of existing solutions that differ in the structure of routing resources [13, 14] (island style and hierarchical FPGA architectures), the structure of a logic element (LE) and a logic array blocks (LABs) (LE in FPGAs by Altera [15], a configuration logic block in FPGAs by Xilinx [16], a versatile logic cell in FPGAs by Microsemi [17]). The final circuit architecture from all the variety available can be selected using physical constraints such as the package size or the chip area required to fit the configuration memory. Also, restrictions are imposed based on the requirements set by specific user projects, which are denoted in terms of the volume of programmable logic, routing capabilities of the basic chip, and the variety of IP-cores.

(2) At the second stage, information about the developed circuit is transferred to the CAD database. Initially the CAD system processes and analyzes the schematic view of the RSoC or FPGA circuit in the CDL file format and its layout in the GDS II file format with the help of specialized software, the so-called parser [18]. The program structure automatically adapts to the FPGA architecture by processing these files. The program generates a routing resources graph, coordinates of logic blocks, and a memory card, based on which the firmware vector is formed. The opportunity to automatically customize CAD tools to any architecture allows RSoC and FPGA developers to evaluate the routability of the circuit and to find the architecture’s weaknesses in advance. It also enables CAD developers to debug the software for the future architecture according to the customer’s needs. This feature makes the development process and the final result much more efficient.

(3) At the next stage, a complete design flow of the user circuit is performed, including logic and layout synthesis [19]. Logic synthesis consists of graph translation and technology mapping into the basis of the target FPGA or RSoC chip [20, 21]. Layout synthesis, in turn, includes the netlist decomposition into separate groups or clusters [22], placement of logic elements on the legal positions of the FPGA matrix [23], and routing connections between LEs and I/Os using routing resources embedded in the architecture [24].

(4) The final stage of software circuit prototyping is the analysis of the results. The new architecture parameters are selected and the corresponding changes are made to the schematic view of the basic chip based on the performed analysis. Software circuit prototyping is an iterative process; thus, the stage of changes to the circuit’s design only completes one iteration of the selection of the required architecture. The prototyping process can be considered complete when two conditions are met. The first condition is that the prototyping results meet all the specified requirements and constraints. The second condition is that the full design flow has been successfully completed for the user circuit design set. If these conditions are not met, the basic chip architecture changes and the process is repeated pending a positive result.

The following section III shows the features of a basic chip presentation and the user circuit design description in CAD to perform software circuit prototyping. In subsection III.a, the stage of loading a basic chip into the CAD system is considered in more detail. Also, the developed formalized view of the FPGA or RSoC schematic view is shown. Subsection III.b presents the features of analysis and processing of the FPGA and SoC layout in CAD. Section IV contains the practical results of using the developed method for software prototyping of the basic FPGA architecture, and also describes the architecture changing parameters and characteristics, based on which the obtained prototypes were compared.

FEATURES OF REPRESENTATION OF A BASIC CHIP AND THE USER CIRCUIT DESIGN IN CAD TO PERFORM SOFTWARE CIRCUIT PROTOTYPING

Features of the Representation of the Basic Chip and the User Circuit Design

Prompt adjustment of the design and circuitry of a basic chip to new needs from the end user, as well as the fast adjustment of the CAD system for corresponding changes in the design, circuitry, and layout of a basic chip, is provided by formalizing the correspondences between the elements of the basic design of a reconfigurable or programmable heterogeneous SoC or FPGA from the manufacturer (base) and user circuit design from the end customer.

To load the required information into CAD, the circuit of a basic chip (reconfigurable or programmable heterogeneous SoC or FPGA) is presented as a description in the CDL format, and the user circuit design is presented as a flat netlist in the Verilog language.

During processing the basic chip circuitry, its hierarchical description is defined in the CAD system as an ordered triple:

$$\begin{gathered} {{\Pi}} = (S,~L,~{{s}_{m}}){\kern 1pt} -{\kern 1pt} {\text{is a hierarchical }} \\ {\text{project description}}{\text{,}} \\ \end{gathered} $$
(1)

where \(S = \left\{ {{{s}_{i}},~~~i = 1, \ldots ,\left| S \right|} \right\}\) is the set of circuits in a hierarchical project description;

\(L \subset S\) is a basis or subset of library subcircuits for the current design level (or stage);

\({{s}_{m}} \in S,~~{{s}_{m}} \notin L\), is the main circuit or top-level circuit.

At the same time, each of the schematic views in the project hierarchy is defined as follows:

$$\forall s \in S{\text{:}}~\,s = (\mu (s),E\left( s \right),N\left( s \right),P\left( s \right),C(s)),$$
(2)

where \(\mu (s)\) is a unique circuit name (character string);

\(E(s) = \{ {{e}_{i}},~i = 1, \ldots ,\left| {E(s)} \right|\} \) is the set of elements in the circuit;

\(N(s) = \{ {{n}_{i}},~i = 1, \ldots ,\left| {N(s)} \right|\} \) is the set of nets (nodes) in the circuit;

\(P(s) = \{ {{p}_{i}},i = 1, \ldots ,\left| {P(s)} \right|\} \) is the set of external pins (contacts) of the circuit;

\(~C(s) = \{ {{c}_{i}},i = 1, \ldots ,\left| {C(s)} \right|\} \) is the set of connections (commutations) of the circuit.

The set of elements is characterized by the following components:

$$\forall e \in E(s){\text{:}}~~e = (\mu \left( e \right),~m\left( e \right),P\left( e \right)),$$
(3)

where \(\mu (e)\) is a unique element name (character string);

\(m(e) \in S\) is an element model represented in the hierarchical description by a subcircuit of the next (lower) hierarchy level;

\(P(e) = \{ {{p}_{i}},i = 1, \ldots ,\left| {P(e)} \right|\} \) is the set of element pins that match the composition of the set of element model external pins (a one-to-one correspondence between them is assumed):

$$P\left( e \right) \leftrightarrow P\left( {m\left( e \right)} \right);\,\,\,\,\left| {P(e)} \right| = \left| {P(m(e))} \right|.$$
(4)

The set of external circuit pins is characterized by the following components:

$$\forall p \in P\left( s \right){\text{:}}~~p = \left( {\mu \left( p \right),\tau \left( p \right)} \right),$$
(5)

where \(\mu (p)\) is a unique pin name;

\(\tau \left( p \right) \in \{ {{\tau }_{{inp}}},{{\tau }_{{out}}},{{\tau }_{{bi}}}\} \) is a pin type: input, output, and bidirectional.

The set of circuit nets (nodes) is characterized by a name and a set of net connections:

$$\forall n \in N\left( s \right){\text{:}}~~n = \mu (n),$$
(6)

where \(\mu (n)\) is a net (node) name, (character string);

\(C(s)\) is a set of connections in a circuit, defined as a subset of such pairs

$$C\left( s \right) = \left\{ {\left( {p,n} \right){\text{:}}\,\,p \in \left( {\bigcup\limits_{i = 1, \ldots ,\left| {E\left( s \right)} \right|} {P\left( {{{e}_{i}}} \right)} \cup P\left( s \right)} \right),\,\,n \in N\left( s \right)} \right\}$$
(7)

that the net is unique or does not exist at all for any contact:

$$\begin{gathered} \forall p \in \left\{ {\bigcup\limits_{i = 1, \ldots ,\left| {E\left( s \right)} \right|} {P\left( {{{e}_{i}}} \right)} \cup P\left( s \right)} \right\}{\text{:}} \\ \left( {\exists !n \in N\left( s \right){\kern 1pt} :\left( {p,n} \right) \in C\left( s \right)} \right) \vee \\ \vee \,\,\left( {\forall n \in N\left( s \right){\kern 1pt} :\left( {p,n} \right) \notin C\left( s \right)} \right). \\ \end{gathered} $$
(8)

In other words, the set of circuit connections is defined as a unique mapping:

$$C{\text{*}}\left( s \right) = \left\{ {\left( {\bigcup\limits_{i = 1, \ldots ,\left| {E\left( s \right)} \right|} {P\left( {{{e}_{i}}} \right) \cup } P\left( s \right)} \right) \to \left( {N\left( s \right) \cup \not {0}} \right)} \right\}.$$
(9)

At the same time, the inverse mapping determines the actual connections list of the net and cannot be injective; i.e., the number of connections for each net must be at least two; otherwise the net is considered erroneous or false:

$$\begin{gathered} {{C}^{{* - 1}}}\left( s \right) = \left\{ {N\left( s \right) \to \left( {\bigcup\limits_{i = 1, \ldots ,\left| {E\left( s \right)} \right|} {P\left( {{{e}_{i}}} \right) \cup } P\left( s \right)} \right)} \right\} \\ \forall n \in N\left( s \right){\kern 1pt} :\left| {\left\{ {\left( {p,n} \right){\kern 1pt} :~\,\,\left( {p,n} \right) \in C\left( s \right)} \right\}} \right| \geqslant 2. \\ \end{gathered} $$
(10)

A net can have only one external pin as a rule:

$$\begin{gathered} \forall n \in \\ \in N\left( s \right){\kern 1pt} :\left| {\left\{ {\left( {p,n} \right){\kern 1pt} :~\,\,\left( {p,n} \right) \in C\left( s \right)~ \wedge ~p \in P\left( s \right)} \right\}} \right| \leqslant 1. \\ \end{gathered} $$
(11)

The difference between the formal description of the basic design and the formal description of the user circuit design is that the external pin name in the schematic view is the same as the name of the net connected to it:

$$\begin{gathered} \forall n \in N\left( s \right), \\ \forall p \in P\left( s \right){\kern 1pt} :~~\left( {p,n} \right) \in C\left( s \right) \Rightarrow \mu \left( n \right) = \mu \left( p \right). \\ \end{gathered} $$
(12)

At this design stage, the circuits of the basic library level are “black boxes”; i.e., they do not contain internal data:

$$\forall s \in L{\kern 1pt} :~~E\left( s \right) = \not {0},\,\,\,\,N\left( s \right) = \not {0},\,\,\,\,C\left( s \right) = \not {0}.$$
(13)

In this case, the lower level subcircuits are modeled based on the built-in models and the description of the black boxes can be hidden from the external user. For example, at the schematic design level, the basic library level includes transistors, capacitances, resistances, and inductances.

The hierarchical description of the underlying chip is converted into a corresponding flat representation in the CAD system by a recursive flatting procedure. For the given project \({{\Pi}} = (S,~L,~{{s}_{m}})\) only those subcircuits that are actually used in \({{s}_{m}}\) are saved in the flat view.

Let us denote by \(\varphi \left( {s,{{s}_{t}}} \right)\) a logical function defined on the Cartesian product \(~S \times S\) taking value 1 if and only if s is actually used in \({{s}_{t}}\):

$$\varphi {\kern 1pt} :S \times S \to \mathcal{B};\,\,\,\,\mathcal{B} = \left\{ {0,1} \right\};\,\,\,\,\varphi \left( {s,{{s}_{t}}} \right) = \left( {\left( {s = {{s}_{t}}} \right) \vee \left( {\mathop \bigvee \limits_{e \in E({{s}_{t}})} \varphi \left( {s,m(e)} \right)} \right)} \right),$$
(14)

i.e., \(\varphi \left( {s,{{s}_{t}}} \right) = 1,\) \({\text{if}}\,\,~\left( {s = {{s}_{t}}} \right)\) or \(\exists e \in E\left( {{{s}_{t}}} \right){\kern 1pt} :~\) \(~\varphi \left( {s,m\left( e \right)} \right) = 1.\)

The “flat representation” \({{{{\Pi}}}_{f}}({{\Pi}}) = \left( {{{S}_{f}},~{{L}_{f}},~{{s}_{f}}} \right)\) for the given project \({{\Pi}} = \left( {S,~L,~{{s}_{m}}} \right)\) is built using the following rules:

$$\begin{gathered} {{L}_{f}} = \left\{ {s:~~(s \in L) \wedge \varphi \left( {s,{{s}_{m}}} \right)} \right\}; \hfill \\ {{S}_{f}} = \left\{ {{{s}_{f}} \cup {{L}_{f}}} \right\}; \hfill \\ {{s}_{f}} = (\mu ({{s}_{f}}),~E\left( {{{s}_{f}}} \right),N\left( {{{s}_{f}}} \right),P\left( {{{s}_{f}}} \right),C({{s}_{f}}));\,\,\,\,{\text{where}} \hfill \\ \mu \left( {{{s}_{f}}} \right) = \mu ({{s}_{m}}); \hfill \\ P\left( {{{s}_{f}}} \right) \leftrightarrow P({{s}_{m}}). \hfill \\ \end{gathered} $$

Element names \(\mu \left( e \right),~e \in E\left( {{{s}_{f}}} \right)~\), and net names \(\mu \left( n \right),~n \in N\left( {{{s}_{f}}} \right)~\), in a flat representation are unique and contain information about the element names of higher levels of the hierarchy, which include subcircuits containing the considered element, before the hierarchy is expanded.

A flat view consists of the elements contained in the basic design library. The following elements types can be identified in the library: logic elements \({{L}_{{LE}}}\), peripheral (IO) input/output elements \({{L}_{{IO}}}\), complex-functional macroblocks \({{L}_{M}}\) or IP-cores, routing elements \({{L}_{{Ro}}}\), and other auxiliary elements \({{L}_{{BB}}}~\)(black boxes) that do not contain any of the listed element types \({{L}_{{LE}}},~~{{L}_{{IO}}}\), \({{L}_{M}}\), or \({{L}_{{Ro}}}\) and perform additional functions elements (for example, memory programming), not related to the mapping of elements of a user circuit design:

$$L = ~{{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}} \cup {{L}_{{Ro}}} \cup {{L}_{{BB}}}.$$
(15)

When converting a hierarchical basic design to a flat representation, the fact of the presence of elements and the actual number of elements in all listed types and in each subcircuit of a higher level of the hierarchy is considered. The fact that subcircuit s belongs to the set of black boxes is determined by the absence of the listed types elements in it, if we denote the actual number of elements of the listed types in the given subcircuit through \({{\sigma }_{{LE}}}\left( s \right),\) \({{\sigma }_{{IO}}}\left( s \right),\) \({{\sigma }_{M}}\left( s \right),\) \({{\sigma }_{{Ro}}}\left( s \right)\). Moreover, this does not depend on whether the circuit has subcircuits and elements of a lower level:

$$\begin{gathered} s \in {{L}_{{BB}}}~~ \\ \equiv \left( {{{\sigma }_{{LE}}}(s) + {{\sigma }_{{IO}}}(s) + {{\sigma }_{M}}(s) + {{\sigma }_{{Ro}}}(s) = 0} \right). \\ \end{gathered} $$
(16)

At the same time, the value of each of the listed functions for counting elements of the corresponding type \(T \in \{ LE,IO,M,Ro\} \) can be determined recursively:

$$\begin{gathered} {{\sigma }_{T}}{\kern 1pt} :~\,\,S \to {{\mathcal{N}}_{0}},\,\,\,\,{{\mathcal{N}}_{0}} = \mathcal{N} \cup 0; \hfill \\ {{\sigma }_{T}}\left( s \right) = \left\{ \begin{gathered} 1\,\,\,\,{\text{when}}\,\,\,\,~s \in {{L}_{T}} \hfill \\ 0\,\,\,\,~{\text{when}}\,\,\,\,~s \in L{{\backslash }}{{L}_{T}} \hfill \\ \sum\limits_{e \in E\left( s \right)} {{{\sigma }_{T}}\left( {m\left( e \right)} \right)} \,\,\,\,{\text{when}}\,\,\,\,s~ \notin L \hfill \\ \end{gathered} \right.. \hfill \\ \end{gathered} $$
(17)

As a result of these calculations, the conversion of a hierarchical basic design to the flat representation is limited by the level \(L = {{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}} \cup {{L}_{{Ro}}} \cup {{L}_{{BB}}}.\)

Elements \(e{\text{:}}\,\,~m(e) \in {{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}}\) are used to map library elements in a user design project. Elements \(e{\text{:}}~\,\,m(e) \in {{L}_{{Ro}}}\) are used to map nets and connections of a user design. Based on them, a graph is automatically built for solving routing problems.

Library-level circuits \(s \in {{L}_{{LE}}} \cup {{L}_{{IO}}}\)  \( \cup \,\,{{L}_{M}}\) = \(L{{\backslash }}\{ {{L}_{{BB}}} \cup {{L}_{{Ro}}}\} \) are programmed by the use of the library for various functional solutions through programmable memory. The set of external pins of such circuits is defined as follows:

$$\begin{gathered} P\left( s \right) = \left\{ {{{p}_{i}},\,\,~i = 1, \ldots ,\left| {P\left( s \right)} \right|} \right\}, \\ s \in {{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}} \\ \end{gathered} $$
(18)
$$P\left( s \right) = {{P}_{r}}\left( s \right) \cup {{P}_{m}}\left( s \right) \cup {{P}_{s}}\left( s \right).$$
(19)

It is divided into three independent subsets with different functional purposes:

where \({{P}_{r}}\left( s \right)\) is a subset of signal or routing pins for connecting external signal nets using routing resources from \({{L}_{{Ro}}}\);

\({{P}_{m}}\left( s \right)\) is a subset of programmable outputs to control various options of functional solutions;

\({{P}_{s}}\left( s \right)\) is a subset of service pins for connecting special signals (for example, power, ground, synchronization, and reset), the connection of which requires special processing other than connecting conventional nets or signals.

The cardinality of the set \(\left| {{{P}_{r}}\left( s \right)} \right|\) determines the maximum number of element pins allowed in the user library \({{L}_{u}}\) of the user project \({{{{\Pi}}}_{u}} = \left( {{{S}_{u}},~{{L}_{u}},~{{s}_{{mu}}}} \right).\) Some of the pins \({{P}_{r}}\left( s \right)\) may not be used in a specific library element from \({{L}_{u}}~\) or may be connected to ground/power.

The cardinality of the set \(\left| {{{P}_{m}}\left( s \right)} \right|\) determines the length of the programming vector for the implementation of specific functions and work modes of library elements. Due to different programming options, one instance \(s \in {{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}}\) can be used for many different implementations in a user library \({{L}_{u}}\). The maximum number of implementations in the library can be \({{2}^{{\left| {{{P}_{m}}\left( s \right)} \right|}}}.\) In particular, the number of functions for a classic LookUp Table (LUT) element [25] with n inputs is

$${{2}^{{\left| {{{P}_{m}}\left( s \right)} \right|}}} = {{2}^{{{{2}^{n}}}}}.$$
(20)

Thus, the formation of the library elements \({{s}_{u}} \in {{L}_{u}},\) \({{s}_{u}} = (\mu ({{s}_{u}}),~\not {0},\not {0},P\left( {{{s}_{u}}} \right),\not {0})\), of the user library \({{L}_{u}}~\) of the project \({{{{\Pi}}}_{u}} = \left( {{{S}_{u}},~{{L}_{u}},~{{s}_{{mu}}}} \right)\) is realized by setting the following relations for the pins of the library circuits of the basic chip \(P\left( s \right) = \left\{ {{{p}_{i}},i = 1, \ldots ,\left| {P\left( s \right)} \right|} \right\},\) \(s \in {{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}}{\text{:}}\)

$$\begin{gathered} {{P}_{m}}\left( s \right) \to {{\mathcal{B}}^{{\left| {{{P}_{m}}\left( s \right)} \right|}}},\,\,\,\,\mathcal{B} = \left\{ {0,1} \right\}; \\ {{P}_{r}}\left( s \right) \to ~P\left( {{{s}_{u}}} \right) \cup \{ {{P}_{0}},{{P}_{1}},{{P}_{z}}\} , \\ \end{gathered} $$
(21)

here \({{P}_{0}},{{P}_{1}},{{P}_{z}}~\) are the notation keys for pins that have external connections to a ground, power, or an unconnected node.

Without loss of generality, it can be assumed that a user design circuit from the end customer \({{{{\Pi}}}_{u}} = \left( {{{S}_{u}},~{{L}_{u}},~{{s}_{{mu}}}} \right)\) is specified in a flat representation, obtained as a result of automatic synthesis from an RTL description, or it is a result of the direct conversion of a hierarchical user description into a flat representation; then \({{S}_{u}} = {{L}_{u}} \cup \{ {{s}_{{mu}}}\} .\)

Let us suppose the user top circuit is \({{s}_{{mu}}} = \left( {\mu ({{s}_{{mu}}}),~E({{s}_{{mu}}}),N({{s}_{{mu}}}),P({{s}_{{mu}}})C({{s}_{{mu}}})} \right)\). External pins of a user circuit design \({{p}_{u}} \in P\left( {{{s}_{{mu}}}} \right),\) can be processed in two ways:

– The first way involves the assignment of peripheral elements from \({{L}_{{IO}}}\). At the same time, depending on the type of pin \(\tau \left( {{{p}_{u}}} \right) \in \{ {{\tau }_{{inp}}},{{\tau }_{{out}}},{{\tau }_{{bi}}}\} \) various peripheral elements modes are programmed: input, output, or bidirectional.

– The second way assumes that the peripheral elements have been selected at the stage of forming the user circuit design and the circuit pins \({{p}_{u}} \in P\left( {{{s}_{{mu}}}} \right)\) are external interfaces for modeling.

The stage of assigning peripheral elements involves not only the selection of a specific type of peripheral element \({{s}_{{IO}}} \in {{L}_{{IO}}}\) with programming,

$${{P}_{m}}\left( {{{s}_{{IO}}}} \right) \to {{\mathcal{B}}^{{\left| {{{P}_{m}}\left( {{{s}_{{IO}}}} \right)} \right|}}},\,\,\,\,\mathcal{B} = \left\{ {0,1} \right\},$$
(22)

but also the selection of a specific instance of a peripheral element \(e \in E\left( {{{s}_{f}}} \right)\). Therefore, a specific place of this element in the flat representation of a basic chip, i.e., matching (mapping) is obtained:

$$\begin{gathered} P\left( {{{s}_{{mu}}}} \right) \to \{ e{\text{:}}~\,\,e \in E\left( {{{s}_{f}}} \right), \\ m\left( e \right) = {{s}_{{IO}}},\,\,\,\,{{s}_{{IO}}} \in {{L}_{{IO}}}\} . \\ \end{gathered} $$
(23)

In this case, the procedure for assigning a specific peripheral instance and its placement can be performed both in the manual or interactive mode, and automatically.

A similar problem is solved for all internal elements of the user circuit design, both for the standard logic elements and for complex-functional macroblocks:

$$\begin{gathered} E\left( {{{s}_{{mu}}}} \right) \to \left\{ {e{\text{:}}~\,\,e \in E\left( {{{s}_{f}}} \right),} \right. \\ \left. {m\left( e \right) \in \{ {{L}_{{LE}}} \cup {{L}_{M}} \cup {{L}_{{IO}}}\} } \right\}. \\ \end{gathered} $$
(24)

The performed mapping of the user circuit design elements to the basic project elements is the process of the placement of a user circuit design.

Features of the Representation of a Basic Chip Layout

The selection of specific peripheral elements and the placement of user circuit elements on a basic chip are performed based on the results of the chip layout analysis. If the prototype layout has been developed, the CAD system analyzes the layout file in the GDSII format, which contains all the necessary information, such as the real coordinates of the elements \(e{\text{:}}~\,\,m(e) \in {{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}},\) which allow the program to transfer their location on the flat representation of a basic chip \({{{{\Pi}}}_{f}}({{\Pi}}) = \left( {{{S}_{f}},~{{L}_{f}},~{{s}_{f}}} \right),\) element orientations (rotations), and its geometric dimensions: width and height.

Thus, the position of each instance \(m(e)\) is characterized by the anchor point coordinates in the lower left edge, orientation, and overall dimensions:

$${{X}_{{min}}}\left( {m(e)} \right),\,\,\,\,{{Y}_{{min}}}\left( {m(e)} \right),\,\,\,\,{{O}_{r}}\left( {m(e)} \right),$$
(25)

where \({{O}_{r}}\left( {m(e)} \right) \in \{ {{O}_{0}},{{O}_{R}},{{O}_{{XY}}},{{O}_{{XYR}}},{{O}_{Y}},{{O}_{{XR}}},{{O}_{X}},{{O}_{{XR}}}\} \) is the orientation. At the same time, the orientation index indicates the absence (0) or the presence of rotation (R is a 90° counterclockwise rotation) and reflections relative to the X, Y axis.

A simplified layout view of a basic chip using relative element coordinates is applied to speed up software circuit prototyping. It allows us to skip the layout design stage and transfer the location of the LEs, I/O cells, and macroblocks to the CAD. Relative coordinates are generated for all the necessary elements using the set of operations developed based on the CDL netlist, schematic view, and specialized linguistic tools in the Tcl language. The generation of such coordinates is possible after changing the orientation of all element types of the basic chip to normal: \({{O}_{r}}\left( {{{m}_{{ij}}}} \right) = {{O}_{0}}.\)

If at the top level of a chip prototype only logic elements of the same type are used, then the prototype can be represented in a flat view as a complete LE matrix:

$$\begin{gathered} {{M}_{f}} = \left\{ {{{m}_{{ij}}}{\text{:}}~\,\,{{m}_{{ij}}} \in {{E}_{{LE}}}\left( {{{s}_{f}}} \right),\,\,\,\,m({{m}_{{ij}}}) \in {{L}_{{LE}}},\,\,\,\,~i = 1, \ldots ,{{I}_{f}},\,\,\,\,j = 1, \ldots ,{{J}_{f}}} \right\}, \\ {{E}_{{LE}}}\left( {~{{s}_{f}}} \right) \subset E\left( {~{{s}_{f}}} \right),\,\,\,\,{{E}_{{LE}}}\left( {~{{s}_{f}}} \right) = \left\{ {e{\text{:}}\,\,~e \in E\left( {~{{s}_{f}}} \right)\& m\left( e \right) \in {{L}_{{LE}}}} \right\}. \\ \end{gathered} $$
(26)

If the top level of a chip prototype is represented as a matrix of LABs, then within generating coordinates of such a prototype for a more detailed representation, a simplified bilevel view \({{{{\Pi}}}_{b}}({{\Pi}}) = \left( {{{S}_{b}},~{{L}_{b}},~{{s}_{b}}} \right)\) is introduced. In this view, together with the set of elements of the library level L, an intermediate level of blocks that are not included in L: \(B = \left\{ {{{b}_{i}}} \right\},\) \(:~B \cap L = \not {0}\) is allocated. Due to the fact that LABs are grouped from identical blocks, the following expression is true: \(\left| B \right| = \left| {\left\{ b \right\}} \right| = 1.\) The final simplified project view \({{{{\Pi}}}_{b}}({{\Pi}}) = \left( {{{S}_{b}},~{{L}_{b}},~{{s}_{b}}} \right)\) consists of the following components:

$$\begin{gathered} {{L}_{b}} = \left\{ {s{\kern 1pt} :~~(s \in L) \wedge \varphi \left( {s,{{s}_{m}}} \right)} \right\}; \\ {{S}_{b}} = \left\{ {{{s}_{b}} \cup {{L}_{b}} \cup B} \right\},\,\,\,\,{{L}_{b}} \cap B = \not {0}; \\ {{s}_{b}} = (\mu ({{s}_{b}}),~E\left( {{{s}_{b}}} \right),\not {0},\not {0},\not {0}); \\ \mu \left( {{{s}_{b}}} \right) = \mu ({{s}_{m}}). \\ \end{gathered} $$

In contrast to the representation of the prototype in CAD, when the coordinates of the prototype are generated, all the logic elements are grouped formally. In accordance with this, the set of prototype nets, as well as the set of its external pins and connections, is not parsed, but only the set of elements is used. Also, in contrast to a flat representation in CAD, in this case, not only logic elements \({{L}_{{LE}}}\), peripheral I/O elements \({{L}_{{IO}}}\), and complex-functional macroblocks \({{L}_{M}}\), but also LABs \(B\) are used:

$${{S}_{b}} = \left\{ {{{s}_{b}} \cup {{L}_{{LE}}} \cup {{L}_{{IO}}} \cup {{L}_{M}} \cup B} \right\}.$$
(27)

Based on this, the subset of block elements is \({{E}_{B}}\left( {{{s}_{b}}} \right) \subset E\left( {{{s}_{b}}} \right),\) \({{E}_{B}}\left( {{{s}_{b}}} \right) = \left\{ {e{\text{:}}\,\,e \in E\left( {~{{s}_{b}}} \right)} \right.\& m\left( e \right) \in \) \(\left. B \right\},\) and the prototype can be considered in the LAB matrix view:

$$\begin{gathered} {{M}_{b}} = \left\{ {{{m}_{{ij}}}{\text{:}}\,\,{{m}_{{ij}}} \in {{E}_{B}}\left( {{{s}_{b}}} \right),} \right. \\ m({{m}_{{ij}}}) \in B,\,\,\,\,~i = 1, \ldots ,{{I}_{b}},\, \\ \left. {j = 1, \ldots ,{{J}_{b}}} \right\}. \\ \end{gathered} $$
(28)

The total number of blocks in the basic chip is determined by the block matrix size:

$$\left| {{{E}_{B}}\left( {{{s}_{b}}} \right)} \right| = \left| {{{M}_{b}}} \right| = {{I}_{b}}{{J}_{b}}.$$
(29)

Similarly, a block in a bilevel view consists of elements at a lower hierarchy level:

$$b = \left( {\mu \left( b \right),~E\left( b \right),\not {0},\not {0},\not {0}} \right),$$
(30)

where \(E\left( b \right) = \{ e{\text{:}}~\,\,m\left( e \right),\,\,m(e) \in {{L}_{{LE}}} \cup {{L}_{{Ro}}} \cup {{L}_{{BB}}}\} .\) Then the subset of logic elements of the block is \({{E}_{{LE}}}\left( b \right) \subset E\left( b \right),\) \({{E}_{{LE}}}\left( b \right) = \left\{ {e{\text{:}}\,\,~e \in E\left( {~b} \right)\& } \right.m\left( e \right) \in \) \(\left. {{{L}_{{LE}}}} \right\}\) and can be represented as an LE matrix part of the LAB:

$${{M}_{{LE}}} = \{ {{m}_{{ij}}}{\text{:}}\,\,{{m}_{{ij}}} \in {{E}_{{LE}}}\left( b \right),\,\,\,\,m({{m}_{{ij}}}) \in {{L}_{{LE}}},\,\,\,\,i = 1, \ldots ,{{I}_{{LE}}},\,\,\,\,j = 1, \ldots ,{{J}_{{LE}}}\} .$$
(31)

The total number of elements in a block is determined by the matrix size \({{M}_{{LE}}}{\text{:}}\)

$$\left| {{{E}_{{LE}}}\left( b \right)} \right| = \left| {{{M}_{{LE}}}} \right| = {{I}_{{LE}}}{{J}_{{LE}}}.$$
(32)

It is assumed that all logic elements in the bilevel block view are localized in blocks:

$$\forall e \in E\left( {{{s}_{b}}} \right) \cup E\left( b \right):~~m\left( e \right) \in {{L}_{{LE}}} \to e \in E\left( b \right).$$
(33)

In other words, there are no logic elements at the top level of the hierarchical bilevel block view:

$$\nexists e{\kern 1pt} :~~e \in E\left( {{{s}_{b}}} \right)~\& \,\,~m\left( e \right) \in {{L}_{{LE}}}.$$

Then the total logic elements number in the basic chip is determined by the size of the matrices \({{M}_{b}}\) and \({{M}_{{LE}}}\):

$$\begin{gathered} {{I}_{f}} = {{I}_{b}}{{I}_{{LE}}}, \\ {{J}_{f}} = {{J}_{b}}{{J}_{{LE}}}, \\ \left| {{{E}_{{LE}}}\left( {~{{s}_{f}}} \right)} \right| = \left| {{{M}_{f}}} \right| = {{I}_{f}}{{J}_{f}} = \left| {{{E}_{B}}\left( {{{s}_{b}}} \right)} \right|\left| {{{E}_{{LE}}}\left( b \right)} \right| = {{I}_{b}}{{J}_{b}}{{I}_{{LE}}}{{J}_{{LE}}}. \\ \end{gathered} $$
(34)

For ease of use, we introduce the following notation for the elements of the LAB and LE sets:

$$\begin{gathered} {{m}_{{ijB}}} = {{m}_{{ij}}} \in {{E}_{B}}\left( {{{s}_{b}}} \right); \\ {{m}_{{ijLE}}} = {{m}_{{ij}}} \in {{E}_{{LE}}}\left( b \right). \\ \end{gathered} $$
(35)

When the coordinates of the bilevel basic chip view are generated, it is assumed that the anchor point is the lower left corner of the chip, and the LEs are indexed from it to the upper right; i.e., \({{m}_{{{\text{min}}LE}}},\) is the bottom left element and \({{m}_{{{\text{max}}LE}}}.\) is the top right element. Also, before generation, in addition to the known parameters, such as the number of LEs in row \({{J}_{{LE}}}\) and column \({{I}_{{LE}}}\) of the LAB, the following parameters are set:

—initial coordinates of the lower left chip edge corresponding to the coordinates of the lower left corner of the lowermost LE:

$${{X}_{{{\text{min}}}}}\left( {{{s}_{b}}} \right),\,\,\,\,{{Y}_{{{\text{min}}}}}\left( {{{s}_{b}}} \right) = {{X}_{0}}\left( {{{m}_{{00LE}}}} \right),\,\,\,\,{{Y}_{0}}\left( {{{m}_{{00LE}}}} \right);$$
(36)

—the distance between LEs in the LAB:

$$\Delta X\left( {{{m}_{{ijLE}}}} \right),\,\,\,\,\Delta Y\left( {{{m}_{{ijLE}}}} \right);$$
(37)

—the LE dimensions—width and height:

$$W\left( {{{m}_{{ijLE}}}} \right),\,\,\,\,H({{m}_{{ijLE}}});$$
(38)

—the distance between LABs:

$$\Delta X\left( {{{m}_{{ijB}}}} \right),\,\,\,\,\Delta Y\left( {{{m}_{{ijB}}}} \right).$$
(39)

Based on the known parameters, the dimensions of the LAB are calculated:

$$\begin{gathered} W\left( {{{m}_{{ijB}}}} \right) = {{J}_{{LE}}}W\left( {{{m}_{{ijLE}}}} \right) + ({{J}_{{LE}}} - 1)\Delta X\left( {{{m}_{{ijLE}}}} \right), \\ H\left( {{{m}_{{ijB}}}} \right) = {{I}_{{LE}}}H\left( {{{m}_{{ijLE}}}} \right) + \left( {{{I}_{{LE}}} - 1} \right)\Delta Y\left( {{{m}_{{ijLE}}}} \right). \\ \end{gathered} $$
(40)

Further, using the described specified and computed parameters, the coordinates are calculated for each item \({{m}_{{ijB}}}\) consisting of \({{m}_{{ijLE}}}\). After each LE, the distance to the next element in the X direction and in the Y direction is taken into account. \(\Delta X\left( {{{m}_{{ijB}}}} \right)\) is added to the LE X coordinates after each element width \(W\left( {{{m}_{{ijB}}}} \right)\) and \(\Delta Y\left( {{{m}_{{ijB}}}} \right)~\) is added to the LE Y coordinates after each element’s height \(H\left( {{{m}_{{ijB}}}} \right)\).

It should be noted that these formulas for dimensions and coordinates are valid not only for logic elements of the matrix \({{M}_{b}} = \{ {{m}_{{ij}}}{\text{:}}~\,\,{{m}_{{ij}}} \in {{E}_{B}}\left( {{{s}_{b}}} \right),\) \(m({{m}_{{ij}}}) \in B,\) \(i = 1, \ldots ,{{I}_{b}},\) \(j = 1, \ldots ,{{J}_{b}}\} ,\) but also for the peripheral I/O elements

$${{E}_{{IO}}}\left( {{{s}_{b}}} \right) \subset E\left( {{{s}_{b}}} \right),\,\,\,\,{{E}_{{IO}}}\left( {{{s}_{b}}} \right) = \left\{ {e{\text{:}}\,\,e \in E\left( {~{{s}_{b}}} \right)\& m\left( e \right) \in {{L}_{{IO}}}} \right\}$$
(41)

and macroblocks

$${{E}_{M}}\left( {{{s}_{b}}} \right) \subset E\left( {{{s}_{b}}} \right),\,\,\,\,{{E}_{M}}\left( {{{s}_{b}}} \right) = \left\{ {e{\text{:}}\,\,e \in E\left( {~{{s}_{b}}} \right)\& m\left( e \right) \in {{L}_{M}}} \right\}.$$
(42)

The number of macroblocks and their location on the chip can be completely different; therefore, a structured description of generating of their coordinates will not be given. However, the peripheral I/O elements, as a rule, are located along the perimeter of the LE or LAB matrix; therefore, their initial coordinates can be described relatively to LEs. Depending on the side where the peripheral elements are located (left, right, top, bottom), the following coordinates are determined:

$$\begin{gathered} X\left( {{{m}_{{0left}}}} \right),\,\,\,\,Y\left( {{{m}_{{0left}}}} \right),\,\,\,\,{\text{where}} \\ X\left( {{{m}_{{0right}}}} \right) = ~X\left( {{{m}_{{00LE}}}} \right) - \Delta X\left( {{{m}_{{ijLE}}},{{m}_{{ijIO}}}} \right) - W\left( {{{m}_{{ijIO}}}} \right), \\ Y\left( {{{m}_{{0left}}}} \right) = {{Y}_{0}}\left( {{{m}_{{00LE}}}} \right) \\ X\left( {{{m}_{{0right}}}} \right),\,\,\,\,Y\left( {{{m}_{{0right}}}} \right),\,\,\,\,{\text{where}} \\ X\left( {{{m}_{{0right}}}} \right) = X\left( {{{m}_{{00LE}}}} \right) + {{J}_{b}}W\left( {{{m}_{{ijB}}}} \right) + ({{J}_{b}} - 1)\Delta X\left( {{{m}_{{ijB}}}} \right) + \Delta X\left( {{{m}_{{ijLE}}},{{m}_{{ijIO}}}} \right) \\ Y\left( {{{m}_{{0right}}}} \right) = {{Y}_{0}}\left( {{{m}_{{00LE}}}} \right) \\ X\left( {{{m}_{{0bottom}}}} \right),\,\,\,\,Y\left( {{{m}_{{0bottom}}}} \right),\,\,\,\,{\text{where}} \\ X\left( {{{m}_{{0bottom}}}} \right) = {{X}_{0}}\left( {{{m}_{{00LE}}}} \right); \\ Y\left( {{{m}_{{0bottom}}}} \right) = Y\left( {{{m}_{{00LE}}}} \right) - \Delta Y\left( {{{m}_{{ijLE}}},{{m}_{{ijIO}}}} \right) - H\left( {{{m}_{{ijIO}}}} \right), \\ X\left( {{{m}_{{0top}}}} \right),\,\,\,\,Y\left( {{{m}_{{0top}}}} \right),\,\,\,\,{\text{where}} \\ X\left( {{{m}_{{0top}}}} \right) = X\left( {{{m}_{{00LE}}}} \right) \\ Y\left( {{{m}_{{0top}}}} \right) = ~Y\left( {{{m}_{{00LE}}}} \right) + {{I}_{b}}H\left( {{{m}_{{ijB}}}} \right) + ({{I}_{b}} - 1)\Delta Y\left( {{{m}_{{ijB}}}} \right) + \Delta Y\left( {{{m}_{{ijLE}}},{{m}_{{ijIO}}}} \right). \\ \end{gathered} $$
(43)

The other parameters for generating the coordinates of the I/O elements are identical to the LE and LAB parameters. The difference is that LE is replaced by IO, and LAB is replaced by a group of peripheral elements. The index of each PE instance is defined in accordance with the coordinate axis. The PE index on the left and right side corresponds to the Y axis and the PE index on the bottom and top side corresponds to the X axis.

At this stage, simultaneously with the generation of coordinates, the information that the element \(e{\text{:}}\,\,m\left( e \right) \in {{L}_{{IO}}}\) corresponds to the external pin of the basic chip \(P\left( {{{s}_{m}}} \right)\) is formed.

PRACTICAL RESULTS OF SOWTWARE CIRCUIT PROTOTYPING

Based on the formalized representation of a basic chip and a user circuit design described in subsections III.a and III.b, CAD software was developed to allow the proposed software circuit prototyping method to be applied. As an example, that demonstrates the efficiency of the presented method, the results of developing a basic chip using iterative modifications in its architecture are given. The closest analog of the original basic chip is the Altera MAX II FPGA, which has a similar structure of LE, LAB, and routing resources. The purpose of these modifications is to reduce the required amount of configuration memory and to increase the logic size of the basic circuit without downgrading the achieved routing level for the user circuit design based on an existing chip.

In the process of prototyping, the LAB structure and the routing chip architecture were modified. The routing architecture consists of the following types of interconnects: local buses, direct links (DL), R4C4 and R8C8 buses (R is the row, C is the column), long buses, and diagonal connections.

Also, in addition to changing the bit width of the presented buses, the structure of the switches and connection blocks that connect these nets to each other was modified. There are three types of such blocks in this architecture:

—switch block (SB) is a block that connects the R4C4/R8C8 buses and allows to connect direct links to these buses;

—connection block (CB) is a block in which the R4C4/R8C8 buses, direct links, and long buses into a local bus inside the LAB intersect;

—local connection block (LCB) is a block that connects signals on local buses with all the necessary LEs inside the LAB.

We will consider the functionality of all the available interconnect types in more detail. The local bus provides communication between LEs inside a LAB. It is connected to each LE separately and to the rows and columns of global interconnects. This allows direct communication between LABs and minimizes the use of global buses.

At the same time, three types of buses can be used to connect the LAB to each other within one line:

—direct communication using a local bus;

—R4 bus that connects four LABs on the left and four on the right;

—R8 bus that connects eight LABs on the left and four on the right.

Direct communication gives access to local buses of neighboring LABs, which are located on the left and right, and also provides a fast data transfer between LABs and/or I/O units without connecting to the R4 and R8 buses. Each LAB has connections to the R4/R8 buses both to the left and to the right.

The interconnects structure in columns and rows is similar to each other. The only difference is that instead of R4/R8 buses, C4/C8 buses are used. This connection structure allows connecting neighboring LABs (four/eight neighboring LABs upward and the same number of LABs downward) within one column.

Also, the regular connection structure in the form of rows and columns with a fixed length allows to predict the propagation delay time accurately.

Long buses contained in the architecture cross the entire column or row of the chip and connect the distant LEs.

A prototype with the following routing architecture characteristics was taken as the initial circuit: the width of the R4/C4 buses is 32 bits, the width of the R8/C8 buses is 64 bits, the length of long buses is 10 bits, direct links are 10 bits, and the local bus is 22 bits. In this case, the local connection block capacity is not full (not 100%). Local connection block structure is sparse and the capacity is reduced to 75%. This basic circuit consists of 16 columns and 20 lines of LAB. At the same time each LAB consists of 10 LE. The total area of the circuit is 3200 LE.

Prototyping was performed using the s38417 test user circuit design from the ISCAS’89 set [26]. The result of the logic synthesis is 3184 LEs and 3215 wires.

The prototyping results are shown in Table 1. It consists of columns with the names of the current prototype and previous prototype, a description of the current prototype, and the analysis results of the circuit implementation in it. The results are presented in the form of unrouted nets number and the configuration memory volume obtained per LAB and, on average, per element of this block.

Table 1.   Results of software circuit prototyping of the basic chip consisting of 16 × 20 LABs

Table 1 shows that in prototypes 1.1–1.5, memory reduction was achieved by reducing the connection block and reducing the length and width of the R/C buses. In prototypes 1.2–1.3, the used length and width of R/C buses were found to be unacceptable. With these bus parameters, the routability drops to almost 0 for any user circuit design of any size.

In prototypes 1.6–1.15, an attempt was made to reduce the amount of configuration memory by reducing direct links without degrading routability. First, the direct link width was reduced to 5 bits. Second, only one half of the LEs has direct links upward and upward along the diagonals; and the other half, only downward links and downward along the diagonals. The available connections of prototype 1.6 are shown in more detail in Fig. 2.

Fig. 2.
figure 2

Schematic representation of the direct links structure in prototype 1.6.

In prototype 1.16 a full switch block is again added, because further reduction of direct links will only worsen routability. Such a block allows each LE in the LAB to connect to the nearest LE. At the same time, the long buses’ width has been reduced to 8 bits in order not to increase the amount of configuration memory.

The routability is reduced by ~113 unrouted nets on prototypes 1.1–1.16 (in contrast to the original chip) but is improved by increasing the length and width of the R/C buses, as well as by increasing the LAB size to 16 LEs.

When full routability is achieved, a reduction in the maximum and average net length is added to the prototyping goals in addition to reducing the amount of memory. The next stages of software circuit prototyping are shown in Table 2, taking the new goals set into consideration. Here, prototype 1.16 is taken as the initial basic chip with an increased LAB size of up to 16 LEs, an increased number of LABs, and a proportional increase in the width of the routing buses. The total area of the circuit is 4096 LEs. The table does not contain a column with the previous prototype name, since modifications are made sequentially to the previous prototype. Also, the table does not contain a column with the number of unrouted nets, since all test circuits are completely routed in all prototypes.

Table 2.   Results of software circuit prototyping of the basic chip consisting of 16 × 16 LABs

The increase of the prototype size allows to test larger user design. In this case, ac97 was used for testing [27]. The size of ac97 after logic synthesis was 3732 LEs and 3821 wires.

Table 2 also demonstrates that at this stage of prototyping, memory is reduced due to the reduction of SBs and LCBs, as well as a small change in the width of local connections and the width of the R3C3 and R6C6 buses at the intersection of a row and a column.

The result of the software prototyping is the basic chip developed from prototype 2.6. The average memory size per LE in this prototype is 47.3 bits less than the parent prototype. At the same time, the initial level of interconnection routability is not lost and the total size of the chip is increased by 896 LEs.

Thus, software circuit prototyping made it possible to evaluate the architecture of the basic chip before its layout design and to obtain an FPGA that meets all the specified requirements.

CONCLUSIONS

This paper presents a new stage in the design flow for reconfigurable and heterogeneous SoCs and FPGAs called software circuit prototyping. This stage allows to evaluate the basic chip architecture before developing the chip layout. At the same time, the paper describes the method developed for software circuit prototyping and a formalized representation of the RSoC and FPGA circuitry in CAD tools, which provides flexible and prompt software configuration for a loaded circuit.

The stage of loading a basic chip is also considered in detail and the features of the analysis and processing of the RSoC and FPGA layout in CAD are presented. The practical results of using the developed method of software FPGA architecture prototyping are demonstrated. Possible architecture parameters and characteristics, based on which the obtained prototypes can be compared, are described.